How can Perl split a line on whitespace except when the whitespace is in double quotes?

Once upon a time I also tried to re-invent the wheel, and solve this myself. Now I just use Text::ParseWords and let it do the job for me.
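In the usual case, a field that contains spaces is wrapped in one ordinary pair of double quotes. A minimal Text::ParseWords sketch for that case might look like the following. The input line is made up for illustration. `parse_line` takes three arguments: the delimiter as a regex string, a `$keep` flag, and the line to split. When `$keep` is false, the surrounding quotes are dropped from the returned fields.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Made-up input: one field contains a space inside ordinary double quotes.
my $line = q{copy "My Documents/report.txt" /tmp};

# '\s+' is the delimiter pattern; $keep = 0 strips the quotes from the fields.
my @fields = parse_line('\s+', 0, $line);

print "$_\n" for @fields;
```

With `$keep` false, backslashes are also treated as escape characters and removed. For Windows paths like `C:\Program Files`, you would probably want a true `$keep`.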

A working example would be great, because I have not had success getting 6 fields using Text::Balanced and Text::ParseWords. `quotewords('"', 1, $_)` gives me `'StartProgram 1 ', '"C:\\Program Files\\ABC\\ABC XYZ"', 'CleanProgramTimeout 1 30'`. – Sinan Ünür Oct 14 '09 at 14:40

And `quotewords('\s+', 1, $_)` splits the filename along spaces and gives eight fields. – Sinan Ünür Oct 14 '09 at 14:47

From reading the documentation, all you have to do is substitute single quotes with '\"' and double quotes with '"' and quotewords() should work fine. – Oesor Oct 14 '09 at 15:43

@Oesor and @Colin Fine: Could you please post a working example? – Sinan Ünür Oct 14 '09 at 22:49
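The doubled quotes are what trip up Text::ParseWords here. `parse_line` reads the leading `""` as an empty quoted string and then splits the path on its spaces. One possible workaround is to collapse each `""` to a single `"` first, then call `parse_line` with `$keep` set to 1 so the backslashes in the path survive. This workaround is an assumption and was not posted in this thread. The helper name `split_config_line` is hypothetical.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: collapse the doubled quotes, then let Text::ParseWords
# split on whitespace. $keep = 1 keeps the quotes and the backslashes.
sub split_config_line {
    my ($line) = @_;
    (my $normalized = $line) =~ s/""/"/g;    # ""..."" becomes "..."
    return parse_line('\s+', 1, $normalized);
}

my @fields = split_config_line(
    q{StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30}
);
print scalar(@fields), " fields\n";
print "$_\n" for @fields;
```

There are two caveats. First, the path comes back wrapped in a single pair of double quotes rather than the original doubled ones. Second, a genuinely empty `""` field would become a lone `"` and leave the quotes unbalanced.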

Update: It looks like the fields are actually tab separated, not space separated. If that is guaranteed, just split on `\t`.

First, let's see why `(".*?"|\S+)` "does not work". Specifically, look at `".*?"`. That means zero or more characters enclosed in double quotes. Well, the field that is giving you problems is `""C:\Program Files\ABC\ABC XYZ""`. Note that each `""` at the beginning and end of that field will match `".*?"`, because `""` consists of zero characters surrounded by double quotes.

It is better to match as specifically as possible rather than splitting.
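To see this concretely, you can run the question's alternation next to a variant that requires doubled quotes on both sides of the path. The sketch below uses the sample line from the question. The variant `("".*?""|\S+)` is only for illustration. The comments list the fields each pattern returns.

```perl
use strict;
use warnings;

my $s = q{StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30};

# The question's pattern: ".*?" happily matches the empty pair of quotes,
# so the path falls apart:
# 'StartProgram', '1', '""', 'C:\Program', 'Files\ABC\ABC', 'XYZ""',
# 'CleanProgramTimeout', '1', '30'
my @naive = $s =~ /(".*?"|\S+)/g;

# Requiring "" on both sides keeps the path in one piece:
# 'StartProgram', '1', '""C:\Program Files\ABC\ABC XYZ""',
# 'CleanProgramTimeout', '1', '30'
my @fixed = $s =~ /("".*?""|\S+)/g;

print scalar(@naive), " vs ", scalar(@fixed), " fields\n";
```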

So, if you have a configuration file with directives and a fixed format, write a regular expression that is as close as possible to the format you are trying to match. Move the quotation marks outside of the capturing parentheses if you don't want them.

```perl
#!/usr/bin/perl

use strict;
use warnings;

my $s = q{StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30};

my @parts = $s =~ m{\A(\w+) ([0-9]) (""[^"]+"") (\w+) ([0-9]) ([0-9]{2})};

use Data::Dumper;
print Dumper \@parts;
```

Output:

```
$VAR1 = [
          'StartProgram',
          '1',
          '""C:\\Program Files\\ABC\\ABC XYZ""',
          'CleanProgramTimeout',
          '1',
          '30'
        ];
```

In that vein, here is a more involved script:

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my @strings = split /\n/, <<'EO_TEXT';
StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30
StartProgram 1 c:\opt\perl CleanProgramTimeout 1 30
EO_TEXT

# Named captures and %+ require Perl 5.10 or later.
my $re = qr{
    (?<directive>StartProgram)\s+
    (?<instance>[0-9][0-9]?)\s+
    (?<path>"".+?""|\S+)\s+
    (?<timeout_directive>CleanProgramTimeout)\s+
    (?<timeout_instance>[0-9][0-9]?)\s+
    (?<timeout_seconds>[0-9]{2})
}x;

for (@strings) {
    if ( $_ =~ $re ) {
        print Dumper \%+;
    }
}
```

Output:

```
$VAR1 = {
          'timeout_directive' => 'CleanProgramTimeout',
          'timeout_seconds' => '30',
          'path' => '""C:\\Program Files\\ABC\\ABC XYZ""',
          'directive' => 'StartProgram',
          'timeout_instance' => '1',
          'instance' => '1'
        };
$VAR1 = {
          'timeout_directive' => 'CleanProgramTimeout',
          'timeout_seconds' => '30',
          'path' => 'c:\\opt\\perl',
          'directive' => 'StartProgram',
          'timeout_instance' => '1',
          'instance' => '1'
        };
```

Update: I cannot get Text::Balanced or Text::ParseWords to parse this correctly. I suspect the problem is the repeated quotation marks that delineate the substring that should not be split. The following code is my best (not very good) attempt at solving the generic problem by using split and then selectively re-gathering parts of the string.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my $s = q{StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30};
my $t = q{StartProgram 1 c:\opt\perl CleanProgramTimeout 1 30};

print Dumper parse_line($s);
print Dumper parse_line($t);

sub parse_line {
    my ($line) = @_;
    my @parts = split /(\s+)/, $line;
    my @real_parts;
    for (my $i = 0; $i
```
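The update above describes a split-then-regather idea. Here is an independent, rough sketch of that idea; it is not the code from the update. The name `split_regather` is hypothetical. The sketch assumes that a quoted run always opens with `""` and closes with `""`.

```perl
use strict;
use warnings;

# Hypothetical helper, not the answer's parse_line: split on whitespace while
# keeping the separators, then glue pieces back together from a piece that
# opens with "" up to the piece that closes with "".
sub split_regather {
    my ($line) = @_;
    my @pieces = split /(\s+)/, $line;    # the capture keeps the whitespace
    my (@fields, $quoted);

    for my $piece (@pieces) {
        if (defined $quoted) {
            # Inside a quoted run: re-attach the piece, whitespace included.
            $quoted .= $piece;
            if ($piece =~ /""\z/) {          # closing "" found
                push @fields, $quoted;
                undef $quoted;
            }
        }
        elsif ($piece =~ /\A""/ && $piece !~ /\A"".*""\z/s) {
            $quoted = $piece;                # opening "" without its closing ""
        }
        elsif ($piece =~ /\S/) {
            push @fields, $piece;            # ordinary field
        }
        # Anything else is pure whitespace (or an empty leading piece): skip it.
    }

    # An unterminated quoted run is kept as a single final field.
    push @fields, $quoted if defined $quoted;
    return @fields;
}

for my $line (
    q{StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30},
    q{StartProgram 1 c:\opt\perl CleanProgramTimeout 1 30},
) {
    print join('|', split_regather($line)), "\n";
}
```

The script prints each line's fields joined with `|`. A stand-alone empty `""` field would open a quoted run and swallow the rest of the line. That weakness is one reason matching the known format with a specific regex, as in the earlier scripts, is the sturdier choice.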

Maybe the question isn't clear, but your answer seems to be different from what was asked. I thought he wanted a regular expression that would split any line on spaces while ignoring spaces between quotes. Your answer is a regex to parse one particular format. – user181548 Oct 14 '09 at 13:06

@Kinopiko - this answer is also saying "This way to do it is better and less buggy than trying to split on questionable delimiters. Consider trying it instead of how you're currently doing it, since it achieves more or less the same result." – Chris Lutz Oct 14 '09 at 20:40

The thing is, the question isn't necessarily about a questionable delimiter. Being able to parse an arbitrary line by spaces while ignoring spaces in a quoted string is useful. This answer completely ignores the question by saying "You should parse by tabs instead". That advice is useful in this specific case, but it doesn't answer how to split a generic string by spaces while ignoring spaces within quoted strings. – Oesor Oct 14 '09 at 21:21

@Oesor I was not able to come up with a satisfying working way of dealing with the general problem. Is that not clear from my comment to Colin Fine's answer (which I upvoted)? Please post a better way of solving the OP's problem, and I will upvote it. – Sinan Ünür Oct 14 '09 at 21:41

```perl
my $x = 'StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30';
my @parts = $x =~ /("".*?""|[^\s]+?(?>\s|$))/g;
```
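Here is a runnable version of that answer, with a few comparisons added for illustration. The posted pattern keeps the trailing space on unquoted fields, because `(?>\s|$)` sits inside the capturing group. A lookahead `(?=\s|$)` avoids that; it is a variation and did not appear in the thread. The last lines use a made-up input to show where `\S+\b`, suggested in the comments, behaves differently.

```perl
use strict;
use warnings;

my $x = q{StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30};

# As posted: (?>\s|$) is inside the capture, so unquoted fields keep the
# whitespace that follows them: 'StartProgram ', '1 ', ..., '30'.
my @as_posted = $x =~ /("".*?""|[^\s]+?(?>\s|$))/g;

# Variation (not from the thread): a lookahead checks for whitespace or the
# end of the string without consuming it, so no trailing space is captured.
my @lookahead = $x =~ /("".*?""|\S+(?=\s|$))/g;

# Made-up input showing how \S+\b differs: \b needs a word character on one
# side, so a field ending in punctuation gets cut short.
my $y = q{run foo! now};
my @with_b  = $y =~ /(\S+\b)/g;          # 'run', 'foo', 'now'
my @with_la = $y =~ /(\S+(?=\s|$))/g;    # 'run', 'foo!', 'now'

print join('|', @as_posted), "\n";
print join('|', @lookahead), "\n";
print "@with_b\n@with_la\n";
```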

`[^\s]+?(?>\s|$)` can be simplified to `\S+\b` – John Kugelman Oct 14 '09 at 13:05

Bzzt! You are right about `\S`, but `\b` is not the same as `(?>\s|$)`. – user181548 Oct 14 '09 at 13:10

I copied parts of Sinan Unur's answer to demonstrate a different way of doing it with a regex which doesn't depend on the exact format. I've also left a comment on his answer explaining that. Your answer was almost identical to mine, down to the form of the regex and the variable names, and it also contained the correction from John Kugelman. I don't see why you want to duplicate my answer like that. – user181548 Oct 14 '09 at 13:36

@Kinopiko Arguing over variable names now? My post uses `@parts`. Your post uses `@parts`. @FM's post used `@parts`. The only original part of your answer was the regex pattern. @FM edited the pattern and therefore posted an original answer. Relax. – Sinan Ünür Oct 14 '09 at 13:38

```perl
my $str = 'StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30';
print "str:$str\n";
my @A = $str =~ /(".+"|\S+)/g;
foreach my $l (@A) {
    print "<$l>\n";
}
```

That gives me:

```
$ ./test.pl
str:StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30
```

