Tokenize quoted string?

If regular expressions are acceptable in the general case you can use:

    re:split("abc \"def xyz\" ghi", " \"|\" ", [{return, list}]).
    ["abc","def xyz","ghi"]

You can also use "\\s\"|\"\\s" if you want to split on any whitespace instead of just spaces (the backslash has to be doubled, because \s inside an Erlang string literal is the escape for a plain space, not the regex whitespace class). If you happen to be parsing this from an input file, you may want to use strip_split/2 from estring.
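As a sanity check on the whitespace variant, a short shell session might look like this (a sketch; the tab and newline in the input are only there to show that any whitespace delimiter is accepted):

    1> re:split("abc\t\"def xyz\"\nghi", "\\s\"|\"\\s", [{return, list}]).
    ["abc","def xyz","ghi"]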


You could use the re module. It comes with a split/3 function. For example:

    re:split("abc \"def xyz \"ghi", "\s|\"", [{return, list}]).
    ["abc",[],"def","xyz",[],"ghi"]

The second argument is a regular expression (you might have to tweak my example to remove the empty lists...).
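If the empty elements from adjacent delimiters get in the way, one option (just a sketch) is to filter them out after the split:

    1> Parts = re:split("abc \"def xyz \"ghi", "\s|\"", [{return, list}]).
    ["abc",[],"def","xyz",[],"ghi"]
    2> [P || P <- Parts, P =/= []].
    ["abc","def","xyz","ghi"]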

Thank you, but your output (ignoring the empty elements) is "abc", "def", "xyz", "ghi". What I need, using your example, is "abc", "def xyz ", "ghi". Can this be obtained with your approach? – Hyperboreus Aug 1 at 17:13

That would actually be easier :) re:split("abc \"def xyz \"ghi", "\s\"", [{return, list}]). – arun_suresh Aug 1 at 17:24

However, given the question I expect that you want tokens("abc def \"def xyz\" ghi", " ", "\"") to be "abc", "def", "def xyz", "ghi". With this solution you instead get "abc def", "def xyz", "ghi". – Alexey Romanov Aug 1 at 23:07

@Alexey You are right, I didn't see that. Do you have a simple solution to get this behaviour? – Hyperboreus Aug 2 at 13:57

I didn't quite get the difference, so let me get this straight: do you want a specific splitter that splits when the delimiter is " ", "\"", or any sequence of the two, or do you want a general solution that splits on delimiters given as a regex? If it is the latter, I just wanted to point you to the re module; if it is the former, I guess you should look at @David Weldon's solution, which seems to work. – arun_suresh Aug 2 at 14:20
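One way to get the behaviour asked for in this comment thread ("abc", "def", "def xyz", "ghi") is to split on the quote character first and only then split the unquoted segments on spaces. The sketch below assumes the quotes in the input are balanced; quoted_tokens/1 is just a name made up for the example:

    quoted_tokens(S) ->
        %% Splitting on the quote character yields segments that alternate
        %% between text outside quotes and text inside quotes.
        Segments = re:split(S, "\"", [{return, list}]),
        quoted_tokens(Segments, outside, []).

    quoted_tokens([], _, Acc) ->
        lists:reverse(Acc);
    quoted_tokens([Seg | Rest], outside, Acc) ->
        %% Outside quotes: break the segment into space-separated words.
        Words = string:tokens(Seg, " "),
        quoted_tokens(Rest, inside, lists:reverse(Words) ++ Acc);
    quoted_tokens([Seg | Rest], inside, Acc) ->
        %% Inside quotes: keep the whole segment as one token.
        quoted_tokens(Rest, outside, [Seg | Acc]).

With that, quoted_tokens("abc def \"def xyz\" ghi") should come out as ["abc","def","def xyz","ghi"].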

This is approximately how I would write it (not tested!):

    tokens(String) ->
        lists:reverse(tokens(String, outside_quotes, [])).

    tokens([], outside_quotes, Tokens) ->
        Tokens;
    tokens(String, outside_quotes, Tokens) ->
        {Token, Rest0} = lists:splitwith(
                             fun(C) -> (C /= $\s) and (C /= $") end, String),
        case Rest0 of
            [] -> [Token | Tokens];
            [$\s | Rest] -> tokens(Rest, outside_quotes, [Token | Tokens]);
            [$" | Rest] -> tokens(Rest, inside_quotes, [Token | Tokens])
        end;
    tokens(String, inside_quotes, Tokens) ->
        %% exception on an unclosed quote
        {Token, [$" | Rest]} = lists:splitwith(fun(C) -> C /= $" end, String),
        tokens(Rest, outside_quotes, [Token | Tokens]).
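A quick way to see what it does (again untested, like the code above, and assuming the clauses are compiled into a module that exports tokens/1): an empty token is collected wherever a quote directly follows a space, and those can be filtered out afterwards.

    %% mymod below is only a placeholder module name for this example
    1> Toks = mymod:tokens("abc def \"def xyz\" ghi").
    ["abc","def",[],"def xyz",[],"ghi"]
    2> [T || T <- Toks, T =/= []].
    ["abc","def","def xyz","ghi"]

That filtered result is the behaviour discussed in the comments above.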

string:tokens("abc \"def ghi\" foo. bla", " .\"").

This will tokenize the string on space, period and double quote. Result: "abc", "def", "ghi", "foo", "bla". If you want to preserve the quoted parts, you might want to consider writing a tokenizer/lexer, because regular expressions are not well suited to this task.
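For comparison, a rough shell transcript of the call above:

    1> string:tokens("abc \"def ghi\" foo. bla", " .\"").
    ["abc","def","ghi","foo","bla"]

Because every separator character is treated independently, the quoted group "def ghi" is not kept together, which is why a small hand-written tokenizer (like the one earlier in this thread) is the better fit when quoting has to be respected.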

