What you've got there is an example of Friedl's "unrolled loop" technique, but you seem to have some confusion about how to express it as a string literal. Here's how it should look to the regex compiler:

    "[^"\\]*(?:\\.[^"\\]*)*"

The initial "[^"\\]* matches a quotation mark followed by zero or more of any characters other than quotation marks or backslashes.
That part alone, along with the final ", will match a simple quoted string with no embedded escape sequences, like "this" or "". If it does encounter a backslash, \\. consumes the backslash and whatever follows it, and [^"\\]* (again) consumes everything up to the next backslash or quotation mark.
That part gets repeated as many times as necessary until an unescaped quotation mark turns up (or it reaches the end of the string and the match attempt fails). Note that this will match "foo\"-" in \"foo\"-"bar". That may seem to expose a flaw in the regex, but it doesn't; it's the input that's invalid.
The goal was to match quoted strings, optionally containing backslash-escaped quotes, embedded in other text--why would there be escaped quotes outside of quoted strings? If you really need to support that, you have a much more complex problem, requiring a very different approach. As I said, the above is how the regex should look to the regex compiler.
But you're writing it in the form of a string literal, and those tend to treat certain characters specially--i.e., backslashes and quotation marks. Fortunately, C#'s verbatim strings save you the hassle of having to double-escape backslashes; you just have to escape each quotation mark with another quotation mark:

    Regex r = new Regex(@"""[^""\\]*(?:\\.[^""\\]*)*""");

So the rule is double quotation marks for the C# compiler and double backslashes for the regex compiler--nice and easy. This particular regex may look a little awkward, with the three quotation marks at either end, but consider the alternative:

    Regex r = new Regex("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

In Java, you always have to write them that way.
:-( By the way, if you want to make sure there are no line-separator characters in the quoted strings, you can include them in the negated character classes:

    Regex r = new Regex(@"""[^""\r\n\\]*(?:\\.[^""\r\n\\]*)*""");

The dot in \\. already excludes line separators, as long as you don't specify the Singleline option.
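As a rough illustration of the two literal forms, here is a minimal sketch. The class name QuotedStringDemo, the helper FindQuoted, and the sample text are made up for this example; the pattern is assumed to be the unrolled-loop regex described above. It checks that the verbatim and the regular literal produce the same pattern text and then lists the quoted strings found in a sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedStringDemo
{
    // Verbatim literal: "" stands for one quotation mark, backslashes are literal.
    public const string VerbatimPattern = @"""[^""\\]*(?:\\.[^""\\]*)*""";

    // Regular literal: every quotation mark and backslash is escaped for the C# compiler.
    public const string RegularPattern = "\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"";

    // Hypothetical helper: returns every quoted string in the input, quotes included.
    public static List<string> FindQuoted(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, VerbatimPattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Both literals should yield the same pattern text for the regex compiler.
        Console.WriteLine(VerbatimPattern == RegularPattern);

        // Sample text (made up): one plain string, one containing escaped quotes.
        string text = @"say ""hi"" and ""a \""quoted\"" word""";
        foreach (string s in FindQuoted(text))
        {
            Console.WriteLine(s);
        }
    }
}
```

With the sample above, the two strings reported should be "hi" and "a \"quoted\" word", each including its surrounding quotation marks.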
I like this explanation best. – Joshua Lowry Jan 28 '10 at 15:36.
Regex for capturing strings (with \ for character escaping), for the .NET engine:

    (?>(?(STR)(?(ESC).(?<-ESC>)|\\(?<ESC>))|(?!))|(?(STR)"(?<-STR>)|"(?<STR>))|(?(STR).|(?!)))+

Here, a "friendly" version:

    (?>                    | specifies nonbacktracking
      (?(STR)              | if (STRING MODE) then
        (?(ESC)            |   if (ESCAPE MODE) then
          .(?<-ESC>)       |     match any char and exits escape mode (pop ESC)
          |                |   else
          \\(?<ESC>)       |     match '\' and enters escape mode (push ESC)
        )                  |   endif
        |                  | else
        (?!)               |   do nothing (NOP)
      )                    | endif
      |                    | -- OR
      (?(STR)              | if (STRING MODE) then
        "(?<-STR>)         |   match '"' and exits string mode (pop STR)
        |                  | else
        "(?<STR>)          |   match '"' and enters string mode (push STR)
      )                    | endif
      |                    | -- OR
      (?(STR)              | if (STRING MODE) then
        .                  |   matches any character
        |                  | else
        (?!)               |   do nothing (NOP)
      )                    | endif
    )+                     | REPEATS FOR EVERY CHARACTER

Based on http://tomkaminski.com/conditional-constructs-net-regular-expressions examples. It relies on quote balancing. I use it with great success. Use it with the Singleline flag.

To play around with regexes, I recommend Rad Software Regular Expression Designer, which has a nice "Language Elements" tab with quick access to some basic instructions. It's based on .NET's regex engine.
Interesting breakdown. – Joshua Lowry Sep 20 '10 at 21:29.
"(\\"|\\\\|[^"\\])*" should work. Match either an escaped quote, an escaped backslash, or any other character except a quote or backslash character. Repeat.

In C#:

    using System.Collections.Specialized;
    using System.Text.RegularExpressions;

    StringCollection resultList = new StringCollection();
    Regex regexObj = new Regex(@"""(\\""|\\\\|[^""\\])*""");
    // Collect every quoted string found in subjectString.
    Match matchResult = regexObj.Match(subjectString);
    while (matchResult.Success) {
        resultList.Add(matchResult.Value);
        matchResult = matchResult.NextMatch();
    }

Edit: Added escaped backslash to the list to correctly handle "This is a test\\".

Explanation: First match a quote character. Then the alternatives are evaluated from left to right.
The engine first tries to match an escaped quote. If that doesn't match, it tries an escaped backslash. That way, it can distinguish between " \" string continues" and "String ends here \\".
If neither matches, then anything else is allowed except for a quote or backslash character. Then repeat. Finally, match the closing quote.
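The two cases from the explanation can be tried out with a small sketch like the one below. The class name AlternationDemo, the helper FirstQuoted, and the trailing " tail" text are assumptions made for illustration; the pattern is the alternation regex from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Opening quote, then any run of: escaped quote, escaped backslash,
    // or a character that is neither quote nor backslash; then the closing quote.
    public static readonly Regex Quoted = new Regex(@"""(\\""|\\\\|[^""\\])*""");

    // Hypothetical helper: returns the first quoted string in the input, or null if there is none.
    public static string FirstQuoted(string input)
    {
        Match m = Quoted.Match(input);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // The escaped quote should not end the string.
        Console.WriteLine(FirstQuoted(@""" \"" string continues"" tail"));
        // The escaped backslash should not hide the closing quote.
        Console.WriteLine(FirstQuoted(@"""String ends here \\"" tail"));
    }
}
```

One consequence of listing the escapes explicitly: a string containing any other backslash sequence, such as \n, is not matched by this pattern, because a lone backslash fits none of the three alternatives.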
Sorry for editing this post so much. But now I think I've got it elegant enough. And correct, too.
I hope. – Tim Pietzcker Jan 27 '10 at 20:05
This regex doesn't work with this text: \"Some Text\" Some Text "Some Text", and "Some more Text" an""d "Even more text about \"this text\"" – Kamarey Jan 27 '10 at 20:31
This is excellent! I think part of the issue was that I was not using the @, which added more complexity with having to slash all over the place. – Joshua Lowry Jan 27 '10 at 20:38
Kamarey is right though, it doesn't work properly in that case.... hmmmm. – Joshua Lowry Jan 27 '10 at 22:12
Well, texts that are enclosed in escaped quotes weren't part of the question; neither was doubling as another way of escaping quotes. – Tim Pietzcker Jan 27 '10 at 7:10.
This regex (?\\)" will also handle text that starts with an escaped quote: \"Some Text\" Some Text "Some Text", and "Some more Text" an""d "Even more text about \"this text.
– Joshua Lowry Jan 27 '10 at 22:16 This doesn't handle escaped backslashes at the end of strings: " – Tim Pietzcker Jan 28 '10 at 7:11.
I know this isn't the cleanest method, but with your example I would check the character before the " to see if it's a \; if it is, I would ignore the quote.
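Looking only at the single character before the quote misreads a quote that follows an escaped backslash (\\"). One possible refinement, which is an assumption here and not part of the answer above, is to count the whole run of backslashes before the quote and treat the quote as escaped only when that count is odd. The class QuoteCheck and the method IsEscapedQuote are hypothetical names.

```csharp
using System;

public static class QuoteCheck
{
    // Hypothetical helper: true when the quote at quoteIndex is preceded by an
    // odd number of backslashes, i.e. the last backslash escapes the quote.
    public static bool IsEscapedQuote(string text, int quoteIndex)
    {
        int backslashes = 0;
        for (int i = quoteIndex - 1; i >= 0 && text[i] == '\\'; i--)
        {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    public static void Main()
    {
        // Text a\" : the quote follows one backslash.
        Console.WriteLine(IsEscapedQuote(@"a\""", 2));
        // Text a\\" : the quote follows an escaped backslash.
        Console.WriteLine(IsEscapedQuote(@"a\\""", 3));
    }
}
```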
This works correctly in Expresso (you might need to convert \\ to \\\\ depending on your language): "(\\"|[^"])*"
Modified to: \"(\\\"|[^\"])*\" ... but now I just get the whole line back. – Joshua Lowry Jan 27 '10 at 19:17
Does not work. "(\\"|[^"])*" erroneously matches: "\\" " – ridgerunner Apr 9 at 15:08.
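The failure mentioned in the last comment can be reproduced with a short sketch. The class name EscapedBackslashDemo and the field names are made up for this example; Loose is the pattern from this answer, and Unrolled is the unrolled-loop pattern from the first answer, included only for comparison.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedBackslashDemo
{
    // Pattern from this answer: [^"] also accepts a lone backslash.
    public static readonly Regex Loose = new Regex(@"""(\\""|[^""])*""");

    // Unrolled-loop pattern: a backslash is always consumed together with the next character.
    public static readonly Regex Unrolled = new Regex(@"""[^""\\]*(?:\\.[^""\\]*)*""");

    public static void Main()
    {
        // Input text: "\\" "   (a string holding one escaped backslash, then a stray quote)
        string input = @"""\\"" """;
        Console.WriteLine(Loose.Match(input).Value);
        Console.WriteLine(Unrolled.Match(input).Value);
    }
}
```

The loose pattern reads the second backslash together with the following quote as an escaped quote and so runs on to the stray quote, while the unrolled pattern stops after "\\".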
Any chance you need to do: \"[^\"\\\\]*(?:\\.[^\"\\\\]*).
This gives me: "Some Text"; "Some more Text"; "" – Joshua Lowry Jan 27 '10 at 19:13.
I recommend getting RegExBuddy (regexbuddy.com/). It lets you play around with it until you make sure everything in your test set matches. As for your problem, I would try four backslashes instead of two... \"[^\"\\\\]*(?:\\.[^\"\\\\]*).
One of RegexBuddy's selling points is that it can automatically convert the regex to source code in whatever language you specify. In this case, it converts the "raw" regex "[^"\\]*(?:\\.[^"\\]*)*" to @"""[^""\\]*(?:\\.[^""\\]*)*""". – Alan Moore Jan 28 '10 at 1:43.
Similar to RegExBuddy posted by @Blankasaurus, regexmagic helps too. (regexmagic.com/ ).