Regex help: My regex pattern will match invalid strings?

Use ^ to anchor the start and $ to anchor the end. E.g.: ^(abc)*$ matches zero or more repetitions of the group ("abc" in this example), and the match must start at the start of the input string and end at the end of it.
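To make the effect of the anchors concrete, here is a minimal Python sketch. The sample strings are made up for illustration; only the (abc)* pattern comes from the answer.

```python
import re

unanchored = re.compile(r"(abc)*$")   # only the end is pinned down
anchored = re.compile(r"^(abc)*$")    # both ends are pinned down

# Without ^, search() may start anywhere, so junk before the final "abc" is ignored.
print(bool(unanchored.search("xyzabc")))  # True

# With ^ and $, the repetitions must account for the whole string.
print(bool(anchored.search("xyzabc")))    # False
print(bool(anchored.search("abcabc")))    # True

# Zero repetitions are allowed, so the empty string is accepted as well.
print(bool(anchored.search("")))          # True
```

Note that * accepts the empty string; the patterns further down use + so that at least one repetition is required.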

Applied to your pattern, that means putting ^ in front of it. Using an ungreedy +? there doesn't matter, as you require it to match until the end anyway.

However, your regex has a few issues. ^(?:\[[^,]+,[SD],\d\])+$ seems more like what you want.

I couldn't decipher what you meant by the first part, so my regex is more general than required: [^,]+, will match any sequence of non-commas followed by a comma, and in fact you should probably add ] to this negated character class. [S|D] is a character class of three characters, as | doesn't mean alternation here ((S|D) would mean the same as [SD] though). {1} is the default for any atom; you don't need to specify it.
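The point about the character class is easy to check. The following is a small Python sketch; the variable names are invented for the example.

```python
import re

pipe_class = re.compile(r"[S|D]")    # three members: "S", "|", "D"
plain_class = re.compile(r"[SD]")    # two members: "S", "D"
alternation = re.compile(r"(S|D)")   # alternation between "S" and "D"

# Only the class containing a literal "|" accepts the pipe character.
for text in ("S", "D", "|"):
    print(repr(text),
          bool(pipe_class.fullmatch(text)),
          bool(plain_class.fullmatch(text)),
          bool(alternation.fullmatch(text)))

# \d{1} and \d accept exactly the same strings.
print(bool(re.fullmatch(r"\d{1}", "7")), bool(re.fullmatch(r"\d", "7")))
print(bool(re.fullmatch(r"\d{1}", "77")), bool(re.fullmatch(r"\d", "77")))
```

In other words, [S|D] silently accepts a literal | in that position, which is probably not what was intended.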

Pseudocode (run it at codepad.org):

    import re

    def find_segments(input_string):
        results = []
        # One bracketed segment: [text,S-or-D,digit], with each field captured.
        regex = re.compile(r"\[([^,]+),([SD]),(\d)\]")
        start = 0
        while True:
            # match() only succeeds if a segment begins exactly at `start`.
            m = regex.match(input_string, start)
            if not m:  # no match
                return None  # whole string didn't match, do another action as appropriate
            results.append(m.group(1, 2, 3))
            start = m.end(0)
            if start == len(input_string):
                break
        return results

    print(find_segments("[A-Z,S,3][klm,D,4][0-9,S,1]"))
    # output:
    # [('A-Z', 'S', '3'), ('klm', 'D', '4'), ('0-9', 'S', '1')]

The big difference here is that the expression matches only one complete [...] part, but it is applied in succession, so each match must start where the last one ends (or end at the end of the string).

Thanks! Great answer to my question. The thing is I would also like to extract the "segments", either in a match collection or in groups. If you look at my original pattern you see that I first have a non-capturing group, then a capturing group "extracting" the "segment". Is it possible to incorporate that into your pattern? – David Jan 1 '10 at 9:41

Yes, exactly the same way: add the capturing group around what you're interested in. However, you'll likely need to call your regex library with a different function in order to capture all of them, instead of just the first or last, as the capturing group is then instead a repetition. I'll update with an example. – Roger Pate Jan 1 '10 at 10:01

+1: That's a nice way to solve it in Python. It saves having two almost identical regexps, and the performance hit of matching on the same string twice. But does .NET's Regex have an option to say where the match should start like Python, or will this require copying strings, nullifying the performance advantage? – Mark Byers Jan 1 '10 at 10:15

"is then inside* a repetition." (You're entering regex-engine-specific territory.) – Roger Pate Jan 1 '10 at 10:19

Mark: beats me. Had this question been marked C# specific from the start I'd likely have refrained (but you seem to have that part covered well anyway). Picking up where another regex stops is essential at times: you can apply a completely different expression, or try out various ones, at that point. – Roger Pate Jan 1 '10 at 10:26
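The remark about a capturing group inside a repetition can be illustrated in Python. This is a sketch that assumes segments of the form [text,S-or-D,digit]; other regex engines may behave differently (.NET, for instance, keeps every capture).

```python
import re

text = "[A-Z,S,3][klm,D,4][0-9,S,1]"

# A capturing group inside a repeated group: Python keeps only the last repetition.
whole = re.fullmatch(r"(?:\[([^,]+,[SD],\d)\])+", text)
print(whole.group(1))  # 0-9,S,1

# To collect every segment, apply the single-segment pattern repeatedly instead.
segments = [m.group(1) for m in re.finditer(r"\[([^,]+,[SD],\d)\]", text)]
print(segments)  # ['A-Z,S,3', 'klm,D,4', '0-9,S,1']
```

This is why a different library call (here finditer rather than a single fullmatch) is needed to get all the segments.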

You want something like this:

    /^(\[[^,]+,[SD],\d\])+$/

Here is an example of how you could use this regular expression in C#:

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            string[] tests = {
                "[A-Z,S,3][A-Za-z0-9,D,4]",
                "[A-Z,S,3]aaaa[A-Za-z0-9,D,4]",
                "crap[A-Z,S,3][A-Za-z0-9,D,4]",
                "A-Z,S,3",
                "[A-Z,S,3][klm,D,4][0-9,S,1]"
            };

            // One bracketed segment; group 1 captures the text between the brackets.
            string segmentRegex = @"\[([^,]+,[SD],\d)\]";
            // The whole line must consist of one or more segments and nothing else.
            string lineRegex = "^(" + segmentRegex + ")+$";

            foreach (string test in tests)
            {
                bool isMatch = Regex.Match(test, lineRegex).Success;
                if (isMatch)
                {
                    Console.WriteLine("Successful match: " + test);
                    foreach (Match match in Regex.Matches(test, segmentRegex))
                    {
                        Console.WriteLine(match.Groups[1]);
                    }
                }
            }
        }
    }

Output:

    Successful match: [A-Z,S,3][A-Za-z0-9,D,4]
    A-Z,S,3
    A-Za-z0-9,D,4
    Successful match: [A-Z,S,3][klm,D,4][0-9,S,1]
    A-Z,S,3
    klm,D,4
    0-9,S,1
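For comparison, the same validate-then-extract idea might look like this in Python. This is only a sketch: the helper name extract_segments and the constant names are made up here, and the bracketed segment format is assumed from the examples.

```python
import re

SEGMENT = r"\[([^,]+,[SD],\d)\]"              # one segment, body captured in group 1
LINE_RE = re.compile(r"(?:" + SEGMENT + r")+")  # one or more segments, nothing else
SEGMENT_RE = re.compile(SEGMENT)


def extract_segments(line):
    """Return the body of every segment, or None if the line is not made up entirely of segments."""
    if LINE_RE.fullmatch(line) is None:
        return None
    # findall() returns the contents of group 1 for each successive match.
    return SEGMENT_RE.findall(line)


for test in ("[A-Z,S,3][A-Za-z0-9,D,4]",
             "[A-Z,S,3]aaaa[A-Za-z0-9,D,4]",
             "crap[A-Z,S,3][A-Za-z0-9,D,4]"):
    print(test, "->", extract_segments(test))
```

As noted in the comments, this approach scans each valid line twice (once to validate, once to extract), in exchange for keeping both patterns simple.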


