What you're trying to do is recover intermediate captures from groups that match more than once per regex match. As far as I know, only .NET and Perl 6 provide that capability.
You'll have to do the job in two stages: match an attribute value with one or more %tag% sequences in it, then break out the individual sequences. You don't seem to care which XML tag or attribute the values are associated with, so you could use this somewhat simpler regex to find the values with %tag% sequences in them:
'#"([^"%]*+%[^%"]++%[^"]*+)"|\'([^\'%]*+%[^%\']++%[^\']*+)\'#'
EDIT: That regex captures the attribute value in group 1 or group 2, depending on which quotes were used. Here's another version that merges the alternatives so it can always save the value in group 2:
'#(["\'])((?:(?!%|\1).)*+%(?:(?!%|\1).)++%(?:(?!\1).)*+)\1#'
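The two-stage approach described in this answer could be sketched in PHP roughly as follows. This is only an illustration: the sample XML, the variable names, and the second-stage pattern are assumptions made for the example, and only the first-stage regex (the merged version, with the value in group 2) is taken from the answer.

```php
<?php
// Hypothetical sample input; the element and attribute names are made up.
$xml = '<item name="%tag1% and %tag2%" kind=\'plain\' note=\'%tag3%\'/>';

// Stage 1: the merged regex from the answer. Group 1 is the quote
// character, group 2 is an attribute value containing at least one %tag%.
$valueRe = '#(["\'])((?:(?!%|\1).)*+%(?:(?!%|\1).)++%(?:(?!\1).)*+)\1#';

// Stage 2 (an assumption, not from the answer): pull each %...% sequence
// out of a value that stage 1 has already isolated.
$tagRe = '#%([^%]++)%#';

$tags = [];
if (preg_match_all($valueRe, $xml, $values, PREG_SET_ORDER)) {
    foreach ($values as $value) {
        // $value[2] is the attribute value without its surrounding quotes.
        if (preg_match_all($tagRe, $value[2], $found)) {
            $tags = array_merge($tags, $found[1]);
        }
    }
}

print_r($tags);
```

With this sample input, $tags ends up holding tag1, tag2 and tag3. The kind='plain' attribute is skipped because its value contains no %tag% sequence.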
While the other solutions are much simpler and still solve the same essential problem, this one solves the mystery at the core of my question. The key takeaway is that in PHP (and most languages), I can't "recover intermediate captures". Makes sense, I suppose! Good to know. :) – rinogo Aug 22 '09 at 15:57
The other answers also assume %tag% names can consist only of alphanumeric or "word" characters, and that %ThingsThatLookLikeTags% will always in fact be tags, no matter where they appear. Mine only matches them in quoted strings--which assumes they will always be attribute values. But I could extend it to match the strings only within (XML) tags. – Alan Moore Aug 22 '09 at 16:44
+1 haha! Funny how sometimes we overlook the simplest solutions... :) I suppose this would work for most cases. The only thing that makes me nervous is that the XML does get more complex, and it's possible that tag-like text could also appear within the body of an element... But again, this is probably a sufficient solution for now. Thanks! :) – rinogo Aug 22 '09 at 15:53
%\w+% would be an even simpler way of doing this.
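As a rough sketch of how that pattern might be used from PHP: the sample input and variable names are made up, and the capture group around \w+ is an addition so that the names come back without the percent signs.

```php
<?php
// Hypothetical sample input, made up for illustration.
$xml = '<item name="%tag1% and %tag2%">Body text with %lookalike% in it</item>';

// The pattern from the answer, with a capture group added around \w+.
// preg_match_all() returns every match, so repeated tags inside a single
// attribute are all found.
preg_match_all('/%(\w+)%/', $xml, $m);

$simpleTags = $m[1];
print_r($simpleTags);
```

Here $simpleTags holds tag1, tag2 and lookalike. The last entry shows the caveat raised in the comments: the pattern has no notion of attributes, so tag-like text in an element body is matched as well.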
+1 for simplifying things even further. – rinogo Aug 22 '09 at 15:54
The Mentee is the ultimate regex guru – Dan Aug 31 '09 at 13:35
My regex is working, except when there are two tags in a single attribute, only the last one is returned. In other words, this regex should find tag1, tag2, ..., tag6. However, it omits tag2 and tag5.
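The symptom can be reproduced with a small PHP sketch. The asker's actual regex is not shown in this thread, so the pattern and the sample attribute below are hypothetical stand-ins. They only illustrate that a capture group inside a repeated group keeps its final iteration.

```php
<?php
// Hypothetical attribute value; not taken from the question.
$attr = '"%tag1% and %tag2%"';

// Hypothetical pattern: the non-capturing group repeats once per %tag%,
// but capture group 1 is overwritten on every repetition.
preg_match('/"(?:[^"%]*%(\w+)%)+[^"]*"/', $attr, $m);

$lastOnly = $m[1];   // capture from the last repetition only
echo $lastOnly, "\n";
```

Although the group matched twice, $m contains a single entry for group 1, holding tag2. The earlier capture (tag1) is not available from preg_match().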