Python regular expression tag?


Although I don't recommend using regex for parsing HTML (there are libraries for that purpose in almost every language), this should work:

    import re

    text = "<p> hello world </p>. I am playing <b> python </b>"
    pattern1 = re.compile(r'\<p>(.*?)\</p>')
    pattern2 = re.compile(r'\<b>(.*?)\</b>')
    replaced = re.sub(pattern1, r'\1', text)
    replaced = re.sub(pattern2, r'\1', replaced)

I think the problem you're having comes from how Python handles groups. Test the following and you'll see what I mean:

    import re

    text = "<p> hello world </p>. I am playing <b> python </b>"
    pattern = re.compile(r'\<p>(.*?)\</p>|\<b>(.*?)\</b>')
    for match in pattern.finditer(text):
        print(match.groups())

You will see the following:

    (' hello world ', None)  # Here it captured the 1st group
    (None, ' python ')       # Here the 2nd ;)

Also, take into account that it first matched what is between <p> and </p>, so it took hello world (something you would like to match too) as the first match. Maybe changing the order of the alternatives in pattern would solve this, but the opposite could happen (having ... ).

I wish I could provide more info, but I'm not very good with regex in Python. C# handles groups differently.

Edit: I understand you might want to do this with regex for learning or testing purposes, but in production code I would go for another alternative (like the one @Senthil gave you) or just use an HTML parser.
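One way to live with an alternation that leaves one group as None is to pass a function as the replacement and return whichever group took part in the match. This is only a sketch under assumptions: the helper name keep_inner is made up for illustration, and it simply keeps the inner text, which may not be the exact replacement the question was after.

```python
import re

text = "<p> hello world </p>. I am playing <b> python </b>"
pattern = re.compile(r'<p>(.*?)</p>|<b>(.*?)</b>')

def keep_inner(match):
    # Only one side of the alternation takes part in a given match,
    # so exactly one of the two groups is set and the other is None.
    if match.group(1) is not None:
        return match.group(1)
    return match.group(2)

# re.sub accepts a callable; it is called once per match.
replaced = pattern.sub(keep_inner, text)
print(replaced)
```

Because the function decides per match which group to use, a single pass handles both tag types without referring to a group that did not participate.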

If you choose not to use regex, then it is as simple as this:

    d = {'<p>': '', '</p>': '', '<b>': '', '</b>': ''}
    s = '<p> hello world </p>. I am playing <b> python </b>'
    for k, v in d.items():
        s = s.replace(k, v)

Yes, this is a great way to do it. But I still want to know how to do it with a regular expression. – chnet Apr 16 at 3:27

The problem is that the first group in the regexp is the one within the <p> tags and the second group is the one within the <b> tags. However, your substitution refers to the first group, and when the match was on the <b> tags there wasn't one. I offer a couple of solutions.

First,

    >>> pattern = re.compile(r'(.*?)')
    >>> print(re.sub(pattern, r'\2', "hello world. I am playing python"))
    hello world. I am playing python

will match a given pair of tags. However, as you can see, it would have to be used twice on the string, because when it matched the outer tags it skipped over the nested tags. Here's the option that I would go with:

    >>> pattern = re.compile(r'<(/?)[pb]>')
    >>> print(re.sub(pattern, r'', "hello world. I am playing python"))
    hello world. I am playing python
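The "matching pair" approach and the need to apply it twice can be illustrated with a small sketch. Everything specific here is an assumption: the exact pattern and the nested input string are not recoverable from the answer, so the pattern below (tag name in group 1, content in group 2, \1 in the closing tag) and the sample text are hypothetical stand-ins that fit the description.

```python
import re

# Hypothetical input: a <b> pair nested inside a <p> pair.
text = "<p>hello <b>world</b></p>. I am playing <b>python</b>"

# Group 1 is the tag name, group 2 is the content. The backreference \1
# forces the closing tag to have the same name as the opening tag.
pair = re.compile(r'<([pb])>(.*?)</\1>')

def strip_pairs(s):
    # A single sub() pass does not rescan the text it has just inserted,
    # so nested pairs survive it. Repeat until nothing changes.
    while True:
        stripped = pair.sub(r'\2', s)
        if stripped == s:
            return s
        s = stripped

once = pair.sub(r'\2', text)
print(once)               # the nested <b> pair is still there
print(strip_pairs(text))  # repeated passes remove it as well
```

The loop is what "used twice" amounts to in general: each pass removes one level of nesting.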

Pretty solution. Could you explain what you did in the last one? – Oscar Mederos Apr 16 at 6:07

Great. Could you explain why r''? – chnet Apr 16 at 15:10

(/?) means that the first group contains either a / or an empty string, because the ? means that the slash is optional. [pb] just means to match one character that is either p or b. r'' just puts that first group, the slash or empty string, back where it should be. – Justin Peel Apr 16 at 16:35

Yes, I knew that, but what I don't understand is in which part the <p> and <b> tags are replaced. I don't see them inside any group. – Oscar Mederos Apr 16 at 20:38

@Oscar, they don't need to be explicitly enclosed in ()'s. sub replaces the entire match. This is just the same as in the OP's match really. – Justin Peel Apr 16 at 20:56
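The points made in these comments (an optional slash captured in group 1, a character class for the tag name, and sub replacing the whole match) can be tried out with a short sketch. The replacement tag name x is made up for illustration, since the actual replacement string is not shown in the thread.

```python
import re

text = "<p> hello world </p>. I am playing <b> python </b>"

# (/?) captures "/" for a closing tag and "" for an opening tag;
# [pb] matches a single character, either p or b.
tag = re.compile(r'<(/?)[pb]>')

# sub() replaces the entire match, so the tag itself does not need to be
# inside a group. \1 puts the slash (or nothing) back into the new tag.
renamed = tag.sub(r'<\1x>', text)   # "x" is a hypothetical tag name
removed = tag.sub('', text)         # an empty replacement drops the tags

print(renamed)
print(removed)
```

Since each tag is matched on its own rather than as a pair, nesting does not matter and one pass is enough.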

Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts.

So many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack.

HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp.

Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide.
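For comparison, here is a minimal sketch of the parser route using html.parser from the Python 3 standard library. The class name TagStripper and the function strip_tags are made up for illustration, and the sketch assumes that dropping <p> and <b> while keeping everything else is what is wanted. Note that with convert_charrefs enabled, character references such as &amp; come out decoded.

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text and other tags, drops <p> and <b>."""

    def __init__(self, drop=("p", "b")):
        super().__init__(convert_charrefs=True)
        self.drop = set(drop)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.drop:
            # Re-emit tags we are not dropping exactly as they were written.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are re-emitted once, as written.
        if tag not in self.drop:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.drop:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(html):
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)

print(strip_tags("<p>hello <b>world</b></p>. I am playing <b>python</b>"))
```

The parser tracks tag boundaries itself, so nesting and attributes inside the tags need no special handling in the calling code.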
