Python regex findall?

    line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
    person = re.findall(regex, line)
    print(person)

yields ['Barack Obama', 'Bill Gates'].

The regex ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?" is exactly the same unicode as u'[[1P].+?[/P]]+?', except harder to read. The first bracketed group, [[1P], tells re that any one of the characters in the list ['[', '1', 'P'] should match. Similarly, the second bracketed group, [/P], matches any one of '/' or 'P', and the ] left over after it is just a literal character.

That's not what you want at all. So remove the outer enclosing square brackets. (Also remove the stray 1 in front of P.) To protect the literal brackets in [P], escape them with a backslash: \[P\].

To return only the words inside the tags, place grouping parentheses around .+?.
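Putting those three steps together, a minimal sketch might look like the following. The \s* on either side of the group is an assumption added here to drop the padding spaces inside the tags, and the names broken and fixed are only illustrative.

```python
import re

line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."

# The asker's pattern with the \uXXXX escapes written out. Each [...] is a
# character class, so it matches single characters, not the literal tags.
broken = re.compile(r"[\[1P].+?[/P]\]+?")

# Escaped brackets match the literal tags, and the parentheses capture the
# text between them. The \s* padding is an assumption, not part of the answer.
fixed = re.compile(r"\[P\]\s*(.+?)\s*\[/P\]")

broken_matches = broken.findall(line)
people = fixed.findall(line)
print(people)  # ['Barack Obama', 'Bill Gates']
```

Because the pattern contains one capturing group, findall returns only the captured text rather than the whole match including the tags.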

Wow, you are very quick, @unutbu. I really appreciate your answer, thanks. It seems I need to learn a lot more about regex. Thanks again. :) – Aditya Eka Oct 13 at 10:26

Thanks for the explanation. :) – Aditya Eka Oct 18 at 5:54

Try this:

    for match in re.finditer(r"\[P[^\]]*\](.*?)\[/P\]", subject):
        # match start: match.start()
        # match end (exclusive): match.end()
        # matched text: match.group()
        print(match.group(1))  # text between the tags

Your question is not 100% clear, but I'm assuming you want to find every piece of text inside [P]...[/P] tags:

    >>> import re
    >>> line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
    >>> re.findall(r'\[P\]\s?(.+?)\s?\[\/P\]', line)
    ['Barack Obama', 'Bill Gates']

Yup, that's what I need, thanks.. :) – Aditya Eka Oct 18 at 6:18.

You can replace your pattern with regex = ur"\[P\]([\w\s]+)\[\/P\]".

Take care with your formatting; use the preview region. Because you didn't format it properly, the backslashes were guzzled (markdown is poor like that). – Chris Morgan Oct 13 at 12:43

Why do you use [\w\s]+ rather than .*?, which is what he used? It seems to me .*? is more likely to be what he wants anyway; [\w\s] is horribly limiting. – Chris Morgan Oct 13 at 12:44

The limitation is intentional. I use [\w\s]+ because the asker apparently wants to extract names, which rarely contain numbers. Also note that the asker wanted to extract words, not numbers. Just my opinion though, cmiiw. – pram Oct 18 at 11:32

What about names with such interesting features as accents? re.match('\w', u'é') finds no match. If the names are arbitrary, you should not discount the possibility of non-Latin names. – Chris Morgan Oct 18 at 22:57

(That is, without the re.UNICODE flag, of course.) – Chris Morgan Oct 18 at 23:11
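To make the trade-off in these comments concrete, here is a small sketch. It assumes Python 3, where str patterns are Unicode-aware by default (the thread itself uses Python 2, where re.UNICODE had to be passed explicitly), and the sample line is made up for illustration.

```python
import re

# Python 3: \w matches accented letters unless re.ASCII is given.
unicode_match = re.match(r"\w", "é")
ascii_match = re.match(r"\w", "é", re.ASCII)  # \w limited to [a-zA-Z0-9_]

# Hypothetical sample: an accented name that also contains an apostrophe.
sample = "[P] Seán O'Casey [/P]"

# [\w\s]+ cannot cross the apostrophe, so the tag pair is not matched at all.
narrow = re.findall(r"\[P\]([\w\s]+)\[/P\]", sample)

# .+? accepts any characters between the tags.
broad = re.findall(r"\[P\]\s*(.+?)\s*\[/P\]", sample)

print(bool(unicode_match), bool(ascii_match))  # True False
print(narrow, broad)
```

Whether the narrower class is a useful safeguard or an unwanted restriction depends on what the tagged text can contain.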
