How do I parse HTML using regular expressions in C#?

Regular expressions are a very poor way to parse HTML. If you can guarantee that your input will be well-formed XML (e.g. XHTML), you can use XmlReader to read the elements and then print them out however you like.
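For the well-formed case, a minimal sketch of the XmlReader approach might look like this. The class and method names (`XhtmlReaderSketch.ElementNames`) are hypothetical. The method collects the local name of every element in document order. It assumes the input really is well-formed XML, because `XmlReader` throws an `XmlException` as soon as it reaches something that isn't, such as an unclosed `<br>`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XhtmlReaderSketch
{
    // Returns the local name of every element, in document order.
    // Assumes well-formed XML input; XmlReader throws XmlException otherwise.
    public static List<string> ElementNames(string xhtml)
    {
        var names = new List<string>();
        var settings = new XmlReaderSettings
        {
            // XHTML files often begin with a DOCTYPE; ignore it instead of
            // prohibiting it or trying to resolve the external DTD.
            DtdProcessing = DtdProcessing.Ignore
        };

        using (var reader = XmlReader.Create(new StringReader(xhtml), settings))
        {
            while (reader.Read())
            {
                // Only start (or empty) element nodes are recorded.
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.LocalName);
                }
            }
        }
        return names;
    }
}
```

For `<html><body><p>Hi <b>there</b></p></body></html>` this returns `html`, `body`, `p`, `b`. From there you can print the names however you like, or extend the loop to read attributes with `reader.GetAttribute(...)` or text via `XmlNodeType.Text`.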

In my case, the input is NOT well-formed XML. – Mike108 Oct 15 '09 at 2:07

Then you're in for a very complex problem, in general. HTML parsing, with all of its implied elements, optional end tags, etc., is no fun. However, you might be able to leverage an existing library, such as codeplex.com/htmlagilitypack. – bobbymcr Oct 15 '09 at 2:10

No, regular expressions are not "a poor way to parse HTML", because that would imply that regular expressions can parse HTML at all, which is not the case. It is mathematically proven that regular expressions cannot parse HTML. In fact, pretty much every college student has to prove this at some point during a homework assignment or exam or something. – Jörg W Mittag Oct 15 '09 at 2:39

Heh, fair enough. – bobbymcr Oct 15 '09 at 3:00

This has already been answered literally dozens of times, but it bears repeating: regular expressions can only parse regular languages; that's why they are called regular expressions. HTML is not a regular language (as probably every college student in the last decade has proved at least once), and therefore it cannot be parsed by regular expressions.
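One practical consequence of HTML not being regular is nesting: pairing each `<div>` with its own `</div>` requires tracking an unbounded nesting depth, which a finite automaton cannot do. A hypothetical sketch (the class and method names are made up for illustration) of the usual "grab the div contents" regex shows what goes wrong:

```csharp
using System.Text.RegularExpressions;

public static class NestedTagDemo
{
    // A typical "extract the contents of a div" pattern. The lazy .*? stops at
    // the FIRST "</div>" it reaches, so it cannot tell which </div> closes which <div>.
    private static readonly Regex DivContent =
        new Regex("<div>(.*?)</div>", RegexOptions.Singleline);

    // Returns the captured contents of the first match, or null if none.
    public static string FirstDivContent(string html)
    {
        Match m = DivContent.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For `<div>outer <div>inner</div> tail</div>`, `FirstDivContent` returns `outer <div>inner` rather than the full `outer <div>inner</div> tail`. Switching to a greedy `.*` only moves the problem: with two sibling divs it swallows the `</div><div>` between them.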

You might want to try the Html Agility Pack, codeplex.com/htmlagilitypack. It even handles malformed HTML.

I used this regex in C#, and it works. Thanks for all your answers. |(.

It works with the data you've tested it with. If that's all the data you ever need to process with it, then fine. – Robert Rossney Oct 15 '09 at 6:38

If not: now you've got two problems. – Peter Hoffmann Oct 15 '09 at 23:44

– DrJokepu Jun 2 '10 at 23:14
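Robert Rossney's point can be made concrete with a hypothetical tag-stripping regex (not the regex from the answer above; `NaiveTagStripper.StripTags` is an illustrative name). It looks correct on simple test input but quietly assumes that `>` never appears inside a tag or comment:

```csharp
using System.Text.RegularExpressions;

public static class NaiveTagStripper
{
    // Removes every "<...>" span, where the span ends at the first '>'.
    // Works on simple, known input, but a '>' inside a quoted attribute
    // value or an HTML comment ends the "tag" too early.
    public static string StripTags(string html) =>
        Regex.Replace(html, "<[^>]*>", string.Empty);
}
```

`StripTags("<p>Hello <b>world</b></p>")` gives `Hello world`, which looks like success. But `StripTags("<img alt=\"a > b\" src=\"x.png\">Caption")` stops at the `>` inside the quoted attribute and leaves ` b" src="x.png">Caption`, and a comment such as `<!-- <b> -->text` leaves ` -->text`.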

You might want to simply use string functions. Make as your indicator for parsing.
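A minimal sketch of the string-function approach, assuming the angle brackets `<` and `>` are used as the parsing indicators. `AngleBracketScanner.Tags` is a hypothetical name. It walks the input with `String.IndexOf` and returns whatever sits between each `<` and the next `>`:

```csharp
using System;
using System.Collections.Generic;

public static class AngleBracketScanner
{
    // Returns the raw text inside each "<...>" span, located with IndexOf
    // instead of a regex. Like the regex approach, it assumes '<' and '>'
    // only ever delimit tags.
    public static List<string> Tags(string html)
    {
        var tags = new List<string>();
        int pos = 0;
        while (true)
        {
            int open = html.IndexOf('<', pos);
            if (open < 0) break;                      // no more tags
            int close = html.IndexOf('>', open + 1);
            if (close < 0) break;                     // unterminated tag: stop scanning
            tags.Add(html.Substring(open + 1, close - open - 1));
            pos = close + 1;                          // resume after this tag
        }
        return tags;
    }
}
```

For `<p class="x">Hi <b>there</b></p>` this yields `p class="x"`, `b`, `/b`, `/p`. It avoids regex syntax, but it inherits the same blind spots as a simple pattern (a `>` inside an attribute value, comments, scripts).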

