Any pitfalls with this regex that matches ampersands not already encoded?

Even better would be a negative lookahead assertion to verify that & isn't followed by amp;

    /&(?!amp;)/

Though that will change any ampersands used for other entities. If you're likely to have others, then how about something like

    /&(?!#?[a-z0-9]+;)/

This will look for an ampersand, but asserting that it is NOT followed by an optional hash symbol (for numeric entities), a series of alphanumerics and a semicolon, which should cover named and numeric entities like &quot; or &#170;

Test code

    $text="It&#8217;s 30 &deg; outside & very hot. T-shirt & shorts needed!";
    $text=preg_replace('/&(?!#?[a-z0-9]+;)/', '&amp;', $text);
    echo "$text\n";

Which will output

    It&#8217;s 30 &deg; outside &amp; very hot. T-shirt &amp; shorts needed!

Which is more easily read as "It’s 30 ° outside & very hot. T-shirt & shorts needed!"

Alternative for PHP 5.2.3+

As Ionut G. Stan points out below, from PHP 5.2.3 you can use htmlspecialchars with a fourth parameter of false to prevent double-encoding, e.g.

    $text=htmlspecialchars($text,ENT_COMPAT,"UTF-8",false);
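
To probe the lookahead pattern from the answer above for edge cases, here is a minimal PHP sketch. The helper name encode_bare_ampersands and the sample strings are made up for illustration, and the i flag is an assumption added so that uppercase entity names such as &Eacute; are skipped too; otherwise the pattern is read as /&(?!#?[a-z0-9]+;)/.

```php
<?php
// Hypothetical helper: encode every & that does not already start
// something shaped like an entity (&name; or &#digits; or &#xhex;).
// The i flag is an added assumption, not part of the original pattern.
function encode_bare_ampersands(string $text): string
{
    return preg_replace('/&(?!#?[a-z0-9]+;)/i', '&amp;', $text);
}

$samples = [
    'Fish & chips',          // bare ampersand
    'Fish &amp; chips',      // already encoded
    'caf&Eacute; & bar',     // uppercase named entity next to a bare &
    '&#x2019; and &#8217;',  // hex and decimal numeric entities
    'Q&A; R&D',              // pitfall: "&A;" merely looks like an entity
];

foreach ($samples as $sample) {
    echo encode_bare_ampersands($sample), "\n";
}
// Fish &amp; chips
// Fish &amp; chips
// caf&Eacute; &amp; bar
// &#x2019; and &#8217;
// Q&A; R&amp;D
```

Note the last case: anything shaped like &name; is left alone whether or not it is a real entity, so text such as "Q&A;" slips through unencoded.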

Brilliant answer Paul! – alex Mar 12 '09 at 0:06.

It will apply it for any other encoded char.

Can't believe I overlooked this... – alex Mar 12 '09 at 0:28.

If your PHP version is >= 5.2.3 you could use the fourth parameter of the htmlspecialchars function. When set to false it will not convert existing entities.
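
A short sketch of the difference the fourth parameter makes, assuming PHP 5.2.3 or later. The sample string is made up. Keep in mind that htmlspecialchars also encodes <, > and double quotes, so it does more than touch ampersands.

```php
<?php
$text = 'It&#8217;s 30 &deg; outside & very hot. <b>T-shirt</b> & shorts needed!';

// Fourth argument false: existing entities are left alone.
$safe = htmlspecialchars($text, ENT_COMPAT, 'UTF-8', false);

// Fourth argument true (the default): every & is encoded,
// including the ones that start existing entities.
$double = htmlspecialchars($text, ENT_COMPAT, 'UTF-8', true);

echo $safe, "\n";
// It&#8217;s 30 &deg; outside &amp; very hot. &lt;b&gt;T-shirt&lt;/b&gt; &amp; shorts needed!
echo $double, "\n";
// It&amp;#8217;s 30 &amp;deg; outside &amp; very hot. &lt;b&gt;T-shirt&lt;/b&gt; &amp; shorts needed!
```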

Thank you, but at the moment I just want to encode ampersands. But your link is very useful! +1 – alex Mar 12 '09 at 0:04

+1 yes, I didn't know about that either, will mention in my answer – Paul Dixon Mar 12 '09 at 0:11.

I'd isolate the ampersand rather than guess at context, and then use backreferences in your replacement string:

    /(\W)&(\W)/$1&amp;$2/
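
Here is a rough PHP sketch of how that suggestion behaves, reading the replacement as $1&amp;$2. The function name and the sample strings are hypothetical. It needs a non-word character on both sides of the ampersand, which leads to a few surprises.

```php
<?php
// Hypothetical helper wrapping the (\W)&(\W) idea: only an ampersand with a
// non-word character on BOTH sides is encoded.
function encode_isolated_ampersands(string $text): string
{
    return preg_replace('/(\W)&(\W)/', '$1&amp;$2', $text);
}

echo encode_isolated_ampersands('fish & chips'), "\n";     // fish &amp; chips
echo encode_isolated_ampersands('fish &amp; chips'), "\n"; // fish &amp; chips (untouched)
echo encode_isolated_ampersands('It &#8217;s hot'), "\n";  // It &amp;#8217;s hot (# is a non-word char)
echo encode_isolated_ampersands('R&D'), "\n";              // R&D (left unencoded)
echo encode_isolated_ampersands('& so on'), "\n";          // & so on (nothing precedes the &)
```

So it leaves &amp; alone, but it double-encodes a numeric entity that follows a space and misses ampersands that touch a word character or start the string.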

That would fail in a case where the character 'a' follows an ampersand but wasn't part of "amp;", like &and, &also, &apple...

    &(?!amp;)

In Perl that would be:

    $content =~ s/&(?!\w+;)/&amp;/g;

It uses a negative lookahead of one or more word chars, meaning "an ampersand that is not followed by one or more word chars and then immediately by a semicolon". Though the use of the shortcut \w is not as safe as a specific char range for this particular case.

A better option would be:

    $content =~ s/&(?![a-z]+;)/&amp;/g;

And just in case you have some uppercase animal in your data:

    $content =~ s/&(?![a-zA-Z]+;)/&amp;/g;
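
For comparison, a small PHP sketch of the \w+ and letters-only lookaheads, assuming the patterns are read as /&(?!\w+;)/ and /&(?![a-zA-Z]+;)/. The input string and the array labels are made up. Neither pattern allows for a leading #, so both re-encode numeric entities.

```php
<?php
// PHP equivalents of the two lookahead styles (assumed readings).
$patterns = [
    'word chars'   => '/&(?!\w+;)/',
    'letters only' => '/&(?![a-zA-Z]+;)/',
];

$input = 'a & b &amp; c &Eacute; d &#8217; e &foo_1; f';

$results = [];
foreach ($patterns as $label => $pattern) {
    $results[$label] = preg_replace($pattern, '&amp;', $input);
    echo $label, ': ', $results[$label], "\n";
}
// word chars: a &amp; b &amp; c &Eacute; d &amp;#8217; e &foo_1; f
// letters only: a &amp; b &amp; c &Eacute; d &amp;#8217; e &amp;foo_1; f
```

The \w version also accepts digits and underscores, which is why &foo_1; passes as an entity there but not with the letters-only range.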

In PHP, I want to encode ampersands that have not already been encoded. My regex seems to work well so far, but seeing as I'm not much of a regex expert, I am asking whether it has any potential pitfalls. Thanks for the answers.

It seems I wasn't thinking broadly enough to cover all bases. This seems like a common pitfall of regexes themselves (having to think of all the possibilities that might give your regex false positives).
