HTML Purifier rocks!

I had to build an RSS aggregator for my job, and I needed to find (or write) a good tool to sanitize the incoming HTML. I stumbled upon HTML Purifier, and I haven't seen a better tool for the job yet.

Some of the features:

  • It can turn the HTML into valid XHTML (Transitional or Strict).
  • As part of that, it also balances unclosed tags.
  • It removes any code that could pose a security risk (tested against RSnake's XSS cheat sheet).
  • It lets you truncate HTML (if you don't want to show an entire post) and still end up with well-formed HTML.
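
For reference, a minimal sketch of what a basic setup might look like. The autoloader path and the function name `sanitize_feed_html` are assumptions made for illustration; adjust the path to wherever the library lives in your project.

```php
<?php
// Hypothetical location of HTML Purifier's autoloader (an assumption,
// not something the library dictates).
const PURIFIER_AUTOLOAD = __DIR__ . '/htmlpurifier/library/HTMLPurifier.auto.php';

/**
 * Sanitize an untrusted HTML fragment (for example, a feed item body).
 * Hypothetical wrapper; only the HTMLPurifier* calls belong to the library.
 */
function sanitize_feed_html(string $dirty): string
{
    // Loaded lazily so the library is only required when it is needed.
    require_once PURIFIER_AUTOLOAD;

    $config = HTMLPurifier_Config::createDefault();
    // Ask for XHTML 1.0 Strict output; 'XHTML 1.0 Transitional' is the other XHTML 1.0 option.
    $config->set('HTML.Doctype', 'XHTML 1.0 Strict');

    $purifier = new HTMLPurifier($config);

    // purify() removes unsafe markup and balances unclosed tags.
    return $purifier->purify($dirty);
}
```

Calling `sanitize_feed_html($item)` on each feed item before storing or displaying it would be one way to use this.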

So yeah, if you need something similar, I'd suggest you check it out.

Comments

  • Stoyan

    How exactly do you truncate HTML? I wasn't able to find the method in the docs. Is there a way to truncate only the text (ignoring the length of the HTML tags)?
  • Evert

    Hey Stoyan, I simply do a substr on the text, nothing fancy.
  • Thierry Schellenbach

    This is indeed a really nice tool. I needed to secure a templating system, great stuff :)
  • Edward Z. Yang

    Stoyan: Generally, I recommend that people use strip_tags on the HTML and then a smart string truncator. (Don't forget to properly escape the data on final output!) There is usually no need for the HTML to be shown in such cases. The behavior that Evert is describing probably has to do with HTML Purifier's tag-balancing capabilities: an unclosed tag such as <b>asdf (with presumably the rest truncated) becomes <b>asdf</b>.
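
    A rough sketch of that strip_tags approach, using only built-in PHP functions. The helper name `html_excerpt` and the 200-character default are made up for illustration, and the code assumes the input is valid UTF-8:

```php
<?php
/**
 * Build an escaped plain-text excerpt from an HTML fragment.
 * Hypothetical helper, not part of HTML Purifier. Assumes valid UTF-8 input.
 */
function html_excerpt(string $html, int $maxChars = 200): string
{
    // Drop all tags, then decode entities so "&amp;" counts as one character.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    $text = trim(preg_replace('/\s+/u', ' ', $text));

    // Only truncate when the text is longer than $maxChars characters.
    if (preg_match('/^.{0,' . $maxChars . '}$/su', $text) !== 1) {
        if (preg_match('/^(.{1,' . $maxChars . '})(?=\s)/su', $text, $m) === 1) {
            // Longest prefix within the limit that ends at a word boundary.
            $text = $m[1];
        } else {
            // No whitespace within the limit: fall back to a hard cut.
            preg_match('/^.{0,' . $maxChars . '}/su', $text, $m);
            $text = $m[0];
        }
        $text = rtrim($text) . '...';
    }

    // Escape again for safe output in an HTML page.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

    Because the tags are removed before counting, the limit applies to the visible text only, which is what Stoyan asked about. One caveat: strip_tags does not insert spaces where block elements used to be, so adjacent paragraphs can end up joined.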