CakePHP HTMLPurifier Component
Posted on: Sep 12th 2006 | Posted by: Rolan
I needed to use HTMLPurifier in my CakePHP application, so I saved it under the vendors folder inside the application folder. This is what the directory structure looked like:
+ myApplication
|-----+ config/
|-----+ controllers/
|-----+ models/
|-----+ plugins/
|-----+ tmp/
|-----+ vendors/
| |----- HTMLPurifier/
| |----- HTMLPurifier.php
|
|-----+ views/
|-----+ webroot/
|-----+ .htaccess
|-----+ index.php
But before including the vendor component, I needed to add the path to the vendors folder (where HTMLPurifier is located) so that Cake can find it. So I added something to HTMLPurifier.php, somewhere before the require_once() statements:
- // START edit -dchx
- //Add the path to the vendors folder where HTMLPurifier is located
- }
- // END edit -dchx
- require_once 'HTMLPurifier/ConfigDef.php';
- require_once 'HTMLPurifier/Config.php';
- require_once 'HTMLPurifier/Lexer.php';
- require_once 'HTMLPurifier/HTMLDefinition.php';
- require_once 'HTMLPurifier/Generator.php';
- require_once 'HTMLPurifier/Strategy/Core.php';
- require_once 'HTMLPurifier/Encoder.php';
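The listing above only shows where the edit goes. As a minimal sketch of such an edit, assuming HTMLPurifier.php sits directly in the vendors folder as in the tree shown earlier, the folder can be prepended to PHP's include path so that the relative require_once paths resolve. The $vendorsDir variable is a name made up for this illustration, not something taken from the original edit:

```php
<?php
// Assumption: this file (HTMLPurifier.php) lives directly in the vendors
// folder, next to the HTMLPurifier/ directory.
$vendorsDir = dirname(__FILE__);

// Prepend the vendors folder so that require_once 'HTMLPurifier/...'
// can be resolved through the include path.
set_include_path($vendorsDir . PATH_SEPARATOR . get_include_path());
```

Prepending rather than appending means the copy in the vendors folder should win if another HTMLPurifier directory happens to be on the include path.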
Now I’m all set. I just need to include the component using the CakePHP function uses().
UPDATE: A small update on this. When using HTMLPurifier inside CakePHP (or in any other app), make sure that the character encoding of the output page is UTF-8. I ran into a bug where a paragraph tag (p) containing only a non-breaking space was converted into another character, even though the Content-Type meta tag on my HTML page was set to UTF-8 (and of course I’m using the XHTML 1.0 Transitional DocType). I fixed it by also sending a Content-Type header. In CakePHP, you can do this inside the beforeFilter() function of your controller.
- class MyController extends AppController {
- //… the usual
- function beforeFilter()
- {
- }
- }
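The header call itself is not visible in the listing above. A minimal sketch of what sending the header from beforeFilter() could look like follows. The contentTypeHeader() helper and the AppController stand-in are hypothetical additions for illustration (the stand-in only exists so the snippet runs outside CakePHP); header() and headers_sent() are standard PHP functions:

```php
<?php
// Stand-in so the sketch runs outside CakePHP. In a real application,
// AppController is provided by the framework and this block is skipped.
if (!class_exists('AppController')) {
    class AppController {}
}

class MyController extends AppController {
    // Hypothetical helper: builds the header line for the given charset.
    function contentTypeHeader($charset = 'UTF-8') {
        return 'Content-Type: text/html; charset=' . $charset;
    }

    function beforeFilter() {
        // header() only works before any output has been sent.
        if (!headers_sent()) {
            header($this->contentTypeHeader());
        }
    }
}
```

Putting the call in beforeFilter() means the header goes out before the action and its view produce any output.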

HTML Purifier can work with different character encodings; UTF-8 is just the default. I recommend that you not blindly set the application to UTF-8, and instead make sure that it is, indeed, UTF-8 aware. For instance, if you already have ISO-8859-1 characters, they will probably get mangled.
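To illustrate the point about mangling, here is a small sketch in plain PHP (no HTML Purifier involved). The byte string is an assumed example of ISO-8859-1 input, which is not valid UTF-8 until it is converted with iconv(). The variable names are made up for the example:

```php
<?php
// "café" in ISO-8859-1: the "é" is the single byte 0xE9.
$latin1 = "caf\xE9";

// preg_match() with the /u modifier fails on subjects that are not valid UTF-8.
$validBefore = (preg_match('//u', $latin1) === 1);

// Convert explicitly from ISO-8859-1 to UTF-8.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
$validAfter = (preg_match('//u', $utf8) === 1);

var_dump($validBefore, $validAfter);
```

The first check should come out false and the second true, which is why existing ISO-8859-1 data needs an explicit conversion before the application is declared to be UTF-8.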
You might be interested in htmLawed, a 45-kb, single-file, non-OOP, GPLv3-licensed script with low basal memory usage (0.5 MB) to filter illegal/disallowed HTML (tags, attributes, etc.) from user input. It also reduces XSS vulnerabilities, balances tags, etc.
Visit the htmLawed website.