
CakePHP HTMLPurifier Component

Posted on: Sep 12th 2006 | Posted by: Rolan

I needed to use HTMLPurifier in my CakePHP application, so I saved it under the vendors folder inside the application folder. This is what the directory structure looked like:

+ myApplication
     |-----+ config/
     |-----+ controllers/
     |-----+ models/
     |-----+ plugins/
     |-----+ tmp/
     |-----+ vendors/
     |       |----- HTMLPurifier/
     |       |----- HTMLPurifier.php
     |
     |-----+ views/
     |-----+ webroot/
     |-----+ .htaccess
     |-----+ index.php

But before including the vendor component, I needed to add the path to the vendors folder to PHP's include path so that the HTMLPurifier files can be found. So I added a few lines to HTMLPurifier.php, just before the require_once() statements:

PHP:
  // START edit -dchx
  // Add the path to the vendors folder where HTMLPurifier is located
  if (function_exists('ini_set')) {
      ini_set('include_path', ini_get('include_path') . PATH_SEPARATOR . dirname(__FILE__));
  }
  // END edit -dchx

  require_once 'HTMLPurifier/ConfigDef.php';
  require_once 'HTMLPurifier/Config.php';
  require_once 'HTMLPurifier/Lexer.php';
  require_once 'HTMLPurifier/HTMLDefinition.php';
  require_once 'HTMLPurifier/Generator.php';
  require_once 'HTMLPurifier/Strategy/Core.php';
  require_once 'HTMLPurifier/Encoder.php';
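The same include-path edit could also be written a little more defensively. The sketch below is only one way to do it: it uses PHP's get_include_path() and set_include_path() and skips the append when the directory is already listed. The helper name addToIncludePath() is made up for this example and is not part of HTMLPurifier or CakePHP.

```php
<?php
// Hypothetical helper (not part of HTMLPurifier or CakePHP): append a
// directory to PHP's include_path unless it is already listed.
function addToIncludePath($dir)
{
    $paths = explode(PATH_SEPARATOR, get_include_path());
    if (!in_array($dir, $paths, true)) {
        $paths[] = $dir;
        set_include_path(implode(PATH_SEPARATOR, $paths));
    }
    return get_include_path();
}

// Inside HTMLPurifier.php this would be called with the file's own directory,
// mirroring the dirname(__FILE__) used in the edit above.
addToIncludePath(dirname(__FILE__));
```

Calling the helper twice with the same directory leaves the include path unchanged, which avoids the path growing if the file is evaluated more than once.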

Now I’m all set. I just need to include the component using the CakePHP function uses().
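Once the library is loaded, cleaning submitted data might look roughly like the sketch below. It assumes only that the purifier object exposes a purify($html) method, which an HTMLPurifier instance does. The helper purifyFields() is a hypothetical name introduced here, and the commented usage assumes the form data lives in the controller's $this->data.

```php
<?php
// Hypothetical helper: run every string in a (possibly nested) array of
// submitted form data through an object that exposes purify($html),
// such as an HTMLPurifier instance. Non-string values are left untouched.
function purifyFields($purifier, $data)
{
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $data[$key] = purifyFields($purifier, $value);
        } elseif (is_string($value)) {
            $data[$key] = $purifier->purify($value);
        }
    }
    return $data;
}

// Usage inside a controller action, assuming HTMLPurifier.php is already loaded:
// $purifier = new HTMLPurifier();
// $clean = purifyFields($purifier, $this->data);
```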

UPDATE: A small update on this. When using HTMLPurifier inside CakePHP (or in any other app), make sure that the character encoding of the output page is UTF-8. I ran into a bug where a paragraph tag (p) containing only a non-breaking space was converted into another character, even though the Content-type meta tag on my HTML page was set to UTF-8 (and, of course, I’m using the XHTML 1.0 Transitional DocType). I fixed it by sending a Content-Type header. In CakePHP, you can do this inside the beforeFilter() function of your controller.

PHP:
  class MyController extends AppController {

      // … the usual

      function beforeFilter()
      {
          header('Content-Type: text/html; charset=UTF-8');
      }
  }
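One plausible explanation for the stray character, offered as an assumption rather than a confirmed diagnosis: a non-breaking space is two bytes in UTF-8, and if those bytes are read as ISO-8859-1, each byte becomes its own character. The sketch below reproduces that by hand, without any extensions. The function name latin1ToUtf8() is made up for this example.

```php
<?php
// Re-encode a byte string from ISO-8859-1 to UTF-8 by hand. Applying this to
// text that is already UTF-8 models what a reader sees when UTF-8 output is
// misread as ISO-8859-1: every byte is treated as a separate character.
function latin1ToUtf8($bytes)
{
    $out = '';
    for ($i = 0; $i < strlen($bytes); $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                       // ASCII stays one byte
        } else {
            $out .= chr(0xC0 | ($b >> 6));         // lead byte
            $out .= chr(0x80 | ($b & 0x3F));       // continuation byte
        }
    }
    return $out;
}

$nbsp = "\xC2\xA0";              // U+00A0 (non-breaking space) as UTF-8
$misread = latin1ToUtf8($nbsp);  // both bytes treated as Latin-1 characters

echo bin2hex($nbsp), "\n";     // c2a0
echo bin2hex($misread), "\n";  // c382c2a0, i.e. "Â" followed by a non-breaking space
```

Sending the charset in the HTTP header, as in beforeFilter() above, tells the browser how to decode those bytes before it parses any meta tag.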

Filed in: PHP, WebDev

3 Responses to “CakePHP HTMLPurifier Component”

  1. Edward Z. Yang on 17 Sep 2006 at 6:23 am

    HTML Purifier can work with different character encodings; UTF-8 is simply the default. I recommend that you not blindly set the application to UTF-8, and that you make sure it is, indeed, UTF-8 aware. For instance, if you already have ISO-8859-1 characters, they will probably get mangled.

  2. [...] Rediscoverer » Blog Archive » CakePHP HTMLPurifier Component (tags: cake) [...]

  3. SKP on 14 Jan 2008 at 11:22 am

    You might be interested in htmLawed, a 45-kb, single-file, non-OOP, GPLv3-licensed script with low basal memory usage (0.5 MB) to filter illegal/disallowed HTML (tags, attributes, etc.) from user input. It also reduces XSS vulnerabilities, balances tags, etc.

    Visit the htmLawed website.
