Archive for September, 2006

CakePHP HTMLPurifier Component

Posted on: Sep 12th 2006 | Posted by: Rolan

I needed to use HTML Purifier in my CakePHP application, so I saved it in the vendors folder inside the application folder. This is what the directory structure looked like:

+ myApplication
     |-----+ config/
     |-----+ controllers/
     |-----+ models/
     |-----+ plugins/
     |-----+ tmp/
     |-----+ vendors/
     |       |----- HTMLPurifier/
     |       |----- HTMLPurifier.php
     |
     |-----+ views/
     |-----+ webroot/
     |-----+ .htaccess
     |-----+ index.php

But before including the vendor component, I needed to add the path to HTMLPurifier to PHP’s include path so that Cake can find it. So I added the following to HTMLPurifier.php, somewhere before the require_once statements:

PHP:
  // START edit -dchx
  // Add the path to the vendors folder where HTMLPurifier is located
  if (function_exists('ini_set')) {
      ini_set('include_path', ini_get('include_path') . PATH_SEPARATOR . dirname(__FILE__));
  }

  // END edit -dchx

  require_once 'HTMLPurifier/ConfigDef.php';
  require_once 'HTMLPurifier/Config.php';
  require_once 'HTMLPurifier/Lexer.php';
  require_once 'HTMLPurifier/HTMLDefinition.php';
  require_once 'HTMLPurifier/Generator.php';
  require_once 'HTMLPurifier/Strategy/Core.php';
  require_once 'HTMLPurifier/Encoder.php';
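
If you would rather leave the vendor file untouched, the same include-path change can be made from application code before HTMLPurifier.php is loaded. The following is only a sketch: the helper name add_include_dir() is made up for illustration, and where you call it from (a bootstrap file, for example) is an assumption about your setup.

```php
<?php
// Hypothetical helper (not part of CakePHP or HTML Purifier):
// appends a directory to PHP's include_path unless it is already listed.
function add_include_dir($dir)
{
    $paths = explode(PATH_SEPARATOR, get_include_path());
    if (!in_array($dir, $paths, true)) {
        $paths[] = $dir;
        set_include_path(implode(PATH_SEPARATOR, $paths));
    }
    return get_include_path();
}

// Example: make the directory containing this file searchable by require_once.
add_include_dir(dirname(__FILE__));
```

Checking for an existing entry first keeps the include path from growing if the helper runs more than once per request.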

Now I’m all set. I just need to include the component using the CakePHP function uses().

UPDATE: A small update on this. When using HTML Purifier inside CakePHP (or in any other app), make sure that the character encoding of the output page is UTF-8. I ran into a bug where a paragraph tag (p) containing only a non-breaking space was converted into another character, even though the Content-Type meta tag on my HTML page was set to UTF-8 (and, of course, I’m using the XHTML 1.0 Transitional DOCTYPE). I fixed it by sending a Content-Type header. In CakePHP, you can do this inside the beforeFilter() function of your controller.

PHP:
  class MyController extends AppController {

      //… the usual

      function beforeFilter()
      {
          header('Content-Type: text/html; charset=UTF-8');
      }
  }
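
A plausible cause of the stray character, though not confirmed here, is that the browser read the UTF-8 bytes of the non-breaking space as a single-byte encoding. The sketch below simulates that misreading. It assumes the mbstring extension is available, and ISO-8859-1 is picked only as an example of a wrong encoding.

```php
<?php
// A non-breaking space encoded as UTF-8 is two bytes: 0xC2 0xA0.
$nbsp = "\xC2\xA0";

// Simulate a client that wrongly reads those bytes as ISO-8859-1
// and then re-encodes what it "saw" as UTF-8.
$misread = mb_convert_encoding($nbsp, 'UTF-8', 'ISO-8859-1');

echo bin2hex($nbsp), "\n";    // c2a0
echo bin2hex($misread), "\n"; // c382c2a0: an "Â" followed by a non-breaking space
```

Sending the charset in the HTTP header, as in the controller above, tells the browser how to decode the bytes before it ever reaches the meta tag.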

Filed in: PHP, WebDev | 3 Comments

HTML Purifier

Posted on: Sep 9th 2006 | Posted by: Rolan

I’m currently building an article submission application. To give users more power over their articles, I planned on using a WYSIWYG text editor for the article submission form. With that kind of editor, users can format their articles easily, even if they have little experience with HTML. I tried TinyMCE, an open-source WYSIWYG editor that runs on JavaScript, and I’m quite happy with the results. It provides an “MS Word”-like interface. It also has a mechanism that filters out disallowed HTML tags and other potentially dangerous markup that could make the application vulnerable to XSS attacks.

But what if the user has disabled JavaScript? If the application expects TinyMCE to process the input, it won’t do any input checking of its own. With JavaScript disabled, TinyMCE can’t do its job, so disallowed HTML gets through freely and the application is left open to attacks. PHP’s built-in input filtering functions aren’t of much use here, since all they do is strip the tags or convert special characters like < and > into their equivalent entities, so the markup is no longer recognized as markup. I wanted some PHP functionality that could do the filtering for me.
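
To make that limitation concrete, here is a small sketch of what the two built-in functions do to a piece of submitted markup. The sample strings are invented for illustration.

```php
<?php
$input = '<p>Hello <strong>world</strong></p><script>alert(1)</script>';

// strip_tags() removes every tag, including the harmless formatting,
// but leaves the text that was inside them.
$stripped = strip_tags($input);
echo $stripped, "\n"; // Hello worldalert(1)

// htmlspecialchars() keeps everything but turns the markup into plain text.
$escaped = htmlspecialchars('<p>Hi</p>');
echo $escaped, "\n"; // &lt;p&gt;Hi&lt;/p&gt;

// strip_tags() can keep a list of allowed tags, but it does not inspect
// attributes, so an event handler on an allowed tag survives.
$allowed = strip_tags('<p onclick="evil()">Hi</p>', '<p>');
echo $allowed, "\n"; // <p onclick="evil()">Hi</p>
```

Neither function can keep the safe formatting while dropping only the dangerous parts, which is the gap a dedicated HTML filter is meant to fill.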

So I consulted Sir Google and, after looking through some possible solutions, I found HTML Purifier and gave it a test run. Yep, it worked. I tried it with TinyMCE on, and the HTML formatting was still intact after purification. Then I tried it with TinyMCE on but with JavaScript disabled, inserted some not-so-malicious code, and the purifier caught it. Nice! If I have time, I’ll test it further. I just need to make the application fully functional before doing detailed testing and debugging.
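
A test run like the one described might look roughly like the sketch below. The wrapper function clean_article() and the sample input are hypothetical. The only library call assumed is the purify() method on the HTMLPurifier class, and the example only runs if the library has already been loaded (for instance with require_once 'HTMLPurifier.php').

```php
<?php
// Hypothetical wrapper: $purifier is any object exposing purify($html),
// which is how HTML Purifier's main class is used.
function clean_article($html, $purifier)
{
    return $purifier->purify($html);
}

// Only runs when the HTMLPurifier class has already been loaded.
if (class_exists('HTMLPurifier')) {
    $dirty = '<p>Nice article</p><script>alert("xss")</script>';
    echo clean_article($dirty, new HTMLPurifier()), "\n";
}
```

Because the filtering happens on the server, it applies whether or not the browser ran TinyMCE.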

Filed in: PHP, WebDev | 2 Comments

Trim whitespaces

Posted on: Sep 8th 2006 | Posted by: Rolan

I was so conscious of properly sanitizing user input with htmlspecialchars() and addslashes() that I sometimes forgot to trim() it for whitespace. A small application I’m currently building with CakePHP had a bug caused by unwanted whitespace, and it took me some time to spot it.
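
As a reminder of how easily this slips through, here is a minimal sketch. The helper clean_field() and the sample value are made up for illustration and are not taken from the application.

```php
<?php
// Hypothetical helper: trim first, then escape for HTML output.
function clean_field($value)
{
    return htmlspecialchars(trim($value), ENT_QUOTES, 'UTF-8');
}

// A value as it might arrive from a form, with stray whitespace.
$submitted = "  rolan \n";

var_dump($submitted === 'rolan');              // bool(false)
var_dump(clean_field($submitted) === 'rolan'); // bool(true)
```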

Filed in: PHP, Reminders

A love song for Web Standards

Posted on: Sep 8th 2006 | Posted by: Rolan

Rjene gave me a link to a love song about Web Standards. Read more about it at Boagworld.com. Very funny, especially the part with the tables nested fifteen levels deep. :))

Filed in: Daily, Funny, WebDev | 2 Comments