PHP Classes

Secure HTML parser and filter: Parse and filter insecure HTML tags and CSS styles

Version: secure-html-filter 1.0.0
License: BSD License
PHP version: 4
Categories: HTML, Security, Parsers
Description 

This package can be used to parse and filter insecure HTML tags and CSS styles.

It comes with a general-purpose markup parser class that can parse any type of markup document, such as HTML, XML and DTD files.

Several other classes can be chained together: each one retrieves the document token elements returned by the main markup parser class (or by the previous filter in the chain) and filters them in a useful way.
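The chaining idea can be sketched as a pull-based pipeline in which every stage wraps the previous one. This is only an illustration: the interface, the class names and the token array layout below are hypothetical and are not the package's API.

```php
<?php
// Hypothetical sketch: these names are NOT part of the package.

/** A stage that hands out one token at a time, or null when exhausted. */
interface TokenSource
{
    public function nextToken(): ?array;
}

/** Innermost stage: stands in for the markup parser by replaying an array. */
final class ArrayTokenSource implements TokenSource
{
    private $tokens;
    private $position = 0;

    public function __construct(array $tokens)
    {
        $this->tokens = array_values($tokens);
    }

    public function nextToken(): ?array
    {
        if ($this->position >= count($this->tokens)) {
            return null;
        }
        return $this->tokens[$this->position++];
    }
}

/** Filter stage: pulls from the previous stage and skips comment tokens. */
final class DropCommentsFilter implements TokenSource
{
    private $source;

    public function __construct(TokenSource $source)
    {
        $this->source = $source;
    }

    public function nextToken(): ?array
    {
        while (($token = $this->source->nextToken()) !== null) {
            if ($token['type'] !== 'comment') {
                return $token;
            }
        }
        return null;
    }
}

/** Filter stage: normalizes tag names to lower case. */
final class LowercaseNamesFilter implements TokenSource
{
    private $source;

    public function __construct(TokenSource $source)
    {
        $this->source = $source;
    }

    public function nextToken(): ?array
    {
        $token = $this->source->nextToken();
        if ($token !== null && isset($token['name'])) {
            $token['name'] = strtolower($token['name']);
        }
        return $token;
    }
}

// Chain the stages: each one wraps the stage before it.
$chain = new LowercaseNamesFilter(new DropCommentsFilter(new ArrayTokenSource([
    ['type' => 'start', 'name' => 'P'],
    ['type' => 'comment', 'data' => ' hidden '],
    ['type' => 'text', 'data' => 'Hello'],
    ['type' => 'end', 'name' => 'P'],
])));

$result = [];
while (($token = $chain->nextToken()) !== null) {
    $result[] = $token;
}
```

Because each stage only sees the tokens that the previous stage lets through, stages can be added or reordered without changing the calling loop.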

The markup validator filter class validates a document against a DTD and can remove invalid tags and attributes.
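As a rough illustration of validating against declarations, the sketch below drops elements and attributes that are not declared. It is an assumption-laden simplification: the `$declared` map is a hand-written stand-in for parsed DTD data, and `validateTokens()` and the token layout are hypothetical, not taken from the package.

```php
<?php
// Hypothetical stand-in for parsed DTD declarations:
// element name => list of declared attribute names.
$declared = [
    'p'  => ['class', 'id'],
    'a'  => ['href', 'title'],
    'em' => [],
];

/**
 * Returns [valid tokens, warnings]. Undeclared elements are removed together
 * with their end tags; undeclared attributes are removed from kept elements.
 */
function validateTokens(array $tokens, array $declared): array
{
    $valid = [];
    $warnings = [];
    foreach ($tokens as $index => $token) {
        if ($token['type'] === 'text') {
            $valid[] = $token;
            continue;
        }
        $name = strtolower($token['name']);
        if (!array_key_exists($name, $declared)) {
            $warnings[] = "token $index: undeclared element $name removed";
            continue;
        }
        if ($token['type'] === 'start') {
            foreach (array_keys($token['attributes']) as $attribute) {
                if (!in_array(strtolower($attribute), $declared[$name], true)) {
                    $warnings[] = "token $index: undeclared attribute $attribute removed";
                    unset($token['attributes'][$attribute]);
                }
            }
        }
        $valid[] = $token;
    }
    return [$valid, $warnings];
}

list($valid, $warnings) = validateTokens([
    ['type' => 'start', 'name' => 'p', 'attributes' => ['class' => 'note', 'onclick' => 'x()']],
    ['type' => 'start', 'name' => 'blink', 'attributes' => []],
    ['type' => 'text', 'data' => 'Hello'],
    ['type' => 'end', 'name' => 'blink'],
    ['type' => 'end', 'name' => 'p'],
], $declared);
```

A real DTD also constrains nesting and attribute values, which this sketch does not attempt to check.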

The safe HTML filter class applies several whitelists to the HTML tags and data returned by the markup validator class, discarding potentially harmful HTML tags and CSS that could be used to perform cross-site scripting (XSS) or cross-site request forgery (CSRF) attacks.
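The whitelist approach can be illustrated with a small attribute filter. This is a minimal sketch under stated assumptions: `isSafeUrl()` and `filterAttributes()` are hypothetical helpers, the allowed scheme list is an example choice, and the package's own whitelists are more extensive.

```php
<?php
/**
 * Hypothetical helper: accepts relative URLs and a short list of schemes.
 * Entities are decoded and whitespace/control characters stripped first,
 * because browsers may ignore them inside a scheme name.
 */
function isSafeUrl(string $url): bool
{
    $decoded = html_entity_decode($url, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $compact = preg_replace('/[\x00-\x20]+/', '', $decoded);
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $compact, $match)) {
        return true; // no scheme found: treat as a relative URL
    }
    return in_array(strtolower($match[1]), ['http', 'https', 'mailto'], true);
}

/**
 * Hypothetical helper: keeps only whitelisted attributes, and only keeps
 * URL-valued attributes whose value passes isSafeUrl().
 */
function filterAttributes(array $attributes, array $allowed, array $urlAttributes): array
{
    $safe = [];
    foreach ($attributes as $name => $value) {
        $name = strtolower((string) $name);
        if (!in_array($name, $allowed, true)) {
            continue; // not whitelisted, e.g. event handlers such as onclick
        }
        if (in_array($name, $urlAttributes, true) && !isSafeUrl($value)) {
            continue; // whitelisted attribute, but the URL scheme is not allowed
        }
        $safe[$name] = $value;
    }
    return $safe;
}

$clean = filterAttributes(
    ['href' => 'javascript:alert(1)', 'title' => 'Home', 'onclick' => 'steal()'],
    ['href', 'title'],
    ['href']
);
```

The key design choice is to list what is allowed rather than what is forbidden, so unknown or newly invented attack vectors are rejected by default.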

The filtered HTML tokens can be reassembled to return a well-formed and secure HTML document.
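In the package's example script this step is done with the filter's `RewriteElement` method. The stand-alone sketch below only illustrates the general idea of turning tokens back into escaped markup; `rewriteToken()` and the token layout are hypothetical, and text and attribute values are assumed to be stored unescaped.

```php
<?php
/**
 * Hypothetical helper: serializes one token, escaping text and attribute
 * values so that the reassembled output stays well formed.
 */
function rewriteToken(array $token): string
{
    switch ($token['type']) {
        case 'text':
            return htmlspecialchars($token['data'], ENT_QUOTES, 'UTF-8');
        case 'end':
            return '</' . $token['name'] . '>';
        case 'start':
            $markup = '<' . $token['name'];
            foreach ($token['attributes'] as $name => $value) {
                $markup .= ' ' . $name . '="'
                    . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '"';
            }
            return $markup . '>';
    }
    return ''; // unknown token types are dropped
}

$output = '';
foreach ([
    ['type' => 'start', 'name' => 'p', 'attributes' => ['class' => 'a&b']],
    ['type' => 'text', 'data' => 'x < y'],
    ['type' => 'end', 'name' => 'p'],
] as $token) {
    $output .= rewriteToken($token);
}
```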

The HTML links filter class can extract the links contained in an HTML document.
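Link extraction over a token stream can be sketched as follows. The function name, the token layout and the choice of `a` and `area` as link-carrying tags are assumptions made for illustration, not the behavior of the package's class.

```php
<?php
/**
 * Hypothetical helper: collects the href values of link-carrying start tags,
 * in document order, without duplicates.
 */
function extractLinks(array $tokens): array
{
    $linkTags = ['a', 'area'];
    $links = [];
    foreach ($tokens as $token) {
        if ($token['type'] !== 'start') {
            continue;
        }
        if (!in_array(strtolower($token['name']), $linkTags, true)) {
            continue;
        }
        if (isset($token['attributes']['href']) && $token['attributes']['href'] !== '') {
            $links[] = $token['attributes']['href'];
        }
    }
    return array_values(array_unique($links));
}

$links = extractLinks([
    ['type' => 'start', 'name' => 'a', 'attributes' => ['href' => '/about']],
    ['type' => 'text', 'data' => 'About'],
    ['type' => 'end', 'name' => 'a'],
    ['type' => 'start', 'name' => 'A', 'attributes' => ['href' => '/about']],
    ['type' => 'end', 'name' => 'A'],
    ['type' => 'start', 'name' => 'a', 'attributes' => ['name' => 'anchor']],
    ['type' => 'end', 'name' => 'a'],
    ['type' => 'start', 'name' => 'area', 'attributes' => ['href' => 'https://example.com/']],
]);
```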

The DTD parser and CSS parser are utility classes used by the other classes.
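To show why a CSS parser is useful to the safe HTML filter, here is a deliberately naive sketch of whitelisting inline style declarations by property name and by function name. `filterStyle()` is hypothetical; splitting on `;` and `:` is a simplification that a real CSS parser avoids, which is why the sketch rejects any value containing escapes or comments outright.

```php
<?php
/**
 * Hypothetical helper: keeps only declarations whose property is whitelisted
 * and whose value uses no function outside the function whitelist.
 */
function filterStyle(string $style, array $safeProperties, array $safeFunctions): string
{
    $kept = [];
    foreach (explode(';', $style) as $declaration) {
        if (strpos($declaration, ':') === false) {
            continue; // not a "property: value" pair
        }
        list($property, $value) = array_map('trim', explode(':', $declaration, 2));
        $property = strtolower($property);
        if (!in_array($property, $safeProperties, true)) {
            continue;
        }
        // Escapes and comments can hide function names from a naive scan.
        if (strpos($value, '\\') !== false || strpos($value, '/*') !== false) {
            continue;
        }
        preg_match_all('/([a-z\-]+)\s*\(/i', $value, $matches);
        $functions = array_map('strtolower', $matches[1]);
        if (count(array_diff($functions, $safeFunctions)) > 0) {
            continue;
        }
        $kept[] = $property . ': ' . $value;
    }
    return implode('; ', $kept);
}

$style = filterStyle(
    'color: red; behavior: url(x.htc); width: expression(alert(1)); -moz-border-radius: 4px',
    ['color', 'width', '-moz-border-radius'],
    ['rgb']
);
```

The two whitelist arguments play a role loosely similar to the `safe_proprietary_css_properties` and `safe_css_property_functions` variables set in the example script below, although their actual format in the package is an associative array.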

Author: Manuel Lemos (Portugal)

Example

<?php
/*
 * test_safe_html_filter.php
 *
 * @(#) $Header: /home/mlemos/cvsroot/markupparser/test_safe_html_filter.php,v 1.10 2009/08/21 05:21:12 mlemos Exp $
 *
 */

    require_once('css_parser.php');
    require_once('dtd_parser.php');
    require_once('filecacheclass.php');
    require_once('markup_parser.php');
    require_once('markup_filter_validator.php');
    require_once('markup_filter_safe_html.php');

    /* Take the file to filter from the command line, if given */
    $message_file = ((IsSet($_SERVER['argv']) && count($_SERVER['argv'])>1) ? $_SERVER['argv'][1] : 'test/sample/simple.html');

    $filter = new markup_filter_safe_html_class;

    /* Set to 1 if you need to track line numbers of errors or element
     * positions
     */
    $filter->track_lines = 1;

    /* Add here the proprietary CSS properties that you know are safe
     * to allow.
     */
    $filter->safe_proprietary_css_properties = array(
        '-moz-border-radius'=>array(),
        '-moz-border-radius-topleft'=>array(),
        '-moz-border-radius-topright'=>array(),
        '-moz-border-radius-bottomleft'=>array(),
        '-moz-border-radius-bottomright'=>array(),
        '-webkit-border-radius'=>array(),
        '-webkit-border-top-left-radius'=>array(),
        '-webkit-border-top-right-radius'=>array(),
        '-webkit-border-bottom-left-radius'=>array(),
        '-webkit-border-bottom-right-radius'=>array(),
    );

    /* Add here the CSS property function names that you know are safe
     * to allow.
     */
    $filter->safe_css_property_functions = array(
        'alpha'=>array()
    );

    $parameters=array(
        'File'=>$message_file,

        /* Read markup from a string instead of a file */
        /* 'Data'=>'<html><head><title>My HTML data string</title></head>
                    <body><p>My HTML data string</p></body></html>', */

        /* Set to 1 if you want to filter HTML that only contains the body
            part of a page */
        'OnlyBody'=>0,

        /* Set to the path of the directory where cache files will be
            stored with parsed DTD information to avoid parsing overhead,
            otherwise it may become very slow. */
        'DTDCachePath'=>'',
    );

/*
 * The following lines are for testing purposes.
 * Remove these lines when adapting this example to real applications.
 */
    if(defined('__TEST'))
    {
        if(IsSet($__test_options['parameters']))
            $parameters = $__test_options['parameters'];
    }

    $start = microtime();
    if(($success = $filter->StartParsing($parameters)))
    {
        $output = '';
        do
        {
            /* Retrieve the next batch of filtered elements */
            if(!($success = $filter->Parse($end, $elements)))
                break;
            $te = count($elements);
            for($e = 0; $e < $te; ++$e)
            {
                /*
                var_dump($elements[$e]);
                */
                /* Convert each element back to markup */
                if(!($success = $filter->RewriteElement($elements[$e], $markup)))
                    break;
                $output.= $markup;
            }
        }
        while(!$end);
        if($success)
            $success = $filter->FinishParsing();
        if($success)
            echo $output;
    }
    $end = microtime();
    if(!$success)
    {
        echo 'Markup parsing error: '.$filter->error.' at position '.$filter->error_position;
        if($filter->track_lines
        && $filter->GetPositionLine($filter->error_position, $line, $column))
            echo ' line '.$line.' column '.$column;
        echo "\n";
    }
    /* Warnings are indexed by the document position where they occurred */
    for($warning = 0, Reset($filter->warnings); $warning < count($filter->warnings); Next($filter->warnings), $warning++)
    {
        $w = Key($filter->warnings);
        echo 'Warning: ', $filter->warnings[$w], ' at position ', $w;
        if($filter->track_lines
        && $filter->GetPositionLine($w, $line, $column))
            echo ' line '.$line.' column '.$column;
        echo "\n";
    }
    /* microtime() returns "microseconds seconds", so add both parts */
    if(!defined('__TEST'))
        echo 'Timer: ', doubleval(strtok($end,' ')) + doubleval(strtok('')) - doubleval(strtok($start,' ')) - doubleval(strtok('')), "\n";
?>
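The timer line at the end of the example relies on `microtime()` returning a string of the form "fraction seconds" when called without arguments, which is why it adds two `strtok` pieces per timestamp. The sketch below shows the same arithmetic with a hypothetical helper, next to the `microtime(true)` form available since PHP 5; it is offered as a possible simplification, not as part of the package.

```php
<?php
/**
 * Hypothetical helper: converts the "fraction seconds" string returned by
 * microtime() into a single float number of seconds.
 */
function microtimeToFloat(string $stamp): float
{
    list($fraction, $seconds) = explode(' ', $stamp);
    return (float) $seconds + (float) $fraction;
}

// Same approach as the example script, using the string form.
$start = microtime();
usleep(2000);
$elapsed = microtimeToFloat(microtime()) - microtimeToFloat($start);

// Equivalent measurement with the float form.
$startFloat = microtime(true);
usleep(2000);
$elapsedFloat = microtime(true) - $startFloat;

printf("Timer (string form): %.6f\nTimer (float form): %.6f\n", $elapsed, $elapsedFloat);
```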


Files
File | Role | Description
documentation/ | Folder | 6 files
test/ | Folder | 1 file, 3 directories
test_safe_html_filter.php | Example | Example script that demonstrates how to parse and filter an HTML document file
markup_filter_safe_html.php | Class | Secure HTML filter class
css_parser.php | Class | CSS stylesheet parser class
dtd_parser.php | Class | DTD parser class
markup_filter_get_html_links.php | Class | HTML parser class to extract links from pages
markup_filter_no_follow_html_links.php | Class | No follow HTML links filter class
markup_filter_validator.php | Class | Filter class that validates HTML against a DTD
markup_parser.php | Class | Main markup parser class
secure_html_filter.php | Example | Script with forms to test the secure HTML filter classes
test_css_parser.php | Example | CSS parser test script
test_get_html_links.php | Example | Example script that demonstrates how to extract links from HTML pages
test_markup_parser.php | Example | Example script that demonstrates how to parse any markup document into token elements
test_xss_attacks.php | Test | Script that tests the results of the safe HTML filter class against the XSS attack vectors from ha.ckers.org

Files / documentation
File | Role | Description
css_parser_class.html | Doc. | Documentation of the CSS parser class
dtd_parser_class.html | Doc. | Documentation of the DTD parser class
markup_filter_get_html_links_class.html | Doc. | Documentation of the filter get HTML links class
markup_filter_safe_html_class.html | Doc. | Documentation of the filter HTML safe class
markup_filter_validator_class.html | Doc. | Documentation of the filter validator class
markup_parser_class.html | Doc. | Documentation of the main markup parser class

Files / test
File | Role | Description
expect/ | Folder | 14 files
generated/ | Folder | 1 file
sample/ | Folder | 2 files
test.php | Test | Markup parser unit test suite

Files / test / expect
File | Role | Description
entities.txt | Data | Unit test expected results
entitiesinunsafeurl.txt | Data | Entities in unsafe URL test parsing output
quoteseparatingunsafeattribute.txt | Data | Quotes separating unsafe attribute test parsing output
safehtmlfilter.txt | Data | Test expected output
selectors.txt | Data | CSS selectors parsing output
simple.txt | Data | Unit test expected results
track_lines.txt | Data | Unit test expected results
unfinishedquotedtagattribute.txt | Data | Unit test expected results
unfinishedquotedtagattributevalue.txt | Data | Unit test expected results
unfinishedtag.txt | Data | Unit test expected results
unfinishedtagattribute.txt | Data | Unit test expected results
unfinishedtagattributevalue.txt | Data | Unit test expected results
unfinishedtagend.txt | Data | Unit test expected results
unicodestylevalues.txt | Data | Test expected output

Files / test / generated
File | Role | Description
.cvsignore | Data | Dummy file to force the distribution of this directory

Files / test / sample
File | Role | Description
simple.html | Data | HTML document used in the example scripts
xssAttacks.xml | Data | Definitions for the XSS attack vectors from ha.ckers.org

Download: secure-html-filter-2010-10-07.zip (122KB)
Download: secure-html-filter-2010-10-07.tar.gz (103KB)
Needed packages
Class | Why it is needed | Dependency
PHP Forms Class with HTML Generator and JavaScript Validation | Used in the secure_html_filter.php Web interface test script | Conditional
Generic XML parser class | Needed to parse the xssAttacks.xml file with the definitions of the tested XSS attack vectors | Conditional
File cache class | Necessary to manage parsed DTD cache files | Conditional