Class HTMLPurifier_Lexer_DOMLex
Parser that uses PHP 5's DOM extension (part of the core).
In PHP 5, the DOM XML extension was revamped into DOM and added to the core. It gives us a forgiving HTML parser, which we use to transform the HTML into a DOM and then into tokens. It is blazingly fast (for large documents, it performs twenty times faster than HTMLPurifier_Lexer_DirectLex) and is the default choice for PHP 5.
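To make the "forgiving parser" idea concrete, here is a minimal sketch using only PHP's built-in DOM extension. It is not HTML Purifier's own code; the sample markup and the variable names are assumptions chosen for illustration.

```php
<?php
// Sketch only: demonstrates that DOMDocument::loadHTML() accepts
// malformed markup and still yields a usable tree.
$html = '<div><p>Unclosed paragraph<b>bold</div>';

$doc = new DOMDocument();

// Collect libxml warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML('<html><body>' . $html . '</body></html>');
libxml_clear_errors();
libxml_use_internal_errors($previous);

// List the element names the parser recovered inside <body>.
$body  = $doc->getElementsByTagName('body')->item(0);
$names = [];
foreach ($body->getElementsByTagName('*') as $element) {
    $names[] = $element->nodeName;
}
```

The resulting tree is what a lexer of this kind would then walk to produce tokens.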
- HTMLPurifier_Lexer
- HTMLPurifier_Lexer_DOMLex
Direct known subclasses
HTMLPurifier_Lexer_PH5P
Note: PHP's DOM extension does not actually parse any entities; we use our own function to do that.
Warning: DOM tends to drop whitespace, which may wreak havoc on indenting. If this is a serious problem because the HTML is hand-edited and you cannot use a parser cache that stores the output of HTML Purifier while keeping the original HTML around, you may want to run Tidy on the resulting output or use HTMLPurifier_Lexer_DirectLex.
Located at xoops_trust_path/libs/htmlpurifier/library/HTMLPurifier/Lexer/DOMLex.php
protected tokenizeDOM( $node, &$tokens )
Iterative function that tokenizes a node, putting it into an accumulator. "To iterate is human, to recurse divine" (L. Peter Deutsch).
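The iterative, accumulator-based approach can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `sketchTokenizeDom()` and the array-shaped tokens are hypothetical, whereas the real method emits HTML Purifier token objects.

```php
<?php
/**
 * Hypothetical helper: walks a DOM subtree with an explicit stack
 * (no recursion) and appends simple array tokens to $tokens.
 */
function sketchTokenizeDom(DOMNode $root, array &$tokens): void
{
    // Each frame is [node, isClosing]; closing frames emit the end tag.
    $stack = [[$root, false]];

    while ($stack) {
        [$node, $isClosing] = array_pop($stack);

        if ($isClosing) {
            $tokens[] = ['end', $node->nodeName];
            continue;
        }
        if ($node->nodeType === XML_TEXT_NODE) {
            $tokens[] = ['text', $node->nodeValue];
            continue;
        }
        if ($node->nodeType === XML_COMMENT_NODE) {
            $tokens[] = ['comment', $node->nodeValue];
            continue;
        }
        if ($node->nodeType !== XML_ELEMENT_NODE) {
            continue; // ignore other node types in this sketch
        }

        $tokens[] = ['start', $node->nodeName];
        $stack[]  = [$node, true];

        // Push children in reverse so they are popped in document order.
        for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
            $stack[] = [$node->childNodes->item($i), false];
        }
    }
}

$doc = new DOMDocument();
$doc->loadXML('<div><b>hi</b> there</div>');

$tokens = [];
sketchTokenizeDom($doc->documentElement, $tokens);
```

Passing `$tokens` by reference is what makes it an accumulator: every visited node appends to the same list.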
protected transformAttrToAssoc( $node_map )
Converts a DOMNamedNodeMap of DOMAttr objects into an associative array.
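A conversion of this kind might look like the sketch below. The function name `sketchAttrsToAssoc()` and the sample element are assumptions for illustration; only the DOM classes and their `name` and `value` properties come from PHP itself.

```php
<?php
/**
 * Hypothetical helper: flattens a DOMNamedNodeMap of DOMAttr objects
 * into a name => value array.
 */
function sketchAttrsToAssoc(?DOMNamedNodeMap $nodeMap): array
{
    $result = [];
    if ($nodeMap === null) {
        return $result; // nodes without attributes yield an empty array
    }
    foreach ($nodeMap as $attr) {
        $result[$attr->name] = $attr->value;
    }
    return $result;
}

$doc = new DOMDocument();
$doc->loadXML('<a href="http://example.com/" title="Example">link</a>');

$attrs = sketchAttrsToAssoc($doc->documentElement->attributes);
```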
public callbackUndoCommentSubst( $matches )
Callback function for undoing the escaping of stray angle brackets in comments.
public callbackArmorCommentEntities( $matches )
Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them.
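The interplay of the two comment callbacks can be sketched as a three-step pass: armor ampersands in comments, escape stray angle brackets everywhere, then undo both substitutions inside comments. The regular expressions, closures, and sample input below are assumptions made for illustration and may differ from what the library actually uses.

```php
<?php
// Hypothetical pattern: an HTML comment, closed or running to end of input.
$comment = '/<!--(.*?)(-->|\z)/s';

// Step 1 callback: entity-ize ampersands inside a comment.
$armor = function (array $m): string {
    return '<!--' . str_replace('&', '&amp;', $m[1]) . $m[2];
};

// Step 3 callback: undo the armoring and the bracket escaping in a comment.
$undo = function (array $m): string {
    return '<!--' . strtr($m[1], ['&amp;' => '&', '&lt;' => '<']) . $m[2];
};

$html = 'a < b <!-- x < y &amp; z --> <i>ok</i>';

// Step 1: protect existing entities in comments.
$html = preg_replace_callback($comment, $armor, $html);

// Step 2: escape any "<" that cannot start a tag, comment, or end tag.
$html = preg_replace('/<([^a-zA-Z!\/])/', '&lt;$1', $html);

// Step 3: restore comment contents to their original form.
$html = preg_replace_callback($comment, $undo, $html);
```

Without step 1, the `&amp;` already present in the comment would be turned into a bare `&` by step 3, which is the clobbering the armoring callback is meant to prevent.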
Methods inherited from HTMLPurifier_Lexer
CDATACallback(), create(), escapeCDATA(), escapeCommentedCDATA(), extractBody(), normalize(), parseData(), removeIEConditional()
Properties inherited from HTMLPurifier_Lexer
$_special_entity2str, $tracksLineNumbers