From the source:
/* This is a full-featured entity fixer.
* I have to process far too many defective XML files out there ..
*
* This does not fix problems in the XML file structure,
* however it fixes any problems you might observe with unknown entities.
* This knows all common HTML entities and transforms them into XML entities.
* Additionally it knows about double escapes.
* It also fixes defective escapes (or unescaped & signs).
*
* This assumes there are no entities defined in the XML file.
* Additionally see latin1-utf8.c to fix lazy character encodings.
*
* This should be built into xml2sql, as we have access to the known entities there.
*/
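The behaviour described in the comment can be sketched roughly as follows. This is not the entityfix source, only a minimal illustration under stated assumptions: the names `known`, `lookup`, `parse_ref` and `entityfix` are hypothetical, the entity table is a tiny sample rather than the full HTML list, decimal character references are assumed as the output form, and the double-escape rule (an `&amp;` directly followed by a valid reference body) is a guess at what "knows about double escapes" means.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sample table; a real fixer would list every HTML entity. */
struct entity { const char *name; unsigned long code; };
static const struct entity known[] = {
    { "amp", 38 }, { "lt", 60 }, { "gt", 62 }, { "quot", 34 },
    { "apos", 39 }, { "nbsp", 160 }, { "copy", 169 }, { "auml", 228 },
};

/* Code point for a named entity, or 0 if the name is not in the table. */
static unsigned long lookup(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof known / sizeof *known; i++)
        if (strlen(known[i].name) == len && memcmp(known[i].name, name, len) == 0)
            return known[i].code;
    return 0;
}

/* Parse a reference body starting just after '&'. Returns its length
 * including the ';' and stores the code point, or returns 0 if invalid. */
static size_t parse_ref(const char *s, unsigned long *code)
{
    size_t i = 0;
    if (s[0] == '#') {
        int hex = (s[1] == 'x' || s[1] == 'X');
        size_t start = hex ? 2 : 1;
        unsigned long v = 0;
        for (i = start; hex ? isxdigit((unsigned char)s[i])
                            : isdigit((unsigned char)s[i]); i++) {
            int c = (unsigned char)s[i];
            v = v * (hex ? 16 : 10) + (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
            if (v > 0x10FFFF)
                return 0;               /* beyond the Unicode range */
        }
        if (i == start || s[i] != ';' || v == 0)
            return 0;
        *code = v;
        return i + 1;
    }
    while (isalnum((unsigned char)s[i]))
        i++;
    if (i == 0 || s[i] != ';')
        return 0;
    *code = lookup(s, i);
    return *code ? i + 1 : 0;
}

/* Rewrite every '&' in `in` as a decimal character reference.
 * Returns 0 on success, -1 if `out` is too small. */
int entityfix(const char *in, char *out, size_t outsize)
{
    size_t o = 0;
    assert(in && out && outsize > 0);
    while (*in) {
        if (*in != '&') {
            if (o + 1 >= outsize)
                return -1;
            out[o++] = *in++;
            continue;
        }
        unsigned long code, inner;
        size_t len = parse_ref(in + 1, &code);
        /* Double escape: "&amp;" directly followed by a valid reference body. */
        while (len && code == 38) {
            size_t len2 = parse_ref(in + 1 + len, &inner);
            if (!len2)
                break;
            in += len;                  /* skip the redundant "amp;" */
            len = len2;
            code = inner;
        }
        if (!len)
            code = 38;                  /* stray or unknown: escape the '&' itself */
        int n = snprintf(out + o, outsize - o, "&#%lu;", code);
        if (n < 0 || (size_t)n >= outsize - o)
            return -1;
        o += (size_t)n;
        in += 1 + len;
    }
    out[o] = '\0';
    return 0;
}
```

With this sketch, the input `Tom & Jerry &nbsp;&amp;quot;x&amp;quot; &bogus;` becomes `Tom &#38; Jerry &#160;&#34;x&#34; &#38;bogus;`. Note that collapsing double escapes is a heuristic: text that deliberately contains `&amp;lt;` would be altered as well.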
Usage:
entityfix
Notes:
In case you wonder, &quot; is output as a numeric character reference, which is
correct XML for the quote (") character. This helps in case
it happens to show up in the attributes as in:
Dr. Evil
which is not XML but is often carried over from HTML by mistake. This is
correctly fixed to
Dr. Evil
as
Dr. Evil
would be complete rubbish. ;)
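The reasoning in this note can be checked with a small sketch. Everything in it is an assumption made for illustration: the attribute `title="Dr. &quot;Evil&quot;"` is a hypothetical stand-in for the example markup, `&#34;` is assumed as the replacement form, and `replace_quot` and `count_quotes` are made-up helper names, not part of entityfix.

```c
#include <assert.h>
#include <string.h>

/* Copy `in` to `out`, replacing each "&quot;" with `with`.
 * Returns 0 on success, -1 if `out` is too small. */
int replace_quot(const char *in, const char *with, char *out, size_t outsize)
{
    size_t o = 0, wlen;
    assert(in && with && out && outsize > 0);
    wlen = strlen(with);
    while (*in) {
        if (strncmp(in, "&quot;", 6) == 0) {
            if (o + wlen >= outsize)
                return -1;
            memcpy(out + o, with, wlen);
            o += wlen;
            in += 6;
        } else {
            if (o + 1 >= outsize)
                return -1;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return 0;
}

/* Count literal double quotes; one attribute written as name="value"
 * must contain exactly two of them to stay well-formed. */
size_t count_quotes(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (*s == '"')
            n++;
    return n;
}
```

Replacing with `&#34;` leaves the hypothetical attribute with exactly two literal quotes, so it still parses. Replacing with a literal `"` yields four, which ends the attribute value early and is the "complete rubbish" case.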