From the source:

/* This is a full-featured entity fixer.
 * I have to process far too many defective XML files out there ..
 *
 * This does not fix problems in the XML file structure,
 * however it fixes any problems you might observe with unknown entities.
 * This knows all common HTML entities and transforms them into XML entities.
 * Additionally it knows about double escapes.
 * It also fixes defective escapes (or unescaped & signs).
 *
 * This assumes there are no entities defined in the XML file.
 * Additionally see latin1-utf8.c to fix lazy character encodings.
 *
 * This should be built into xml2sql, as we have access to the known entities there.
 */

Usage: entityfix

Notes:

In case you wonder, the quote entity is output in escaped form, which is correct XML for the quote (") character. This helps in case it happens to show up in attributes, as in: Dr. Evil which is not XML but is often mistakenly carried over from HTML. This is correctly fixed to Dr. Evil as Dr. Evil would be complete rubbish. ;)
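The three fixes named in the source comment (HTML entities, double escapes, unescaped & signs) can be illustrated with a small C sketch. This is not the entityfix source, only a guess at the general approach. The function name fix_entities, the four-entry entity table, and the exact rules are assumptions made for illustration: the five entities that XML predefines pass through unchanged (the real tool may emit them differently), "&amp;" followed by a valid reference body is treated as a double escape, unknown names get their & escaped, and numeric references of more than ten characters are simply treated as defective.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sample table; the real tool knows all common HTML entities. */
struct entity { const char *name; unsigned code; };
static const struct entity html_entities[] = {
    { "nbsp", 160 }, { "copy", 169 }, { "eacute", 233 }, { "uuml", 252 },
};

/* The five entities XML predefines; this sketch leaves them untouched. */
static const char *const xml_entities[] = { "amp", "lt", "gt", "quot", "apos" };

/* Length of an alphanumeric name at s if it is followed by ';', else 0. */
static size_t name_len(const char *s)
{
    size_t n = 0;
    while (isalnum((unsigned char)s[n])) n++;
    return (n > 0 && s[n] == ';') ? n : 0;
}

/* Length of "#233" or "#xE9" at s if it is followed by ';', else 0. */
static size_t numeric_len(const char *s)
{
    size_t n = 1;
    if (s[0] != '#') return 0;
    if (s[1] == 'x' || s[1] == 'X') {
        n = 2;
        while (isxdigit((unsigned char)s[n])) n++;
        return (n > 2 && s[n] == ';') ? n : 0;
    }
    while (isdigit((unsigned char)s[n])) n++;
    return (n > 1 && s[n] == ';') ? n : 0;
}

/* s points just past an '&'. On a recognised reference, write its XML form
 * to rep and return the characters consumed (including ';'), else 0. */
static size_t translate(const char *s, char *rep, size_t repsz)
{
    size_t n, i;
    n = numeric_len(s);
    if (n > 0 && n <= 10) {                 /* numeric: copy as-is */
        snprintf(rep, repsz, "&%.*s;", (int)n, s);
        return n + 1;
    }
    if ((n = name_len(s)) == 0) return 0;
    for (i = 0; i < sizeof xml_entities / sizeof *xml_entities; i++)
        if (strlen(xml_entities[i]) == n && strncmp(s, xml_entities[i], n) == 0) {
            snprintf(rep, repsz, "&%s;", xml_entities[i]);
            return n + 1;
        }
    for (i = 0; i < sizeof html_entities / sizeof *html_entities; i++)
        if (strlen(html_entities[i].name) == n &&
            strncmp(s, html_entities[i].name, n) == 0) {
            snprintf(rep, repsz, "&#%u;", html_entities[i].code);
            return n + 1;
        }
    return 0;                               /* unknown name */
}

/* Write the fixed copy of in to out. Returns 0, or -1 if out is too small. */
int fix_entities(const char *in, char *out, size_t outsz)
{
    size_t o = 0;
    if (outsz == 0) return -1;
    while (*in) {
        char rep[32];
        size_t used, len;
        if (*in != '&') {                   /* ordinary character */
            rep[0] = *in; rep[1] = '\0'; used = 1;
        } else if (strncmp(in, "&amp;", 5) == 0 &&
                   (used = translate(in + 5, rep, sizeof rep)) > 0) {
            used += 5;                      /* double escape, e.g. &amp;nbsp; */
        } else if ((used = translate(in + 1, rep, sizeof rep)) > 0) {
            used += 1;                      /* well-formed reference */
        } else {
            strcpy(rep, "&amp;"); used = 1; /* bare or defective '&' */
        }
        len = strlen(rep);
        if (o + len + 1 > outsz) return -1;
        memcpy(out + o, rep, len);
        o += len;
        in += used;
    }
    out[o] = '\0';
    return 0;
}
```

Under these assumptions, calling fix_entities on "Tom & Jerry" yields "Tom &amp; Jerry", and "caf&eacute;" becomes "caf&#233;". Note that undoing double escapes is a heuristic: text that really meant a literal "&lt;" (written as "&amp;lt;") would be changed, which is presumably acceptable only for input known to be defective.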