Last modified: 2013-08-16 18:30:54 UTC
Currently we always serialize to wikitext and re-parse that to HTML. The re-parse runs the sanitizer on the token stream, which ensures that our final HTML does not contain unsafe markup. Soon both we and the Flow team will want to store HTML from the VisualEditor directly, without first serializing to wikitext. This means that we need to perform the sanitization on the HTML itself instead of on the token stream. For performance, sanitizing on the way in (when the HTML is stored) would be preferable to sanitizing on the way out. We should, however, support re-sanitization when new issues are discovered. This could potentially be coupled with the versioning discussed in bug 52937: a new sanitizer could bump the version number, and the upgrade path would then run the new sanitizer on old HTML (and probably update the storage with the newly sanitized version).
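
The versioned re-sanitization idea could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not a proposed implementation: `HtmlStore`, `StoredHtml`, `upgrade_sanitizer` and the in-memory dictionary are hypothetical names, the sanitizer is passed in as a plain function, and the actual versioning scheme from bug 52937 may differ. HTML is sanitized once when saved, and re-sanitized lazily on load if it was stored under an older sanitizer version.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A sanitizer is modelled as any function from HTML string to HTML string.
Sanitizer = Callable[[str], str]


@dataclass
class StoredHtml:
    """Hypothetical storage record: HTML plus the sanitizer version that produced it."""
    html: str
    sanitizer_version: int


class HtmlStore:
    """Hypothetical in-memory store that sanitizes on the way in."""

    def __init__(self, sanitizer: Sanitizer, version: int) -> None:
        self._sanitizer = sanitizer
        self._version = version
        self._rows: Dict[str, StoredHtml] = {}

    def upgrade_sanitizer(self, sanitizer: Sanitizer, version: int) -> None:
        """Install a new sanitizer; the version number must increase."""
        if version <= self._version:
            raise ValueError("new sanitizer must bump the version number")
        self._sanitizer = sanitizer
        self._version = version

    def save(self, key: str, html: str) -> None:
        """Sanitize once at write time and record the sanitizer version."""
        self._rows[key] = StoredHtml(self._sanitizer(html), self._version)

    def load(self, key: str) -> str:
        """Return stored HTML, re-sanitizing it first if it is out of date."""
        row = self._rows[key]
        if row.sanitizer_version < self._version:
            # Upgrade path: run the new sanitizer on the old HTML and
            # write the result back so the work is done only once.
            row = StoredHtml(self._sanitizer(row.html), self._version)
            self._rows[key] = row
        return row.html

    def stored_version(self, key: str) -> int:
        """Expose the recorded sanitizer version (useful for inspection)."""
        return self._rows[key].sanitizer_version
```

With this shape, reads of up-to-date content pay no sanitization cost, and old content is upgraded the first time it is read after a sanitizer bump. A batch job that walks the storage and calls the same upgrade path would be an alternative to upgrading lazily.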