Last modified: 2013-12-18 16:37:29 UTC
I really love the extracts feature, but I noticed that currently all span elements are flattened out of the cleaned-up HTML. One of the biggest uses of span tags is to mark different languages and script directions using the attributes dir and lang. These different languages are quite often present in the first line of an article on a non-English topic. I think those are thus very important elements to preserve in our multilingual content.
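To illustrate the behaviour being requested, here is a minimal sketch in Python. It is not the extension's actual cleanup code. The class SpanFlattener and the function flatten_spans are hypothetical names made up for this example, and the sketch assumes that only the lang and dir attributes are worth keeping on a span.

```python
from html import escape
from html.parser import HTMLParser

# Assumption for this sketch: only these attributes justify keeping a span.
KEEP_ATTRS = ("lang", "dir")


def _format_attrs(attrs):
    """Serialize (name, value) pairs back into attribute text."""
    return "".join(
        f" {name}" if value is None else f' {name}="{escape(value)}"'
        for name, value in attrs
    )


class SpanFlattener(HTMLParser):
    """Hypothetical cleaner: unwraps <span> unless it carries lang or dir."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        # One entry per open span: True if its opening tag was kept.
        self.span_stack = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            kept = [(k, v) for k, v in attrs if k in KEEP_ATTRS and v]
            self.span_stack.append(bool(kept))
            if kept:
                # Keep the span, but only with its lang/dir attributes.
                self.out.append(f"<span{_format_attrs(kept)}>")
            return
        self.out.append(f"<{tag}{_format_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> are passed through unchanged.
        self.out.append(f"<{tag}{_format_attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag == "span":
            # Emit the closing tag only if the opening tag was kept.
            if self.span_stack and self.span_stack.pop():
                self.out.append("</span>")
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def flatten_spans(html_text):
    """Return html_text with spans unwrapped, except lang/dir spans."""
    parser = SpanFlattener()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = (
        '<p><b>Bengali</b> (<span lang="bn" style="x">বাংলা</span>) is '
        '<span class="nowrap">a language</span></p>'
    )
    print(flatten_spans(sample))
```

With this approach a purely presentational span is still unwrapped, while a span that marks a language or script direction survives with just those two attributes, so fonts and voice software still get the hint.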
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/mobile/cards/1454
Can you provide an example of real-life breakages caused by this removal?
Font selection for the Bengali language article extract probably fails for many people in this result: https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro&titles=Bengali_language&format=jsonfm There is no indication that another font needs to be used for this fragment, so only glyph fallback can save you. Voice software also won't know when to select a different voice. You could make a similar argument for the font-family CSS style attribute that ULS depends on, for instance for IPA. But since ULS can also use language attributes, I think those are a tad more important.
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/mobile/cards/1479