Last modified: 2013-12-18 16:37:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T59582, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 57582 - Extracts API: Extracts strips lang attributes from html by flattening the span elements
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: TextExtracts
Version: master
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-11-26 10:14 UTC by Derk-Jan Hartman
Modified: 2013-12-18 16:37 UTC (History)
CC List: 9 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Derk-Jan Hartman 2013-11-26 10:14:48 UTC
I really love the extracts feature, but I noticed that all span elements are currently flattened out of the cleaned-up HTML.

But one of the biggest uses of span tags is to mark different languages and script directions using the lang and dir attributes. These different languages are quite often present in the first line of an article on a non-English topic. I think those are thus very important elements to preserve in our multilingual content.
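
To make the request concrete, here is a rough sketch of the difference between flattening every span and keeping only the spans that carry lang or dir. It is not the TextExtracts code (that is a PHP extension); the class SpanFlattener, the function flatten and the flag keep_language_spans are hypothetical names made up for this illustration, and the input is assumed to be well-formed HTML.

```python
from html import escape
from html.parser import HTMLParser

# Attributes the report asks to preserve (assumption: nothing else is kept).
KEEP_ATTRS = ("lang", "dir")


class SpanFlattener(HTMLParser):
    """Hypothetical cleaner: unwraps <span>, optionally keeping lang/dir spans."""

    def __init__(self, keep_language_spans):
        super().__init__(convert_charrefs=True)
        self.keep_language_spans = keep_language_spans
        self.out = []
        self.span_stack = []  # one bool per open span: was its tag emitted?

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            self.out.append(self.get_starttag_text())  # pass other tags through
            return
        kept = [(k, v) for k, v in attrs if k in KEEP_ATTRS and v]
        keep = self.keep_language_spans and bool(kept)
        self.span_stack.append(keep)
        if keep:
            rendered = "".join(f' {k}="{escape(v)}"' for k, v in kept)
            self.out.append(f"<span{rendered}>")

    def handle_startendtag(self, tag, attrs):
        if tag != "span":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "span":
            self.out.append(f"</{tag}>")
        elif self.span_stack and self.span_stack.pop():
            self.out.append("</span>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def flatten(html, keep_language_spans):
    parser = SpanFlattener(keep_language_spans)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = (
    '<p><b>Bengali</b> (<span lang="bn" class="x">বাংলা</span>) is '
    '<span class="nowrap">an Indo-Aryan</span> language.</p>'
)
print(flatten(sample, keep_language_spans=False))  # current behaviour as reported
print(flatten(sample, keep_language_spans=True))   # behaviour requested here
```

With the flag off, the Bengali text comes out with no language marker at all; with it on, the span survives with only its lang attribute, while the purely presentational span is still unwrapped.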
Comment 1 Bingle 2013-11-26 10:15:25 UTC
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/mobile/cards/1454
Comment 2 Max Semenik 2013-12-02 14:21:55 UTC
Can you provide an example of real-life breakages caused by this removal?
Comment 3 Derk-Jan Hartman 2013-12-02 14:35:44 UTC
Font selection for the Bengali language article extract probably fails for many people in this result: https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro&titles=Bengali_language&format=jsonfm

There is no indication another font needs to be used for this fragment, so only glyph fallback can save you. Voice software also won't know when to select a different voice.

You could make a similar argument for the font-family CSS style attribute that ULS depends on, for IPA for instance. But since ULS can also use language attributes, I think those are a tad more important.
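
For anyone who wants to check other articles for the same problem, the query above can be built and inspected along these lines. This is only a sketch: the helper names extract_query_url and titles_without_lang_markup are hypothetical, format=json is used instead of jsonfm so the result is machine-readable, and the response layout (query, then pages keyed by page id, each with an extract string) is assumed to be the default JSON shape. No request is sent here; fetching the URL is left to the reader.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def extract_query_url(title, fmt="json"):
    """Build the same extracts query as in the comment above for a given title."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": "",  # flag parameter: presence alone enables it
        "titles": title,
        "format": fmt,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def titles_without_lang_markup(response):
    """Return titles whose extract HTML contains no lang attribute at all.

    Assumes the default JSON layout: response["query"]["pages"][page_id]["extract"].
    A missing attribute is only a hint; some articles contain no foreign text.
    """
    pages = response.get("query", {}).get("pages", {})
    return [
        page.get("title", page_id)
        for page_id, page in pages.items()
        if 'lang="' not in page.get("extract", "")
    ]


print(extract_query_url("Bengali_language"))
```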
Comment 4 Bingle 2013-12-04 18:54:34 UTC
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/mobile/cards/1479
