Last modified: 2013-12-18 16:37:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T59582, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 57582 - Extracts API: Extracts strips lang attributes from html by flattening the span elements



Status:	NEW

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	TextExtracts
Version:	master
Hardware:	All All

Importance:	Unprioritized normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-11-26 10:14 UTC by Derk-Jan Hartman
Modified:	2013-12-18 16:37 UTC
CC List:	9 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Derk-Jan Hartman 2013-11-26 10:14:48 UTC

I really love the extracts feature, but I noticed that currently all span elements are flattened out of the cleaned-up HTML.

But one of the most common uses of span tags is to mark different languages and script directions using the lang and dir attributes. Text in other languages is quite often present in the first line of an article on a non-English topic, so I think these elements are very important to preserve in our multilingual content.
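
To illustrate the request, here is a minimal Python sketch of span flattening that keeps a span when it carries a lang or dir attribute. It is not the TextExtracts implementation (which is PHP and not shown in this report); the names SpanFlattener, flatten_spans and KEEP_ATTRS are hypothetical and exist only for this example.

```python
from html import escape
from html.parser import HTMLParser

# Assumption for this sketch: these are the only attributes worth keeping.
KEEP_ATTRS = ("lang", "dir")


class SpanFlattener(HTMLParser):
    """Drop <span> wrappers unless they carry a lang or dir attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        # One entry per open <span>: True if its opening tag was emitted.
        self.span_stack = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            kept = [(k, v) for k, v in attrs if k in KEEP_ATTRS and v]
            self.span_stack.append(bool(kept))
            if kept:
                attr_text = "".join(f' {k}="{escape(v)}"' for k, v in kept)
                self.out.append(f"<span{attr_text}>")
            return
        # Every other element is passed through with its attributes.
        attr_text = "".join(
            f" {k}" if v is None else f' {k}="{escape(v)}"' for k, v in attrs
        )
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag == "span":
            # Emit the closing tag only if the matching opening tag was kept.
            if self.span_stack and self.span_stack.pop():
                self.out.append("</span>")
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def flatten_spans(html_text):
    parser = SpanFlattener()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


sample = '<p><span class="nowrap">Bengali</span> (<span lang="bn" dir="ltr">বাংলা</span>)</p>'
print(flatten_spans(sample))
```

With this approach the purely presentational span around "Bengali" disappears, while the span marking the Bengali-script text stays in place with its lang and dir attributes. Self-closing tags and comments are not handled; the sketch only covers the span case discussed here.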

Comment 1 Bingle 2013-11-26 10:15:25 UTC

Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/mobile/cards/1454

Comment 2 Max Semenik 2013-12-02 14:21:55 UTC

Can you provide an example of real-life breakages caused by this removal?

Comment 3 Derk-Jan Hartman 2013-12-02 14:35:44 UTC

Font selection for the Bengali language article extract probably fails for many people in this result: https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro&titles=Bengali_language&format=jsonfm

There is no indication another font needs to be used for this fragment, so only glyph fallback can save you. Voice software also won't know when to select a different voice.
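
One way to check an extract for this problem is sketched below in Python. It builds the same query as the URL above (with format=json instead of jsonfm, and exintro sent as an empty-valued flag) and lists the lang attribute values found in an HTML fragment. The helper names build_extract_url and lang_codes are hypothetical, and the regular expression is a rough check rather than a real HTML parser.

```python
import re
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, fmt="json"):
    """Build the extracts query used in this report for a given title."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": "",  # flag parameter, sent with an empty value
        "titles": title,
        "format": fmt,
    }
    return API_ENDPOINT + "?" + urlencode(params)


# Matches a lang="..." attribute inside an opening tag (rough heuristic).
LANG_ATTR = re.compile(r'<[a-zA-Z][^>]*\slang="([^"]*)"')


def lang_codes(extract_html):
    """Return the lang attribute values found in an HTML fragment."""
    return LANG_ATTR.findall(extract_html)


print(build_extract_url("Bengali_language"))
# Hand-written fragments standing in for an extract, not real API output:
print(lang_codes('<p>Bengali (<span lang="bn">বাংলা</span>)</p>'))
print(lang_codes("<p>Bengali (বাংলা)</p>"))
```

An empty list for an extract that visibly contains text in another script suggests that the language markup was stripped, which is the situation described in this comment.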

You could make a similar argument for the font-family CSS style attribute that ULS depends on, for instance for IPA. But since ULS can also use language attributes, I think those are a tad more important.

Comment 4 Bingle 2013-12-04 18:54:34 UTC

Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/mobile/cards/1479
