Last modified: 2014-03-06 17:54:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T63011, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 61011 - Failed to cope with mismatched superscript and subscript tags


Summary:	Failed to cope with mismatched superscript and subscript tags

Status:	UNCONFIRMED

Product:	Parsoid
Classification:	Unclassified
Component:	General (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Low minor
Target Milestone:	---
Assigned To:	Gabriel Wicke

URL:	http://parsoid.wmflabs.org/enwiki/Div...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-02-07 07:49 UTC by Richard Morris
Modified:	2014-03-06 17:54 UTC (History)
CC List:	6 users (show)

See Also:
Web browser:	Google Chrome
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
Python file to scan a database dump looking for mismatched <sup> and <sub> (884 bytes, text/x-python-script) 2014-02-25 17:15 UTC, Richard Morris	Details

Description Richard Morris 2014-02-07 07:49:00 UTC

Intention:
Just started VE on the article [[en:Divergent series#Zeta function regularization]]

Steps to Reproduce:
1. https://en.wikipedia.org/wiki/Divergent_series?veaction=edit
2. scroll down to the Zeta function regularization section near the end
3. observe the last two sentences. 


Actual Results:  
The final sentence has kept the superscript style. It appears as
...the trace of ''A''<sup>–''s''. For example...</sup>


Expected Results:  
The final sentence should appear as 
...the trace of ''A''<sup>–''s''</sup>. For example...

Reproducible: Always

If you edit the second sentence and then press Save page and preview the change, the preview shows that VE has put the closing </sup> in the wrong place, at the end of the second sentence.

Comment 1 Richard Morris 2014-02-07 07:53:29 UTC

Just spotted the problem. The source text had mismatched tags, a sup and a sub: ''A''<sup>–''s''</sub>

Comment 2 Richard Morris 2014-02-07 08:03:28 UTC

Maybe still a tiny bug. The normal renderer recovers nicely when given the text 
''A''<sup>–''s''</sub>: it seems to change the closing </sub> to a </sup>. VE (Parsoid?) does not recover quite so nicely, putting the closing </sup> at the end of the paragraph. 

I've fixed the source now so you will need to look at an old version
https://en.wikipedia.org/w/index.php?title=Divergent_series&oldid=594238903

Comment 3 James Forrester 2014-02-20 17:47:57 UTC

Shifting over to Parsoid – Gabriel, do you have a view as to whether Parsoid should assume </sub> means </sup> in context?

Comment 4 Gabriel Wicke 2014-02-24 04:03:02 UTC

We generally rely on the HTML5 treebuilder to fix issues like this. We could add code to handle sub / sup mismatches in a non-standard way, but this would add complexity in both that pass and the serializer (to avoid dirty diffs).

Is this issue common enough to warrant the cost of special-case handling?
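
To illustrate why the stray </sub> leaves the <sup> open to the end of the paragraph, here is a heavily simplified Python sketch of how an HTML5-style tree builder treats an end tag that matches no open element. It is not Parsoid code and not a real parser: the function name open_elements_after, the SPECIAL set (a small subset of the "special" elements in the HTML5 spec) and the assumption that the text sits inside a <p> are all made up for illustration.

```python
import re

# Small, illustrative subset of the elements HTML5 treats as "special".
SPECIAL = {"p", "div", "li", "td", "body"}


def open_elements_after(html):
    """Return (token, open-element stack) pairs for a simplified tree builder.

    Assumption: the fragment is parsed inside a paragraph, so the stack
    starts as ["p"]. Attributes and implied end tags are ignored.
    """
    stack = ["p"]
    trace = []
    for closing, name in re.findall(r"<(/?)(\w+)>", html):
        if not closing:
            stack.append(name)
        else:
            # Walk up the stack of open elements looking for a match.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i] == name:
                    del stack[i:]  # close the match and everything inside it
                    break
                if stack[i] in SPECIAL:
                    break  # no match before a special element: ignore the end tag
        trace.append((("/" if closing else "") + name, list(stack)))
    return trace
```

With the input "A<sup><i>s</i></sub>. For example" the last token is /sub and the stack after it is still ["p", "sup"], so in this model the superscript stays open until the paragraph itself is closed.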

Comment 5 Richard Morris 2014-02-24 08:52:46 UTC

Probably not. It's quite an easy mistake to make, and no doubt there are other instances, but it is really a very minor problem. A better option might be a periodic scan of the database to spot mismatched tags.

Comment 6 ssastry 2014-02-24 17:16:15 UTC

I think this could be one more test case / scenario for the wikitext linting (bug 46705), but a WONTFIX for primary Parsoid functionality?

Comment 7 Richard Morris 2014-02-25 17:15:57 UTC

Created attachment 14679 [details]
Python file to scan a database dump looking for mismatched <sup> and <sub>

Comment 8 Richard Morris 2014-02-25 17:37:23 UTC

I've run a scan on one part of the database dumps and it found 97 mainspace articles with some problems. There are 27 dump files so I guess it's about 2700 articles with some problems. Often these are just an extra </sup> at the end of a reference, which might come from a copy and paste, but there are a few errors in maths pages with unbalanced tags.

The attached Python script is quite simple: it just counts the number of opening and closing tags and prints the article name if they don't match. The -d option prints the lines where the counts don't match.
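
The attachment itself is not reproduced on this page. As a rough idea of the approach described above, a minimal Python sketch might look like the following. The names tag_counts, is_balanced, unbalanced_lines and report are hypothetical, as is the show_lines parameter standing in for the -d option; reading the dump files is left out, and the real script may differ.

```python
import re

TAGS = ("sup", "sub")


def tag_counts(text):
    """Return {tag: (opening_count, closing_count)} for <sup> and <sub>."""
    counts = {}
    for tag in TAGS:
        # Opening tags may carry attributes; self-closing forms are skipped.
        opened = len(re.findall(rf"<{tag}\b[^>/]*>", text, re.IGNORECASE))
        closed = len(re.findall(rf"</{tag}\s*>", text, re.IGNORECASE))
        counts[tag] = (opened, closed)
    return counts


def is_balanced(text):
    """True when opening and closing counts agree for every tag."""
    return all(opened == closed for opened, closed in tag_counts(text).values())


def unbalanced_lines(text):
    """Lines whose own tag counts do not match (the -d style detail)."""
    return [line for line in text.splitlines() if not is_balanced(line)]


def report(pages, show_lines=False):
    """Print titles of pages with mismatched counts.

    pages is any iterable of (title, wikitext) pairs.
    """
    for title, text in pages:
        if not is_balanced(text):
            print(title)
            if show_lines:
                for line in unbalanced_lines(text):
                    print("    " + line)
```

Counting alone flags ''A''<sup>–''s''</sub> because <sup> is opened once and never closed while </sub> is closed once and never opened. It cannot catch tags that are balanced in number but wrongly nested.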

Comment 9 Elitre 2014-03-06 15:12:07 UTC

(A related discussion is at https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Check_Wikipedia#Mismatched_sub_and_sup_tags .)

Comment 10 Richard Morris 2014-03-06 17:54:29 UTC

The Check Wikipedia project has been made aware of this problem and a scan found 7,096 articles with problems. https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Check_Wikipedia#Mismatched_sub_and_sup_tags

For the most part these have been fixed on en.wiki and tests are being built into AWB and other tools. 

The people at checkwiki did ask for more info about the HTML treebuilder, to see if there are similar problems which might be worth checking.

I would say a WONTFIX would be fine for this as there are now tools for bots to check this.
