Last modified: 2014-07-03 21:15:54 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from pages displaying bug reports and their history, links might be broken. See T69237, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 67237 - Citation ordering seems non-deterministic
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: ssastry
Reported: 2014-06-28 05:34 UTC by ssastry
Modified: 2014-07-03 21:15 UTC

Description ssastry 2014-06-28 05:34:58 UTC
It looks like the order in which citations are numbered is non-deterministic and does not follow the order in which they appear in the page wikitext. So, the first citation on the page might be numbered [39] instead of [1]. This is the case today at http://parsoid.wmflabs.org/enwiki/Barack_Obama?oldid=614710126. It appears that citations are numbered in the order in which Parsoid's Cite implementation receives them, which in turn depends on how subpipelines fire, and is therefore non-deterministic.

Our citation implementation needs fixing to match wikitext order, perhaps by sorting on the top-level dsr (DOM source range) for top-level citations and on the template's dsr for citations that are generated by templates.
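
To illustrate the sorting idea, here is a minimal sketch in Python rather than Parsoid's own code. The `Citation` class and its fields (`dsr_start`, `template_dsr_start`, `index_in_template`) are hypothetical names invented for this example. It assumes each citation carries either its own source offset or the offset of the template that generated it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    name: str
    dsr_start: int                            # source offset of the citation (hypothetical field)
    template_dsr_start: Optional[int] = None  # source offset of the enclosing template, if any
    index_in_template: int = 0                # position within that template's output


def source_order_key(c: Citation) -> tuple:
    # Assumption: template-generated citations have no meaningful offset of
    # their own, so use the offset of the template that produced them.
    if c.template_dsr_start is not None:
        return (c.template_dsr_start, c.index_in_template)
    return (c.dsr_start, 0)


def number_citations(received: list) -> dict:
    # Sort by source position, then number from 1, regardless of arrival order.
    ordered = sorted(received, key=source_order_key)
    return {c.name: n for n, c in enumerate(ordered, start=1)}


# Arrival order is scrambled, as it might be when subpipelines fire in any order.
received = [
    Citation("c", dsr_start=900),
    Citation("b2", dsr_start=0, template_dsr_start=400, index_in_template=1),
    Citation("a", dsr_start=120),
    Citation("b1", dsr_start=0, template_dsr_start=400, index_in_template=0),
]
numbers = number_citations(received)
```

With this key, the numbering depends only on source positions, so any permutation of `received` yields the same result.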

This should also fix the irritating rt-test variations we occasionally see (and which had baffled us till now) where semantic diffs are triggered because of cite numbering changes.
Comment 1 Gabriel Wicke 2014-06-30 16:40:13 UTC
Moving the numbering to the DOM could help as well.
Comment 2 ssastry 2014-06-30 21:19:05 UTC
Everything is being done on the DOM right now, but there is no distinction between the DOM of a part of the page processed in a new pipeline and the top-level content DOM. The final generateRefs DOM pass has to be restructured to do different things on the top-level doc and on pieces of the doc being processed in other pipelines.

I am going to restructure this now that I have already added this distinction (top-level vs. not) as part of a recent token-stream-patcher commit. So far, it looks like Cite and Linter might also do different things on the DOM depending on whether it is the top-level DOM or not.
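
One way such a split could look, as a rough sketch only: the pass skips numbering on fragments from nested pipelines and numbers citations in document order once it runs on the top-level DOM. This is Python with `xml.etree.ElementTree`, not Parsoid code; the `at_top_level` flag, the `<ref>` element, and the `number` attribute are assumptions made up for the example, and the actual division of work in generateRefs may differ.

```python
import xml.etree.ElementTree as ET


def generate_refs(root: ET.Element, at_top_level: bool) -> int:
    """Number <ref> elements in document order, but only on the top-level DOM."""
    if not at_top_level:
        # Fragment from a nested pipeline: its final position in the page is
        # not known yet, so leave its citations unnumbered.
        return 0
    count = 0
    for el in root.iter("ref"):  # iter() walks the tree in document order
        count += 1
        el.set("number", str(count))
    return count


# A fragment built in a nested pipeline, then spliced into the top-level document.
fragment = ET.fromstring("<span><ref/></span>")
generate_refs(fragment, at_top_level=False)

doc = ET.fromstring("<body><p><ref/></p><p><ref/></p></body>")
doc.insert(1, fragment)  # the fragment ends up between the two paragraphs
generate_refs(doc, at_top_level=True)
numbers = [el.get("number") for el in doc.iter("ref")]
```

Because numbers are assigned in a single walk over the final document, the result does not depend on when the fragment was processed.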
Comment 3 Gerrit Notification Bot 2014-07-01 18:06:23 UTC
Change 143362 had a related patch set uploaded by Subramanya Sastry:
(Bug 67237): WIP HACK: Fix citation numbering issue

https://gerrit.wikimedia.org/r/143362
Comment 4 Gerrit Notification Bot 2014-07-02 09:18:23 UTC
Change 143362 merged by jenkins-bot:
(Bug 67237): Fix citation numbering issue

https://gerrit.wikimedia.org/r/143362