Last modified: 2014-11-17 09:21:24 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T47861, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 45861 - Lua modules are ignored during PDF export
Lua modules are ignored during PDF export
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Collection (Other open bugs)
master
All All
: Highest major with 1 vote (vote)
: ---
Assigned To: Ralf Schmitt
: upstream
Depends on:
Blocks: 48052
  Show dependency treegraph
 
Reported: 2013-03-07 20:16 UTC by Robert Rohde
Modified: 2014-11-17 09:21 UTC (History)
14 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Robert Rohde 2013-03-07 20:16:36 UTC
It appears that Lua modules are completely ignored when using Download as PDF.

Whether that requires a change to Scribunto or a change to Special:Book, I don't know.  At present the conversion of templates to Lua is causing the corresponding information to be lost in the PDF exports.
Comment 1 Brad Jorsch 2013-03-07 20:24:19 UTC
The PDF exporter uses its own parser rather than MediaWiki's, and so probably needs to be updated somehow. See for example bug 39095 and bug 41748, which seem to be similar sorts of issues.
Comment 2 Tim Starling 2013-03-08 03:31:27 UTC
I contacted Volker Haas and Ralf Schmitt last August, explaining that Lua support would be required in mwlib, and giving an outline of the approach they might take to implement it. They didn't seem to be in any hurry.

The deadline has passed. Can we just disable the Collection extension for now?
Comment 3 MZMcBride 2013-03-08 03:34:10 UTC
(In reply to comment #2)
> Can we just disable the Collection extension for now?

This seems reasonable. Or at least add a warning to users so that their PDFs are not silently missing pieces.

(In reply to comment #1)
> The PDF exporter uses its own parser rather than MediaWiki's [...]

Yeah, this is pretty wtf to me. Unless there's a really, really good reason, a separate parser shouldn't be used here.
Comment 4 Volker Haas 2013-03-08 08:37:47 UTC
Can someone please post a link to a minimal example where lua modules fail. Unfortunately the two bugs mentioned above are not related to mwlibs inability to handle lua.

Thanks
Comment 6 Ralf Schmitt 2013-03-08 17:48:16 UTC
we are working on a solution for this
Comment 7 Robert Rohde 2013-03-10 05:37:24 UTC
Would it be possible to get a rough timeline for how long you believe it may take for this bug to be solved and a fix deployed?  Days?  Weeks?  More than a month?

Enwiki is essentially ready to start deploying Lua-based citations; however, if we migrate the citation templates while this bug is still open, then all of the references are going to start disappearing from the PDF exports.

For coordinating purposes it would be nice to have some idea of whether the PDF engine is likely to be fixed quickly or not.
Comment 8 Marcin Cieślak 2013-03-10 12:28:46 UTC
This question should be asked when deployment of Scribunto was pondered, not now. In my opinion it is not easy thing and nothing that can be done withing days.

I don't think that disabling Collection "for now" in exchange for new cool feature is a good thing (recently stumbled into really good PDF stuff made by community on different projects).
Comment 9 Matthew Flaschen 2013-03-10 22:51:39 UTC
Marcin, Tim said above that the PDF team was notified in August.  Lua is not a surprise, and there are very important reasons for it.
Comment 10 Ralf Schmitt 2013-03-11 06:01:07 UTC
To be fair, I asked Tim for a schedule for deployment. What I got back
is his comment above asking to disable the collection extension
because the deadline has passed.

We currently do plan to replace our own template expansion with a call
to api.php with action=expandtemplates. As MZMcBride noticed this is
still pretty wtf, but it would give him a chance to bitch about
mediawiki's parser itself, as that one intermingles template expansion
with parsing, i.e. for certain pages the expanded page does parse
differently from what you get if you expand and parse in one step.

I'm not sure at the moment if we're allowed to use
action=expandtemplates, or if that will bring the whole of wikipedia
down cause it overloads the cluster.

If we're allowed to do that, I guess it's a matter of days until this
issue is resolved.
Comment 11 Tim Starling 2013-03-11 06:16:00 UTC
(In reply to comment #10)
> To be fair, I asked Tim for a schedule for deployment. What I got back
> is his comment above asking to disable the collection extension
> because the deadline has passed.

In August, the rollout schedule had not been decided. In September, Heiko again asked for a schedule. In October, Rob Lanphier replied to Heiko, saying that it had already been rolled out to mediawiki.org at that time, and that it would be progressively deployed to more wikis. That rollout is still in progress.

The right time to start development work was in August, which is why I sent my email at that time.

> I'm not sure at the moment if we're allowed to use
> action=expandtemplates, or if that will bring the whole of wikipedia
> down cause it overloads the cluster.

You can use it. It's unlikely that there would be any problems. A CPU overload would not bring the site down, since API requests go to a separate Apache cluster. At most you would bring down the API.
Comment 12 Ralf Schmitt 2013-03-11 06:47:11 UTC
ok, so I'm going to implement the action=expandtemplates thing. I guess we can still disable the collection extension, when the API request cluster is down.

Another thing that's not going to work then is template blacklisting/print templates. volker took a look at that some time ago, and it should not have a big effect on pdf output. note that some elements will still be filtered via the class=noprint attribute.
Comment 13 Erik Moeller 2013-03-11 07:56:22 UTC
Thanks Ralf. Robert, can you hold off on the Cite-to-Lua migration at least until 3/18 or so? Ralf, please let us know if you run into any issues with your attempted solution and if we can help. This is a rather urgent issue and we'd prefer to avoid disabling the PDF feature (especially the sidebar link receives very substantial usage).
Comment 14 Ralf Schmitt 2013-03-11 12:39:55 UTC
do I have to sync with anyone when deploying a new version?
Comment 15 Greg Grossmeier 2013-03-11 16:19:41 UTC
Ralf: Let me know when you have something ready (ie: something in Gerrit and reviewed) and we (you and I) can coordinate a deploy slot. I've noted this issue on the deployments calendar (https://wikitech.wikimedia.org/wiki/Deployments) in the to be scheduled section.

Thanks.
Comment 16 Marcin Cieślak 2013-03-11 16:26:59 UTC
Not sure there is anything to be changed in the Collection extension code. 

I think what needs to be done is to update mw-lib (https://github.com/pediapress/)
and the code on the rendering server the PediaPress is running. So I think at least some communication should be prepared towards the non-tech wiki community that changes are coming.

I'm happy to help with the deployment and/or handling bug reports/issues/whatever comes after this.
Comment 17 Helder 2013-03-11 16:48:44 UTC
(In reply to comment #16)
> Not sure there is anything to be changed in the Collection extension code. 
> 
> I think what needs to be done is to update mw-lib
> (https://github.com/pediapress/)
> and the code on the rendering server the PediaPress is running.

In that case, it seems appropriated to tag this with the 'upstream' keyword.
Comment 18 Ralf Schmitt 2013-03-12 04:22:19 UTC
we need to update mwlib on the pdf cluster. this is something I can do on my own.
but mwlib will then call api.php with different parameters and there's a chance that will have a negative effect on some of the machines. we've been through this when we changed the size of images we fetch, and I'd prefer not to repeat this rather unpleasant experience.
Comment 19 Ralf Schmitt 2013-03-12 07:32:21 UTC
the new version is ready to be deployed.
Comment 20 MZMcBride 2013-03-12 07:32:50 UTC
(In reply to comment #16)
> [...] So I think at least some communication should be prepared towards the
> non-tech wiki community that changes are coming.

What changes are relevant to the non-tech wiki community?

(In reply to comment #18)
> we need to update mwlib on the pdf cluster. this is something I can do on my
> own.
> but mwlib will then call api.php with different parameters and there's a
> chance that will have a negative effect on some of the machines. we've been
> through this when we changed the size of images we fetch, and I'd prefer not
> to repeat this rather unpleasant experience.

Right. So I think a few things should happen here:

1. Ensure that the API requests are using an identifiable (preferably versioned) User-Agent string (cf. [[m:User-Agent policy]]).

2. Coordinate with either Erik, Tim, or some other Wikimedia Foundation technical staffer to schedule a deployment slot (time window) of this new code, while preparing for the possibility of needing to roll back the new code, if necessary.

3. Stay on IRC (#wikimedia-operations on irc.freenode.net) during the deployment and for a few hours subsequent to the deployment to ensure that the increased API load isn't problematic.
Comment 21 Ralf Schmitt 2013-03-12 07:55:51 UTC
here's the changelog: http://mwlib.readthedocs.org/en/latest/changelog.html

I don't know what the "non-tech wiki community" is, so I don't know which of those changes are relevant.

the user agent being used should be mwlib.
Comment 22 kipod 2013-03-12 14:13:53 UTC
This is very good news.

current mwlib rendering is completely (well, almost completely: enough to be unusable) borked on RTL wikis. iirc, someone said once something about "not having access to appropriate free hebrew fonts", but the whole page was mangled with RTL, regardless of the super-ugly fonts. i sure hope that switching to API rendering will fix this too.

peace.
Comment 23 Greg Grossmeier 2013-03-12 17:57:32 UTC
(In reply to comment #20)

> 2. Coordinate with either Erik, Tim, or some other Wikimedia Foundation
> technical staffer to schedule a deployment slot (time window) of this new
> code,
> while preparing for the possibility of needing to roll back the new code, if
> necessary.

Hi, your friendly Release Manager here :)

Please coordinate with me. The current Deployments calendar is available here:

https://wikitech.wikimedia.org/wiki/Deployments

Please tell me what time slot works for you that is available and has good coverage in Pacific timezone (so we have Ops/Platform teams available as needed).

> 
> 3. Stay on IRC (#wikimedia-operations on irc.freenode.net) during the
> deployment and for a few hours subsequent to the deployment to ensure that
> the
> increased API load isn't problematic.

Agreed.
Comment 24 Matthew Flaschen 2013-03-12 23:06:19 UTC
kipod, that sounds like a different bug.  Please file if there's not already one.
Comment 25 Ralf Schmitt 2013-03-13 10:55:00 UTC
I've logged into #wikimedia-operations as schmir.
Comment 26 kipod 2013-03-13 16:34:54 UTC
(In reply to comment #24)
> kipod, that sounds like a different bug.  Please file if there's not already
> one.
it is, of course, a different bug (didn't look too deeply, but i'll be very surprised if it's not reported at least once). 

i just expressed hope that switching from proprietary parsing to API parsing will squash this one "en passant". if it won't, i'll make sure a separate bug report exists.

peace.
Comment 27 Ralf Schmitt 2013-03-13 21:23:08 UTC
My first attempt to upgrade mwlib failed. I had to fix another bug, which I introduced in mwlib 0.15. I had hoped to upgrade the code right now, but greg says that's not possible.

Can you offer us some deployment windows?
Comment 28 Ralf Schmitt 2013-03-13 22:00:27 UTC
forget that last message. I've deployed the new mwlib code a few minutes ago.
Comment 29 Erik Moeller 2013-03-13 22:31:16 UTC
The test case at https://en.wikipedia.org/wiki/User:BJorsch_(WMF)/Bug_45861 seems to work now. Can folks test with a few more complex cases?
Comment 30 Robert Rohde 2013-03-14 00:12:06 UTC
Lua tests:

http://en.wikipedia.org/wiki/Bays_of_the_Philipines

The presentation of the Lua-based coordinates is somewhat malformed, but still usable (whereas before they were simply absent).

http://en.wikipedia.org/wiki/Climate_of_Russia

The Lua-based climate boxes look great in PDF, but all the references embedded in the templates are absent from the References section.

http://en.wikipedia.org/wiki/List_of_longest_bridges_in_the_world

The Lua-based number formatting and conversions look perfect.  However some of the non-Lua based references seem incorrect (e.g. #54).


Complex, non-Lua test:

http://en.wikipedia.org/wiki/Barack_Obama

On wiki there are 335 numbered notes, the PDF skips many of the numbers so that only 58 are actually present, and the numbers that are present don't match the wiki numbering.  To be clear this is testing the Barack Obama article with _traditional template-based citations_.  So whatever change was made appears to have caused problems with normal article rendering.


So altogether these changes look like good progress towards handling Lua, BUT there also seem to be negative impacts the presentation of other unrelated page elements, especially references.  This includes the traditional {{cite}} template-based references.
Comment 32 Volker Haas 2013-03-14 08:27:36 UTC
(In reply to comment #30)
> Lua tests:
> 
> http://en.wikipedia.org/wiki/Bays_of_the_Philippines
> 
> The presentation of the Lua-based coordinates is somewhat malformed, but
> still
> usable (whereas before they were simply absent).

I fixed this with https://github.com/pediapress/mwlib/commit/fbce5a01c3b05b5f6aea6ee38bdf7acf76b44719


> 
> http://en.wikipedia.org/wiki/Climate_of_Russia
> 
> The Lua-based climate boxes look great in PDF, but all the references
> embedded
> in the templates are absent from the References section.

The problem is that the expandTemplates API call returns with some unexpanded templates:

Minimal Example: http://en.wikipedia.org/wiki/User:Volker.haas/TemplateExpansion

Result: https://en.wikipedia.org/w/api.php?action=expandtemplates&format=jsonfm&text={{User:Volker.haas/TemplateExpansion}}

Can this be fixed on the mediawiki end?

> 
> http://en.wikipedia.org/wiki/List_of_longest_bridges_in_the_world
> 
> The Lua-based number formatting and conversions look perfect.  However some
> of
> the non-Lua based references seem incorrect (e.g. #54).

This is the same problem as described above: unexpanded templates after an expandTemplates mediawiki API call.

> 
> 
> Complex, non-Lua test:
> 
> http://en.wikipedia.org/wiki/Barack_Obama
> 
> On wiki there are 335 numbered notes, the PDF skips many of the numbers so
> that
> only 58 are actually present, and the numbers that are present don't match
> the
> wiki numbering.

I haven't checked that thoroughly but I am pretty sure it is the same problem as above.

Sidenote about the numbering being off (despite the bug that is present): In order to present links in an unobtrusive way they are also put in the reference section of the PDF. This is the cause that the numbering of references in the article and the PDF is different (whoever, in the PDF the numbering should be "correct")


>  To be clear this is testing the Barack Obama article with
> _traditional template-based citations_.  So whatever change was made appears
> to
> have caused problems with normal article rendering.
> 
> 
> So altogether these changes look like good progress towards handling Lua, BUT
> there also seem to be negative impacts the presentation of other unrelated
> page
> elements, especially references.  This includes the traditional {{cite}}
> template-based references.
Comment 33 Andre Klapper 2013-03-14 10:30:19 UTC
Thanks for the work on this!

(In reply to comment #32)
> > http://en.wikipedia.org/wiki/Climate_of_Russia
> > The Lua-based climate boxes look great in PDF, but all the references
> > embedded in the templates are absent from the References section.
> 
> The problem is that the expandTemplates API call returns with some unexpanded
> templates:
> 
> Minimal Example:
> http://en.wikipedia.org/wiki/User:Volker.haas/TemplateExpansion
> 
> Result:
> https://en.wikipedia.org/w/api.
> php?action=expandtemplates&format=jsonfm&text={{User:Volker.haas/
> TemplateExpansion}}
> 
> Can this be fixed on the mediawiki end?

Would be good to file a separate ticket for this, so it doesn't get lost.
This report is about code changes in the "Collection" extension.
Comment 34 Brad Jorsch 2013-03-14 12:56:09 UTC
(In reply to comment #33)
> Thanks for the work on this!
> 
> (In reply to comment #32)
> > > http://en.wikipedia.org/wiki/Climate_of_Russia
> > > The Lua-based climate boxes look great in PDF, but all the references
> > > embedded in the templates are absent from the References section.
> > 
> > The problem is that the expandTemplates API call returns with some unexpanded
> > templates:
> > 
> > Minimal Example:
> > http://en.wikipedia.org/wiki/User:Volker.haas/TemplateExpansion
> > 
> > Result:
> > https://en.wikipedia.org/w/api.
> > php?action=expandtemplates&format=jsonfm&text={{User:Volker.haas/
> > TemplateExpansion}}
> > 
> > Can this be fixed on the mediawiki end?
> 
> Would be good to file a separate ticket for this, so it doesn't get lost.
> This report is about code changes in the "Collection" extension.

It wouldn't surprise me if there's already a bug about it somewhere. Note the problem is in the parser, not the API.

The problem is that the underlying parser logic behind expandtemplates treats anything inside a tag hook's tags (e.g. between "<ref>" and "</ref>") as opaque text to be handled by the tag hook's callback. Some of those callbacks turn right around and parse the contents, but that happens at a later stage than where expandtemplates stops.
Comment 35 Volker Haas 2013-03-14 14:03:57 UTC
Thanks for the suggestions! I created a new ticket (despite searching I couldn't find a prior ticket with that issue):

https://bugzilla.wikimedia.org/show_bug.cgi?id=46115
Comment 36 Greg Grossmeier 2013-03-14 16:21:16 UTC
Alright, since it seems this bug is now done, I'm resolving it. Let's work on any other lingering issues with templates and PDF export in separate/new bugs.

Thanks all!
Comment 37 kipod 2013-03-14 21:04:47 UTC
i do not think it's done - Reopening.
(this is done based on the assumption that the fix is already deployed on enwiki. if it isn't deployed yet, please close again and let me know, so i can slap myself).

See page

https://en.wikipedia.org/wiki/User:%D7%A7%D7%99%D7%A4%D7%95%D7%93%D7%A0%D7%97%D7%A9/sandbox 

(if this page, which is a user sandbox, will lose its content anytime soon: the problematic content is a single {{chess diagram-fen}} on enwiki. no other content on the page) 

the page displays an image of chessboard and chess pieces, in game-opening stance. it's created by Lua, by placing several div-wrapped images on top of another image, using z-index and "position:absolute" in the style of the divs wrapping the pieces. 

the page shows the chess board with the pieces in opening stance. 
the pdf seem to ignore image sizes (i.e., it scales the board image using different scaling factor than the one used for the pieces), and ignore the "position:absolute" (hence also the "top" and "left") style attributes of the divs wrapping the pieces.

as a side, i'll mention that this is an improvement over yesterday (or was it a day before?) when the pdf entirely missed all the images that were placed on the page by Lua code.

peace.
Comment 38 Volker Haas 2013-03-15 13:14:08 UTC
(In reply to comment #37)
> i do not think it's done - Reopening.
> (this is done based on the assumption that the fix is already deployed on
> enwiki. if it isn't deployed yet, please close again and let me know, so i
> can
> slap myself).
> 
> See page
> 
> https://en.wikipedia.org/wiki/User:
> %D7%A7%D7%99%D7%A4%D7%95%D7%93%D7%A0%D7%97%D7%A9/sandbox 
> 
> (if this page, which is a user sandbox, will lose its content anytime soon:
> the
> problematic content is a single {{chess diagram-fen}} on enwiki. no other
> content on the page) 
> 
> the page displays an image of chessboard and chess pieces, in game-opening
> stance. it's created by Lua, by placing several div-wrapped images on top of
> another image, using z-index and "position:absolute" in the style of the divs
> wrapping the pieces. 
> 
> the page shows the chess board with the pieces in opening stance. 
> the pdf seem to ignore image sizes (i.e., it scales the board image using
> different scaling factor than the one used for the pieces), and ignore the
> "position:absolute" (hence also the "top" and "left") style attributes of the
> divs wrapping the pieces.
> 
> as a side, i'll mention that this is an improvement over yesterday (or was
> it a
> day before?) when the pdf entirely missed all the images that were placed on
> the page by Lua code.
> 
> peace.

The bug you describe is not related to lua templates. The PDF rendering engine ignores the position attribute of tags - therefore the output is probably not what you would expect.
Comment 39 kipod 2013-03-15 14:33:33 UTC
if i understand correctly, you say that the same problem will appear whenever style="position:something" is used. i was surprised to hear that - we use "position:absolute" in more than one place: specifically for "pushpins" in map displays. 

i tried the PDF of [[en:Buenos Aires]], and found (to my dismay) that this is indeed the case: the pushpin is shoved from its correct location completely outside of the map.

is there an open bug about it? either way, i am reverting my "reopen" status change and return the status to "resolved", because my previous change is really not related to Lua but to PDF rendering. at large.

peace.
Comment 40 Volker Haas 2013-03-15 14:40:16 UTC
(In reply to comment #39)
> if i understand correctly, you say that the same problem will appear whenever
> style="position:something" is used. i was surprised to hear that - we use
> "position:absolute" in more than one place: specifically for "pushpins" in
> map
> displays. 
> 
> i tried the PDF of [[en:Buenos Aires]], and found (to my dismay) that this is
> indeed the case: the pushpin is shoved from its correct location completely
> outside of the map.
> 
> is there an open bug about it? 

yes. Unfortunately it is not possible to solve this in a reasonable amount of time.

> either way, i am reverting my "reopen" status
> change and return the status to "resolved", because my previous change is
> really not related to Lua but to PDF rendering. at large.
> 
> peace.
Comment 41 Greg Grossmeier 2013-03-15 16:12:19 UTC
Actually setting this to Resolved again. Please open a new bug about any further issues you are facing with the PDF export tool.

Thanks
Comment 42 Helder 2013-03-26 10:13:36 UTC
(In reply to comment #12)
> ...
> Another thing that's not going to work then is template blacklisting/print
> templates. volker took a look at that some time ago, and it should not have a
> big effect on pdf output. note that some elements will still be filtered via
> the class=noprint attribute.

https://www.mediawiki.org/wiki/Thread:Extension_talk:Collection/Hide_in_print_broken

https://en.wikipedia.org/wiki/User:Unjedai/Hide_in_print_test
Comment 43 Volker Haas 2013-03-26 10:33:39 UTC
(In reply to comment #42)
> (In reply to comment #12)
> > ...
> > Another thing that's not going to work then is template blacklisting/print
> > templates. volker took a look at that some time ago, and it should not have a
> > big effect on pdf output. note that some elements will still be filtered via
> > the class=noprint attribute.
> 
> https://www.mediawiki.org/wiki/Thread:Extension_talk:Collection/
> Hide_in_print_broken
> 
> https://en.wikipedia.org/wiki/User:Unjedai/Hide_in_print_test

Also see: https://en.wikipedia.org/wiki/Template_talk:Hide_in_print#Possible_Fix
Comment 44 kipod 2013-06-25 14:02:54 UTC
(In reply to comment #41)
> Actually setting this to Resolved again. Please open a new bug about any
> further issues you are facing with the PDF export tool.
> 
> Thanks

not sure why it was set to "resolved", as it seems that the pdf creator still does not respect "position:" attribute. however, i will act on the advise, and will open a new ticket.

note that i do not have this extension installed on my wiki, and the bug report is based on enwiki. 
for instance, look at [[en:Brussels]]: in section "Languages", there is a cute pie chart, whose implementation depends on various things, including position:absolute. the pdf completely misses it. 
again, all the location maps that use pushpins (such as [[en:Buenos Aires]] mentioned above). in deference to your request, i'll open a new ticket.

peace.
Comment 45 Brad Jorsch 2013-06-25 14:14:54 UTC
(In reply to comment #44)
> not sure why it was set to "resolved", as it seems that the pdf creator still
> does not respect "position:" attribute.

Because that's a different bug. It's clearer for everyone involved if a new bug is opened for a new bug instead of repurposing an existing bug about a different issue.
Comment 46 kipod 2013-06-25 16:18:06 UTC
(In reply to comment #45)
> (In reply to comment #44)
> > not sure why it was set to "resolved", as it seems that the pdf creator still
> > does not respect "position:" attribute.
> 
> Because that's a different bug. It's clearer for everyone involved if a new
> bug
> is opened for a new bug instead of repurposing an existing bug about a
> different issue.

my bad. see Bug 50178 for further details about the other issue.
peace.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links