Last modified: 2014-08-07 15:21:27 UTC
We're not sure we could test directly for https://bugzilla.wikimedia.org/show_bug.cgi?id=45861 but PDF is an area that breaks often and is a pain to fix after the fact, so a browser test would be good to have.
Volunteer karim_r started working on this. Dave McNulla suggested an idea for the implementation: generate a pdf from a page, inspect the pdf manually and add it to version control. The next time the test runs, compare the new pdf file with the old one and fail the test if they differ.
Related URL: https://gerrit.wikimedia.org/r/60890 (Gerrit Change I195a590844ebd1eee779cd2f3486c9e63035110d)
Related URL: https://gerrit.wikimedia.org/r/60898 (Gerrit Change Idec21c492871f4a55aaa0f1568971ccc7174cf1d)
Related URL: https://gerrit.wikimedia.org/r/61257 (Gerrit Change I4b3a96fd7eefb85a1ba402b22c8b6ad922360550)
How to test this manually:
- go to http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page
- on the left hand side expand "Print/export" section (if not already expanded)
- click "Download as PDF" link
- page with "Download the file" link will open (eventually)
- download the pdf file and check if it has the same data as the web page
How to test this with a script:
- create a simple page
- do the above steps with Selenium, but visit the simple page instead of home page
- since Selenium can not inspect pdf files, find a ruby pdf library[1] that can
- when the file is downloaded, use the pdf library to inspect the pdf file

A simple page could contain:
- nothing
- some text
- title
- title and some text
- image
- title and image
- image and text
- title, text and image
...

The test should be executed for every simple page mentioned above (and maybe a few other pages).

1: https://www.ruby-toolbox.com/categories/pdf_generation
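The list of simple pages above is every combination of title, text and image, so it could be generated rather than written out by hand. A minimal sketch, assuming the "title" is rendered as a section heading and using placeholder sample strings and a placeholder file name; the method names page_elements, simple_page_variants and wikitext_for are hypothetical and not part of any existing test code:

```ruby
# Hypothetical helper: the three building blocks of a simple page.
def page_elements
  %i[title text image]
end

# Returns every subset of the building blocks, from the empty page
# ("nothing") up to the page with title, text and image.
def simple_page_variants
  elements = page_elements
  (0..elements.size).flat_map { |n| elements.combination(n).to_a }
end

# Builds wikitext for one variant. The heading, sentence and file name
# are placeholders (assumptions), not content from a real wiki.
def wikitext_for(variant)
  parts = []
  parts << '== Sample title ==' if variant.include?(:title)
  parts << 'Some sample text.' if variant.include?(:text)
  parts << '[[File:Example.jpg]]' if variant.include?(:image)
  parts.join("\n\n")
end

# List each variant with the wikitext a test would put on the page.
simple_page_variants.each do |variant|
  name = variant.empty? ? 'nothing' : variant.join('_')
  puts "#{name}: #{wikitext_for(variant).inspect}"
end
```

A test could loop over simple_page_variants, create one page per variant and run the same download-and-inspect steps for each.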
Another solution, maybe a simpler one, would be to do the above steps manually, inspect the pdf files manually and save them to the test repository. Then do the above steps with a script, but instead of using a pdf library to inspect the files, use a diff tool to compare if the generated files are the same as the ones that are already saved in the repository.
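The comparison step of this approach could look roughly like the sketch below. It uses only the Ruby standard library; the method names and the idea of reporting checksums on failure are assumptions for illustration. One caveat: if the pdf renderer embeds a creation date or similar metadata in each file, a byte-for-byte comparison would fail even when the visible content is unchanged, so this would need to be checked against real output first.

```ruby
require 'fileutils'
require 'digest'

# Returns true when the freshly generated pdf is byte-for-byte identical
# to the manually inspected reference copy saved in the repository.
def pdf_matches_reference?(generated_path, reference_path)
  FileUtils.compare_file(generated_path, reference_path)
end

# SHA-256 checksum of a file, handy for a readable failure message.
def pdf_fingerprint(path)
  Digest::SHA256.file(path).hexdigest
end

# Raises with both checksums when the files differ; returns true otherwise.
def assert_pdf_unchanged(generated_path, reference_path)
  return true if pdf_matches_reference?(generated_path, reference_path)

  raise "pdf differs from reference: " \
        "generated=#{pdf_fingerprint(generated_path)} " \
        "reference=#{pdf_fingerprint(reference_path)}"
end
```

The script would call assert_pdf_unchanged with the path of the downloaded file and the path of the saved reference file for the same page.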
Relevant links: http://filipin.eu/tag/workshop/ http://filipin.eu/selenium-conference-2013/ http://filipin.eu/how-mediawiki-software-that-runs-wikipedia-is-tested/
Change 98160 had a related patch set uploaded by Mayankmadan: Added a test for downloading pdf from a random page https://gerrit.wikimedia.org/r/98160
I have just tested it with Firefox. The downloaded pdf file is automatically opened by Firefox and rendered as an HTML page, so it can be inspected with Selenium. There is no need for a separate pdf parsing library.
It is not the same for Chrome, which uses a pdf plugin.
A few simple tests exist now (they check if the pdf file downloads at all, and if the text, title and image are the same on the wiki page and in the pdf file). We need more complex tests that check more page elements: ordered and unordered lists, links and similar elements.
Resolving as wontfix, since I have no plans to work on this. Please reopen if you plan to work on it.