Last modified: 2013-08-14 15:36:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T54831, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 52831 - Non-latin text broken after import to etherpad lite
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Etherpad
Version: wmf-deployment
Hardware: All All
Importance: Normal major
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: i18n
Depends on:
Blocks: 45312

Reported: 2013-08-14 05:01 UTC by Santhosh Thottingal
Modified: 2013-08-14 15:36 UTC

Description Santhosh Thottingal 2013-08-14 05:01:00 UTC
see https://etherpad.wikimedia.org/p/i18n-team-02.
All question marks you see there are non-latin text.
Comment 1 Ariel T. Glenn 2013-08-14 05:07:50 UTC
I just created a test pad and added some non-Latin characters (Greek), and they were fine. Is there any chance this pad was imported from somewhere else, i.e. one of the old Etherpad instances?
Comment 2 Santhosh Thottingal 2013-08-14 05:38:16 UTC
Yes, it was imported from the old instance (same URL).
Comment 3 Andre Klapper 2013-08-14 08:56:43 UTC
Non-latin text at the top (Hebrew etc) is displayed correctly though. 
How exactly was this "imported" and from where?
Comment 4 Niklas Laxström 2013-08-14 09:05:44 UTC
(In reply to comment #3)
> Non-latin text at the top (Hebrew etc) is displayed correctly though. 

That part has been changed after the import.

> How exactly was this "imported" and from where?

etherpad.wikimedia.org used to run Etherpad, now it runs Etherpad-lite. Someone from ops is needed to answer this question.

For pads with non-Latin text, this would be a data-loss issue if not fixed.
Comment 5 Ariel T. Glenn 2013-08-14 09:08:13 UTC
Adding akosiaris who I believe did the import.
Comment 6 Alexandros Kosiaris 2013-08-14 10:43:57 UTC
Most of the pads on etherpad.wikimedia.org, which was running Etherpad, were automatically imported into Etherpad-lite using the convert.js script provided by etherpad-lite (patched by me so that it actually runs; more info at https://rt.wikimedia.org/Ticket/Display.html?id=5464 ).

The old pads are still available at etherpad-old.wikimedia.org (albeit read-only), so this pad's content still exists unmodified at:

http://etherpad-old.wikimedia.org/i18n-team-02

I believe the error is due to the new database's character set. I will investigate further and update this ticket accordingly.
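
As a rough illustration of how a character-set mismatch can turn text into question marks, here is a minimal Python sketch. It only simulates the effect by encoding text into latin-1 with replacement; the function name and the sample string are made up for illustration, and this is not a reproduction of what the actual database did.

```python
def store_in_latin1(text: str) -> str:
    """Simulate writing text into a latin-1 store.

    Characters that latin-1 cannot represent are replaced by '?',
    which is the symptom reported for the imported pad.
    """
    return text.encode("latin-1", errors="replace").decode("latin-1")


# Hypothetical pad content: an ASCII prefix followed by Greek text.
sample = "pad: ελληνικά"
stored = store_in_latin1(sample)
print(stored)
```

The ASCII prefix survives while every Greek character is lost, and the replacement cannot be undone afterwards, which is why the old read-only copy matters for a reimport.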
Comment 7 Alexandros Kosiaris 2013-08-14 15:36:51 UTC
It turns out the problem was not just the database's character set but the conversion script's output as well: it was missing a SET NAMES utf8 declaration at the beginning of the output. I reimported the entire database (it took a long time), and now the pad in question, as well as some others, shows up correctly.

The ones listed at RT #5464 as problematic may still exhibit some problems, but otherwise I consider this fixed. Please confirm and update the ticket accordingly.
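
For readers wondering what the missing declaration looks like in practice, the following is a small Python sketch of prepending SET NAMES utf8 to a generated SQL dump. The helper name and the sample statement are hypothetical; the real convert.js patch is not shown in this report and may differ.

```python
SET_NAMES = "SET NAMES utf8;"


def with_set_names(sql_dump: str) -> str:
    """Return the dump with a SET NAMES declaration as its first statement.

    If the dump already starts with a SET NAMES statement, it is
    returned unchanged so the helper can be applied more than once.
    """
    if sql_dump.lstrip().upper().startswith("SET NAMES"):
        return sql_dump
    return SET_NAMES + "\n" + sql_dump


# Hypothetical one-statement dump; table and column names are invented.
dump = "INSERT INTO store (k, v) VALUES ('pad:example', 'ελληνικά');"
fixed = with_set_names(dump)
print(fixed.splitlines()[0])
```

With the declaration first, the server is told that the statements that follow are UTF-8, rather than falling back to the connection's default character set.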
