Last modified: 2012-11-14 15:21:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the pages displaying bug reports and their history, links might be broken. See T42844, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 40844 - download tool issue with Cyrillic encoding in filenames (wget)
Status: RESOLVED INVALID
Product: Utilities
Classification: Unclassified
Component: Other (Other open bugs)
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: upstream
Reported: 2012-10-07 19:40 UTC by Andrij
Modified: 2012-11-14 15:21 UTC
CC List: 4 users

Description Andrij 2012-10-07 19:40:15 UTC
The tool https://toolserver.org/~platonides/catdown/catdown.php does not recognize Cyrillic in file names. For example, it writes "Р%9FамС%8FС%82РЅРёРє_Р·Р°С%82опленнС%8BРј_РєРѕС%80аблС%8FРј_РІ_СеваС%81С%82ополе"
instead of "Памятник затопленным кораблям в Севастополе.JPG". Please fix it.
Comment 1 Platonides 2012-10-07 20:43:15 UTC
As answered on the mailing list, that's a wget problem.

The list generated by my tool correctly uses:
http://upload.wikimedia.org/wikipedia/commons/a/ad/%D0%9F%D0%B0%D0%BC%D1%8F%D1%82%D0%BD%D0%B8%D0%BA_%D0%B7%D0%B0%D1%82%D0%BE%D0%BF%D0%BB%D0%B5%D0%BD%D0%BD%D1%8B%D0%BC_%D0%BA%D0%BE%D1%80%D0%B0%D0%B1%D0%BB%D1%8F%D0%BC_%D0%B2_%D0%A1%D0%B5%D0%B2%D0%B0%D1%81%D1%82%D0%BE%D0%BF%D0%BE%D0%BB%D0%B5.JPG
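One way to confirm that this URL is correct is to percent-decode its last path segment. The following is a minimal sketch using only Python's standard `urllib.parse`; the helper name `commons_file_name` is hypothetical. Note that the URL uses underscores where the Commons title shows spaces.

```python
from urllib.parse import unquote, urlsplit


def commons_file_name(url: str) -> str:
    """Return the decoded last path segment (the file name) of an upload URL."""
    path = urlsplit(url).path
    # unquote() turns %XX escapes back into characters, decoding them as UTF-8.
    return unquote(path.rsplit("/", 1)[-1])


url = ("http://upload.wikimedia.org/wikipedia/commons/a/ad/"
       "%D0%9F%D0%B0%D0%BC%D1%8F%D1%82%D0%BD%D0%B8%D0%BA_"
       "%D0%B7%D0%B0%D1%82%D0%BE%D0%BF%D0%BB%D0%B5%D0%BD%D0%BD%D1%8B%D0%BC_"
       "%D0%BA%D0%BE%D1%80%D0%B0%D0%B1%D0%BB%D1%8F%D0%BC_%D0%B2_"
       "%D0%A1%D0%B5%D0%B2%D0%B0%D1%81%D1%82%D0%BE%D0%BF%D0%BE%D0%BB%D0%B5.JPG")

print(commons_file_name(url))  # Памятник_затопленным_кораблям_в_Севастополе.JPG
```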


The problem seems to lie in wget when extracting to a local filename.

If you are using *nix with a UTF-8 filesystem, pass the
--restrict-file-names=nocontrol parameter to wget.
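The reason this parameter matters can be illustrated with a simplified model. It is based on the wget manual's description of the default `unix` and `windows` modes, which escape control characters in the ranges 0-31 and 128-159 as `%XX`. The second byte of most UTF-8 Cyrillic letters falls in 128-159. The model also assumes the resulting bytes are displayed through the Windows Cyrillic code page CP1251. The function `local_name` is hypothetical and ignores the other characters wget escapes, such as `/` or the characters reserved on Windows.

```python
from urllib.parse import unquote_to_bytes


def local_name(url_segment: str, escape_control: bool = True) -> bytes:
    """Simplified model of how wget turns a URL path segment into file-name bytes.

    Assumption: only control bytes (0-31 and 128-159) are escaped; every other
    character wget may escape is ignored in this sketch.
    """
    raw = unquote_to_bytes(url_segment)  # "%D0%9F" -> b"\xd0\x9f"
    if not escape_control:
        return raw  # models --restrict-file-names=nocontrol: bytes kept as-is
    out = bytearray()
    for b in raw:
        if b < 32 or 128 <= b < 160:
            out += b"%%%02X" % b  # control byte written as %XX
        else:
            out.append(b)
    return bytes(out)


segment = "%D0%9F%D0%B0%D0%BC%D1%8F%D1%82%D0%BD%D0%B8%D0%BA"  # "Памятник"

# Default escaping, viewed through CP1251: mojibake with %XX escapes.
print(local_name(segment).decode("cp1251"))  # Р%9FР°РјС%8FС%82РЅРёРє
# nocontrol: raw UTF-8 bytes, which a UTF-8 filesystem displays correctly.
print(local_name(segment, escape_control=False).decode("utf-8"))  # Памятник
```

The first result closely resembles the start of the name quoted in the description.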

If you're using Windows, you will end up with UTF-8 encoded file names, so
you would need another script to convert them to the encoding used by Windows.
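Such a script might look roughly like the following Python sketch. It assumes the names on disk are UTF-8 bytes read through the Windows ANSI code page, taken here to be CP1251, possibly with wget's `%XX` escapes for control bytes left in, as in the description. The names `repair_name` and `repair_directory` are hypothetical. Names that do not round-trip cleanly are left alone, and a genuine `%` followed by two hex digits in a file name would be misread.

```python
import os
from urllib.parse import unquote_to_bytes


def repair_name(name: str, codepage: str = "cp1251") -> str:
    """Recover the intended UTF-8 file name from a mojibake name.

    Assumption: `name` is UTF-8 bytes that were shown through `codepage`,
    possibly with control bytes escaped as %XX. If the name does not
    round-trip cleanly, it is returned unchanged.
    """
    try:
        raw = unquote_to_bytes(name.encode(codepage))  # undo %XX escapes
        return raw.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


def repair_directory(path: str) -> None:
    """Rename each entry in `path` whose name can be repaired."""
    for old in os.listdir(path):
        new = repair_name(old)
        target = os.path.join(path, new)
        # Skip unchanged names and never overwrite an existing file.
        if new != old and not os.path.exists(target):
            os.rename(os.path.join(path, old), target)
```

For example, `repair_name("Р%9FР°РјС%8FС%82РЅРёРє")` returns `"Памятник"`, while plain ASCII names and names that are already correct come back unchanged.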
Comment 2 Andre Klapper 2012-10-09 16:02:56 UTC
Andrij: Does comment 1 help?
Comment 3 Andrij 2012-10-09 17:07:15 UTC
Unfortunately not. I could not understand how to "pass the
--restrict-file-names=nocontrol parameter to wget".
Comment 4 Platonides 2012-10-13 15:52:58 UTC
Andrij, you would add that parameter to the wget command inside download.bat.

I could try downloading the category for you if that helps.

I reported the problem upstream, since this should be fixed at the wget level:
https://savannah.gnu.org/bugs/index.php?37564
Comment 5 Liangent 2012-10-13 15:55:48 UTC
Does this bug belong in this Bugzilla?
Comment 6 Andre Klapper 2012-11-14 15:21:47 UTC
Andrij: Toolserver issues should be filed at https://jira.toolserver.org/secure/Dashboard.jspa

Closing as "INVALID" simply because this bug database is not the place where this report should be, but not because the report itself is invalid.
