Last modified: 2014-09-09 02:23:14 UTC

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static page is read-only and kept for historical purposes; links might be broken. See T57195, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 55195 - Invalid Title in flickrripper
Status: NEW
Product: Pywikibot
Classification: Unclassified
Component: Other scripts
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Pywikipedia bugs
Reported: 2013-10-05 04:41 UTC by Kunal Mehta (Legoktm)
Modified: 2014-09-09 02:23 UTC (History)
Description Kunal Mehta (Legoktm) 2013-10-05 04:41:15 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1466/
Reported by: betacommand
Created on: 2012-06-19 19:52:25
Subject: Invalid Title in flickrripper
Assigned to: xqt
Original description:
Betacommand	multichill: I know you wrote flickrripper.py and I'm trying to fix an issue with it, and thought it might be easier for you to fix
Betacommand	lines 157-161 where it grabs the description and uses it for the file name
Betacommand	when you start working with non-Latin descriptions it doesn't handle multi-byte characters well, it ended up with a title over 320 bytes
Betacommand	the max MediaWiki lets you have is 255
multichill	Lol
Betacommand	multichill: really rather a pain
multichill	So the check should probably encode it and then see how long it is?
Betacommand	correct
multichill	Or just lower the limit a bit?
Betacommand	Thai letters for example are 3 bytes
Betacommand	notes it was discovered with flickrripper.py -autonomous -user_id:40561337@N07 -addcategory:"Files from Abhisit Vejjajiva Flickr stream"
multichill	Betacommand: Could you file a bug for this?
Betacommand	multichill: you would need to cut it down to 85 to be safe
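The numbers in the chat above can be checked with a short sketch. It is written for Python 3 (the script discussed here was Python 2), and the names `MAX_TITLE_BYTES` and `utf8_len` are illustrative, not part of flickrripper.py. The sample string is the first four Thai characters of the title from the failing upload.

```python
# Illustrative names only; not taken from flickrripper.py.
MAX_TITLE_BYTES = 255  # limit quoted in the discussion


def utf8_len(text):
    """Return the length of text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Code points 3609, 3634, 3618, 3585 from the title in this report.
thai = "\u0e19\u0e32\u0e22\u0e01"

char_count = len(thai)       # counts characters
byte_count = utf8_len(thai)  # counts bytes; each of these Thai letters takes 3

# If every character may need 3 bytes, this many characters always fit.
safe_chars = MAX_TITLE_BYTES // 3
```

With 3 bytes per character, 255 // 3 gives the 85-character figure mentioned in the chat.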
Comment 1 Kunal Mehta (Legoktm) 2013-10-05 04:41:17 UTC
I guess the title is cut by MediaWiki and not by the slice operator, since slicing works correctly for unicode strings. len() also gives the number of characters, not the number of bytes. Do we have any size(object) method?
Comment 2 Kunal Mehta (Legoktm) 2013-10-05 04:41:19 UTC
An idea for getFilename (could anybody test whether it works?):

    if not title:
        #find the max length for a mw title
        maxBytes = 240 - len(project.encode('utf-8')) \
                       - len(username.encode('utf-8'))
        description = photoInfo.find('photo').find('description').text
        if description:
            descBytes = len(description.encode('utf-8'))
            if descBytes > maxBytes:
                # maybe we cut more than needed, anyway we do it
                items = max(0, len(description) - maxBytes + descBytes)
                description = description[:items]
            title = cleanUpTitle(description)
        else:
            title = u''
            # Should probably have the id of the photo as last resort.
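As an alternative to computing a character count, a byte budget can be enforced by truncating the encoded bytes directly. The following is a minimal Python 3 sketch of that technique, not the change that was committed for this bug; `truncate_utf8` is a hypothetical helper name.

```python
def truncate_utf8(text, max_bytes):
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes."""
    max_bytes = max(0, max_bytes)  # a negative budget means nothing fits
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cutting the byte string may split a multi-byte character;
    # errors="ignore" drops the incomplete sequence left at the end.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# 100 Thai characters (3 bytes each) cut to a 240-byte budget.
shortened = truncate_utf8("\u0e19" * 100, 240)
```

Because the cut happens on bytes, the result never exceeds the budget and no more characters are dropped than necessary.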
Comment 3 Kunal Mehta (Legoktm) 2013-10-05 04:41:21 UTC
- **assigned_to**: nobody --> xqt
Comment 4 Kunal Mehta (Legoktm) 2013-10-05 04:41:22 UTC
fix committed in r10387, please check
Comment 5 Kunal Mehta (Legoktm) 2013-10-05 04:41:24 UTC
- **summary**: Invalid Title --> Invalid Title in flickrripper
Comment 6 Kunal Mehta (Legoktm) 2013-10-05 04:41:26 UTC
- **status**: open --> pending
Comment 7 Kunal Mehta (Legoktm) 2013-10-05 04:41:27 UTC
- **status**: pending --> pending-fixed
Comment 8 Kunal Mehta (Legoktm) 2013-10-05 04:41:29 UTC
Issue still not fixed, actually it's worse:
C:\Dev\SVN\pywikipedia>flickrripper.py -autonomous -user_id:40561337@N07 -addcategory:"Files from Abhisit Vejjajiva Flickr stream"
5703017392
Traceback (most recent call last):
  File "C:\Dev\SVN\pywikipedia\flickrripper.py", line 609, in <module>
    main()
  File "C:\Dev\SVN\pywikipedia\flickrripper.py", line 599, in main
    removeCategories, autonomous)
  File "C:\Dev\SVN\pywikipedia\flickrripper.py", line 257, in processPhoto
    filename = getFilename(photoInfo)
  File "C:\Dev\SVN\pywikipedia\flickrripper.py", line 172, in getFilename
    % (title, project, username)).exists():
  File "C:\Dev\SVN\pywikipedia\wikipedia.py", line 1284, in exists
    self.get()
  File "C:\Dev\SVN\pywikipedia\wikipedia.py", line 705, in get
    expandtemplates = expandtemplates)
  File "C:\Dev\SVN\pywikipedia\wikipedia.py", line 787, in _getEditPage
    raise BadTitle('BadTitle: %s' % self)
pywikibot.exceptions.BadTitle: BadTitle: [[commons:File:&#3609;&#3634;&#3618;&#3585;&#3619;&#3633;&#3600;&#3617;&#3609;&#3605;&#3619;&#3637; &#3649;&#3621;&#3632;&#3588;&#3603;&#3632;&#3648;&#3604;&#3636;&#3609;&#3607;&#3634;&#3591;&#3629;&#3629;&#3585;&#3592;&#3634;&#3585;&#3585;&#3619;&#3640;&#3591;&#3592;&#3634;&#3585;&#3634;&#3619;&#3660;&#3605;&#3634; &#3626;&#3634;&#3608;&#3634;&#3619;&#3603;&#3619;&#3633;&#3600;&#3629;&#3636;&#3609;&#3650;&#3604;&#3609;&#3637;&#3648;&#3595;&#3637;&#3618;&#3585;&#3621;&#3633;&#3610;&#3618;&#3633;&#3591;&#3611;&#3619;&#3632;&#3648;&#3607;&#3624;&#3652;&#3607;&#3618; &#3623;&#3633;&#3609;&#3629;&#3634;&#3607;&#3636;&#3605;&#3618;&#3660;&#3607;&#3637;&#3656; 8 &#3614;&#3620;&#3625;&#3616;&#3634;&#3588;&#3617; &#3614;.&#3624;.2554 (Photographer attached to the Prime Minister of the Kingdom of Thailand (H.E.Mr.Abhisit Vejjajiva) , Peerapat Wimolrungkarat - &#3614;&#3637;&#3619;&#3614;&#3633;&#3602;&#3609;&#3660;&#3623;&#3636;&#3617;&#3621;&#3619;&#3633;&#3591;&#3588;&#3619;&#3633;&#3605;&#3609;&#3660;) @is50mm - Flickr - Abhisit Vejjajiva.jpg]]
Comment 9 Kunal Mehta (Legoktm) 2013-10-05 04:41:31 UTC
- **status**: pending-fixed --> open
Comment 10 Kunal Mehta (Legoktm) 2013-10-05 04:41:33 UTC
Where are the HTML entities from? Are they part of the Flickr page?
Comment 11 Kunal Mehta (Legoktm) 2013-10-05 04:41:34 UTC
Those are the Thai parts of the page title, which are being converted when the exception is thrown.
Comment 12 Kunal Mehta (Legoktm) 2013-10-05 04:41:36 UTC
I do not see a conversion by the exception. I converted the title from HTML entities to unicode in my last commit.
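Converting numeric HTML entities back to unicode, as described in the comment above, can be sketched in Python 3 with the standard library's `html.unescape`. This only illustrates the idea; the Python 2 code in the actual commit is not shown in this report and may differ.

```python
import html

# First four entities of the title from the traceback in comment 8.
escaped = "&#3609;&#3634;&#3618;&#3585;"

# html.unescape resolves numeric character references to characters.
decoded = html.unescape(escaped)
```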
Comment 13 Kunal Mehta (Legoktm) 2013-10-05 04:41:38 UTC
Line 787 doesn't use the title; it uses the whole page object (self). When you print the object rather than the title, the text gets converted there. I used a log to confirm that the title was UTF-8 before filing this bug.
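One way such entities can appear only at print time is an ASCII encode with the `xmlcharrefreplace` error handler. Whether the old Page object did exactly this is an assumption; the Python 3 sketch below only shows that the mechanism produces the same kind of output as the traceback.

```python
# The title itself holds real Thai characters.
title = "\u0e19\u0e32\u0e22\u0e01"

# Rendering for an ASCII-only console: characters that cannot be encoded
# are replaced by decimal character references instead of raising an error.
printed = title.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
```

The stored title is unchanged; only its printed form contains entities.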
Comment 14 Kunal Mehta (Legoktm) 2013-10-05 04:41:40 UTC
Thanks for testing. The length calculation was wrong. I've corrected it.
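The comment does not say which calculation was wrong. One candidate, assuming it refers to the `items` expression proposed in comment 2, is that `len(description) - maxBytes + descBytes` is always larger than `len(description)` when `descBytes > maxBytes`, so the slice removes nothing. The Python 3 sketch below compares that expression with a conservative variant; both function names are hypothetical and the variant is not necessarily the committed fix.

```python
def items_as_posted(description, max_bytes):
    """Character count computed with the expression from comment 2."""
    desc_bytes = len(description.encode("utf-8"))
    return max(0, len(description) - max_bytes + desc_bytes)


def items_conservative(description, max_bytes):
    """Drop one character per excess byte; may cut more than needed."""
    desc_bytes = len(description.encode("utf-8"))
    return max(0, len(description) - (desc_bytes - max_bytes))


description = "\u0e19" * 100  # 100 Thai characters, 300 bytes in UTF-8
max_bytes = 240

unchanged = description[:items_as_posted(description, max_bytes)]
shortened = description[:items_conservative(description, max_bytes)]
```

Every character occupies at least one byte, so removing as many characters as there are excess bytes always brings the result within the budget.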
