Last modified: 2014-02-12 23:40:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T47806, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 45806 - en.planet stopped updating
Status: REOPENED
Product: Wikimedia
Classification: Unclassified
Component: Planet
Version: wmf-deployment
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-03-06 20:51 UTC by Bawolff (Brian Wolff)
Modified: 2014-02-12 23:40 UTC (History)
CC List: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Bawolff (Brian Wolff) 2013-03-06 20:51:18 UTC
en.planet hasn't updated since March 2.
Comment 1 Daniel Zahn 2013-03-08 19:50:44 UTC
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 32: ordinal not in range(128)


Hrm... something Django/UTF-8 related, along these lines:

http://stackoverflow.com/questions/2513027/encoding-gives-ascii-codec-cant-encode-character-ordinal-not-in-range128
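A minimal Python 3 sketch of what this error message means: byte 0xe2 is the lead byte of a three-byte UTF-8 sequence (it covers U+2000 to U+2FFF, which includes typographic punctuation such as curly quotes and dashes), so decoding UTF-8 feed text with the ASCII codec fails on it. The sample string and the helper name `ascii_decode_error` are hypothetical; nothing here is taken from planet's own code.

```python
def ascii_decode_error(data: bytes):
    """Return the message ASCII decoding raises for data, or None if it is pure ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Hypothetical feed text containing U+2019 RIGHT SINGLE QUOTATION MARK,
# which UTF-8 encodes as the three bytes e2 80 99.
raw = "Planet\u2019s feed".encode("utf-8")

print(raw[6:9])                  # b'\xe2\x80\x99'
print(ascii_decode_error(raw))   # 'ascii' codec can't decode byte 0xe2 in position 6: ...
print(raw.decode("utf-8"))       # decoding with the correct codec succeeds
```

The position reported in the real traceback (32 here, 8 later on) is simply the offset of the first such byte within whatever byte string was being decoded.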
Comment 2 Daniel Zahn 2013-03-09 00:21:22 UTC
I got it to update once, so it's now at March 07, but after that I ran into the same issue again. Keep open.
Comment 3 Bawolff (Brian Wolff) 2013-03-09 00:50:45 UTC
Sounds kind of like bug 44569, but not quite.
Comment 4 Daniel Zahn 2013-03-09 02:25:56 UTC
This could be fixed (for now) by deleting all content from the cache directory and re-running planet, as root on zirconium:

    cd /var/cache/planet/en/
    rm *
    sudo -u planet /usr/bin/planet -v /usr/share/planet-venus/wikimedia/en/config.ini

This also fixed the Atom link: http://en.planet.wikimedia.org/atom.xml
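If this workaround has to be repeated, it could be scripted. The sketch below is a hypothetical Python 3 helper, not something that ships with planet-venus: `clear_cache` removes the regular, non-hidden files in the cache directory (roughly what `rm *` does, since the shell glob skips dotfiles and `rm` without `-r` skips directories), and `planet_command` builds the same command line used above. The paths are copied from this comment; the function names are assumptions.

```python
import subprocess
from pathlib import Path

# Paths taken from comment 4; they only exist on the planet host.
CACHE_DIR = Path("/var/cache/planet/en")
CONFIG = "/usr/share/planet-venus/wikimedia/en/config.ini"


def clear_cache(cache_dir: Path) -> int:
    """Delete non-hidden regular files in cache_dir (roughly `rm *`); return how many."""
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_file() and not entry.name.startswith("."):
            entry.unlink()
            removed += 1
    return removed


def planet_command(config: str) -> list:
    """Argument list for re-running planet verbosely as the 'planet' user."""
    return ["sudo", "-u", "planet", "/usr/bin/planet", "-v", config]


def rerun_planet(cache_dir: Path = CACHE_DIR, config: str = CONFIG) -> None:
    """Hypothetical wrapper: wipe the cache, then regenerate the planet (run as root)."""
    clear_cache(cache_dir)
    subprocess.run(planet_command(config), check=True)
```

Calling `rerun_planet()` as root on the planet host would repeat both steps. As the following comments show, the error came back after this kind of reset, so it is a stopgap rather than a fix.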
Comment 5 Bawolff (Brian Wolff) 2013-03-12 05:29:46 UTC
Looks like the issue is back:
Last updated: March 10, 2013 09:02 PM
Comment 6 jeremyb 2013-03-13 06:02:25 UTC
dzahn re-ran the steps from comment #4 and it's still stuck:

13 05:55:55 < jeremyb_> mutante: did planet finish?
13 05:56:50 < mutante> unfortunately, no. UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 32: ordinal not in range(128)
Comment 7 Daniel Zahn 2013-03-13 06:23:59 UTC
INFO:planet.runner:Loading cached data
Traceback (most recent call last):
  File "/usr/bin/planet", line 138, in <module>
    splice.apply(doc.toxml('utf-8'))
  File "/usr/lib/pymodules/python2.7/planet/splice.py", line 118, in apply
    output_file = shell.run(template_file, doc)
  File "/usr/lib/pymodules/python2.7/planet/shell/__init__.py", line 66, in run
    module.run(template_resolved, doc, output_file, options)
  File "/usr/lib/pymodules/python2.7/planet/shell/tmpl.py", line 254, in run
    for key,value in template_info(doc).items():
  File "/usr/lib/pymodules/python2.7/planet/shell/tmpl.py", line 193, in template_info
    data=feedparser.parse(source)
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 3525, in parse
    feedparser.feed(data)
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 1662, in feed
    sgmllib.SGMLParser.feed(self, data)
  File "/usr/lib/python2.7/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/sgmllib.py", line 143, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.7/sgmllib.py", line 320, in parse_endtag
    self.finish_endtag(tag)
  File "/usr/lib/python2.7/sgmllib.py", line 360, in finish_endtag
    self.unknown_endtag(tag)
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 569, in unknown_endtag
    method()
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 1512, in _end_content
    value = self.popContent('content')
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 849, in popContent
    value = self.pop(tag)
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 764, in pop
    mfresults = _parseMicroformats(output, self.baseuri, self.encoding)
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 2219, in _parseMicroformats
    p.vcard = p.findVCards(p.document)
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 2161, in findVCards
    sVCards += '\n'.join(arLines) + '\n'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 32: ordinal not in range(128)
Comment 8 Betacommand 2013-03-14 22:30:38 UTC
You should treat all strings in Python as unicode, not ASCII:

sVCards += '\n'.join(arLines) + '\n'
becomes
sVCards += u'\n'.join(arLines) + u'\n'
Comment 9 jeremyb 2013-03-14 22:31:51 UTC
(In reply to comment #8)
> You should treat all strings in python as unicode not ASCII, 

That's surely not the root cause, right?
Comment 10 Betacommand 2013-03-14 22:33:45 UTC
It is a matter of a UTF-8 string coming in and being treated as ASCII.
Comment 11 Betacommand 2013-03-14 22:36:30 UTC
See http://docs.python.org/2/howto/unicode.html#the-unicode-type for an exact reproduction of this issue.
Comment 12 Daniel Zahn 2013-03-15 02:48:34 UTC
Thank you very much for the reply, but it still does not work after I changed line 2161 in feedparser.py:
...
    sVCards += u'\n'.join(arLines) + u'\n'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 8: ordinal not in range(128)

----

INFO:planet.runner:Loading cached data
Traceback (most recent call last):
  File "/usr/bin/planet", line 138, in <module>
    splice.apply(doc.toxml('utf-8'))
  File "/usr/lib/pymodules/python2.7/planet/splice.py", line 118, in apply
    output_file = shell.run(template_file, doc)
  File "/usr/lib/pymodules/python2.7/planet/shell/__init__.py", line 66, in run
    module.run(template_resolved, doc, output_file, options)
  File "/usr/lib/pymodules/python2.7/planet/shell/tmpl.py", line 254, in run
    for key,value in template_info(doc).items():
  File "/usr/lib/pymodules/python2.7/planet/shell/tmpl.py", line 193, in template_info
    data=feedparser.parse(source)
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 3525, in parse
    feedparser.feed(data)
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 1662, in feed
    sgmllib.SGMLParser.feed(self, data)
  File "/usr/lib/python2.7/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/sgmllib.py", line 143, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.7/sgmllib.py", line 320, in parse_endtag
    self.finish_endtag(tag)
  File "/usr/lib/python2.7/sgmllib.py", line 360, in finish_endtag
    self.unknown_endtag(tag)
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 569, in unknown_endtag
    method()
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 1512, in _end_content
    value = self.popContent('content')
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 849, in popContent
    value = self.pop(tag)
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 764, in pop
    mfresults = _parseMicroformats(output, self.baseuri, self.encoding)
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 2219, in _parseMicroformats
    p.vcard = p.findVCards(p.document)
  File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 2161, in findVCards
    sVCards += u'\n'.join(arLines) + u'\n'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 8: ordinal not in range(128)
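This is consistent with how Python 2 joins strings: with a unicode separator, every byte-string (`str`) element of `arLines` still has to be coerced to unicode, and that implicit coercion uses the ASCII codec, so the same byte 0xe2 fails wherever it appears. Changing the literal alone therefore cannot help; the byte strings would need to be decoded explicitly, with the feed's real encoding, before they are joined. A minimal Python 3 sketch of that idea follows; `to_text` and `join_vcard_lines` are hypothetical helpers rather than feedparser functions, and UTF-8 is assumed as the encoding.

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes with an explicit codec; leave str unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def join_vcard_lines(ar_lines, encoding="utf-8"):
    """Join a mix of bytes and str lines into one newline-terminated str."""
    return "\n".join(to_text(line, encoding) for line in ar_lines) + "\n"


# Hypothetical vCard lines: one arrives as UTF-8 bytes containing U+2019.
lines = ["BEGIN:vCard", "FN:O\u2019Brien".encode("utf-8"), "END:vCard"]

# Python 3 refuses to mix the types outright (TypeError) instead of guessing ASCII.
try:
    "\n".join(lines)
except TypeError:
    print("mixed bytes/str rejected")

print(join_vcard_lines(lines))
```

In the Python 2 feedparser code, the analogous change would be to decode the `str` items of `arLines` (plausibly with the `self.encoding` value that the traceback shows being passed to `_parseMicroformats`) rather than to change the separator literal.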
Comment 13 Betacommand 2013-03-15 02:58:10 UTC
You need to go through the file and make sure that all references and assignments make sVCards unicode too; otherwise you just repeat the same error when you attempt the +=.
Comment 14 Daniel Zahn 2013-03-15 03:01:49 UTC
Searching through all of feedparser.py, there are just 3 occurrences: sVCards = '' at the beginning, the one we changed, and the line returning it.


For tonight I simply commented out line 2161 in feedparser.py:

                # sVCards += u'\n'.join(arLines) + u'\n'

as a temporary fix, since I figured this only happens because a vCard is found in one feed. This way sVCards is simply returned empty (sVCards = '').

This made the update work for now. We are back to March 14th on https://en.planet.wikimedia.org/
Comment 15 Betacommand 2013-03-15 03:04:03 UTC
sVCards = u''
is the proper way to create a unicode string.
Comment 16 Daniel Zahn 2013-03-15 05:39:40 UTC
Thanks, I tried, but also setting sVCards = u'' on line 1949, in addition to that, did not fix it yet. Back to commenting out line 2161.

Updates work for now, but this is not the real fix, and we should report it upstream.

We should also compare https://github.com/rubys/venus/blob/master/planet/vendor/feedparser.py to the copy we have from planet-venus version "0~bzr116-1" in Ubuntu Precise.
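One way to do that comparison, assuming both copies of feedparser.py have first been saved locally (the file names are placeholders), is a unified diff with Python's standard `difflib` module. The function name `diff_feedparser` is hypothetical.

```python
import difflib
from pathlib import Path


def diff_feedparser(local_path: str, upstream_path: str) -> list:
    """Return unified-diff lines between two local copies of feedparser.py."""
    # errors="replace" keeps the comparison going if either copy has stray bytes.
    local = Path(local_path).read_text(encoding="utf-8", errors="replace").splitlines(keepends=True)
    upstream = Path(upstream_path).read_text(encoding="utf-8", errors="replace").splitlines(keepends=True)
    return list(difflib.unified_diff(local, upstream, fromfile=local_path, tofile=upstream_path))
```

For example, `print("".join(diff_feedparser("feedparser-precise.py", "feedparser-github.py")))` would print the differences between the Ubuntu Precise copy and the GitHub one; `diff -u` on the command line gives the same result.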
Comment 17 Daniel Zahn 2013-03-15 06:02:42 UTC
http://www.intertwingly.net/code/venus/

http://www.intertwingly.net/code/venus/docs/index.html links to http://feedparser.org/docs/, but that is a parking domain at GoDaddy, sigh.

http://www.intertwingly.net/code/venus/AUTHORS

Package: planet-venus
Priority: optional
Section: universe/python
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Noah Slater <nslater@tumbolia.org>
Version: 0~bzr116-1
Comment 18 jeremyb 2013-03-15 06:15:43 UTC
(In reply to comment #17)
> in http://www.intertwingly.net/code/venus/docs/index.html it links to
> http://feedparser.org/docs/ but that is a parking domain at godaddy, sigh.

Try https://code.google.com/p/feedparser/ and http://pythonhosted.org/feedparser/

see also http://feedvalidator.org/
Comment 19 Daniel Zahn 2013-03-15 06:32:51 UTC
Lowering importance to normal/normal because updates are working and en.planet is at March 15 thanks to the live hack.
Comment 20 Daniel Zahn 2013-08-08 07:56:44 UTC
Well, all that is left to do here is report it upstream... but we're fine with the hack.
Comment 21 Andre Klapper 2013-08-08 08:31:57 UTC
If there is a clear testcase and if somebody tells me where upstream is I can forward the ticket. https://github.com/rubys/venus/issues ? https://code.google.com/p/feedparser/issues/list ?
Comment 22 Daniel Zahn 2013-10-04 08:53:19 UTC
Upstream is the GitHub link you pasted. feedparser would be upstream for them, I suppose. But it might also be a request to Ubuntu for newer packages... hrmm.
