Last modified: 2014-11-15 06:24:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. Logging in is not possible, and apart from displaying bug reports and their history, links may be broken. See T75151, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 73151 - Site object becomes a string in MySQLPageGenerator, throws error
Status: PATCH_TO_REVIEW
Product: Pywikibot
Classification: Unclassified
Component: pagegenerators
Version: core-(2.0)
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Pywikipedia bugs
Reported: 2014-11-07 19:35 UTC by Guillaume Paumier
Modified: 2014-11-15 06:24 UTC (History)

Description Guillaume Paumier 2014-11-07 19:35:45 UTC
MySQLPageGenerator says it can take a 'site' argument as either a Site object or a string (that represents the dbname). If 'site' isn't given, it defaults to the current Site.

In all cases, site ends up being a string, either because that's what was passed as an argument, or because the Site object is replaced by the dbname in 'site = site.dbName()'.

However, a few lines later, site is expected to be a Site object again in 'query = query.encode(site.encoding())'.

This throws an AttributeError: 'unicode' object has no attribute 'encoding'.
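
The failure pattern described above can be sketched in isolation. This is a minimal reproduction, not the actual pywikibot code: FakeSite and build_query_buggy are hypothetical stand-ins, and only the method names dbName() and encoding() come from the report. It is written for Python 3, so the message names 'str' rather than 'unicode'.

```python
class FakeSite:
    """Hypothetical stand-in for a pywikibot Site object (assumption)."""

    def dbName(self):
        return "enwiki"

    def encoding(self):
        return "utf-8"


def build_query_buggy(query, site):
    """Mimic the reported pattern: 'site' is rebound to a string, then
    used as if it were still a Site object."""
    if not isinstance(site, str):
        site = site.dbName()  # 'site' is now the dbname string
    # From here on 'site' is always a str, so .encoding() cannot exist.
    return query.encode(site.encoding())


if __name__ == "__main__":
    for arg in (FakeSite(), "enwiki"):
        try:
            build_query_buggy("SELECT 1", arg)
        except AttributeError as err:
            print("AttributeError:", err)
```

Both call styles (Site object or dbname string) fail in this sketch, which matches the "in all cases" observation in the description.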
Comment 1 Merlijn van Deen (test) 2014-11-07 19:48:40 UTC
The "query.encode(site.encoding())" only makes sense if:

1) the values in the database are encoded in site.encoding, but stored in a "latin-1" [1] column as bytes (i.e. not using utf8/"utf8mb4" [2] charset/collations in mysql)
2) the communication with mysql is in latin-1 (there is no SET NAMES utf8 and character_set_client / character_set_results / character_set_connection are not set)

[1] "latin-1" as it's actually windows-1252, but MySQL calls it latin-1.

Basically, there are two pieces of relevant information:
1) what charset does mysql think the table is in (WMF: latin-1. Many other contexts: utf-8)
   --> run SET NAMES <XXX> for this charset
2) what charset is the data actually in (WMF: utf-8. Other contexts might use latin-1 or others)
   --> decode bytes we get from mysql using this charset.
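
The two pieces of information above can be illustrated without a database. This is a rough sketch under assumptions: set_names_statement and decode_value are hypothetical helpers, "latin1" is used as MySQL's own name for the charset, and Python's "cp1252" codec stands in for what MySQL calls latin-1, as footnote [1] suggests.

```python
def set_names_statement(table_charset):
    """Build the statement for piece 1: talk to MySQL in the charset it
    believes the table uses (hypothetical helper)."""
    return "SET NAMES " + table_charset


def decode_value(raw_bytes, data_charset):
    """Piece 2: decode the bytes returned by MySQL using the charset the
    data is actually in (hypothetical helper)."""
    return raw_bytes.decode(data_charset)


# WMF-like case from the comment: column declared latin-1, data really UTF-8.
title = "Café"
stored = title.encode("utf-8")           # bytes as they sit in the column

statement = set_names_statement("latin1")
correct = decode_value(stored, "utf-8")  # decode with the real data charset
garbled = decode_value(stored, "cp1252") # decode with the declared charset

if __name__ == "__main__":
    print(statement)
    print(repr(correct), repr(garbled))
```

Decoding with the declared charset instead of the actual one yields mojibake, which is why the two charsets need to be tracked separately.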
Comment 2 Gerrit Notification Bot 2014-11-14 19:05:35 UTC
Change 173332 had a related patch set uploaded by Merlijn van Deen:
Bug 73151: split use of 'site' and 'dbname'

https://gerrit.wikimedia.org/r/173332
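
One way to read the patch title is sketched below. This is not the content of change 173332, only an illustration of keeping 'site' and 'dbname' as separate names; FakeSite and prepare_query are hypothetical, and the optional dbname parameter is an assumption.

```python
class FakeSite:
    """Hypothetical stand-in for a pywikibot Site object (assumption)."""

    def dbName(self):
        return "enwiki"

    def encoding(self):
        return "utf-8"


def prepare_query(query, site, dbname=None):
    """Keep 'site' a Site object throughout; hold the database name in its
    own variable instead of overwriting 'site'."""
    if dbname is None:
        dbname = site.dbName()
    # 'site' is still a Site object here, so .encoding() is available.
    return dbname, query.encode(site.encoding())


if __name__ == "__main__":
    print(prepare_query("SELECT 1", FakeSite()))
    print(prepare_query("SELECT 1", FakeSite(), dbname="dewiki"))
```

With the two names separated, the later call to site.encoding() no longer depends on how the database name was supplied.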
