Last modified: 2014-11-15 06:24:41 UTC
MySQLPageGenerator says it can take a 'site' argument as either a Site object or a string (representing the dbname). If 'site' isn't given, it defaults to the current Site. In all cases, site ends up being a string, either because that's what was passed as an argument, or because the Site object is replaced by the dbname in 'site = site.dbName()'. However, a few lines later, site is expected to be a Site object again in 'query = query.encode(site.encoding())'. This throws an AttributeError: 'unicode' object has no attribute 'encoding'.
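A minimal sketch of the failure and of one possible fix, for illustration only. FakeSite, prepare_buggy and prepare_fixed are hypothetical names, not the actual pywikibot code; only dbName() and encoding() are taken from the report above.

```python
class FakeSite(object):
    """Hypothetical stand-in for a Site object (not the real class)."""

    def dbName(self):
        return "enwiki"

    def encoding(self):
        return "utf-8"


def prepare_buggy(query, site):
    # Mirrors the reported pattern: 'site' is rebound to the dbname string...
    site = site.dbName()
    # ...so this call fails, because a string has no encoding() method.
    return site, query.encode(site.encoding())


def prepare_fixed(query, site):
    # Keep the Site object in 'site' and store the database name separately.
    dbname = site.dbName()
    return dbname, query.encode(site.encoding())
```

With the two names kept apart, prepare_fixed(u"SELECT 1", FakeSite()) returns the dbname together with the encoded query, while prepare_buggy raises AttributeError for the same input. The case where 'site' is passed as a string is not covered here, since a bare dbname carries no encoding information.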
The "query.encode(site.encoding())" only makes sense if:
1) the values in the database are encoded in site.encoding, but stored in a "latin-1" [1] column as bytes (i.e. not using utf8/"utf8mb4" [2] charset/collations in mysql)
2) the communication with mysql is in latin-1 (there is no SET NAMES utf8 and character_set_client / character_set_results / character_set_connection are not set)

[1] "latin-1" as it's actually windows-1252, but MySQL calls it latin-1.

Basically, there are two pieces of relevant information:
1) what charset does mysql think the table is in (WMF: latin-1. Many other contexts: utf-8) --> run SET NAMES <XXX> for this charset
2) what charset is the data actually in (WMF: utf-8. Other contexts might use latin-1 or others) --> decode bytes we get from mysql using this charset.
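The two charsets can be illustrated without a database. This is only a sketch: the sample title is made up, and Python's "cp1252" codec is used here as an approximation of what MySQL calls latin-1 (the Python codec leaves a few byte values undefined, so it is not an exact match).

```python
# Sample value, chosen for illustration only.
title = u"Caf\u00e9"

# 2) What the data actually is: utf-8 bytes stored in the column.
stored = title.encode("utf-8")

# 1) What mysql thinks the column is: reading those bytes as
#    windows-1252 text gives mojibake instead of the original title.
mojibake = stored.decode("cp1252")

# Undoing the wrong interpretation and decoding with the real charset
# recovers the original value.
recovered = mojibake.encode("cp1252").decode("utf-8")
```

Here mojibake is u"Caf\u00c3\u00a9" and recovered equals title again, which is why the bytes coming back from mysql need to be decoded with the charset the data is really in.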
Change 173332 had a related patch set uploaded by Merlijn van Deen: Bug 73151: split use of 'site' and 'dbname' https://gerrit.wikimedia.org/r/173332