Last modified: 2013-03-26 13:16:12 UTC
http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rcprop=title%7Cids%7Csizes%7Cflags%7Cuser%7Ccomment%7Ctimestamp%7Cloginfo&rclimit=200&format=xml
The above link now returns an error page. Removing "%7Cloginfo" makes it return XML again. We rely on loginfo to get full updates of a MediaWiki site, so currently nothing works.
https://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rcprop=title%7Cids%7Csizes%7Cflags%7Cuser%7Ccomment%7Ctimestamp%7Cloginfo&rclimit=200&format=xmlfm
^ that works fine. So certainly it's not "nothing works"
The problem is an XML parsing error. The offending bit appears to be:

... logid="44650407" logtype="articlefeedbackv5" logaction="helpful" 4::feedbackId="348221" 5::pageId="34684163" ...

The 4::feedbackId and 5::pageId are invalid attribute names (I'm not sure offhand whether it's the double colon or the initial digit that's the problem).

These really shouldn't be in the output; not sure how it happens.
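To see the failure outside the API, here is a minimal Python sketch (not MediaWiki code) that feeds a trimmed copy of the offending element to the standard-library parser. Under XML 1.0 a name may not start with a digit, so the leading 4 and 5 are already enough to make the output ill-formed. The helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Trimmed copy of the element quoted above.
bad = ('<rc logid="44650407" logtype="articlefeedbackv5" logaction="helpful" '
       '4::feedbackId="348221" 5::pageId="34684163" />')
# Same element with the numbered parameters left out.
good = '<rc logid="44650407" logtype="articlefeedbackv5" logaction="helpful" />'

print(is_well_formed(bad))   # False
print(is_well_formed(good))  # True
```

This matches what was observed: the response is fine once the loginfo attributes are not present.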
Note that the original link no longer fails because the offending item has scrolled off. At this moment rclimit=2000 still shows it.
(In reply to comment #2)
> Problem is XML parsing error. The offending bit appears to be:
>
> ... logid="44650407" logtype="articlefeedbackv5" logaction="helpful"
> 4::feedbackId="348221" 5::pageId="34684163" ...
>
> the 4::feedbackId and 5::pageId are invalid attribute names (I'm not sure
> whether it's the double colon or the initial digit offhand that's the prob).
>
> These really shouldn't be in the output, not sure how it happens.

https://gerrit.wikimedia.org/r/#/c/23380/
https://gerrit.wikimedia.org/r/#/c/23575/
(In reply to comment #1)
> https://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rcprop=title%7Cids%7Csizes%7Cflags%7Cuser%7Ccomment%7Ctimestamp%7Cloginfo&rclimit=200&format=xmlfm
>
> ^ that works fine. So certainly it's not "nothing works"

By "nothing works" I mean on our side. We have a monitor process that polls the API to get every update on all Wikipedia sites, then crawls the updated pages and updates our internal storage. At first the bug only happened on EN Wikipedia; now all Wikipedia sites suffer from it. So our system is completely down.
(In reply to comment #2)
> Problem is XML parsing error. The offending bit appears to be:
>
> ... logid="44650407" logtype="articlefeedbackv5" logaction="helpful"
> 4::feedbackId="348221" 5::pageId="34684163" ...
>
> the 4::feedbackId and 5::pageId are invalid attribute names (I'm not sure
> whether it's the double colon or the initial digit offhand that's the prob).
>
> These really shouldn't be in the output, not sure how it happens.

Yeah I'm not quite sure where this is coming from, *something* is giving us bad data. I suppose it would be ArticleFeedbackv5, I'll look into that.
(In reply to comment #6)
> Yeah I'm not quite sure where this is coming from, *something* is giving us bad
> data. I suppose it would be ArticleFeedbackv5, I'll look into that.

Confirmed it's AFTv5's fault, talked to Mathias and he said he'd fix it later today.
Confirmed.
(In reply to comment #8)
> Confirmed.

What is the current status? We can't get updates from Wikipedia because of this bug.
Seems https://gerrit.wikimedia.org/r/#/c/23623/ fixed this bug.
(In reply to comment #10)
> Seems https://gerrit.wikimedia.org/r/#/c/23623/ fixed this bug.

Is this code in use now? I could still see 4::feedbackId 1 hour ago. For example:

<rc type="log" ns="-1" title="Special:ArticleFeedbackv5/The Philadelphia Inquirer/254606" rcid="527746847" pageid="0" revid="0" old_revid="0" user="Medvedenko" oldlen="0" newlen="0" timestamp="2012-09-14T03:13:48Z" comment="" logid="44672742" logtype="articlefeedbackv5" logaction="unhelpful" 4::feedbackId="254606" 5::pageId="102952" />
Added permanent URL:
https://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rcprop=title|ids|sizes|flags|user|comment|timestamp|loginfo&rclimit=20&format=xml&rcstart=2012-09-14T03:13:48Z

There will be stray log entries left that need to be dealt with, even when that fix is live. ApiFormatXml should check that all the attribute keys are valid before simply outputting them.
A little more information from <http://msdn.microsoft.com/en-us/library/ms256152.aspx>:

"Like element names, attribute names are case-sensitive and must start with a letter or underscore. The rest of the name can contain letters, digits, hyphens, underscores, and periods."

I think the best solution is to prefix every name that does not start with a letter or underscore with an underscore, and to replace every special character with an underscore.
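A rough Python sketch of that proposal, for illustration only. It is not the ApiFormatXml code, sanitize_attribute_name is a hypothetical helper, and it only considers the ASCII characters named in the MSDN quote (the XML specification itself allows a wider set).

```python
import re

# Anything other than ASCII letters, digits, hyphens, underscores and periods.
_INVALID_CHAR = re.compile(r'[^A-Za-z0-9_.-]')
# A name must start with a letter or an underscore.
_VALID_START = re.compile(r'[A-Za-z_]')


def sanitize_attribute_name(name):
    """Turn an arbitrary key into a conservative XML attribute name."""
    # Replace every special character with an underscore.
    cleaned = _INVALID_CHAR.sub('_', name)
    # Prefix with an underscore if the first character is not a valid start.
    if not cleaned or not _VALID_START.match(cleaned):
        cleaned = '_' + cleaned
    return cleaned


print(sanitize_attribute_name('4::feedbackId'))  # _4__feedbackId
print(sanitize_attribute_name('5::pageId'))      # _5__pageId
print(sanitize_attribute_name('logid'))          # logid
```

One caveat with this approach: distinct keys can collapse to the same name (for example "a:b" and "a_b" both become "a_b"), so a real implementation would have to decide how to handle collisions.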
*** Bug 40299 has been marked as a duplicate of this bug. ***
(In reply to comment #11)
> (In reply to comment #10)
> > Seems https://gerrit.wikimedia.org/r/#/c/23623/ fixed this bug.
>
> Is this code in use now? I can still saw 4::feedbackId 1 hour ago. For example:
>
> <rc type="log" ns="-1" title="Special:ArticleFeedbackv5/The Philadelphia
> Inquirer/254606" rcid="527746847" pageid="0" revid="0" old_revid="0"
> user="Medvedenko" oldlen="0" newlen="0" timestamp="2012-09-14T03:13:48Z"
> comment="" logid="44672742" logtype="articlefeedbackv5" logaction="unhelpful"
> 4::feedbackId="254606" 5::pageId="102952" />

The fix is now live, not sure why Roan didn't do it before. We need a script to clean up these bad entries. I thought re-running loggingUpdate.php would've fixed them.
Sam: loggingUpdate should indeed have fixed them, and from quickly skimming the entries in the logging table, they seem fine now. E.g.:
a:2:{s:10:"feedbackId";i:67680;s:6:"pageId";i:29219160;}
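The value above is in PHP's serialize() format. For anyone who wants to spot-check such entries outside PHP, here is a small Python sketch. It is an assumption-laden helper, not part of MediaWiki: parse_flat_log_params is a made-up name, and it only understands flat arrays with string keys and integer values like the example.

```python
import re

# Matches one  s:<len>:"<key>";i:<int>;  pair of a flat serialized array.
_PAIR = re.compile(r's:(\d+):"([^"]*)";i:(-?\d+);')


def parse_flat_log_params(serialized):
    """Parse string-key/integer-value pairs from a flat PHP-serialized array."""
    params = {}
    for declared_len, key, value in _PAIR.findall(serialized):
        # PHP records the byte length of each string; verify it as a sanity check.
        if int(declared_len) != len(key.encode('utf-8')):
            raise ValueError('length mismatch for key %r' % key)
        params[key] = int(value)
    return params


params = parse_flat_log_params(
    'a:2:{s:10:"feedbackId";i:67680;s:6:"pageId";i:29219160;}')
print(params)  # {'feedbackId': 67680, 'pageId': 29219160}

# Every key now starts with a letter, so it is usable as an XML attribute name.
print(all(re.match(r'[A-Za-z_]', key) for key in params))  # True
```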
(In reply to comment #16)
> Sam: loggingUpdate should indeed have fixed them, and quickly skimming the
> entries in logging-table, they seem fine now. E.g.:
> a:2:{s:10:"feedbackId";i:67680;s:6:"pageId";i:29219160;}

Must be squid caching or similar...
Note: we might also want to update the documentation (http://www.mediawiki.org/wiki/Manual:Logging_to_Special:Log); it still encourages parameter numbering ("// Parameter numbering should start from 4.").
It seems the history data has not been fixed correctly:
https://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rcprop=title|ids|sizes|flags|user|comment|timestamp|loginfo&rclimit=20&format=xml&rcstart=2012-09-14T03:13:48Z
Just looked at the data in the db and it's just fine in there:

mysql> SELECT log_params FROM logging WHERE log_id = 44672742;
+---------------------------------------------------------+
| log_params                                              |
+---------------------------------------------------------+
| a:2:{s:10:"feedbackId";i:254606;s:6:"pageId";i:102952;} |
+---------------------------------------------------------+
1 row in set (0.00 sec)

Still some cache persisting, apparently.