Last modified: 2014-09-05 12:27:27 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T59458, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 57458 - '\n' are added to various elements in CommonsMetadata output
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CommonsMetadata
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Tisza Gergő
Duplicates: 66652, 69497
Reported: 2013-11-22 23:38 UTC by Jean-Fred
Modified: 2014-09-05 12:27 UTC (History)
Comment 1 Jean-Fred 2013-11-23 00:43:46 UTC
Looking around further, '\n' characters are added to several values:

See <https://commons.wikimedia.org/wiki/Special:ApiSandbox#action=query&prop=imageinfo&format=json&iiprop=extmetadata&iilimit=10&titles=File%3ACompans%20lake%20-%20Anas%20platyrhynchos%2007.JPG> :

"Credit": {
    "value": "\nSelf-photographed",
    "source": "commons-desc-page",
    "hidden": ""
},
"LicenseUrl": {
    "value": "http://creativecommons.org/licenses/by-sa/3.0\n",
    "source": "commons-desc-page",
    "hidden": ""
},
"LicenseShortName": {
    "value": "CC-BY-SA-3.0\n",
    "source": "commons-desc-page",
    "hidden": ""
},
"UsageTerms": {
    "value": "Creative Commons Attribution-Share Alike 3.0\n",
    "source": "commons-desc-page",
    "hidden": ""
},
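
The affected fields can also be spotted mechanically on the consumer side. The following is a minimal Python sketch, not part of CommonsMetadata: the function name `fields_with_stray_whitespace` is made up for illustration, and the sample dictionary simply copies the values quoted above.

```python
def fields_with_stray_whitespace(extmetadata):
    """Return the names of fields whose value has leading or trailing whitespace."""
    affected = []
    for name, field in extmetadata.items():
        value = field.get("value")
        # Only string values can carry stray whitespace.
        if isinstance(value, str) and value != value.strip():
            affected.append(name)
    return affected


# Values copied from the API response quoted in this comment.
sample = {
    "Credit": {
        "value": "\nSelf-photographed",
        "source": "commons-desc-page",
        "hidden": "",
    },
    "LicenseUrl": {
        "value": "http://creativecommons.org/licenses/by-sa/3.0\n",
        "source": "commons-desc-page",
        "hidden": "",
    },
    "LicenseShortName": {
        "value": "CC-BY-SA-3.0\n",
        "source": "commons-desc-page",
        "hidden": "",
    },
    "UsageTerms": {
        "value": "Creative Commons Attribution-Share Alike 3.0\n",
        "source": "commons-desc-page",
        "hidden": "",
    },
}

print(fields_with_stray_whitespace(sample))
```

With the sample above, all four fields should be reported, since each value either starts or ends with a newline.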
Comment 2 Gerrit Notification Bot 2013-11-26 15:47:39 UTC
Change 97743 had a related patch set uploaded by Gergő Tisza:
Trim HTML-based metadata values

https://gerrit.wikimedia.org/r/97743
Comment 3 Gerrit Notification Bot 2013-12-10 15:27:04 UTC
Change 97743 abandoned by Gergő Tisza:
Trim HTML-based metadata values

Reason:
Abandoning this change since InformationParser has been completely rewritten in the meantime.

https://gerrit.wikimedia.org/r/97743
Comment 4 Gerrit Notification Bot 2014-03-25 21:49:35 UTC
Change 120948 had a related patch set uploaded by Gergő Tisza:
Clean parsed HTML

https://gerrit.wikimedia.org/r/120948
Comment 5 Gerrit Notification Bot 2014-03-27 09:47:19 UTC
Change 120948 merged by jenkins-bot:
Clean parsed HTML

https://gerrit.wikimedia.org/r/120948
Comment 6 Lokal_Profil 2014-05-21 08:52:09 UTC
This issue is occurring again. See e.g. https://commons.wikimedia.org/w/api.php?action=query&prop=imageinfo&format=json&iiprop=commonmetadata|extmetadata&iilimit=1&titles=File%3ALandsort%20Lighthouse%20August%202013%2009.jpg

where
"LicenseShortName": {
    "value": "CC-BY-SA-3.0\n",
    "source": "commons-desc-page",
    "hidden": ""
},
"UsageTerms": {
    "value": "Creative Commons Attribution-Share Alike 3.0\n",
    "source": "commons-desc-page",
    "hidden": ""
},
"LicenseUrl": {
    "value": "http://creativecommons.org/licenses/by-sa/3.0\n",
    "source": "commons-desc-page",
    "hidden": ""
},
Comment 7 Lokal_Profil 2014-08-13 12:00:30 UTC
Looking at the HTML source of the example above [1], there is no trace of these newline characters. So this might not be a cleaning/trimming issue in the TemplateParser; perhaps the newlines are inserted by it?

[1] https://commons.wikimedia.org/wiki/File:Landsort_Lighthouse_August_2013_09.jpg
Comment 8 Tisza Gergő 2014-08-14 13:56:08 UTC
*** Bug 69497 has been marked as a duplicate of this bug. ***
Comment 9 Lupo 2014-08-14 13:58:39 UTC
As stated in bug 69497, these newlines are in the license template, and the code doing the HTML scraping there had better remove them.
Comment 10 Tisza Gergő 2014-08-14 14:05:19 UTC
The code that removes them is in https://gerrit.wikimedia.org/r/#/c/120948/1/TemplateParser.php which at a glance seems correct to me. Also, Lokal_Profil is right that the newline is not always present in the HTML code. I'll test locally with the examples mentioned here.
Comment 11 Lupo 2014-08-14 14:41:31 UTC
This code does _not_ look good. '/^\s+(.*)\s+$/' is wrong. It fails to trim if there are no leading blanks (or no trailing blanks). And watch out for the greedy (.*), that also looks wrong.
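
The failure modes described in this comment can be reproduced outside the extension. The sketch below uses Python's `re` module rather than the PHP code in TemplateParser.php; for these particular inputs the two regex engines should behave the same way, but that is an assumption. The `FIXED` pattern is only one plausible replacement and is not claimed to be what was eventually merged; the helper names are made up for illustration.

```python
import re

# The pattern quoted in the comment: it needs BOTH leading and trailing whitespace.
BROKEN = re.compile(r"^\s+(.*)\s+$")
# A hypothetical alternative: strip leading or trailing whitespace independently.
FIXED = re.compile(r"^\s+|\s+$")


def trim_broken(value):
    # Replaces the whole match with the captured group; a no-op when nothing matches.
    return BROKEN.sub(r"\1", value)


def trim_fixed(value):
    # Deletes each whitespace run found at the start or at the end of the string.
    return FIXED.sub("", value)


for raw in ["CC-BY-SA-3.0\n", "\nSelf-photographed", "  both  "]:
    print(repr(raw), "->", repr(trim_broken(raw)), "vs", repr(trim_fixed(raw)))
```

The first two inputs pass through `trim_broken` unchanged because one of the two `\s+` parts has nothing to match. For `"  both  "` the greedy `(.*)` swallows all but the last trailing space, so the result is `"both "` rather than `"both"`.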
Comment 12 Lupo 2014-08-14 14:48:16 UTC
(In reply to Tisza Gergő from comment #10)
> Also, Lokal_Profil is right that the newline is
> not always present in the HTML code. I'll test locally with the examples
> mentioned here.

Not correct. See

https://commons.wikimedia.org/w/api.php?action=query&prop=imageinfo&iiprop=extmetadata&format=jsonfm&titles=File:Landsort_Lighthouse_August_2013_09.jpg

Returns the same trailing newlines for UsageTerms and LicenseUrl.
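
For anyone re-running this check against other files, the query URL can be assembled from its parameters. This is a small Python sketch using only the standard library; the helper name `extmetadata_url` is hypothetical, and the parameters are the ones visible in the URL above. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def extmetadata_url(title, fmt="json"):
    """Build an imageinfo query URL that asks for extmetadata of one file."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "format": fmt,
        "titles": title,
    }
    # urlencode percent-encodes reserved characters such as ':' in the title.
    return API_ENDPOINT + "?" + urlencode(params)


print(extmetadata_url("File:Landsort_Lighthouse_August_2013_09.jpg"))
```

Passing `fmt="jsonfm"` would give the pretty-printed variant used in the link above.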
Comment 13 Gerrit Notification Bot 2014-08-23 16:14:59 UTC
Change 155901 had a related patch set uploaded by TheDJ:
TemplateParser: Fix whitespace trim

https://gerrit.wikimedia.org/r/155901
Comment 14 Gerrit Notification Bot 2014-08-23 16:40:27 UTC
Change 155901 merged by jenkins-bot:
TemplateParser: Fix whitespace trim

https://gerrit.wikimedia.org/r/155901
Comment 15 Tisza Gergő 2014-08-23 16:44:08 UTC
(In reply to Lupo from comment #11)
> This code does _not_ look good. '/^\s+(.*)\s+$/' is wrong. It fails to trim
> if there are no leading blanks (or no trailing blanks). And watch out for
> the greedy (.*), that also looks wrong.

D'oh, that was stupid. Thanks for fixing, Lupo & TheDJ!
Comment 16 Tisza Gergő 2014-09-05 12:27:27 UTC
*** Bug 66652 has been marked as a duplicate of this bug. ***
