Last modified: 2014-11-13 14:18:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T47925, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 45925 - Trim spaces around statements of string type
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: High normal with 1 vote
Target Milestone: ---
Assigned To: Wikidata bugs
URL: https://www.wikidata.org/w/index.php?...
Whiteboard: u=dev c=backend p=0
Keywords:
Depends on:
Blocks:

Reported: 2013-03-09 09:04 UTC by Raimond Spekking
Modified: 2014-11-13 14:18 UTC (History)
Description Raimond Spekking 2013-03-09 09:04:58 UTC
I suggest trimming leading/trailing spaces around statements of string type.

Due to a copy-and-paste error I added some trailing spaces to a VIAF statement and had to remove them with an extra edit, see URL.
Comment 1 Mushroom 2014-04-04 14:44:03 UTC
The "malformed input" error is thrown by the RegexValidator in the buildStringType function of WikibaseDataTypeBuilders. It often happens when copy-pasting text from other sources (see http://www.wikidata.org/wiki/Wikidata:Project_chat#Error when adding commonscat), and it is made worse by bug 63301, which sometimes adds newlines when pressing return to save the claim, thus triggering the error.

The trimming was previously done in ValueView but it has since been removed (see http://github.com/wmde/ValueView/commit/0a8350999bb1ee028db9487286173aee0b20640f). We could use the StringNormalizer class to trim the string and also remove incomplete UTF-8 sequences (see related bug 50486). However, I'm not familiar with the data value processing code, so I'm not sure where the correct place to do it is.
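
To make the proposal concrete, here is a minimal sketch in Python (not the PHP that Wikibase uses) of trimming a value and dropping incomplete UTF-8 sequences before it reaches a whitespace-rejecting validator. The function names and the regular expression are hypothetical stand-ins; they are not the actual StringNormalizer or RegexValidator code.

```python
import re

# Hypothetical pattern: the value must start and end with a
# non-whitespace character. This only approximates the kind of check
# that produces "malformed input"; it is not the real Wikibase regex.
_NO_SURROUNDING_WHITESPACE = re.compile(r"\S(.*\S)?", re.DOTALL)


def is_well_formed(value: str) -> bool:
    """Return True if the value has no leading or trailing whitespace."""
    return _NO_SURROUNDING_WHITESPACE.fullmatch(value) is not None


def normalize_raw(raw: bytes) -> str:
    """Decode raw input, dropping invalid or incomplete UTF-8 byte
    sequences, then trim leading and trailing whitespace."""
    text = raw.decode("utf-8", errors="ignore")
    return text.strip()


pasted = b"  113230702 \n"  # value with stray whitespace from copy-paste
print(is_well_formed(pasted.decode("utf-8")))   # rejected as pasted
print(is_well_formed(normalize_raw(pasted)))    # accepted once normalized
```

Under these assumptions, normalizing before validation would let the pasted value through in a single edit instead of failing.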
Comment 3 Adrian Lang 2014-04-07 08:48:12 UTC
Just to be clear: ValueView did not trim the value, it just checked for empty or whitespace-only strings.
Comment 4 Henning 2014-10-29 07:47:37 UTC
At the time of writing, the current behaviour seems to be: When a value with a leading space character is submitted, the back-end parser will return a "malformed input" error.
I wonder whether silent trimming should be implemented in the back-end parser rather than in the front-end.
Can some authority please decide on that?
Comment 5 Lydia Pintscher 2014-11-03 18:06:10 UTC
IIRC, when I discussed this with Daniel, his comment was that the parser shouldn't do anything to the string but only parse it. That makes sense to me.
If we only do it in the frontend, then untrimmed values could still get in via the API, for example. That seems sub-optimal.

Daniel: Can you chime in on the pros and cons of doing this in the backend please?
Comment 6 Daniel Kinzler 2014-11-13 14:04:00 UTC
I said that? Hm, then I have to disagree with myself there... The parsevalue API module (resp. StringValueParser) should trim input (and apply utf8 normalization).
Comment 7 Daniel Kinzler 2014-11-13 14:18:11 UTC
Hm... apparently, StringValueParser does not create StringValues from strings. We seem to use the NullParser for this, which returns an UnknownValue. Whatever.

So, let's have an actual StringValueParser that applies normalization.
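
A rough sketch of what such a parser could look like, written in Python for illustration only. The class and method names mirror the ones mentioned in this thread but are hypothetical, as is ParseError. Reading "normalization" as Unicode NFC plus whitespace trimming is an assumption, and so is rejecting empty results.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class StringValue:
    """Illustrative stand-in for a string data value."""
    value: str


class ParseError(ValueError):
    """Raised when the input cannot be turned into a StringValue."""


class StringValueParser:
    """Hypothetical parser that normalizes input before wrapping it."""

    def parse(self, raw: str) -> StringValue:
        if not isinstance(raw, str):
            raise ParseError("expected a string")
        # Assumption: "normalization" means Unicode NFC, then trimming.
        normalized = unicodedata.normalize("NFC", raw).strip()
        if not normalized:
            # Assumption: empty or whitespace-only input is rejected.
            raise ParseError("empty value")
        return StringValue(normalized)


parser = StringValueParser()
print(parser.parse("  113230702 \n"))
```

Doing this in the parser rather than in the front-end means that values submitted through the API would pass through the same normalization as values entered in the UI.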
