Last modified: 2014-07-29 13:39:04 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T70565, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 68565 - SnakUnserializer must not fail on mismatching hashes
SnakUnserializer must not fail on mismatching hashes
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
WikidataRepo (Other open bugs)
unspecified
All All
: High critical (vote)
: ---
Assigned To: Wikidata bugs
u=dev c=backend p=8 s=2014-07-15
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-07-25 12:15 UTC by Daniel Kinzler
Modified: 2014-07-29 13:39 UTC (History)
6 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Daniel Kinzler 2014-07-25 12:15:48 UTC
SnakUnserializer will check the hash of the unserialized snak against the snak field in the serialization, and fails if they are not equal. The fail-fast approach is counter to the need to be able to read old or seemingly corrupt data.

There are two critical cases:

1) the value type used in the snak is not configured on the local system. The value would then unserialize into a UnDeserializableValue object, which would round-trip cleanly - but with the hash check, the unserialization is aborted.

2) the structure of the snak or value class changes a bit (e.g. we decide to add a field, or apply additional normalization). Even if the unserialization is backward compatible, we would still fail because the hash is now different.

Number (2) is especially bad because getHash() is implemented as sha1( serialize( $this ) ). This uses php's internal serialization, not SnakObject::serialize - so it would change even if we just re-arrange the members in a class, or rename a private field. If we change this to sha1( serialize( $this->serialize() ) ), we would have more control, but could still never add or re-arrange fields.

So, in order to be able to read legacy data, SnakUnserializer must not be picky about the hash. Perhaps an SerializationErrorMonitor of some sort could be injected to decide whether to ignore the problem, log it, or throw an exception, depending on the needs of the respective use case.
Comment 1 Daniel Kinzler 2014-07-25 12:21:55 UTC
Setting to "critical", because it blocks the switch to the new data model.
Comment 2 Jeroen De Dauw 2014-07-26 04:20:36 UTC
The hash is not part of the value, this should never have been done by the deserializer.

Killed by https://github.com/wmde/WikibaseDataModelSerialization/pull/72
Comment 3 Jeroen De Dauw 2014-07-26 04:21:21 UTC
Wow, this has 8 points? Nice. That means I can now take a day off I guess :)
Comment 4 Jeroen De Dauw 2014-07-26 05:39:37 UTC
While this fixes the issue at hand, the way the snak hash is implemented sucks. Created a ticket https://github.com/wmde/WikibaseDataModel/issues/130
Comment 5 Daniel Kinzler 2014-07-28 11:51:11 UTC
(In reply to Jeroen De Dauw from comment #3)
> Wow, this has 8 points? Nice. That means I can now take a day off I guess :)

Well, just killing it makes it a lot easier, the 8 was considering the option to log mismatching hashes if desired. But sure, take a day off, why not ;)
Comment 6 Gerrit Notification Bot 2014-07-29 12:49:08 UTC
Change 150202 had a related patch set uploaded by Thiemo Mättig (WMDE):
Add UnDeserializableValue to DataModelSerializationRoundtripTest

https://gerrit.wikimedia.org/r/150202
Comment 7 Gerrit Notification Bot 2014-07-29 12:55:53 UTC
Change 150202 merged by jenkins-bot:
Add UnDeserializableValue to DataModelSerializationRoundtripTest

https://gerrit.wikimedia.org/r/150202

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links