Last modified: 2013-10-08 16:47:32 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T43529, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 41529 - Special page and/or parser function to check quotations from references
Status: RESOLVED WONTFIX
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Wikidata bugs
Keywords: need-volunteer
Depends on:
Blocks:
Reported: 2012-10-30 06:50 UTC by jeblad
Modified: 2013-10-08 16:47 UTC
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description jeblad 2012-10-30 06:50:27 UTC
It is possible to check quotations from references to see whether the page a URL points to in fact contains the given quote. If the quote is not correct, the reference, and whatever it is used to validate, can be invalidated.
Comment 1 Nikola Smolenski 2012-10-30 07:27:50 UTC
IMO, this would be best done by bots. Quotes are often modified from the original (for example I could quote you as "jeblad@gmail.com: If not a correct quote the reference [...] can be invalidated." ) and even if genuinely no longer present, human intervention is required (for example, if you are referencing web content the page might have moved or it may still be in the Internet Archive). Also, it could be used on Wikipedia and is not Wikidata-specific.
Comment 2 jeblad 2012-10-30 11:14:15 UTC
I have a more complete description somewhere, but for Wikipedia it can be implemented as a "quote" tag function that also takes a URL to the referenced site. In Wikidata it would be part of the reference object.

An easy way to do it is to first assume the quote is correct, but push a job to the job queue if the result doesn't already exist in memcached. If it exists in memcached, the quote can be marked as valid or invalid right away. The result is cached in memcached for a day or two, after which a new job is generated. When the job runs, it checks the external site.
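A minimal sketch of the cache-first flow above, assuming a plain dict with expiry times stands in for memcached and a deque stands in for the MediaWiki job queue. The class name, `fetch_and_match` callback and two-day TTL are hypothetical, chosen only for illustration:

```python
import time
from collections import deque
from typing import Callable, Dict, Tuple

# Hypothetical TTL: the comment suggests caching results "for a day or two".
CACHE_TTL_SECONDS = 2 * 24 * 60 * 60


class QuoteChecker:
    """Cache-first quote check backed by a job queue (illustrative only).

    A dict stands in for memcached and a deque for the job queue; both
    are assumptions, not MediaWiki APIs.
    """

    def __init__(self, fetch_and_match: Callable[[str, str], bool],
                 now: Callable[[], float] = time.time):
        # Hypothetical callback: fetches the URL and tests whether the quote is there.
        self.fetch_and_match = fetch_and_match
        self.now = now
        self.cache: Dict[Tuple[str, str], Tuple[bool, float]] = {}  # key -> (valid, expires_at)
        self.queue: deque = deque()
        self.queued: set = set()

    def check(self, url: str, quote: str) -> bool:
        key = (url, quote)
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.now():
            return entry[0]  # fresh cached verdict: valid or invalid right away
        # No fresh verdict: assume the quote is correct for now and schedule
        # a job, unless one for this key is already pending.
        if key not in self.queued:
            self.queue.append(key)
            self.queued.add(key)
        return True

    def run_jobs(self) -> None:
        # Each job checks the external site and caches the verdict with a TTL.
        while self.queue:
            key = self.queue.popleft()
            self.queued.discard(key)
            valid = self.fetch_and_match(*key)
            self.cache[key] = (valid, self.now() + CACHE_TTL_SECONDS)
```

The first check of a new quote always returns "valid" and queues at most one job. Once the job has run, later checks get the cached verdict until it expires, and then a fresh job is queued.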

There should be a small set of markers that act as wildcards during testing, mostly just square brackets (which may need localization) that can contain anything. During matching they will be replaced by a non-greedy dot-star (.*?).
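One way to implement the wildcard rule is to split the quote on bracketed segments, escape the literal parts, and join them with `.*?`. The sketch below assumes any `[...]` segment, whatever it contains, counts as a wildcard. It also adds whitespace normalization, which the comment does not mention. The function names are hypothetical:

```python
import re


def quote_to_pattern(quote: str) -> "re.Pattern[str]":
    # Normalize whitespace so line breaks in the source page don't break matches
    # (an assumption, not part of the original proposal).
    quote = " ".join(quote.split())
    # Every bracketed segment such as "[...]" becomes a wildcard; literal text is escaped.
    parts = re.split(r"\[[^\]]*\]", quote)
    regex = ".*?".join(re.escape(part) for part in parts)
    return re.compile(regex, re.DOTALL)


def quote_matches(quote: str, page_text: str) -> bool:
    # True if the (wildcarded) quote occurs anywhere in the page text.
    page_text = " ".join(page_text.split())
    return quote_to_pattern(quote).search(page_text) is not None
```

For example, the shortened quote from comment 1, "If not a correct quote the reference [...] can be invalidated.", matches the original description because `[...]` absorbs the omitted words.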

The requested page will also need some cleanup, but it seems that a fairly simple regex-based scrubbing will be sufficient. Getting the raw text from a page (screen scraping) isn't uncommon for bots, and it is fairly simple.
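A rough sketch of the regex-based scrubbing described above. It assumes the page is already fetched as an HTML string, and the function name is hypothetical. A real HTML parser would be more robust, but this matches the "simple regex" approach proposed here:

```python
import html
import re


def scrub_html(page: str) -> str:
    # Drop script and style blocks entirely, including their contents.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", page)
    # Drop HTML comments.
    text = re.sub(r"(?s)<!--.*?-->", " ", text)
    # Replace remaining tags with a space so adjacent words don't fuse together.
    text = re.sub(r"<[^>]+>", " ", text)
    # Decode entities such as &amp; and &nbsp;.
    text = html.unescape(text)
    # Collapse runs of whitespace (including non-breaking spaces) to single spaces.
    return " ".join(text.split())
```

The scrubbed text can then be searched for the quote with wildcard matching.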
Comment 3 jeblad 2012-10-30 11:22:57 UTC
And yes, some parts of the code can be shared with an extension for Wikipedia. ;)
Comment 4 Lydia Pintscher 2013-10-08 16:47:32 UTC
I don't see us doing this. Let's close it to get the number of bugs down.
