Last modified: 2014-01-03 15:34:33 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T61283, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 59283 - DBQ-27 find enwiki articles that don't exist on enwiktionary
Status: RESOLVED FIXED
Product: Tool Labs tools
Classification: Unclassified
Component: Database Queries
Version: unspecified
Hardware: All All
Importance: Unprioritized minor
Target Milestone: ---
Assigned To: Bugzilla Bug Importer (valhallasw)
Reported: 2014-01-03 15:34 UTC by Bugzilla Bug Importer (valhallasw)
Modified: 2014-01-03 15:34 UTC (History)

Description Bugzilla Bug Importer (valhallasw) 2014-01-03 15:34:25 UTC
This issue was converted from https://jira.toolserver.org/browse/DBQ-27.
Summary: find enwiki articles that don't exist on enwiktionary
Issue type: Task - A task that needs to be done.
Priority: Minor
Status: Done
Assignee: norman james vondall <norm_vondall@yahoo.com>

-------------------------------------------------------------------------------
From: Msh210 <m.hamm.1@alumni.nyu.edu>
Date: Wed, 04 Jun 2008 19:51:51
-------------------------------------------------------------------------------

Could someone please find every enwiki ns:0 article [Foo] such that (1) en.wiktionary does not have [foo]; (2) enwiki's article [Foo], if not a hard redirect, contains the word "foo" in it somewhere in lowercase; and (3) if enwiki's article [Foo] is a redirect to [Bar], then the latter contains the word "foo" in it somewhere in lowercase? (This is, in short, a way to find all words, except proper nouns, that enwiki has articles on and enwiktionary doesn't.) If the following doesn't complicate matters too much, I'd rather have an additional restriction: (4) The title "Foo" contains a space (20) in it. Thanks much. (Note that this is not strictly a ts request, as dump analysis will suffice. I just haven't found anyone willing and able to analyze the dumps.)
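
For illustration only, criteria (1) to (4) above could be sketched in Python roughly as follows. This is not code from the original request. The inputs (`articles`, `redirects`, `wiktionary_titles`) are hypothetical in-memory structures, and mapping [Foo] to [foo] by lowercasing only the first letter is an assumption about what the request intends.

```python
import re


def lower_first(title):
    """Assumed mapping from an enwiki title to the enwiktionary spelling:
    only the first character is lowercased."""
    return title[:1].lower() + title[1:]


def contains_word(text, word):
    """True if `word` occurs in `text` as a whole word, case-sensitively."""
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return re.search(pattern, text) is not None


def candidates(articles, redirects, wiktionary_titles, require_space=True):
    """Yield enwiki titles that satisfy criteria (1)-(4).

    articles          -- hypothetical dict: title -> wikitext (non-redirect ns:0 pages)
    redirects         -- hypothetical dict: redirect title -> target title
    wiktionary_titles -- hypothetical set of enwiktionary page titles
    """
    for title in sorted(set(articles) | set(redirects)):
        word = lower_first(title)
        if require_space and " " not in title:   # criterion (4)
            continue
        if word in wiktionary_titles:            # criterion (1)
            continue
        # Criteria (2) and (3): search the article itself, or the
        # redirect target when the page is a hard redirect.
        source = redirects.get(title, title)
        text = articles.get(source)
        if text is not None and contains_word(text, word):
            yield title
```

A title such as "New York" is dropped by this sketch because its text never contains "new York" in lowercase, which is the proper-noun filter the request relies on.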
Comment 1 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:34:27 UTC
-------------------------------------------------------------------------------
From: SQL <sxwiki@gmail.com>
Date: Wed, 13 Aug 2008 16:16:58
-------------------------------------------------------------------------------

...I think this may be beyond what can realistically be done. There are probably hundreds of thousands, if not millions, of articles between the two, and I'm not quite sure how we'd filter out proper nouns, without someone going over it line-by-line. Perhaps someone else has a better way to go about this, however.
Comment 2 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:34:28 UTC
-------------------------------------------------------------------------------
From: Msh210 <m.hamm.1@alumni.nyu.edu>
Date: Wed, 13 Aug 2008 17:15:47
-------------------------------------------------------------------------------

"I'm not quite sure how we'd filter out proper nouns, without someone going over it line-by-line." The original description described that, though perhaps it wasn't made clear enough: by making sure that the enwiki article has its title in its text, but in _lowercase_, you've gotten rid of most proper nouns. (And the rest can get through the filter; I don't care.)
Comment 3 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:34:30 UTC
-------------------------------------------------------------------------------
From: CBM <cbm.wikipedia@gmail.com>
Date: Sat, 16 Aug 2008 23:45:46
-------------------------------------------------------------------------------

The request requires scanning the wiki source code of each page. That cannot be done with a toolserver database query. It will have to be done with a database dump.
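
As a rough sketch of what scanning a dump could look like, the following streams pages out of a MediaWiki XML export using only the Python standard library. It assumes the export layout with `<page>`, `<title>`, `<ns>`, `<redirect title="...">` and `<revision><text>` elements; the helper names and the inline sample are made up for illustration, and dumps from other periods may differ.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace prefix, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, ns, redirect_target_or_None, text) for each <page>."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = ns = redirect = None
        text = ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = int(child.text)
            elif name == "redirect":
                redirect = child.get("title")
            elif name == "text":
                text = child.text or ""
        yield title, ns, redirect, text
        elem.clear()  # drop the page's children to limit memory use


# Tiny made-up sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>Black hole</title><ns>0</ns>
<revision><text>A black hole traps light.</text></revision></page>
<page><title>Dark star</title><ns>0</ns><redirect title="Black hole" />
<revision><text>#REDIRECT [[Black hole]]</text></revision></page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

For a real dump the same generator could be fed an open file object (for example one returned by `bz2.open`) instead of the in-memory sample.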
Comment 4 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:34:32 UTC
-------------------------------------------------------------------------------
From: MZMcBride <mzmcbride@gmail.com>
Date: Sun, 17 Aug 2008 00:16:23
-------------------------------------------------------------------------------

This is not something that can be done with the Toolserver, as CBM noted. You'll likely need to find someone with a database dump in order to do what you're trying to do.

Perhaps try asking at <http://en.wikipedia.org/wiki/Wikipedia:Bot_requests> or <http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)> ? Usually bot operators and people with database dumps overlap.

Resolved as declined.
Comment 5 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:34:33 UTC
This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: (none)
CC list: b@mzmcbride.com, cbm.wikipedia@gmail.com, sxwiki@gmail.com, msh210+wmfbugzilla@gmail.com
