Last modified: 2014-01-03 15:51:57 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T61351, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 59351 - DBQ-99 Quality information from 100 articles
DBQ-99 Quality information from 100 articles
Status: RESOLVED FIXED
Product: Tool Labs tools
Classification: Unclassified
Database Queries (Other open bugs)
unspecified
All All
: Unprioritized trivial
: ---
Assigned To: Bugzilla Bug Importer (valhallasw)
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-01-03 15:51 UTC by Bugzilla Bug Importer (valhallasw)
Modified: 2014-01-03 15:51 UTC (History)
0 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:35 UTC
This issue was converted from https://jira.toolserver.org/browse/DBQ-99.
Summary: Quality information from 100 articles
Issue type: Task - A task that needs to be done.
Priority: Trivial
Status: Done
Assignee: (none)

-------------------------------------------------------------------------------
From: ACNiklas  <wiki@bietenbeck.net>
Date: Thu, 26 Aug 2010 11:55:49
-------------------------------------------------------------------------------

I have a list of 100 articles (the 100 most viewed articles in medicine) and want to extract the following informations:

  * last edit article
  * size of article
  * size discussion
  * number of editors article

Please help me write a sql-query as I don't know anything about the structure of wikipedia's tables. 

I can supply the list of articles in probably any format. The result is also fine in any format.

It would be nice to store the query, so I can modify and rerun it later on.

Thanks,

Andreas
Comment 1 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:37 UTC
-------------------------------------------------------------------------------
From: DaB. <dab@ts.wikimedia.org>
Date: Thu, 26 Aug 2010 12:02:24
-------------------------------------------------------------------------------

> SELECT page_title,page_len,rev_timestamp FROM page JOIN revision ON page_latest=rev_id WHERE page_namespace=0 AND page_title="XXXX" LIMIT 1;

for size-of-article and last-edit.
Comment 2 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:38 UTC
-------------------------------------------------------------------------------
From: DaB. <dab@ts.wikimedia.org>
Date: Thu, 26 Aug 2010 12:03:46
-------------------------------------------------------------------------------

> SELECT page_title,page_len,rev_timestamp FROM page JOIN revision ON page_latest=rev_id WHERE page_namespace=1 AND page_title="XXXX" LIMIT 1;

for the size-of-discusssion and last-edit-on-discussion.
Comment 3 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:40 UTC
-------------------------------------------------------------------------------
From: DaB. <dab@ts.wikimedia.org>
Date: Thu, 26 Aug 2010 12:08:39
-------------------------------------------------------------------------------

> SELECT page_title,COUNT![][1] AS number FROM (SELECT DISTINCT page_title,rev_user_text FROM page JOIN revision ON page_id=rev_page WHERE page_namespace=0 AND page_title="XXXX") AS dummy;

for the number-of-editors. You can change 

> AND page_title="XXXX"

to 

> AND page_title="XXXX" AND rev_user>0

to remove non-login-users.

   [1]: https://jira.toolserver.org/images/icons/emoticons/star_yellow.gif
Comment 4 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:41 UTC
-------------------------------------------------------------------------------
From: ACNiklas  <wiki@bietenbeck.net>
Date: Thu, 26 Aug 2010 20:28:01
-------------------------------------------------------------------------------

Thanks for the quick reply. I suppose I can substitute 

AND page_title="XXXX"

with

AND page_title IN ('article001', 'article002'...).

Is the page_title URLEncoded (f%C3%BCr instead of für) or can I use all characters as they appear in wikipedia's titles?

Thanks again,

Andreas
Comment 5 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:43 UTC
-------------------------------------------------------------------------------
From: DaB. <dab@ts.wikimedia.org>
Date: Thu, 26 Aug 2010 20:34:23
-------------------------------------------------------------------------------

You can use all chars as long as your charset is correct (that's default for the mysql-command), but you have to replace spaces " " with underscores "_".
Comment 6 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:45 UTC
-------------------------------------------------------------------------------
From: ACNiklas  <wiki@bietenbeck.net>
Date: Sat, 28 Aug 2010 09:26:58
-------------------------------------------------------------------------------

Ok, so I have now the following queries: 

SELECT page_title,page_len,rev_timestamp FROM page JOIN revision ON page_latest=rev_id WHERE page_namespace=0 AND page_title IN (art1, art2..) LIMIT 1;

SELECT page_title,page_len,rev_timestamp FROM page JOIN revision ON page_latest=rev_id WHERE page_namespace=1 AND page_title IN (art1, art2..) LIMIT 1;

SELECT page_title,COUNT AS number FROM (SELECT DISTINCT page_title,rev_user_text FROM page JOIN revision ON page_id=rev_page WHERE page_namespace=0 AND page_title = IN (art1, art2..)) AS dummy;

(art1, art2 ...) is

("Vagina", "Vulva", "Borderline-Persönlichkeitsstörung", "Penis", "Lyme-Borreliose", "Multiple_Sklerose", "Analverkehr", "Klitoris", "Schamhaarentfernung", "Schwangerschaft", "Burnout-Syndrom", "Konfektionsgröße", "Hämorrhoiden", "Depression", "Schamlippe", "Orgasmus", "Schizophrenie", "Pfeiffer-Drüsenfieber", "Penis_des_Menschen", "Diabetes_mellitus", "Fisting", "Ejakulation", "Herz", "Weibliche_Ejakulation", "Lupus_erythematodes", "Thrombose", "Ethanol", "Autismus", "Herpes_Zoster", "Morbus_Crohn", "AIDS", "Syphilis", "Magnetresonanztomographie", "Erektion", "Warze", "Gicht", "Gold", "Bandscheibenvorfall", "Amphetamin", "Asperger-Syndrom", "Arthrose", "Body-Mass-Index", "Tuberkulose", "Schuhgröße", "Narzissmus", "Windpocken", "Kokain", "Bipolare_Störung", "Parkinson-Krankheit", "Homöopathie", "Epilepsie", "Büstenhalter", "Blutgruppe", "Hitzeschaden", "Demenz", "Kondom", "Hand-Fuß-Mund-Krankheit", "Empathie", "Gonorrhoe", "Herpes_simplex", "Lorenzos_Öl", "Schilddrüse", "Down-Syndrom", "Vaginalverkehr", "Blutdruck", "Leukämie", "Chlamydien", "Staphylococcus_aureus", "Anus", "Atopisches_Ekzem", "Bettwanze", "Lungenentzündung", "PH-Wert", "Fibromyalgie", "Suizid", "Meningitis", "Ibuprofen", "Hepatitis_B", "Nesselsucht", "Menstruationszyklus", "Adipositas", "Amyotrophe_Lateralsklerose", "Sperma", "Lungenembolie", "Geschlechtsorgan", "Hashimoto-Thyreoiditis", "Alkoholkrankheit", "Gesetzliche_Krankenversicherung", "Weibliche_Brust", "Malaria", "Schlaganfall", "Marihuana", "Leistenbruch", "Zirkumzision", "Prostata", "Glucose", "Anämie", "Fersensporn", "Laktoseintoleranz", "Klitorisvorhautpiercing")

What else do I have to do to get these queries executed?

Thanks,  
Andreas
Comment 7 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:46 UTC
-------------------------------------------------------------------------------
From: As <megeekz@arandomblog.co.cc>
Date: Fri, 03 Sep 2010 23:07:51
-------------------------------------------------------------------------------

We can go ahead and execute the query for you. I'll go ahead and do it.
Comment 8 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:48 UTC
-------------------------------------------------------------------------------
From: As <megeekz@arandomblog.co.cc>
Date: Fri, 03 Sep 2010 23:16:57
-------------------------------------------------------------------------------

Query 1 has finished, doing query 2 right now.
Comment 9 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:50 UTC
-------------------------------------------------------------------------------
From: As <megeekz@arandomblog.co.cc>
Date: Fri, 03 Sep 2010 23:34:41
-------------------------------------------------------------------------------

All queries finished.

The results of queries 1 and 2 will be posted as a text file alone. I will also post query 3 as a text file as well, but it is close to 1.0M so I will also gzip it.

http://toolserver.org/~andrew/9403/ is where you may find the results of your queries.

Please tell me when to remove the results.
Comment 10 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:51 UTC
-------------------------------------------------------------------------------
From: ACNiklas  <wiki@bietenbeck.net>
Date: Sat, 04 Sep 2010 11:03:59
-------------------------------------------------------------------------------

Thanks for runing these queries. However I noticed, that all articles with German special characters (äüöß) in their name have not produced any results ("Leukämie" for example). Do you think there is a way to fix this?

Andreas
Comment 11 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:53 UTC
-------------------------------------------------------------------------------
From: As <megeekz@arandomblog.co.cc>
Date: Sat, 04 Sep 2010 15:04:02
-------------------------------------------------------------------------------

I have re-run the query, please check again.
Comment 12 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:55 UTC
-------------------------------------------------------------------------------
From: As <megeekz@arandomblog.co.cc>
Date: Sat, 04 Sep 2010 15:05:46
-------------------------------------------------------------------------------

Actually, I'm still not getting the results. I'll try a different method.
Comment 13 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:51:57 UTC
This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: (none)
CC list: wikimedia-bugzilla@dabpunkt.eu

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links