Last modified: 2014-01-03 15:51:57 UTC
This issue was converted from https://jira.toolserver.org/browse/DBQ-99. Summary: Quality information from 100 articles Issue type: Task - A task that needs to be done. Priority: Trivial Status: Done Assignee: (none) ------------------------------------------------------------------------------- From: ACNiklas <wiki@bietenbeck.net> Date: Thu, 26 Aug 2010 11:55:49 ------------------------------------------------------------------------------- I have a list of 100 articles (the 100 most viewed articles in medicine) and want to extract the following informations: * last edit article * size of article * size discussion * number of editors article Please help me write a sql-query as I don't know anything about the structure of wikipedia's tables. I can supply the list of articles in probably any format. The result is also fine in any format. It would be nice to store the query, so I can modify and rerun it later on. Thanks, Andreas
------------------------------------------------------------------------------- From: DaB. <dab@ts.wikimedia.org> Date: Thu, 26 Aug 2010 12:02:24 ------------------------------------------------------------------------------- > SELECT page_title,page_len,rev_timestamp FROM page JOIN revision ON page_latest=rev_id WHERE page_namespace=0 AND page_title="XXXX" LIMIT 1; for size-of-article and last-edit.
------------------------------------------------------------------------------- From: DaB. <dab@ts.wikimedia.org> Date: Thu, 26 Aug 2010 12:03:46 ------------------------------------------------------------------------------- > SELECT page_title,page_len,rev_timestamp FROM page JOIN revision ON page_latest=rev_id WHERE page_namespace=1 AND page_title="XXXX" LIMIT 1; for the size-of-discusssion and last-edit-on-discussion.
------------------------------------------------------------------------------- From: DaB. <dab@ts.wikimedia.org> Date: Thu, 26 Aug 2010 12:08:39 ------------------------------------------------------------------------------- > SELECT page_title,COUNT![][1] AS number FROM (SELECT DISTINCT page_title,rev_user_text FROM page JOIN revision ON page_id=rev_page WHERE page_namespace=0 AND page_title="XXXX") AS dummy; for the number-of-editors. You can change > AND page_title="XXXX" to > AND page_title="XXXX" AND rev_user>0 to remove non-login-users. [1]: https://jira.toolserver.org/images/icons/emoticons/star_yellow.gif
------------------------------------------------------------------------------- From: ACNiklas <wiki@bietenbeck.net> Date: Thu, 26 Aug 2010 20:28:01 ------------------------------------------------------------------------------- Thanks for the quick reply. I suppose I can substitute AND page_title="XXXX" with AND page_title IN ('article001', 'article002'...). Is the page_title URLEncoded (f%C3%BCr instead of für) or can I use all characters as they appear in wikipedia's titles? Thanks again, Andreas
------------------------------------------------------------------------------- From: DaB. <dab@ts.wikimedia.org> Date: Thu, 26 Aug 2010 20:34:23 ------------------------------------------------------------------------------- You can use all chars as long as your charset is correct (that's default for the mysql-command), but you have to replace spaces " " with underscores "_".
------------------------------------------------------------------------------- From: ACNiklas <wiki@bietenbeck.net> Date: Sat, 28 Aug 2010 09:26:58 ------------------------------------------------------------------------------- Ok, so I have now the following queries: SELECT page_title,page_len,rev_timestamp FROM page JOIN revision ON page_latest=rev_id WHERE page_namespace=0 AND page_title IN (art1, art2..) LIMIT 1; SELECT page_title,page_len,rev_timestamp FROM page JOIN revision ON page_latest=rev_id WHERE page_namespace=1 AND page_title IN (art1, art2..) LIMIT 1; SELECT page_title,COUNT AS number FROM (SELECT DISTINCT page_title,rev_user_text FROM page JOIN revision ON page_id=rev_page WHERE page_namespace=0 AND page_title = IN (art1, art2..)) AS dummy; (art1, art2 ...) is ("Vagina", "Vulva", "Borderline-Persönlichkeitsstörung", "Penis", "Lyme-Borreliose", "Multiple_Sklerose", "Analverkehr", "Klitoris", "Schamhaarentfernung", "Schwangerschaft", "Burnout-Syndrom", "Konfektionsgröße", "Hämorrhoiden", "Depression", "Schamlippe", "Orgasmus", "Schizophrenie", "Pfeiffer-Drüsenfieber", "Penis_des_Menschen", "Diabetes_mellitus", "Fisting", "Ejakulation", "Herz", "Weibliche_Ejakulation", "Lupus_erythematodes", "Thrombose", "Ethanol", "Autismus", "Herpes_Zoster", "Morbus_Crohn", "AIDS", "Syphilis", "Magnetresonanztomographie", "Erektion", "Warze", "Gicht", "Gold", "Bandscheibenvorfall", "Amphetamin", "Asperger-Syndrom", "Arthrose", "Body-Mass-Index", "Tuberkulose", "Schuhgröße", "Narzissmus", "Windpocken", "Kokain", "Bipolare_Störung", "Parkinson-Krankheit", "Homöopathie", "Epilepsie", "Büstenhalter", "Blutgruppe", "Hitzeschaden", "Demenz", "Kondom", "Hand-Fuß-Mund-Krankheit", "Empathie", "Gonorrhoe", "Herpes_simplex", "Lorenzos_Öl", "Schilddrüse", "Down-Syndrom", "Vaginalverkehr", "Blutdruck", "Leukämie", "Chlamydien", "Staphylococcus_aureus", "Anus", "Atopisches_Ekzem", "Bettwanze", "Lungenentzündung", "PH-Wert", "Fibromyalgie", "Suizid", "Meningitis", "Ibuprofen", "Hepatitis_B", "Nesselsucht", "Menstruationszyklus", "Adipositas", "Amyotrophe_Lateralsklerose", "Sperma", "Lungenembolie", "Geschlechtsorgan", "Hashimoto-Thyreoiditis", "Alkoholkrankheit", "Gesetzliche_Krankenversicherung", "Weibliche_Brust", "Malaria", "Schlaganfall", "Marihuana", "Leistenbruch", "Zirkumzision", "Prostata", "Glucose", "Anämie", "Fersensporn", "Laktoseintoleranz", "Klitorisvorhautpiercing") What else do I have to do to get these queries executed? Thanks, Andreas
------------------------------------------------------------------------------- From: As <megeekz@arandomblog.co.cc> Date: Fri, 03 Sep 2010 23:07:51 ------------------------------------------------------------------------------- We can go ahead and execute the query for you. I'll go ahead and do it.
------------------------------------------------------------------------------- From: As <megeekz@arandomblog.co.cc> Date: Fri, 03 Sep 2010 23:16:57 ------------------------------------------------------------------------------- Query 1 has finished, doing query 2 right now.
------------------------------------------------------------------------------- From: As <megeekz@arandomblog.co.cc> Date: Fri, 03 Sep 2010 23:34:41 ------------------------------------------------------------------------------- All queries finished. The results of queries 1 and 2 will be posted as a text file alone. I will also post query 3 as a text file as well, but it is close to 1.0M so I will also gzip it. http://toolserver.org/~andrew/9403/ is where you may find the results of your queries. Please tell me when to remove the results.
------------------------------------------------------------------------------- From: ACNiklas <wiki@bietenbeck.net> Date: Sat, 04 Sep 2010 11:03:59 ------------------------------------------------------------------------------- Thanks for runing these queries. However I noticed, that all articles with German special characters (äüöß) in their name have not produced any results ("Leukämie" for example). Do you think there is a way to fix this? Andreas
------------------------------------------------------------------------------- From: As <megeekz@arandomblog.co.cc> Date: Sat, 04 Sep 2010 15:04:02 ------------------------------------------------------------------------------- I have re-run the query, please check again.
------------------------------------------------------------------------------- From: As <megeekz@arandomblog.co.cc> Date: Sat, 04 Sep 2010 15:05:46 ------------------------------------------------------------------------------- Actually, I'm still not getting the results. I'll try a different method.
This bug was imported as RESOLVED. The original assignee has therefore not been set, and the original reporters/responders have not been added as CC, to prevent bugspam. If you re-open this bug, please consider adding these people to the CC list: Original assignee: (none) CC list: wikimedia-bugzilla@dabpunkt.eu