Last modified: 2014-07-11 18:10:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T48555, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 46555 - Entity suggester for Wikidata
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: High major with 3 votes
Target Milestone: ---
Assigned To: Wikidata bugs
Duplicates: 41054 52553
Depends on: 63223 63224 63368
Blocks:
Reported: 2013-03-26 00:23 UTC by Quim Gil
Modified: 2014-07-11 18:10 UTC (History)
CC List: 13 users


Description Quim Gil 2013-03-26 00:23:28 UTC
Proposed at http://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Entity_Suggester

Wikidata could be a lot smarter than it is right now e.g. by suggesting fields to fill and probable values. 

For example: when an editor edits an item about a person that is still missing the date of birth, this should be suggested as a possible property. Or when the editor is entering the sex of the person, Wikidata should be smart and suggest the values that are used most for this property first. Think of it as something very similar to the famous "people who bought x also bought y" systems.
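
To make the "people who bought x also bought y" idea concrete, here is a minimal sketch based on plain co-occurrence counting. It is not the approach any of the later implementations necessarily took. The function names, the toy items and the placeholder values "value-a" and "value-b" are made up for illustration; P31, P21, P569, P19 and P17 are used only as example property IDs.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(items):
    """For each property, count how often every other property sits on the same item."""
    cooc = defaultdict(Counter)
    for props in items.values():
        for a, b in permutations(set(props), 2):
            cooc[a][b] += 1
    return cooc


def suggest_properties(cooc, present, limit=3):
    """Rank properties missing from an item by co-occurrence with the ones it already has."""
    scores = Counter()
    for prop in present:
        for other, count in cooc[prop].items():
            if other not in present:
                scores[other] += count
    return [prop for prop, _ in scores.most_common(limit)]


def suggest_values(statements, prop, limit=3):
    """Rank the values most often used with one property."""
    counts = Counter(value for p, value in statements if p == prop)
    return [value for value, _ in counts.most_common(limit)]


# Toy data (hypothetical): item ID -> properties already used on that item.
items = {
    "Q1": ["P31", "P21", "P569"],
    "Q2": ["P31", "P21", "P569", "P19"],
    "Q3": ["P31", "P21"],
    "Q4": ["P31", "P17"],
}
# Toy (property, value) statements with placeholder values.
statements = [("P21", "value-a"), ("P21", "value-b"), ("P21", "value-a")]

cooc = build_cooccurrence(items)
print(suggest_properties(cooc, {"P31", "P21"}))
print(suggest_values(statements, "P21"))
```

An item that has P31 and P21 but no P569 gets P569 ranked first here, because P569 co-occurs most often with the properties already present. A real recommender would need some normalisation so that very frequent properties do not dominate every suggestion.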
Comment 1 Nilesh Chakraborty 2013-04-19 17:40:24 UTC
Hi,

I am a 3rd year undergraduate student of computer science, pursuing my B.Tech degree at RCC Institute of Information Technology. I am proficient in Java, PHP and C#.

Among the project ideas on the GSoC 2013 ideas page, the one particular idea that seemed really interesting to me is developing an Entity Suggester for Wikidata. I want to work on this.

I am passionate about data mining, big data and recommendation engines, so this idea naturally appeals to me a lot. I have experience building music and people recommendation systems, and have worked with Myrrix and Apache Mahout. I recently designed and implemented such a recommendation system and deployed it on the live production site where I'm interning, to recommend Facebook users to each other based on their interests.

The problem is that the documentation for Wikidata and the Wikibase extension seems pretty daunting to me, since I have never configured a MediaWiki instance or actually used one. (I am about to try it out following the instructions at http://www.mediawiki.org/wiki/Summer_of_Code_2013#Where_to_start.) I can easily build a recommendation system and create a web service or REST-based API through which the engine can be trained with existing data and queried. This seems to be a collaborative filtering problem (people who bought x also bought y). It would be easier if I could get some help with where and how I need to integrate it with Wikidata. Also, some sample datasets (CSV files?) or schemas (just the column names and data types?) would help me a lot in figuring this out.

Please ask me if you have any questions. :-)

Thanks,
Nilesh
Comment 2 Andre Klapper 2013-04-22 09:00:39 UTC
nilesh: Did you reach out to the Wikidata developer team at http://www.wikidata.org/wiki/Wikidata:Contact_the_development_team ? 
Asking as your personal CV is offtopic for this specific bug report here. :)
Comment 3 denny vrandecic 2013-04-22 14:28:32 UTC
Answered Nilesh on the mailing list.
Comment 4 Quim Gil 2013-04-22 16:28:06 UTC
(In reply to comment #2)
> nilesh: Did you reach out to the Wikidata developer team at
> http://www.wikidata.org/wiki/Wikidata:Contact_the_development_team ? 
> Asking as your personal CV is offtopic for this specific bug report here. :)

Background: I asked Nilesh to create the bug report and send the proposal to wikitech-l. The Wikidata team was/is aware. Welcome Nilesh and good luck with your project idea.
Comment 5 Quim Gil 2013-05-03 17:21:18 UTC
Just a note to say that Nilesh Chakraborty has submitted a GSoC proposal related to this report: https://www.mediawiki.org/wiki/User:Nilesh.c/Entity_Suggester

Good luck!
Comment 6 Nilesh Chakraborty 2013-05-03 17:25:52 UTC
Thanks everyone, thank you Quim. Really appreciate it!
Comment 7 Puneet Kaur 2013-05-03 18:35:48 UTC
Hello everyone, I am Puneet Kaur, an undergraduate student at Indira Gandhi Institute of Technology, New Delhi, India.

I am interested in this project, and the concept of the Wikidata features as a whole seems nice.

I have spent some time on web development and design, and I wish to make good use of my existing knowledge by helping Wikidata get some more features :)
Comment 8 Nilesh Chakraborty 2013-05-05 11:57:40 UTC
I'm considering two options for feeding the item/property data into the recommender:

i) Using the database-related code in the wikidata extension (I'm studying the DataModel classes and how they interact with the database) to fetch what I need and feed them into the recommendation engine.

ii) Not accessing the DB at all. Rather, I can write map-reduce scripts to extract all the training data and everything I need for each Item from the wikidatawiki data dump and feed it into the recommendation engine. I can use a cron job to download the latest data dump when available, and run the scripts on it. I don't think it would be an issue even if the engine lags by the interval the dumps are generated in, since the whole recommendation thing is all about approximations.

I personally think (ii) will be cleaner and faster. Please share your views on this. More details on the idea can be found at: https://www.mediawiki.org/wiki/User:Nilesh.c/Entity_Suggester
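
A rough sketch of what the extraction step in (ii) could look like, as a single-process stand-in for the map-reduce scripts. The input shape is an assumption made for illustration: one JSON object per line with an "id" and a "properties" list. That is not the actual wikidatawiki dump format, and the function names and CSV columns are hypothetical.

```python
import csv
import io
import json


def extract_pairs(lines):
    """Yield (item_id, property_id) pairs from records in the assumed JSON-lines shape."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # De-duplicate so an item with several statements for one property counts once.
        for prop in sorted(set(record.get("properties", []))):
            yield record["id"], prop


def write_training_csv(lines, out):
    """Write the pairs as CSV rows for the recommender and return the row count."""
    writer = csv.writer(out)
    writer.writerow(["item", "property"])
    count = 0
    for pair in extract_pairs(lines):
        writer.writerow(pair)
        count += 1
    return count


# Hypothetical sample input standing in for a pre-processed dump.
sample = [
    '{"id": "Q1", "properties": ["P31", "P21", "P21"]}',
    "",
    '{"id": "Q2", "properties": ["P31"]}',
]
buf = io.StringIO()
rows_written = write_training_csv(sample, buf)
print(rows_written)
print(buf.getvalue())
```

Because the functions only take an iterable of lines, a cron job could feed them the freshly downloaded dump after whatever pre-processing turns it into this shape, and the recommender would simply lag behind by one dump interval.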
Comment 9 Daniel Kinzler 2013-05-08 14:59:31 UTC
I agree that ii) is better, especially since this info isn't really in the database (yet), except in the form of JSON blobs.

The only downside is that the JSON in the XML dumps is the *internal* JSON, not the canonical JSON used in the API. We'll provide dumps using the canonical JSON at some point.

But even so, if we have code that uses the internal JSON, it should be easy to adapt later.
Comment 10 Nilesh Chakraborty 2013-05-08 21:35:21 UTC
Thanks Daniel. I'm going with (ii).

Please check out https://github.com/nilesh-c/wikidata-entity-suggester for some code, loads of info, and my immediate TODO list.

I'm prototyping the entity suggester and pushing the code there and will keep updating the github repo.
Comment 11 Lydia Pintscher 2013-08-08 04:52:46 UTC
*** Bug 52553 has been marked as a duplicate of this bug. ***
Comment 12 Lydia Pintscher 2013-08-15 12:05:04 UTC
*** Bug 41054 has been marked as a duplicate of this bug. ***
Comment 13 Quim Gil 2013-09-17 16:19:27 UTC
GSoC "soft pencils down" date was yesterday and all coding must stop on 23 September. Has this project been completed?
Comment 14 Quim Gil 2013-10-22 19:40:23 UTC
If you have open tasks or bugs left, one possibility is to list them at https://www.mediawiki.org/wiki/Google_Code-In and volunteer yourself as mentor.

We have heard from Google and from free software projects participating in Code-in that students in this program have done great work finishing and polishing GSoC projects, often mentored by the former GSoC student. The key is being able to split the pending work into small tasks.

More information in the wiki page. If you have questions you can ask there or you can contact me directly.
Comment 15 Andre Klapper 2013-10-31 12:14:51 UTC
[replacing wikidata keyword by adding CC - see bug 56417]
Comment 16 Andre Klapper 2013-11-25 17:27:05 UTC
Daniel: What is this bug report "tracking"? No dependencies here.
Is assignee and priority still correct?
Comment 17 Lydia Pintscher 2014-01-09 20:30:41 UTC
Here's a recent status update from the students including a first demo: http://lists.wikimedia.org/pipermail/wikidata-l/2014-January/003301.html
Comment 18 Virginia Weidhaas 2014-02-05 15:38:41 UTC
Hello,
We are the student group mentioned above, and we would like to be assigned to this ticket.
We introduced ourselves on the mailing list: http://lists.wikimedia.org/pipermail/wikidata-l/2014-January/003301.html

You can find our current status and code on GitHub through the provided links in the mail.

Thanks!

Virginia Weidhaas
Christian Dullweber
Moritz Finke
Felix Niemeyer
Comment 19 Lydia Pintscher 2014-06-30 09:40:42 UTC
Deployed on test now \o/ Time to close this.
