Last modified: 2012-08-05 20:35:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T28119, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 26119 - Document-centric semantic wiki
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: Semantic MediaWiki
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 1 vote
Target Milestone: ---
Assigned To: Markus Krötzsch
Reported: 2010-11-25 15:16 UTC by Brett Zamir
Modified: 2012-08-05 20:35 UTC (History)

Description Brett Zamir 2010-11-25 15:16:58 UTC
I know this is likely to be a very tall order, but I am wondering whether the Semantic MediaWiki extension (or something like it), with all its querying abilities, could be adapted to allow nested, inline semantic markup of text. That would simplify the creation of semantically rich XML documents such as TEI (see http://tei.oucs.ox.ac.uk/P5/Guidelines-web/en/html/REF-ELEMENTS.html for sample tags), e.g., by using MediaWiki-like syntax instead of raw XML, which requires closing tags. As wiki syntax lovers well know, closing tags can sometimes aid readability, but they can also hamper it.

In this case, the "properties" would not necessarily be about the page per se, but about information within the article, relevant to its immediate context in the document or possibly to some external context.
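
As a rough illustration of the nested markup idea, here is a minimal Python sketch that expands a wiki-like inline syntax into TEI-style XML. The `[[name attr=value|content]]` syntax is purely hypothetical, invented for this example; it is not Semantic MediaWiki's annotation syntax or that of any existing extension, and the function names are likewise assumptions. The sketch does not try to coexist with ordinary wiki links.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical inline syntax (an assumption, not an existing wiki feature):
#   [[name attr=value ...|content]]   where content may nest further [[...]] spans
_NAME = re.compile(r"[A-Za-z][\w.-]*")


def to_tei(text: str) -> str:
    """Expand the hypothetical [[...|...]] spans into XML elements."""
    xml, _ = _parse(text, 0, top=True)
    return xml


def _parse(text: str, pos: int, top: bool):
    """Parse running text until the end of input (top level) or a closing ']]'."""
    parts = []
    while pos < len(text):
        if text.startswith("[[", pos):
            element, pos = _parse_element(text, pos + 2)
            parts.append(element)
        elif not top and text.startswith("]]", pos):
            return "".join(parts), pos + 2
        else:
            parts.append(escape(text[pos]))  # escape &, <, > in plain text
            pos += 1
    if not top:
        raise ValueError("unclosed [[ span")
    return "".join(parts), pos


def _parse_element(text: str, pos: int):
    """Parse 'name attr=value|content]]' starting just after the opening '[['."""
    bar = text.find("|", pos)
    close = text.find("]]", pos)
    if bar == -1 or (close != -1 and close < bar):
        raise ValueError("span is missing its '|' separator")
    header = text[pos:bar].split()
    if not header or not _NAME.fullmatch(header[0]):
        raise ValueError("span is missing a valid element name")
    attrs = []
    for item in header[1:]:
        key, sep, value = item.partition("=")
        if not sep or not _NAME.fullmatch(key):
            raise ValueError(f"malformed attribute: {item!r}")
        attrs.append(f" {key}={quoteattr(value)}")
    body, pos = _parse(text, bar + 1, top=False)
    name = header[0]
    return f"<{name}{''.join(attrs)}>{body}</{name}>", pos


print(to_tei("Dated [[date when=1610-01-07|7 January [[num|1610]]]]."))
```

The only closing delimiter is `]]`, so the writer never has to repeat an element name, which is the readability point made above. The attributes play the role of "properties" attached to a span of text rather than to the page.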

For example, if plays, novels, Scriptures, etc. were added to such a wiki, any reference to, say, a date could be marked up as such, as could structural information (letter openings, closings, etc.) or annotations of meaning (irony, humor, etc.). All of this could then be made available to queries, possibly even allowing joins across documents, e.g., searching for all passages mentioning a given date range, whether in the current document or in documents belonging to a given category or having a given property.
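
To make the date-range query concrete, here is a minimal Python sketch using only the standard library. It assumes the documents are already available as TEI XML in which dates are marked up with `<date when="YYYY-MM-DD">` and passages are `<p>` elements. The sample documents, their ids, and the function name are made up for illustration, and partial dates such as `when="1850"` are simply skipped.

```python
import xml.etree.ElementTree as ET
from datetime import date

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI P5 namespace

# Made-up sample documents, for illustration only.
DOCS = {
    "letter-a": (
        '<text xmlns="http://www.tei-c.org/ns/1.0"><body>'
        '<p>We sailed on <date when="1850-03-02">the second of March</date>.</p>'
        "<p>No date is mentioned here.</p>"
        "</body></text>"
    ),
    "letter-b": (
        '<text xmlns="http://www.tei-c.org/ns/1.0"><body>'
        '<p>Arrived <date when="1851-07-15">in mid-July</date>.</p>'
        "</body></text>"
    ),
}


def passages_in_range(docs, start, end):
    """Return (doc_id, passage_text) for each <p> mentioning a date in [start, end]."""
    hits = []
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for para in root.iter(f"{TEI}p"):
            for marked in para.iter(f"{TEI}date"):
                try:
                    day = date.fromisoformat(marked.get("when", ""))
                except ValueError:
                    continue  # missing or partial date: ignored in this sketch
                if start <= day <= end:
                    hits.append((doc_id, "".join(para.itertext()).strip()))
                    break  # report each passage at most once
    return hits


print(passages_in_range(DOCS, date(1850, 1, 1), date(1850, 12, 31)))
```

Restricting the search to a category or property would amount to filtering which documents are passed in as `docs`; how a wiki would expose that selection is left open here.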

(To the extent that security or performance is a concern with such repeatable and open-ended user queries, the extension might allow users to add documents to, say, locally stored HTML5 IndexedDB collections, which the user could then query in an offline-friendly manner. Any HTML5 application granted access to these database collections could use jQuery or XQuery (if made available to JavaScript) to query the generated XML output within a document or collection, ideally including a built-in application available at the wiki itself.)

While "semantic" tends to be associated with data-centric applications in standards-based, web-centric discussions, I am really eager to see it expand to document-centric data as well. When both are combined, one might be able to query a document in a rich way relative to its content while also accessing additional data, perhaps via references to a data-centric wiki like Wikipedia, in an integrated way.

Thank you.
Comment 1 Brett Zamir 2010-11-25 15:19:58 UTC
I might add that this could really expand the importance of a site like Wikisource, given that semantic markup can really benefit from a large base of users whose role expands beyond proofreading and formatting to include marking up and enriching important documents.
Comment 2 contrafibularity 2012-08-05 20:35:31 UTC
Amen to that!

Since you posted this (2010!), there have been a couple of developments that you might be interested in:

Following a feature request for the eagerly awaited VisualEditor extension, James Forrester of Wikimedia dropped a note at bug 37934. See the response by Gabriel Wicke and http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary (to which he refers).

A more ad hoc solution was designed for the Transcribe Bentham Project at UCL. It uses a number of custom extensions:

http://www.mediawiki.org/wiki/Extension:TEITags
http://www.mediawiki.org/wiki/Extension:JBTEIToolbar
http://www.mediawiki.org/wiki/Extension:JBZV

Although this stuff has not been tested with later versions of MW, it shows at least that such things are possible.

I don't know what role SMW could play here. Perhaps the syntax required for a TEI document could be semi-automated using templates and Semantic Forms?

I'm not a developer nor am I even remotely familiar with the way that all this is supposed to work, but this information seemed to me worth passing on.
