Last modified: 2013-09-06 19:01:08 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T47206, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 45206 - Collect meta info and serialize it in the head or elsewhere
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All (OS: All)
Importance: Low enhancement
Target Milestone: ---
Assigned To: C. Scott Ananian
URL: https://www.mediawiki.org/wiki/Parsoi...
Depends on:
Blocks:
Reported: 2013-02-20 20:34 UTC by Gabriel Wicke
Modified: 2013-09-06 19:01 UTC
4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Gabriel Wicke 2013-02-20 20:34:45 UTC
We need to collect various metadata such as the page title, revision ID, etc., very similar to the <page> structure in http://en.wikipedia.org/wiki/Special:Export/OLPC. To make our lives easier and to abstract over the serialization, we can use an env.page.meta object for this purpose. This object can then be passed to a 'buildHead' method that serializes it into the DOM's <head>. Later on we might want to keep some of that info out of the DOM. That change can then be made without touching all the internal users of this metadata.
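The proposed split can be sketched in Python rather than Parsoid's JavaScript. The metadata lives in a plain mapping, standing in for env.page.meta. A hypothetical build_head function is the only code that knows how that mapping becomes <head> markup. The internal keys ("ns", "sha1", "title") are assumptions for illustration. The two property names are the ones that appear later in this bug.

```python
from html import escape

# Hypothetical mapping from internal meta keys to RDFa properties.
# Internal users only ever touch the keys, never the property names.
PROPERTIES = {
    "ns": "mw:articleNamespace",
    "sha1": "mw:revisionSHA1",
}

def build_head(meta: dict) -> str:
    """Serialize a page-metadata mapping into <head> markup (illustrative only)."""
    lines = ["<head>", '<meta charset="UTF-8">']
    for key, prop in PROPERTIES.items():
        if key in meta:
            value = escape(str(meta[key]), quote=True)  # attribute-safe
            lines.append(f'<meta property="{prop}" content="{value}">')
    lines.append(f"<title>{escape(meta.get('title', ''))}</title>")
    lines.append("</head>")
    return "\n".join(lines)

print(build_head({"title": "Main Page", "ns": 0,
                  "sha1": "59b13d0d38a8f8992d5f7231f439c8805656f71c"}))
```

Moving a field out of the DOM later would only change build_head and PROPERTIES. The code that fills in the metadata would stay the same.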
Comment 1 Gerrit Notification Bot 2013-04-05 02:04:32 UTC
Related URL: https://gerrit.wikimedia.org/r/57703 (Gerrit Change Ib7b866b899cfbcf47d8a06190a411d1a3d46dd50)
Comment 2 Gerrit Notification Bot 2013-04-06 02:05:54 UTC
Related URL: https://gerrit.wikimedia.org/r/57820 (Gerrit Change Ie6de8d6ece925c752c1d3a7d9b6f4e3d4b67a09e)
Comment 3 Gerrit Notification Bot 2013-04-17 00:58:27 UTC
https://gerrit.wikimedia.org/r/57703 (Gerrit Change Ib7b866b899cfbcf47d8a06190a411d1a3d46dd50) | change APPROVED and MERGED [by jenkins-bot]
Comment 4 Gerrit Notification Bot 2013-04-17 01:21:30 UTC
https://gerrit.wikimedia.org/r/57820 (Gerrit Change Ie6de8d6ece925c752c1d3a7d9b6f4e3d4b67a09e) | change APPROVED and MERGED [by jenkins-bot]
Comment 5 C. Scott Ananian 2013-04-17 16:21:18 UTC
This bug is almost done; we're just trying to finalize the exact format of the metadata in <head>.  Currently we have:

<html data-parsoid="{}" prefix="mw: http://mediawiki.org/rdf/">
<head data-parsoid="{}" prefix="schema: http://schema.org/">
<meta charset="UTF-8">
<meta property="mw:articleNamespace" content="0">
<meta property="schema:CreativeWork/version" content="550389492">
<meta property="schema:CreativeWork/version/parent" content="550001092">
<meta property="schema:CreativeWork/dateModified" content="2013-04-15T00:02:02.000Z">
<meta property="schema:CreativeWork/contributor/username" content="//en.wikipedia.org/wiki/User:Tariqabjotu">
<meta property="schema:CreativeWork/contributor" content="//en.wikipedia.org/wiki/Special:UserById/153365">
<meta property="mw:revisionSHA1" content="59b13d0d38a8f8992d5f7231f439c8805656f71c">
<meta property="schema:CreativeWork/comment" content="replacing with permanent TAFI code">
<title>Main Page</title>
<base href="//en.wikipedia.org/wiki/">
</head>

The schema:CreativeWork/contributor and schema:CreativeWork/contributor/username properties might need revision. In particular, the Special:UserById/xxx URL is implemented in https://gerrit.wikimedia.org/r/59050, and an alternative Special:Redirect/user/by-id/xxx URL format is implemented in https://gerrit.wikimedia.org/r/59572. One or the other will (hopefully!) get merged, and we might have to tweak the link to match.

Also, schema:CreativeWork/contributor is a Person (see http://schema.org/Person), and technically we are specifying the 'url' property of that Person. It's not clear that the tags above express this correctly. We might also want to shift from <meta> to <link> tags: the href attribute of a <link> is guaranteed to be typed as a URL, which ensures that relative paths are resolved. Relatedly, most of our DOM uses relative paths, so our metadata probably should as well.
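Python's standard urllib.parse.urljoin applies the RFC 3986 reference-resolution rules. Those same rules govern resolving a URL-typed attribute, such as a <link> href, against the document base. A <meta> content value, by contrast, is just text and is never resolved. The base below is the <base href> from the snippet above, with an assumed http: scheme. The paths come from examples in this bug; the resolve helper is a hypothetical name.

```python
from urllib.parse import urljoin

# Assumption: the protocol-relative <base href> above, served over http.
BASE = "http://en.wikipedia.org/wiki/"

def resolve(href: str, base: str = BASE) -> str:
    """Resolve a URL-typed attribute value (e.g. <link href>) against the base."""
    return urljoin(base, href)

# Relative path, as used by WikiLinks in the body:
print(resolve("./Wikipedia"))  # http://en.wikipedia.org/wiki/Wikipedia
# Protocol-relative URL, as in the <meta> content values above:
print(resolve("//en.wikipedia.org/wiki/User:Tariqabjotu"))
# Without the leading "./", a title containing a colon parses as a URL
# scheme ("user:") and is returned unresolved:
print(resolve("User:Tariqabjotu"))  # User:Tariqabjotu
```

The last case is one reason relative links to namespaced titles need the "./" prefix.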
Comment 6 C. Scott Ananian 2013-05-01 15:22:59 UTC
Special:Redirect was merged, and https://gerrit.wikimedia.org/r/61603 tweaks our <head> to use it.

The only remaining issues are the type-system issues (is the contributor a Person or a URL or what) and whether we should be using <link> rather than <meta> tags in some places.
Comment 7 C. Scott Ananian 2013-05-31 14:46:49 UTC
Proposal for new metadata (hammered out with Daniel Friesen's help):

<html data-parsoid="{}" prefix="mw: http://mediawiki.org/rdf/">
  <head data-parsoid="{}" prefix="dc: http://purl.org/dc/terms/ mwr: http://en.wikipedia.org/wiki/Special:Redirect/">
    <meta charset="UTF-8">
    <meta property="dc:isFormatOf" resource="mwr:revision/555790550">
    <meta about="mwr:revision/555790550"
          property="mw:articleNamespace" content="0"
          datatype="xsd:integer">
    <meta about="mwr:revision/555790550"
          property="dc:modified" content="2013-04-15T00:02:02.000Z"
          datatype="xsd:dateTime">
    <meta about="mwr:revision/555790550"
          property="dc:replaces" resource="mwr:revision/552548208">
    <meta about="mwr:revision/555790550"
          property="mw:revisionSHA1"
          content="59b13d0d38a8f8992d5f7231f439c8805656f71c">
    <meta about="mwr:revision/555790550"
          property="dc:description"
          content="replacing with permanent TAFI code">
    <meta about="mwr:revision/555790550"
          property="dc:contributor" resource="mwr:user/153365">
    <meta about="mwr:user/153365"
          property="dc:title" content="Tariqabjotu">
  
    <title>Main Page</title>
    <base href="//en.wikipedia.org/wiki/">
  </head>

This uses the Dublin Core metadata terms exclusively and describes the Parsoid output as "a format of" a particular revision of the Wikipedia article. That revision is also described as "replacing" an earlier revision of the article. The Wikipedia user is identified by user ID and described as a "contributor" to that revision. The username is then given as the "title" associated with that user ID, which is how Dublin Core recommends naming people.

We use CURIEs for the repeated http://en.wikipedia.org/wiki/Special:Redirect/* URLs in order to describe the relations more compactly.
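A minimal Python sketch makes the CURIE mechanism concrete. It parses an RDFa 1.1 @prefix attribute into a mapping, then expands a CURIE against that mapping. The sketch is deliberately simplified. Prefix names are matched with a loose NCName-like pattern. RDFa's predefined initial-context prefixes (such as xsd:) are not included, so unknown prefixes come back unchanged. The function names are hypothetical.

```python
import re

# @prefix syntax: "name:" followed by whitespace and an IRI, repeated.
# Simplified: no strict NCName validation, no RDFa initial context.
PREFIX_PAIR = re.compile(r"([A-Za-z_][\w.-]*):\s+(\S+)")

def parse_prefixes(attr: str) -> dict:
    """Parse an RDFa @prefix attribute value into {prefix: IRI}."""
    return dict(PREFIX_PAIR.findall(attr))

def expand_curie(curie: str, prefixes: dict) -> str:
    """Expand 'prefix:reference' using the mapping; otherwise return it unchanged."""
    prefix, sep, reference = curie.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return curie

PREFIXES = parse_prefixes(
    "dc: http://purl.org/dc/terms/ "
    "mwr: http://en.wikipedia.org/wiki/Special:Redirect/")
print(expand_curie("mwr:revision/555790550", PREFIXES))
# http://en.wikipedia.org/wiki/Special:Redirect/revision/555790550
print(expand_curie("dc:contributor", PREFIXES))
# http://purl.org/dc/terms/contributor
```

The whitespace after each "name:" is what separates a prefix declaration from an ordinary IRI containing a colon. That is why the pattern requires it.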
Comment 8 Gabriel Wicke 2013-05-31 15:23:24 UTC
Overall this looks sensible to me.

The structure can likely be compressed a bit by setting the default about on the head element to the revision.

As a minor niggle about terminology, I'm not convinced that the prefixed URLs are technically CURIEs since I seem to remember that those are not allowed in HTML5 + RDFa.
Comment 9 C. Scott Ananian 2013-05-31 16:07:29 UTC
http://www.w3.org/TR/html-rdfa/ seems to indicate that CURIEs are fine in HTML5/RDFa.

However, re-reading indicates that the 'resource' attribute should really only be used with the <link> element, so some of the <meta>s above should probably still change to <link>.  Also, <base> sets the default subject for RDFa processing, so we need to be careful to set an appropriate @about on the isFormatOf property (and then setting @about on <head> to default to the revision is a good idea).  The result is:

<html data-parsoid="{}" prefix="mw: http://mediawiki.org/rdf/">
  <head data-parsoid="{}" prefix="dc: http://purl.org/dc/terms/ mwr: http://en.wikipedia.org/wiki/Special:Redirect/"
        about="mwr:revision/555790550">
    <meta charset="UTF-8">
    <link about="http://parsoid.wmflabs.org/en/Main_Page"
          rel="dc:isFormatOf" resource="mwr:revision/555790550">
    <link rel="dc:isVersionOf" href="http://en.wikipedia.org/wiki/Main_Page">
    <meta property="mw:articleNamespace" content="0"
          datatype="xsd:integer">
    <meta property="dc:modified" content="2013-04-15T00:02:02.000Z"
          datatype="xsd:dateTime">
    <link rel="dc:replaces" resource="mwr:revision/552548208">
    <meta property="mw:revisionSHA1"
          content="59b13d0d38a8f8992d5f7231f439c8805656f71c">
    <meta property="dc:description"
          content="replacing with permanent TAFI code">
    <link rel="dc:contributor" resource="mwr:user/153365">
    <meta about="mwr:user/153365"
          property="dc:title" content="Tariqabjotu">
  
    <title>Main Page</title>
    <base href="http://en.wikipedia.org/wiki/Main_Page">
  </head>
  
<body>
  <div>Welcome to <a rel="mw:WikiLink" href="./Wikipedia" >Wikipedia</a></div>
</body>

This talks about three pages:
http://parsoid.wmflabs.org/en/Main_Page
http://en.wikipedia.org/wiki/Main_Page
http://en.wikipedia.org/wiki/Special:Redirect/revision/555790550

Note that WikiLinks and other content in the <body> are relations on Wikipedia pages, not on the Parsoid output.
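The @about inheritance in the <head> above can be modeled in a few lines of Python. Each element is reduced to a dict of its RDFa attributes. Its subject is its own @about if present, otherwise the @about inherited from <head>. This is an illustration only, not an RDFa processor. It ignores @datatype and CURIE expansion. It also ignores the finer distinctions between @rel and @property, and between @resource, @href and @content. The element list is a hand-copied subset of the markup above, and the names are hypothetical.

```python
# Subset of the <head> above, reduced to RDFa attributes (assumption: hand-copied).
HEAD_ABOUT = "mwr:revision/555790550"

ELEMENTS = [
    {"about": "http://parsoid.wmflabs.org/en/Main_Page",
     "rel": "dc:isFormatOf", "resource": "mwr:revision/555790550"},
    {"rel": "dc:isVersionOf", "href": "http://en.wikipedia.org/wiki/Main_Page"},
    {"property": "mw:articleNamespace", "content": "0"},
    {"rel": "dc:replaces", "resource": "mwr:revision/552548208"},
    {"rel": "dc:contributor", "resource": "mwr:user/153365"},
    {"about": "mwr:user/153365", "property": "dc:title", "content": "Tariqabjotu"},
]

def triples(elements, default_subject):
    """Yield (subject, predicate, object); an element's @about overrides the default."""
    for el in elements:
        subject = el.get("about", default_subject)
        predicate = el.get("rel") or el.get("property")
        obj = el.get("resource") or el.get("href") or el.get("content")
        yield subject, predicate, obj

for t in triples(ELEMENTS, HEAD_ABOUT):
    print(t)
```

Passing the <base> URL as default_subject models a <head> without a default @about. In that case the revision triples would be attributed to http://en.wikipedia.org/wiki/Main_Page, which is the pitfall described above.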
Comment 10 Gerrit Notification Bot 2013-05-31 22:00:48 UTC
Related URL: https://gerrit.wikimedia.org/r/66300 (Gerrit Change I7da7762462635530189c0d994e89f83a38c1f5ff)
Comment 11 C. Scott Ananian 2013-06-17 23:47:12 UTC
The new version removes the controversial dc:isFormatOf triple, a property of "this document", which is hard to express in RDFa when the page has a <base> element. The head now looks like:

<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/
              mw: http://mediawiki.org/rdf/
              mwr: http://en.wikipedia.org/wiki/Special:Redirect/"
      about="mwr:revision/560327612">
<head>
  <meta charset="UTF-8">
  <meta property="mw:articleNamespace" datatype="xsd:integer" content="0">
  <link rel="dc:replaces" resource="mwr:revision/560314723">
  <meta property="dc:modified" datatype="xsd:dateTime" content="2013-06-17T17:55:30.000Z">
  <meta about="mwr:user/1624037" property="dc:title" content="Edokter">
  <link rel="dc:contributor" resource="mwr:user/1624037">
  <meta property="mw:revisionSHA1" content="e0564e710b93f998658bd5527f0042eaba6d6c87">
  <meta property="dc:description" content="Undid revision 560314723 by [[Special:Contributions/Meno25|Meno25]] ([[User talk:Meno25|talk]]) Sync structure to other main page pages. Don't make null-edits either; post on talk instead.">
  <link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/Main_Page">
  <title>Main Page</title>
  <base href="//en.wikipedia.org/wiki/Main_Page">
</head>
<body>
  <div>Welcome to <a rel="mw:WikiLink" href="./Wikipedia" >Wikipedia</a></div>
</body>
</html>
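Because the final format is flat, a consumer can read it without a full RDFa processor. The sketch below uses Python's standard html.parser to collect every <meta property> and <link rel> into a dict keyed by (about, predicate). The about value is None for elements that inherit their subject from <html about>. The class name HeadMetaParser and the keying scheme are hypothetical. A consumer needing full RDFa semantics (datatypes, CURIE expansion, nesting) should use a real RDFa parser instead.

```python
from html.parser import HTMLParser

class HeadMetaParser(HTMLParser):
    """Collect RDFa <meta property> and <link rel> values from a Parsoid <head>."""

    def __init__(self):
        super().__init__()
        self.meta = {}  # (about or None, predicate) -> value

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and "property" in a:
            self.meta[(a.get("about"), a["property"])] = a.get("content")
        elif tag == "link" and "rel" in a:
            self.meta[(a.get("about"), a["rel"])] = a.get("resource") or a.get("href")

# Excerpt of the <head> above (assumption: trimmed by hand).
HEAD = """<head>
  <meta charset="UTF-8">
  <meta property="mw:articleNamespace" datatype="xsd:integer" content="0">
  <link rel="dc:replaces" resource="mwr:revision/560314723">
  <meta about="mwr:user/1624037" property="dc:title" content="Edokter">
  <link rel="dc:contributor" resource="mwr:user/1624037">
  <link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/Main_Page">
</head>"""

parser = HeadMetaParser()
parser.feed(HEAD)
parser.close()
user = parser.meta[(None, "dc:contributor")]
print(user, parser.meta[(user, "dc:title")])  # mwr:user/1624037 Edokter
```

The contributor link gives the user resource, and the dc:title keyed on that resource gives the username. This lookup chain is the one Comment 7 describes.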
Comment 12 Andre Klapper 2013-07-04 10:33:27 UTC
[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]
Comment 13 Gerrit Notification Bot 2013-08-16 18:00:35 UTC
Change 66300 merged by jenkins-bot:
Tweak RDFa markup of page metadata in <head>.

https://gerrit.wikimedia.org/r/66300


