Last modified: 2013-07-05 03:56:15 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T46696, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 44696 - Ask API with XML format produces invalid XML (title tags)
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: Semantic MediaWiki
Version: unspecified
Hardware: All All
Importance: Unprioritized blocker with 3 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 48705
Depends on:
Blocks: 41842
Reported: 2013-02-06 00:30 UTC by Al Johnson
Modified: 2013-07-05 03:56 UTC (History)

Description Al Johnson 2013-02-06 00:30:32 UTC
Child elements under <results> are given the name of the page.  It is easy to create pages with titles that result in illegal XML tag names.  Just to name a few that I've tried:

4me
Some "quoted" text <- an important one for special purpose wikis
xml

Example query:

http://www.mywikidev.com/wiki/api.php?action=ask&query=[[Modification%20date::%3E4%20February%202013]]%20[[Has property::%2B]]|?Has property&format=xml

<?xml version="1.0"?>
<api>
  <query>
    <printrequests>
      <printrequest label="" typeid="_wpg" mode="2" />
      <printrequest label="Has property" typeid="_txt" mode="1" />
    </printrequests>
    <results>
      <some_"quoted"_text fulltext="some &quot;quoted&quot; text" fullurl="http://www.mywikidev.com/wiki/index.php/some_%22quoted%22_text">
        <printouts>
          <Has_property>
            <value>1234</value>
          </Has_property>
        </printouts>
      </some_"quoted"_text>
    </results>
  </query>
</api>

Workaround(s): Unknown, but would love to hear of one.
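
To illustrate the report, here is a minimal sketch that feeds such tag names to a standard XML parser. It uses Python's xml.etree.ElementTree only as a stand-in for any conforming parser; the `wrap` helper and the sample tag names are illustrative assumptions and are not part of SMW.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def wrap(tag: str) -> str:
    """Hypothetical helper: mimic a <results> child named after a page title."""
    return f'<results><{tag} fulltext="x"></{tag}></results>'


# Print which of the sample tag names survive parsing.
for tag in ["Valid_Page", "4me", 'Some_"quoted"_text']:
    print(repr(tag), is_well_formed(wrap(tag)))
```

A consumer that cannot even parse the response has no chance to recover, which is why this affects downstream code so directly.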
Comment 1 Al Johnson 2013-02-06 01:41:56 UTC
I would like to suggest that the result tag names (currently set to the page names) be replaced by something simple, such as <result> or <result-<index>>, since the title and page is already specified by the fulltext and fullurl properties.  

So, the sample output would instead look like this:

<?xml version="1.0"?>
<api>
  <query>
    <printrequests>
      <printrequest label="" typeid="_wpg" mode="2" />
      <printrequest label="Has property" typeid="_txt" mode="1" />
    </printrequests>
    <results>
      <result fulltext="some &quot;quoted&quot; text" fullurl="http://www.mywikidev.com/wiki/index.php/some_%22quoted%22_text">
        <printouts>
          <Has_property>
            <value>1234</value>
          </Has_property>
        </printouts>
      </result>
    </results>
  </query>
</api>

Thank you
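
As a rough sketch of how a consumer might read the layout suggested above, the following uses Python's standard library. The `<result>` element is the proposal from this comment, not confirmed SMW output, and the `page_titles` helper name is made up for the example.

```python
import xml.etree.ElementTree as ET

# The proposed output from this comment, with fixed <result> element names.
PROPOSED = """<?xml version="1.0"?>
<api>
  <query>
    <results>
      <result fulltext="some &quot;quoted&quot; text" fullurl="http://www.mywikidev.com/wiki/index.php/some_%22quoted%22_text">
        <printouts>
          <Has_property>
            <value>1234</value>
          </Has_property>
        </printouts>
      </result>
    </results>
  </query>
</api>"""


def page_titles(xml_text: str) -> list:
    """Collect page titles from the fulltext attribute of each <result>."""
    root = ET.fromstring(xml_text)
    return [r.get("fulltext") for r in root.findall("./query/results/result")]


def property_values(xml_text: str) -> list:
    """Collect the text of every Has_property value in the response."""
    root = ET.fromstring(xml_text)
    path = "./query/results/result/printouts/Has_property/value"
    return [v.text for v in root.findall(path)]


print(page_titles(PROPOSED))
print(property_values(PROPOSED))
```

With a fixed element name the path stays the same for every page, and the title is read from the attribute, where quoting is handled by normal XML escaping.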
Comment 2 Al Johnson 2013-02-06 01:48:27 UTC
Of course, I'm not suggesting to break backwards compatibility with the above suggestion :)  So, maybe a new format/query param will be acceptable?
Comment 3 Nischay Nahata 2013-02-06 02:54:55 UTC
I think it would be acceptable to break bc, most APIs I know of do <pages><p> and this should be no different.
Comment 4 MWJames 2013-02-06 03:10:38 UTC
Whoa ... not so fast. We are not jumping ship here and breaking things. The SMW\DISerializer provides serialization for the SMWAPI, the JSON format, and the SMW\ApiResultPrinter (since SMW 1.9). Before considering any change, please be aware of the legacy support that comes with the serialization and its content structure.
Comment 5 Nischay Nahata 2013-02-06 03:58:36 UTC
I am not sure how it has been useful till now, I would find it hard to parse.
Still if you think there has to be bc support please add in a follow-up change or put precise comments in https://gerrit.wikimedia.org/r/#/c/47707/
Comment 6 MWJames 2013-04-18 01:33:47 UTC
[1] broke compatibility and was therefore abandoned.

This only matters for XML and similar formats; it is therefore suggested to change the output only for these formats, and not for JSON.

[1] https://gerrit.wikimedia.org/r/#/c/47707/
Comment 7 MWJames 2013-04-18 01:44:48 UTC
It is not a tag problem but rather a problem in how 'fulltext' => $title->getFullText() encodes special characters (&, ', etc.). It results in encoded strings like &#039; and &quot; that cause problems in the XML output format.
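
For reference, a quick check of how a parser treats these characters in the two positions under discussion. This is a sketch using Python's standard parser as a stand-in; the sample strings are made up for the test and do not come from an SMW response.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the parser accepts the snippet."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Character references inside an attribute value, as produced for fulltext.
escaped_attribute = '<result fulltext="some &quot;quoted&quot; text, it&#039;s fine" />'
decoded = ET.fromstring(escaped_attribute).get("fulltext")

# The same kind of character used unescaped inside an element name.
quote_in_tag_name = "<Has_xml'_label />"

print(decoded)
print(parses(quote_in_tag_name))
```

In this check the escaped attribute value is decoded back to the original text, while the element name containing the apostrophe is rejected.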
Comment 8 MWJames 2013-04-18 02:05:13 UTC
Another issue with XML could be that for a title such as Property:GG, the XML parser complains that "Namespace prefix Property on ... is not defined".

## Example

<Property:GG fulltext="..." namespace="106" exists="1">
<printouts>
  <Modification_date>
    <value>1365684120</value>
  </Modification_date>
</printouts>
</Property:GG>
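
The namespace complaint can be reproduced with a namespace-aware parser. The sketch below uses Python's xml.etree.ElementTree; the namespace URI in the second snippet is a made-up placeholder that only shows why the first snippet fails, and is not a suggested fix.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_error(xml_text: str) -> Optional[str]:
    """Return the parser's error message, or None if parsing succeeds."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)


# A colon in the element name is read as "prefix:localname".
unbound = '<results><Property:GG fulltext="Property:GG" /></results>'

# Same element, but with the prefix bound to a placeholder namespace URI.
bound = (
    '<results xmlns:Property="http://example.org/placeholder">'
    '<Property:GG fulltext="Property:GG" /></results>'
)

print(parse_error(unbound))
print(parse_error(bound))
```

Any page in a non-main namespace would hit this, since its full title always contains a colon.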
Comment 9 Al Johnson 2013-04-18 02:46:35 UTC
James, 

I think trying to create XML tag names that are page titles is just asking for trouble.  The XML spec has restrictions on what characters can be in a tag name[1] so any character that can be in a page title will have to be mapped into an XML element. It also makes the XML unnecessarily verbose and hard to read... just looks flaky, imo.  Finally, it is also redundant information since the page name is provided by the fulltext attribute already.

I propose putting in the change just for the XML format if that solves the JSON compatibility conflict.

1. http://www.w3.org/TR/REC-xml/#NT-NameStartChar
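
To make the restriction concrete, here is a sketch of a validator built from the NameStartChar and NameChar productions in the XML 1.0 spec linked above. The `is_xml_name` helper is a made-up name for this example. Note that a name such as "xml" matches the production, although the spec reserves names beginning with "xml", and that a colon is allowed by the production even though namespace-aware parsers then treat it as a prefix separator.

```python
import re

# NameStartChar ranges from the XML 1.0 specification.
_NAME_START = (
    r":A-Z_a-z"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD"
    r"\U00010000-\U000EFFFF"
)

# NameChar adds "-", ".", digits, and a few combining characters.
_NAME_CHAR = _NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

XML_NAME = re.compile(f"[{_NAME_START}][{_NAME_CHAR}]*")


def is_xml_name(candidate: str) -> bool:
    """Return True if the whole string matches the XML Name production."""
    return XML_NAME.fullmatch(candidate) is not None


# Check a few names taken from this bug report.
for name in ["result", "XML_Example", "4me", 'Some_"quoted"_text', "Has_xml'_label"]:
    print(repr(name), is_xml_name(name))
```

Page titles allow a far larger character set than this, so any scheme that maps titles onto element names needs an escaping step for everything the production excludes.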
Comment 10 MWJames 2013-04-18 03:24:03 UTC
> read... just looks flaky, imo.  Finally, it is also redundant information
> since the page name is provided by the fulltext attribute already.

For more information about SMW related serialization see [1].

PS: I will not take a crack at it in the near future, so feel free to tackle this issue, but please keep in mind to add PHPUnit/QUnit tests to ensure consistency among the output serialization.

[1] http://www.semantic-mediawiki.org/wiki/Serialization_%28JSON%29
Comment 11 Al Johnson 2013-04-18 04:10:25 UTC
Hi James, what was the reference for?  BTW, I'm afraid I'm not qualified to hack on the wiki code myself.
Comment 12 ben.mcgee.good 2013-04-18 22:26:37 UTC
Hey, in case it matters.  This is a major pain for me..  I hit it while trying to upgrade my SMW installation and it is a real blocker for downstream code.
Comment 13 MWJames 2013-04-19 05:16:45 UTC
(In reply to comment #11)
> Hi James, what was the reference for?  BTW, I'm afraid I'm not qualified to
> hack on the wiki code myself.

It will give some insight into how serialization works in SMW and why [1] wasn't a fit, as it only eliminates a possible tag parameter at the head by replacing

$results[$diWikiPage->getTitle()->getFullText()] = $result;
with
$results[] = $result;

This solves the issue only halfway: if you happen to use a property like "Has_xml'_label" as a printout parameter, it faces the same problem, but at this level you need to know which printout you are referring to, since it is a reference key to the printrequests array.

While the subject "tag" at the head might seem like redundant information (it isn't, but that's not the issue of this discussion), you clearly can't get away with eliminating the property label from the structure, as it is used as a key precisely to eliminate redundancy by splitting printrequest and result information.

XML (pretty-print) output

<?xml version="1.0"?>
<api>
  <query>
    <printrequests>
      <printrequest label="" typeid="_wpg" mode="2" format="" />
      <printrequest label="Has date" typeid="_dat" mode="1" format="ISO" />
      <printrequest label="Has xml" typeid="_wpg" mode="1" format="" />
      <printrequest label="Has xml&#039; label" typeid="_wpg" mode="1" format="" />
    </printrequests>
    <results>
      <XML_Example fulltext="XML Example" fullurl=".." namespace="0" exists="1">
        <printouts>
          <Has_date>
            <value>631152000</value>
          </Has_date>
          <Has_xml>
            <value fulltext="Test" fullurl=".." namespace="0" exists="" />
          </Has_xml>
          <Has_xml'_label>
            <value fulltext="Test" fullurl=".." namespace="0" exists="" />
          </Has_xml'_label>
        </printouts>
      </XML_Example>
    </results>
    <meta hash="d3a1a814ff424003d9cfaa9a3ab7221f" count="1" offset="0" />
  </query>
</api>

JSON (pretty-print) output
 
{
    "query": {
        "printrequests": [
            {
                "label": "",
                "typeid": "_wpg",
                "mode": 2,
                "format": false
            },
            {
                "label": "Has date",
                "typeid": "_dat",
                "mode": 1,
                "format": "ISO"
            },
            {
                "label": "Has xml",
                "typeid": "_wpg",
                "mode": 1,
                "format": ""
            },
            {
                "label": "Has xml' label",
                "typeid": "_wpg",
                "mode": 1,
                "format": ""
            }
        ],
        "results": {
            "XML Example": {
                "printouts": {
                    "Has date": [
                        "631152000"
                    ],
                    "Has xml": [
                        {
                            "fulltext": "Test",
                            "fullurl": "...",
                            "namespace": 0,
                            "exists": false
                        }
                    ],
                    "Has xml' label": [
                        {
                            "fulltext": "Test",
                            "fullurl": "...",
                            "namespace": 0,
                            "exists": false
                        }
                    ]
                },
                "fulltext": "XML Example",
                "fullurl": "...",
                "namespace": 0,
                "exists": true
            }
        },
        "meta": {
            "hash": "d3a1a814ff424003d9cfaa9a3ab7221f",
            "count": 1,
            "offset": 0
        }
    }
}

[1] https://gerrit.wikimedia.org/r/#/c/47707/
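
One way to keep the property label as a reference key without using it as an element name is to carry it in an attribute. The sketch below is only an illustration of that idea in Python: the `subject` and `property` element names, the `label` attribute, and the `results_to_xml` helper are hypothetical and do not describe what any SMW change actually produces.

```python
import xml.etree.ElementTree as ET


def results_to_xml(results: dict) -> str:
    """Serialize a JSON-like results mapping using fixed element names."""
    root = ET.Element("results")
    for title, subject in results.items():
        attrs = {"fulltext": subject.get("fulltext", title)}
        subject_node = ET.SubElement(root, "subject", attrs)
        printouts_node = ET.SubElement(subject_node, "printouts")
        for label, values in subject.get("printouts", {}).items():
            # The label stays available as a key, but as an attribute value.
            property_node = ET.SubElement(printouts_node, "property", {"label": label})
            for value in values:
                value_node = ET.SubElement(property_node, "value")
                if isinstance(value, dict):
                    value_node.set("fulltext", str(value.get("fulltext", "")))
                else:
                    value_node.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Reduced version of the JSON example above.
sample = {
    "XML Example": {
        "printouts": {
            "Has date": ["631152000"],
            "Has xml' label": [{"fulltext": "Test"}],
        },
        "fulltext": "XML Example",
    }
}

xml_text = results_to_xml(sample)
reparsed = ET.fromstring(xml_text)
labels = [p.get("label") for p in reparsed.iter("property")]
print(xml_text)
print(labels)
```

The labels read back from the attribute are the same strings as the `label` values in printrequests, so a consumer could still match each printout to its print request.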
Comment 14 MWJames 2013-05-22 12:13:06 UTC
*** Bug 48705 has been marked as a duplicate of this bug. ***
Comment 15 Gerrit Notification Bot 2013-05-26 19:53:50 UTC
Related URL: https://gerrit.wikimedia.org/r/65646 (Gerrit Change Icbc92c9e74161c1ec626775bf6f95703a6df8de1)
Comment 16 Al Johnson 2013-05-26 23:16:25 UTC
I don't see any use for the printrequests element in the XML format other than just confirmation of the output part of the query.  Consumers will know what elements they are looking for and their XPath.  

Maybe it would be easier to let the XML format diverge from the JSON format by eliminating the printrequests element.  I don't think the two formats need to mirror one another element-for-element; the formats are too different.  It's issues like this that are already known to cause problems with JSON->XML conversion.

Just my $.02
Comment 17 MWJames 2013-05-26 23:26:56 UTC
JSON/XML will mirror the available information in order to support interoperability, which means the output formats will stay as close to each other as possible. A content consumer (a custom parser that implements the individual parsing on the client side) can ignore the information if necessary.
Comment 18 Al Johnson 2013-05-26 23:38:23 UTC
Interoperability between what?
Comment 19 Al Johnson 2013-07-04 13:36:45 UTC
I see, but perfect interoperability btw JSON and XML is impossible... as you may have noticed.  This is a major bug, 5 months old, w/an easy fix by Nischay, but it's been rolled back in an attempt to do the impossible (commendable, but impossible).  Google JSON to XML conversion and you'll see that no solution is perfect and will fail exactly like this one does with invalid tags.
Comment 20 Gerrit Notification Bot 2013-07-05 03:48:40 UTC
Change 65646 merged by jenkins-bot:
(Bug 44696) AskApi to support valid XML using the SMW\ApiQueryResultFormatter

https://gerrit.wikimedia.org/r/65646
