Last modified: 2014-07-20 10:53:40 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1164/
Reported by: djbarrett
Created on: 2010-04-12 18:33:11
Subject: weblinkchecker should ignore URLs inside some tags, part 2
Assigned to: xqt
Original description:

This is a followup to [pywikipediabot-Bugs-1969051] "weblinkchecker should ignore URLs inside some tags"

The fix in pyrev:8076 by xqt is appreciated, but not an appropriate solution. The particular tag I listed in the ticket, "<sql>", was just an example. The fix by xqt simply hard-coded this example (bogus) tag into the Pywikipedia source code:

svn diff -c8076 http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia

A better fix would be to recognize when you are reading a tag attribute:

<AnyTagGoesHere ... attr='http://whatever' ...>
{{AnyTemplateOrParserFunction | attr=http://whatever

and ignore the URL in these situations.

$ python version.py
Pywikipedia [http] trunk/pywikipedia (r8050, 2010/04/01, 15:43:14)
Python 2.4.3 (#1, Sep 3 2009, 15:37:37)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]
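For illustration only, the tag-attribute part of the proposal in the description could look roughly like the sketch below. It is not taken from Pywikipedia: the names `TAG_RE`, `URL_RE` and `urls_outside_tag_attributes` are hypothetical, the URL pattern is deliberately simplified, and the template case (`{{... | attr=http://whatever`) is not handled. It also assumes attribute values never contain `<` or `>`.

```python
import re

# Hypothetical patterns for this sketch; not weblinkchecker's real regexes.
# Matches an opening tag with its attributes, e.g. <sql db='http://...'>.
TAG_RE = re.compile(r"<[A-Za-z][^<>]*>")
# Simplified URL matcher: stops at whitespace, quotes, angle brackets, pipes.
URL_RE = re.compile(r"https?://[^\s'\"<>|]+")


def urls_outside_tag_attributes(text):
    """Return URLs in text, skipping those that occur inside an opening tag."""
    # Blank out every opening tag so attribute values are never scanned.
    without_tags = TAG_RE.sub(" ", text)
    return URL_RE.findall(without_tags)


sample = (
    "See http://example.org/page and "
    "<sql db='http://example.org/db'>SELECT 1</sql> "
    "<ref>http://example.org/ref</ref>"
)
found = urls_outside_tag_attributes(sample)
```

In this sketch a URL that is the content of a tag, such as the one inside `<ref>...</ref>`, is still returned for checking; only URLs that appear as attribute values are skipped.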
I disagree. It is quite possible to have a sensible URL in a template (e.g. a reference). I'd suggest only adding 'exceptions', as has been done in r8076.
I do not agree. It is legal to put URLs into <ref /> tags as well as others like <noinclude> etc., or to assign URLs to a template field, so these normally shouldn't be ignored by the weblinkchecker; they should be checked to see whether the URL is still valid.
- **status**: open --> open-rejected
- **assigned_to**: nobody --> xqt
- **status**: open-rejected --> pending-rejected
I see your point. Three notes:

1. Can this be an OPTION for weblinkchecker?
2. If not, can you at least strip off the trailing single quotes (shown in bug 1969051) so you don't get broken URLs? Single quotes are valid in tags but should not be part of the URL.
3. In any case, you should revert pyrev:8076 because there is no such tag as <sql>.
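As a rough illustration of the request to strip trailing quotes, a minimal sketch might look like this. The helper name `strip_trailing_quotes` is hypothetical and is not an existing weblinkchecker function; it assumes the URL has already been extracted as a string and that a quote character at the very end is never a legitimate part of the link.

```python
def strip_trailing_quotes(url):
    """Remove quote characters left at the end of an extracted URL."""
    # rstrip with a character set removes any run of ' or " at the end only;
    # quotes elsewhere in the URL are left untouched.
    return url.rstrip("'\"")


cleaned = strip_trailing_quotes("http://example.org/db'")
```

A URL without trailing quotes passes through unchanged, so the helper could be applied to every extracted link.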
- **status**: pending-rejected --> open-rejected
Point 3 is done in pyrev:8086.
The <sql> tag is a non-standard tag, but it is used on the other bug reporter's wiki (as was clearly stated in his/her bug report).
valhallasw: Actually, I *am* the other bug reporter. :-) <sql> is a made-up tag for the example. We have 40 tags that exhibit the problem behavior.