Last modified: 2014-07-20 10:53:40 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T57276, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 55276 - weblinkchecker should ignore URLs inside some tags, part 2
Status: NEW
Product: Pywikibot
Classification: Unclassified
Component: weblinkchecker.py
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Pywikipedia bugs
Depends on:
Blocks:
Reported: 2013-10-05 04:55 UTC by Kunal Mehta (Legoktm)
Modified: 2014-07-20 10:53 UTC (History)

Description Kunal Mehta (Legoktm) 2013-10-05 04:55:45 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1164/
Reported by: djbarrett
Created on: 2010-04-12 18:33:11
Subject: weblinkchecker should ignore URLs inside some tags, part 2
Assigned to: xqt
Original description:
This is a followup to [pywikipediabot-Bugs-1969051] "weblinkchecker should ignore URLs inside some tags"

The fix in pyrev:8076 by xqt is appreciated, but not an appropriate solution.  The particular tag I listed in the ticket, "<sql>", was just an example. The fix by xqt simply hard-coded this example (bogus) tag into the Pywikipedia source code:

svn diff -c8076  http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia

A better fix would be to recognize when you are reading a tag attribute:

<AnyTagGoesHere ... attr='http://whatever' ...>

{{AnyTemplateOrParserFunction | attr=http://whatever

and ignore the URL in these situations.
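
The proposal above could be sketched roughly as follows. This is a minimal illustration, not code from weblinkchecker.py: the regular expressions and the function names `strip_attribute_urls` and `find_checkable_urls` are hypothetical, and real wikitext (nested templates, comments, nowiki sections) would need more careful parsing.

```python
import re

# Hypothetical patterns for illustration; none of these come from weblinkchecker.py.
# An opening tag including its attributes, e.g. <AnyTag attr='http://whatever'>.
TAG_RE = re.compile(r"<[A-Za-z][^<>]*>")
# A named template parameter whose value starts with a URL,
# e.g. "| attr=http://whatever", up to the next "|" or "}".
TEMPLATE_PARAM_RE = re.compile(r"\|\s*\w+\s*=\s*https?://[^\s|}]*")
# A deliberately simple URL matcher used after the stripping step.
URL_RE = re.compile(r"https?://[^\s<>\"'|}\]]+")


def strip_attribute_urls(text):
    """Remove opening tags and URL-valued template parameters from text."""
    text = TAG_RE.sub("", text)
    return TEMPLATE_PARAM_RE.sub("|", text)


def find_checkable_urls(text):
    """Return the URLs that remain once attribute URLs have been stripped."""
    return URL_RE.findall(strip_attribute_urls(text))


sample = (
    "See http://example.org/a <sql db='http://internal/db'>x</sql> "
    "{{Tpl | attr=http://whatever}} [http://example.org/b label]"
)
print(find_checkable_urls(sample))
```

Only the tag itself is removed, so a URL between an opening and closing tag is still found. A URL passed as a named template parameter is skipped even when it is a genuine link, so behaviour like this would probably have to be opt-in.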


$ python version.py
Pywikipedia [http] trunk/pywikipedia (r8050, 2010/04/01, 15:43:14)
Python 2.4.3 (#1, Sep  3 2009, 15:37:37) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]
Comment 1 Kunal Mehta (Legoktm) 2013-10-05 04:55:47 UTC
I disagree. It is quite possible to have a sensible URL in a template (e.g. a reference). I'd suggest only adding 'exceptions', as has been done in r8076.
Comment 2 Kunal Mehta (Legoktm) 2013-10-05 04:55:49 UTC
I do not agree. It is legal to put URLs into <ref /> tags, as well as others like <noinclude>, or to assign URLs to a template field, so the weblinkchecker normally shouldn't ignore these URLs but should check whether they are still valid.
Comment 3 Kunal Mehta (Legoktm) 2013-10-05 04:55:51 UTC
- **status**: open --> open-rejected
Comment 4 Kunal Mehta (Legoktm) 2013-10-05 04:55:54 UTC
- **assigned_to**: nobody --> xqt
- **status**: open-rejected --> pending-rejected
Comment 5 Kunal Mehta (Legoktm) 2013-10-05 04:55:55 UTC
I see your point.  Three notes:

1. Can this be an OPTION for weblinkchecker?

2. If not, can you at least strip off the trailing single quotes (shown in bug 1969051) so you don't get broken URLs?  Single quotes are valid in tags but should not be part of the URL.

3. In any case, you should revert pyrev:8076 because there is no such tag as <sql>.
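
Note 2 could look something like the sketch below. It is an assumption about how the cleanup might be done, not code from weblinkchecker.py, and `clean_url` is a hypothetical helper name. It treats a trailing quote as leftover attribute syntax, although RFC 3986 does technically permit a single quote inside a URL.

```python
def clean_url(url):
    """Drop quote characters left at the end of a URL taken from a tag attribute."""
    # Only trailing quotes are removed; quotes inside the URL are left alone.
    return url.rstrip("'\"")


print(clean_url("http://whatever'"))
```

Because only the end of the string is trimmed, a URL such as http://example.org/it's/page passes through unchanged.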
Comment 6 Kunal Mehta (Legoktm) 2013-10-05 04:55:58 UTC
- **status**: pending-rejected --> open-rejected
Comment 7 Kunal Mehta (Legoktm) 2013-10-05 04:56:00 UTC
3rd done in pyrev:8086
Comment 8 Kunal Mehta (Legoktm) 2013-10-05 04:56:02 UTC
The <sql> tag is a non-standard tag, but it is used on the other bug reporter's wiki (as was clearly stated in his/her bug report).
Comment 9 Kunal Mehta (Legoktm) 2013-10-05 04:56:03 UTC
valhallasw: Actually, I *am* the other bug reporter. :-)  <sql> is a made-up tag for the example.  We have 40 tags that exhibit the problem behavior.
