Last modified: 2014-07-24 16:48:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T56562, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 54562 - Support optional capturing groups in replaceExcept
Status: NEW
Product: Pywikibot
Classification: Unclassified
Component: textlib.py
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Pywikipedia bugs
Reported: 2013-09-24 22:32 UTC by Kunal Mehta (Legoktm)
Modified: 2014-07-24 16:48 UTC (History)

Description Kunal Mehta (Legoktm) 2013-09-24 22:32:28 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/patches/555/
Reported by: eranroz
Created on: 2012-07-03 18:35:29
Subject: Bugfix for optional capturing group
Original description:
Patch for the replace function (replaceExcept) in pywikibot/textlib.py, adding support for empty/optional capturing groups.
This fixes a crash that occurs when replace.py is used with a regex containing an optional capturing group (e.g. AAA in the regex "bla(AAA)?bla").
Comment 1 Kunal Mehta (Legoktm) 2013-09-24 22:32:30 UTC
support for empty capturing group
Comment 2 Kunal Mehta (Legoktm) 2013-09-24 22:32:32 UTC
Is this patch for bug #3539444?
Comment 3 Kunal Mehta (Legoktm) 2013-09-24 22:32:34 UTC
See my comment at the corresponding bug tracker. It may be OK to accept this patch; in any case, I've asked for a third opinion on the matter.
Comment 4 Kunal Mehta (Legoktm) 2013-09-24 22:32:35 UTC
I don't understand this bug. What is the traceback before this patch is applied? And what should replaceExcept() do in your special case? Could you give me a full example? You could exclude this group from capturing with "bla(?:AAA)?bla"; would this help?
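
To illustrate the suggestion above, here is a minimal sketch using only Python's standard `re` module (not pywikibot itself). The variable names are chosen for illustration. It shows that a non-capturing group leaves no group to expand, so nothing can come back as `None`:

```python
import re

# The pattern from the report, and the non-capturing variant suggested above.
capturing = re.compile(r"bla(AAA)?bla")
non_capturing = re.compile(r"bla(?:AAA)?bla")

text = "blabla"  # the optional part is absent

# With a capturing group, the group exists but did not participate: None.
print(capturing.search(text).groups())

# With a non-capturing group, there is no group at all.
print(non_capturing.search(text).groups())

# A replacement that does not reference the group works in both cases.
print(non_capturing.sub("X", "blaAAAbla blabla"))
```

This only helps when the replacement string does not need the group's text. If the replacement refers to the group (for example with `\1`), the group has to stay capturing.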
Comment 5 Kunal Mehta (Legoktm) 2013-09-24 22:32:38 UTC
Yes, this is a bugfix for 3539444.
In short:
when running the replacement "ADMA (a)?poria" => "ADMA \1porya"
on text containing "ADMA poria" (with no "a" before "poria"), it crashes with the following error:
doReplacements
res = replace.ReplaceRobot.doReplacements(self,original_text)
File "D:\myBot\python\pywikipedia-nightly\replace.py", line 390, in doReplacements
allowoverlap=self.allowoverlap)
File "D:\myBot\python\pywikipedia-nightly\pywikibot\textlib.py", line 179, in replaceExcept
match.group(groupID) + \
TypeError: coercing to Unicode: need string or buffer, NoneType found
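
The failure can be reproduced outside pywikibot. The sketch below is a simplified illustration, not the actual textlib.py code or the attached patch: `expand_template` and `GROUP_REF` are hypothetical names, and only numeric references such as `\1` are handled. It shows that an optional group that did not participate in the match yields `None`, that concatenating it with a string raises `TypeError`, and that substituting an empty string avoids the crash:

```python
import re

# Matches numeric backreferences such as \1 or \2 in a replacement template.
GROUP_REF = re.compile(r"\\(\d+)")


def expand_template(match, template):
    """Expand \\N references, treating groups that did not match as ''."""
    def substitute(ref):
        value = match.group(int(ref.group(1)))
        # An optional group that did not participate yields None.
        return value if value is not None else ""
    return GROUP_REF.sub(substitute, template)


pattern = re.compile(r"ADMA (a)?poria")
match = pattern.search("ADMA poria")

# Naive concatenation fails because match.group(1) is None here.
try:
    "ADMA " + match.group(1) + "porya"
    concat_failed = False
except TypeError:
    concat_failed = True

print(concat_failed)
print(expand_template(match, r"ADMA \1porya"))
```

When the optional "a" is present in the text, the same helper inserts it as usual, so existing replacements behave as before.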

You might suggest rewriting the specific regex, and that would probably work, but it is just a workaround: a regex with an optional capturing group is valid and should work properly.
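
For comparison, the standard library treats this case the same way in current Python versions: since Python 3.5, `re.sub` replaces a reference to an unmatched group with an empty string. A minimal sketch, assuming Python 3.5 or later and using plain `re` rather than pywikibot's replaceExcept:

```python
import re
import sys

# Assumption: Python 3.5+; earlier versions raise an error for unmatched groups.
assert sys.version_info >= (3, 5)

pattern = re.compile(r"ADMA (a)?poria")
replacement = r"ADMA \1porya"

# The first occurrence has no "a"; the second one does.
result = pattern.sub(replacement, "ADMA poria and ADMA aporia")
print(result)
```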
Longer story :)
In Hebrew Wikipedia there is a list of regexes that are used for replacements in (almost) all articles. It is here:
http://he.wikipedia.org/wiki/%D7%95%D7%A7:%D7%A8%D7%94
The columns in the table there are:
ID  |  old   | new | exceptText
The list is used by a C# bot implementation, which isn't active, and by a JS userscript implementation, which is used for replacements on specific pages.
I have ported it to work with replace.py, but it fails when it reaches a replacement with an optional capturing group. After applying my fix locally, I ran it for 250 test edits and it worked properly without crashes.
Comment 6 Amir Ladsgroup 2014-07-24 08:29:26 UTC
The patch looks good to me
