Last modified: 2013-10-05 10:45:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T57184, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 55184 - replace doesn't support optional groups
Status: RESOLVED INVALID
Product: Pywikibot
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Pywikipedia bugs
Reported: 2013-10-05 04:39 UTC by Kunal Mehta (Legoktm)
Modified: 2013-10-05 10:45 UTC (History)

Description Kunal Mehta (Legoktm) 2013-10-05 04:39:11 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1484/
Reported by: Anonymous user
Created on: 2012-07-02 13:25:23
Subject: replace doesn't support optional groups
Original description:
textlib.py (method replaceExcept) doesn't support optional capturing groups in regexes.

I tried to run replace.py with the following regex: "RISHMI(T |IM)?" => "RISHMI\1"
When run on a page containing the text "SOMETHING RISHMI SOMETHING",
it crashes with the following error:
textlib.py, line 178, in replaceExcept
match.group(groupID) + \
TypeError: coercing to Unicode: need string or buffer, NoneType found

Line 178 contains the statement:
replacement = replacement[:groupMatch.start()] + \
match.group(groupID) + \
replacement[groupMatch.end():]

textlib.py should check whether match.group(groupID) is None and, if so, insert an empty string instead of match.group(groupID).
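
The proposed check can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual textlib.py code: expand_groups is a hypothetical helper, and it only handles numbered references such as \1.

```python
import re


def expand_groups(replacement, match):
    # Hypothetical helper: fill numbered group references in `replacement`
    # from `match`, using '' for groups that did not participate.
    def fill(group_ref):
        value = match.group(int(group_ref.group(1)))
        return value if value is not None else ''

    # A backslash followed by digits is treated as a group reference.
    return re.sub(r'\\(\d+)', fill, replacement)


m = re.search(r'RISHMI(T |IM)?', 'SOMETHING RISHMI SOMETHING')
print(repr(m.group(1)))               # None: the optional group did not match
print(expand_groups(r'RISHMI\1', m))  # RISHMI
```

Whether silently substituting an empty string is the right behaviour is exactly what the comments below discuss.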
Comment 1 Kunal Mehta (Legoktm) 2013-10-05 04:39:13 UTC
The group must exist to be reused. What should this regex do, in your opinion? What about "RISHMI(T |IM|)" or "RISHMI((?:T |IM)?)"? Errors should never pass silently unless explicitly silenced (PEP 20). Replacing with empty strings might lead to unwanted side effects, but I haven't thought it through.
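
A quick way to see why the two rewritten patterns avoid the problem is to check that group 1 always participates in the match. This is a minimal sketch, assuming the second pattern was meant to begin with RISHMI; first_group is a hypothetical helper name.

```python
import re

TEXT = 'SOMETHING RISHMI SOMETHING'

# Both rewrites move the optionality inside the capturing group,
# so group 1 matches an empty string instead of being absent.
PATTERNS = (r'RISHMI(T |IM|)', r'RISHMI((?:T |IM)?)')


def first_group(pattern, text):
    # Hypothetical helper: return group 1 of the first match.
    return re.search(pattern, text).group(1)


for pattern in PATTERNS:
    print(repr(first_group(pattern, TEXT)))     # ''
    print(re.sub(pattern, r'RISHMI\1', TEXT))   # SOMETHING RISHMI SOMETHING
```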
Comment 2 Kunal Mehta (Legoktm) 2013-10-05 04:39:15 UTC
The regex here is just an example, and probably a bad one (the replacement leaves the text unchanged). Your suggestion for this specific regex (an optional inner group inside a capturing group) would probably fix it, but that is a workaround: replace.py should handle optional capturing groups the same way re.findall does.

Replacing None with an empty string is compatible with the behaviour of re.findall (re.findall('a(b)?(c)','ac') => [('', 'c')]) and with the regex engines of most languages (in JS: 'ac'.replace(/a(b)?(c)/,'a$1c')), though Python's re isn't consistent here (re.sub('a(b)?(c)','X\\1','ac') is an error).
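
The re.findall behaviour quoted above can be reproduced directly. Whether the template form of re.sub raises for an unmatched group depends on the Python version (since Python 3.5 it substitutes an empty string), so this sketch sidesteps the question with a callable replacement. The name repl_first_group is an assumption made for illustration.

```python
import re

# findall reports a group that did not participate as ''.
print(re.findall(r'a(b)?(c)', 'ac'))  # [('', 'c')]


def repl_first_group(match):
    # Hypothetical callable replacement: 'X' plus group 1,
    # falling back to '' when the group is None.
    return 'X' + (match.group(1) or '')


print(re.sub(r'a(b)?(c)', repl_first_group, 'ac'))   # X
print(re.sub(r'a(b)?(c)', repl_first_group, 'abc'))  # Xb
```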
Comment 3 Nemo 2013-10-05 10:45:48 UTC
I had my own fights with this problem, and my conclusion was that there's nothing to do about it but rewrite your regexes; it's how Python works.
Mostly, what's nasty is the idiotic error message.
