Last modified: 2013-12-26 14:39:54 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/509/
Reported by: cosoleto
Created on: 2007-09-28 07:35:32
Subject: showDiff() highlighting limitation due to difflib design
Assigned to: cosoleto
Original description: showDiff() can fail to highlight a char-by-char difference because Python's difflib does not seem to fully support char-by-char comparison. Please see the Python tracker:
- issue #1528074: "difflib.SequenceMatcher.find_longest_match() wrong result" (http://bugs.python.org/issue1528074)
- issue #1678345: "A fix for the bug #1528074 [warning: quite slow]" (http://bugs.python.org/issue1678345)
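For readers who want to reproduce the effect, here is a minimal sketch. It assumes the limitation in question is the "popular element" heuristic of `difflib.SequenceMatcher`, which newer Python versions (2.7.1+ and 3.2+) let you switch off with the `autojunk` parameter. The helper name `changed_spans` and the sample strings are made up for illustration and are not part of pywikipedia.

```python
import difflib


def changed_spans(a, b, autojunk=True):
    """Return the non-equal opcodes between a and b, compared char by char."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=autojunk)
    return [tuple(op) for op in matcher.get_opcodes() if op[0] != "equal"]


if __name__ == "__main__":
    # A string longer than 200 characters in which every character is
    # "popular", which is the situation where the heuristic kicks in.
    a = " ".join(str(i) for i in range(100))
    # Change exactly one character in the middle.
    b = a[:100] + "X" + a[101:]

    print("autojunk=True :", changed_spans(a, b, autojunk=True))
    print("autojunk=False:", changed_spans(a, b, autojunk=False))
```

With `autojunk=False` the only reported change should be the single replaced character; with the default setting the reported span can be much wider, which is the kind of imprecise highlighting described above.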
Logged In: YES user_id=181280 Originator: YES File Added: difflib_test.py
- **priority**: 5 --> 6
Logged In: NO I guess this is an example: http://bildr.no/view/146822
Assigning this to myself before somebody steals the issue. I am going to add a modified difflib version, unless the missing feature has been fixed in recent Python builds or, of course, anyone objects. I am not sure about adding a config option to enable or disable line-by-line/char-by-char comparison.
- **priority**: 6 --> 7
- **assigned_to**: nobody --> cosoleto
Actually, I'd very much like to see better diff support for pywikipedia. I don't know why I missed that bug =) I see in those bugs several comments about complexity changes, saying that a patch could change complexity from O(n*m) to O(n+m), which certainly looks interesting. If char-by-char comparison provides better diffs at a lower cost, what exactly is the reason for not supporting it in Python? :s Two things to look at during implementation:
- Would it provide interesting diffs in all cases? (If one case improves while other matches get worse, it's not so interesting anymore.)
- Performance changes for big diffs.

Good luck =)
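The two checks above (diff quality and cost on big inputs) could be explored with a rough harness like the one below. This is only a sketch: it assumes a Python version whose `difflib.SequenceMatcher` accepts `autojunk` (2.7.1+ or 3.2+), and `make_pair` and `time_opcodes` are hypothetical helper names, not pywikipedia functions. Timings depend heavily on the machine and the input, so no expected numbers are given.

```python
import difflib
import random
import time


def make_pair(length, edits, seed=0):
    """Build a random text and a copy with a few characters overwritten."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    a = "".join(rng.choice(alphabet) for _ in range(length))
    chars = list(a)
    for _ in range(edits):
        # "#" is not in the alphabet, so every edit is a real change.
        chars[rng.randrange(length)] = "#"
    return a, "".join(chars)


def time_opcodes(a, b, autojunk):
    """Return the seconds spent computing char-level opcodes for a -> b."""
    start = time.perf_counter()
    difflib.SequenceMatcher(None, a, b, autojunk=autojunk).get_opcodes()
    return time.perf_counter() - start


if __name__ == "__main__":
    for length in (500, 1000, 2000):
        a, b = make_pair(length, edits=5)
        fast = time_opcodes(a, b, autojunk=True)
        full = time_opcodes(a, b, autojunk=False)
        print(f"{length:>5} chars: autojunk=True {fast:.4f}s, "
              f"autojunk=False {full:.4f}s")
```

Comparing the opcodes themselves on real page text, not just the timings, would answer the first question about whether some cases get worse.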
I don't need luck because I am not going to do a big job, just a simple adaptation of already written code (with some loss of performance). If you are interested in working on this problem in a different way, you are welcome (and not only in this open project). Anyway, it's nice to see you have analysed the situation a bit. The changed version should be safe, without regressions. I will document the performance loss.
- **Group**: --> confirmed
This appears to have been fixed upstream, right?
Both issues linked in comment 0 (http://bugs.python.org) have indeed been fixed.