Last modified: 2014-10-19 08:55:43 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T74092, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 72092 - Grid engine "swallows" quotation marks (double and single quotation marks) and does not recognize pages at cs.wiki (more at "Additional Information")
Status: REOPENED
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Unprioritized major
Target Milestone: ---
Assigned To: Marc A. Pelletier
Depends on:
Blocks:
Reported: 2014-10-15 20:33 UTC by Wesalius
Modified: 2014-10-19 08:55 UTC
CC List: 5 users


Description Wesalius 2014-10-15 20:33:00 UTC
Intention:
I was trying to submit a category.py move task for cs.wiki through jstart to gridengine from my hypobot tool on Labs. As the script name suggests, the goal was to move a category on cs.wiki.

Running pywikibot's category.py locally works fine.

(The move replaces the hyphen in the category name with a dash.)

Steps to Reproduce:
1. jsub python /shared/pywikipedia/core/scripts/category.py move -from:"Hudební skupiny 1970-1979" -to:"Hudební skupiny 1970–1979"
2. cat python2.err
3. WARNING: Moving category page 'Kategorie:Hudební' requested, but the page doesn't exist.

This also happens with single quotation marks ('Hudební skupiny 1970-1979' instead of "Hudební skupiny 1970-1979").


Actual Results:
The job did not get done.

Expected Results:
The category is moved.

Reproducible: Always

Oddly, if I submit the job without spaces, for example: category.py move -from:Hudební_skupiny_1970-1979 -to:Hudební_skupiny_1970–1979
then it goes through as far as getting the command right, but the engine still gives the error "WARNING: Moving category page 'Kategorie:Hudební skupiny 1970-1979' requested, but the page doesn't exist." That is strange, since the page exists and I can move it locally. I would rather not do that, because the category contains over 1000 pages and I would prefer to run the job through Labs than keep my PC running all night.
Comment 1 Wesalius 2014-10-15 20:35:15 UTC
The problem with quotation marks appeared before (in May) when I tried to submit a replace.py task, but I gave up on it and ran the task locally.
Comment 2 Marc A. Pelletier 2014-10-15 21:02:58 UTC
The problem is that gridengine will perform an arbitrary, and sometimes variable, number of shell substitutions over the commandline, making the use of quotes or escapes (in any combination) on the command line problematic at the best of times.  This problem is fundamental to gridengine.

Sometimes, double quoting or escaping internal quotes can solve the issue for a specific set of command line values, but this varies and is sometimes hard to predict (for instance, the presence of some shell metacharacter within the /quoted/ string causes gridengine to invoke an extra '/bin/sh -c' at the remote end, stripping one level of extra quoting).
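
The effect of one extra level of shell parsing can be illustrated with a minimal Python sketch. It is only a model: it uses the standard `shlex` module to imitate POSIX word splitting and does not reproduce what gridengine actually does internally. The function name `parse_rounds` and the shortened command line are assumptions made for the illustration.

```python
import shlex


def parse_rounds(command_line, rounds):
    """Return the argument list after `rounds` passes of POSIX-style shell parsing."""
    argv = shlex.split(command_line)
    for _ in range(rounds - 1):
        # Re-joining with plain spaces and parsing again imitates an extra
        # 'sh -c' layer: the quotes consumed by the first pass are gone.
        argv = shlex.split(" ".join(argv))
    return argv


# Shortened, hypothetical version of the command line from the report.
command = 'category.py move -from:"Hudební skupiny 1970-1979"'

one_pass = parse_rounds(command, 1)
two_passes = parse_rounds(command, 2)

print(one_pass)    # the quoted title stays a single argument
print(two_passes)  # the title is split at its spaces
```

After the second pass the `-from:` argument ends at the first space, which is consistent with the 'Kategorie:Hudební' title in the warning from the original report.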

The only reliable ways to circumvent that issue are to either (a) create a small shell script that contains the final invocation with adequate quoting, and invoke /that/ through jsub/qsub instead, or (b) pass arguments to the job in some other manner than the command line (if pywikibot can accept arguments from a file, for instance, this could be used instead).
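
Option (a) could be sketched as follows. This is a hedged illustration, not a Labs-provided tool: `make_wrapper` is a hypothetical helper that builds the text of a small `sh` script in which every argument is quoted with `shlex.quote`, so that the quoting lives in the script rather than on the jsub command line.

```python
import shlex


def make_wrapper(argv):
    """Build the text of a small sh script that runs `argv` with each argument quoted."""
    quoted = " ".join(shlex.quote(arg) for arg in argv)
    return "#!/bin/sh\nexec " + quoted + "\n"


# Arguments taken from the report; the titles contain spaces on purpose.
argv = [
    "python",
    "/shared/pywikipedia/core/scripts/category.py",
    "move",
    "-from:Hudební skupiny 1970-1979",
    "-to:Hudební skupiny 1970–1979",
]

script_text = make_wrapper(argv)
print(script_text)

# Parsing the generated command line once gives back the original arguments.
round_trip = shlex.split(script_text.splitlines()[1])[1:]
```

The generated text would be saved to a file (for example a hypothetical `move_category.sh`), made executable, and that file submitted through jsub instead of the raw command.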
Comment 3 John Mark Vandenberg 2014-10-16 06:53:01 UTC
Try using:

jsub python /shared/pywikipedia/core/scripts/category.py move -from:Hudební_skupiny_1970-1979 -to:Hudební_skupiny_1970–1979
Comment 4 John Mark Vandenberg 2014-10-16 06:54:56 UTC
@Marc, is this bug logged against gridengine? If so, can you add the URL here?
Comment 5 Wesalius 2014-10-16 06:57:17 UTC
(In reply to John Mark Vandenberg from comment #3)
> Try using:
> 
> jsub python /shared/pywikipedia/core/scripts/category.py move
> -from:Hudební_skupiny_1970-1979 -to:Hudební_skupiny_1970–1979

This gives "WARNING: Moving category page 'Kategorie:Hudební skupiny 1970-1979' requested, but the page doesn't exist."
That is not true: if you copy that page name and paste it into cs.wiki, you reach an existing page (https://cs.wikipedia.org/wiki/Kategorie:Hudebn%C3%AD_skupiny_1970-1979).
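
As a quick check that the URL and the title in the warning name the same page, the percent-encoded path can be decoded with the standard library. This is a small sketch; replacing underscores with spaces mirrors how MediaWiki treats the two as equivalent in titles.

```python
from urllib.parse import unquote, urlsplit

url = "https://cs.wikipedia.org/wiki/Kategorie:Hudebn%C3%AD_skupiny_1970-1979"

# Take the part of the path after /wiki/, decode %C3%AD to "í",
# and turn underscores into spaces.
path = urlsplit(url).path
title = unquote(path[len("/wiki/"):]).replace("_", " ")

print(title)
```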
Comment 6 Wesalius 2014-10-16 07:21:52 UTC
(In reply to John Mark Vandenberg from comment #3)
> Try using:
> 
> jsub python /shared/pywikipedia/core/scripts/category.py move
> -from:Hudební_skupiny_1970-1979 -to:Hudební_skupiny_1970–1979

With -debug and -verbose I get this .err:

SITE VERSION: 1.25wmf2
MESSAGES: unknown (not logged in)
=== === === === === === === === === === === === === ===
Pywikibot rad0c47505aac26115b029acfa40908a3d5461c29
Python 2.7.3 (default, Feb 27 2014, 19:58:35)
[GCC 4.6.3]
Found 1 wikipedia:cs processes running, including this one.
WARNING: Moving category page 'Kategorie:Hudební skupiny 1970-1979' requested, but the page doesn't exist.
Moving category talk page 'Diskuse ke kategorii:Hudební skupiny 1970-1979' requested, but the page doesn't exist.
Dropped throttle(s).
Waiting for 1 network thread(s) to finish. Press ctrl-c to abort
All threads finished.
Comment 7 John Mark Vandenberg 2014-10-16 08:14:08 UTC
OK, there are no quotes in that command line (which is just a more complete version of what was stated in the last paragraph of comment 0), so I assume that the reason for WONTFIXing in comment 2 is not relevant to this bug.
Comment 8 Wesalius 2014-10-16 08:36:19 UTC
python /shared/pywikipedia/core/scripts/category.py move -from:Hudební_skupiny_1970-1979 -to:Hudební_skupiny_1970–1979 -simulate -verbose -debug

produces this line in the .err log:
COMMAND: ['/shared/pywikipedia/core/scripts/category.py', 'move', '-family:wikipedia', '-lang:cs', '-from:Hudebn\xc3\xad_skupiny_1970-1979', '-to:Hudebn\xc3\xad_skupiny_1970\xe2\x80\x931979', '-simulate', '-debug', '-verbose']
Comment 9 John Mark Vandenberg 2014-10-16 08:42:19 UTC
Interesting! gridengine is also changing the unicode arguments, *and* _appears_ to be changing the output?

Here is what I see when I run
$ python pwb.py category.py 'move' '-family:wikipedia' '-lang:cs' '-from:Hudebn\xc3\xad_skupiny_1970-1979' '-to:Hudebn\xc3\xad_skupiny_1970\xe2\x80\x931979' '-simulate' '-debug' '-verbose'
....
WARNING: Moving category page 'Kategorie:Hudebn\xc3\xad skupiny 1970-1979' requested, but the page doesn't exist.

which is not the same as the reported output from jsub:

WARNING: Moving category page 'Kategorie:Hudební skupiny 1970-1979' requested, but the page doesn't exist.
Comment 10 John Mark Vandenberg 2014-10-16 09:04:21 UTC
Ugh - ignore comment 9.  I see the same log line from my workstation when I run

$ python pwb.py category.py move -from:Hudební_skupiny_1970-1979 -to:Hudební_skupiny_1970–1979 -simulate -verbose -debug
...
COMMAND: ['category.py', 'move', '-family:wikipedia', '-lang:cs', '-from:Hudebn\xc3\xad_skupiny_1970-1979', '-to:Hudebn\xc3\xad_skupiny_1970\xe2\x80\x931979', '-simulate', '-debug', '-verbose', '-log']
...

So, I still have no idea why it doesn't work on gridengine, as that command works elsewhere.
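
The reason comment 9 can be ignored can be shown with a short sketch, written for Python 3 although the log comes from Python 2.7. The `\xc3\xad` and `\xe2\x80\x93` sequences in the COMMAND line are just how a byte string is displayed: they are the UTF-8 encodings of "í" and the en dash. Typing them inside single quotes in a shell instead passes literal backslashes. The variable names are illustrative only.

```python
import unicodedata

# Byte string as displayed in the COMMAND log line.
logged = b"-to:Hudebn\xc3\xad_skupiny_1970\xe2\x80\x931979"
# Argument as typed on the command line.
typed = "-to:Hudební_skupiny_1970–1979"

# Decoding the logged bytes as UTF-8 gives back exactly what was typed.
decoded = logged.decode("utf-8")
same_argument = decoded == typed

# The single-quoted shell form from comment 9 keeps the backslashes as
# ordinary characters, so it is a different (and longer) string.
literal = r"-to:Hudebn\xc3\xad_skupiny_1970\xe2\x80\x931979"
literal_differs = literal != typed

# The two category names differ only in this one character.
dash_names = [unicodedata.name(ch) for ch in "-–"]

print(same_argument, literal_differs, dash_names)
```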
Comment 11 Wesalius 2014-10-19 08:55:43 UTC
(In reply to Marc A. Pelletier from comment #2)
> The problem is that gridengine will perform an arbitrary, and sometimes
> variable, number of shell substitutions over the commandline, making the use
> of quotes or escapes (in any combination) on the command line problematic at
> the best of times.  This problem is fundamental to gridengine.
> 
> Sometimes, double quoting or escaping internal quotes can solve the issue
> for a specific set of command line values, but this varies and is sometimes
> hard to predict (for instance, the presence of some shell metacharacter
> within the /quoted/ string causes gridengine to invoke an extra '/bin/sh -c'
> at the remote end, stripping one level of extra quoting).
> 
> The only reliable ways to circumvent that issue are to either (a) create a
> small shell script that contains the final invocation with adequate quoting,
> and invoke /that/ through jsub/qsub instead, or (b) pass arguments to the
> job in some other manner than the command line (if pywikibot can accept
> arguments from a file, for instance, this could be used instead).

I tried submitting the command through a script. I made an executable .sh file containing this line:
python /shared/pywikipedia/core/scripts/category.py move -from:Hudební_skupiny_1970-1979 -to:Hudební_skupiny_1970–1979 -simulate -verbose -debug

I submitted this script through jsub. After the script ran, the .err file contained:
...
COMMAND: ['/shared/pywikipedia/core/scripts/category.py', 'move', '-from:Hudebn\xc3\xad_skupiny_1970-1979', '-to:Hudebn\xc3\xad_skupiny_1970\xe2\x80\x931979', '-simulate', '-verbose', '-debug']
DATE: 2014-10-19 08:35:52.070622 UTC
...
Moving category page 'Kategorie:Hudební skupiny 1970-1979' requested, but the page doesn't exist.
Moving category talk page 'Diskuse ke kategorii:Hudební skupiny 1970-1979' requested, but the page doesn't exist.

Any further ideas or suggestions?
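
One thing that can differ between an interactive shell and a grid job is the process environment. Whether that plays any role here is not established in this thread, so the following is only a hedged diagnostic sketch: a hypothetical helper, `encoding_report`, that prints the received arguments and the encoding-related settings, so that the output of a local run and a jsub run could be compared side by side.

```python
import locale
import os
import sys


def encoding_report(argv=None):
    """Collect the arguments and encoding-related settings seen by this process."""
    argv = sys.argv if argv is None else argv
    return {
        "argv": [repr(arg) for arg in argv],
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


if __name__ == "__main__":
    # Print one setting per line so two runs can be compared with diff.
    for key, value in sorted(encoding_report().items()):
        print("{}: {}".format(key, value))
```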
