Last modified: 2014-08-06 23:56:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T49312, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 47312 - not all jobs are processed (webVideoTranscode)
Status: PATCH_TO_REVIEW
Product: MediaWiki extensions
Classification: Unclassified
Component: TimedMediaHandler
Version: unspecified
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: shell
Depends on:
Blocks:
Reported: 2013-04-17 08:24 UTC by Jan Gerber
Modified: 2014-08-06 23:56 UTC
CC List: 11 users


Description Jan Gerber 2013-04-17 08:24:22 UTC
videoscalers run "jobs-loop.sh ... webVideoTranscode" to process video transcoding jobs. there are unprocessed jobs in the queue that never run.
Comment 1 Aaron Schulz 2013-06-25 21:53:27 UTC
Seems to be jobs failing rather than not ever getting run.
Comment 2 Rob Lanphier 2013-08-22 22:11:30 UTC
Jan, can you clarify what you think the next steps on this one should be?
Comment 3 Andre Klapper 2013-09-03 10:22:26 UTC
Jan, can you clarify what you think the next steps on this one should be?
Comment 4 Jan Gerber 2013-09-03 11:42:24 UTC
http://commons.wikimedia.org/wiki/File:Turtle_at_Mississippi_River_Park_and_Museum_Tunica_Resorts_MS.oggtheora.ogv is another case where it's happening; trying to get more data on it now to see what might cause it.
Comment 5 Andre Klapper 2013-12-12 13:13:37 UTC
(In reply to comment #0 by jgerber)
> videoscalers run "jobs-loop.sh ... webVideoTranscode" to process video
> transcoding jobs. there are unprocessed jobs in the queue that never run.

(In reply to comment #1 by aschulz4587)
> Seems to be jobs failing rather than not ever getting run.


Is this still a problem, and how to find out?


(In reply to comment #4 by jgerber)
> trying to get more data on it now to see what might cause it.

jgerber: Did this ever happen?
Comment 6 Andre Klapper 2014-02-27 16:28:25 UTC
(In reply to comment #0 by jgerber)
> videoscalers run "jobs-loop.sh ... webVideoTranscode" to process video
> transcoding jobs. there are unprocessed jobs in the queue that never run.

(In reply to comment #1 by aschulz4587)
> Seems to be jobs failing rather than not ever getting run.


Is this still a problem, and how to find out?


(In reply to comment #4 by jgerber)
> trying to get more data on it now to see what might cause it.

jgerber: Did this ever happen?
Comment 7 Marco 2014-03-30 22:39:55 UTC
(In reply to Andre Klapper from comment #6)
Currently we have 6723 queued transcodes as per https://commons.wikimedia.org/wiki/Special:TimedMediaHandler 
It seems most (/all) of them were "Added to Job queue 37 days, .. hours, .. minutes, .. seconds ago" as part of the 160p-ogg-transcode batch.

Though the jobs seem not to fail when resubmitted. Thus I don't know if it is related to this bug report specifically. Do you think I should open another bug report requesting to re-queue all queued transcodes?
Comment 8 Bawolff (Brian Wolff) 2014-03-30 22:43:47 UTC
(In reply to Marco from comment #7)
> (In reply to Andre Klapper from comment #6)
> Currently we have 6723 queued transcodes as per
> https://commons.wikimedia.org/wiki/Special:TimedMediaHandler 
> It seems most (/all) of them were "Added to Job queue 37 days, .. hours, ..
> minutes, .. seconds ago" as part of the 160p-ogg-transcode batch.
> 
> Though the jobs seem not to fail when resubmitted. Thus I don't know if it
> is related to this bug report specifically. Do you think I should open
> another bug report requesting to re-queue all queued transcodes?

It's separate (I think it's related to how we temporarily stopped making 160p files. When we re-enabled it, I don't think jobs got re-added despite what that page says). Anyhow, see bug 61690.
Comment 9 Bawolff (Brian Wolff) 2014-03-30 22:48:21 UTC
>(I think it's related to how we temporarily stopped making 160p
> files. When we re-enabled it, I don't think jobs got re-added despite what that
> page says).

Err, actually we didn't do that. So never mind....

Anyways, it looks like a bunch of jobs disappeared, and then there's an inconsistent state with TimedMediaHandler thinking they are just pending, not gone. Which may or may not be the same bug as this one. I'm not sure.
Comment 10 Bawolff (Brian Wolff) 2014-03-31 00:53:33 UTC
(In reply to Bawolff (Brian Wolff) from comment #9)
> >(I think it's related to how we temporarily stopped making 160p
> > files. When we re-enabled it, I don't think jobs got re-added despite what that
> > page says).
> 
> Err, actually we didn't do that. So never mind....
> 
> Anyways, it looks like a bunch of jobs disappeared, and then there's an
> inconsistent state with TimedMediaHandler thinking they are just pending,
> not gone. Which may or may not be the same bug as this one. I'm not sure.

Ugh, sorry. Cannot read. Thought this was bug 61401. Ignore everything I said. This is quite likely the right bug.



(In reply to Andre Klapper from comment #6)
> (In reply to comment #0 by jgerber)
> > videoscalers run "jobs-loop.sh ... webVideoTranscode" to process video
> > transcoding jobs. there are unprocessed jobs in the queue that never run.
> 
> (In reply to comment #1 by aschulz4587)
> > Seems to be jobs failing rather than not ever getting run.
> 
> 
> Is this still a problem, and how to find out?

Someone with access to job queue log (I believe that's basically the folks with shell access) can find out if there are failing jobs that do not have their "failed" status reflected in the transcode table.

It would be interesting to see if those 6723 queued transcodes ever ran and failed, or if they just never ran.
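
A rough sketch of that check, in Python. It assumes the job queue log has already been parsed into two sets of transcode keys (those attempted at all, and those logged as failed). The function name, the key strings, and the three group names are hypothetical, chosen only for illustration; they are not TimedMediaHandler's schema or the real log format.

```python
from typing import Dict, Iterable, List, Set


def classify_pending(
    pending: Iterable[str],
    attempted: Set[str],
    failed: Set[str],
) -> Dict[str, List[str]]:
    """Split transcodes that still look pending into three groups.

    pending   -- keys the transcode table still shows as queued (hypothetical)
    attempted -- keys that appear in the job queue log at all (hypothetical)
    failed    -- keys the job queue log records as failed (hypothetical)
    """
    groups: Dict[str, List[str]] = {
        "never_ran": [],
        "ran_and_failed": [],
        "ran_other": [],
    }
    for key in pending:
        if key in failed:
            # The job ran and failed, but the table was never updated.
            groups["ran_and_failed"].append(key)
        elif key in attempted:
            # The job shows up in the log without a recorded failure.
            groups["ran_other"].append(key)
        else:
            # No trace in the log: the job apparently never ran.
            groups["never_ran"].append(key)
    return groups
```

A large "never_ran" group would point at jobs being lost from the queue, while a large "ran_and_failed" group would point at failures not being written back to the transcode table.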

-----

It might be good to do something like - If there are transcodes that have been pending for more than 10 days, and if the job queue for webVideoTranscode is empty, then automatically re-add the jobs for those videos
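
The rule above could be sketched roughly as follows in Python. This is only an illustration of the proposed condition, not the extension's code: `transcodes_to_requeue`, the `pending_since` mapping, and the `queue_length` argument are hypothetical names, and how the pending timestamps and queue size are obtained is left out.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def transcodes_to_requeue(
    pending_since: Dict[str, datetime],
    queue_length: int,
    now: datetime,
    max_age: timedelta = timedelta(days=10),
) -> List[str]:
    """Return keys of pending transcodes that should get a fresh job.

    pending_since -- transcode key -> time it was added to the job queue
    queue_length  -- number of webVideoTranscode jobs currently queued
    """
    # If jobs are still waiting, old entries may simply be backlog,
    # so nothing is re-added until the queue is empty.
    if queue_length > 0:
        return []
    # Only entries pending longer than max_age are considered lost.
    return sorted(
        key for key, added in pending_since.items() if now - added > max_age
    )
```

A different threshold can be passed as `max_age`. Requiring an empty queue keeps the rule from adding duplicate jobs for transcodes that are merely waiting their turn.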
Comment 11 Marco 2014-04-26 18:04:51 UTC
> (In reply to comment #10)
> Someone with access to job queue log (I believe that's basically the folks
> with shell access) can find out if there are failing jobs that do not have
> their "failed" status reflected in the transcode table.
> 
> It would be interesting to see if those 6723 queued transcodes ever ran and
> failed, or if they just never ran.

Is there any progress in making the analysis?

> It might be good to do something like - If there are transcodes that have
> been pending for more than 10 days, and if the job queue for
> webVideoTranscode is empty, then automatically re-add the jobs for those
> videos

Is there any progress in implementing this? If not, I may code a bot to fulfill the request to reset those transcodes made @ https://commons.wikimedia.org/w/index.php?title=Commons%3ABots%2FWork_requests&diff=122148976&oldid=122026535 which
Comment 12 Gerrit Notification Bot 2014-05-18 18:58:03 UTC
Change 133994 had a related patch set uploaded by Brian Wolff:
Automatically re-add transcode jobs if transcode pending for 72h

https://gerrit.wikimedia.org/r/133994
Comment 13 Bawolff (Brian Wolff) 2014-05-18 20:51:40 UTC
(In reply to Gerrit Notification Bot from comment #12)
> Change 133994 had a related patch set uploaded by Brian Wolff:
> Automatically re-add transcode jobs if transcode pending for 72h
> 
> https://gerrit.wikimedia.org/r/133994

This would obviously be a band-aid solution. We should figure out why they aren't working in the first place.
Comment 14 Andre Klapper 2014-07-04 11:42:58 UTC
(In reply to Bawolff (Brian Wolff) from comment #13)
> > https://gerrit.wikimedia.org/r/133994
> 
> This would obviously be a band-aid solution. We should figure out why they
> aren't working in the first place.

Workaround patch has two -1s. What's the plan here?
Comment 15 Bawolff (Brian Wolff) 2014-07-04 16:45:58 UTC
(In reply to Andre Klapper from comment #14)
> (In reply to Bawolff (Brian Wolff) from comment #13)
> > > https://gerrit.wikimedia.org/r/133994
> > 
> > This would obviously be a band-aid solution. We should figure out why they
> > aren't working in the first place.
> 
> Workaround patch has two -1s. What's the plan here?

Good question. I need to figure out a new plan, investigate further, or talk it over with Aaron. Gilles' -1 is just cosmetic, so that one is trivial.
