Last modified: 2013-10-25 12:51:57 UTC
I have a tool on the toolserver that checks all weblinks in dewiki for availability and reports broken links on talk pages. To check the links in parallel I would need a separate instance within the tools project so as not to impact the others. Maybe a dedicated grid node would also be OK. There would also be the possibility of having a separate project, but I don't know if that makes sense. Is it possible to have root on my instance? If not, I would need Tcl 8.6 (tip-386-impl branch), tcllib, TclCurl, mysqltcl (or better tdbc and tdbc::mysql), access to the production replicas, a local (MySQL) database and access to the file storage (/data/project, maybe also dumps). My username is gifti, service-group is giftbot.
Having this run as multiple continuous tasks should be good enough, methinks. Perhaps use a redis queue to co-ordinate between multiple instances? :)
I don't quite understand why a separate instance is needed. Why doesn't this work on Tools now? You shouldn't need to worry about impacting others; that's what the grid is for.
(In reply to comment #1)
> Having this run as multiple continuous tasks should be good enough, methinks.
> Perhaps use a redis queue to co-ordinate between multiple instances? :)
The nice thing about MediaWiki is that you have page IDs. So you don't have to write complex job schedulers, but for example can have 10 jobs that each check the external links of pages where (i - 1) * 1000000 <= el_from < i * 100000 (for example, depending on the maximum page_id of course).
(In reply to comment #2)
> [...]
> The nice thing about MediaWiki is that you have page IDs. So you don't have to
> write complex job schedulers, but for example can have 10 jobs that each check
> the external links of pages where (i - 1) * 1000000 <= el_from < i * 100000
> (for example, depending on the maximum page_id of course).
... provided that the number of zeroes is equal :-). Well, you get the gist.
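The page-ID partitioning suggested above could be sketched roughly as follows, using the same chunk size on both sides of the inequality. This is only an illustration in Python, not the tool's actual code (the tool itself is written in Tcl). The function names `page_id_ranges` and `where_clause`, the job count of 10 and the chunk size of 1000000 are assumptions chosen to mirror the comment; a real chunk size would depend on the maximum page_id.

```python
def page_id_ranges(num_jobs, chunk_size):
    """Return half-open (low, high) el_from ranges for jobs 1..num_jobs.

    Job i covers (i - 1) * chunk_size <= el_from < i * chunk_size.
    """
    return [((i - 1) * chunk_size, i * chunk_size)
            for i in range(1, num_jobs + 1)]


def where_clause(low, high):
    """Build the SQL condition restricting a job to its page-ID range."""
    return f"el_from >= {low} AND el_from < {high}"


if __name__ == "__main__":
    # Hypothetical values: 10 jobs, one million page IDs per job.
    for job, (low, high) in enumerate(page_id_ranges(10, 1_000_000), start=1):
        print(f"job {job}: {where_clause(low, high)}")
```

Each job would then be submitted to the grid as its own continuous task with its range as a parameter, so no shared queue is needed to avoid checking the same page twice.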
It's also not clear to me why a dedicated instance is required for this. Why does the grid scheduling not suffice?