Last modified: 2014-04-16 20:38:27 UTC
Seen on fluorine starting sometime 2014-02-28:

  [07:28] < springle> lots of odd things in fluorine:/a/mw-log
  [07:28] <AaronSchulz> yeah I saw that
  [07:28] <AaronSchulz> happens every blue moon

Numerous log files with names like:

  0180.log
  0.log
  #100206.log
  15):.log
  (278):.log
  bileContext.php(278):.log
  Context.php(278):.log
  eContext.php(278):.log
  ext.php(278):.log

Possibly more interesting are the files whose names are part of an expected log name, or an expected log name with some portion of 'fatal' prefixed to it:

  al.log
  atal.log
  faapi.log
  faCirrusSearch-all.log
  fapi.log
  farunJobs.log
  fataapi.log
  fatalapache2.log
  fatalapi.log
  fatalCirrusSearch-all.log
  fatalrunJobs.log
  fatalxff.log
  fatamemcached-serious.log
  fatapi.log
  fatarunJobs.log
  fatatestwiki.log
  fataxff.log
  fatCirrusSearch-all.log
  fatpoolcounter.log
  fatrunJobs.log
  fatxff.log
  faxff.log
  fCirrusSearch-all.log
  fmemcached-serious.log
  frunJobs.log
  fxff.log

Because of the 'f', 'fa', 'fat', ... prefixes and the various logs named with parts of a stack trace from MobileContext, this seems likely to be related to Bug 62078, which is causing stack traces of 39,289 frames to be recorded in the fatal log.
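For anyone triaging the directory, the naming patterns described above can be checked mechanically. The following is only a rough sketch: the EXPECTED set is an illustrative sample rather than the real list of logs on fluorine, and classify() is a hypothetical helper, not part of udp2log.

```python
import re

# Assumption: an illustrative sample of log names that are expected on the host.
EXPECTED = {"fatal.log", "api.log", "xff.log", "runJobs.log", "CirrusSearch-all.log"}


def classify(name, expected=EXPECTED):
    """Roughly sort a file name into one of the patterns seen in this report."""
    if name in expected:
        return "expected"
    # Some leading portion of 'fatal' followed by an expected name, e.g. 'fatxff.log'.
    for i in range(1, len("fatal") + 1):
        if name.startswith("fatal"[:i]) and name[i:] in expected:
            return "fatal-prefixed"
    # Trailing part of an expected name, e.g. 'atal.log'.
    if any(e != name and e.endswith(name) for e in expected):
        return "truncated"
    # Characters such as '(', ')', ':' or '#' suggest a stack trace fragment.
    if re.search(r"[^A-Za-z0-9_.-]", name):
        return "stray-fragment"
    return "unknown"
```

Names such as 0.log carry no distinguishing characters and fall through to "unknown", so they would still need a manual look.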
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1464
Why is this analytics? Do we own this machine? thanks, -Toby
(In reply to Toby Negrin from comment #2)
> Why is this analytics? Do we own this machine?

Greg and I guessed that analytics was the right component to file the bug under because the udp2log application is in the analytics/udplog.git gerrit repository.
Yes -- makes sense. We'll take a look. -Toby
We'll prioritize for next sprint (Thursday 3/6) -Toby
Aaron/Bryan -- is this a serious issue? We expect to phase this technology out in the near future and this bug will require some investigation. thanks, -Toby
(In reply to Toby Negrin from comment #6)
> Aaron/Bryan -- is this a serious issue? We expect to phase this technology
> out in the near future and this bug will require some investigation.

It's probably not an urgent problem. It is pretty annoying/disruptive when it occurs, as it makes the logs on fluorine very hard to follow and some monitoring tools untrustworthy. It doesn't seem to happen frequently at this point, however.

Out of curiosity, what is udp2log going to be replaced with? Kafka everywhere?
(In reply to Bryan Davis from comment #7)
> Out of curiosity, what is udp2log going to be replaced with? Kafka
> everywhere?

Ping on that :)

Also, it happened again last night (3/13) due to a huge MobileFrontend backtrace. MaxSem fixed the part in MobileFrontend, but udp2log.py is still vulnerable to these issues. See the short thread on engineering@ "Strange log files on fluorine".
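One possible way to harden a demultiplexer against this kind of input is to accept only known channel names and route everything else to a single catch-all file. The sketch below is an assumption-heavy illustration, not the actual udp2log.py code: it assumes each line starts with a channel name followed by a space, and KNOWN_CHANNELS, the "unparsed" channel and split_line() are all hypothetical names.

```python
# Assumption: an illustrative allowlist; the real set of channels is not given here.
KNOWN_CHANNELS = {"fatal", "api", "xff", "runJobs", "CirrusSearch-all"}

# Hypothetical catch-all channel for anything that does not parse cleanly.
FALLBACK_CHANNEL = "unparsed"


def split_line(line, known=KNOWN_CHANNELS):
    """Return (channel, message), sending unknown channels to the catch-all."""
    channel, _, message = line.partition(" ")
    if channel in known:
        return channel, message
    # Keep the whole line so that nothing is lost for later debugging.
    return FALLBACK_CHANNEL, line


def log_file_name(channel):
    """Map an already validated channel to its log file name."""
    return channel + ".log"
```

With a check like this, a fragment such as "Context.php(278): ..." would end up in unparsed.log instead of creating a new file. The cost is that new channels have to be added to the allowlist before their logs appear under their own name.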
udp2log will be replaced by Kafka at some point -- hopefully we are talking about a few months. All of the logs will be copied back to VA for analysis. We do have a Kafka-to-udp2log converter for "legacy" apps.