Last modified: 2014-07-15 14:46:30 UTC
From Asher's email: "Looking over parsoid logs and the code that creates it, I think that refactoring could help deliver some of what I'd like without adding any complexity. Parsoid already logs what's being parsed, request urls sent to varnish, and parse times in ms, written to a mix of stdout and stderr. The formatting could be standardized and/or the data combined into an access log which we could parse for stats. It also wouldn't take much to use node's dgram module to either udplog this to the mediawiki log server, send stats directly to a statsd daemon, or both. This could indeed be done by an engineer outside of parsoid team." Gabriel's response: "If you spec a format that is easily consumable by tools you would like to use, then I would be happy to change the few lines that do our current minimalistic logging to fit it. Apart from regular request logging and cache requests (currently logged for debugging purposes) we'd like to keep the ability to log exceptions. It probably makes sense to split things up into request and error logs at that point."
@Asher: Could you spec the log format you prefer?
I think your options are as follows: For request / error / exception logging, you can send datagrams to fluorine.eqiad.wmnet:8420 in the format of '<sequence id>\t<log name> <log message>', where sequence ID is a uint64_t number that increments with every transmission. (At least in theory. The sequence ID can be used as a basis for log sampling and for detecting packet loss, but neither is going to be useful for you, so I don't think there'd be any practical consequences to sending some constant number instead.) The udp2log instance on fluorine (which is the MediaWiki log aggregator for the production cluster) will write each log message to /a/mw-log/<log file>.log, which gets rotated and archived daily. (It was formerly the case that someone with root would have to touch the file once for log2udp to start writing the log stream, but I am told this is no longer required. YMMV.) Because log messages go into flat files, this doesn't buy you much by way of fancy queryability. The primary benefit would likely be an increased chance that someone unfamiliar with Parsoid would be able to find the log files in case of an outage. There are also some tentative plans for writing this log data into a log analysis tool like logstash, and by logging to fluorine you make it more likely that your logs would be ported over for you at some point without you needing to do anything about it. If you want the ability to slice and dice the log data using some high-level query language, you can create an event schema on metawiki, then log JSON blobs confirming to that schema via UDP to vanadium.eqiad.wmnet:8421. Your schema will be used to create both a MySQL table on db1047 and a MongoDB collection on vanadium and each entry will be written to both. See <https://meta.wikimedia.org/wiki/Schema:MobileWebCentralAuthError> for an example. For logging stats (like counters and time measurements), use StatsD! There's an instance on the cluster that writes to either/both Ganglia and Graphite. The StatsD protocol is also UDP-based, and it is extremely simple: see <https://github.com/b/statsd_spec> for the format. You'll get automatically-generated graphs representing various summary figures computed at a regular interval.
We are leaning towards GELF (http://www.graylog2.org/gelf#specs). There are several libraries available: * https://github.com/trentm/node-bunyan * https://github.com/wavded/graygelf (also includes a server) * https://github.com/nomiddlename/log4js-gelf * https://github.com/robertkowalski/gelf-node There are probably more. Also see https://www.mediawiki.org/wiki/Requests_for_comment/Structured_logging for the wider RFC on this.
*** Bug 58352 has been marked as a duplicate of this bug. ***
Change 109841 had a related patch set uploaded by Cscott: Support structured logging. https://gerrit.wikimedia.org/r/109841
Change 109846 had a related patch set uploaded by Cscott: Support structured logging. https://gerrit.wikimedia.org/r/109846
Change 109841 abandoned by Cscott: Support structured logging. Reason: Abandoned in favor of https://gerrit.wikimedia.org/r/109846 https://gerrit.wikimedia.org/r/109841
Change 139196 had a related patch set uploaded by Marcoil: Bug 49762: Logging: emit structured logs via node-bunyan https://gerrit.wikimedia.org/r/139196
Change 139196 abandoned by Marcoil: Bug 49762: Logging: emit structured logs via node-bunyan Reason: Abandoned in favor of a different design with Parsoid's Logger configuration driving the output instead. https://gerrit.wikimedia.org/r/139196
Change 146130 had a related patch set uploaded by Marcoil: Bug 49762: Logging - emit structured logs via node-bunyan https://gerrit.wikimedia.org/r/146130
Change 146130 merged by jenkins-bot: Bug 49762: Logging - emit structured logs via node-bunyan https://gerrit.wikimedia.org/r/146130
The support code for logging using node-bunyan has been merged into master.