Last modified: 2014-09-03 09:16:13 UTC
ironholds@stat1002:~$ hive
Unable to determine Hadoop version information.
'hadoop version' returned:
No default-logstash-fields.properties resource present, using defaults Hadoop 2.3.0-cdh5.0.2 Subversion git://github.sf.cloudera.com/CDH/cdh.git -r 8e266e052e423af592871e2dfe09d54c03f6a0e8 Compiled by jenkins on 2014-06-09T16:20Z Compiled with protoc 2.5.0 From source with checksum 75596fe27f833e512f27fbdaaa7b0ab This command was run using /usr/lib/hadoop/hadoop-common-2.3.0-cdh5.0.2.jar
(I just wanted to file the same bug :-) )

The breakage happened around 2014-08-30 00:49 [1]. Around that time, bc8e34859268b6943f1e2c9621bd01bdc6676371 was merged, which turns gelf logging on. (Four days ago we saw gelf logging cause exactly the same problem [2]. That was worked around by turning gelf logging off again; see 82cab341b6070d95437b00f005280fed3289dcac.)

-------------------------------------

The immediate work-around is to create an empty default-logstash-fields.properties in the current directory:

  touch default-logstash-fields.properties

After that, hive starts without issues again, and queries etc. work too.

-------------------------------------

[1] I had a couple of jobs running during the night. The last successful one started at 00:47:19; the first failing one started at 00:49:00.
[2] See http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics/20140826.txt starting at 20:49:30, and http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20140826.txt starting at 20:55:17
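Why one extra log line is enough to break hive at startup: a minimal sketch, assuming the hive launcher takes the second whitespace-separated field of the first line of `hadoop version` output and expects it to look like a version number. That parsing rule is an assumption, not something stated in this thread, and `detect_hadoop_version` is a hypothetical name. With gelf logging on, the logstash warning becomes the first line, so the check fails.

```python
import re
from typing import Optional

# ASSUMPTION: approximates how a launcher script might extract the Hadoop
# version from `hadoop version` output; this is not hive's actual code.
VERSION_RE = re.compile(r"[0-9]+\.[0-9]+")


def detect_hadoop_version(output: str) -> Optional[str]:
    """Return field 2 of the first line if it looks like a version, else None."""
    lines = output.splitlines()
    if not lines:
        return None
    fields = lines[0].split()
    if len(fields) < 2:
        return None
    candidate = fields[1]
    return candidate if VERSION_RE.match(candidate) else None


clean = "Hadoop 2.3.0-cdh5.0.2\nCompiled by jenkins on 2014-06-09T16:20Z\n"
# With gelf logging on, the logstash warning is printed before the version line.
noisy = (
    "No default-logstash-fields.properties resource present, using defaults\n"
    + clean
)

print(detect_hadoop_version(clean))
print(detect_hadoop_version(noisy))
```

Under this assumption, the empty default-logstash-fields.properties work-around presumably helps because it suppresses the warning, so the `Hadoop ...` line is first again.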
Opsen -- can we please consider some sort of sanity check after cluster maintenance? I'm also wondering if the data quality scripts also broke. Thanks for grabbing this, Christian.
Sorry folks. I did a sanity check with 'hdfs', but because its output was just a warning, I didn't think it would cause problems. I'll also test with 'hive' in the future. I did a lot of research into upstream defaults before making this change and was surprised by the outcome. I'll disable gelf again for now. I discovered this ticket via Google search results while troubleshooting :P
(Adding jgage to CC)

(In reply to Toby Negrin from comment #2)
> I'm also wondering if the data quality scripts also broke.

Even if ... our setup allows re-checking partitions easily without confusing Icinga. So we're safe and prepared for that.

However, the hive breakage is limited to non-cluster machines, like stat1002. The monitoring runs from within the cluster, so the monitoring is working:

+---------------------+--------+--------+--------+--------+
| Date                | bits   | text   | mobile | upload |
+---------------------+--------+--------+--------+--------+
[...]
| 2014-08-30T00:xx:xx |   .    |   .    |   .    |   .    |
| 2014-08-30T01:xx:xx |   .    |   .    |   .    |   .    |
| 2014-08-30T02:xx:xx |   .    |   .    |   .    |   .    |
| 2014-08-30T03:xx:xx |   X    |   X    |   X    |   X    | <-- problematic commit was merged.
| 2014-08-30T04:xx:xx |   .    |   .    |   .    |   .    |
| 2014-08-30T05:xx:xx |   .    |   .    |   .    |   .    |
| 2014-08-30T06:xx:xx |   .    |   .    |   .    |   X    | <-- needs investigation
| 2014-08-30T07:xx:xx |   .    |   .    |   .    |   .    |
| 2014-08-30T08:xx:xx |   .    |   .    |   .    |   .    |
| 2014-08-30T09:xx:xx |   .    |   .    |   .    |   .    |
| 2014-08-30T10:xx:xx |   .    |   .    |   .    |   .    |
| 2014-08-30T11:xx:xx |   .    |   .    |   .    |   .    |
| 2014-08-30T12:xx:xx |   .    |   .    |   .    |   .    |
[...]

Statuses:
  . --> Partition is ok
  X --> Partition is not ok (duplicates, missing, or nulls)

> Thanks for grabbing Christian.

I didn't grab the issue -- I just provided a work-around :-) There is not much I can do there; only ops people can merge to the operations/puppet repo. And since there is a work-around that makes hive work again on stat1002, I think we can safely wait for a proper fix next week. Let's not forget: Hive is not yet a production service ;-)
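The status legend in the monitoring table can be read as a per-partition rule. A minimal sketch, assuming each hourly check reports counts of duplicates, missing entries, and nulls per source; `PartitionCheck` and `status_row` are hypothetical names, not the actual monitoring scripts, and the counts below are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PartitionCheck:
    """Hypothetical per-partition check result; field names are illustrative."""
    duplicates: int = 0
    missing: int = 0
    nulls: int = 0

    def ok(self) -> bool:
        # A partition is ok only if none of the three problems occurred.
        return self.duplicates == 0 and self.missing == 0 and self.nulls == 0


SOURCES = ["bits", "text", "mobile", "upload"]


def status_row(hour: str, checks: Dict[str, PartitionCheck]) -> str:
    """Render one table row: '.' for an ok partition, 'X' otherwise."""
    marks: List[str] = ["." if checks[s].ok() else "X" for s in SOURCES]
    return "| " + hour + " | " + " | ".join(f"{m:^6}" for m in marks) + " |"


row = status_row(
    "2014-08-30T06:xx:xx",
    {
        "bits": PartitionCheck(),
        "text": PartitionCheck(),
        "mobile": PartitionCheck(),
        # Invented count; the real cause of this alert is not stated here.
        "upload": PartitionCheck(missing=1),
    },
)
print(row)
```

Keeping the rule as "any problem means X" matches the legend, which lumps duplicates, missing, and nulls into a single "not ok" status.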
Christian -- I ran a hive query and redirected the output to a file, so I thought hive was running :( Totally agree -- Hive is not a production service and there is no expectation of off-hour support. Gage -- we can CC you on all tickets if you want. We are pretty Bugzilla-focused here. Let's discuss Tuesday. Thanks all, -Toby
Works for me again (hence closing). Thanks!
Just to keep bugs connected:

(In reply to christian from comment #4)
> The monitoring however runs from within the cluster. So the monitoring
> is working:
>
> +---------------------+--------+--------+--------+--------+
> | Date                | bits   | text   | mobile | upload |
> +---------------------+--------+--------+--------+--------+
> [...]
[...]
> | 2014-08-30T03:xx:xx |   X    |   X    |   X    |   X    | <-- problematic commit was merged.

This monitoring alert is tracked in bug 70330.

[...]
> | 2014-08-30T06:xx:xx |   .    |   .    |   .    |   X    | <-- needs investigation

This monitoring alert is tracked in bug 70331.