Last modified: 2014-07-17 12:07:35 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T70161, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 68161 - Current puppet does not allow to bring up a cluster in labs
Status: NEW
Product: Analytics
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2014-07-17 11:47 UTC by christian
Modified: 2014-07-17 12:07 UTC (History)

Description christian 2014-07-17 11:47:54 UTC
When trying to bring up a namenode in labs, puppet fails with

  Error: Cannot create /var/lib/hadoop/name; parent directory /var/lib/hadoop does not exist
  Error: /Stage[main]/Cdh::Hadoop::Namenode/File[/var/lib/hadoop/name]/ensure: change from absent to directory failed: Cannot create /var/lib/hadoop/name; parent directory /var/lib/hadoop does not exist

With puppet at commit ebcbef50568960d424fcb95fc79ba3be945a905e,
puppet runs pass and setting up a cluster in labs works.

With 87bd718e678d290b80b0916d255f1bae8666e7d7 (i.e., the child
following the above ebcbef commit) plus
a38770013716dd39ee5df90380473b734e0cebbb cherry-picked on top [1],
puppet fails to set up the namenode. Puppet runs fail with the above
error message.

So it seems 87bd718e678d290b80b0916d255f1bae8666e7d7 is the culprit.
But since this commit does a lot of reshuffling (~800 lines changed),
I'll leave it to CDH+puppet experts to dig deeper.



* Steps to Reproduce
   * Add a new instance 'demo-master'
      (m1.small, ubuntu-12.04-precise)
   * Wait for the instance to come up.
   * Configure the instance by adding role
       role::analytics::hadoop::master
     and setting
       hadoop_namenodes
     to
       demo-master.eqiad.wmflabs
   * Wait for the next puppet run

* Expected result
  Puppet passes without errors

* Actual result
  Puppet fails with
    Error: Cannot create /var/lib/hadoop/name; parent directory /var/lib/hadoop does not exist
    Error: /Stage[main]/Cdh::Hadoop::Namenode/File[/var/lib/hadoop/name]/ensure: change from absent to directory failed: Cannot create /var/lib/hadoop/name; parent directory /var/lib/hadoop does not exist




[1] Plain 87bd718e678d290b80b0916d255f1bae8666e7d7 fails with

  Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate parameter 'mapreduce_output_compression' for on Class[Cdh::Hadoop] at /etc/puppet/manifests/role/analytics/hadoop.pp:201 on node qchris-master-87bd718.eqiad.wmflabs

which was fixed upstream in commit
a38770013716dd39ee5df90380473b734e0cebbb.

Comment 1 christian 2014-07-17 11:52:26 UTC
Btw., bringing up hadoop workers with current puppet also fails,
because a (different) directory in /var/lib/hadoop does not exist.

(Again, when using puppet at ebcbef50568960d424fcb95fc79ba3be945a905e
hadoop workers are brought up by puppet without issues.)

Comment 2 christian 2014-07-17 12:07:35 UTC
It seems the part that creates /var/lib/hadoop [1] got lost in the
reshuffling done by commit 87bd718e678d290b80b0916d255f1bae8666e7d7.

[1] Search for "unlikely" on
https://git.wikimedia.org/blobdiff/operations%2Fpuppet/87bd718e678d290b80b0916d255f1bae8666e7d7/manifests%2Frole%2Fanalytics%2Fhadoop.pp
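
If the only missing piece is the declaration of the parent directory, a possible workaround for labs could look like the Puppet fragment below. This is a minimal sketch, not the change that was dropped or any fix that was merged: where it would live (for example in the labs-specific part of the role in manifests/role/analytics/hadoop.pp) is an assumption, and owner, group, and mode are left unset because this report does not state them. It relies on Puppet's behaviour that a file resource autorequires its parent directory when that directory is managed in the same catalog.

```puppet
# Hypothetical workaround sketch; placement within the role is an assumption.
# Declares the parent directory that the namenode (and worker) data
# directories expect to exist. Owner, group, and mode are left at Puppet's
# defaults because the bug report does not say what they should be.
file { '/var/lib/hadoop':
  ensure => 'directory',
}
```

With the parent managed like this, File[/var/lib/hadoop/name] would be applied after it without an explicit require. If the cdh module or the role already declares /var/lib/hadoop under some conditions (e.g. in production), adding this unconditionally would cause a duplicate declaration, so it would need to be guarded accordingly.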
