May 2012


I’ve been hooked on Debian lately. I’m still kinda new to the Linux environment; I’ve been on the Windows side for waaay too long. After spending the past two months getting familiar with Linux, I decided to pick up a book about Hadoop, mainly because I’m interested in processing big data. It’s a great book, but it assumes you’re already familiar with Linux and Java, so getting everything running has been a fun learning experience for me. I hope this post helps others who are struggling to set up Hadoop for the first time. If you’re a Debian guru, please be gentle: this is my first Debian-related post.

Let’s dive in. I’m assuming that you have a clean install of Debian with nothing but an SSH server installed. You need the following packages and configuration in place:

  • sudo (optional). To install it, log in as root (type su and enter the root password), then run apt-get install sudo.
    • Give your user the ability to use sudo by adding the following line to /etc/sudoers:
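      • Still as root, run vi /etc/sudoers and add the line below under the “# User privilege specification” comment. Press i to start inserting text and Esc to leave insert mode, then type :wq to save the file and quit the editor. (visudo is a safer alternative to vi here, because it checks the file for syntax errors before saving.)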
      • yourUserName ALL=(ALL) ALL
  • vim (run sudo apt-get install vim). You also need an SSH server; I installed one during my Debian installation, but if you didn’t, run sudo apt-get install openssh-server.
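  • Generate a private/public key pair for the current user:
    • Run ssh-keygen and press Enter to accept the default location (~/.ssh/id_rsa).
    • You can protect your private key with a passphrase, but leaving it empty lets ssh, and Hadoop’s start-up scripts, log in to localhost without prompting you.
    • Once the pair is generated, authorize the new public key for your own account: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    • To make sure this worked, run ssh localhost. You should be logged in without having to type your password. The first time you connect, ssh also asks you to confirm the host key: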
      The authenticity of host 'localhost (127.0.0.1)' can't be established.
      RSA key fingerprint is xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx.
      Are you sure you want to continue connecting (yes/no)? yes
      Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
    The next time you ssh in, the message above shouldn’t appear again.
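If you’d rather script the key steps above, here is a minimal sketch for a single-node setup. It assumes OpenSSH’s ssh-keygen and an empty passphrase (-N ""), so that ssh localhost never prompts. KEY_DIR is a hypothetical variable added only so the script can be pointed somewhere other than ~/.ssh. The chmod calls keep ~/.ssh and authorized_keys private, because with sshd’s default StrictModes setting, keys stored in files that other users can write to are refused.

```shell
#!/bin/sh
# Passwordless SSH to localhost for the current user (a sketch, not the only way).
# KEY_DIR is a hypothetical override; it defaults to the usual ~/.ssh.
set -e
KEY_DIR="${KEY_DIR:-$HOME/.ssh}"

mkdir -p "$KEY_DIR"
chmod 700 "$KEY_DIR"   # sshd (StrictModes) refuses keys in dirs others can write to

# Generate an RSA key pair only if one does not exist yet.
# -N "" sets an empty passphrase; drop it to be prompted for one instead.
if [ ! -f "$KEY_DIR/id_rsa" ]; then
    ssh-keygen -q -t rsa -N "" -f "$KEY_DIR/id_rsa"
fi

# Append the public key to authorized_keys unless that exact line is already there,
# so running the script twice does not add duplicate entries.
touch "$KEY_DIR/authorized_keys"
if ! grep -qxFf "$KEY_DIR/id_rsa.pub" "$KEY_DIR/authorized_keys"; then
    cat "$KEY_DIR/id_rsa.pub" >> "$KEY_DIR/authorized_keys"
fi
chmod 600 "$KEY_DIR/authorized_keys"
```

Run it as your regular user (not root), then check the result with ssh localhost as described above.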

That’s it for the Debian configuration. Now, let’s set up single-node Hadoop. Appendix A of the book I mentioned above describes some of this, but its instructions oversimplify things, so I’ll go into more detail on installing Java and Hadoop for first-timers. If you just follow the installation instructions in Appendix A and then try to run the commands on page 23:

$ export HADOOP_CLASSPATH=build/classes
$ hadoop MaxTemperature input/ncdc/sample.txt output

You will get the following error (even after installing the JDK):

Exception in thread "main" java.lang.NoClassDefFoundError: MaxTemperature
Caused by: java.lang.ClassNotFoundException: MaxTemperature
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
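The exception means the JVM started by the hadoop command could not find a file called MaxTemperature.class on its classpath. With HADOOP_CLASSPATH=build/classes, that means build/classes/MaxTemperature.class, resolved relative to the directory you run the command from. The helper below is a small sketch for checking this before launching the job; check_class is a hypothetical name (it is not part of Hadoop), and it assumes the class sits in the default package, as the stack trace suggests.

```shell
#!/bin/sh
# check_class DIR CLASS (hypothetical helper, not part of Hadoop):
# succeed if DIR contains the compiled .class file for CLASS, mirroring how
# the JVM looks up a class under a directory entry on the classpath.
# Dots in a fully qualified name map to subdirectories, so
# com.example.Foo is looked up as DIR/com/example/Foo.class.
check_class() {
    class_file="$1/$(printf '%s' "$2" | tr . /).class"
    if [ -f "$class_file" ]; then
        echo "found $class_file"
    else
        echo "missing $class_file (compile the sources into $1 first)" >&2
        return 1
    fi
}
```

For example, check_class build/classes MaxTemperature && hadoop MaxTemperature input/ncdc/sample.txt output only starts the job once the class file is actually there.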