I remember it took me some time to get this configured when I first started trying out Jupyter and Spark. Hopefully this is helpful for others. This works for Hadoop 2.6.0-CDH5.9.1 and Spark 1.6.0, using Python 2.7 and Python 3. For other versions, you need to adjust the paths accordingly. Basically, you just need to tell Spark four things:

  • The location of your (Ana)conda installation
  • The location of your Jupyter installation and its configuration
  • The location of your Python installation
  • The resources your Spark executors need

Type the following from your bash terminal (If you are using Cloudera, this would be your Edge node. If …
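As a rough illustration, here is a minimal Python sketch that maps those four things onto settings PySpark and Jupyter read: the PYSPARK_DRIVER_PYTHON, PYSPARK_DRIVER_PYTHON_OPTS and PYSPARK_PYTHON environment variables, Jupyter's JUPYTER_CONFIG_DIR, and the standard --executor-memory, --executor-cores and --num-executors options. The helper names pyspark_jupyter_env and pyspark_command, the paths, and the resource numbers are hypothetical; adjust them for your cluster.

```python
import os


def pyspark_jupyter_env(anaconda_home, jupyter_config_dir, python_bin, base_env=None):
    """Return a copy of the environment set up so that `pyspark` starts Jupyter."""
    env = dict(os.environ if base_env is None else base_env)
    anaconda_bin = os.path.join(anaconda_home, "bin")

    # 1. (Ana)conda installation: put its bin/ directory first on PATH.
    env["PATH"] = os.pathsep.join(p for p in (anaconda_bin, env.get("PATH")) if p)
    # 2. Jupyter installation and configuration: run the driver as a notebook.
    env["PYSPARK_DRIVER_PYTHON"] = os.path.join(anaconda_bin, "jupyter")
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    env["JUPYTER_CONFIG_DIR"] = jupyter_config_dir
    # 3. Python installation used by the executors; on YARN this path
    #    must exist on every worker node.
    env["PYSPARK_PYTHON"] = python_bin
    return env


def pyspark_command(executor_memory="2g", executor_cores=2, num_executors=4):
    """4. Executor resources, passed as regular pyspark/spark-submit options."""
    return [
        "pyspark",
        "--master", "yarn-client",  # Spark 1.x spelling; assumes a YARN cluster
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        "--num-executors", str(num_executors),
    ]
```

On the edge node you could then launch the notebook with something like subprocess.call(pyspark_command(), env=pyspark_jupyter_env("/opt/anaconda", os.path.expanduser("~/.jupyter"), "/opt/anaconda/bin/python")), where all three paths are placeholders.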

In case you are wondering how to use the awesome LIBSVM package with the awesome PyCharm IDE, here is a brief guide:

The LIBSVM documentation mentions that you need to:

  • Download libsvm (version 3.20 at the time of this writing).
  • Extract the zip to a folder. By default, this is libsvm-3.20.
  • Go to your command prompt and, within the extracted directory libsvm-3.20/, type “make”. This builds the command-line tools (svm-train, svm-predict and svm-scale). If this doesn’t work, google how to enable the make command on Mac OS.
  • Go to the libsvm-3.20/python directory and type “make”. This builds the shared library libsvm.so.2 in the parent libsvm-3.20/ directory. The Python interface files svm.py and svmutil.py already ship in libsvm-3.20/python/.

Now, within PyCharm, say that your code lives in a folder called Project/code/. Copy svm.py and svmutil.py to your Project/code/ directory, and copy libsvm.so.2 to your Project/ directory, one level up. The extra level matters because svm.py looks for the library at ../libsvm.so.2, relative to its own folder.
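To double-check the layout, here is a small sketch, assuming LIBSVM 3.20's svm.py on Linux or Mac OS. The helper names expected_libsvm_path and missing_files, and the ~/Project location, are hypothetical. It computes where svm.py will look for the library and lists any of the three files that are not in place yet.

```python
import os


def expected_libsvm_path(svm_py_path):
    """Path where svm.py (LIBSVM 3.20, Linux/Mac OS) first tries to load the
    shared library: libsvm.so.2 in the folder above the one holding svm.py."""
    here = os.path.dirname(os.path.abspath(svm_py_path))
    return os.path.normpath(os.path.join(here, "..", "libsvm.so.2"))


def missing_files(project_dir):
    """Return the files of the Project/code/ layout that are not in place yet."""
    wanted = [
        os.path.join(project_dir, "code", "svm.py"),
        os.path.join(project_dir, "code", "svmutil.py"),
        os.path.join(project_dir, "libsvm.so.2"),
    ]
    return [p for p in wanted if not os.path.isfile(p)]


if __name__ == "__main__":
    project = os.path.expanduser("~/Project")  # hypothetical location
    print(expected_libsvm_path(os.path.join(project, "code", "svm.py")))
    print(missing_files(project) or "all three files are in place")
```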

From a Python file such as Project/code/somefile.py, you can then import the library with:

from svmutil import *
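Besides training on Python lists directly, svmutil can load data with svm_read_problem, which expects LIBSVM's sparse text format: one example per line, "<label> <index>:<value> ...", with 1-based feature indices in ascending order. Below is a minimal sketch of writing that format; the helper names, the toy.txt file name, and the toy data are hypothetical, while the svm_read_problem / svm_train / svm_predict calls follow the usage in the LIBSVM python README.

```python
def to_libsvm_line(label, features):
    """Format one example as '<label> <index>:<value> ...'.

    features is a dict {index: value} with 1-based integer indices;
    zero values are skipped because the format is sparse.
    """
    parts = [str(label)]
    for index in sorted(features):  # LIBSVM expects ascending indices
        value = features[index]
        if value != 0:
            parts.append("%d:%g" % (index, value))
    return " ".join(parts)


def write_libsvm_file(path, labels, rows):
    """Write one line per (label, features) pair."""
    with open(path, "w") as f:
        for label, features in zip(labels, rows):
            f.write(to_libsvm_line(label, features) + "\n")


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated classes in two features.
    labels = [1, -1, 1, -1]
    rows = [{1: 1.0, 2: 1.0}, {1: -1.0, 2: -1.0},
            {1: 0.9, 2: 1.2}, {1: -1.1, 2: -0.8}]
    write_libsvm_file("toy.txt", labels, rows)

    try:
        from svmutil import svm_read_problem, svm_train, svm_predict
    except Exception as exc:  # ImportError, or svm.py failing to load libsvm.so.2
        print("svmutil not usable: %s" % exc)
    else:
        y, x = svm_read_problem("toy.txt")
        model = svm_train(y, x, "-c 4")
        # Predicting on the training data, only to show the calls.
        p_labels, p_acc, p_vals = svm_predict(y, x, model)
        print(p_labels)
```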

That’s it. You should be all set.


I am new to D3 and struggled a bit to figure out how to use data obtained through an HTTP REST call as my data source. If you don’t know what I mean by an HTTP REST call, it’s basically calling a URL that returns JSON data. If you are not familiar with D3, this tutorial by Scott Murray is required reading. After you are done reading that, read this manual by the author of D3 himself. In a nutshell: no, you don’t need the jQuery library; D3 itself is enough. HTTP calls are asynchronous in D3. If …
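D3 itself is JavaScript, but the two ideas in that paragraph (a REST call is just a URL that returns JSON, and the response arrives asynchronously) can be sketched in Python 3 with only the standard library. The local server, the /data URL, the payload, and the fetch_json_async helper are hypothetical stand-ins; the helper only loosely mirrors the callback style of d3.json.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical toy payload standing in for a real REST endpoint's JSON.
PAYLOAD = [{"name": "a", "value": 3}, {"name": "b", "value": 7}]


class JSONHandler(BaseHTTPRequestHandler):
    """Answers every GET with PAYLOAD encoded as JSON."""

    def do_GET(self):
        body = json.dumps(PAYLOAD).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_json_async(url, callback):
    """Fetch url on a background thread and pass the parsed JSON to callback."""
    def worker():
        with urllib.request.urlopen(url) as response:
            callback(json.loads(response.read().decode("utf-8")))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread  # returns immediately, usually before the data has arrived


def demo():
    server = HTTPServer(("127.0.0.1", 0), JSONHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/data" % server.server_address[1]
        received = []
        thread = fetch_json_async(url, received.append)
        # Code here usually runs before the response is in, so anything
        # that needs the data has to happen inside the callback.
        thread.join()  # only for the demo: wait so we can return the result
        return received[0]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The comment after fetch_json_async is the point: the caller keeps running before the data arrives, so code that needs the data belongs inside the callback.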

