Pig doesn’t support scalar variable assignment. That is, you cannot write a statement like this:

var = 3

The smallest unit you can have is a tuple containing a single value:

var = {3}

So, say that you have a variable X containing 2 columns,


and you need to do some math against the second column using the value stored in the variable var above.

The following statement won’t work:

result = FOREACH X GENERATE $1*var;

Instead, you need to join the two variables together so that every row of X gets an additional column containing the value from var. You need to produce the following data before proceeding with your calculation:


To accomplish this, you need to do the following:

temp = JOIN X BY 1, var BY 1 USING 'replicated';

Now you can do your math operation

result = FOREACH temp GENERATE $1*$2;
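
To make the effect of the constant-key join concrete, here is a rough Python sketch of the same idea outside of Pig. The sample rows and the names x_rows, var_rows, and join_on_constant are made up for illustration; the sketch only mimics what the join produces, not how Pig executes it.

```python
def join_on_constant(left_rows, right_rows):
    """Pair every left row with every right row, as a join on a constant key does."""
    return [left + right for left in left_rows for right in right_rows]

# Hypothetical sample data: X has two columns, var holds a single one-column row.
x_rows = [("a", 10), ("b", 20), ("c", 30)]
var_rows = [(3,)]

temp = join_on_constant(x_rows, var_rows)
# Each row of temp now carries the value from var as a third column ($2 in Pig).
result = [row[1] * row[2] for row in temp]
print(temp)
print(result)
```

Because var holds exactly one row, the join adds a column without multiplying the number of rows in X.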

My code pulls JSON data from a web service I wrote. That JSON data is then loaded into a Python dictionary. Things work fabulously when I debug in my local console, running the internal web server provided by the App Engine SDK. But when I upload the application to Google App Engine, things break. This is a snippet of my code:

except Exception as ex:
   self.response.write("Google says: %s"%(ex.message))
   self.response.write("Try refreshing the page (again)")

I have all the state abbreviations as keys. When I try to access a value of the dictionary directly using square brackets, like


Google App Engine raises a KeyError exception. However, if I iterate through the keys using a for loop, like

for each_state in states:

it works flawlessly on App Engine. I also found that the following works perfectly:


The only thing I cannot do is access the dictionary using square brackets. That baffles me. If you know what’s up with that, drop me a line.
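
One way to narrow this kind of problem down is to look at the exact keys the deployed code sees, since a key that prints the same can still differ by whitespace, case, or type. The sketch below is only a diagnostic aid under that assumption, not a confirmed explanation; states here is a stand-in built from a made-up payload, and find_near_matches is a hypothetical helper.

```python
import json

def find_near_matches(mapping, wanted):
    """Return keys that equal `wanted` only after stripping whitespace and ignoring case."""
    target = wanted.strip().lower()
    return [key for key in mapping
            if key != wanted and str(key).strip().lower() == target]

# Hypothetical payload: one key carries a trailing space.
states = json.loads('{"CA ": "California", "NY": "New York"}')

print([repr(key) for key in states])    # repr exposes stray whitespace and key types
print(find_near_matches(states, "CA"))  # keys that look like "CA" but are not equal to it
print(states.get("CA", "missing"))      # .get avoids the KeyError while investigating
```

If the helper returns anything, the lookup key and the stored key are not actually identical, which would explain why iteration works while square brackets fail.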

OK. Being new to Linux and all, this took me a while to figure out, but I finally did. I have a Debian server running the Nagios Core monitoring system. My former co-worker also set the system up to use the Nagios API, so that we can create a custom page that shows the status of all our systems.

Briefly, this is how the API is supposed to work:

There is a Python script called nagios-api that you need to run so that the API runs as an application server on a certain port. The API relies on a status dump file called status.dat, configured in /etc/nagios3/nagios.cfg and updated periodically by Nagios. Supervisor (a Linux process manager) starts this script every time the server starts. The configuration file is as follows:

directory = /home/nagapi
user = api
command = /bin/bash -c "source /home/nagapi/.virtualenvs/nagapi/bin/activate; /home/nagapi/nagios-api/nagios-api"
stdout_logfile = /home/nagapi/supervisor_nagios-api_stdout.log
stderr_logfile = /home/nagapi/supervisor_nagios-api_stderr.log

Problem I was experiencing

Every time I restarted the server, the following would happen:
