November 14, 2013

Pig doesn’t support scalar variable assignment. That is, you cannot write a statement like this:

var = 3

The smallest unit you can have is a tuple containing a single value:

var = {3}

So, say that you have a variable X containing two columns:

(word1,1)
(word2,4)
(word3,14)

and you need to do some math on the second column using the value stored in the variable var above.

The following statement won’t work:

result = FOREACH X GENERATE $1*var;

Instead, you need to join the two variables together so that every row of X gets an additional column containing the value from var. You need to produce the following data before proceeding with your calculation:

(word1,1,3)
(word2,4,3)
(word3,14,3)

To accomplish this, you need to do the following:

temp = JOIN X BY 1, var BY 1 USING 'replicated';

Now you can do your math operation, multiplying the second column ($1) by the value joined in from var ($2):

result = FOREACH temp GENERATE $1*$2;
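To make the data flow concrete, here is a minimal sketch in plain Python (not Pig Latin) that models what the two statements above do. Joining both sides on the same constant key means every row of X matches the single row of var, and 'replicated' means the small side is held in memory while the large side streams past it. The function name `replicated_join_on_constant` and its `key` parameter are hypothetical, made up for this illustration. The sketch only mirrors the logic and says nothing about how Pig executes the join internally.

```python
# Plain-Python model of the Pig statements above. This is an illustration
# of the data flow only, not Pig Latin and not Pig's implementation.

X = [("word1", 1), ("word2", 4), ("word3", 14)]  # the two-column variable X
var = [(3,)]                                     # var: one tuple, one value


def replicated_join_on_constant(left, right, key=1):
    """Model of: JOIN left BY 1, right BY 1 USING 'replicated' (hypothetical helper)."""
    # The small (replicated) side is kept in memory, grouped by join key.
    table = {}
    for row in right:
        table.setdefault(key, []).append(row)
    # Every left row carries the same constant key, so it matches every
    # in-memory row; the matched columns are appended to the left row.
    return [l + r for l in left for r in table.get(key, [])]


temp = replicated_join_on_constant(X, var)

# Model of: result = FOREACH temp GENERATE $1*$2;
result = [(row[1] * row[2],) for row in temp]

print(temp)
print(result)
```

In this model, temp holds the three-column rows shown earlier, and each result row is the second column multiplied by the value from var.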