What does (int & Integer.MAX_VALUE) % int do in Java?

i & Integer.MAX_VALUE does the same thing as this code:

if (i < 0) {
    // adding MAX_VALUE + 1 (i.e. 2^31) clears the sign bit of a negative value
    i = i + Integer.MAX_VALUE + 1;
}

The % is a regular remainder operation.

It's a quick way of ensuring that an integer is non-negative when you don't care about its actual value (e.g. if you want to turn random numbers that can be both positive and negative into non-negative values only).
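For instance, here is a minimal sketch of that use case (MaskDemo is my own class name, for illustration):

import java.util.Random;

public class MaskDemo {
    public static void main(String[] args) {
        Random rnd = new Random();
        for (int n = 0; n < 5; n++) {
            int raw = rnd.nextInt();                    // uniformly random, may be negative
            int nonNegative = raw & Integer.MAX_VALUE;  // sign bit cleared, result in [0, 2^31)
            System.out.println(raw + " -> " + nonNegative);
        }
    }
}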

Integer.MAX_VALUE is 0x7FFFFFFF, so num & Integer.MAX_VALUE clears the highest (sign) bit of num. The % numReduceTasks is the normal remainder after division by numReduceTasks.

This is done to convert a signed number to a non-negative one and then get an evenly distributed value from 0 to numReduceTasks - 1. Note that if you write Math.abs(key.getLeft().hashCode()) % numReduceTasks you may still get a negative number: if hashCode() happens to return Integer.MIN_VALUE, then Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, because +2147483648 does not fit in a 32-bit int. So & Integer.MAX_VALUE is the safer alternative.
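A quick demonstration of that corner case (AbsPitfall is my own class name, and 10 stands in for numReduceTasks):

public class AbsPitfall {
    public static void main(String[] args) {
        // Math.abs cannot represent +2147483648 in an int, so MIN_VALUE comes back unchanged
        System.out.println(Math.abs(Integer.MIN_VALUE));                   // -2147483648
        System.out.println(Math.abs(Integer.MIN_VALUE) % 10);             // -8, a negative partition
        // the mask never produces a negative result
        System.out.println((Integer.MIN_VALUE & Integer.MAX_VALUE) % 10); // 0
    }
}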

There are two parts here:

  • what the expression actually does, from a Java perspective, and
  • what purpose it serves, from a Hadoop perspective.

Let's cover the Java side of it first. It's fairly straightforward bitwise math: the & clears the sign bit, turning a negative value into a non-negative integer.

That's easy enough to demonstrate here; let's assume that our key is -128876912, which is 0xF8517E90. The max value for an int is 0x7FFFFFFF.

If we look at the actual math operation, the sign bit is cleared (the other 31 bits pass through the mask unchanged), and we get a positive integer value.

  1111 1000 0101 0001 0111 1110 1001 0000   (-128876912, 0xF8517E90)
& 0111 1111 1111 1111 1111 1111 1111 1111   (Integer.MAX_VALUE, 0x7FFFFFFF)
  ---------------------------------------
  0111 1000 0101 0001 0111 1110 1001 0000   (2018606736, 0x78517E90)
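You can verify the same arithmetic in Java (MaskBits is my own class name; note that Integer.toBinaryString drops leading zeros, so the masked value prints with 31 digits):

public class MaskBits {
    public static void main(String[] args) {
        int key = -128876912;                  // 0xF8517E90
        int masked = key & Integer.MAX_VALUE;  // only bit 31 is cleared
        System.out.println(Integer.toBinaryString(key));     // 11111000010100010111111010010000
        System.out.println(Integer.toBinaryString(masked));  // 1111000010100010111111010010000
        System.out.println(masked);                          // 2018606736
    }
}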

If the value is already non-negative, then the net result is that we get back the same value.

This is important, since a hash code can come back negative; I don't believe you want a negative value here, for a reason that's important a bit later.

For the partitioning bit, this is a bit more Hadoop knowledge than I can truly claim, but after reading the docs, it tells you which partition the value falls under. That is where the modulo comes in; you're guaranteed to get a value in [0, numReduceTasks), thus specifying which reducer processes that particular piece of data.

From my reading of it, this is one of the default supplied partitioners, and may not be entirely suitable for your uses (you may want to group your data in a different way, for instance).
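For reference, here is a standalone sketch of that default partitioning logic. The real Hadoop HashPartitioner extends the Partitioner base class and takes a value argument as well; PartitionDemo and its simplified signature are mine:

public class PartitionDemo {
    static int getPartition(Object key, int numReduceTasks) {
        // clear the sign bit, then take the remainder: always in [0, numReduceTasks)
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("some key", 4)); // always one of 0, 1, 2, 3
    }
}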
