Python - Hash Randomization in Python


#python  #hash randomization 

In this short post we will talk about hash randomization in Python. Since Python 3.3, hash randomization has been enabled by default, so the hash values of str and bytes objects are salted with a random value that changes each time the interpreter starts. The idea is to protect against denial-of-service attacks that exploit deliberately crafted hash collisions. However, this can affect code correctness. For example, the hash value of an element may be used to determine the bucket ID for load distribution. With hash randomization enabled, the same element may be routed to different buckets in different runs, which may not be desirable.
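To make the bucket scenario concrete, here is a minimal sketch. The names `NUM_BUCKETS` and `bucket_for` are hypothetical and chosen only for illustration; the point is simply that the bucket ID is derived from the built-in `hash()`.

```python
NUM_BUCKETS = 8  # hypothetical bucket count, chosen only for illustration


def bucket_for(key: str) -> int:
    """Map a key to a bucket ID using the built-in hash()."""
    # Python's % with a positive modulus always yields a value in [0, NUM_BUCKETS).
    return hash(key) % NUM_BUCKETS


if __name__ == "__main__":
    # With hash randomization on, this number can change from one
    # interpreter run to the next, even though the key is the same.
    print("bucket:", bucket_for("test-hash-in-python"))
```

Within a single run the mapping is consistent. The trouble only shows up when the result is compared across runs or across processes.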

Some other posts mention that we can set the environment variable PYTHONHASHSEED to disable hash randomization (a value of 0 disables it, and any other fixed integer gives a fixed seed). Another option is to write a customized hash function by leveraging hashlib.sha256. For example:

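import hashlib

x = "test-hash-in-python"

def myStringHash(s):
    # SHA-256 depends only on the input bytes, so the digest is the same in every run.
    v = hashlib.sha256(s.encode("utf-8"))
    # Interpret the first 4 bytes of the digest as a little-endian unsigned integer.
    return int.from_bytes(v.digest()[:4], 'little')

print("hash of x: {}".format(hash(x)))
print("customized hash of x: {}".format(myStringHash(x)))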

Output from two separate runs:

# First Run
hash of x: 4053607709610309209
customized hash of x: 1764439175

# Second Run
hash of x: -7917897423850556667
customized hash of x: 1764439175

As we can see in the outputs above, the built-in hash function returns a different value in each run, while our customized hash function returns the same value in all runs.
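For comparison, the PYTHONHASHSEED approach mentioned earlier can be sketched as follows. The helper `hash_in_fresh_interpreter` is a hypothetical name introduced only for this illustration. It starts a new Python process with the environment variable set and reads back the value of the built-in `hash()`.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text: str, seed: str) -> int:
    """Return hash(text) as computed by a new Python process with PYTHONHASHSEED=seed."""
    # Copy the current environment and override only the hash seed.
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    x = "test-hash-in-python"
    # Seed "0" disables randomization, so both child processes report the same value.
    print(hash_in_fresh_interpreter(x, "0"))
    print(hash_in_fresh_interpreter(x, "0"))
```

A fixed seed makes the built-in hash repeatable for a given interpreter, but the value may still differ between Python versions or builds. That is one reason a hashlib-based function is the more portable choice when values need to match across machines.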

----- END -----