In this short post we will talk about hash randomization in Python. Since Python 3.3, hash randomization has been enabled by default, so the built-in hash of str and bytes objects is salted with a random value that changes each time the interpreter starts. The idea is to protect against denial-of-service attacks that rely on crafted inputs with colliding hashes. However, this can affect code correctness. For example, the hash value of an element may be used to determine a bucket ID for load distribution. With hash randomization enabled, the same element can be routed to different buckets in different runs, which may not be desirable.
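To make the bucket scenario concrete, here is a minimal sketch of bucket assignment based on the built-in hash. The helper name bucket_for and the bucket count of 8 are assumptions made for illustration only; they do not come from any particular library.

```python
NUM_BUCKETS = 8  # hypothetical bucket count, chosen for illustration


def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Map a key to a bucket index using the built-in hash."""
    # With a positive modulus, Python's % never returns a negative number,
    # so the index is valid even when hash(key) is negative.
    return hash(key) % num_buckets


if __name__ == "__main__":
    key = "test-hash-in-python"
    # For a str key, the printed bucket may change from one interpreter run
    # to the next while hash randomization is enabled.
    print("bucket of {!r}: {}".format(key, bucket_for(key)))
```

Within a single run the assignment is consistent; it is only across separate interpreter runs that a string key may land in a different bucket.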
Some other posts mention that we can use the environment variable PYTHONHASHSEED to disable hash randomization: setting it to 0 turns randomization off, and any other fixed integer makes the seed reproducible. Another option is to write a customized hash function by leveraging hashlib.sha256. For example:
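import hashlib

x = "test-hash-in-python"

def myStringHash(s):
    # SHA-256 depends only on the input bytes, so the result is the same in every run.
    v = hashlib.sha256(s.encode("utf-8"))
    # Interpret the first 4 bytes of the digest as an unsigned 32-bit integer.
    return int.from_bytes(v.digest()[:4], 'little')

print("hash of x: {}".format(hash(x)))
print("customized hash of x: {}".format(myStringHash(x)))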
# First Run
hash of x: 4053607709610309209
customized hash of x: 1764439175

# Second Run
hash of x: -7917897423850556667
customized hash of x: 1764439175
As we can see in the outputs above, the built-in hash function returns a different value in each run, while our customized hash function returns the same value in all runs.
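Both options can be checked from a single script. The sketch below is one possible way to do it, not a definitive recipe: stable_bucket and builtin_hash_with_seed are hypothetical helper names, and the bucket count of 8 is an arbitrary choice. It starts fresh interpreters with a chosen PYTHONHASHSEED to observe the built-in hash, and derives a bucket from SHA-256 that does not depend on the seed at all.

```python
import hashlib
import os
import subprocess
import sys


def stable_bucket(key, num_buckets):
    """Bucket index derived from SHA-256, unaffected by hash randomization."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % num_buckets


# Code run in each child interpreter: print the built-in hash of argv[1].
CHILD_CODE = "import sys; print(hash(sys.argv[1]))"


def builtin_hash_with_seed(key, seed):
    """Return hash(key) as computed by a fresh interpreter with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, key],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    key = "test-hash-in-python"
    # The same seed gives the same built-in hash in separate processes.
    print(builtin_hash_with_seed(key, 0) == builtin_hash_with_seed(key, 0))
    # Different seeds are expected to give different built-in hashes.
    print(builtin_hash_with_seed(key, 1), builtin_hash_with_seed(key, 2))
    # The SHA-256 based bucket does not depend on the seed.
    print("stable bucket:", stable_bucket(key, 8))
```

Fixing PYTHONHASHSEED affects every hash in the process and has to be set before the interpreter starts, whereas the SHA-256 approach only changes the places where a stable value is actually needed.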