Change realpath cache hash algorithm to the regular string hash algorithm #11124

Open
wants to merge 1 commit into
base: master

nielsdos
Member

Right now the FNV-1 algorithm is used to determine the realpath cache key. For applications that are lightweight but have lots of files (e.g. WordPress), the realpath cache key computation shows up in the Callgrind profile. The reason is that we do a simple byte-by-byte loop. Furthermore, we always use the 32-bit prime and offset values, even in a 64-bit environment, which reduces the diffusion property of the hash. This hinders the distribution of keys a bit (although probably not a lot, since we only have a limited number of entries in the cache).
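For readers who don't have the code at hand, the loop described above looks roughly like the sketch below. It is an illustration, not a copy of the engine source: the function name `sketch_realpath_key_fnv1` and the use of `uint64_t` for the key are assumptions made so the example is self-contained. It shows FNV-1 processing one byte per iteration, with the 32-bit offset basis and prime applied to a 64-bit accumulator.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV parameters (offset basis and prime). */
#define FNV32_OFFSET_BASIS UINT64_C(2166136261)
#define FNV32_PRIME        UINT64_C(16777619)

/* Hypothetical stand-in for the cache-key function: FNV-1 over the path,
 * one byte per iteration, 32-bit constants in a 64-bit accumulator. */
static uint64_t sketch_realpath_key_fnv1(const char *path, size_t path_len)
{
    uint64_t h = FNV32_OFFSET_BASIS;
    const unsigned char *p = (const unsigned char *)path;
    const unsigned char *end = p + path_len;

    while (p < end) {
        h *= FNV32_PRIME; /* FNV-1: multiply first ... */
        h ^= *p++;        /* ... then XOR in the next byte */
    }
    return h;
}
```

Because each XOR only touches the lowest 8 bits, the low 32 bits of this key are identical to plain 32-bit FNV-1; the upper bits are filled only by carries from the multiplication with the small prime.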

I propose to switch to our regular string hashing algorithm, which is better optimised than a byte-by-byte loop and has better diffusion on 64-bit systems.
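To illustrate what "better optimised than a byte-by-byte loop" can mean, here is a minimal sketch of a times-33 hash whose main loop is unrolled. The times-33-with-addition form, the seed 5381, the 4-byte unroll factor, and the name `sketch_times33_hash` are all assumptions for this example; the engine's real string hash may differ in each of these details.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unrolled times-33 hash (h = h * 33 + byte). */
static uint64_t sketch_times33_hash(const char *str, size_t len)
{
    const unsigned char *p = (const unsigned char *)str;
    uint64_t h = 5381;

    /* Unrolled main loop: four bytes per iteration, fewer branch checks. */
    for (; len >= 4; len -= 4, p += 4) {
        h = h * 33 + p[0];
        h = h * 33 + p[1];
        h = h * 33 + p[2];
        h = h * 33 + p[3];
    }
    /* Tail: the remaining 0 to 3 bytes. */
    while (len-- > 0) {
        h = h * 33 + *p++;
    }
    return h;
}
```

The unrolled loop produces the same value as the naive one-byte-per-iteration version; only the number of loop-condition checks changes.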

I don't know why FNV-1 was chosen over the DJB33X algorithm we use for normal string hashing. I also don't know why FNV-1A wasn't chosen instead of FNV-1, which would be a simple modification and would distribute the hashes better than FNV-1.
The only reason I can think of is that FNV-1A typically has a better distribution than DJB33X algorithms like the one we use for string hashing [1]. But I doubt that makes a difference here, and if it does, then we should perhaps look into changing the string hash algorithm from DJB33X to FNV-1A.

[1] http://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
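As a side note on why FNV-1A would be "a simple modification" of FNV-1: the two variants differ only in the order of the multiply and XOR steps. The sketch below uses the standard 32-bit FNV parameters; the function names are made up for this example and do not appear in the code base.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV32_BASIS UINT32_C(2166136261)
#define FNV32_MULT  UINT32_C(16777619)

/* FNV-1: multiply, then XOR the next byte. */
static uint32_t sketch_fnv1_32(const char *s, size_t len)
{
    uint32_t h = FNV32_BASIS;
    for (size_t i = 0; i < len; i++) {
        h *= FNV32_MULT;
        h ^= (unsigned char)s[i];
    }
    return h;
}

/* FNV-1A: XOR the next byte, then multiply. */
static uint32_t sketch_fnv1a_32(const char *s, size_t len)
{
    uint32_t h = FNV32_BASIS;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= FNV32_MULT;
    }
    return h;
}
```

In FNV-1 the last byte is only XORed into the low 8 bits of the result, whereas in FNV-1A every byte passes through at least one multiplication, which is the usual explanation for its better distribution.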
