Change realpath cache hash algorithm to the regular string hash algorithm #11124
Right now the FNV-1 algorithm is used to determine the realpath cache key. For applications that are lightweight but have lots of files (e.g. WordPress), the realpath cache key computation shows up in the Callgrind profile. The reason is that we do a simple byte-by-byte loop. Furthermore, we always use the 32-bit prime and offset values, even in a 64-bit environment, which reduces the diffusion property of the hash. This hinders the distribution of keys a bit (although probably not by much, since the cache only holds a limited number of entries).
I propose switching to our regular string hashing algorithm, which is better optimised than a byte-by-byte loop and has better diffusion on 64-bit systems.
I don't know why FNV-1 was chosen over the DJBX33A algorithm we use for normal string hashing. I also don't know why FNV-1a wasn't chosen instead of FNV-1; it would be a simple modification and would distribute the hashes better than FNV-1.
The only reason I can think of is that FNV-1a typically has a better distribution than DJBX33A-style algorithms such as the one we use for string hashing [1]. But I doubt that makes a difference here, and if it does, we should perhaps look into changing the string hash algorithm from DJBX33A to FNV-1a.
[1] http://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed