Change realpath cache hash algorithm to the regular string hash algorithm #11124

nielsdos · 2023-04-23T20:08:49Z

Right now the FNV-1 algorithm is used to determine the realpath cache key. For applications that are lightweight but have lots of files (e.g. WordPress), the realpath cache key computation shows up in the Callgrind profile. The reason is that we do a simple byte-by-byte loop. Furthermore, we always use the 32-bit prime and offset values, even in a 64-bit environment, which reduces the diffusion property of the hash. This hinders the distribution of keys a bit (although probably not a lot, since the cache only holds a limited number of entries).
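To illustrate the diffusion point, here is a minimal sketch, not the actual code in Zend/zend_virtual_cwd.c. The function names are hypothetical and `uint64_t` stands in for the engine's unsigned long type. It compares FNV-1 run with the 32-bit parameters in a 64-bit accumulator against FNV-1 run with the standard 64-bit parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: FNV-1 with the 32-bit offset basis and prime,
 * accumulated in a 64-bit word (the situation described above). */
static uint64_t sketch_fnv1_mixed(const char *path, size_t len)
{
    assert(path != NULL || len == 0);
    uint64_t h = UINT64_C(2166136261);          /* 32-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h *= UINT64_C(16777619);                /* 32-bit prime */
        h ^= (unsigned char)path[i];            /* FNV-1: multiply, then XOR */
    }
    return h;
}

/* Hypothetical sketch: FNV-1 with the standard 64-bit parameters. */
static uint64_t sketch_fnv1_64(const char *path, size_t len)
{
    assert(path != NULL || len == 0);
    uint64_t h = UINT64_C(0xcbf29ce484222325);  /* 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h *= UINT64_C(0x100000001b3);           /* 64-bit prime */
        h ^= (unsigned char)path[i];
    }
    return h;
}
```

With the 32-bit parameters, the high bits of the 64-bit word only fill up after a few input bytes. For a one-byte input such as `"a"`, the top byte of `sketch_fnv1_mixed` is still zero, while `sketch_fnv1_64` already populates it.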

I propose to switch to our regular string hashing algorithm, which is better optimised than a byte-per-byte loop, and has better diffusion on 64-bit systems.

I don't know why FNV-1 was chosen over the DJB33X algorithm we use in the normal string hashing. Also, I don't know why FNV-1A wasn't chosen instead of FNV-1, which would be a simple modification and would distribute the hashes better than FNV-1.
The only thing I can think of is that typically FNV-1A has a better distribution than DJB33X algorithms like what we use for string hashing [1]. But I doubt that makes a difference here, and if it does then we should perhaps look into changing the string hash algorithm from DJB33X to FNV-1A.
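For readers unfamiliar with the three algorithms being compared, the following is a minimal sketch with hypothetical function names, using plain 32-bit arithmetic. The times-33 variant assumes the additive form (multiply by 33, add the byte, start from 5381). The engine's real string hash may differ in details such as loop unrolling or extra bit manipulation of the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1: multiply by the prime first, then XOR in the byte. */
static uint32_t sketch_fnv1_32(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    uint32_t h = UINT32_C(2166136261);
    for (size_t i = 0; i < len; i++) {
        h *= UINT32_C(16777619);
        h ^= (unsigned char)s[i];
    }
    return h;
}

/* FNV-1a: same constants, but XOR first, then multiply. */
static uint32_t sketch_fnv1a_32(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    uint32_t h = UINT32_C(2166136261);
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= UINT32_C(16777619);
    }
    return h;
}

/* Times-33 hash (assumed additive variant): h = h * 33 + byte. */
static uint32_t sketch_times33(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    uint32_t h = UINT32_C(5381);
    for (size_t i = 0; i < len; i++) {
        h = h * 33u + (unsigned char)s[i];
    }
    return h;
}
```

The only difference between FNV-1 and FNV-1a is the order of the two operations inside the loop, which is why switching between them is a small change. In FNV-1a the last input byte also goes through a multiplication, so it affects more of the output bits than in FNV-1, where it is only XORed into the low byte.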

[1] http://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

Zend/zend_virtual_cwd.c

nielsdos requested a review from iluuu1994 as a code owner April 23, 2023 20:08

github-actions bot added the Category: Engine label Apr 23, 2023

Girgias reviewed May 5, 2023

Zend/zend_virtual_cwd.c (outdated, resolved)

nielsdos force-pushed the other-hash branch from d09b710 to c5f26f5 May 5, 2023 17:38