Hashing in the Wild

Introduction

Since its first release in 2005, Git has given every file in every repository a permanent address: the SHA-1 hash of its contents, prefixed with a short header. Not a path. Not a filename. The hash. Change one byte in the file and the address changes. The version at the old address stays unchanged, and stays available for as long as something in the repository's history still refers to it.
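
To make the address concrete, here is a minimal sketch of how Git derives the ID of a file's contents (a "blob"). Git hashes a short header, `blob <size>` followed by a NUL byte, and then the raw bytes. The function name `git_blob_id` and the sample strings are illustrative choices, not part of Git itself.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a header ("blob <size>\0") followed by the raw file bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = b"hello world\n"
edited = b"hello World\n"  # exactly one byte changed

print(git_blob_id(original))
print(git_blob_id(edited))
print(git_blob_id(original) == git_blob_id(edited))  # False
```

If Git is installed, `git hash-object <file>` should print the same ID for a file with the same bytes. The two sample inputs differ by a single byte and land at unrelated addresses.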

This is not a convenience feature. It is an architectural decision with a specific algorithmic basis. The hash function makes storage immutable. Immutable storage makes branching cheap. Cheap branching makes version control distributed. Every Git operation you have ever run — clone, branch, merge, push — depends on the properties of a hash function.

You built hash tables in Book 1, Chapter 22. You saw modular arithmetic on a ring in Chapter 39. Now those structures appear inside three systems you use daily. The shape is identical. The scale and the consequences are different.


Thread Activation

In Book 1, Chapter 22, you built a hash table: a hash function maps keys to bucket indices, and collisions are resolved by chaining or probing. In Chapter 39, you saw modular arithmetic applied to a ring, the basis of consistent hashing.
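
As a quick refresher, the hash table can be sketched in a few lines. This is a minimal version using chaining, not necessarily the implementation from Chapter 22. The names `bucket_index` and `ChainedHashTable` are hypothetical, and the use of SHA-1 to derive the bucket index is an assumption made only so that the result is the same on every run.

```python
import hashlib


def bucket_index(key: str, num_buckets: int) -> int:
    """Map a key to a bucket index, deterministically across runs."""
    # Python's built-in hash() is salted per process for strings, so this
    # sketch derives the integer from a SHA-1 digest instead.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


class ChainedHashTable:
    """Hash table that resolves collisions by chaining (one list per bucket)."""

    def __init__(self, num_buckets: int = 8) -> None:
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key: str) -> list:
        return self.buckets[bucket_index(key, len(self.buckets))]

    def put(self, key: str, value) -> None:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))  # new key: extend the chain

    def get(self, key: str):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


# Three keys in two buckets: at least two of them must share a chain.
table = ChainedHashTable(num_buckets=2)
for name, count in [("clone", 1), ("branch", 2), ("merge", 3)]:
    table.put(name, count)

print(table.get("merge"))                       # 3
print([len(chain) for chain in table.buckets])  # chain lengths sum to 3
```

With only two buckets, a collision is guaranteed, so the lookup for `"merge"` has to walk a chain rather than read a single slot.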

This chapter traces Thread 1 (Hashing) into three production systems. Each system uses the same mathematical properties — uniform distribution, determinism, and (in the case of SHA-1) collision resistance — but applies them to different engineering problems. The thread continues in Book 3, Chapters 4–5, where consistent hashing becomes the foundation of distributed database sharding.
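
As a preview of where the thread leads, here is a rough sketch of the ring idea: nodes and keys are hashed onto the same circle of positions, and each key belongs to the first node at or after its position, wrapping around at the end. The node names, `RING_SIZE`, and the helpers `ring_position` and `owner` are all hypothetical, chosen for illustration. Production systems typically add refinements such as virtual nodes, which this sketch omits.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # positions on the ring are integers modulo this value


def ring_position(name: str) -> int:
    """Place a node name or a key on the ring, deterministically."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


def owner(key: str, nodes: list) -> str:
    """Return the first node at or after the key's position, wrapping around."""
    ring = sorted((ring_position(node), node) for node in nodes)
    positions = [position for position, _ in ring]
    # The modulo wraps past the last node back to the first one on the ring.
    index = bisect.bisect_left(positions, ring_position(key)) % len(ring)
    return ring[index][1]


keys = [f"user-{i}" for i in range(1000)]
before = {key: owner(key, ["node-a", "node-b", "node-c"]) for key in keys}
after = {key: owner(key, ["node-a", "node-b", "node-c", "node-d"]) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved")
print(all(after[key] == "node-d" for key in moved))  # True
```

Adding `node-d` only reassigns keys to `node-d` itself. No key moves between the three existing nodes, and that property is what makes the ring attractive for sharding.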