When a vector index has to absorb a steady stream of insertions and deletions, the part of HNSW that hurts most is not the search but the neighbor-rewiring that happens every time a deleted slot is recycled. The routine responsible, replace_update, fires off a complete edge rebuild for every former neighbor of the tombstoned node during which the graph is frozen and incoming queries pile up.
The root of the pause is the asymptotic cost of the rebuild. For each of the M neighbors the algorithm gathers the one-hop and two-hop vicinity of the recycled slot, producing a candidate list that already contains O(M²) vectors, and then performs a full D-dimensional distance against every entry to select the best M edges. Because this quadratic scan is repeated M times, the total work grows as M³, and with typical M between 48 and 64 the cubic term quickly dominates; on a mobile ARM core a single update can spend tens of milliseconds inside this loop, longer than the actual nearest-neighbor search the user is waiting for.
We cut the cost by observing the following two phenomenons. First, a node that was never adjacent to the recycled node is unlikely to be affected by the point reuse, so rebuild on this kind of nodes is meaningless. Second, for any specific node, the candidate pool doesn’t have to be as large as the entire two-hop set. So, instead of offering the full two-hop set to every neighbor, we restrict the candidate pool for each affected node to its own current neighbors plus the immediate neighbors of the recycled slot and the newly inserted vector, shrinking the set from M² to at most 2M+1. A single pass over this pruned list is enough to re-select the edges, so the complexity drops from M³ to M² and the distance computations fall by an order of magnitude; the working set now fits into L1 and the entire replacement finishes in just over one millisecond on the same mobile core, an eighty percent reduction that turns a blocking bottleneck into a cheap background task.
By pruning the candidate space we have removed the cubic explosion that made frequent updates unacceptable: the update path is now linear in the fan-out M, and the tail-latency cliff disappears even when the index is refreshed hundreds of times per second.