Rebuilding an HNSW index is undoubtedly one of the most resource-intensive aspects of using HNSW in production workloads. Unlike traditional databases, where a deletion can be handled by simply removing a row from a table, a vector database that uses HNSW often requires a complete rebuild to maintain optimal performance and accuracy.
Why is Rebuilding Necessary?
Because of its layered graph structure, HNSW isn’t inherently designed for dynamic datasets that change frequently. Yet adding new data and deleting existing data is essential for keeping an index up to date, especially for use cases like RAG, which aims to improve search relevance.
Most databases distinguish between “hard” and “soft” deletes. Hard deletes permanently remove data, while soft deletes flag data as ‘to-be-deleted’ and remove it later. The issue with soft deletes is that the to-be-deleted data still uses significant memory until it’s permanently removed. This is particularly problematic in vector databases that use HNSW, where memory consumption is already a significant issue.
HNSW creates a graph where nodes (vectors) are connected based on their proximity in the vector space, and traversing an HNSW graph works much like traversing a skip list. To support that, the layers of the graph are designed so that some layers have very few nodes. When vectors are deleted, especially those on sparsely populated layers where they serve as important connectors in the graph, the whole HNSW structure can become fragmented. This fragmentation may leave nodes (or layers) disconnected from the main graph, which requires rebuilding the entire graph or, at the very least, degrades search efficiency.
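The fragmentation effect can be illustrated with a minimal sketch. It is a toy, single-layer graph rather than a real HNSW index, and the node names and the `reachable` helper are hypothetical. It assumes that search can only visit nodes reachable from the entry point by following edges, so removing a connector node strands everything behind it.

```python
from collections import deque

def reachable(graph, entry, deleted=frozenset()):
    """Return the live nodes reachable from `entry` via breadth-first search."""
    if entry in deleted:
        return set()
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            # Edges into a removed node can no longer be followed.
            if neighbor not in deleted and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Toy graph (hypothetical): two clusters joined only through "hub".
graph = {
    "a": ["b", "hub"],
    "b": ["a", "hub"],
    "hub": ["a", "b", "c", "d"],
    "c": ["d", "hub"],
    "d": ["c", "hub"],
}

before = reachable(graph, "a")                   # every node is reachable
after = reachable(graph, "a", deleted={"hub"})   # only the entry cluster remains
stranded = set(graph) - {"hub"} - after          # live nodes search can no longer find
print(sorted(before), sorted(after), sorted(stranded))
```

With the connector in place, all five nodes are reachable from `a`. Once `hub` is removed, `c` and `d` are still stored but can never be returned by a traversal that starts at `a`.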
For this reason, HNSW implementations typically use a soft-delete technique, which marks vectors for deletion but doesn’t immediately remove them. This approach lowers the expense of frequent full rebuilds, although periodic reconstruction is still needed to keep the graph in its optimal state.
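A minimal sketch of the soft-delete-then-rebuild pattern follows. It uses a brute-force stand-in instead of a real HNSW graph, and the `SoftDeleteIndex` class, its method names, and the `rebuild_threshold` parameter are all hypothetical choices made for illustration, not any library's API. It shows that tombstoned vectors keep occupying memory until a rebuild drops them.

```python
import math

class SoftDeleteIndex:
    """Brute-force stand-in for an HNSW index with tombstone-based deletes."""

    def __init__(self, rebuild_threshold=0.35):
        self.vectors = {}        # id -> vector, including tombstoned entries
        self.tombstones = set()  # ids flagged as deleted but still stored
        self.rebuild_threshold = rebuild_threshold  # assumed policy knob
        self.rebuilds = 0

    def add(self, vec_id, vector):
        self.vectors[vec_id] = vector
        self.tombstones.discard(vec_id)

    def deleted_fraction(self):
        return len(self.tombstones) / len(self.vectors) if self.vectors else 0.0

    def delete(self, vec_id):
        """Soft delete: flag the id but keep its data in memory."""
        if vec_id in self.vectors:
            self.tombstones.add(vec_id)
        if self.deleted_fraction() > self.rebuild_threshold:
            self.rebuild()

    def rebuild(self):
        """Hard delete: drop tombstoned data and start from a clean structure."""
        self.vectors = {i: v for i, v in self.vectors.items()
                        if i not in self.tombstones}
        self.tombstones.clear()
        self.rebuilds += 1

    def search(self, query, k=1):
        """Return the ids of the k nearest live vectors, skipping tombstones."""
        live = ((math.dist(query, v), i) for i, v in self.vectors.items()
                if i not in self.tombstones)
        return [i for _, i in sorted(live)[:k]]

index = SoftDeleteIndex(rebuild_threshold=0.35)
for i in range(10):
    index.add(i, (float(i), 0.0))
for i in (0, 1, 2):
    index.delete(i)                           # 30% deleted: below the threshold
held_after_soft_deletes = len(index.vectors)  # tombstoned data is still held
nearest = index.search((0.0, 0.0), k=1)       # tombstones are skipped at query time
index.delete(3)                               # 40% deleted: triggers a rebuild
held_after_rebuild = len(index.vectors)
```

The threshold is a trade-off: a low value reclaims memory sooner at the cost of more frequent rebuilds, while a high value defers the rebuild cost but lets dead vectors accumulate.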