Redis-Powered Vector Similarity Search in Go

A cartoon gopher coming out of a mail envelope in a futuristic background

By Daniel Laczkowski

The problem

AI and vector search are all the rage right now, so you may have thought about building a solution yourself, exclusively in Go of course, if you are a fellow enthusiast.

Well, it turns out that you can't (without a little workaround): at the time of writing, the popular and maintained Go libraries do not support this Redis feature, while the official Python one does.

Before we dive into the Go "workaround" that makes this an interesting feature, we need to design the index and predict which queries would be useful to us (or you can skip right to the good part).

Setting up the Vector Index and Choosing a Distance Metric

Similarity search here uses the K-Nearest-Neighbours (KNN) algorithm, which is as simple as it sounds: it returns the k vectors closest to your input vector, according to a chosen distance metric.

For our distance metric, we can choose between Euclidean distance (think Pythagoras), the inner product, and cosine distance. Here we pick cosine distance, as it is the most useful for document similarity: it compares the angle between two vectors (and hence their overall direction), whereas Euclidean distance is sensitive to their magnitude.
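To make the idea concrete, here is a minimal brute-force sketch of KNN in plain Go. It is only an illustration of the algorithm, not what Redis runs internally; the names nearest and euclidean are made up for this example, and it assumes all vectors have the same length.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// euclidean returns the straight-line distance between two
// equal-length vectors (hypothetical helper for this sketch).
func euclidean(a, b []float32) float64 {
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

// nearest returns the indices of the k vectors closest to query,
// ordered from closest to farthest. It compares the query against
// every stored vector, which is the brute-force approach.
func nearest(query []float32, vectors [][]float32, k int) []int {
	idx := make([]int, len(vectors))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(x, y int) bool {
		return euclidean(query, vectors[idx[x]]) < euclidean(query, vectors[idx[y]])
	})
	if k > len(idx) {
		k = len(idx)
	}
	return idx[:k]
}

func main() {
	vectors := [][]float32{{0, 0}, {5, 5}, {1, 1}, {9, 0}}
	fmt.Println(nearest([]float32{2, 2}, vectors, 2)) // prints [2 0]
}
```

Swapping euclidean for another distance function changes which neighbours count as "closest", which is why the choice of metric matters.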

An illustration depicting the difference between euclidean distance and cosine similarity as vector distance metrics.
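The difference is easy to see with two vectors that point the same way but have different lengths. The sketch below uses hypothetical helper names (cosineDistance, euclideanDistance) and assumes non-zero vectors of equal length; cosine distance is taken as one minus cosine similarity.

```go
package main

import (
	"fmt"
	"math"
)

// cosineDistance returns 1 - cos(angle between a and b).
// It assumes neither vector is all zeros.
func cosineDistance(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return 1 - dot/math.Sqrt(na*nb)
}

// euclideanDistance returns the straight-line distance between a and b.
func euclideanDistance(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

func main() {
	a := []float64{1, 2}
	b := []float64{10, 20} // same direction as a, ten times longer

	fmt.Printf("cosine distance:    %.3f\n", cosineDistance(a, b))
	fmt.Printf("euclidean distance: %.3f\n", euclideanDistance(a, b))
}
```

The cosine distance comes out as zero because the direction is identical, while the Euclidean distance is large because of the difference in magnitude.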

To set up the index, we need to know the format of our vector embeddings - in our case OpenAI's "text-embedding-ada-002" has a dimensionality of 1536. We use this for our create index query.

// The "6" after FLAT is the number of attribute arguments that follow it (TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE).
err := RedisClient.Do(context.Background(), "FT.CREATE", "idx:a-personal-index", "ON", "HASH", "PREFIX", "1", "e:", "SCHEMA", "exampleKey", "NUMERIC", "v", "VECTOR", "FLAT", "6", "TYPE", "FLOAT32", "DIM", "1536", "DISTANCE_METRIC", "COSINE").Err()

As for the other parts of the index, they depend wholly on your use case. You can add additional fields to the index for richer queries (for example, filtering on a field such as a timestamp) at the expense of memory usage in Redis.
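Because the number after FLAT counts the attribute arguments that follow it, it can be handy to build the argument list in code so the count cannot drift out of sync. The helper below is a hypothetical sketch (vectorIndexArgs is not part of any library); it only assembles the slice, which could then be passed to something like RedisClient.Do(ctx, args...).

```go
package main

import "fmt"

// vectorIndexArgs assembles the arguments for an FT.CREATE call with one
// NUMERIC field and one FLAT vector field. The attribute count is derived
// from the attribute list itself rather than typed by hand.
func vectorIndexArgs(index, prefix, numericField, vectorField string, dim int, metric string) []interface{} {
	// Name/value pairs describing the vector field.
	attrs := []interface{}{"TYPE", "FLOAT32", "DIM", dim, "DISTANCE_METRIC", metric}

	args := []interface{}{
		"FT.CREATE", index, "ON", "HASH", "PREFIX", 1, prefix,
		"SCHEMA", numericField, "NUMERIC",
		vectorField, "VECTOR", "FLAT", len(attrs),
	}
	return append(args, attrs...)
}

func main() {
	args := vectorIndexArgs("idx:a-personal-index", "e:", "exampleKey", "v", 1536, "COSINE")
	fmt.Println(args...)
}
```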

Now that we know what to do, here's the Go part.

Inserting, Creating and Searching the vector index using Go

Due to the aforementioned lack of support at the time of development, this step required a little workaround.

After sifting through the documentation, one method showed potential (Client.Do), but as we will see later, it was not enough!

We can use the Client.Do method to create and query the index itself, which were the unavailable functions, but when it comes to inserting into the index we hit another problem: we need to enter the data in the right format for the index.

Even worse, manually entering the vector data through RedisInsight worked, but I still could not search the index programmatically.

Various attempts were made to insert the data correctly, and unfortunately a simple fmt.Sprint did not do the job. Neither did a float array, because the Redis library expects a string, and attempting to "stringify" the vector caused Redis to misinterpret the data and produce garbage results.

What we get from Sprint (a string)

What we want (the vector)

Thankfully, I stumbled upon Rueidis and its VectorString32 helper function. Miraculously, it produced the expected format, and we successfully entered vector data into our Redis hash!

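// The vector goes in field "v" to match the index schema, and exampleKey must be numeric to be indexed as NUMERIC.
err := RedisClient.HSet(context.Background(), "e:a-personal-index", "v", rueidis.VectorString32(Embedding), "exampleKey", 2021).Err()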
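To see why fmt.Sprint fails where the binary form works, it helps to look at the bytes. The sketch below is my own illustration, assuming the index expects each FLOAT32 component as 4 little-endian bytes (which, as far as I can tell, is also what rueidis.VectorString32 produces); encodeFloat32 is a hypothetical stand-in, not the library function.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeFloat32 packs each float32 into 4 little-endian bytes and
// returns the raw bytes as a Go string (strings can hold arbitrary bytes).
func encodeFloat32(v []float32) string {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return string(buf)
}

func main() {
	v := []float32{0.5, -1.25, 3}

	text := fmt.Sprint(v)    // human-readable text such as "[0.5 -1.25 3]"
	blob := encodeFloat32(v) // raw bytes, always 4 per component

	fmt.Printf("fmt.Sprint:  %q (%d bytes)\n", text, len(text))
	fmt.Printf("binary blob: %q (%d bytes)\n", blob, len(blob))
}
```

The text form has a length that depends on how the numbers print, whereas the blob is always 4 bytes per dimension, so a 1536-dimension embedding is exactly 6144 bytes.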

We can now combine rueidis.VectorString32 and Client.Do to search the index using our vector data, using one of our fields as a date range filter to minimise the search time.

res, err := RedisClient.Do(context.Background(), "FT.SEARCH", "idx:a-personal-index", `@exampleKey:[2020 2022]=>[KNN 2 @v $blob AS x]`, "RETURN", 1, "x", "PARAMS", 2, "blob", rueidis.VectorString32(secondEmbedding), "SORTBY", "x", "DIALECT", 2).Result()

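This query restricts exampleKey to the range 2020 to 2022, then returns the 2 closest matches by comparing the indexed vector field @v against the $blob parameter, which holds my vector embedding. Only the distance score x is returned, and because SORTBY defaults to ascending order, the closest match (smallest distance) comes first.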
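Client.Do hands back an untyped result, so it still needs unpacking. The sketch below assumes the classic array-style reply (a total count followed by alternating keys and field lists); the shape can differ depending on the protocol version and client settings, so treat this as a starting point. The match type, parseMatches, and the sample reply are all made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// match pairs a Redis key with its distance to the query vector.
type match struct {
	Key      string
	Distance float64
}

// parseMatches walks an array-style FT.SEARCH reply and extracts the
// value of scoreField for every returned document.
func parseMatches(reply interface{}, scoreField string) ([]match, error) {
	items, ok := reply.([]interface{})
	if !ok || len(items) == 0 {
		return nil, fmt.Errorf("unexpected reply type %T", reply)
	}
	var out []match
	// items[0] is the total hit count; key / field-list pairs follow.
	for i := 1; i+1 < len(items); i += 2 {
		key, _ := items[i].(string)
		fields, _ := items[i+1].([]interface{})
		for j := 0; j+1 < len(fields); j += 2 {
			if name, _ := fields[j].(string); name != scoreField {
				continue
			}
			d, err := strconv.ParseFloat(fmt.Sprint(fields[j+1]), 64)
			if err != nil {
				return nil, fmt.Errorf("parsing score for %s: %w", key, err)
			}
			out = append(out, match{Key: key, Distance: d})
		}
	}
	return out, nil
}

func main() {
	// Hand-written stand-in for a reply, not real server output.
	reply := []interface{}{
		int64(2),
		"e:first", []interface{}{"x", "0.0312"},
		"e:second", []interface{}{"x", "0.2045"},
	}
	matches, err := parseMatches(reply, "x")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range matches {
		fmt.Printf("%s distance=%.4f\n", m.Key, m.Distance)
	}
}
```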

In practice

If you are wondering how my index works in practice, I use Microsoft's subscription webhook system to be notified when a mail is received and access it in real time. The mail is immediately vectorized and added to the index of the specific client, as each user has their own vector index in Redis to avoid sharing data. This system has presented its own technical challenges, for some of which I use Azure Functions.

Once some examples are in the index, we can vectorize new data and perform the similarity search. AssortMail does this through an Outlook widget, allowing the user to find similar mails with little to no friction straight from their inbox by vectorizing the selected mail, and returning the closest X results through an internal API, to present in a pane in Outlook.

If you enjoyed reading this technical blog, see how I use Azure Functions with Go.