I want to know if Milvus is the right choice for my project. I’ve got around 160 million images (each image is 32x32). I want to insert those images into Milvus, so I want to know if it’s possible to insert them in a fast and efficient way, without it taking more than 2-3 days. I also want to know if I will be able to search those images from an input image. For example, I’ve got an image that’s slightly different from the original one in the Milvus database: each pixel is off by 1-2 RGB values, so think of it as a slight tint. Will I be able to search for similar images? Thanks.
Your scenario is a good fit for Milvus. There is also a tutorial in the Milvus community for searching by image, which I think is what you want: https://github.com/milvus-io/bootcamp/tree/v2.0rc6/solutions/reverse_image_search
If you don’t want to use the community’s image search system and would rather build one yourself, you can follow these steps:
- Generate vectors/embeddings from your images
We have a project named “Towhee” for this. It can convert images or audio into vectors/embeddings in a script of 10 lines or fewer. Your requirement sounds similar to this pipeline. It takes just 3 lines to generate an embedding for an image:
from towhee import pipeline
embedding_pipeline = pipeline('image-encoding')
img0_embedding = embedding_pipeline('/path/to/img0')
To install this tool, follow this page.
- Insert the vectors/embeddings into Milvus and build a mapping between embeddings and image paths. You can write a script to do this:
- Use Towhee to generate embeddings for your images.
- Insert the embeddings into Milvus. We recommend inserting batch by batch: for example, generate 100 embeddings, insert them, then generate the next 100, insert them, and so on.
- For each insert API call, the Milvus server returns an array of ids for the inserted embeddings (you can also specify an id for each embedding yourself). Use a SQL database to store the mapping from id to image path.
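The batch loop described above could be sketched roughly like this. It is only an illustration: `embed` and `insert` are hypothetical callables that you would wrap around your Towhee pipeline and your Milvus insert call, and the SQLite table `images(id, path)` is an assumed schema, not something Milvus or Towhee provides.

```python
import sqlite3
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_images(paths, embed, insert, db, batch_size=100):
    """Embed and insert images batch by batch, recording id -> path in SQLite.

    `embed(path)` and `insert(vectors)` are hypothetical callables (assumptions):
    wrap your Towhee pipeline and your Milvus insert call in them.
    `insert` must return one id per vector, in the same order.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS images "
        "(id INTEGER PRIMARY KEY, path TEXT NOT NULL)"
    )
    total = 0
    for batch in batched(paths, batch_size):
        vectors = [embed(p) for p in batch]   # generate one batch of embeddings
        ids = insert(vectors)                 # insert that batch, get ids back
        if len(ids) != len(batch):
            raise ValueError("insert returned a different number of ids than vectors")
        # Store the id -> image path mapping for this batch.
        db.executemany("INSERT INTO images (id, path) VALUES (?, ?)", zip(ids, batch))
        db.commit()
        total += len(batch)
    return total
```

You would call it as `index_images(all_paths, embed, insert, sqlite3.connect("images.db"))`. For your numbers, 160 million images in 2-3 days means the whole loop has to sustain roughly 620 to 930 images per second, so it is worth timing a few batches end to end before starting the full run.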
- Search for similar vectors
To search for an image, call Towhee again to generate an embedding for it, then use that embedding to search in Milvus. Milvus returns the top results, and the first one is the most similar image’s id.
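To give a feel for what a top-k search returns with a slightly tinted query, here is a small self-contained sketch. It is not how Milvus is implemented: it scans every vector exhaustively, and it uses raw pixel values as the vectors instead of Towhee embeddings, purely so the example runs without a model. The data is random and the function names are made up for illustration.

```python
import heapq
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, vectors, k=3):
    """Return the k (id, distance) pairs closest to `query`, nearest first.

    `vectors` maps id -> vector. This is an exhaustive scan, only meant to
    show what a top-k search returns.
    """
    scored = ((vid, l2(query, vec)) for vid, vec in vectors.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


rng = random.Random(0)
dim = 32 * 32 * 3  # one value per RGB channel of a 32x32 image

# 50 random stand-in "images", keyed by id (assumption: raw pixels as vectors).
stored = {i: [rng.randrange(256) for _ in range(dim)] for i in range(50)}

# Query: stored image 7 with every channel shifted by 1 or 2 (a slight tint).
query = [min(255, v + rng.choice((1, 2))) for v in stored[7]]

results = top_k(query, stored)
print(results[0][0])  # id of the nearest stored image
```

Because the tint moves each value by at most 2, the query stays far closer to image 7 than to any unrelated image, so id 7 comes back first. With real embeddings the distances will differ, so it is worth checking a few tinted samples against your chosen model.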
- Find the image through the mapping
Use the id to query the SQL database, and you get the image path.
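A minimal sketch of that lookup, assuming the mapping lives in a SQLite table named `images` with columns `id` and `path` (both the table and the function name are hypothetical, chosen for this example):

```python
import sqlite3


def paths_for_ids(db, ids):
    """Return image paths for `ids`, in the same order as `ids`.

    Assumes a table images(id, path). Ids missing from the table are skipped.
    """
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = db.execute(
        f"SELECT id, path FROM images WHERE id IN ({placeholders})", list(ids)
    ).fetchall()
    by_id = dict(rows)
    # Re-order to match the search ranking, since SQL does not preserve it.
    return [by_id[i] for i in ids if i in by_id]
```

Passing the ids in the order Milvus returned them keeps the paths ranked from most to least similar.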