I first heard of Milvus when I came across this. Since then I have tried several of the indexing and vectorizing methods Milvus offers, but the results do not seem to be correct. I wonder whether I have done something wrong or simply misconfigured something, and I hope someone here can answer my questions.
I have prepared a dataset of 550,000 logos for the test. I chose ResNet50 for image feature extraction and converted the logos into vectors. Below are my attempts and the settings for each one:
The first attempt used the IP metric (without normalizing the embeddings and without YOLOv5) with IVF_SQ8 as the index type. As you can see below, the results were quite disappointing. Results are sorted by similarity score, highest first.
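For context on the "without normalizing" detail: with raw embeddings, the inner product rewards vectors with a large norm, not just vectors pointing in the same direction. The sketch below uses made-up 2-D toy vectors (not real ResNet50 features) and a hypothetical helper `l2_normalize` to show how the top hit can change once vectors are scaled to unit length, at which point IP equals cosine similarity.

```python
import numpy as np

def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit L2 norm (hypothetical helper, not a Milvus API)."""
    vectors = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, eps)

# Toy data: row 0 has the same direction as the query,
# row 1 points elsewhere but has a much larger norm.
query = np.array([[1.0, 0.0]], dtype=np.float32)
database = np.array([[1.0, 0.0], [3.0, 4.0]], dtype=np.float32)

# Raw inner product: the large-norm vector wins.
raw_scores = (database @ query.T).ravel()
raw_best = int(np.argmax(raw_scores))

# Inner product on unit vectors (cosine similarity): the same-direction vector wins.
cosine_scores = (l2_normalize(database) @ l2_normalize(query).T).ravel()
cosine_best = int(np.argmax(cosine_scores))

print("best match with raw IP:", raw_best)
print("best match with normalized IP:", cosine_best)
```

If IP is used, the same normalization would have to be applied both to the vectors inserted into the collection and to every query vector.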
The second attempt used the L2 metric with IVF_SQ8 as the index, again without YOLOv5. It gave much better results, but they are still nowhere near good enough, as you can see from the picture below.
On the third attempt, I used the L2 metric, IVF_SQ8, and YOLOv5, hoping that detecting and recognizing the objects in the dataset would make a difference. The results were very close to those of #2.
At this point I was not sure whether IVF_SQ8 was the best choice for indexing, so for the fourth attempt I switched to FLAT, since this index type performs an exact, exhaustive search. Please see below.
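One way to separate index problems from embedding problems is to compare what the index returns with a brute-force search over the same vectors. The following is a minimal sketch with NumPy only; `exact_top_k` and `recall_at_k` are hypothetical helper names, and it assumes the embeddings (or a sample of them) fit in memory. If the IDs from IVF_SQ8 largely agree with the brute-force IDs, the index is not the cause of the poor matches.

```python
import numpy as np

def exact_top_k(query, database, k=15):
    """Brute-force L2 search: return the ids and squared distances of the k nearest rows."""
    query = np.asarray(query, dtype=np.float32)
    database = np.asarray(database, dtype=np.float32)
    # Squared Euclidean distance from the query to every database vector.
    # For a large collection, run this on a sample to limit memory use.
    dists = np.sum((database - query) ** 2, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

def recall_at_k(index_ids, exact_ids):
    """Fraction of the exact neighbours that the approximate index also returned."""
    exact = set(int(i) for i in exact_ids)
    found = set(int(i) for i in index_ids)
    return len(exact & found) / len(exact)

# Toy usage with made-up vectors.
db = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
ids, dists = exact_top_k([0.9, 0.9], db, k=2)
print(ids.tolist(), recall_at_k([1, 2], ids))
```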
I would rate #2 as our best attempt so far, with 5 Nike logos among the 15 results. The remaining results have similar shapes, but they are nowhere near exact matches for the Nike logo. The results also seem random at times. For example, I went on Google and picked a few Nike images at random from the Nike website (visually identical to what I have in the dataset), but when I searched with them, the results showed irrelevant logos.
My concern is this: when the file type of an image changes, does that also change the resulting vector? I have tried to keep all the indexed images and the query images in the same format, PNG. However, after converting a JPG to PNG, the results differ from the initial search. I also noticed that changing the height or width of an image affects the results, so the shape of the input image clearly matters when searching. Is there a way to "standardize" an image before searching so that it returns the best results? Would it make any difference if I used my own trained model instead of the default model from Milvus?
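To make the "standardize" idea concrete, here is a minimal sketch using Pillow. It is only an assumption about what such a step could look like, not the preprocessing Milvus or the ResNet50 pipeline actually applies: the function name `standardize`, the white background, and the 224x224 target size are all illustrative choices. It flattens transparency (a common PNG vs. JPG difference), pads to a square so the logo is not stretched, and resizes to a fixed size.

```python
from PIL import Image

def standardize(img, size=224, background=(255, 255, 255)):
    """Return an RGB image of size x size, padded (not stretched) to a square."""
    # Flatten any transparency onto a solid background so that PNG and JPG
    # versions of the same logo end up with the same pixels.
    rgba = img.convert("RGBA")
    canvas = Image.new("RGBA", rgba.size, background + (255,))
    flat = Image.alpha_composite(canvas, rgba).convert("RGB")

    # Pad the shorter side so the aspect ratio of the logo is preserved.
    side = max(flat.size)
    square = Image.new("RGB", (side, side), background)
    offset = ((side - flat.width) // 2, (side - flat.height) // 2)
    square.paste(flat, offset)

    # Resize to the fixed input size expected by the feature extractor.
    return square.resize((size, size), Image.BICUBIC)

# Toy usage with an in-memory image instead of a file on disk.
wide_logo = Image.new("RGBA", (300, 100), (0, 0, 0, 0))
print(standardize(wide_logo).size)
```

The same function would need to be applied to every image before it is embedded and inserted, and to every query image, so that both sides go through identical preprocessing.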
Hopefully someone will be able to identify what I did wrong here. Thank you very much for your time and input.
P.S. Due to the restriction, I have put all the pictures in one link: https://drive.google.com/drive/u/1/folders/1y0U09qIgUdA3IGJh-GycQeEyVzBHlrd2