Milvus Adventures July 29, 2024

COMMUNITY

We had so much fun at the meetup this week in Palo Alto and can't wait to see you all again next month. We haven't had a chance to upload that video yet; however, the Berlin and SF videos are up for your viewing pleasure.

Unstructured Data SF Meetup video
Unstructured Data Berlin Meetup video

Learn About Vector Databases

There are so many databases with Vector Search capabilities that it can be overwhelming to know where to start! This week, let's learn about similarity metrics and the difference between sparse and dense vectors, and then get our hands dirty with some tutorials.

Similarity Metrics for Vector Search. Metrics like Euclidean distance or cosine similarity measure how closely vectors relate to each other in high-dimensional space. Choosing an appropriate metric matters, because the same vectors can rank differently under different metrics, and that directly affects machine learning tasks such as classification and clustering.
Getting Started: Pgvector Guide for Developers Exploring Vector Databases. If you are a Postgres fan, you can build a little prototype with pgvector.
Beginner Guide to Implementing Vector Databases, covering key considerations, the steps to get started with a vector database, and implementation best practices.
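To see why the choice of metric matters, here is a minimal plain-Python sketch of three common metrics (Milvus exposes these as metric types such as L2, IP and COSINE). The vectors are made up for illustration: `doc_a` points in the same direction as the query but is longer, while `doc_b` has unit length but a different direction, so Euclidean distance and cosine similarity disagree about which one is the better match.

```python
import math

def euclidean_distance(a, b):
    """L2 distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b, in [-1, 1]; larger means more similar."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

query = [1.0, 0.0]
doc_a = [2.0, 0.0]  # same direction as the query, but twice as long
doc_b = [0.8, 0.6]  # unit length, different direction

print(euclidean_distance(query, doc_a))  # 1.0
print(euclidean_distance(query, doc_b))  # ≈ 0.632, so L2 prefers doc_b
print(cosine_similarity(query, doc_a))   # 1.0, so cosine prefers doc_a
print(cosine_similarity(query, doc_b))   # ≈ 0.8
```

Inner product also prefers `doc_a` here (2.0 vs 0.8), because it rewards vector length as well as direction. If your embeddings are normalized to unit length, cosine similarity and inner product produce the same ranking.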

Get Started with Milvus

Milvus is an open-source vector database that is a popular choice for building all kinds of AI applications.

Getting Started with a Milvus Connection. It comes with everything you need built right in and runs on your local machine.
JSON Metadata Filtering in Milvus is useful when you want to use data other than vectors to fine-tune your search results.
Hybrid Search with Milvus is another example of combining different kinds of vectors and metadata to get the best search results.
Multimodal RAG with CLIP, Llama3, and Milvus is all the rage! Try this tutorial to see the power of multimodal search.
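The metadata filtering and hybrid search tutorials fit together naturally, so here is a toy plain-Python sketch of both steps. It is not the Milvus API: the corpus, the `lang` metadata key and the `hybrid_search` helper are hypothetical. It filters documents on JSON-style metadata, ranks the survivors separately by dense cosine similarity and by sparse term overlap, and fuses the two rankings with Reciprocal Rank Fusion (RRF). Milvus ships rerankers for this fusion step, including an RRF-based one; the conventional constant `k=60` is assumed here.

```python
import math

# Toy corpus (all values invented for illustration): a dense vector, a sparse
# vector as {term: weight}, and JSON-style metadata for each document.
DOCS = {
    "doc1": {"dense": [0.9, 0.1], "sparse": {"milvus": 1.2, "index": 0.5}, "meta": {"lang": "en"}},
    "doc2": {"dense": [0.2, 0.8], "sparse": {"milvus": 1.5, "rag": 1.0}, "meta": {"lang": "en"}},
    "doc3": {"dense": [0.95, 0.05], "sparse": {"milvus": 2.0}, "meta": {"lang": "de"}},
    "doc4": {"dense": [0.6, 0.6], "sparse": {"milvus": 0.3, "rag": 0.3}, "meta": {"lang": "en"}},
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def sparse_dot(query, doc):
    # Only terms present in the query contribute to the score.
    return sum(w * doc.get(term, 0.0) for term, w in query.items())

def rank(scores):
    """Return ids sorted from best to worst score."""
    return sorted(scores, key=scores.get, reverse=True)

def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: each ranking adds 1 / (k + position), positions start at 1."""
    fused = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return rank(fused)

def hybrid_search(dense_q, sparse_q, meta_filter, top_k=2):
    # 1. Metadata filtering: keep only docs whose metadata passes the predicate.
    candidates = {i: d for i, d in DOCS.items() if meta_filter(d["meta"])}
    # 2. Two independent rankings over the same candidates.
    dense_rank = rank({i: cosine(dense_q, d["dense"]) for i, d in candidates.items()})
    sparse_rank = rank({i: sparse_dot(sparse_q, d["sparse"]) for i, d in candidates.items()})
    # 3. RRF only looks at rank positions, so the two score scales need no normalization.
    return rrf([dense_rank, sparse_rank])[:top_k]

results = hybrid_search(
    dense_q=[1.0, 0.0],
    sparse_q={"milvus": 1.0},
    meta_filter=lambda meta: meta["lang"] == "en",
)
print(results)  # ['doc1', 'doc2']
```

Note how `doc3` would top both rankings but never appears because the metadata filter removes it, and how `doc2`, last on dense similarity, is pulled into the top two by its strong sparse score.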

Vector Embeddings

In general, there are two types of vectors: dense vectors and sparse vectors. While they can be utilized for similar tasks, each has advantages and disadvantages.
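A quick way to see the difference is in how the two are stored. Dense vectors, typically produced by neural embedding models, hold a value in every dimension; sparse vectors, typically produced by lexical methods such as BM25 or learned sparse models such as SPLADE, have a huge number of dimensions but only a handful of non-zero entries, so they are usually stored as index-to-weight pairs. The 30,000-term vocabulary below is a hypothetical size chosen for illustration.

```python
# A dense vector stores a value for every dimension (values invented for illustration).
dense = [0.12, -0.48, 0.33, 0.05, -0.21, 0.67, 0.09, -0.15]

# A sparse vector over a hypothetical 30,000-term vocabulary:
# only the non-zero entries are stored, as {dimension_index: weight}.
VOCAB_SIZE = 30_000
sparse = {17: 0.9, 2048: 0.4, 29_999: 1.3}

def sparse_to_dense(vec, size):
    """Expand a sparse vector into a full list, filling missing dimensions with 0.0."""
    out = [0.0] * size
    for idx, weight in vec.items():
        out[idx] = weight
    return out

def sparse_dot(a, b):
    """Dot product of two sparse vectors: only dimensions present in both contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[i] for i, w in a.items() if i in b)

query = {17: 1.0, 500: 0.5, 29_999: 0.2}
print(sparse_dot(query, sparse))  # 1.0*0.9 + 0.2*1.3 ≈ 1.16
print(len(sparse), "stored values vs", len(sparse_to_dense(sparse, VOCAB_SIZE)), "in dense form")
```

The trade-off in a nutshell: sparse vectors are cheap to store and easy to interpret (each dimension maps to a term), while dense vectors capture meaning beyond exact term overlap.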

You can also train your own models, learn more about sentence transformers, and even give time series embeddings a go!

Vector Indexes

Most vector search solutions rely on HNSW, but there are plenty of other vector indexes, and understanding the differences will help you build a performant and cost-effective AI application. Here are two that you might not have heard about yet:
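To make the speed/accuracy trade-off of non-graph indexes concrete, here is a toy sketch of an IVF (inverted file) index, an index family Milvus supports (for example IVF_FLAT). It is shown only as an illustration and is not necessarily one of the two indexes referred to above; the `build_ivf` and `ivf_search` helpers are hypothetical. Vectors are clustered with a few rounds of k-means, and a query scans only the `nprobe` clusters whose centroids are closest, instead of every vector.

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(vectors, centroids):
    """Put each vector id into the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def build_ivf(vectors, n_lists, iters=5, seed=0):
    """Toy IVF build: a few rounds of k-means, then one inverted list per centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    dim = len(vectors[0])
    for _ in range(iters):
        lists = assign(vectors, centroids)
        for c, ids in enumerate(lists):
            if ids:  # keep the old centroid if its list came out empty
                centroids[c] = [sum(vectors[i][d] for i in ids) / len(ids) for d in range(dim)]
    return centroids, assign(vectors, centroids)

def ivf_search(query, vectors, centroids, lists, k=3, nprobe=2):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]

rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(1000)]
centroids, lists = build_ivf(data, n_lists=16)
query = [0.5, 0.5]
approx = ivf_search(query, data, centroids, lists, k=5, nprobe=4)  # scans ~1/4 of the data
exact = sorted(range(len(data)), key=lambda i: l2(query, data[i]))[:5]  # brute force
print(approx)
print(exact)
```

Raising `nprobe` scans more lists and moves the result closer to the brute-force answer at the cost of more distance computations; with `nprobe` equal to the number of lists, IVF degenerates into an exact scan.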

Learn RAG

Chunking Strategies

Optimizing your RAG applications

GITHUB REPOS

Milvus. Milvus is an open-source vector database built to power embedding similarity search and AI applications.

Akcio: Enhancing LLM-Powered ChatBot with CVP Stack. A complete, fully open-source chatbot app for you to try out for yourself!

GPT Cache. GPTCache is an open-source tool designed to improve the efficiency and speed of GPT-based applications by implementing a cache to store the responses generated by language models.

VectorDBBench. VectorDBBench is an open-source benchmarking tool to help you evaluate the performance of mainstream vector databases and cloud services for your specific use case.
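The caching idea behind GPTCache can be sketched in a few lines: instead of only matching prompts exactly, embed each prompt and reuse a stored response when a new prompt is similar enough. The sketch below is not GPTCache's API; the `SemanticCache` class, the bag-of-words `embed` function (a hypothetical stand-in for a real embedding model), the `fake_llm` function and the 0.8 threshold are all assumptions for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, llm, threshold=0.8):
        self.llm = llm              # function: prompt -> response (the expensive call)
        self.threshold = threshold  # minimum similarity that counts as a cache hit
        self.entries = []           # list of (embedding, response) pairs
        self.hits = 0

    def ask(self, prompt):
        q = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            self.hits += 1
            return best[1]           # reuse the stored response, no LLM call
        response = self.llm(prompt)  # cache miss: call the model and remember the answer
        self.entries.append((q, response))
        return response

calls = []
def fake_llm(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"

cache = SemanticCache(fake_llm, threshold=0.8)
cache.ask("what is a vector database")
cache.ask("what exactly is a vector database")  # similarity ≈ 0.91, served from cache
cache.ask("how do I tune HNSW")                 # no overlap, goes to the LLM
print(len(calls), cache.hits)  # 2 1
```

The threshold is the key design knob: set it too low and users get answers to questions they did not ask; set it too high and the cache rarely hits.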

Chris Churilo @chrischurilo