What is a Vector Database
A vector database is a database designed to store, index, and retrieve data points with multiple dimensions, commonly referred to as vectors.
Each dimension captures a particular feature of the data. The two key concepts are vectors and similarity search. A vector is a numerical array. Similarity search uses indexing and search algorithms to find the stored vectors most relevant to a query.
A vector database stores embeddings, which are numerical codes that encapsulate the key characteristics of the object of interest.
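To make the idea of similarity between vectors concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors and their values are made up for illustration; they are not real embeddings, which typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors; each position stands for one feature.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.2, 0.9]

sim_cat_kitten = cosine_similarity(cat, kitten)
sim_cat_car = cosine_similarity(cat, car)

# Related items point in nearly the same direction, so their score is higher.
print(sim_cat_kitten > sim_cat_car)  # prints True
```

A score near 1 means the two vectors point in almost the same direction, and a score near 0 means they have little in common.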
Difference between Traditional and Vector Databases
Traditional databases are based on scalar data types and a two-dimensional, table-like structure of rows and columns. They are optimized for transactional data and exact-match search queries.
Vector databases are based on vector data types and a multi-dimensional space. They are optimized for AI and ML applications, with search methods such as semantic search (for example, using cosine similarity) and similarity search.
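The contrast can be sketched in a few lines. An exact-match lookup returns a row only when the key is identical, while a similarity lookup returns the closest stored vector even when nothing matches exactly. The table contents, the two-dimensional vectors, and the helper name nearest are all hypothetical, and the linear scan stands in for the indexes a real vector database would use.

```python
import math

# Traditional style: scalar values, exact-match lookup by key.
rows = {"id1": {"name": "cat"}, "id2": {"name": "car"}}
exact_hit = rows.get("id2")    # found, because the key matches exactly
exact_miss = rows.get("id3")   # None, because no identical key exists

# Vector style: each item is a point in space (toy 2-D values).
vectors = {"cat": [0.9, 0.1], "car": [0.1, 0.9]}

def nearest(query, items):
    """Return the key whose vector has the smallest Euclidean distance to query."""
    return min(items, key=lambda k: math.dist(query, items[k]))

# The query equals none of the stored vectors, yet a best match still exists.
closest = nearest([0.2, 0.8], vectors)
print(exact_hit, exact_miss, closest)
```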
Benefits:
Fast and accurate search
Mostly correct responses
Semantic understanding
Hands-on
Sign in to Google Colab using your Google account.
Create a new notebook named demo_chromadb.
Click +Code to add a code cell.
Add this code to install chromadb and sentence-transformers:
!pip install chromadb -q
!pip install sentence-transformers -q
Click +Code, add this code to create a client and a collection
import chromadb
client = chromadb.Client()
collection = client.create_collection("aucse_demo")
Click +Code, add this code to add documents
collection.add(
    documents=["This is a document about cat", "This is a document about car"],
    metadatas=[{"category": "animal"}, {"category": "vehicle"}],
    ids=["id1", "id2"]
)
Click +Code, add this code to make a semantic query
results = collection.query(
    query_texts=["vehicle"],
    n_results=1
)
print(results)
You will get output like:
{'ids': [['id2']], 'distances': [[0.8069301843643188]],
'metadatas': [[{'category': 'vehicle'}]],
'embeddings': None,
'documents': [['This is a document about car']],
'uris': None, 'data': None
}
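The result is a dictionary of nested lists: the outer list has one entry per query text, and the inner list has one entry per returned match. A small sketch of unpacking it follows, with the output above copied in as a literal so that the snippet runs on its own.

```python
# The output shown above, copied as a literal for illustration.
results = {
    'ids': [['id2']],
    'distances': [[0.8069301843643188]],
    'metadatas': [[{'category': 'vehicle'}]],
    'embeddings': None,
    'documents': [['This is a document about car']],
    'uris': None,
    'data': None,
}

# Index 0 selects the first (and only) query; zip walks its matches in order.
matches = list(zip(
    results['ids'][0],
    results['documents'][0],
    results['metadatas'][0],
    results['distances'][0],
))

for doc_id, document, metadata, distance in matches:
    print(f"{doc_id}: {document} ({metadata['category']}, distance={distance:.3f})")
```

A smaller distance means a closer match, so with n_results greater than 1 the matches would appear from nearest to farthest.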
The database above is held in memory, which makes it quick and simple to set up.
For a larger example backed by files on disk, create a folder called 'vbdir' in your notebook's workspace (/content/vbdir) and add five .txt files to it. The queries below assume the files are about pets.
Click +Code, add this code to read the files from the folder
import os
def read_files_from_folder(folder_path):
    file_data = []
    for file_name in os.listdir(folder_path):
        if file_name.endswith(".txt"):
            with open(os.path.join(folder_path, file_name), 'r') as file:
                content = file.read()
                file_data.append({"file_name": file_name, "content": content})
    return file_data
folder_path = "/content/vbdir"  # your folder path
file_data = read_files_from_folder(folder_path)
for data in file_data:
    print(f"File Name: {data['file_name']}")
    print(f"Content: {data['content']}\n")
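One detail worth noting: os.listdir returns names in arbitrary order, and the later steps derive each document's id from its position in the list. A variant that sorts the file names keeps the ids stable between runs. This is only a sketch; the function name read_sorted_txt_files and the sample files written to a temporary folder are hypothetical, used so the example runs without /content/vbdir.

```python
import tempfile
from pathlib import Path

def read_sorted_txt_files(folder_path):
    """Read every .txt file in folder_path, sorted by file name."""
    file_data = []
    for path in sorted(Path(folder_path).glob("*.txt")):
        content = path.read_text(encoding="utf-8")
        file_data.append({"file_name": path.name, "content": content})
    return file_data

# Demo on a throwaway folder with placeholder files (not the tutorial's data).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "b.txt").write_text("second file", encoding="utf-8")
    Path(tmp, "a.txt").write_text("first file", encoding="utf-8")
    Path(tmp, "notes.md").write_text("ignored", encoding="utf-8")
    sample = read_sorted_txt_files(tmp)

print([d["file_name"] for d in sample])  # prints ['a.txt', 'b.txt']
```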
Click +Code, add this code to create a collection from the files
documents = []
metadatas = []
ids = []
for index, data in enumerate(file_data):
    documents.append(data['content'])
    metadatas.append({'source': data['file_name']})
    ids.append(str(index + 1))
pet_collection = client.create_collection("pet_collection")
pet_collection.add(
    documents=documents,
    metadatas=metadatas,
    ids=ids
)
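The three lists are parallel: item i of documents, metadatas, and ids all describe the same record, and Chroma identifies records by id, so repeated ids are likely to cause errors or skipped records. A small sanity check before calling add can catch such mistakes early. The helper name check_records and the sample data below are hypothetical stand-ins for the data read from the folder.

```python
def check_records(documents, metadatas, ids):
    """Raise ValueError if the parallel lists are misaligned or ids repeat."""
    if not (len(documents) == len(metadatas) == len(ids)):
        raise ValueError("documents, metadatas, and ids must have equal length")
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")

# Placeholder data in the same shape that read_files_from_folder returns.
sample_file_data = [
    {"file_name": "a.txt", "content": "first file"},
    {"file_name": "b.txt", "content": "second file"},
]
sample_documents = [d["content"] for d in sample_file_data]
sample_metadatas = [{"source": d["file_name"]} for d in sample_file_data]
sample_ids = [str(i + 1) for i in range(len(sample_file_data))]

check_records(sample_documents, sample_metadatas, sample_ids)  # passes silently
print(sample_ids)  # prints ['1', '2']
```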
Click +Code, add this code to get results
results = pet_collection.query(
    query_texts=["What are the different kinds of pets people commonly own?"],
    n_results=1
)
print(results)
Click +Code, add this code to get metadata
metadatas
Click +Code, add this code to run the query again
results = pet_collection.query(
    query_texts=["What are the different kinds of pets people commonly own?"],
    n_results=1
)
print(results)
Click +Code, add this code to filter results by document content and by metadata
# Only documents whose text contains "reptiles" can be returned
results = pet_collection.query(
    query_texts=["What are the emotional benefits of owning a pet?"],
    n_results=1,
    where_document={"$contains": "reptiles"}
)
print(results)
# Only documents whose 'source' metadata equals this file name can be returned
results = pet_collection.query(
    query_texts=["What are the emotional benefits of owning a pet?"],
    n_results=1,
    where={"source": "Training and Behaviour of Pets.txt"}
)
print(results)
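In effect, only records that pass the filter can be returned: where_document with $contains keeps documents whose text includes a substring, and where keeps documents whose metadata matches. The following plain-Python illustration shows that idea only; it is not how Chroma implements filtering, and the records and helper names are hypothetical.

```python
# Hypothetical records in the same shape as the tutorial's lists.
records = [
    {"id": "1", "document": "Dogs and cats are popular pets.",
     "metadata": {"source": "a.txt"}},
    {"id": "2", "document": "Some people keep reptiles such as geckos.",
     "metadata": {"source": "b.txt"}},
]

def filter_contains(items, substring):
    """Mimic where_document={"$contains": substring}."""
    return [r for r in items if substring in r["document"]]

def filter_metadata(items, conditions):
    """Mimic where={key: value}: keep records matching every key/value pair."""
    return [r for r in items
            if all(r["metadata"].get(k) == v for k, v in conditions.items())]

by_text = filter_contains(records, "reptiles")
by_source = filter_metadata(records, {"source": "a.txt"})
print([r["id"] for r in by_text], [r["id"] for r in by_source])  # ['2'] ['1']
```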
Click +Code, add this code to use a different embedding model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('paraphrase-MiniLM-L3-v2')
documents = []
embeddings = []
metadatas = []
ids = []
for index, data in enumerate(file_data):
    documents.append(data['content'])
    embedding = model.encode(data['content']).tolist()
    embeddings.append(embedding)
    metadatas.append({'source': data['file_name']})
    ids.append(str(index + 1))
Click +Code, add this code to create a new collection with the precomputed embeddings
pet_collection_emb = client.create_collection("pet_collection_emb")
pet_collection_emb.add(
    documents=documents,
    embeddings=embeddings,
    metadatas=metadatas,
    ids=ids
)
Click +Code, add this code to search again using a query embedding
query = "What are the different kinds of pets people commonly own?"
input_em = model.encode(query).tolist()
results = pet_collection_emb.query(
    query_embeddings=[input_em],
    n_results=1
)
print(results)
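When embeddings are supplied by hand, the query has to be encoded with the same model as the documents, otherwise the vectors are not comparable. The sketch below illustrates that flow with a deliberately crude stand-in encoder (word counts over a fixed vocabulary) instead of SentenceTransformer, so it runs without downloading a model. VOCAB, toy_encode, and the sample texts are hypothetical.

```python
import math

VOCAB = ["dog", "cat", "food", "train", "reptile"]

def toy_encode(text):
    """Stand-in for model.encode: count vocabulary words, then L2-normalize."""
    words = text.lower().split()
    counts = [float(sum(w.startswith(term) for w in words)) for term in VOCAB]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

sample_docs = ["Dog food and cat food guide", "How to train a dog"]
sample_embs = [toy_encode(d) for d in sample_docs]  # same role as the embeddings list

# Encode the query with the SAME function used for the documents.
query_em = toy_encode("food for dogs")

# The dot product of normalized vectors equals their cosine similarity.
scores = [sum(q * e for q, e in zip(query_em, emb)) for emb in sample_embs]
best = max(range(len(sample_docs)), key=scores.__getitem__)
print(sample_docs[best])  # prints Dog food and cat food guide
```

Swapping in a different encoder for the query alone would produce vectors from a different space, and the scores would no longer be meaningful.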
Code to make a query about what foods are recommended for dogs
query = "foods that are recommended for dogs?"
input_em = model.encode(query).tolist()
results = pet_collection_emb.query(
    query_embeddings=[input_em],
    n_results=1
)
print(results)