2023-08-10
Agenda
- Generative AI and LLMs
- Embeddings
- Vector databases
- Search in LLM applications
- RAG (Retrieval-Augmented Generation)
How large are LLMs?
- The human brain has about $10^{14}$ connections
- GPT-4 has around $10^{12}$ parameters
- Supervised models are limited by the amount of labeled data available; LLMs learn from unlabeled text, so they can train on massive datasets
Next-word prediction is what most LLMs are trained to do
- The dataset is created on the fly from raw text
- A window slides over the text: the words in the window are the input and the word that follows is the output (sketched in the code after the table below)
| Input 1 | Input 2 | Output |
| --- | --- | --- |
| Thou | shall | not |
- This enables parallelization, since each window gives an independent training example
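A minimal Python sketch of this on-the-fly dataset creation; the corpus string and the window size of 2 are illustrative assumptions chosen to match the table above.

```python
# Build (input window, next word) pairs by sliding a window over the tokens.
def make_training_pairs(text: str, window: int = 2):
    tokens = text.split()
    pairs = []
    for i in range(len(tokens) - window):
        pairs.append((tokens[i:i + window], tokens[i + window]))
    return pairs

corpus = "Thou shall not make a machine in the likeness of a human mind"
for inputs, output in make_training_pairs(corpus):
    print(inputs, "->", output)
# First pair: ['Thou', 'shall'] -> 'not', matching the table above
```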
Embeddings
- Converting non-numeric data into numbers
- E.g.:
    - Audio to audio vectors
    - Text to text vectors
    - Video to video vectors, etc.
- Text vectorization is complicated
    - Older methods like Bag of Words and TF-IDF cannot convey certain meanings (word order and semantic similarity are lost)
    - Word2Vec and GloVe overcome these issues by learning dense vectors in which similar words end up close together (see the sketch after this list)
- Embeddings can become very large
- Vector databases are used to store and search embeddings efficiently
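A rough sketch of learning word embeddings with Word2Vec; using the gensim library, the toy corpus, and the hyperparameter values here are all assumptions for illustration.

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus: each sentence is a list of tokens.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "cat", "sat", "on", "the", "mat"],
]

# vector_size is the embedding dimension; min_count=1 keeps every word in this toy corpus.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=100)

print(model.wv["king"].shape)                # (50,) -> dense vector for "king"
print(model.wv.similarity("king", "queen"))  # cosine similarity between two word vectors
```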
Vector databases
- A database that stores vectors and supports similarity search over them (a minimal sketch follows this list)
- Vector databases are used in recommendation engines too
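A minimal in-memory stand-in for a vector database, written with NumPy as an illustration; real vector databases add persistence, metadata, and approximate-nearest-neighbour indexes.

```python
import numpy as np

class TinyVectorStore:
    """Store vectors with payloads and return the most similar ones to a query."""

    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.payloads = []  # original texts / document IDs

    def add(self, vector, payload):
        self.vectors = np.vstack([self.vectors, np.asarray(vector, dtype=np.float32)])
        self.payloads.append(payload)

    def search(self, query, k=3):
        q = np.asarray(query, dtype=np.float32)
        # Cosine similarity between the query and every stored vector.
        sims = self.vectors @ q / (np.linalg.norm(self.vectors, axis=1) * np.linalg.norm(q) + 1e-9)
        top = np.argsort(-sims)[:k]
        return [(self.payloads[i], float(sims[i])) for i in top]

store = TinyVectorStore(dim=3)
store.add([1.0, 0.0, 0.0], "doc about cats")
store.add([0.0, 1.0, 0.0], "doc about dogs")
print(store.search([0.9, 0.1, 0.0], k=1))  # -> [('doc about cats', ...)]
```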
LLM Flow
- $\text{Private Data} \rightarrow \text{Data Chunks} \rightarrow \text{LLM} \rightarrow \text{Vector Database}$
- Break the data into chunks before embedding (an illustrative chunking function is shown below)
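One simple way the chunking step could be done; the chunk size and overlap values are assumptions to tune for the real corpus.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50):
    """Split text into overlapping character chunks."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

print(len(chunk_text("x" * 1200)))  # -> 3 overlapping chunks
```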
Search Space
- Find documents relevant to the query
- https://cohere.ai
    - Provides a trial API key that can be used on Colab (see the search sketch below)
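A rough Colab-style sketch of finding relevant documents with Cohere embeddings; the model name, the example documents, and the exact SDK response fields are assumptions.

```python
import cohere
import numpy as np

co = cohere.Client("YOUR_TRIAL_API_KEY")  # trial key from https://cohere.ai

docs = [
    "Vector databases store embeddings for fast similarity search.",
    "Word2Vec learns dense vectors for words.",
    "The cafeteria is open from nine to five.",
]
query = "How are embeddings stored and searched?"

# Embed the documents and the query with the same embedding model.
doc_vecs = np.array(co.embed(texts=docs, model="embed-english-v2.0").embeddings)
query_vec = np.array(co.embed(texts=[query], model="embed-english-v2.0").embeddings[0])

# Rank documents by cosine similarity to the query.
sims = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
for i in np.argsort(-sims):
    print(f"{sims[i]:.3f}  {docs[i]}")
```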
Retrieval Augmented Generation
- Data sources that are not part of the LLM's training data are used to generate results in real time
- New knowledge sources are brought in at runtime
- Suppose we prompt for something that is not in the LLM's training data; external knowledge sources are then used to satisfy the request
- First, the external data is converted to the appropriate embeddings
- Then the retrieved documents, along with the query, are passed to the LLM (see the prompt sketch below)
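A sketch of that last step: the retrieved documents are placed into the prompt next to the user's query before it is sent to the LLM. The document texts and prompt wording are made-up examples.

```python
retrieved_docs = [
    "Policy doc: Employees get 20 days of paid leave per year.",
    "Policy doc: Leave requests must be approved by a manager.",
]
query = "How many paid leave days do employees get?"

# Combine the retrieved context with the query into a single prompt for the LLM.
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n".join(retrieved_docs) + "\n\n"
    f"Question: {query}\nAnswer:"
)
print(prompt)  # this prompt would be sent to the LLM's generate/chat endpoint
```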
Steps
- Store all internal docs in a format suitable for querying
    - Split the corpus into chunks
    - Embed each chunk (use the same embedding model as the LLM)
    - Store the vectors in a vector DB
    - Save the text with pointers to its embedding
- Embed the query
- Use the embedded query to fetch the relevant docs from the vector DB; the vector DB uses approximate nearest neighbour (ANN) search
- Send those docs to the LLM (an end-to-end sketch follows)
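An end-to-end sketch of these steps, using the Cohere API for both the embeddings and the generation step; the model names, documents, and response fields are assumptions, and a real setup would use a vector DB with ANN instead of the exact search below.

```python
import cohere
import numpy as np

co = cohere.Client("YOUR_TRIAL_API_KEY")

# 1. Internal docs, split into chunks (whole docs used here for brevity).
docs = [
    "Employees receive 20 days of paid leave per year.",
    "Remote work is allowed up to three days per week.",
]

# 2-3. Embed the chunks and keep the text alongside its vector (stand-in for a vector DB).
doc_vecs = np.array(co.embed(texts=docs, model="embed-english-v2.0").embeddings)

# 4. Embed the query with the same embedding model.
query = "How many paid leave days do I get?"
query_vec = np.array(co.embed(texts=[query], model="embed-english-v2.0").embeddings[0])

# 5. Retrieve the most relevant chunks (exact cosine search; a vector DB would use ANN).
sims = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
top_docs = [docs[i] for i in np.argsort(-sims)[:2]]

# 6. Send the retrieved docs plus the query to the LLM.
prompt = "Context:\n" + "\n".join(top_docs) + f"\n\nQuestion: {query}\nAnswer:"
response = co.generate(prompt=prompt, model="command", max_tokens=100)
print(response.generations[0].text)
```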