2023-08-10
Agenda
- Generative AI and LLMs
- Embeddings
- Vector databases
- Search in LLM applications
- RAG (Retrieval-Augmented Generation)
How large are LLMs?
- The human brain has about $10^{14}$ connections
- GPT-4 has around $10^{12}$ parameters
- Supervised models are limited by the amount of labeled data available; LLMs learn from unlabeled text, so they can train on massive datasets
Next-word prediction is what most LLMs are trained to do
- The dataset is created on the fly from raw text
- A window slides over the text: the words in the window are the input and the word that follows is the output (sketched in the code after the table below)
| Input 1 | Input 2 | Output |
| --- | --- | --- |
| Thou | shall | not |
- This enables parallelization, since each window gives an independent training example
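A minimal Python sketch of this on-the-fly dataset creation; the corpus string and the window size of 2 are illustrative assumptions chosen to match the table above.

```python
# Build (input window, next word) pairs by sliding a window over the tokens.
def make_training_pairs(text: str, window: int = 2):
    tokens = text.split()
    pairs = []
    for i in range(len(tokens) - window):
        pairs.append((tokens[i:i + window], tokens[i + window]))
    return pairs

corpus = "Thou shall not make a machine in the likeness of a human mind"
for inputs, output in make_training_pairs(corpus):
    print(inputs, "->", output)
# First pair: ['Thou', 'shall'] -> 'not', matching the table above
```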
Embeddings
- Converting non-numeric data into numbers
- E.g.:
    - Audio to audio vectors
    - Text to text vectors
    - Video to video vectors, etc.
- Text vectorization is complicated
    - Older methods like Bag of Words and TF-IDF cannot convey certain meanings (word order and semantic similarity are lost)
    - Word2Vec and GloVe overcome these issues by learning dense vectors in which similar words end up close together (see the sketch after this list)
- Embeddings can become very large
- Vector databases are used to store and search embeddings efficiently
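A rough sketch of learning word embeddings with Word2Vec; using the gensim library, the toy corpus, and the hyperparameter values here are all assumptions for illustration.

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus: each sentence is a list of tokens.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "cat", "sat", "on", "the", "mat"],
]

# vector_size is the embedding dimension; min_count=1 keeps every word in this toy corpus.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=100)

print(model.wv["king"].shape)                # (50,) -> dense vector for "king"
print(model.wv.similarity("king", "queen"))  # cosine similarity between two word vectors
```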
Vector databases
- A database that stores vectors and supports similarity search over them (a minimal sketch follows this list)
- Vector databases are used in recommendation engines too
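A minimal in-memory stand-in for a vector database, written with NumPy as an illustration; real vector databases add persistence, metadata, and approximate-nearest-neighbour indexes.

```python
import numpy as np

class TinyVectorStore:
    """Store vectors with payloads and return the most similar ones to a query."""

    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.payloads = []  # original texts / document IDs

    def add(self, vector, payload):
        self.vectors = np.vstack([self.vectors, np.asarray(vector, dtype=np.float32)])
        self.payloads.append(payload)

    def search(self, query, k=3):
        q = np.asarray(query, dtype=np.float32)
        # Cosine similarity between the query and every stored vector.
        sims = self.vectors @ q / (np.linalg.norm(self.vectors, axis=1) * np.linalg.norm(q) + 1e-9)
        top = np.argsort(-sims)[:k]
        return [(self.payloads[i], float(sims[i])) for i in top]

store = TinyVectorStore(dim=3)
store.add([1.0, 0.0, 0.0], "doc about cats")
store.add([0.0, 1.0, 0.0], "doc about dogs")
print(store.search([0.9, 0.1, 0.0], k=1))  # -> [('doc about cats', ...)]
```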
LLM Flow
- $\text{Private Data} \rightarrow \text{Data Chunks} \rightarrow \text{LLM} \rightarrow \text{Vector Database}$
- Break the data into chunks before embedding (an illustrative chunking function is shown below)
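One simple way the chunking step could be done; the chunk size and overlap values are assumptions to tune for the real corpus.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50):
    """Split text into overlapping character chunks."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

print(len(chunk_text("x" * 1200)))  # -> 3 overlapping chunks
```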
Search Space
- Find documents relevant to the query
- https://cohere.ai
    - Provides a trial API key that can be used on Colab (see the search sketch below)
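A rough Colab-style sketch of finding relevant documents with Cohere embeddings; the model name, the example documents, and the exact SDK response fields are assumptions.

```python
import cohere
import numpy as np

co = cohere.Client("YOUR_TRIAL_API_KEY")  # trial key from https://cohere.ai

docs = [
    "Vector databases store embeddings for fast similarity search.",
    "Word2Vec learns dense vectors for words.",
    "The cafeteria is open from nine to five.",
]
query = "How are embeddings stored and searched?"

# Embed the documents and the query with the same embedding model.
doc_vecs = np.array(co.embed(texts=docs, model="embed-english-v2.0").embeddings)
query_vec = np.array(co.embed(texts=[query], model="embed-english-v2.0").embeddings[0])

# Rank documents by cosine similarity to the query.
sims = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
for i in np.argsort(-sims):
    print(f"{sims[i]:.3f}  {docs[i]}")
```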
Retrieval Augmented Generation
- Data sources that are not part of the LLM's training data are used to generate results in real time
- New knowledge sources are brought in at runtime
- Suppose we prompt for something that is not in the LLM's training data; external knowledge sources are then used to satisfy the request
- First, the external data is converted to the appropriate embeddings
- Then the retrieved documents, along with the query, are passed to the LLM (see the prompt sketch below)
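A sketch of that last step: the retrieved documents are placed into the prompt next to the user's query before it is sent to the LLM. The document texts and prompt wording are made-up examples.

```python
retrieved_docs = [
    "Policy doc: Employees get 20 days of paid leave per year.",
    "Policy doc: Leave requests must be approved by a manager.",
]
query = "How many paid leave days do employees get?"

# Combine the retrieved context with the query into a single prompt for the LLM.
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n".join(retrieved_docs) + "\n\n"
    f"Question: {query}\nAnswer:"
)
print(prompt)  # this prompt would be sent to the LLM's generate/chat endpoint
```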
Steps
- Store all internal docs in a format suitable for querying
    - Split the corpus into chunks
    - Embed each chunk (use the same embedding model as the LLM)
    - Store the vectors in a vector DB
    - Save the text with pointers to its embedding
- Embed the query
- Use the embedded query to fetch the relevant docs from the vector DB; the vector DB uses approximate nearest neighbour (ANN) search
- Send those docs to the LLM (an end-to-end sketch follows)
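An end-to-end sketch of these steps, using the Cohere API for both the embeddings and the generation step; the model names, documents, and response fields are assumptions, and a real setup would use a vector DB with ANN instead of the exact search below.

```python
import cohere
import numpy as np

co = cohere.Client("YOUR_TRIAL_API_KEY")

# 1. Internal docs, split into chunks (whole docs used here for brevity).
docs = [
    "Employees receive 20 days of paid leave per year.",
    "Remote work is allowed up to three days per week.",
]

# 2-3. Embed the chunks and keep the text alongside its vector (stand-in for a vector DB).
doc_vecs = np.array(co.embed(texts=docs, model="embed-english-v2.0").embeddings)

# 4. Embed the query with the same embedding model.
query = "How many paid leave days do I get?"
query_vec = np.array(co.embed(texts=[query], model="embed-english-v2.0").embeddings[0])

# 5. Retrieve the most relevant chunks (exact cosine search; a vector DB would use ANN).
sims = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
top_docs = [docs[i] for i in np.argsort(-sims)[:2]]

# 6. Send the retrieved docs plus the query to the LLM.
prompt = "Context:\n" + "\n".join(top_docs) + f"\n\nQuestion: {query}\nAnswer:"
response = co.generate(prompt=prompt, model="command", max_tokens=100)
print(response.generations[0].text)
```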