---
title: RAG Project
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 8000
python_version: 3.10
---

# 🚀 RAG System with LangChain and FastAPI 🌐

Welcome to this repository! This project demonstrates how to build a Retrieval-Augmented Generation (RAG) system with **LangChain** and **FastAPI**, grounding generated responses in external data so they stay contextually relevant and accurate.

## 📋 Project Overview

The RAG system combines retrieval and generation to provide smarter AI-driven responses. It uses **LangChain** for document handling and embeddings, and **FastAPI** for deploying a fast, scalable API. The project includes the following components (see the code sketch after this list):

- 🗂️ **Document Loading**: Load data from various sources (text, PDFs, etc.).
- ✂️ **Text Splitting**: Break large documents into manageable chunks.
- 🧠 **Embeddings**: Generate vector embeddings for efficient search and retrieval.
- 🔍 **Vector Stores**: Store embeddings in a vector store for fast similarity searches.
- 🔧 **Retrieval**: Retrieve the most relevant document chunks for a user query.
- 💬 **Generative Response**: Use retrieved data with large language models (LLMs) to generate accurate, context-aware answers.
- 🌐 **FastAPI**: Deploy the RAG system as a scalable API for easy interaction.
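
Below is a minimal sketch of how these pieces fit together with LangChain. The loader, splitter settings, and retriever parameters are illustrative assumptions and may differ from the actual code in this repo:

```python
# Minimal sketch of the pipeline described above (not this repo's exact code).
# Assumes `langchain-community`, `sentence-transformers`, `faiss-cpu`, and `pypdf` are installed.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# 1. Document loading
docs = PyPDFLoader("data/sample.pdf").load()

# 2. Text splitting
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 3. Embeddings + 4. vector store
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
store = FAISS.from_documents(chunks, embeddings)

# 5. Retrieval: fetch the chunks most similar to a query
retriever = store.as_retriever(search_kwargs={"k": 4})
relevant_chunks = retriever.invoke("What does the document say about X?")
```

The generation step that turns retrieved chunks into an answer is sketched in the Ollama section below.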

## ⚙️ Setup and Installation

### Prerequisites

Make sure you have the following installed:
- 🐍 Python 3.10+
- 🐳 Docker (optional, for deployment)
- 🛠️ PostgreSQL or FAISS (for vector storage)

### Installation Steps

1. **Clone the repository**:
   ```bash
   git clone https://github.com/yadavkapil23/RAG_Project.git
   cd RAG_Project
   ```

2. **Set up a virtual environment**:
   ```bash
   python -m venv venv
   source venv/bin/activate   # For Linux/Mac
   venv\Scripts\activate      # For Windows
   ```

3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

4. **Run the FastAPI server**:
   ```bash
   uvicorn main:app --reload
   ```

   Now, your FastAPI app will be running at `http://127.0.0.1:8000` 🎉!
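
Once the server is up, FastAPI also serves interactive API docs at `http://127.0.0.1:8000/docs`. The snippet below shows a hypothetical client call; the `/query` route and payload shape are assumptions, so check `/docs` (or `main.py`) for the real endpoints:

```python
# Hypothetical client call -- the `/query` route and JSON shape are assumptions;
# see http://127.0.0.1:8000/docs for the endpoints actually exposed by main.py.
import requests

response = requests.post(
    "http://127.0.0.1:8000/query",
    json={"question": "What is retrieval-augmented generation?"},
    timeout=60,
)
print(response.json())
```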

### Set up Ollama 🦙

This project uses Ollama to run local large language models.

1.  **Install Ollama:** Follow the instructions on the [Ollama website](https://ollama.ai/) to download and install Ollama.

2.  **Pull a model:** Download a model to use with the application. This project uses `llama3`.
    ```bash
    ollama pull llama3
    ```
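
With Ollama running (it listens on `http://localhost:11434` by default), LangChain can use the pulled model for the generation step. A minimal sketch, assuming the `langchain-community` package is installed; the prompt and wiring here are illustrative, not this repo's exact code:

```python
# Minimal sketch: answer a question from retrieved context with a local Ollama model.
# `retriever` is the FAISS retriever built in the Project Overview sketch.
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")  # requires `ollama pull llama3` and a running Ollama server


def answer(question: str, retriever) -> str:
    # Join the retrieved chunks into a single context block for the prompt.
    context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt)
```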


## 🛠️ Features

- **Retrieval-Augmented Generation**: Combines the best of both worlds, retrieving relevant data and generating insightful responses.
- **Scalable API**: FastAPI makes it easy to deploy and scale the RAG system.
- **Document Handling**: Supports multiple document types for loading and processing.
- **Vector Embeddings**: Efficient search with FAISS or other vector stores.

## 🛡️ Security

- 🔐 **OAuth2 and API Key** authentication support for secure API access (see the sketch after this list).
- 🔒 **TLS/SSL** for encrypting data in transit.
- 🛡️ **Data encryption** for sensitive document storage.
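
As a rough illustration of the API-key option, the snippet below guards a route with a FastAPI dependency. The header name, environment variable, and `/query` route are assumptions for the example and may not match this repo's actual setup:

```python
# Illustrative API-key guard for FastAPI routes (not this repo's exact code).
import os

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import APIKeyHeader

app = FastAPI()
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)


def require_api_key(key: str | None = Depends(api_key_header)) -> None:
    # Compare against a key supplied via an environment variable (assumed name).
    expected = os.environ.get("RAG_API_KEY")
    if not expected or key != expected:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")


@app.post("/query", dependencies=[Depends(require_api_key)])
def query(payload: dict) -> dict:
    return {"answer": "..."}  # call the RAG pipeline here
```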

## 🚀 Deployment

### Hugging Face Spaces (Docker) Deployment
This project is configured for a Hugging Face Space using the Docker runtime.

1. Push this repository to GitHub (or connect your local repository).
2. Create a new Space on Hugging Face → choose the "Docker" SDK.
3. Point it to this repo. Spaces will build the image from the `Dockerfile` and run `uvicorn` bound to the provided `PORT` (see the sketch below).
4. Ensure the file `data/sample.pdf` exists (or replace it) so the FAISS index can be created on startup.

Notes:
- Models `Qwen/Qwen2-0.5B-Instruct` and `all-MiniLM-L6-v2` will be downloaded on first run; initial cold start may take several minutes.
- Dependencies are CPU-friendly; no GPU is required.
- If you hit out-of-memory (OOM) errors, consider reducing `max_new_tokens` in `vector_rag.py` or swapping in an even smaller instruct model.
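
Step 3 relies on the server binding to whatever port Spaces injects. A minimal sketch of that binding; the repo's `Dockerfile` or entrypoint may already handle it differently:

```python
# Minimal sketch: launch uvicorn on the port Hugging Face Spaces provides.
import os

import uvicorn

if __name__ == "__main__":
    # Spaces sets PORT; fall back to the app_port declared in the README front matter.
    uvicorn.run("main:app", host="0.0.0.0", port=int(os.environ.get("PORT", 8000)))
```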

### Docker Deployment (Local)
If you want to deploy your RAG system using Docker, simply build the Docker image and run the container:

```bash
docker build -t rag-system .
docker run -p 8000:8000 rag-system
```

### Cloud Deployment
The Docker image above can also be deployed to platforms such as **AWS**, **Azure**, or **Google Cloud** using their container services, with minimal additional setup.

## 🧠 Future Enhancements

- 🔄 **Real-time Data Integration**: Add real-time data sources for dynamic responses.
- 🤖 **Advanced Retrieval Techniques**: Implement deep learning-based retrievers for better query understanding.
- 📊 **Monitoring Tools**: Add monitoring with tools like Prometheus or Grafana for performance insights.

## 🤝 Contributing

Want to contribute? Feel free to fork this repository, submit a pull request, or open an issue. We welcome all contributions! 🛠️

## 📄 License

This project is licensed under the MIT License.

---

🎉 **Thank you for checking out the RAG System with LangChain and FastAPI!** If you have any questions or suggestions, feel free to reach out or open an issue. Let's build something amazing!