Upload 9 files
- Dockerfile +9 -0
- README.md +137 -0
- chatbot.py +117 -0
- flask_app.py +173 -0
- initial.ipynb +1252 -0
- requirements.txt +15 -0
- static/scripts.js +285 -0
- static/styles.css +687 -0
- templates/chat_page.html +196 -0
Dockerfile
ADDED
@@ -0,0 +1,9 @@
FROM python:3.10-slim

WORKDIR /app
COPY . /app

RUN pip install --no-cache-dir -r requirements.txt

EXPOSE 7860
CMD ["python", "flask_app.py"]
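
With Docker installed, the usual build-and-run workflow should work here: `docker build -t document-rag .` followed by `docker run -p 7860:7860 document-rag` (the image tag `document-rag` is just an illustrative name). The app is then reachable on port 7860, the port this Dockerfile exposes and `flask_app.py` binds by default.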
README.md
ADDED
@@ -0,0 +1,137 @@
# Document RAG Chatbot

An intelligent, context-aware chatbot that understands your documents.
Upload a PDF or text file, and it will answer questions using only the information inside: no hallucinations, no fluff.

Built with **Flask**, **LangChain**, and **Google Gemini**, this project demonstrates a clean, modular approach to **Retrieval-Augmented Generation (RAG)**.

---

## Live Demo

Try it live here 👉 [**Document RAG Chatbot Demo**](https://document-rag-system.onrender.com/)

> You’ll need your own **Gemini API key**; the app will prompt you to enter it before use.

---

## Highlights

- Upload and analyze **PDF** or **TXT** files
- Uses **FAISS** for fast semantic search
- Embeddings powered by **HuggingFace Sentence Transformers**
- Answers generated by **Gemini 2.5 Flash**
- Works with a **frontend-provided API key** (no server-side storage)
- Clean, responsive interface for smooth chat interaction

---

## Tech Stack

| Component | Technology |
|------------|-------------|
| Backend | Flask |
| Language Model | Google Gemini |
| Vector Store | FAISS |
| Embeddings | HuggingFace all-MiniLM-L6-v2 |
| Frontend | HTML, CSS, JavaScript |
| Framework | LangChain |

---

## Getting Started

### 1️⃣ Clone the Repository
```bash
git clone https://github.com/<your-username>/Document-RAG-System.git
cd Document-RAG-System
```

### 2️⃣ Set Up Your Environment

```bash
python -m venv venv
# Activate
venv\Scripts\activate     # Windows
source venv/bin/activate  # macOS/Linux
```

### 3️⃣ Install Dependencies

```bash
pip install -r requirements.txt
```

---

## Run the App

```bash
python flask_app.py
```

Once running, open your browser and go to:
👉 **[http://127.0.0.1:7860/](http://127.0.0.1:7860/)** (the app defaults to port 7860, matching the Dockerfile)

---

## Get Your Gemini API Key

1. Visit [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Generate a **Gemini API key**
3. Paste it in the “API Key” field on the webpage
4. Save and start chatting!

Your key is never stored; it’s used only in your current session.

---

## How It Works

Here’s what happens behind the scenes:

1. You upload your document.
2. The file is split into small chunks for efficient retrieval.
3. Each chunk is embedded into a vector using HuggingFace embeddings.
4. FAISS indexes these vectors for fast similarity search.
5. When you ask a question, the most relevant chunks are retrieved and sent to Gemini.
6. Gemini generates a focused, contextual answer grounded in your document.

That’s **Retrieval-Augmented Generation (RAG)** in a nutshell.
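
For the curious, here is a minimal sketch of that pipeline using the same LangChain components this repo relies on (the file path, API key, and question are illustrative placeholders; `chatbot.py` and `flask_app.py` contain the full versions):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_google_genai import ChatGoogleGenerativeAI

# Steps 1-2: load the document and split it into overlapping chunks
docs = PyPDFLoader("my_document.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Steps 3-4: embed each chunk and index the vectors with FAISS
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
retriever = FAISS.from_documents(chunks, embeddings).as_retriever(search_kwargs={"k": 5})

# Steps 5-6: retrieve the chunks most similar to the question, then let Gemini answer from them
question = "What is this document about?"
context = "\n\n".join(d.page_content for d in retriever.invoke(question))
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key="YOUR_GEMINI_API_KEY")
print(llm.invoke(f"Answer using only this document:\n\n{context}\n\nQuestion: {question}").content)
```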

---

## Example Use Cases

- Summarize long reports
- Extract key information from research papers
- Study assistant for textbooks
- Legal, medical, or technical document Q&A
- Analyze and interpret **crypto project whitepapers**: understand tokenomics, roadmap, and team details before investing


---

## Customization

You can tweak (a short sketch follows this list):

* `chunk_size` and `chunk_overlap` in `chatbot.py`
* The **system message** for tone or depth
* The **Gemini model** (e.g., `gemini-2.5-flash`, `gemini-1.5-pro`)
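
As a concrete, purely illustrative example, the retrieval knobs live near the top of `chatbot.py` and the model is chosen where `ChatGoogleGenerativeAI` is constructed:

```python
from langchain_google_genai import ChatGoogleGenerativeAI

# Retrieval granularity: larger chunks keep more context per hit,
# more overlap reduces the chance of splitting an answer across chunks
chunk_size = 1500      # default in chatbot.py: 1000
chunk_overlap = 200    # default in chatbot.py: 100

# Swapping the Gemini variant is a one-line change (placeholder key shown)
LLM = ChatGoogleGenerativeAI(model="gemini-1.5-pro", google_api_key="YOUR_GEMINI_API_KEY")
```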

---

## Author

**Williams Odunayo**
*Machine Learning Engineer | Builder of useful AI systems* 😉
🔗 [GitHub](https://github.com/Wills17) • [LinkedIn](https://linkedin.com/in/williamsodunayo)


---

## License

Released under the **MIT License**.
Free to use, modify, and build upon; attribution is appreciated.
chatbot.py
ADDED
@@ -0,0 +1,117 @@
"""Script to chat with a Gemini model about the content of an uploaded file."""

import os
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader, PyPDFLoader
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnableLambda, RunnablePassthrough
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_huggingface import HuggingFaceEmbeddings

# Variables for global use
chunk_size = 1000
chunk_overlap = 100
model_name = "sentence-transformers/all-MiniLM-L6-v2"

SYSTEM_MESSAGE = """
You are a RAG Assistant for the provided document.
Your role is to help users understand and explore the content of uploaded documents.

Follow these rules:
1. Always prioritize the document context when answering questions.
2. If the answer is not in the document, clearly say you don't know.
3. Keep responses friendly, clear, and concise.
"""


# Load environment variables from .env file
load_dotenv()
api_key = os.getenv("GEMINI_API_KEY")

if not api_key:
    raise ValueError("GEMINI_API_KEY not found. Please add it to your .env file.")


# Input file path
file_path = input("\nEnter path to your document (PDF or TXT): ").strip()
if not os.path.exists(file_path):
    print(f"\nThe file path '{file_path}' does not exist. Please check the path and try again.\n")
    raise FileNotFoundError(f"File not found: {file_path}")


# Load document (PDF via PyPDFLoader, anything else as plain text)
if file_path.lower().endswith(".pdf"):
    loader = PyPDFLoader(file_path)
else:
    loader = TextLoader(file_path)

documents = loader.load()


# Split text into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
chunks = splitter.split_documents(documents)


# Embeddings + retriever
embeds = HuggingFaceEmbeddings(model_name=model_name)
vector_store = FAISS.from_documents(chunks, embeds)
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})

# Initialize chat model
LLM = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    google_api_key=api_key
)

# Prompt and chain are built once, outside the loop. The RunnableParallel
# branch retrieves the relevant chunks and joins them into a context string.
prompt_template = PromptTemplate(
    template="""You are answering based on this document:

{context}

Question: {question}""",
    input_variables=["context", "question"],
)

parallel_chain = RunnableParallel(
    {"context": retriever | RunnableLambda(lambda documents: "\n\n".join(doc.page_content for doc in documents)),
     "question": RunnablePassthrough()}
)

parse = StrOutputParser()

# Main chain: retrieve context -> fill prompt -> call Gemini -> parse to string
main_chain = parallel_chain | prompt_template | LLM | parse

# Chat history (kept for record; the chain itself is stateless per question)
messages = [SystemMessage(content=SYSTEM_MESSAGE)]

print("\nYour document is ready. Ask anything (type 'exit' to quit).\n")


# Chat loop
while True:
    user_input = input("\nYou: ")
    if user_input.lower() in ["exit", "quit"]:
        print("Chat ended.")
        break

    messages.append(HumanMessage(content=user_input))

    # Stream the response chunk by chunk
    response = ""
    for chunk in main_chain.stream(user_input):
        response += chunk

    print("AI Bot:", response.strip())
    messages.append(AIMessage(content=response.strip()))
flask_app.py
ADDED
@@ -0,0 +1,173 @@
"""Flask App script for RAG chatbot"""

import gc
import os
import re
import tempfile
from flask import Flask, request, jsonify, render_template

# Disable CUDA, tokenizer parallelism, and HuggingFace network lookups to save memory
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

# Flask app initialization
app = Flask(__name__, template_folder="templates", static_folder="static")


# Global state
retriever = None
LLM_model = None
api_key = None  # API key will come from the frontend


# System message (defined for reference; the /chat prompt below is self-contained)
SYSTEM_MESSAGE = """
You are a RAG Assistant for the uploaded document.
Your role is to help users understand its contents clearly and accurately.

Rules:
1. Prioritize the document context first.
2. If the answer isn't in the document, say you don't know.
3. Be friendly, direct, and concise.
4. Avoid adding extra information unless asked.
"""


# Routes
@app.route("/")
def home():
    return render_template("chat_page.html")


@app.route("/upload", methods=["POST"])
def upload_file():
    """Route handling document upload, splitting, chunking, and vectorization."""

    global retriever, LLM_model, api_key

    # Import heavy dependencies only when needed
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.vectorstores import FAISS
    from langchain_community.document_loaders import TextLoader, PyPDFLoader
    from langchain_huggingface import HuggingFaceEmbeddings
    from langchain_google_genai import ChatGoogleGenerativeAI

    api_key = request.form.get("apiKey")
    if not api_key:
        return "API key missing!", 400

    uploaded = request.files.get("file")
    if not uploaded or uploaded.filename.strip() == "":
        return "No file uploaded", 400

    ext = uploaded.filename.rsplit(".", 1)[-1].lower()
    with tempfile.NamedTemporaryFile(delete=False, suffix=f".{ext}") as tmp_file:
        uploaded.save(tmp_file.name)
        path = tmp_file.name

    # Load document
    try:
        loader = PyPDFLoader(path) if ext == "pdf" else TextLoader(path)
        documents = loader.load()
    except Exception as e:
        os.unlink(path)
        return f"Failed to read document: {e}", 400

    if not documents:
        os.unlink(path)
        return "No readable content found in the document.", 400


    # Split document into chunks (reduced chunk_size for low memory)
    splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
    chunks = splitter.split_documents(documents)


    # Light embedding model (fast + low memory)
    try:
        embeds = HuggingFaceEmbeddings(model_name="sentence-transformers/paraphrase-MiniLM-L3-v2")
        vector_store = FAISS.from_documents(chunks, embeds)
        retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})
    except Exception as e:
        os.unlink(path)
        return f"Embedding model failed: {e}", 500


    # Initialize chat model
    try:
        LLM_model = ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key=api_key)
    except Exception as e:
        os.unlink(path)  # clean up the temp file on this failure path too
        return f"Failed to initialize chat model: {e}", 500


    # Cleanup temp file and free memory
    os.unlink(path)
    del documents, chunks, vector_store
    gc.collect()

    return "Document processed successfully! You can now ask questions."


@app.route("/chat", methods=["POST"])
def chat():
    """Q&A route on the uploaded document."""
    global retriever, LLM_model

    from langchain_core.prompts import PromptTemplate
    from langchain_core.output_parsers import StrOutputParser

    if retriever is None or LLM_model is None:
        return jsonify({"error": "Please upload a document first."}), 400

    # Accept the question from form data or a JSON body
    question = request.form.get("question")
    if not question:
        payload = request.get_json(silent=True) or {}
        question = payload.get("question")
    if not question:
        return jsonify({"error": "No question provided."}), 400

    # Retrieve the most relevant chunks once and join them into one context string
    try:
        docs = retriever.invoke(question)
        context = "\n\n".join(d.page_content for d in docs)
    except Exception as e:
        return jsonify({"error": f"Retriever failed: {e}"}), 500

    # Prompt template
    prompt_template = PromptTemplate(
        template=(
            "You are answering strictly based on this document.\n\n"
            "{context}\n\n"
            "Question: {question}\n\n"
            "Answer:"
        ),
        input_variables=["context", "question"],
    )

    # Fill the prompt with the retrieved context, call Gemini, parse to a string
    chain = prompt_template | LLM_model | StrOutputParser()

    try:
        response = chain.invoke({"context": context, "question": question}).strip()
    except Exception as e:
        response = f"Error generating response: {str(e)}"

    # Strip markdown bold/italic markers from the answer
    cleaned = re.sub(r"\*\*(.*?)\*\*", r"\1", response)
    cleaned = re.sub(r"\*(.*?)\*", r"\1", cleaned)

    gc.collect()
    return jsonify({"answer": cleaned})


# Run app
if __name__ == "__main__":
    port = int(os.environ.get("PORT", 7860))
    app.run(host="0.0.0.0", port=port, debug=False)
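
For reference, here is a minimal, hypothetical client for the two routes above, using the `requests` library against a locally running instance; the form field names `apiKey`, `file`, and `question` are the ones the routes read, while the file name and key are placeholders:

```python
import requests

BASE = "http://127.0.0.1:7860"

# Step 1: upload a document along with the frontend-provided Gemini API key
with open("my_document.pdf", "rb") as f:
    resp = requests.post(
        f"{BASE}/upload",
        data={"apiKey": "YOUR_GEMINI_API_KEY"},  # placeholder key
        files={"file": f},
    )
print(resp.status_code, resp.text)

# Step 2: ask a question about the processed document
resp = requests.post(f"{BASE}/chat", data={"question": "What is this document about?"})
print(resp.json()["answer"])
```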
initial.ipynb
ADDED
@@ -0,0 +1,1252 @@
| 1 |
+
{
|
| 2 |
+
"cells": [
|
| 3 |
+
{
|
| 4 |
+
"cell_type": "markdown",
|
| 5 |
+
"id": "94bbe043",
|
| 6 |
+
"metadata": {},
|
| 7 |
+
"source": [
|
| 8 |
+
"# Document RAG System."
|
| 9 |
+
]
|
| 10 |
+
},
|
| 11 |
+
{
|
| 12 |
+
"cell_type": "code",
|
| 13 |
+
"execution_count": null,
|
| 14 |
+
"id": "367a3c60",
|
| 15 |
+
"metadata": {},
|
| 16 |
+
"outputs": [],
|
| 17 |
+
"source": [
|
| 18 |
+
"# install needed libraries to use\n",
|
| 19 |
+
"!pip -q install langchain langchain-google-genai langchain-community google-genai faiss-cpu tiktoken python-dotenv pypdf langchain-huggingface sentence-transformers"
|
| 20 |
+
]
|
| 21 |
+
},
|
| 22 |
+
{
|
| 23 |
+
"cell_type": "code",
|
| 24 |
+
"execution_count": null,
|
| 25 |
+
"id": "276de997",
|
| 26 |
+
"metadata": {},
|
| 27 |
+
"outputs": [],
|
| 28 |
+
"source": [
|
| 29 |
+
"\"\"\"Google Colab environment setup\"\"\"\n",
|
| 30 |
+
"# import os\n",
|
| 31 |
+
"\n",
|
| 32 |
+
"# # set environment variables for google and huggingface\n",
|
| 33 |
+
"\n",
|
| 34 |
+
"# os.environ['GOOGLE_API_KEY'] = userdata.get(\"GOOGLE_API_KEY\")\n",
|
| 35 |
+
"# os.environ['HUGGINGFACEHUB_ACCESS_TOKEN'] = userdata.get(\"HUGGINGFACEHUB_ACCESS_TOKEN\")"
|
| 36 |
+
]
|
| 37 |
+
},
|
| 38 |
+
{
|
| 39 |
+
"cell_type": "markdown",
|
| 40 |
+
"id": "2651b55f",
|
| 41 |
+
"metadata": {},
|
| 42 |
+
"source": [
|
| 43 |
+
"### Load keys."
|
| 44 |
+
]
|
| 45 |
+
},
|
| 46 |
+
{
|
| 47 |
+
"cell_type": "code",
|
| 48 |
+
"execution_count": null,
|
| 49 |
+
"id": "7673d4e2",
|
| 50 |
+
"metadata": {},
|
| 51 |
+
"outputs": [
|
| 52 |
+
{
|
| 53 |
+
"name": "stdout",
|
| 54 |
+
"output_type": "stream",
|
| 55 |
+
"text": [
|
| 56 |
+
"OpenAI key loaded: True\n",
|
| 57 |
+
"\n",
|
| 58 |
+
"Gemini key loaded: True\n"
|
| 59 |
+
]
|
| 60 |
+
}
|
| 61 |
+
],
|
| 62 |
+
"source": [
|
| 63 |
+
"import os\n",
|
| 64 |
+
"from dotenv import load_dotenv\n",
|
| 65 |
+
"\n",
|
| 66 |
+
"load_dotenv()\n",
|
| 67 |
+
"\n",
|
| 68 |
+
"openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n",
|
| 69 |
+
"gemini_api_key = os.getenv(\"GEMINI_API_KEY\")\n",
|
| 70 |
+
"\n",
|
| 71 |
+
"print(\"OpenAI key loaded:\", bool(openai_api_key))\n",
|
| 72 |
+
"# print(\"OpenAI key:\", openai_api_key)\n",
|
| 73 |
+
"\n",
|
| 74 |
+
"print(\"\\nGemini key loaded:\", bool(gemini_api_key))\n",
|
| 75 |
+
"# print(\"Gemini key:\", gemini_api_key)\n"
|
| 76 |
+
]
|
| 77 |
+
},
|
| 78 |
+
{
|
| 79 |
+
"cell_type": "code",
|
| 80 |
+
"execution_count": 2,
|
| 81 |
+
"id": "d1aa3528",
|
| 82 |
+
"metadata": {},
|
| 83 |
+
"outputs": [
|
| 84 |
+
{
|
| 85 |
+
"name": "stderr",
|
| 86 |
+
"output_type": "stream",
|
| 87 |
+
"text": [
|
| 88 |
+
"/usr/local/lib/python3.12/dist-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
|
| 89 |
+
" from .autonotebook import tqdm as notebook_tqdm\n"
|
| 90 |
+
]
|
| 91 |
+
}
|
| 92 |
+
],
|
| 93 |
+
"source": [
|
| 94 |
+
"# import necessary libraries\n",
|
| 95 |
+
"\n",
|
| 96 |
+
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
| 97 |
+
"from langchain_google_genai import GoogleGenerativeAIEmbeddings,ChatGoogleGenerativeAI,GoogleGenerativeAI\n",
|
| 98 |
+
"from langchain_community.vectorstores import FAISS\n",
|
| 99 |
+
"from langchain_core.prompts import PromptTemplate\n",
|
| 100 |
+
"from langchain_community.document_loaders import PyPDFLoader\n",
|
| 101 |
+
"from langchain_huggingface import HuggingFaceEmbeddings"
|
| 102 |
+
]
|
| 103 |
+
},
|
| 104 |
+
{
|
| 105 |
+
"cell_type": "markdown",
|
| 106 |
+
"id": "72887907",
|
| 107 |
+
"metadata": {},
|
| 108 |
+
"source": [
|
| 109 |
+
"### Test key with prompt."
|
| 110 |
+
]
|
| 111 |
+
},
|
| 112 |
+
{
|
| 113 |
+
"cell_type": "code",
|
| 114 |
+
"execution_count": null,
|
| 115 |
+
"id": "181c0959",
|
| 116 |
+
"metadata": {},
|
| 117 |
+
"outputs": [
|
| 118 |
+
{
|
| 119 |
+
"name": "stderr",
|
| 120 |
+
"output_type": "stream",
|
| 121 |
+
"text": [
|
| 122 |
+
"E0000 00:00:1759189263.070101 109561 alts_credentials.cc:93] ALTS creds ignored. Not running on GCP and untrusted ALTS is not enabled.\n"
|
| 123 |
+
]
|
| 124 |
+
},
|
| 125 |
+
{
|
| 126 |
+
"name": "stdout",
|
| 127 |
+
"output_type": "stream",
|
| 128 |
+
"text": [
|
| 129 |
+
"content='Hello! How can I help you today?' additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []} id='run--0c2a5214-4f37-43b5-b432-836a3abe7058-0' usage_metadata={'input_tokens': 2, 'output_tokens': 46, 'total_tokens': 48, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 37}}\n"
|
| 130 |
+
]
|
| 131 |
+
}
|
| 132 |
+
],
|
| 133 |
+
"source": [
|
| 134 |
+
"from langchain_google_genai import ChatGoogleGenerativeAI\n",
|
| 135 |
+
"\n",
|
| 136 |
+
"LLM = ChatGoogleGenerativeAI(\n",
|
| 137 |
+
" model=\"gemini-2.5-flash\",\n",
|
| 138 |
+
" google_api_key=gemini_api_key\n",
|
| 139 |
+
")\n",
|
| 140 |
+
"\n",
|
| 141 |
+
"response = LLM.invoke(\"Hello\")\n",
|
| 142 |
+
"print(response)\n"
|
| 143 |
+
]
|
| 144 |
+
},
|
| 145 |
+
{
|
| 146 |
+
"cell_type": "code",
|
| 147 |
+
"execution_count": 53,
|
| 148 |
+
"id": "83bf1f6f",
|
| 149 |
+
"metadata": {},
|
| 150 |
+
"outputs": [
|
| 151 |
+
{
|
| 152 |
+
"name": "stdout",
|
| 153 |
+
"output_type": "stream",
|
| 154 |
+
"text": [
|
| 155 |
+
"Hello! How can I help you today?\n"
|
| 156 |
+
]
|
| 157 |
+
}
|
| 158 |
+
],
|
| 159 |
+
"source": [
|
| 160 |
+
"print(response.content)"
|
| 161 |
+
]
|
| 162 |
+
},
|
| 163 |
+
{
|
| 164 |
+
"cell_type": "markdown",
|
| 165 |
+
"id": "7d42c604",
|
| 166 |
+
"metadata": {},
|
| 167 |
+
"source": [
|
| 168 |
+
"## Load document."
|
| 169 |
+
]
|
| 170 |
+
},
|
| 171 |
+
{
|
| 172 |
+
"cell_type": "code",
|
| 173 |
+
"execution_count": null,
|
| 174 |
+
"id": "a39b0bfb",
|
| 175 |
+
"metadata": {},
|
| 176 |
+
"outputs": [],
|
| 177 |
+
"source": [
|
| 178 |
+
"# load and read PDF file\n",
|
| 179 |
+
"\n",
|
| 180 |
+
"load_document = PyPDFLoader(\"dataset/ChenZhang_cropmapping_ReviewPaper.pdf\")\n",
|
| 181 |
+
"document = load_document.load()"
|
| 182 |
+
]
|
| 183 |
+
},
|
| 184 |
+
{
|
| 185 |
+
"cell_type": "code",
|
| 186 |
+
"execution_count": 7,
|
| 187 |
+
"id": "44f61c67",
|
| 188 |
+
"metadata": {},
|
| 189 |
+
"outputs": [
|
| 190 |
+
{
|
| 191 |
+
"data": {
|
| 192 |
+
"text/plain": [
|
| 193 |
+
"29"
|
| 194 |
+
]
|
| 195 |
+
},
|
| 196 |
+
"execution_count": 7,
|
| 197 |
+
"metadata": {},
|
| 198 |
+
"output_type": "execute_result"
|
| 199 |
+
}
|
| 200 |
+
],
|
| 201 |
+
"source": [
|
| 202 |
+
"len(document)"
|
| 203 |
+
]
|
| 204 |
+
},
|
| 205 |
+
{
|
| 206 |
+
"cell_type": "code",
|
| 207 |
+
"execution_count": 8,
|
| 208 |
+
"id": "46d97e70",
|
| 209 |
+
"metadata": {},
|
| 210 |
+
"outputs": [
|
| 211 |
+
{
|
| 212 |
+
"name": "stdout",
|
| 213 |
+
"output_type": "stream",
|
| 214 |
+
"text": [
|
| 215 |
+
"Review\n",
|
| 216 |
+
"Remote sensing for crop mapping: A perspective on current and future \n",
|
| 217 |
+
"crop-specific land cover data products\n",
|
| 218 |
+
"Chen Zhang\n",
|
| 219 |
+
"a , *\n",
|
| 220 |
+
", Hannah Kerner\n",
|
| 221 |
+
"b\n",
|
| 222 |
+
", Sherrie Wang\n",
|
| 223 |
+
"c\n",
|
| 224 |
+
", Pengyu Hao\n",
|
| 225 |
+
"d\n",
|
| 226 |
+
", Zhe Li\n",
|
| 227 |
+
"e\n",
|
| 228 |
+
", Kevin A. Hunt\n",
|
| 229 |
+
"e\n",
|
| 230 |
+
", \n",
|
| 231 |
+
"Jonathon Abernethy\n",
|
| 232 |
+
"e\n",
|
| 233 |
+
", Haoteng Zhao\n",
|
| 234 |
+
"f\n",
|
| 235 |
+
", Feng Gao\n",
|
| 236 |
+
"f\n",
|
| 237 |
+
", Liping Di\n",
|
| 238 |
+
"a , *\n",
|
| 239 |
+
", Claire Guo\n",
|
| 240 |
+
"a , g\n",
|
| 241 |
+
", Ziao Liu\n",
|
| 242 |
+
"a\n",
|
| 243 |
+
", \n",
|
| 244 |
+
"Zhengwei Yang\n",
|
| 245 |
+
"e\n",
|
| 246 |
+
", Rick Mueller\n",
|
| 247 |
+
"e\n",
|
| 248 |
+
", Claire Boryan\n",
|
| 249 |
+
"e\n",
|
| 250 |
+
", Qi Chen\n",
|
| 251 |
+
"h\n",
|
| 252 |
+
", Peter C. Beeson\n",
|
| 253 |
+
"i\n",
|
| 254 |
+
", Hankui K. Zhang\n",
|
| 255 |
+
"j\n",
|
| 256 |
+
", \n",
|
| 257 |
+
"Yu Shen\n",
|
| 258 |
+
"j , k\n",
|
| 259 |
+
"a\n",
|
| 260 |
+
"Center for Spatial Information Science and Systems, George Mason University, Fairfax, VA 22030, USA\n",
|
| 261 |
+
"b\n",
|
| 262 |
+
"School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA\n",
|
| 263 |
+
"c\n",
|
| 264 |
+
"Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA\n",
|
| 265 |
+
"d\n",
|
| 266 |
+
"Food and Agriculture Organization of the United Nations, Viale delle Terme di Caracalla, 00153 Rome, Italy\n",
|
| 267 |
+
"e\n",
|
| 268 |
+
"U.S. Department of Agriculture, National Agricultural Statistics Service, Washington, DC 20250, USA\n",
|
| 269 |
+
"f\n",
|
| 270 |
+
"U.S. Department of Agriculture, Agricultural Research Service, Hydrology and Remote Sensing Laboratory, Beltsville, MD 20705, USA\n",
|
| 271 |
+
"g\n",
|
| 272 |
+
"Thomas Jefferson High School for Science and Technology, Alexandria, VA 22312, USA\n",
|
| 273 |
+
"h\n",
|
| 274 |
+
"Department of Geography & Environment, University of Hawai ’ i at M ¯anoa, Honolulu, HI 96822, USA\n",
|
| 275 |
+
"i\n",
|
| 276 |
+
"U.S. Department of Agriculture, Economic Research Service, Washington, DC 20250, USA\n",
|
| 277 |
+
"j\n",
|
| 278 |
+
"Geospatial Sciences Center of Excellence, Department of Geography and Geospatial Sciences, South Dakota State University, Brookings, SD 57007, USA\n",
|
| 279 |
+
"k\n",
|
| 280 |
+
"Nicholas School of the Environment, Duke University, Durham, NC 27708, USA\n",
|
| 281 |
+
"ARTICLE INFO\n",
|
| 282 |
+
"Edited by Dr. Marie Weiss\n",
|
| 283 |
+
"Keywords:\n",
|
| 284 |
+
"Crop mapping\n",
|
| 285 |
+
"Land use land cover\n",
|
| 286 |
+
"Geospatial data product\n",
|
| 287 |
+
"Systematic literature review\n",
|
| 288 |
+
"Cropland data layer\n",
|
| 289 |
+
"ABSTRACT\n",
|
| 290 |
+
"Crop mapping is an indispensable application in agricultural and environmental remote sensing. Over the last \n",
|
| 291 |
+
"few decades, the exponential growth of open Earth Observation (EO) data has significantly enhanced crop \n",
|
| 292 |
+
"mapping and enabled the production of detailed crop-specific land cover data at national and regional scales. \n",
|
| 293 |
+
"These data have served multiple purposes across a wide range of applications and research initiatives. However, \n",
|
| 294 |
+
"there is currently no comprehensive summary of the crop mapping data products, nor is there a detailed dis -\n",
|
| 295 |
+
"cussion of their uses in remote sensing studies. This paper provides the first in-depth review of remote sensing for \n",
|
| 296 |
+
"crop mapping from the perspective of crop-specific land cover data by evaluating over 60 open-access opera -\n",
|
| 297 |
+
"tional products, archival crop type map datasets, single-crop extent map datasets, cropping pattern datasets, and \n",
|
| 298 |
+
"crop mapping platforms and systems. Using the Cropland Data Layer (CDL) – one of the most widely used \n",
|
| 299 |
+
"products with over 25 years of continuous monitoring of U.S. croplands – as a case study, we also conduct a \n",
|
| 300 |
+
"systematic literature review on the application of crop type maps in remote sensing science. Our analysis syn -\n",
|
| 301 |
+
"thesizes 129 research articles through three core research questions: (1) What EO data are used with CDL; (2) \n",
|
| 302 |
+
"What scientific problems and technologies are explored using CDL; and (3) What role does CDL play in remote \n",
|
| 303 |
+
"sensing applications. Furthermore, we delve into the implications of our vision for new data products and \n",
|
| 304 |
+
"propose emerging research topics, ranging from extending the spatiotemporal coverage of current data products \n",
|
| 305 |
+
"to improving global mapping reliability and developing operational in-season crop mapping systems. This review \n",
|
| 306 |
+
"paper not only serves as a reference for stakeholders seeking to utilize crop-specific land cover data in their work, \n",
|
| 307 |
+
"but also outlines the directions for future geospatial data product development.\n",
|
| 308 |
+
"* Corresponding authors.\n",
|
| 309 |
+
"E-mail addresses: czhang11@gmu.edu (C. Zhang), hkerner@asu.edu (H. Kerner), sherwang@mit.edu (S. Wang), pengyu.hao@fao.org (P. Hao), zhe.li@usda.gov\n",
|
| 310 |
+
"(Z. Li), kevin.a.hunt@usda.gov (K.A. Hunt), jake.abernethy@usda.gov (J. Abernethy), haoteng.zhao@usda.gov (H. Zhao), feng.gao@usda.gov (F. Gao), ldi@gmu. \n",
|
| 311 |
+
"edu (L. Di), zliu23@gmu.edu (Z. Liu), zhengwei.yang@usda.gov (Z. Yang), rick.mueller@usda.gov (R. Mueller), claire.boryan@usda.gov (C. Boryan), qichen@ \n",
|
| 312 |
+
"hawaii.edu (Q. Chen), peter.beeson@usda.gov (P.C. Beeson), hankui.zhang@sdstate.edu (H.K. Zhang), yu.shen@duke.edu (Y. Shen). \n",
|
| 313 |
+
"Contents lists available at ScienceDirect\n",
|
| 314 |
+
"Remote Sensing of Environment\n",
|
| 315 |
+
"journal homepage: www.else vier.com/loc ate/rse\n",
|
| 316 |
+
"https://doi.org/10.1016/j.rse.2025.114995\n",
|
| 317 |
+
"Received 12 December 2024; Received in revised form 11 August 2025; Accepted 22 August 2025 \n",
|
| 318 |
+
"Remote Sensing of Environment 330 (2025) 114995 \n",
|
| 319 |
+
"0034-4257/© 2025 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by- \n",
|
| 320 |
+
"nc-nd/4.0/ ).\n"
|
| 321 |
+
]
|
| 322 |
+
}
|
| 323 |
+
],
|
| 324 |
+
"source": [
|
| 325 |
+
"# First page of PDF\n",
|
| 326 |
+
"print(document[0].page_content) "
|
| 327 |
+
]
|
| 328 |
+
},
|
| 329 |
+
{
|
| 330 |
+
"cell_type": "code",
|
| 331 |
+
"execution_count": 9,
|
| 332 |
+
"id": "fa96dcd3",
|
| 333 |
+
"metadata": {},
|
| 334 |
+
"outputs": [
|
| 335 |
+
{
|
| 336 |
+
"name": "stdout",
|
| 337 |
+
"output_type": "stream",
|
| 338 |
+
"text": [
|
| 339 |
+
"the WoS database, including the publication title, abstract, or keywords. \n",
|
| 340 |
+
"In our survey, we found that many papers introduced, discussed, or cited \n",
|
| 341 |
+
"CDL, but did not directly use the data in their experiments. Therefore, \n",
|
| 342 |
+
"IC1 could ensure that CDL has been applied in the selected publications, \n",
|
| 343 |
+
"rather than simply mentioning it in passing.\n",
|
| 344 |
+
"To narrow down the publications to those specifically related to \n",
|
| 345 |
+
"remote sensing, IC2 states that the publication ’ s “ Category ” field in the \n",
|
| 346 |
+
"WoS database must be labeled as “ remote sensing ” . However, many \n",
|
| 347 |
+
"publications related to remote sensing were published in computer sci -\n",
|
| 348 |
+
"ence, agricultural, or multidisciplinary journals, which were not cate -\n",
|
| 349 |
+
"gorized as “ remote sensing ” . To include these publications in this \n",
|
| 350 |
+
"review, we added a rule that requires the presence of certain terms, such \n",
|
| 351 |
+
"as “ Remote Sensing ” , “ Earth observation ” , “ Landsat ” , “ Sentinel ” , or \n",
|
| 352 |
+
"“ MODIS ” in any of the title, keywords, or abstract of the publication.\n",
|
| 353 |
+
"To ensure the selected publications reflected the up-to-date research \n",
|
| 354 |
+
"trends and avoided duplicate research items, IC3 limits the document \n",
|
| 355 |
+
"type to only peer-reviewed articles that were published in journals \n",
|
| 356 |
+
"indexed by the WoS Core Collection. Focusing on these high-impact \n",
|
| 357 |
+
"journal articles guarantees that our review reflects the most represen -\n",
|
| 358 |
+
"tative studies within the remote sensing field.\n",
|
| 359 |
+
"The query string of inclusion criteria in the WoS data database is: \n",
|
| 360 |
+
"ALL = ( “ Cropland Data Layer ” OR “ CDL ” ) AND (WC = “ Remote Sensing ” \n",
|
| 361 |
+
"OR ALL = ( “ Remote Sensing ” OR “ Earth observation ” OR “ Landsat ” OR \n",
|
| 362 |
+
"“ Sentinel ” OR “ MODIS ” )) AND DT = “ Article ” , where ALL represents all \n",
|
| 363 |
+
"fields (title, abstract, keywords), WC represents WoS categories, and DT \n",
|
| 364 |
+
"represents document type. After the initial screening process, we \n",
|
| 365 |
+
"manually applied the three exclusion criteria to exclude publications \n",
|
| 366 |
+
"where the full term “ CDL ” was not related to “ Cropland Data Layer ” , \n",
|
| 367 |
+
"studies that did not use remote sensing data, and any review articles. \n",
|
| 368 |
+
"These exclusion criteria were essential for ensuring the reliability of our \n",
|
| 369 |
+
"selection results and for eliminating any irrelevant literature. The \n",
|
| 370 |
+
"literature selection process from the CDL citations on the USDA NASS \n",
|
| 371 |
+
"website adheres to the same inclusion and exclusion criteria. The \n",
|
| 372 |
+
"eligible documents were combined with the screening results of WoS \n",
|
| 373 |
+
"database, and any duplicate records were removed.\n",
|
| 374 |
+
"3.3. Results\n",
|
| 375 |
+
"The result of the literature screening process is illustrated in Fig. 4 . \n",
|
| 376 |
+
"Applying the inclusion criteria, we screened 162 and 43 articles from the \n",
|
| 377 |
+
"WoS database and the USDA NASS CDL website, respectively. We then \n",
|
| 378 |
+
"excluded 48 and 8 non-qualified articles from the two sources. After \n",
|
| 379 |
+
"removing the 20 duplicated records, we identified 129 qualified articles \n",
|
| 380 |
+
"for use in this systematic literature review. The full literature list and \n",
|
| 381 |
+
"surveyed features per selected publication are summarized in Table A1 .\n",
|
| 382 |
+
"Table 8 summarizes the publication distribution of 129 qualified \n",
|
| 383 |
+
"articles across over 40 scientific journals. It should be noted that these \n",
|
| 384 |
+
"screening results only encompass representative articles related to the \n",
|
| 385 |
+
"CDL in remote sensing science. As documents are searched based on \n",
|
| 386 |
+
"Table 7 \n",
|
| 387 |
+
"Document search criteria.\n",
|
| 388 |
+
"ID Description\n",
|
| 389 |
+
"Inclusion Criteria 1 (IC1) “ Cropland Data Layer ” OR “ CDL ” contained in any fields\n",
|
| 390 |
+
"Inclusion Criteria 2 (IC2) Category in “ Remote Sensing ” or in other categories but contain “ Remote Sensing ” or “ Earth observation ” or “ Landsat ” or “ Sentinel ” or “ MODIS ” in any \n",
|
| 391 |
+
"fields\n",
|
| 392 |
+
"Inclusion Criteria 3 (IC3) Publication is a journal article\n",
|
| 393 |
+
"Exclusion Criteria 1 \n",
|
| 394 |
+
"(EC1)\n",
|
| 395 |
+
"The full term of “ CDL ” is not related to “ Cropland Data Layer ”\n",
|
| 396 |
+
"Exclusion Criteria 2 \n",
|
| 397 |
+
"(EC2)\n",
|
| 398 |
+
"No remote sensing data is used in the study\n",
|
| 399 |
+
"Exclusion Criteria 3 \n",
|
| 400 |
+
"(EC3)\n",
|
| 401 |
+
"Publication is a review paper\n",
|
| 402 |
+
"Fig. 4. Literature screening process.\n",
|
| 403 |
+
"Table 8 \n",
|
| 404 |
+
"Publication distribution of the qualified CDL-related remote sensing studies by \n",
|
| 405 |
+
"journals.\n",
|
| 406 |
+
"Journal Record \n",
|
| 407 |
+
"Count\n",
|
| 408 |
+
"Remote Sensing of Environment 27\n",
|
| 409 |
+
"Remote Sensing 26\n",
|
| 410 |
+
"International Journal of Applied Earth Observation and \n",
|
| 411 |
+
"Geoinformation\n",
|
| 412 |
+
"9\n",
|
| 413 |
+
"ISPRS Journal of Photogrammetry and Remote Sensing 9\n",
|
| 414 |
+
"Photogrammetric Engineering and Remote Sensing 4\n",
|
| 415 |
+
"Agronomy Journal 3\n",
|
| 416 |
+
"Computers and Electronics in Agriculture 3\n",
|
| 417 |
+
"Remote Sensing Letters 3\n",
|
| 418 |
+
"Agricultural Systems 2\n",
|
| 419 |
+
"Agricultural Water Management 2\n",
|
| 420 |
+
"Canadian Journal of Remote Sensing 2\n",
|
| 421 |
+
"Earth System Science Data 2\n",
|
| 422 |
+
"European Journal of Remote Sensing 2\n",
|
| 423 |
+
"IEEE Journal of Selected Topics in Applied Earth Observations and \n",
|
| 424 |
+
"Remote Sensing\n",
|
| 425 |
+
"2\n",
|
| 426 |
+
"International Journal of Remote Sensing 2\n",
|
| 427 |
+
"Science of Remote Sensing 2\n",
|
| 428 |
+
"Sensors 2\n",
|
| 429 |
+
"Others (only one paper) 27\n",
|
| 430 |
+
"Total 129\n",
|
| 431 |
+
"C. Zhang et al. Remote Sensing of Environment 330 (2025) 114995 \n",
|
| 432 |
+
"10\n"
|
| 433 |
+
]
|
| 434 |
+
}
|
| 435 |
+
],
|
| 436 |
+
"source": [
|
| 437 |
+
"# 10th page of PDF\n",
|
| 438 |
+
"print(document[9].page_content)"
|
| 439 |
+
]
|
| 440 |
+
},
|
| 441 |
+
{
|
| 442 |
+
"cell_type": "code",
|
| 443 |
+
"execution_count": 10,
|
| 444 |
+
"id": "31c4df34",
|
| 445 |
+
"metadata": {},
|
| 446 |
+
"outputs": [
|
| 447 |
+
{
|
| 448 |
+
"name": "stdout",
|
| 449 |
+
"output_type": "stream",
|
| 450 |
+
"text": [
|
| 451 |
+
"et al., 2013 ). CDL data also have been used to delineate and stratify \n",
|
| 452 |
+
"regions, such as U.S. soybean growing areas ( Song et al., 2017 ), which \n",
|
| 453 |
+
"helps in understanding field size patterns for more effective agricultural \n",
|
| 454 |
+
"resource management.\n",
|
| 455 |
+
"Training samples: Beyond a crop type map, CDL is widely utilized \n",
|
| 456 |
+
"as an authoritative geospatial benchmark to support field-level crop \n",
|
| 457 |
+
"spectral signature training. The ML models trained with high-confidence \n",
|
| 458 |
+
"pixels in CDL and associated products (e.g., CSB, Confidence Layer) can \n",
|
| 459 |
+
"be applied to extend land cover classification while adjusting for factors \n",
|
| 460 |
+
"such as hemisphere seasonality and evolving farming trends, which is \n",
|
| 461 |
+
"invaluable for global crop monitoring. As discussed in RQ2, ML and DL \n",
|
| 462 |
+
"are the main technologies in remote sensing studies, which rely on high- \n",
|
| 463 |
+
"quality training data. Due to the extensive crop-specific land cover in -\n",
|
| 464 |
+
"formation, CDL has been extensively used to label training samples in EO \n",
|
| 465 |
+
"data. This enables the further supervised-learning-based training pro -\n",
|
| 466 |
+
"cess for semantic segmentation models ( Du et al., 2022a ), ML models \n",
|
| 467 |
+
"( Momm et al., 2020 ), DL models ( Cai et al., 2018 ; Xu et al., 2020 ), and \n",
|
| 468 |
+
"transfer learning models ( Hao et al., 2020 ; Wei et al., 2022 ). Instead of \n",
|
| 469 |
+
"directly using CDL as training samples, some works further optimized \n",
|
| 470 |
+
"the training sample selection process by modeling crop rotation patterns \n",
|
| 471 |
+
"in the historical CDL ( Zhang et al., 2022a ). Zhang et al. (2021) and Lin \n",
|
| 472 |
+
"et al. (2022b) used DNNs to automatically recognize training samples \n",
|
| 473 |
+
"from CDL time series to label Landsat and Sentinel-2 data for early and \n",
|
| 474 |
+
"in-season crop mapping.\n",
|
| 475 |
+
"Benchmark data: CDL is often adopted as benchmark data or \n",
|
| 476 |
+
"reference data to validate new crop mapping methodologies and algo -\n",
|
| 477 |
+
"rithms. The traditional ground-truthing process is usually labor- \n",
|
| 478 |
+
"intensive, particularly when surveying extensive geographic areas. By \n",
|
| 479 |
+
"comparing results against the CDL, researchers can efficiently assess \n",
|
| 480 |
+
"model performance, detect areas for improvement, and refine their \n",
|
| 481 |
+
"strategies to achieve optimal outcomes. However, despite its widespread \n",
|
| 482 |
+
"use, it should be noted that CDL only represents a high-quality classifi -\n",
|
| 483 |
+
"cation map rather than ground truth. Several studies have examined the \n",
|
| 484 |
+
"uncertainty and potential biases associated with using CDL as bench -\n",
|
| 485 |
+
"mark data for result validation. For example, Lark et al. (2021) found the \n",
|
| 486 |
+
"average accuracy for all crop classes has improved from 87 % in 2008 to \n",
|
| 487 |
+
"92 % in 2016. Kerner et al. (2022) showed 2019–2020 CDL had 89 % \n",
|
| 488 |
+
"accuracy evaluated with independent ground truth data within the \n",
|
| 489 |
+
"central US Corn Belt.\n",
|
| 490 |
+
"Other uses: CDL and its derivative data products have been applied \n",
|
| 491 |
+
"in addressing broader applications and scientific problems. Boryan et al. \n",
|
| 492 |
+
"(2014) developed a stratification method for agricultural area sampling \n",
|
| 493 |
+
"frame construction based on CDL. Gao et al. (2014) used CDL to assist in \n",
|
| 494 |
+
"the creation of Bidirectional Reflectance Distribution Function (BRDF) \n",
|
| 495 |
+
"look-up maps. Harmonic analysis techniques, such as linear and non- \n",
|
| 496 |
+
"linear harmonic models, have been employed with CDL to model peri -\n",
|
| 497 |
+
"odic patterns in time series data ( Roy and Yan, 2020 ; Wang et al., \n",
|
| 498 |
+
"2020a ). Shao et al. (2016a) evaluated different time-series smoothing \n",
|
| 499 |
+
"algorithms. Duveiller et al. (2015) developed a signal-to-noise ratio \n",
|
| 500 |
+
"method to identify spatially homogeneous vegetation cover. CDL has \n",
|
| 501 |
+
"also been utilized in GIS education ( Han et al., 2014 ) and as compared \n",
|
| 502 |
+
"dataset for particular purposes ( Wickham et al., 2014 ; Kokkinidis et al., \n",
|
| 503 |
+
"2017 ; Shi et al., 2018 ; Kraatz et al., 2023 ; Wang and Mountrakis, 2023 ).\n",
|
| 504 |
+
"4. Visions for future data products\n",
|
| 505 |
+
"As science and technology in remote sensing advances, the demand \n",
|
| 506 |
+
"for enhanced crop-specific land cover data products becomes increas -\n",
|
| 507 |
+
"ingly evident. This section explores vision and progress in improving \n",
|
| 508 |
+
"spatiotemporal coverage and resolution of the current data products \n",
|
| 509 |
+
"(Section 4.1), achieving reliable global mapping through robust training \n",
|
| 510 |
+
"datasets and cropland extent data (Section 4.2 and 4.3), incorporating \n",
|
| 511 |
+
"more crop-specific information (Section 4.4 and 4.5), and the develop -\n",
|
| 512 |
+
"ment of operational in-season crop mapping systems (Section 4.6).\n",
"4.1. Progress on enhanced coverage and resolution of current product\n",
"Enhanced spatial coverage and resolution significantly benefit crop \n",
"mapping, area estimation, and field size quantification by enabling more \n",
"accurate identification of land cover features. Advancement in geo -\n",
"spatial cloud computing platforms (e.g., GEE) and increasing availabil -\n",
"ity of higher spatiotemporal resolution open EO data (e.g., Sentinel-1, \n",
"Sentinel-2, HLS) have improved the efficiency and accuracy for pro -\n",
"ducing regional and national crop type map data with resolution of 10-m \n",
"or even higher ( Tran et al., 2022 ; Li et al., 2025 ). Such detailed field- \n",
"level crop cover information will not only facilitate a more precise \n",
"distinction between different types of vegetation and crops, but also \n",
"provide opportunities for improved agricultural monitoring, better \n",
"resource management, and informed decision-making to support sus -\n",
"tainable agriculture and food security.\n",
"As highlighted in the Section 3, the 30-m CDL has traditionally been \n",
"essential for scientific problem solving with various EO data. However, \n",
"the increasing availability of higher-resolution EO data from both open- \n",
"access and commercial satellites requires more detailed crop mapping \n",
"products. To meet this evolving need, the USDA NASS has been \n",
"enhancing data accuracy and usability by implementing a 10-m reso -\n",
"lution CDL. These improvements are vital, particularly given the \n",
"increasing vulnerability of agriculture to natural disasters and extreme \n",
"weather events. By utilizing the RF algorithm, enhanced stratified \n",
"random sampling approaches, and localized image processing, the 10-m \n",
"CDL provides a more accurate representation of diverse crop types for \n",
"CONUS, particularly in regions with unique or specialty crops. This \n",
"methodology reduces labor and workload while improving classification \n",
"accuracy and spatial clarity for small-area and specialty crops compared \n",
"to 30-m CDL. Fig. 8 shows the improvement achieved with the new 10-m \n",
"CDL compared to the current 30-m CDL on croplands with complex \n",
"landscapes.\n",
"Currently, the CDL is available only for the CONUS. However, efforts \n",
"are underway to extend coverage to other regions, such as Hawaii and U. \n",
"S. territories like Puerto Rico and the U.S. Virgin Islands. Enhanced CDLs \n",
"for these areas include the 2022 Beta version and the official 2024 \n",
"release of the 10-m resolution CDL for CONUS ( Li et al., 2024b ), and the \n",
"inaugural Hawaii Cropland Data Layer (HCDL) 2023 and 2024 ( Li et al., \n",
"2024a ). These products leverage gap-filled 10-day image composites \n",
"from Sentinel and Landsat sensors, processed through GEE. In devel -\n",
"oping the HCDL, assorted ML and DL algorithms were evaluated, \n",
"including RF, U-Net, ResNet50, VGG19, and DeepLabV3. The RF algo -\n",
"rithm achieved the best results for mapping major and specialty crops in \n",
"Hawaii. Fig. 9 illustrates the 10-m resolution HCDL 2023 V1.0 Beta, \n",
"which utilizes a RF algorithm with 100 trees for mapping crops, \n",
"including coffee, pineapple, macadamia nuts, commercial forest, citrus, \n",
"papaya, and tropical fruits. The official release of HCDL 2023 and 2024 \n",
"is anticipated in summer 2025. Future efforts will focus on creating a 10- \n",
"m resolution annual CDL for CONUS and potentially extending to Puerto \n",
"Rico and the U.S. Virgin Islands.\n",
"4.2. Developing training dataset in data-sparse regions\n",
"Lack of training data is a major barrier for developing crop type maps \n",
"like CDL in regions outside of the United States or other countries that \n",
"have instituted operational mapping programs (e.g., programs in \n",
"Table 1 ). Researchers aim to overcome this barrier in two main ways: (1) \n",
"developing more globally representative training datasets, and (2) \n",
"developing algorithms that learn more efficiently from small amounts of \n",
"training data.\n",
"Globally representative training datasets: Globally representative \n",
"reference data is essential for training modern data-hungry DL models \n",
"and has been identified as a key priority in advancing AI applications in \n",
"remote sensing ( Zhang et al., 2025a ). Collecting crop type data for \n",
"training ML classifiers for crop mapping is challenging because col -\n",
"lecting high-quality data typically requires ground-truthing ( Nakalembe \n",
"C. Zhang et al. Remote Sensing of Environment 330 (2025) 114995 \n",
"14\n"
]
}
],
"source": [
"# 14th page of PDF\n",
"print(document[13].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "0b15e82e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"and Kerner, 2023 ). Ground-truthing involves physically visiting agri -\n",
"cultural fields and recording the type of crop growing in the field. This \n",
"process is prohibitively expensive and logistically challenging for many \n",
"organizations and regions.\n",
"Currently available public reference samples are largely regional in \n",
"scope ( Dufourg et al., 2023 ; Kondmann et al., 2021 ). Recent work has \n",
"proposed novel methods of collecting ground-truth crop labels that \n",
"reduce the cost of data collection. Paliyam et al. (2021) proposed a \n",
"method called Street2Sat that uses computer vision (CV) techniques to \n",
"transform roadside images of fields collected with car- and motorcycle \n",
"helmet-mounted cameras into geo-referenced crop type labels of those \n",
"fields. d’Andrimont et al. (2022) used CV techniques to extract crop type \n",
"and phenology information from street-level images of fields taken with \n",
"car-mounted cameras in the Netherlands. Yan and Ryu (2021) and Soler \n",
"et al. (2024) used DL models to automatically create crop type labels \n",
"from Google Street View images in California and Thailand, \n",
"respectively.\n",
"Other work leveraged crowd-sourced data from online and mobile \n",
"platforms to collect ground-truth crop data. Wang et al. (2020b) used \n",
"crop type data crowd-sourced from the Plantix mobile app (used to help \n",
"farmers diagnose crop disease) for crop mapping in India. Fraisl et al. \n",
"(2022) demonstrated the use of the mobile app Picture Pile to engage \n",
"citizen scientists to annotate crop type labels in crowdsourced street- \n",
"level images from Mapillary, which could later be converted to geo- \n",
"referenced crop type labels for training crop mapping models. The \n",
"CropObserve app facilitated the process on crop-specific ground truth -\n",
"ing (e.g., crop types, phenological stage, visible damage, management \n",
"practices) anywhere in the world ( IIASA, 2023 ).\n",
"In parallel with data collection efforts, increasing attention is being \n",
"paid to making crop type reference data more Findable, Accessible, \n",
"Interoperable, and Reusable (FAIR). Major research initiatives (e.g., \n",
"CropHarvest, WorldCereal, and EuroCrops) are actively working on \n",
"harmonizing, standardizing, and openly publishing training datasets to \n",
"enhance the FAIRness of crop reference data within the remote sensing \n",
"and agricultural monitoring communities.\n",
"Algorithms that learn more efficiently from small amounts of \n",
"training data: To reduce the need for large labeled datasets to train \n",
"effective crop mapping models, researchers have proposed methods for \n",
"learning from a small amount of training data for a given location. Many \n",
"of these methods involve learning from labeled data in locations other \n",
"than the target region to supplement training. The WorldCereal project \n",
"trained a CatBoost classifier with expert-designed features extracted \n",
"from multiple satellite datasets using a reference database of globally \n",
"distributed crop type labels ( Van Tricht et al., 2023 ). Other work has \n",
"leveraged transfer learning, in which models are first “pre-trained” on a \n",
"large labeled dataset for one task (e.g., crop mapping in region A) and \n",
"then further trained (“fine-tuned”) on a smaller dataset for the target \n",
"task (e.g., crop mapping in region B). Meta-learning algorithms are also \n",
"used to learn efficiently from a small number of crop type examples in a \n",
"new target region by learning from many globally-distributed crop type \n",
"classification tasks in the CropHarvest dataset ( Tseng et al., 2021a, \n",
"2022 ).\n",
"Researchers have developed methods for learning generic features \n",
"that are useful in diverse tasks (e.g., crop mapping, land cover mapping, \n",
"tree species classification) from a large amount of unlabeled satellite EO \n",
"data in a process called self-supervised learning. Similar to transfer \n",
"learning discussed previously, after a model is pre-trained using self- \n",
"supervised learning, it can be fine-tuned for a specific crop mapping \n",
"task. For example, Tseng et al. (2024) proposed a self-supervised model \n",
"called Presto (which stands for Pre-trained remote sensing transformer) \n",
"that learns from unlabeled EO data from multiple satellite platforms and \n",
"derived products. They showed that fine-tuning Presto on the Kenya \n",
"maize classification task and Brazil coffee classification task in Cro -\n",
"pHarvest achieved state-of-the-art performance. Both tasks required \n",
"learning from small training data sizes of 1345 in Kenya and 203 in \n",
"Brazil. In Phase II of the ESA WorldCereal project (2024-2026) ( ESA, \n",
"2024 ), Presto was adopted for feature extraction for crop type mapping \n",
"in place of the expert-designed features used to train a CatBoost classifier \n",
"in Van Tricht et al. (2023) . With Presto’s robust algorithm for improving \n",
"spatiotemporal transferability, this integration is key to WorldCereal’s \n",
"aim of establishing a generic and customizable global crop mapping \n",
"system.\n",
"In recent years, foundation models have emerged to address the \n",
"scarcity of labeled training data in remote sensing applications ( Jakubik \n",
"et al., 2023 ; Xiao et al., 2025 ). For example, Google recently introduced \n",
"AlphaEarth Foundations (AEF) for global mapping from sparse label \n",
"data ( Brown et al., 2025 ). As a geospatial foundation model, AEF in -\n",
"tegrates multi-source, multi-modal EO and geoinformation data into a \n",
"time-continuous embedding space, and the resulting global dataset of \n",
"analysis-ready embedding field layers could enable a wide range of \n",
"mapping tasks. Such foundation models and analysis-ready data offer a \n",
"promising solution for efficient production of cropland and crop type \n",
"maps at a global scale.\n",
"4.3. Improving consistency of cropland extent mask for global crop \n",
"mapping\n",
"From the perspective of global crop mapping, a reliable and consis -\n",
"tent cropland extent map serves as the fundamental land cover category \n",
"in the crop-specific land cover data production, which are crucial for the \n",
"subsequent crop type classification process especially over the data- \n",
"sparse regions. Various cropland extent mask data derived from EO \n",
"data have been widely developed and validated over the past years. \n",
"However, selecting the most appropriate cropland extent mask and \n",
"conducting local validation of these data tailored to the specific re -\n",
"quirements of the study remains challenging due to inconsistency and \n",
"variability in their reported accuracies and cropland definitions.\n",
"To improve consistency and transparency of cropland extent, \n",
"Table 9 \n",
"FAO land use categories for cropland.\n",
"Land Use Category Definition\n",
"Cropland Land used for cultivation of crops. The total of areas under Arable land and Permanent crops.\n",
"Arable land Land used for cultivation of crops in rotation with fallow, meadows and pastures within cycles of up to 5 years. The total of areas under Temporary \n",
"crops, temporary meadows and pastures, and temporary fallow. Arable land does not include land that is potentially cultivable but is not cultivated.\n",
"Temporary crops Land used for crops with a less than 1-year growing cycle, which must be newly sown or planted for further production after the harvest. Some crops \n",
"remaining in the field for more than 1 year may also be considered as temporary crops (e.g., asparagus, strawberries, pineapples, bananas, and sugar \n",
"cane). Multiple-cropped areas are counted only once.\n",
"Temporary fallow Land that is not seeded for one or more growing seasons. The maximum idle period is usually less than 5 years. This land may be in the form sown for \n",
"the exclusive production of green manure. Land remaining fallow for too long may acquire characteristics requiring it to be reclassified as, for \n",
"instance, permanent meadows and pastures if used for grazing or haying.\n",
"Temporary meadows and \n",
"pastures\n",
"Land temporarily cultivated with herbaceous forage crops for mowing or pasture, as part of crop rotation periods of less than 5 years.\n",
"Permanent crops Land cultivated with long-term crops which do not have to be replanted for several years (e.g., cocoa and coffee), land under trees and shrubs \n",
"producing flowers (e.g., roses and jasmine), and nurseries (except those for forest trees, which should be classified under “forestry”). Permanent \n",
"meadows and pastures are excluded from permanent crops.\n",
"C. Zhang et al. Remote Sensing of Environment 330 (2025) 114995 \n",
"16\n"
]
}
],
"source": [
"# 16th page of PDF\n",
"print(document[15].page_content)"
]
},
{
"cell_type": "markdown",
"id": "2ec8c146",
"metadata": {},
"source": [
"## Split texts."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "1f22bdbf",
"metadata": {},
"outputs": [],
"source": [
"# split into chunks\n",
"\n",
"doc_split = RecursiveCharacterTextSplitter(\n",
"    chunk_size=1000,\n",
"    chunk_overlap=200,\n",
")\n",
"chunks = doc_split.split_documents(document)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "94a200e7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"268"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(chunks)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "b519f56f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"crop mapping from the perspective of crop-specific land cover data by evaluating over 60 open-access opera -\n",
"tional products, archival crop type map datasets, single-crop extent map datasets, cropping pattern datasets, and \n",
"crop mapping platforms and systems. Using the Cropland Data Layer (CDL) – one of the most widely used \n",
"products with over 25 years of continuous monitoring of U.S. croplands – as a case study, we also conduct a \n",
"systematic literature review on the application of crop type maps in remote sensing science. Our analysis syn -\n",
"thesizes 129 research articles through three core research questions: (1) What EO data are used with CDL; (2) \n",
"What scientific problems and technologies are explored using CDL; and (3) What role does CDL play in remote \n",
"sensing applications. Furthermore, we delve into the implications of our vision for new data products and \n",
"propose emerging research topics, ranging from extending the spatiotemporal coverage of current data products\n"
]
}
],
"source": [
"# display 4th chunk\n",
"print(chunks[3].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "a62f7966",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"propose emerging research topics, ranging from extending the spatiotemporal coverage of current data products \n",
"to improving global mapping reliability and developing operational in-season crop mapping systems. This review \n",
"paper not only serves as a reference for stakeholders seeking to utilize crop-specific land cover data in their work, \n",
"but also outlines the directions for future geospatial data product development.\n",
"* Corresponding authors.\n",
"E-mail addresses: czhang11@gmu.edu (C. Zhang), hkerner@asu.edu (H. Kerner), sherwang@mit.edu (S. Wang), pengyu.hao@fao.org (P. Hao), zhe.li@usda.gov\n",
"(Z. Li), kevin.a.hunt@usda.gov (K.A. Hunt), jake.abernethy@usda.gov (J. Abernethy), haoteng.zhao@usda.gov (H. Zhao), feng.gao@usda.gov (F. Gao), ldi@gmu. \n",
"edu (L. Di), zliu23@gmu.edu (Z. Liu), zhengwei.yang@usda.gov (Z. Yang), rick.mueller@usda.gov (R. Mueller), claire.boryan@usda.gov (C. Boryan), qichen@\n"
]
}
],
"source": [
"# display 5th chunk\n",
"print(chunks[4].page_content)"
]
},
{
"cell_type": "markdown",
"id": "d13d1af5",
"metadata": {},
"source": [
"### Vector Store Creation.\n",
"\n",
"Generate document embeddings and build a FAISS vector store for efficient similarity-based retrieval."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "fd8ae71d",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/wills/.local/lib/python3.12/site-packages/pandas/core/arrays/masked.py:60: UserWarning: Pandas requires version '1.3.6' or newer of 'bottleneck' (version '1.3.5' currently installed).\n",
" from pandas.core import (\n",
"2025-09-30 00:19:35.389072: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.\n",
"2025-09-30 00:19:37.212948: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
"To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
"/home/wills/.local/lib/python3.12/site-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.\n",
" warnings.warn(\"Unable to import Axes3D. This may be due to multiple versions of \"\n"
]
}
],
"source": [
"embeds = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n",
"vector_store = FAISS.from_documents(chunks, embeds)"
]
},
{
"cell_type": "markdown",
"id": "cb690b06",
"metadata": {},
"source": [
"### Retrieval."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "ae03425b",
"metadata": {},
"outputs": [],
"source": [
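"# return the 5 most similar chunks for each query\n",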
"retriever = vector_store.as_retriever(search_type=\"similarity\", search_kwargs={\"k\": 5})"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "c691f892",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x76fcba598e00>, search_kwargs={'k': 5})"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "d4345de2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(id='50bfaa32-168f-4e5e-94b6-27d2668d4ef5', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='fields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we \\nmanually applied the three exclusion criteria to exclude publications \\nwhere the full term “ CDL ” was not related to “ Cropland Data Layer ” , \\nstudies that did not use remote sensing data, and any review articles. \\nThese exclusion criteria were essential for ensuring the reliability of our \\nselection results and for eliminating any irrelevant literature. The \\nliterature selection process from the CDL citations on the USDA NASS \\nwebsite adheres to the same inclusion and exclusion criteria. The \\neligible documents were combined with the screening results of WoS \\ndatabase, and any duplicate records were removed.\\n3.3. Results\\nThe result of the literature screening process is illustrated in Fig. 4 . \\nApplying the inclusion criteria, we screened 162 and 43 articles from the \\nWoS database and the USDA NASS CDL website, respectively. We then'),\n",
" Document(id='864cf8ce-bfa8-426b-9bfe-390c8679f13a', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='as “ Remote Sensing ” , “ Earth observation ” , “ Landsat ” , “ Sentinel ” , or \\n“ MODIS ” in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( “ Cropland Data Layer ” OR “ CDL ” ) AND (WC = “ Remote Sensing ” \\nOR ALL = ( “ Remote Sensing ” OR “ Earth observation ” OR “ Landsat ” OR \\n“ Sentinel ” OR “ MODIS ” )) AND DT = “ Article ” , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we'),\n",
" Document(id='b700e905-8222-4ec0-8131-bee3ac9f51ca', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='the WoS database, including the publication title, abstract, or keywords. \\nIn our survey, we found that many papers introduced, discussed, or cited \\nCDL, but did not directly use the data in their experiments. Therefore, \\nIC1 could ensure that CDL has been applied in the selected publications, \\nrather than simply mentioning it in passing.\\nTo narrow down the publications to those specifically related to \\nremote sensing, IC2 states that the publication ’ s “ Category ” field in the \\nWoS database must be labeled as “ remote sensing ” . However, many \\npublications related to remote sensing were published in computer sci -\\nence, agricultural, or multidisciplinary journals, which were not cate -\\ngorized as “ remote sensing ” . To include these publications in this \\nreview, we added a rule that requires the presence of certain terms, such \\nas “ Remote Sensing ” , “ Earth observation ” , “ Landsat ” , “ Sentinel ” , or \\n“ MODIS ” in any of the title, keywords, or abstract of the publication.'),\n",
" Document(id='d6273845-9651-4f71-a5b0-3d3b72037a37', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 7, 'page_label': '8'}, page_content='preliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe article’s title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description'),\n",
" Document(id='b58a38d1-950a-4e9e-b7f7-d1985985c0dd', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 8, 'page_label': '9'}, page_content='Fig. 3. The number of publications indexed by Scopus and Google Scholar (data accessed by January, 2024). The publications are filtered based on combined \\nkeywords “ Cropland Data Layer ” AND “ Remote Sensing ” and the single keyword “ Cropland Data Layer ” .\\nTable 6 \\nResearch questions.\\nID Research Question Objective Description\\nRQ1 What EO data are used with CDL? Identify common and suitable EO data in conjunction with crop type maps in remote sensing field\\nRQ2 What scientific problems and technologies are explored \\nusing CDL?\\nUnderstand the state of the science and main technologies in remote sensing that are applied with crop type \\nmaps\\nRQ3 What role does CDL play in remote sensing applications? Help researchers to recognize the significance of crop type maps and consider how to incorporate into these \\ndata their own work')]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# test retriever\n",
"retriever.invoke(\"what is the main topic of the document?\")"
]
},
{
"cell_type": "markdown",
"id": "45f8260f",
"metadata": {},
"source": [
"## Augmentation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb4d1435",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"E0000 00:00:1759188738.019868 109561 alts_credentials.cc:93] ALTS creds ignored. Not running on GCP and untrusted ALTS is not enabled.\n"
]
}
],
"source": [
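"# build the Gemini LLM client; gemini_api_key was loaded earlier in the notebook\n",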
"LLM_gen = GoogleGenerativeAI(model=\"models/gemini-1.5-flash\", google_api_key=gemini_api_key)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b6ebacf",
"metadata": {},
"outputs": [],
"source": [
"prompt = PromptTemplate(\n",
" template = \"\"\"\n",
" You are a helpful assistant.\n",
" Answer ONLY from the provided transcript context.\n",
" If the context IS INSUFFICIENT, say you don't know.\n",
"\n",
" {context}\n",
"\n",
" Question: {question}\n",
" \"\"\",\n",
" input_variables=[\"context\",\"question\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "17adfd08",
"metadata": {},
"outputs": [],
"source": [
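"# probe with an off-topic question to check that the model declines to answer from outside the document\n",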
"question = \"Is the aspect of stars mentioned in this document provided? If yes, explain what was discussed?\"\n",
"retrieved_documents = retriever.invoke(question)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "0c21e23b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(id='b700e905-8222-4ec0-8131-bee3ac9f51ca', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='the WoS database, including the publication title, abstract, or keywords. \\nIn our survey, we found that many papers introduced, discussed, or cited \\nCDL, but did not directly use the data in their experiments. Therefore, \\nIC1 could ensure that CDL has been applied in the selected publications, \\nrather than simply mentioning it in passing.\\nTo narrow down the publications to those specifically related to \\nremote sensing, IC2 states that the publication ’ s “ Category ” field in the \\nWoS database must be labeled as “ remote sensing ” . However, many \\npublications related to remote sensing were published in computer sci -\\nence, agricultural, or multidisciplinary journals, which were not cate -\\ngorized as “ remote sensing ” . To include these publications in this \\nreview, we added a rule that requires the presence of certain terms, such \\nas “ Remote Sensing ” , “ Earth observation ” , “ Landsat ” , “ Sentinel ” , or \\n“ MODIS ” in any of the title, keywords, or abstract of the publication.'),\n",
" Document(id='864cf8ce-bfa8-426b-9bfe-390c8679f13a', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='as “ Remote Sensing ” , “ Earth observation ” , “ Landsat ” , “ Sentinel ” , or \\n“ MODIS ” in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( “ Cropland Data Layer ” OR “ CDL ” ) AND (WC = “ Remote Sensing ” \\nOR ALL = ( “ Remote Sensing ” OR “ Earth observation ” OR “ Landsat ” OR \\n“ Sentinel ” OR “ MODIS ” )) AND DT = “ Article ” , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we'),\n",
" Document(id='cf0e5d6a-656a-4073-b1ee-1e7fce2b5952', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 17, 'page_label': '18'}, page_content='early growth stages, with a mean difference of three days and a mean \\nabsolute difference of one week ( Gao et al., 2024 ).\\nWISE has been extended to five Corn Belt states (i.e., Iowa, Illinois, \\nIndiana, Minnesota, and Nebraska) for routine mapping of crop emer -\\ngence using HLS (30 m, 3 – 4 day revisit) data ( Gao et al., 2021 ). As \\nillustrated in Fig. 10 , benefiting from the frequent revisits of HLS, WISE \\ndetected the majority of fields across CONUS and provided detailed \\nspatial variability within each field. Recent high temporal and spatial \\nresolution satellite datasets (e.g., HLS, PlanetScope) are making it \\nfeasible for mapping within-season crop emergence over the CONUS \\n( Gao et al., 2024 ) and have great potential for integration with in-season \\ncrop mapping data products and operational crop monitoring systems \\n( Zhang et al., 2022b ; Zhang et al., 2023a ).\\n4.5. Advancing national-scale crop-specific field boundary mapping'),\n",
" Document(id='c4a9043a-fe10-459d-b714-4b1083160bac', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 23, 'page_label': '24'}, page_content='jag.2023.103390 .\\nESA, 2024. Webinar: WorldCereal Phase II [WWW Document]. https://esa-worldcereal. \\norg/en/events/webinar-worldcereal-phase-ii-32 .\\nFalkowski, M.J., Manning, J.A., 2010. Parcel-based classification of agricultural crops via \\nmultitemporal Landsat imagery for monitoring habitat availability of western \\nburrowing owls in the Imperial Valley agro-ecosystem. Can. J. Remote. Sens. 36, \\n750 – 762. https://doi.org/10.5589/m11-011 .\\nFAOSTAT, 2024. Definitions and standards used in FAOSTAT [WWW Document]. \\nhttps://www.fao.org/faostat/en/#definitions .\\nFarmonov, N., Amankulova, K., Khan, S.N., Abdurakhimova, M., Szatm ´ari, J., \\nKhabiba, T., Makhliyo, R., Khodicha, M., Mucsi, L., 2023. Effectiveness of machine \\nlearning and deep learning models at county-level soybean yield forecasting. \\nHungarian Geogr. Bull. 72, 383 – 398. https://doi.org/10.15201/hungeobull.72.4.4 .\\nFisette, T., Rollin, P., Aly, Z., Campbell, L., Daneshfar, B., Filyer, P., Smith, A.,'),\n",
" Document(id='d6273845-9651-4f71-a5b0-3d3b72037a37', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 7, 'page_label': '8'}, page_content='preliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe article’s title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description')]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retrieved_documents"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "7993ff0d",
"metadata": {},
"outputs": [],
"source": [
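"# join the page_content of each retrieved chunk into one context string, separated by blank lines\n",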
"content_texts = \"\\n\\n\".join(document.page_content for document in retrieved_documents)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "d923bfda",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'the WoS database, including the publication title, abstract, or keywords. \\nIn our survey, we found that many papers introduced, discussed, or cited \\nCDL, but did not directly use the data in their experiments. Therefore, \\nIC1 could ensure that CDL has been applied in the selected publications, \\nrather than simply mentioning it in passing.\\nTo narrow down the publications to those specifically related to \\nremote sensing, IC2 states that the publication ’ s “ Category ” field in the \\nWoS database must be labeled as “ remote sensing ” . However, many \\npublications related to remote sensing were published in computer sci -\\nence, agricultural, or multidisciplinary journals, which were not cate -\\ngorized as “ remote sensing ” . To include these publications in this \\nreview, we added a rule that requires the presence of certain terms, such \\nas “ Remote Sensing ” , “ Earth observation ” , “ Landsat ” , “ Sentinel ” , or \\n“ MODIS ” in any of the title, keywords, or abstract of the publication.\\n\\nas “ Remote Sensing ” , “ Earth observation ” , “ Landsat ” , “ Sentinel ” , or \\n“ MODIS ” in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( “ Cropland Data Layer ” OR “ CDL ” ) AND (WC = “ Remote Sensing ” \\nOR ALL = ( “ Remote Sensing ” OR “ Earth observation ” OR “ Landsat ” OR \\n“ Sentinel ” OR “ MODIS ” )) AND DT = “ Article ” , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we\\n\\nearly growth stages, with a mean difference of three days and a mean \\nabsolute difference of one week ( Gao et al., 2024 ).\\nWISE has been extended to five Corn Belt states (i.e., Iowa, Illinois, \\nIndiana, Minnesota, and Nebraska) for routine mapping of crop emer -\\ngence using HLS (30 m, 3 – 4 day revisit) data ( Gao et al., 2021 ). As \\nillustrated in Fig. 10 , benefiting from the frequent revisits of HLS, WISE \\ndetected the majority of fields across CONUS and provided detailed \\nspatial variability within each field. Recent high temporal and spatial \\nresolution satellite datasets (e.g., HLS, PlanetScope) are making it \\nfeasible for mapping within-season crop emergence over the CONUS \\n( Gao et al., 2024 ) and have great potential for integration with in-season \\ncrop mapping data products and operational crop monitoring systems \\n( Zhang et al., 2022b ; Zhang et al., 2023a ).\\n4.5. Advancing national-scale crop-specific field boundary mapping\\n\\njag.2023.103390 .\\nESA, 2024. Webinar: WorldCereal Phase II [WWW Document]. https://esa-worldcereal. \\norg/en/events/webinar-worldcereal-phase-ii-32 .\\nFalkowski, M.J., Manning, J.A., 2010. Parcel-based classification of agricultural crops via \\nmultitemporal Landsat imagery for monitoring habitat availability of western \\nburrowing owls in the Imperial Valley agro-ecosystem. Can. J. Remote. Sens. 36, \\n750 – 762. https://doi.org/10.5589/m11-011 .\\nFAOSTAT, 2024. 
Definitions and standards used in FAOSTAT [WWW Document]. \\nhttps://www.fao.org/faostat/en/#definitions .\\nFarmonov, N., Amankulova, K., Khan, S.N., Abdurakhimova, M., Szatm ´ari, J., \\nKhabiba, T., Makhliyo, R., Khodicha, M., Mucsi, L., 2023. Effectiveness of machine \\nlearning and deep learning models at county-level soybean yield forecasting. \\nHungarian Geogr. Bull. 72, 383 – 398. https://doi.org/10.15201/hungeobull.72.4.4 .\\nFisette, T., Rollin, P., Aly, Z., Campbell, L., Daneshfar, B., Filyer, P., Smith, A.,\\n\\npreliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe article’s title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description'"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"content_texts"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "32fe4db1",
"metadata": {},
"outputs": [],
"source": [
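"# fill the prompt template with the retrieved context and the original question\n",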
"final_prompt = prompt.invoke({\"context\":content_texts,\"question\":question})"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "49c857d3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"StringPromptValue(text=\"\\n You are a helpful assistant.\\n Answer ONLY from the provided transcript context.\\n If the context IS INSUFFICIENT, just say you don't know and probably need more information.\\n\\n the WoS database, including the publication title, abstract, or keywords. \\nIn our survey, we found that many papers introduced, discussed, or cited \\nCDL, but did not directly use the data in their experiments. Therefore, \\nIC1 could ensure that CDL has been applied in the selected publications, \\nrather than simply mentioning it in passing.\\nTo narrow down the publications to those specifically related to \\nremote sensing, IC2 states that the publication ’ s “ Category ” field in the \\nWoS database must be labeled as “ remote sensing ” . However, many \\npublications related to remote sensing were published in computer sci -\\nence, agricultural, or multidisciplinary journals, which were not cate -\\ngorized as “ remote sensing ” . To include these publications in this \\nreview, we added a rule that requires the presence of certain terms, such \\nas “ Remote Sensing ” , “ Earth observation ” , “ Landsat ” , “ Sentinel ” , or \\n“ MODIS ” in any of the title, keywords, or abstract of the publication.\\n\\nas “ Remote Sensing ” , “ Earth observation ” , “ Landsat ” , “ Sentinel ” , or \\n“ MODIS ” in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( “ Cropland Data Layer ” OR “ CDL ” ) AND (WC = “ Remote Sensing ” \\nOR ALL = ( “ Remote Sensing ” OR “ Earth observation ” OR “ Landsat ” OR \\n“ Sentinel ” OR “ MODIS ” )) AND DT = “ Article ” , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we\\n\\nearly growth stages, with a mean difference of three days and a mean \\nabsolute difference of one week ( Gao et al., 2024 ).\\nWISE has been extended to five Corn Belt states (i.e., Iowa, Illinois, \\nIndiana, Minnesota, and Nebraska) for routine mapping of crop emer -\\ngence using HLS (30 m, 3 – 4 day revisit) data ( Gao et al., 2021 ). As \\nillustrated in Fig. 10 , benefiting from the frequent revisits of HLS, WISE \\ndetected the majority of fields across CONUS and provided detailed \\nspatial variability within each field. Recent high temporal and spatial \\nresolution satellite datasets (e.g., HLS, PlanetScope) are making it \\nfeasible for mapping within-season crop emergence over the CONUS \\n( Gao et al., 2024 ) and have great potential for integration with in-season \\ncrop mapping data products and operational crop monitoring systems \\n( Zhang et al., 2022b ; Zhang et al., 2023a ).\\n4.5. Advancing national-scale crop-specific field boundary mapping\\n\\njag.2023.103390 .\\nESA, 2024. Webinar: WorldCereal Phase II [WWW Document]. https://esa-worldcereal. \\norg/en/events/webinar-worldcereal-phase-ii-32 .\\nFalkowski, M.J., Manning, J.A., 2010. 
Parcel-based classification of agricultural crops via \\nmultitemporal Landsat imagery for monitoring habitat availability of western \\nburrowing owls in the Imperial Valley agro-ecosystem. Can. J. Remote. Sens. 36, \\n750 – 762. https://doi.org/10.5589/m11-011 .\\nFAOSTAT, 2024. Definitions and standards used in FAOSTAT [WWW Document]. \\nhttps://www.fao.org/faostat/en/#definitions .\\nFarmonov, N., Amankulova, K., Khan, S.N., Abdurakhimova, M., Szatm ´ari, J., \\nKhabiba, T., Makhliyo, R., Khodicha, M., Mucsi, L., 2023. Effectiveness of machine \\nlearning and deep learning models at county-level soybean yield forecasting. \\nHungarian Geogr. Bull. 72, 383 – 398. https://doi.org/10.15201/hungeobull.72.4.4 .\\nFisette, T., Rollin, P., Aly, Z., Campbell, L., Daneshfar, B., Filyer, P., Smith, A.,\\n\\npreliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe article’s title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description\\n\\n Question: Is the aspect of stars mentioned in this document provided? If yes, explain what was discussed?\\n \")"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"final_prompt"
]
},
{
"cell_type": "markdown",
"id": "f838d0a6",
"metadata": {},
"source": [
"## Answer Generation."
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "d8730f3c",
"metadata": {},
"outputs": [],
"source": [
"response = LLM.invoke(final_prompt)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "a6989473",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"I don't know and probably need more information, as the provided transcript does not mention the aspect of stars.\""
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
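"# GoogleGenerativeAI.invoke returns a plain string, so the text can be displayed directly\n",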
"response.content"
]
},
{
"cell_type": "markdown",
"id": "5299ae94",
"metadata": {},
"source": [
"## Build chain."
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "cf902ef7",
"metadata": {},
"outputs": [],
"source": [
"# import libraries for chain building\n",
"from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda\n",
"from langchain_core.output_parsers import StrOutputParser"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
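"# helper used inside the chain: flattens a list of retrieved Documents into one context string\n",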
"def reformat_doc(retrieved_documents):\n",
"    content_texts = \"\\n\\n\".join(document.page_content for document in retrieved_documents)\n",
"    return content_texts"
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "834931d5",
"metadata": {},
"outputs": [],
"source": [
"parallel_chain = RunnableParallel({\n",
|
| 1141 |
+
" \"context\": retriever | RunnableLambda(reformat_doc),\n",
|
| 1142 |
+
" \"question\": RunnablePassthrough()\n",
|
| 1143 |
+
"}\n",
|
| 1144 |
+
")"
|
| 1145 |
+
]
|
| 1146 |
+
},
|
| 1147 |
+
{
|
| 1148 |
+
"cell_type": "code",
|
| 1149 |
+
"execution_count": 59,
|
| 1150 |
+
"id": "91125159",
|
| 1151 |
+
"metadata": {},
|
| 1152 |
+
"outputs": [
|
| 1153 |
+
{
|
| 1154 |
+
"data": {
|
| 1155 |
+
"text/plain": [
|
| 1156 |
+
"{'context': 'as “ Remote Sensing ” , “ Earth observation ” , “ Landsat ” , “ Sentinel ” , or \\n“ MODIS ” in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( “ Cropland Data Layer ” OR “ CDL ” ) AND (WC = “ Remote Sensing ” \\nOR ALL = ( “ Remote Sensing ” OR “ Earth observation ” OR “ Landsat ” OR \\n“ Sentinel ” OR “ MODIS ” )) AND DT = “ Article ” , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we\\n\\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we \\nmanually applied the three exclusion criteria to exclude publications \\nwhere the full term “ CDL ” was not related to “ Cropland Data Layer ” , \\nstudies that did not use remote sensing data, and any review articles. \\nThese exclusion criteria were essential for ensuring the reliability of our \\nselection results and for eliminating any irrelevant literature. The \\nliterature selection process from the CDL citations on the USDA NASS \\nwebsite adheres to the same inclusion and exclusion criteria. The \\neligible documents were combined with the screening results of WoS \\ndatabase, and any duplicate records were removed.\\n3.3. Results\\nThe result of the literature screening process is illustrated in Fig. 4 . \\nApplying the inclusion criteria, we screened 162 and 43 articles from the \\nWoS database and the USDA NASS CDL website, respectively. We then\\n\\npreliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe article’s title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description\\n\\ncompag.2022.106866 .\\nDanielson, P., Yang, L., Jin, S., Homer, C., Napton, D., 2016. An assessment of the \\ncultivated cropland class of NLCD 2006 using a multi-source and multi-criteria \\napproach. Remote Sens 8, 101. 
https://doi.org/10.3390/rs8020101 .\\nDefourny, P., Bontemps, S., Bellemans, N., Cara, C., Dedieu, G., Guzzonato, E., \\nHagolle, O., Inglada, J., Nicola, L., Rabaute, T., Savinaud, M., Udroiu, C., Valero, S., \\nB ´egu ´e, A., Dejoux, J.-F., El Harti, A., Ezzahar, J., Kussul, N., Labbassi, K., \\nLebourgeois, V., Miao, Z., Newby, T., Nyamugama, A., Salh, N., Shelestov, A., \\nSimonneaux, V., Traore, P.S., Traore, S.S., Koetz, B., 2019. Near real-time agriculture \\nmonitoring at national scale at parcel resolution: performance assessment of the \\nSen2-Agri automated system in various cropping systems around the world. Remote \\nSens. Environ. 221, 551 – 568. https://doi.org/10.1016/j.rse.2018.11.007 .\\n\\nCRediT authorship contribution statement\\nChen Zhang: Writing – original draft, Project administration, \\nMethodology, Conceptualization. Hannah Kerner: Writing – original \\ndraft. Sherrie Wang: Writing – original draft. Pengyu Hao: Writing – \\noriginal draft. Zhe Li: Writing – original draft. Kevin A. Hunt: Writing – \\noriginal draft. Jonathon Abernethy: Writing – original draft. Haoteng \\nZhao: Writing – original draft. Feng Gao: Writing – original draft. \\nLiping Di: Writing – review & editing, Supervision, Funding acquisition. \\nClaire Guo: Writing – review & editing, Validation, Investigation. Ziao \\nLiu: Writing – review & editing, Investigation. Zhengwei Yang: Writing \\n– review & editing, Resources. Rick Mueller: Writing – review & edit -\\ning, Resources. Claire Boryan: Writing – review & editing, Resources. \\nQi Chen: Writing – review & editing, Resources. Peter C. Beeson: \\nWriting – review & editing, Resources. Hankui K. Zhang: Writing –',\n",
|
| 1157 |
+
" 'question': 'Quickly and briefly summarize the document'}"
|
| 1158 |
+
]
|
| 1159 |
+
},
|
| 1160 |
+
"execution_count": 59,
|
| 1161 |
+
"metadata": {},
|
| 1162 |
+
"output_type": "execute_result"
|
| 1163 |
+
}
|
| 1164 |
+
],
|
| 1165 |
+
"source": [
|
| 1166 |
+
"parallel_chain.invoke('Quickly and briefly summarize the document')"
|
| 1167 |
+
]
|
| 1168 |
+
},
|
| 1169 |
+
{
|
| 1170 |
+
"cell_type": "code",
|
| 1171 |
+
"execution_count": 60,
|
| 1172 |
+
"id": "9fa7a8aa",
|
| 1173 |
+
"metadata": {},
|
| 1174 |
+
"outputs": [],
|
| 1175 |
+
"source": [
|
| 1176 |
+
"parse = StrOutputParser()"
|
| 1177 |
+
]
|
| 1178 |
+
},
|
| 1179 |
+
{
|
| 1180 |
+
"cell_type": "code",
|
| 1181 |
+
"execution_count": 61,
|
| 1182 |
+
"id": "f16763a7",
|
| 1183 |
+
"metadata": {},
|
| 1184 |
+
"outputs": [],
|
| 1185 |
+
"source": [
|
| 1186 |
+
"main_chain = parallel_chain | prompt | LLM | parse"
|
| 1187 |
+
]
|
| 1188 |
+
},
|
| 1189 |
+
{
|
| 1190 |
+
"cell_type": "code",
|
| 1191 |
+
"execution_count": 62,
|
| 1192 |
+
"id": "8a92eb28",
|
| 1193 |
+
"metadata": {},
|
| 1194 |
+
"outputs": [
|
| 1195 |
+
{
|
| 1196 |
+
"name": "stdout",
|
| 1197 |
+
"output_type": "stream",
|
| 1198 |
+
"text": [
|
| 1199 |
+
"This document outlines a methodology for selecting relevant literature on \"Cropland Data Layer\" (CDL) within the remote sensing field. It details specific inclusion and exclusion criteria, keywords used for searching (\"Remote Sensing\", \"Earth observation\", \"Landsat\", \"Sentinel\", \"MODIS\", \"Cropland Data Layer\", \"CDL\"), and the databases utilized (Web of Science Core Collection and USDA NASS CDL website). The process involved screening, manually applying exclusion criteria, and removing duplicate records, ultimately identifying a specific number of articles from each source.\n"
|
| 1200 |
+
]
|
| 1201 |
+
}
|
| 1202 |
+
],
|
| 1203 |
+
"source": [
|
| 1204 |
+
"print(main_chain.invoke(\"Quickly and briefly summarize the document\"))"
|
| 1205 |
+
]
|
| 1206 |
+
},
|
| 1207 |
+
{
|
| 1208 |
+
"cell_type": "code",
|
| 1209 |
+
"execution_count": 63,
|
| 1210 |
+
"id": "7133b4da",
|
| 1211 |
+
"metadata": {},
|
| 1212 |
+
"outputs": [
|
| 1213 |
+
{
|
| 1214 |
+
"name": "stdout",
|
| 1215 |
+
"output_type": "stream",
|
| 1216 |
+
"text": [
|
| 1217 |
+
"* The document lists numerous authors and their specific contributions to the work, including writing, project administration, methodology, supervision, funding acquisition, validation, and providing resources.\n",
|
| 1218 |
+
"* It describes the methodology for screening qualified publications related to the Cropland Data Layer (CDL) in the remote sensing field.\n",
|
| 1219 |
+
"* The literature screening used the Web of Science (WoS) Core Collection and the USDA NASS website.\n",
|
| 1220 |
+
"* Inclusion criteria required specific keywords related to \"Cropland Data Layer\" or \"CDL\" and remote sensing terms (e.g., \"Remote Sensing\", \"Landsat\", \"MODIS\") in the title, abstract, or keywords, focusing on peer-reviewed articles.\n",
|
| 1221 |
+
"* Exclusion criteria were applied to remove irrelevant publications, such as those where \"CDL\" was not related to \"Cropland Data Layer,\" studies not using remote sensing data, or review articles.\n",
|
| 1222 |
+
"* The initial screening process yielded 162 articles from the WoS database and 43 from the USDA NASS CDL website.\n"
|
| 1223 |
+
]
|
| 1224 |
+
}
|
| 1225 |
+
],
|
| 1226 |
+
"source": [
|
| 1227 |
+
"print(main_chain.invoke(\"Quickly and briefly summarize the document. Put them in bullet format now.\"))"
|
| 1228 |
+
]
|
| 1229 |
+
}
|
| 1230 |
+
],
|
| 1231 |
+
"metadata": {
|
| 1232 |
+
"kernelspec": {
|
| 1233 |
+
"display_name": "Python 3",
|
| 1234 |
+
"language": "python",
|
| 1235 |
+
"name": "python3"
|
| 1236 |
+
},
|
| 1237 |
+
"language_info": {
|
| 1238 |
+
"codemirror_mode": {
|
| 1239 |
+
"name": "ipython",
|
| 1240 |
+
"version": 3
|
| 1241 |
+
},
|
| 1242 |
+
"file_extension": ".py",
|
| 1243 |
+
"mimetype": "text/x-python",
|
| 1244 |
+
"name": "python",
|
| 1245 |
+
"nbconvert_exporter": "python",
|
| 1246 |
+
"pygments_lexer": "ipython3",
|
| 1247 |
+
"version": "3.12.3"
|
| 1248 |
+
}
|
| 1249 |
+
},
|
| 1250 |
+
"nbformat": 4,
|
| 1251 |
+
"nbformat_minor": 5
|
| 1252 |
+
}
|
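Taken together, the cells above assemble the standard LCEL retrieval pipeline: fetch chunks, flatten them into a context string, fill the prompt, call the model, parse the text. For reference, here is the same pipeline as a single standalone sketch. The `retriever`, `prompt`, and `LLM` objects are defined in earlier notebook cells not repeated here, so the prompt wording and model id below are illustrative assumptions, not the notebook's exact values.

```python
# Standalone sketch of the notebook's chain; assumed details are marked below.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough
from langchain_google_genai import ChatGoogleGenerativeAI


def reformat_doc(retrieved_documents):
    # Join the page contents of the retrieved chunks into one context block.
    return "\n\n".join(doc.page_content for doc in retrieved_documents)


def build_main_chain(retriever):
    # ASSUMPTION: prompt text and model id are illustrative, not the notebook's own.
    prompt = PromptTemplate(
        template=(
            "Answer only from the context below. If the answer is not there, "
            "say you don't know.\n\nContext:\n{context}\n\nQuestion: {question}"
        ),
        input_variables=["context", "question"],
    )
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")  # needs GOOGLE_API_KEY set
    parallel = RunnableParallel({
        "context": retriever | RunnableLambda(reformat_doc),
        "question": RunnablePassthrough(),
    })
    return parallel | prompt | llm | StrOutputParser()

# Usage: answer = build_main_chain(retriever).invoke("Quickly and briefly summarize the document")
```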
requirements.txt
ADDED
@@ -0,0 +1,15 @@
Flask
gunicorn
langchain
langchain-core
langchain-community
langchain-huggingface
langchain-google-genai
langchain-text-splitters
sentence-transformers
transformers
huggingface-hub
torch
faiss-cpu
pypdf
python-dotenv
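As a quick sanity check of this dependency split (LangChain's loaders, splitters, embeddings, and vector stores now live in separate packages), the imports below should all resolve after `pip install -r requirements.txt`. Which of these classes the app actually uses is defined in chatbot.py and flask_app.py; this is only an illustrative package-to-import mapping.

```python
# Illustrative import smoke test for the requirement set above.
from langchain_community.document_loaders import PyPDFLoader, TextLoader  # pypdf backs PDF parsing
from langchain_text_splitters import RecursiveCharacterTextSplitter       # langchain-text-splitters
from langchain_huggingface import HuggingFaceEmbeddings                   # sentence-transformers + torch underneath
from langchain_community.vectorstores import FAISS                        # faiss-cpu
from langchain_google_genai import ChatGoogleGenerativeAI                 # Gemini client
from dotenv import load_dotenv                                            # python-dotenv

print("Core RAG dependencies import cleanly.")
```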
static/scripts.js
ADDED
@@ -0,0 +1,285 @@
// Script for handling file uploads and chat interactions.


// DOM elements
const dropZone = document.getElementById('drop-zone');
const fileInput = document.getElementById('file-input');
const uploadContent = document.getElementById('upload-content');
const uploadIcon = document.getElementById('upload-icon');
const chatMessages = document.getElementById('chat-messages');
const chatInput = document.getElementById('chat-input');
const sendBtn = document.getElementById('send-btn');
const chatStatus = document.getElementById('chat-status');
const apiKeyInput = document.getElementById('api-key-input');
const saveApiKeyBtn = document.getElementById('save-api-key-btn');
const helpBtn = document.getElementById('help-btn');
const helpModal = document.getElementById('help-modal');
const closeHelpBtn = document.getElementById('close-help-btn');


// state variables
let uploadedFileName = null;
let messages = [];


// notification popups section
function showToast(message, type = 'success') {
    const toast = document.createElement('div');
    toast.className = `toast ${type}`;
    toast.innerHTML = `
        <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
            ${type === 'success'
                ? '<polyline points="20 6 9 17 4 12"/>'
                : '<circle cx="12" cy="12" r="10"/><line x1="12" y1="8" x2="12" y2="12"/><line x1="12" y1="16" x2="12.01" y2="16"/>'}
        </svg>
        <span>${message}</span>
    `;
    document.body.appendChild(toast);
    setTimeout(() => toast.remove(), 3000);
}


// apiKey Handling section
function saveApiKey() {
    const key = apiKeyInput.value.trim();
    if (!key) {
        showToast('Please enter a valid Gemini API key', 'error');
        return;
    }
    localStorage.setItem('gemini_api_key', key);
    showToast('Gemini API key saved successfully!');
}

if (saveApiKeyBtn) saveApiKeyBtn.addEventListener('click', saveApiKey);

window.addEventListener('DOMContentLoaded', () => {
    const savedKey = localStorage.getItem('gemini_api_key');
    if (savedKey && apiKeyInput) {
        apiKeyInput.value = savedKey;
    }
});


// api key help handling
if (helpBtn && helpModal && closeHelpBtn) {
    helpBtn.addEventListener('click', () => {
        helpModal.style.display = 'flex';
    });

    closeHelpBtn.addEventListener('click', () => {
        helpModal.style.display = 'none';
    });

    helpModal.addEventListener('click', (e) => {
        if (e.target === helpModal) helpModal.style.display = 'none';
    });
}


// File Upload Handling section
dropZone.addEventListener('dragover', (e) => {
    e.preventDefault();
    dropZone.classList.add('drag-over');
});

dropZone.addEventListener('dragleave', () => {
    dropZone.classList.remove('drag-over');
});

dropZone.addEventListener('drop', (e) => {
    e.preventDefault();
    dropZone.classList.remove('drag-over');
    if (e.dataTransfer.files.length > 0) {
        handleFile(e.dataTransfer.files[0]);
    }
});

fileInput.addEventListener('change', (e) => {
    if (e.target.files.length > 0) {
        handleFile(e.target.files[0]);
    }
});

async function handleFile(file) {
    const apiKey = localStorage.getItem('gemini_api_key');
    if (!apiKey) {
        showToast('Please enter and save your Gemini API key before uploading.', 'error');
        return;
    }

    const validTypes = ['application/pdf', 'text/plain'];
    if (!validTypes.includes(file.type)) {
        showToast('Please upload a PDF or TXT file', 'error');
        return;
    }

    uploadContent.innerHTML = '<div class="spinner"></div><p style="font-weight: 600;">Please hold, your document is being processed...</p>';

    const formData = new FormData();
    formData.append('file', file);
    formData.append('apiKey', apiKey);

    try {
        const response = await fetch(`/upload`, {
            method: 'POST',
            body: formData
        });

        const result = await response.text();

        if (!response.ok) throw new Error(result || 'Upload failed');

        uploadedFileName = file.name;
        dropZone.classList.add('uploaded');
        uploadIcon.innerHTML = '<circle cx="12" cy="12" r="10"/><path d="M12 6v6m0 0v6m0-6h6m-6 0H6"/>';
        uploadIcon.style.color = 'var(--accent)';
        uploadContent.innerHTML = `
            <svg xmlns="http://www.w3.org/2000/svg" width="48" height="48" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" style="color: var(--accent); margin-bottom: 1rem;">
                <polyline points="20 6 9 17 4 12"/>
            </svg>
            <p style="font-size: 1.125rem; font-weight: 600; color: var(--accent); margin-bottom: 0.5rem;">Document Uploaded</p>
            <p style="font-size: 0.875rem; color: var(--muted-foreground); display: flex; align-items: center; justify-content: center; gap: 0.5rem;">
                ${file.name}
            </p>
            <button class="btn btn-ghost" style="margin-top: 1rem;" onclick="resetUpload()">Upload another file</button>
        `;

        chatInput.disabled = false;
        sendBtn.disabled = false;
        chatStatus.textContent = `Chatting with: ${file.name}`;
        chatMessages.innerHTML = '';
        messages = [];

        showToast('Document processed successfully!');
    } catch (error) {
        console.error('Upload error:', error);
        showToast(error.message, 'error');
        resetUpload();
    }
}

function resetUpload() {
    uploadedFileName = null;
    dropZone.classList.remove('uploaded');
    uploadIcon.innerHTML = '<path d="M14 2H6a2 2 0 00-2 2v16a2 2 0 002 2h12a2 2 0 002-2V8z"/><polyline points="14 2 14 8 20 8"/>';
    uploadIcon.style.color = '';
    uploadContent.innerHTML = `
        <h3>Drop your file here</h3>
        <p>or</p>
        <label for="file-input">
            <button class="btn btn-gradient" onclick="document.getElementById('file-input').click(); return false;">
                Choose File
            </button>
        </label>
        <p style="margin-top: 1rem; font-size: 0.75rem;">Supports PDF and TXT files</p>
    `;
    chatInput.disabled = true;
    sendBtn.disabled = true;
    chatStatus.textContent = 'No document uploaded';
    chatMessages.innerHTML = `
        <div class="chat-empty">
            <h3>Upload a Document First</h3><br>
            <p>Please upload a PDF or TXT file to start asking questions.</p>
        </div>
    `;
    messages = [];
    fileInput.value = '';
    showToast('Ready for a new document');
}


// Chatbot handling and replies
async function sendMessage() {
    const apiKey = localStorage.getItem('gemini_api_key');
    if (!apiKey) {
        showToast('Please enter and save your Gemini API key before chatting.', 'error');
        return;
    }

    const question = chatInput.value.trim();
    if (!question || !uploadedFileName) return;

    addMessage('user', question);
    chatInput.value = '';

    const typingId = addTypingIndicator();

    try {
        const response = await fetch(`/chat`, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ question, apiKey })
        });

        const data = await response.json();

        removeTypingIndicator(typingId);

        if (!response.ok) throw new Error(data.error || 'Chat failed');
        addMessage('assistant', data.answer);
    } catch (error) {
        console.error('Chat error:', error);
        removeTypingIndicator(typingId);
        showToast(error.message, 'error');
        addMessage('assistant', 'Error: ' + error.message);
    }
}

function addMessage(role, content) {
    messages.push({ role, content });
    const messageDiv = document.createElement('div');
    messageDiv.className = `message ${role}`;
    messageDiv.innerHTML = `
        <div class="message-avatar">
            ${role === 'assistant'
                ? '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><rect x="3" y="11" width="18" height="11" rx="2" ry="2"/><path d="M7 11V7a5 5 0 0110 0v4"/></svg>'
                : '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M20 21v-2a4 4 0 00-4-4H8a4 4 0 00-4 4v2"/><circle cx="12" cy="7" r="4"/></svg>'}
        </div>
        <div class="message-content">${content}</div>
    `;
    chatMessages.appendChild(messageDiv);
    chatMessages.scrollTop = chatMessages.scrollHeight;
}

function addTypingIndicator() {
    const id = 'typing-' + Date.now();
    const typingDiv = document.createElement('div');
    typingDiv.id = id;
    typingDiv.className = 'message assistant';
    typingDiv.innerHTML = `
        <div class="message-avatar">
            <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
                <rect x="3" y="11" width="18" height="11" rx="2" ry="2"/>
                <path d="M7 11V7a5 5 0 0110 0v4"/>
            </svg>
        </div>
        <div class="message-content">
            <div class="typing-indicator">
                <div class="typing-dot"></div>
                <div class="typing-dot"></div>
                <div class="typing-dot"></div>
            </div>
        </div>
    `;
    chatMessages.appendChild(typingDiv);
    chatMessages.scrollTop = chatMessages.scrollHeight;
    return id;
}

function removeTypingIndicator(id) {
    const element = document.getElementById(id);
    if (element) element.remove();
}

sendBtn.addEventListener('click', sendMessage);
chatInput.addEventListener('keypress', (e) => {
    if (e.key === 'Enter' && !e.shiftKey) {
        e.preventDefault();
        sendMessage();
    }
});

// End
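Note the backend contract this script relies on: `/upload` takes a multipart form with `file` and `apiKey` fields and returns plain text (a non-OK body becomes the error toast), while `/chat` takes JSON `{question, apiKey}` and returns `{answer}` on success or `{error}` on failure. The real handlers are defined in flask_app.py earlier in this upload; the sketch below only restates that contract with placeholder bodies.

```python
# Sketch of the endpoints scripts.js calls; placeholder logic, NOT the repo's flask_app.py.
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/upload", methods=["POST"])
def upload():
    file = request.files.get("file")       # uploaded PDF/TXT
    api_key = request.form.get("apiKey")   # Gemini key forwarded from localStorage
    if file is None or not api_key:
        return "Missing file or API key", 400  # text body -> error toast in the UI
    # ...chunk the document and build the vector index here...
    return "Document processed", 200


@app.route("/chat", methods=["POST"])
def chat():
    data = request.get_json(silent=True) or {}
    if not data.get("question") or not data.get("apiKey"):
        return jsonify({"error": "Missing question or API key"}), 400
    # ...invoke the RAG chain here and return its answer...
    return jsonify({"answer": "..."})
```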
static/styles.css
ADDED
@@ -0,0 +1,687 @@
* {
    margin: 0;
    padding: 0;
    box-sizing: border-box;
}

:root {
    /* Dark Mode Colors */
    --background: hsl(220, 15%, 9%);
    --foreground: hsl(0, 0%, 93%);
    --card: hsl(220, 12%, 13%);
    --card-foreground: hsl(0, 0%, 96%);
    --primary: hsl(210, 90%, 60%);
    --primary-foreground: hsl(0, 0%, 100%);
    --primary-glow: hsl(210, 90%, 70%);
    --secondary: hsl(220, 12%, 17%);
    --secondary-foreground: hsl(0, 0%, 92%);
    --muted: hsl(220, 10%, 15%);
    --muted-foreground: hsl(220, 8%, 65%);
    --accent: hsl(28, 85%, 58%);
    --accent-foreground: hsl(0, 0%, 100%);
    --border: hsl(220, 10%, 22%);
    --input: hsl(220, 10%, 18%);
    --radius: 0.75rem;

    /* Gradients */
    --gradient-primary: linear-gradient(135deg, hsl(210, 90%, 60%), hsl(200, 85%, 65%));
    --gradient-accent: linear-gradient(135deg, hsl(28, 85%, 58%), hsl(210, 90%, 60%));
    --gradient-subtle: linear-gradient(180deg, hsl(220, 15%, 10%), hsl(220, 15%, 8%));
    --gradient-card: linear-gradient(135deg, hsl(220, 12%, 14%), hsl(220, 12%, 11%));

    /* Shadows */
    --shadow-elegant: 0 10px 30px -10px hsl(210, 90%, 60%, 0.3);
    --shadow-strong: 0 20px 45px -15px hsl(210, 90%, 60%, 0.35);
    --shadow-glow: 0 0 30px hsl(210, 90%, 65%, 0.3);
}

body {
    font-family: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
    background: var(--gradient-subtle);
    color: var(--foreground);
    line-height: 1.6;
    min-height: 100vh;
}

/* Header */
header {
    background: rgba(20, 20, 30, 0.85);
    backdrop-filter: blur(10px);
    border-bottom: 1px solid var(--border);
    position: sticky;
    top: 0;
    z-index: 100;
}

.header-content {
    max-width: 1200px;
    margin: 0 auto;
    padding: 1rem 1.5rem;
    display: flex;
    align-items: center;
    justify-content: space-between;
    gap: 1rem;
}

.header-left {
    display: flex;
    align-items: center;
    gap: 1rem;
}

.api-key-section {
    display: flex;
    align-items: center;
    gap: 0.5rem;
}

.api-key-input {
    padding: 0.5rem 1rem;
    border: 1px solid var(--border);
    border-radius: 0.5rem;
    background: var(--card);
    color: var(--foreground);
    font-size: 0.875rem;
    width: 16rem;
}

.api-key-input:focus {
    outline: none;
    border-color: var(--primary);
    box-shadow: 0 0 0 3px rgba(158, 109, 246, 0.2);
}

.api-key-input::placeholder {
    color: var(--muted-foreground);
}

.logo {
    width: 2.5rem;
    height: 2.5rem;
    border-radius: 0.75rem;
    background: var(--gradient-accent);
    display: flex;
    align-items: center;
    justify-content: center;
    box-shadow: var(--shadow-glow);
}

.logo svg {
    width: 1.25rem;
    height: 1.25rem;
    color: var(--primary-foreground);
}

.header-title h1 {
    font-size: 1.25rem;
    font-weight: 700;
    background: var(--gradient-primary);
    -webkit-background-clip: text;
    background-clip: text;
    -webkit-text-fill-color: transparent;
}

.header-title p {
    font-size: 0.75rem;
    color: var(--muted-foreground);
}

/* Help modal styling */
.help-modal {
    position: fixed;
    inset: 0;
    background: rgba(0, 0, 0, 0.6);
    display: flex;
    justify-content: center;
    align-items: center;
    padding-top: 10rem;
    z-index: 9999;
}

.help-content {
    background: var(--card);
    color: var(--foreground);
    padding: 1.5rem;
    border-radius: 0.75rem;
    max-width: 400px;
    width: 90%;
    text-align: left;
    box-shadow: 0 4px 12px rgba(0,0,0,0.2);
    animation: fadeIn 0.3s ease-in-out;
}

.help-content h3 {
    margin-bottom: 1rem;
    color: var(--accent);
    font-size: 1.25rem;
}

.help-content ol {
    margin-left: 1.25rem;
    margin-bottom: 1rem;
    font-size: 0.95rem;
}

.help-content a {
    color: var(--accent);
    text-decoration: underline;
}

/* Main Content */
main {
    max-width: 1400px;
    margin: 0 auto;
    padding: 2rem 1.5rem;
}

.hero {
    text-align: center;
    margin-bottom: 3rem;
    animation: fadeIn 0.6s ease-out;
}

.hero h2 {
    font-size: clamp(2rem, 5vw, 3rem);
    font-weight: 700;
    margin-bottom: 1rem;
    background: var(--gradient-primary);
    -webkit-background-clip: text;
    background-clip: text;
    -webkit-text-fill-color: transparent;
}

.hero p {
    font-size: 1.125rem;
    color: var(--muted-foreground);
    max-width: 42rem;
    margin: 0 auto;
}

.two-column {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 2rem;
    margin-bottom: 4rem;
}

@media (max-width: 768px) {
    .two-column {
        grid-template-columns: 1fr;
    }

    /* Header responsive stacking */
    .header-content {
        flex-direction: column;
        align-items: center;
        text-align: center;
        gap: 1rem;
    }

    .header-left {
        flex-direction: row;
        align-items: center;
        text-align: center;
    }

    .api-key-section {
        width: 100%;
        justify-content: center;
        flex-wrap: wrap;
    }

    .api-key-input {
        width: 90%;
        max-width: 20rem;
    }

    .header-title h1 {
        font-size: 1.5rem;
    }

    .header-title p {
        font-size: 0.875rem;
    }
}

.card {
    background: var(--gradient-card);
    border: 1px solid var(--border);
    border-radius: var(--radius);
    padding: 2rem;
    box-shadow: var(--shadow-elegant);
    animation: slideUp 0.6s ease-out;
}

/* Upload Section */
.upload-header {
    text-align: center;
    margin-bottom: 1.5rem;
}

.upload-icon {
    width: 4rem;
    height: 4rem;
    border-radius: 50%;
    background: var(--gradient-accent);
    display: inline-flex;
    align-items: center;
    justify-content: center;
    margin-bottom: 1rem;
    box-shadow: var(--shadow-glow);
}

.upload-icon svg {
    width: 2rem;
    height: 2rem;
    color: var(--primary-foreground);
}

.upload-header h2 {
    font-size: 1.5rem;
    font-weight: 700;
    margin-bottom: 0.5rem;
    background: var(--gradient-primary);
    -webkit-background-clip: text;
    background-clip: text;
    -webkit-text-fill-color: transparent;
}

.upload-header p {
    color: var(--muted-foreground);
    font-size: 0.875rem;
}

.drop-zone {
    border: 2px dashed var(--border);
    border-radius: var(--radius);
    padding: 3rem;
    text-align: center;
    transition: all 0.3s ease;
    cursor: pointer;
}

.drop-zone:hover,
.drop-zone.drag-over {
    border-color: var(--primary);
    background: rgba(158, 109, 246, 0.05);
    box-shadow: var(--shadow-glow);
}

.drop-zone.uploaded {
    background: rgba(91, 218, 252, 0.1);
    border-color: var(--accent);
}

.drop-zone svg {
    width: 3rem;
    height: 3rem;
    color: var(--muted-foreground);
    margin-bottom: 1rem;
}

.drop-zone.uploaded svg {
    color: var(--accent);
}

.drop-zone h3 {
    font-size: 1.125rem;
    font-weight: 600;
    margin-bottom: 0.5rem;
}

.drop-zone p {
    font-size: 0.875rem;
    color: var(--muted-foreground);
    margin-bottom: 1rem;
}

.btn {
    display: inline-flex;
    align-items: center;
    gap: 0.5rem;
    padding: 0.625rem 1.5rem;
    border-radius: 0.5rem;
    font-weight: 500;
    font-size: 0.875rem;
    cursor: pointer;
    border: none;
    transition: all 0.3s ease;
}

.btn-gradient {
    background: var(--gradient-accent);
    color: var(--primary-foreground);
    box-shadow: var(--shadow-strong);
}

.btn-gradient:hover {
    box-shadow: var(--shadow-glow);
    transform: translateY(-2px);
}

.btn-ghost {
    background: transparent;
    color: var(--foreground);
}

.btn-ghost:hover {
    background: var(--secondary);
}

.btn svg {
    width: 1rem;
    height: 1rem;
}

/* Chat Section */
.chat-container {
    display: flex;
    flex-direction: column;
    height: 37.5rem;
}

.chat-header {
    padding: 1rem;
    border-bottom: 1px solid var(--border);
    background: var(--gradient-subtle);
    border-radius: var(--radius) var(--radius) 0 0;
    display: flex;
    align-items: center;
    gap: 0.75rem;
}

.chat-avatar {
    width: 2.5rem;
    height: 2.5rem;
    border-radius: 50%;
    background: var(--gradient-accent);
    display: flex;
    align-items: center;
    justify-content: center;
    box-shadow: var(--shadow-glow);
}

.chat-avatar svg {
    width: 1.25rem;
    height: 1.25rem;
    color: var(--primary-foreground);
}

.chat-info h3 {
    font-weight: 600;
    font-size: 0.9375rem;
}

.chat-info p {
    font-size: 0.75rem;
    color: var(--muted-foreground);
}

.chat-messages {
    flex: 1;
    overflow-y: auto;
    padding: 1rem;
}

.chat-empty {
    display: flex;
    align-items: center;
    justify-content: center;
    height: 100%;
    text-align: center;
    animation: slideUp 0.6s ease-out;
}

.chat-empty svg {
    width: 3rem;
    height: 3rem;
    color: var(--muted-foreground);
    margin-bottom: 1rem;
}

.chat-empty h3 {
    font-size: 1.125rem;
    font-weight: 600;
    margin-bottom: 0.5rem;
}

.chat-empty p {
    font-size: 0.875rem;
    color: var(--muted-foreground);
    max-width: 20rem;
}

.message {
    display: flex;
    gap: 0.75rem;
    margin-bottom: 1rem;
    animation: slideUp 0.4s ease-out;
}

.message.user {
    flex-direction: row-reverse;
}

.message-avatar {
    width: 2rem;
    height: 2rem;
    border-radius: 50%;
    flex-shrink: 0;
    display: flex;
    align-items: center;
    justify-content: center;
    box-shadow: var(--shadow-elegant);
}

.message.assistant .message-avatar {
    background: var(--gradient-accent);
}

.message.user .message-avatar {
    background: var(--accent);
}

.message-avatar svg {
    width: 1rem;
    height: 1rem;
    color: white;
}

.message-content {
    max-width: 80%;
    padding: 1rem;
    border-radius: var(--radius);
    font-size: 0.875rem;
    line-height: 1.5;
}

.message.assistant .message-content {
    background: var(--secondary);
}

.message.user .message-content {
    background: var(--primary);
    color: var(--primary-foreground);
    box-shadow: var(--shadow-elegant);
}

.typing-indicator {
    display: flex;
    gap: 0.25rem;
    padding: 1rem;
}

.typing-dot {
    width: 0.5rem;
    height: 0.5rem;
    background: var(--primary);
    border-radius: 50%;
    animation: bounce 1.4s infinite ease-in-out both;
}

.typing-dot:nth-child(1) { animation-delay: -0.5s; }
.typing-dot:nth-child(2) { animation-delay: -0.25s; }

.chat-input {
    padding: 1rem;
    border-top: 1px solid var(--border);
    background: var(--gradient-subtle);
    border-radius: 0 0 var(--radius) var(--radius);
    display: flex;
    gap: 0.5rem;
}

.chat-input input {
    flex: 1;
    padding: 0.625rem 1rem;
    border: 1px solid var(--input);
    border-radius: 0.5rem;
    font-size: 0.875rem;
    background: var(--card);
    color: var(--foreground);
}

.chat-input input:focus {
    outline: none;
    border-color: var(--primary);
    box-shadow: 0 0 0 3px rgba(158, 109, 246, 0.1);
}

.btn-icon {
    width: 2.5rem;
    height: 2.5rem;
    padding: 0;
    display: flex;
    align-items: center;
    justify-content: center;
}

/* Features */
.features {
    display: grid;
    grid-template-columns: repeat(auto-fit, minmax(15rem, 1fr));
    gap: 1.5rem;
    margin-top: 4rem;
    animation: fadeIn 0.6s ease-out 0.3s both;
}

.feature-card {
    text-align: center;
    padding: 1.5rem;
    background: var(--card);
    border-radius: var(--radius);
    box-shadow: var(--shadow-elegant);
}

.feature-icon {
    width: 3rem;
    height: 3rem;
    border-radius: 50%;
    background: var(--gradient-accent);
    display: inline-flex;
    align-items: center;
    justify-content: center;
    margin-bottom: 1rem;
    box-shadow: var(--shadow-glow);
}

.feature-icon svg {
    width: 1.5rem;
    height: 1.5rem;
    color: var(--primary-foreground);
}

.feature-card h3 {
    font-weight: 600;
    margin-bottom: 0.5rem;
}

.feature-card p {
    font-size: 0.875rem;
    color: var(--muted-foreground);
}

/* Footer */
footer {
    border-top: 1px solid var(--border);
    margin-top: 4rem;
    padding: 1.5rem;
    text-align: center;
    font-size: 0.875rem;
    color: var(--muted-foreground);
}

/* Animations */
@keyframes fadeIn {
    from {
        opacity: 0;
        transform: translateY(10px);
    }
    to {
        opacity: 1;
        transform: translateY(0);
    }
}

@keyframes slideUp {
    from {
        opacity: 0;
        transform: translateY(20px);
    }
    to {
        opacity: 1;
        transform: translateY(0);
    }
}

@keyframes bounce {
    0%, 80%, 100% {
        transform: scale(0);
    }
    40% {
        transform: scale(1);
    }
}

@keyframes spin {
    from { transform: rotate(0deg); }
    to { transform: rotate(360deg); }
}

.spinner {
    width: 3rem;
    height: 3rem;
    border: 4px solid var(--primary);
    border-top-color: transparent;
    border-radius: 50%;
    animation: spin 1s linear infinite;
    margin: 0 auto 1rem;
}

/* Toast notifications */
.toast {
    position: fixed;
    bottom: 2rem;
    right: 2rem;
    background: var(--card);
    padding: 1rem 1.5rem;
    border-radius: var(--radius);
    box-shadow: var(--shadow-strong);
    display: flex;
    align-items: center;
    gap: 0.75rem;
    animation: slideUp 0.3s ease-out;
    z-index: 1000;
}

.toast.success {
    border-left: 4px solid var(--accent);
}

.toast.error {
    border-left: 4px solid hsl(0, 84%, 60%);
}

#file-input {
    display: none;
}
templates/chat_page.html
ADDED
@@ -0,0 +1,196 @@
<!DOCTYPE html>

<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Gemini RAG Chatbot - AI Document Intelligence</title>
    <meta name="description" content="Upload documents and chat with AI. Get instant answers from your PDFs and text files using advanced RAG technology powered by Gemini AI.">
    <link rel="stylesheet" href="/static/styles.css">
    <script src="/static/scripts.js" defer></script>
</head>


<body>
    <!-- Header -->
    <header>
        <div class="header-content">
            <div class="header-left">
                <div class="logo">
                    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
                        <path d="M9.663 17h4.673M12 3v1m6.364 1.636l-.707.707M21 12h-1M4 12H3m3.343-5.657l-.707-.707m2.828 9.9a5 5 0 117.072 0l-.548.547A3.374 3.374 0 0014 18.469V19a2 2 0 11-4 0v-.531c0-.895-.356-1.754-.988-2.386l-.548-.547z"/>
                    </svg>
                </div>
                <div class="header-title">
                    <h1>Gemini RAG Chatbot</h1>
                    <p>AI-Powered Document Intelligence</p>
                </div>
            </div>
            <div class="api-key-section">

                <!-- api key info -->
                <button id="help-btn" class="btn btn-ghost" title="How to get your API key">
                    <svg xmlns="http://www.w3.org/2000/svg" width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
                        <circle cx="12" cy="12" r="10"/>
                        <path d="M9.09 9a3 3 0 015.82 1c0 2-3 3-3 3"/>
                        <line x1="12" y1="17" x2="12" y2="17"/>
                    </svg>
                </button>

                <div style="display: flex; align-items: center; gap: 8px;">
                    <input type="password" id="api-key-input" class="api-key-input" placeholder="Enter your Gemini API Key here">

                    <button class="btn btn-gradient" id="save-api-key-btn" onclick="saveApiKey()">
                        <svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
                            <path d="M19 21H5a2 2 0 01-2-2V5a2 2 0 012-2h11l5 5v11a2 2 0 01-2 2z"/>
                            <polyline points="17 21 17 13 7 13 7 21"/>
                            <polyline points="7 3 7 8 15 8"/>
                        </svg>
                        Save
                    </button>
                </div>
            </div>

            <!-- how to get api-key, help -->
            <div id="help-modal" class="help-modal" style="display: none;">
                <div class="help-content">
                    <h3>How to Get Your Gemini API Key</h3>
                    <ol>
                        <li>Go to <a href="https://aistudio.google.com/app/apikey" target="_blank">Google AI Studio</a>.</li>
                        <li>Sign in with your Google account.</li>
                        <li>Click <b>“Create API Key”</b>.</li>
                        <li>Copy your key and paste it into the input box here.</li>
                    </ol>
                    <p><b>Note:</b> Your API key is stored locally on your device only, nowhere else.</p>
                    <button class="btn btn-gradient" id="close-help-btn">Got it</button>
                </div>
            </div>

        </div>
    </header>


    <!-- Main content -->
    <main>
        <!-- Hero section -->
        <div class="hero">
            <h2>Chat with Your Documents</h2>
            <p>Upload any PDF or text document and ask questions. The AI-powered RAG system will help you find answers instantly based on your uploaded file.</p>
        </div>

        <!-- Two Column Layout -->
        <div class="two-column">
            <!-- Upload section -->
            <div id="upload-section" class="card upload-card" style="animation-delay: 100ms;">
                <div class="upload-header">
                    <div class="upload-icon">
                        <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
                            <path d="M21 15v4a2 2 0 01-2 2H5a2 2 0 01-2-2v-4M17 8l-5-5-5 5M12 3v12"/>
                        </svg>
                    </div>
                    <h2>Upload Document</h2><br>
                    <p>Upload a PDF or TXT file to start chatting with your document</p>
                </div>

                <div id="drop-zone" class="drop-zone">
                    <svg id="upload-icon" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
                        <path d="M14 2H6a2 2 0 00-2 2v16a2 2 0 002 2h12a2 2 0 002-2V8z"/>
                        <polyline points="14 2 14 8 20 8"/>
                        <line x1="16" y1="13" x2="8" y2="13"/>
                        <line x1="16" y1="17" x2="8" y2="17"/>
                        <polyline points="10 9 9 9 8 9"/>
                    </svg>
                    <div id="upload-content">
                        <h3>Drop your file here</h3>
                        <p>or</p>
                        <label for="file-input">
                            <button class="btn btn-gradient" onclick="document.getElementById('file-input').click(); return false;">
                                <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
                                    <path d="M21 15v4a2 2 0 01-2 2H5a2 2 0 01-2-2v-4M17 8l-5-5-5 5M12 3v12"/>
                                </svg>
                                Choose File
                            </button>
                        </label>
                        <p style="margin-top: 1rem; font-size: 0.75rem;">Supports PDF and TXT files</p>
                    </div>
                </div>
                <input type="file" id="file-input" accept=".pdf,.txt">
            </div>

            <!-- Chat section -->
            <div id="chat-section" class="card chat-card" style="animation-delay: 200ms;">
                <div class="chat-container">
                    <div class="chat-header">
                        <div class="chat-avatar">
                            <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
                                <rect x="3" y="11" width="18" height="11" rx="2" ry="2"/>
                                <path d="M7 11V7a5 5 0 0110 0v4"/>
                            </svg>
                        </div>
                        <div class="chat-info">
                            <h3>RAG Assistant</h3>
                            <p id="chat-status">No document uploaded</p>
                        </div>
                    </div>

                    <div id="chat-messages" class="chat-messages">
                        <div class="chat-empty">
                            <div>
                                <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
                                    <rect x="3" y="11" width="18" height="11" rx="2" ry="2"/>
                                    <path d="M7 11V7a5 5 0 0110 0v4"/>
                                </svg>
                                <h3>Upload a Document First</h3>
                                <p>Please upload a PDF or TXT file to start asking questions</p>
                            </div>
                        </div>
                    </div>

                    <div class="chat-input">
                        <input type="text" id="chat-input" placeholder="Ask a question about your document..." disabled>
                        <button class="btn btn-gradient btn-icon" id="send-btn" disabled>
                            <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
                                <line x1="22" y1="2" x2="11" y2="13"/>
                                <polygon points="22 2 15 22 11 13 2 9 22 2"/>
                            </svg>
                        </button>
                    </div>
                </div>
            </div>
        </div>

        <!-- features display -->
        <div class="features">
            <div class="feature-card">
                <div class="feature-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
                        <path d="M14 2H6a2 2 0 00-2 2v16a2 2 0 002 2h12a2 2 0 002-2V8z"/>
                        <polyline points="14 2 14 8 20 8"/>
                        <line x1="16" y1="13" x2="8" y2="13"/>
                        <line x1="16" y1="17" x2="8" y2="17"/>
                    </svg>
                </div>
                <h3>Multiple Formats</h3>
                <p>Support for PDF and TXT documents with seamless processing.</p>
            </div>

            <div class="feature-card">
                <div class="feature-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
                        <polyline points="13 17 18 12 13 7"/>
                        <polyline points="6 17 11 12 6 7"/>
                    </svg>
                </div>
                <h3>Instant Answers</h3>
                <p>Get accurate, context-aware responses powered by Gemini AI.</p>
            </div>
        </div>
    </main>

    <!-- Footer -->
    <footer>
        <p>Powered by Gemini AI • Built by <a href="https://github.com/Wills17" target="_blank" style="color: var(--accent);">Wills17</a></p>
    </footer>

</body>
</html>