Prathamesh Sutar committed on
Commit
49e67a8
·
1 Parent(s): 6675816

Initial deployment of Pravaah Ocean Hazard Detection System

Files changed (14)
  1. .gitattributes +0 -35
  2. .gitignore +53 -0
  3. DEPLOYMENT_GUIDE.md +189 -0
  4. Dockerfile +48 -0
  5. README.md +139 -6
  6. api.py +252 -0
  7. app.py +197 -0
  8. classifier.py +52 -0
  9. ner.py +125 -0
  10. pg_db.py +133 -0
  11. requirements.txt +29 -0
  12. scraper.py +92 -0
  13. sentiment.py +59 -0
  14. translate.py +69 -0
.gitattributes DELETED
@@ -1,35 +0,0 @@
1
- *.7z filter=lfs diff=lfs merge=lfs -text
2
- *.arrow filter=lfs diff=lfs merge=lfs -text
3
- *.bin filter=lfs diff=lfs merge=lfs -text
4
- *.bz2 filter=lfs diff=lfs merge=lfs -text
5
- *.ckpt filter=lfs diff=lfs merge=lfs -text
6
- *.ftz filter=lfs diff=lfs merge=lfs -text
7
- *.gz filter=lfs diff=lfs merge=lfs -text
8
- *.h5 filter=lfs diff=lfs merge=lfs -text
9
- *.joblib filter=lfs diff=lfs merge=lfs -text
10
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
- *.model filter=lfs diff=lfs merge=lfs -text
13
- *.msgpack filter=lfs diff=lfs merge=lfs -text
14
- *.npy filter=lfs diff=lfs merge=lfs -text
15
- *.npz filter=lfs diff=lfs merge=lfs -text
16
- *.onnx filter=lfs diff=lfs merge=lfs -text
17
- *.ot filter=lfs diff=lfs merge=lfs -text
18
- *.parquet filter=lfs diff=lfs merge=lfs -text
19
- *.pb filter=lfs diff=lfs merge=lfs -text
20
- *.pickle filter=lfs diff=lfs merge=lfs -text
21
- *.pkl filter=lfs diff=lfs merge=lfs -text
22
- *.pt filter=lfs diff=lfs merge=lfs -text
23
- *.pth filter=lfs diff=lfs merge=lfs -text
24
- *.rar filter=lfs diff=lfs merge=lfs -text
25
- *.safetensors filter=lfs diff=lfs merge=lfs -text
26
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
- *.tar.* filter=lfs diff=lfs merge=lfs -text
28
- *.tar filter=lfs diff=lfs merge=lfs -text
29
- *.tflite filter=lfs diff=lfs merge=lfs -text
30
- *.tgz filter=lfs diff=lfs merge=lfs -text
31
- *.wasm filter=lfs diff=lfs merge=lfs -text
32
- *.xz filter=lfs diff=lfs merge=lfs -text
33
- *.zip filter=lfs diff=lfs merge=lfs -text
34
- *.zst filter=lfs diff=lfs merge=lfs -text
35
- *tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,53 @@
1
+ # Environment variables
2
+ .env
3
+ .env.local
4
+ .env.production
5
+
6
+ # Python
7
+ __pycache__/
8
+ *.py[cod]
9
+ *$py.class
10
+ *.so
11
+ .Python
12
+ build/
13
+ develop-eggs/
14
+ dist/
15
+ downloads/
16
+ eggs/
17
+ .eggs/
18
+ lib/
19
+ lib64/
20
+ parts/
21
+ sdist/
22
+ var/
23
+ wheels/
24
+ *.egg-info/
25
+ .installed.cfg
26
+ *.egg
27
+
28
+ # Virtual environments
29
+ venv/
30
+ env/
31
+ ENV/
32
+ class/
33
+
34
+ # IDE
35
+ .vscode/
36
+ .idea/
37
+ *.swp
38
+ *.swo
39
+
40
+ # OS
41
+ .DS_Store
42
+ Thumbs.db
43
+
44
+ # Logs
45
+ *.log
46
+
47
+ # Database
48
+ *.db
49
+ *.sqlite3
50
+
51
+ # Model cache (optional - uncomment if you want to ignore cached models)
52
+ # .cache/
53
+ # models/
DEPLOYMENT_GUIDE.md ADDED
@@ -0,0 +1,189 @@
1
+ # 🚀 Pravaah Deployment Guide
2
+
3
+ This guide will help you deploy the Pravaah Ocean Hazard Detection System with both FastAPI and Gradio interfaces.
4
+
5
+ ## 📁 **Files in Pravaah Folder**
6
+
7
+ ### **Core Application Files:**
8
+ - **`app.py`** - Gradio web interface (Port 7860)
9
+ - **`api.py`** - FastAPI REST API (Port 8000)
10
+ - **`Dockerfile`** - Docker configuration for both services
11
+ - **`requirements.txt`** - Python dependencies
12
+
13
+ ### **AI/ML Modules:**
14
+ - **`classifier.py`** - Hazard classification
15
+ - **`scraper.py`** - Twitter data fetching
16
+ - **`ner.py`** - Named Entity Recognition
17
+ - **`sentiment.py`** - Sentiment analysis
18
+ - **`translate.py`** - Translation pipeline
19
+ - **`pg_db.py`** - Database operations
20
+
21
+ ### **Documentation:**
22
+ - **`README.md`** - Project documentation
23
+ - **`DEPLOYMENT_GUIDE.md`** - This file
24
+
25
+ ## 🌐 **Services Overview**
26
+
27
+ ### **FastAPI (Port 8000)**
28
+ - **REST API** for programmatic access
29
+ - **Swagger UI** at `/docs`
30
+ - **ReDoc** at `/redoc`
31
+ - **Health checks** at `/health`
32
+
33
+ ### **Gradio (Port 7860)**
34
+ - **Web interface** for interactive use
35
+ - **Real-time analysis** with visual results
36
+ - **JSON export** functionality
37
+
38
+ ## 🚀 **Deployment Steps**
39
+
40
+ ### **1. Deploy to Hugging Face Spaces**
41
+
42
+ 1. **Create a new Space:**
43
+ - Go to [huggingface.co/spaces](https://huggingface.co/spaces)
44
+ - Choose **Docker** SDK
45
+ - Name: `pravaah-ocean-hazard-detection`
46
+
47
+ 2. **Upload files:**
48
+ - Copy all files from the `pravaah` folder
49
+ - Upload to your Space repository
50
+
51
+ 3. **Set environment variables:**
52
+ - `TWITTER_API_KEY` - Your Twitter API key
53
+ - `SUPABASE_URL` - Your Supabase connection string
54
+
55
+ 4. **Deploy:**
56
+ - Push to repository
57
+ - Monitor build logs
58
+
59
+ ### **2. Local Development**
60
+
61
+ ```bash
62
+ # Install dependencies
63
+ pip install -r requirements.txt
64
+
65
+ # Run FastAPI
66
+ uvicorn api:app --host 0.0.0.0 --port 8000
67
+
68
+ # Run Gradio (in another terminal)
69
+ python app.py
70
+ ```
71
+
72
+ ### **3. Docker Deployment**
73
+
74
+ ```bash
75
+ # Build image
76
+ docker build -t pravaah-ocean-hazard .
77
+
78
+ # Run container
79
+ docker run -p 8000:8000 -p 7860:7860 \
80
+ -e TWITTER_API_KEY=your_key \
81
+ -e SUPABASE_URL=your_url \
82
+ pravaah-ocean-hazard
83
+ ```
84
+
85
+ ## 📊 **API Endpoints**
86
+
87
+ ### **Analysis**
88
+ - **POST** `/analyze` - Analyze tweets for hazards
89
+ - **GET** `/hazardous-tweets` - Get stored hazardous tweets
90
+ - **GET** `/stats` - Get analysis statistics
91
+
92
+ ### **Health & Monitoring**
93
+ - **GET** `/health` - Health check
94
+ - **GET** `/` - Root endpoint
95
+
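For programmatic access, here is a minimal Python client sketch (an assumption-laden example, not part of the committed code: it assumes the API is reachable at `http://localhost:8000` and uses only the endpoints and response fields defined in `api.py`):

```python
import requests

BASE_URL = "http://localhost:8000"  # adjust to your deployment URL

# Trigger an analysis run (POST /analyze); model inference can take a while
resp = requests.post(f"{BASE_URL}/analyze", json={"limit": 10}, timeout=300)
resp.raise_for_status()
analysis = resp.json()
print(f"Hazardous: {analysis['hazardous_tweets']} of {analysis['total_tweets']} tweets")

# Read back stored hazardous tweets and aggregate statistics
stored = requests.get(f"{BASE_URL}/hazardous-tweets", params={"limit": 50, "offset": 0}).json()
stats = requests.get(f"{BASE_URL}/stats").json()
print(stored["count"], "stored;", stats["total_hazardous_tweets"], "total hazardous")
```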
96
+ ## 🔧 **Configuration**
97
+
98
+ ### **Environment Variables**
99
+ ```bash
100
+ # Required
101
+ TWITTER_API_KEY=your_twitter_api_key
102
+ SUPABASE_URL=postgresql://postgres:[password]@db.[project-ref].supabase.co:5432/postgres
103
+
104
+ # Optional
105
+ SUPABASE_ANON_KEY=your_anon_key
106
+ SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
107
+ ```
108
+
109
+ ### **Ports**
110
+ - **8000** - FastAPI REST API
111
+ - **7860** - Gradio Web Interface
112
+
113
+ ## 🧪 **Testing**
114
+
115
+ ### **Test API Endpoints**
116
+ ```bash
117
+ # Health check
118
+ curl http://localhost:8000/health
119
+
120
+ # Analyze tweets
121
+ curl -X POST "http://localhost:8000/analyze" \
122
+ -H "Content-Type: application/json" \
123
+ -d '{"limit": 10}'
124
+
125
+ # Get statistics
126
+ curl http://localhost:8000/stats
127
+ ```
128
+
129
+ ### **Test Web Interface**
130
+ - Open `http://localhost:7860` in browser
131
+ - Use the interactive interface to analyze tweets
132
+
133
+ ## 📈 **Monitoring**
134
+
135
+ ### **Health Checks**
136
+ - **FastAPI**: `http://localhost:8000/health`
137
+ - **Gradio**: Check if port 7860 is accessible
138
+
139
+ ### **Logs**
140
+ - Both services log to stdout
141
+ - Check Docker logs: `docker logs <container_id>`
142
+
143
+ ## 🎯 **Features**
144
+
145
+ ### **FastAPI Features**
146
+ - ✅ RESTful API endpoints
147
+ - ✅ Automatic API documentation
148
+ - ✅ Request/response validation
149
+ - ✅ Error handling
150
+ - ✅ CORS support
151
+ - ✅ Database integration
152
+
153
+ ### **Gradio Features**
154
+ - ✅ Interactive web interface
155
+ - ✅ Real-time analysis
156
+ - ✅ Visual results display
157
+ - ✅ JSON export
158
+ - ✅ User-friendly controls
159
+
160
+ ## 🔄 **Updates**
161
+
162
+ To update your deployment:
163
+ 1. Make changes to your code
164
+ 2. Commit and push to repository
165
+ 3. Hugging Face Spaces will automatically rebuild
166
+ 4. Both services will restart with new code
167
+
168
+ ## 🆘 **Troubleshooting**
169
+
170
+ ### **Common Issues**
171
+ 1. **Port conflicts** - Ensure ports 8000 and 7860 are available
172
+ 2. **Database connection** - Check Supabase credentials
173
+ 3. **API key issues** - Verify Twitter API key is valid
174
+ 4. **Model loading** - Check internet connection for model downloads
175
+
176
+ ### **Getting Help**
177
+ - Check logs for error messages
178
+ - Verify environment variables
179
+ - Test individual components
180
+ - Check Hugging Face Spaces documentation
181
+
182
+ ## 🎉 **Success!**
183
+
184
+ Once deployed, you'll have:
185
+ - **FastAPI** at `https://your-space.hf.space:8000`
186
+ - **Gradio** at `https://your-space.hf.space:7860`
187
+ - **API docs** at `https://your-space.hf.space:8000/docs`
188
+
189
+ Your Ocean Hazard Detection System is now live with both API and web interfaces! 🌊
Dockerfile ADDED
@@ -0,0 +1,48 @@
1
+ # Dockerfile for Hugging Face Spaces with FastAPI + Gradio
2
+
3
+ # Use an official Python runtime as a parent image
4
+ FROM python:3.9-slim
5
+
6
+ # Set the working directory inside the container
7
+ WORKDIR /app
8
+
9
+ # Install system dependencies
10
+ RUN apt-get update && apt-get install -y \
11
+ gcc \
12
+ g++ \
13
+ curl \
14
+ && rm -rf /var/lib/apt/lists/*
15
+
16
+ # Copy the requirements file and install dependencies first for better caching
17
+ COPY requirements.txt .
18
+ RUN pip install --no-cache-dir -r requirements.txt
19
+
20
+ # Copy all your application code into the container
21
+ COPY . .
22
+
23
+ # IMPORTANT: Download and cache the models during the build process.
24
+ # This makes the application start much faster when the Space wakes up.
25
+ RUN python -c "from classifier import classify_with_model; classify_with_model('test')"
26
+ RUN python -c "from ner import get_ner_pipeline; get_ner_pipeline()"
27
+ RUN python -c "from sentiment import get_emotion_classifier; get_emotion_classifier()"
28
+ RUN python -c "from translate import get_translator; get_translator()"
29
+
30
+ # Create startup script
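+ # The script below launches uvicorn (FastAPI, port 8000) in the background with '&', then runs the
+ # Gradio app in the foreground so it stays the container's main process and keeps the container alive.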
31
+ RUN echo '#!/bin/bash\n\
32
+ echo "🚀 Starting FastAPI server on port 8000..."\n\
33
+ python -m uvicorn api:app --host 0.0.0.0 --port 8000 &\n\
34
+ echo "🌊 Starting Gradio web interface on port 7860..."\n\
35
+ python app.py' > start_services.sh
36
+
37
+ RUN chmod +x start_services.sh
38
+
39
+ # Expose ports for both services: 7860 (Gradio web interface) and 8000 (FastAPI)
40
+ EXPOSE 7860
41
+ EXPOSE 8000
42
+
43
+ # Health check
44
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
45
+ CMD curl -f http://localhost:8000/health || exit 1
46
+
47
+ # Start both services
48
+ CMD ["./start_services.sh"]
README.md CHANGED
@@ -1,12 +1,145 @@
1
  ---
2
- title: Pravaah
3
- emoji: 🐨
4
- colorFrom: purple
5
- colorTo: purple
6
  sdk: docker
7
  pinned: false
8
  license: mit
9
- short_description: nlp pl - translate, classify, sentiment analyse, ner (posts)
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Pravaah - Ocean Hazard Detection System
3
+ emoji: 🌊
4
+ colorFrom: blue
5
+ colorTo: green
6
  sdk: docker
7
  pinned: false
8
  license: mit
9
+ short_description: AI-powered system to detect ocean hazards with FastAPI + Gradio interface
10
  ---
11
 
12
+ # 🌊 Ocean Hazard Detection System
13
+
14
+ An AI-powered system that analyzes social media posts to detect ocean-related hazards in real-time. This system uses advanced natural language processing to identify hazardous tweets, translate them to English, analyze sentiment, and extract location information.
15
+
16
+ ## 🚀 Features
17
+
18
+ - **Multilingual Support**: Analyzes tweets in 20+ Indian languages including Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, and English
19
+ - **Hazard Classification**: Uses XLM-RoBERTa zero-shot classification to identify ocean hazards
20
+ - **Sentiment Analysis**: Analyzes emotional context using GoEmotions model
21
+ - **Named Entity Recognition**: Extracts hazard types and locations from text
22
+ - **Real-time Processing**: Processes tweets from Indian coastal regions
23
+ - **Database Storage**: Stores hazardous tweets for tracking and analysis
24
+
25
+ ## 🔍 What It Detects
26
+
27
+ ### Hazard Types
28
+ - Floods and tsunamis
29
+ - Cyclones and storm surges
30
+ - High tides and waves
31
+ - Coastal flooding and erosion
32
+ - Rip currents and marine debris
33
+ - Water discoloration and algal blooms
34
+ - Marine pollution
35
+
36
+ ### Geographic Coverage
37
+ - **Major Cities**: Mumbai, Chennai, Kolkata, Vizag, Puri
38
+ - **States**: Odisha, Kerala, Gujarat, Goa, Andhra Pradesh, West Bengal
39
+ - **Water Bodies**: Bay of Bengal, Arabian Sea
40
+
41
+ ## 🛠️ Technical Stack
42
+
43
+ - **AI Models**:
44
+ - XLM-RoBERTa for hazard classification
45
+ - Helsinki-NLP for translation
46
+ - GoEmotions for sentiment analysis
47
+ - Custom NER for location extraction
48
+ - **Backend**: FastAPI + Gradio
49
+ - **Database**: PostgreSQL
50
+ - **Languages**: Python 3.9+
51
+
52
+ ## 📊 How It Works
53
+
54
+ 1. **Tweet Collection**: Scrapes tweets using Twitter API with hazard and location keywords
55
+ 2. **Hazard Classification**: Uses zero-shot learning to classify tweets as hazardous or safe
56
+ 3. **Translation**: Translates non-English tweets to English for consistent processing
57
+ 4. **Sentiment Analysis**: Analyzes emotional context (panic, calm, confusion, neutral)
58
+ 5. **Entity Extraction**: Identifies specific hazard types and locations
59
+ 6. **Database Storage**: Stores hazardous tweets with metadata for tracking
60
+
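A minimal sketch of how these steps chain together using the modules in this commit (`scraper.py`, `classifier.py`, `pg_db.py`); database storage is optional and assumes the environment variables listed below are set:

```python
from scraper import fetch_hazard_tweets            # 1. tweet collection
from classifier import classify_tweets             # 2-5. classify, translate, sentiment, NER
from pg_db import init_db, upsert_hazardous_tweet  # 6. database storage

init_db()  # creates the hazardous_tweets table if it does not exist
results = classify_tweets(fetch_hazard_tweets(limit=10))

for r in results:
    if r.get("hazardous") == 1:
        ner = r.get("ner") or {}
        sentiment = r.get("sentiment") or {"label": "unknown", "score": 0.0}
        upsert_hazardous_tweet(
            tweet_url=r.get("tweet_url") or "",
            hazard_type=", ".join(ner.get("hazards") or []) or "unknown",
            location=", ".join(ner.get("locations") or []) or "unknown",
            sentiment_label=sentiment.get("label", "unknown"),
            sentiment_score=float(sentiment.get("score", 0.0)),
            tweet_date="",   # see api.py for full timestamp parsing
            tweet_time="",
        )
```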
61
+ ## 🚀 Usage
62
+
63
+ ### Web Interface (Gradio)
64
+ 1. **Set Tweet Limit**: Choose how many tweets to analyze (1-50)
65
+ 2. **Click Analyze**: The system will process tweets and show results
66
+ 3. **View Results**: See hazardous tweets with sentiment, location, and hazard type
67
+ 4. **Export Data**: Download complete analysis as JSON
68
+
69
+ ### API Endpoints (FastAPI)
70
+
71
+ #### **POST /analyze**
72
+ Analyze tweets for ocean hazards
73
+ ```bash
74
+ curl -X POST "http://localhost:8000/analyze" \
75
+ -H "Content-Type: application/json" \
76
+ -d '{"limit": 20, "query": "flood OR tsunami"}'
77
+ ```
78
+
79
+ #### **GET /hazardous-tweets**
80
+ Get stored hazardous tweets
81
+ ```bash
82
+ curl "http://localhost:8000/hazardous-tweets?limit=50&offset=0"
83
+ ```
84
+
85
+ #### **GET /stats**
86
+ Get analysis statistics
87
+ ```bash
88
+ curl "http://localhost:8000/stats"
89
+ ```
90
+
91
+ #### **GET /health**
92
+ Health check endpoint
93
+ ```bash
94
+ curl "http://localhost:8000/health"
95
+ ```
96
+
97
+ ### API Documentation
98
+ - **Swagger UI**: `http://localhost:8000/docs`
99
+ - **ReDoc**: `http://localhost:8000/redoc`
100
+
101
+ ## 🔧 Environment Variables
102
+
103
+ The system requires the following environment variables:
104
+
105
+ ```bash
106
+ # Twitter API (required)
107
+ TWITTER_API_KEY=your_twitter_api_key
108
+
109
+ # PostgreSQL Database (optional for demo)
110
+ PGHOST=localhost
111
+ PGPORT=5432
112
+ PGDATABASE=postgres
113
+ PGUSER=postgres
114
+ PGPASSWORD=your_password
115
+ ```
116
+
117
+ ## 📈 Use Cases
118
+
119
+ - **Emergency Response**: Early detection of ocean hazards for rapid response
120
+ - **Environmental Monitoring**: Track marine pollution and coastal issues
121
+ - **Research**: Analyze public sentiment about ocean-related events
122
+ - **Policy Making**: Data-driven insights for coastal management policies
123
+
124
+ ## 🔬 Model Details
125
+
126
+ - **Classification Model**: `joeddav/xlm-roberta-large-xnli`
127
+ - **Translation Model**: Helsinki-NLP OPUS-MT models
128
+ - **Sentiment Model**: Google GoEmotions
129
+ - **NER**: Custom keyword-based extraction with fallback
130
+
131
+ ## 📝 License
132
+
133
+ This project is licensed under the MIT License - see the LICENSE file for details.
134
+
135
+ ## 🤝 Contributing
136
+
137
+ Contributions are welcome! Please feel free to submit a Pull Request.
138
+
139
+ ## 📞 Support
140
+
141
+ For support, please open an issue in the GitHub repository.
142
+
143
+ ---
144
+
145
+ **Note**: This is a demonstration system. In production, it would process real-time tweets and integrate with emergency response systems.
api.py ADDED
@@ -0,0 +1,252 @@
1
+ from fastapi import FastAPI, BackgroundTasks, HTTPException
2
+ from fastapi.middleware.cors import CORSMiddleware
3
+ from pydantic import BaseModel
4
+ from typing import List, Optional
5
+ import json
6
+ import logging
7
+ from datetime import datetime
8
+ from email.utils import parsedate_to_datetime
9
+
10
+ # Import our modules
11
+ from scraper import fetch_hazard_tweets
12
+ from classifier import classify_tweets
13
+ from pg_db import init_db, upsert_hazardous_tweet
14
+
15
+ # Set up logging
16
+ logging.basicConfig(level=logging.INFO)
17
+ logger = logging.getLogger(__name__)
18
+
19
+ # Initialize FastAPI app
20
+ app = FastAPI(
21
+ title="Ocean Hazard Detection API",
22
+ description="API for detecting ocean hazards from social media posts",
23
+ version="1.0.0"
24
+ )
25
+
26
+ # CORS middleware
27
+ app.add_middleware(
28
+ CORSMiddleware,
29
+ allow_origins=["*"], # Configure this properly for production
30
+ allow_credentials=True,
31
+ allow_methods=["*"],
32
+ allow_headers=["*"],
33
+ )
34
+
35
+ # Initialize database
36
+ try:
37
+ init_db()
38
+ logger.info("Database initialized successfully")
39
+ except Exception as e:
40
+ logger.warning(f"Database initialization failed: {e}. API will work without database persistence.")
41
+
42
+ # Pydantic models
43
+ class TweetAnalysisRequest(BaseModel):
44
+ limit: int = 20
45
+ query: Optional[str] = None
46
+
47
+ class TweetAnalysisResponse(BaseModel):
48
+ total_tweets: int
49
+ hazardous_tweets: int
50
+ results: List[dict]
51
+ processing_time: float
52
+
53
+ class HealthResponse(BaseModel):
54
+ status: str
55
+ message: str
56
+ timestamp: str
57
+
58
+ # Health check endpoint
59
+ @app.get("/", response_model=HealthResponse)
60
+ def health_check():
61
+ """Health check endpoint"""
62
+ return HealthResponse(
63
+ status="healthy",
64
+ message="Ocean Hazard Detection API is running",
65
+ timestamp=datetime.utcnow().isoformat()
66
+ )
67
+
68
+ @app.get("/health", response_model=HealthResponse)
69
+ def health():
70
+ """Alternative health check endpoint"""
71
+ return health_check()
72
+
73
+ # Main analysis endpoint
74
+ @app.post("/analyze", response_model=TweetAnalysisResponse)
75
+ async def analyze_tweets(request: TweetAnalysisRequest):
76
+ """
77
+ Analyze tweets for ocean hazards
78
+
79
+ - **limit**: Number of tweets to analyze (1-50)
80
+ - **query**: Custom search query (optional)
81
+ """
82
+ start_time = datetime.utcnow()
83
+
84
+ try:
85
+ logger.info(f"Starting analysis with limit: {request.limit}")
86
+
87
+ # Fetch tweets
88
+ if request.query:
89
+ # Use custom query if provided
90
+ from scraper import search_tweets, extract_tweets
91
+ result = search_tweets(request.query, limit=request.limit)
92
+ tweets = extract_tweets(result)
93
+ else:
94
+ # Use default hazard query
95
+ tweets = fetch_hazard_tweets(limit=request.limit)
96
+
97
+ logger.info(f"Fetched {len(tweets)} tweets")
98
+
99
+ # Classify tweets
100
+ results = classify_tweets(tweets)
101
+ logger.info(f"Classified {len(results)} tweets")
102
+
103
+ # Store hazardous tweets in database
104
+ hazardous_count = 0
105
+ try:
106
+ for r in results:
107
+ if r.get('hazardous') == 1:
108
+ hazardous_count += 1
109
+ hazards = (r.get('ner') or {}).get('hazards') or []
110
+ hazard_type = ", ".join(hazards) if hazards else "unknown"
111
+ locs = (r.get('ner') or {}).get('locations') or []
112
+ if not locs and r.get('location'):
113
+ locs = [r['location']]
114
+ location = ", ".join(locs) if locs else "unknown"
115
+ sentiment = r.get('sentiment') or {"label": "unknown", "score": 0.0}
116
+ created_at = r.get('created_at') or ""
117
+ tweet_date = ""
118
+ tweet_time = ""
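+ # created_at may be RFC 2822 (classic Twitter format) or ISO-8601; try RFC 2822 first, then fall back to ISO.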
119
+ if created_at:
120
+ dt = None
121
+ try:
122
+ dt = parsedate_to_datetime(created_at)
123
+ except Exception:
124
+ dt = None
125
+ if dt is None and 'T' in created_at:
126
+ try:
127
+ iso = created_at.replace('Z', '+00:00')
128
+ dt = datetime.fromisoformat(iso)
129
+ except Exception:
130
+ dt = None
131
+ if dt is not None:
132
+ tweet_date = dt.date().isoformat()
133
+ tweet_time = dt.time().strftime('%H:%M:%S')
134
+ upsert_hazardous_tweet(
135
+ tweet_url=r.get('tweet_url') or "",
136
+ hazard_type=hazard_type,
137
+ location=location,
138
+ sentiment_label=sentiment.get('label', 'unknown'),
139
+ sentiment_score=float(sentiment.get('score', 0.0)),
140
+ tweet_date=tweet_date,
141
+ tweet_time=tweet_time,
142
+ )
143
+ logger.info(f"Stored {hazardous_count} hazardous tweets in database")
144
+ except Exception as db_error:
145
+ logger.warning(f"Database storage failed: {db_error}. Results will not be persisted.")
146
+
147
+ # Calculate processing time
148
+ processing_time = (datetime.utcnow() - start_time).total_seconds()
149
+
150
+ return TweetAnalysisResponse(
151
+ total_tweets=len(results),
152
+ hazardous_tweets=hazardous_count,
153
+ results=results,
154
+ processing_time=processing_time
155
+ )
156
+
157
+ except Exception as e:
158
+ logger.error(f"Analysis failed: {str(e)}")
159
+ raise HTTPException(status_code=500, detail=str(e))
160
+
161
+ # Get stored hazardous tweets
162
+ @app.get("/hazardous-tweets")
163
+ async def get_hazardous_tweets(limit: int = 100, offset: int = 0):
164
+ """
165
+ Get stored hazardous tweets from database
166
+
167
+ - **limit**: Maximum number of tweets to return (default: 100)
168
+ - **offset**: Number of tweets to skip (default: 0)
169
+ """
170
+ try:
171
+ from pg_db import get_conn
172
+
173
+ with get_conn() as conn:
174
+ with conn.cursor() as cur:
175
+ cur.execute("""
176
+ SELECT tweet_url, hazard_type, location, sentiment_label,
177
+ sentiment_score, tweet_date, tweet_time, inserted_at
178
+ FROM hazardous_tweets
179
+ ORDER BY inserted_at DESC
180
+ LIMIT %s OFFSET %s
181
+ """, (limit, offset))
182
+
183
+ columns = [desc[0] for desc in cur.description]
184
+ results = [dict(zip(columns, row)) for row in cur.fetchall()]
185
+
186
+ return {
187
+ "tweets": results,
188
+ "count": len(results),
189
+ "limit": limit,
190
+ "offset": offset
191
+ }
192
+
193
+ except Exception as e:
194
+ logger.error(f"Failed to fetch hazardous tweets: {str(e)}")
195
+ raise HTTPException(status_code=500, detail=str(e))
196
+
197
+ # Get statistics
198
+ @app.get("/stats")
199
+ async def get_stats():
200
+ """Get analysis statistics"""
201
+ try:
202
+ from pg_db import get_conn
203
+
204
+ with get_conn() as conn:
205
+ with conn.cursor() as cur:
206
+ # Total hazardous tweets
207
+ cur.execute("SELECT COUNT(*) FROM hazardous_tweets")
208
+ total_hazardous = cur.fetchone()[0]
209
+
210
+ # By hazard type
211
+ cur.execute("""
212
+ SELECT hazard_type, COUNT(*) as count
213
+ FROM hazardous_tweets
214
+ GROUP BY hazard_type
215
+ ORDER BY count DESC
216
+ """)
217
+ hazard_types = [{"type": row[0], "count": row[1]} for row in cur.fetchall()]
218
+
219
+ # By location
220
+ cur.execute("""
221
+ SELECT location, COUNT(*) as count
222
+ FROM hazardous_tweets
223
+ WHERE location != 'unknown'
224
+ GROUP BY location
225
+ ORDER BY count DESC
226
+ LIMIT 10
227
+ """)
228
+ locations = [{"location": row[0], "count": row[1]} for row in cur.fetchall()]
229
+
230
+ # By sentiment
231
+ cur.execute("""
232
+ SELECT sentiment_label, COUNT(*) as count
233
+ FROM hazardous_tweets
234
+ GROUP BY sentiment_label
235
+ ORDER BY count DESC
236
+ """)
237
+ sentiments = [{"sentiment": row[0], "count": row[1]} for row in cur.fetchall()]
238
+
239
+ return {
240
+ "total_hazardous_tweets": total_hazardous,
241
+ "hazard_types": hazard_types,
242
+ "top_locations": locations,
243
+ "sentiment_distribution": sentiments
244
+ }
245
+
246
+ except Exception as e:
247
+ logger.error(f"Failed to fetch statistics: {str(e)}")
248
+ raise HTTPException(status_code=500, detail=str(e))
249
+
250
+ if __name__ == "__main__":
251
+ import uvicorn
252
+ uvicorn.run(app, host="0.0.0.0", port=8000)
app.py ADDED
@@ -0,0 +1,197 @@
1
+ import gradio as gr
2
+ import json
3
+ import os
4
+ import logging
5
+ from datetime import datetime
6
+ from email.utils import parsedate_to_datetime
7
+
8
+ # Set up logging
9
+ logging.basicConfig(level=logging.INFO)
10
+ logger = logging.getLogger(__name__)
11
+
12
+ try:
13
+ from scraper import fetch_hazard_tweets
14
+ from classifier import classify_tweets
15
+ from pg_db import init_db, upsert_hazardous_tweet
16
+ # Initialize database (optional - will work without it)
17
+ try:
18
+ init_db()
19
+ logger.info("Database initialized successfully")
20
+ except Exception as e:
21
+ logger.warning(f"Database initialization failed: {e}. App will work without database persistence.")
22
+ except ImportError as e:
23
+ logger.error(f"Failed to import required modules: {e}")
24
+ raise
25
+
26
+ def run_pipeline(limit=20):
27
+ """Run the hazard detection pipeline"""
28
+ try:
29
+ logger.info(f"Starting pipeline with limit: {limit}")
30
+ tweets = fetch_hazard_tweets(limit=limit)
31
+ logger.info(f"Fetched {len(tweets)} tweets")
32
+
33
+ results = classify_tweets(tweets)
34
+ logger.info(f"Classified {len(results)} tweets")
35
+
36
+ # Store hazardous tweets in database (optional)
37
+ try:
38
+ hazardous_count = 0
39
+ for r in results:
40
+ if r.get('hazardous') == 1:
41
+ hazardous_count += 1
42
+ hazards = (r.get('ner') or {}).get('hazards') or []
43
+ hazard_type = ", ".join(hazards) if hazards else "unknown"
44
+ locs = (r.get('ner') or {}).get('locations') or []
45
+ if not locs and r.get('location'):
46
+ locs = [r['location']]
47
+ location = ", ".join(locs) if locs else "unknown"
48
+ sentiment = r.get('sentiment') or {"label": "unknown", "score": 0.0}
49
+ created_at = r.get('created_at') or ""
50
+ tweet_date = ""
51
+ tweet_time = ""
52
+ if created_at:
53
+ dt = None
54
+ try:
55
+ dt = parsedate_to_datetime(created_at)
56
+ except Exception:
57
+ dt = None
58
+ if dt is None and 'T' in created_at:
59
+ try:
60
+ iso = created_at.replace('Z', '+00:00')
61
+ dt = datetime.fromisoformat(iso)
62
+ except Exception:
63
+ dt = None
64
+ if dt is not None:
65
+ tweet_date = dt.date().isoformat()
66
+ tweet_time = dt.time().strftime('%H:%M:%S')
67
+ upsert_hazardous_tweet(
68
+ tweet_url=r.get('tweet_url') or "",
69
+ hazard_type=hazard_type,
70
+ location=location,
71
+ sentiment_label=sentiment.get('label', 'unknown'),
72
+ sentiment_score=float(sentiment.get('score', 0.0)),
73
+ tweet_date=tweet_date,
74
+ tweet_time=tweet_time,
75
+ )
76
+ logger.info(f"Stored {hazardous_count} hazardous tweets in database")
77
+ except Exception as db_error:
78
+ logger.warning(f"Database storage failed: {db_error}. Results will not be persisted.")
79
+
80
+ return results
81
+ except Exception as e:
82
+ logger.error(f"Pipeline failed: {str(e)}")
83
+ return f"Error: {str(e)}"
84
+
85
+ def analyze_tweets(limit):
86
+ """Gradio interface function to analyze tweets"""
87
+ try:
88
+ limit = int(limit) if limit else 20
89
+ results = run_pipeline(limit=limit)
90
+
91
+ if isinstance(results, str): # Error case
92
+ return results, ""
93
+
94
+ # Count hazardous tweets
95
+ hazardous_count = sum(1 for r in results if r.get('hazardous') == 1)
96
+ total_count = len(results)
97
+
98
+ # Format results for display
99
+ display_text = f"Analyzed {total_count} tweets, found {hazardous_count} hazardous tweets.\n\n"
100
+
101
+ for i, result in enumerate(results, 1):
102
+ status = "🚨 HAZARDOUS" if result.get('hazardous') == 1 else "✅ Safe"
103
+ display_text += f"{i}. {status}\n"
104
+ display_text += f" Text: {result.get('text', 'N/A')[:100]}...\n"
105
+ if result.get('translated_text'):
106
+ display_text += f" Translated: {result.get('translated_text', 'N/A')[:100]}...\n"
107
+ if result.get('hazardous') == 1:
108
+ sentiment = result.get('sentiment', {})
109
+ display_text += f" Sentiment: {sentiment.get('label', 'unknown')} ({sentiment.get('score', 0):.2f})\n"
110
+ ner = result.get('ner', {})
111
+ if ner.get('hazards'):
112
+ display_text += f" Hazards: {', '.join(ner.get('hazards', []))}\n"
113
+ if ner.get('locations'):
114
+ display_text += f" Locations: {', '.join(ner.get('locations', []))}\n"
115
+ display_text += f" URL: {result.get('tweet_url', 'N/A')}\n\n"
116
+
117
+ # Create JSON output
118
+ json_output = json.dumps(results, indent=2, ensure_ascii=False)
119
+
120
+ return display_text, json_output
121
+
122
+ except Exception as e:
123
+ return f"Error: {str(e)}", ""
124
+
125
+ # Health check endpoint
126
+ def health_check():
127
+ """Simple health check for Docker"""
128
+ return {"status": "healthy", "message": "Ocean Hazard Detection System is running"}
129
+
130
+ # Create Gradio interface
131
+ with gr.Blocks(title="Ocean Hazard Detection", theme=gr.themes.Soft()) as demo:
132
+ gr.Markdown("""
133
+ # 🌊 Ocean Hazard Detection System
134
+
135
+ This system analyzes tweets to detect ocean-related hazards using AI. It:
136
+ - Scrapes tweets about ocean hazards from Indian coastal regions
137
+ - Classifies tweets as hazardous or safe using multilingual AI
138
+ - Translates non-English tweets to English
139
+ - Analyzes sentiment and extracts hazard types and locations
140
+ - Stores hazardous tweets in a database for tracking
141
+
142
+ **Note**: This demo uses a limited dataset. In production, it would analyze real-time tweets.
143
+ """)
144
+
145
+ with gr.Row():
146
+ with gr.Column():
147
+ limit_input = gr.Number(
148
+ label="Number of tweets to analyze",
149
+ value=10,
150
+ minimum=1,
151
+ maximum=50,
152
+ step=1
153
+ )
154
+ analyze_btn = gr.Button("🔍 Analyze Tweets", variant="primary")
155
+
156
+ with gr.Column():
157
+ gr.Markdown("### 📊 Analysis Results")
158
+ results_text = gr.Textbox(
159
+ label="Analysis Summary",
160
+ lines=15,
161
+ max_lines=20,
162
+ interactive=False
163
+ )
164
+
165
+ with gr.Row():
166
+ gr.Markdown("### 📄 Raw JSON Output")
167
+ json_output = gr.Textbox(
168
+ label="Complete Analysis Data (JSON)",
169
+ lines=10,
170
+ max_lines=15,
171
+ interactive=False
172
+ )
173
+
174
+ # Event handlers
175
+ analyze_btn.click(
176
+ fn=analyze_tweets,
177
+ inputs=[limit_input],
178
+ outputs=[results_text, json_output]
179
+ )
180
+
181
+ # Add some example queries
182
+ gr.Markdown("""
183
+ ### 🔍 What this system looks for:
184
+ - **Hazard Keywords**: flood, tsunami, cyclone, storm surge, high tide, high waves, swell, coastal flooding, rip current, coastal erosion, water discoloration, algal bloom, marine debris, pollution
185
+ - **Locations**: Mumbai, Chennai, Kolkata, Odisha, Kerala, Gujarat, Goa, Andhra Pradesh, West Bengal, Vizag, Puri, Bay of Bengal, Arabian Sea
186
+ - **Languages**: Supports 20+ Indian languages including Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, and English
187
+ """)
188
+
189
+ if __name__ == "__main__":
190
+ # Add health check route
191
+ demo.launch(
192
+ server_name="0.0.0.0", # Important for Docker
193
+ server_port=7860, # Gradio default port
194
+ show_error=True, # Show errors in the interface
195
+ share=False, # Don't create public link
196
+ debug=True # Enable debug mode
197
+ )
classifier.py ADDED
@@ -0,0 +1,52 @@
1
+ from transformers import pipeline
2
+ from scraper import fetch_hazard_tweets
3
+ from translate import translate_to_english
4
+ from sentiment import classify_emotion_text
5
+ from ner import extract_hazard_and_locations
6
+ import json
7
+
8
+ model_name = "joeddav/xlm-roberta-large-xnli"
9
+
10
+ classifier = pipeline("zero-shot-classification", model=model_name, framework="pt")
11
+
12
+ def classify_with_model(tweet_text):
13
+ """
14
+ Classifies a tweet using a MULTILINGUAL zero-shot learning model.
15
+ Returns 1 if hazardous, else 0.
16
+ """
17
+ if not tweet_text or not tweet_text.strip():
18
+ return 0
19
+ candidate_labels = ["report of an ocean hazard", "not an ocean hazard"]
20
+ result = classifier(tweet_text, candidate_labels)
21
+ top_label = result['labels'][0]
22
+ top_score = result['scores'][0]
23
+ if top_label == "report of an ocean hazard" and top_score > 0.75:
24
+ return 1
25
+ return 0
26
+
27
+ def classify_tweets(tweets):
28
+ """
29
+ Accepts list of tweet dicts with 'text' field.
30
+ Pipeline: classify hazard -> if hazardous, translate -> sentiment -> NER.
31
+ Returns enriched dicts.
32
+ """
33
+ classified = []
34
+ for t in tweets:
35
+ text = t.get('text', '')
36
+ hazardous = classify_with_model(text)
37
+ item = dict(t)
38
+ item['hazardous'] = hazardous
39
+ translated = translate_to_english(text)
40
+ item['translated_text'] = translated
41
+ if hazardous == 1:
42
+ sentiment = classify_emotion_text(translated)
43
+ item['sentiment'] = sentiment
44
+ ner_info = extract_hazard_and_locations(translated)
45
+ item['ner'] = ner_info
46
+ classified.append(item)
47
+ return classified
48
+
49
+ if __name__ == "__main__":
50
+ tweets = fetch_hazard_tweets(limit=20)
51
+ classified = classify_tweets(tweets)
52
+ print(json.dumps(classified, indent=2, ensure_ascii=False))
ner.py ADDED
@@ -0,0 +1,125 @@
1
+ from transformers import pipeline
2
+
3
+ _ner_pipeline = None
4
+
5
+ def get_ner_pipeline():
6
+ """
7
+ Lazily load and return NER pipeline for multilingual location extraction.
8
+ """
9
+ global _ner_pipeline
10
+ if _ner_pipeline is not None:
11
+ return _ner_pipeline
12
+ # Use a lighter multilingual model to avoid OOM on constrained machines
13
+ ner_model_name = "Davlan/bert-base-multilingual-cased-ner-hrl"
14
+ try:
15
+ _ner_pipeline = pipeline("ner", model=ner_model_name, aggregation_strategy="simple")
16
+ except Exception:
17
+ # Return None to allow regex/location keyword fallback downstream
18
+ _ner_pipeline = None
19
+ return _ner_pipeline
20
+
21
+ def extract_hazard_and_locations(text):
22
+ """
23
+ Extract hazard keywords and locations from a single text.
24
+ Returns dict: {hazards: [..], locations: [..]}
25
+ """
26
+ if not text or not text.strip():
27
+ return {"hazards": [], "locations": []}
28
+
29
+ hazard_keywords = [
30
+ 'Tsunami', 'High Waves', 'Coastal Flooding', 'Storm Surge',
31
+ 'Rip Current', 'Coastal Erosion', 'Algal Bloom',
32
+ 'Marine Pollution', 'Cyclone', 'flood'
33
+ ]
34
+ detected_hazards = []
35
+ text_lower = text.lower()
36
+ for hazard in hazard_keywords:
37
+ if hazard.lower() in text_lower:
38
+ detected_hazards.append(hazard)
39
+
40
+ ner = get_ner_pipeline()
41
+ locations = []
42
+ if ner is not None:
43
+ try:
44
+ ner_results = ner(text)
45
+ locations = [entity['word'] for entity in ner_results if entity.get('entity_group') == 'LOC']
46
+ except Exception:
47
+ locations = []
48
+ # Fallback: simple keyword-based location spotting if NER unavailable
49
+ if not locations:
50
+ location_keywords = [
51
+ "Mumbai","Chennai","Kolkata","Odisha","Kerala","Gujarat","Goa",
52
+ "Andhra Pradesh","West Bengal","Vizag","Visakhapatnam","Puri",
53
+ "Bay of Bengal","Arabian Sea","Tamil Nadu","Maharashtra","Karnataka",
54
+ "Andaman","Nicobar","Lakshadweep","Kochi","Cochin","Mangaluru","Mangalore",
55
+ "Chandipur","Paradip","Digha","Gopalpur"
56
+ ]
57
+ text_lower = text.lower()
58
+ for name in location_keywords:
59
+ if name.lower() in text_lower:
60
+ locations.append(name)
61
+
62
+ return {"hazards": detected_hazards, "locations": locations}
63
+
64
+ # Optional demo runner: run `python ner.py` to try the extraction on sample tweets.
+ def extract_hazard_info():
+     """
+     Loads a Named Entity Recognition (NER) model to find locations and then
+     searches example tweets for specific hazard-related keywords.
+     """
+     # --- 1. Load the NER Model for Location Extraction ---
+     # Uses the large, high-accuracy model for the demo.
+     ner_model_name = "Davlan/xlm-roberta-large-ner-hrl"
+     print(f"Loading NER model: '{ner_model_name}'...")
+     try:
+         ner_pipeline = pipeline("ner", model=ner_model_name, aggregation_strategy="simple")
+         print("NER model loaded successfully!")
+     except Exception as e:
+         print(f"Failed to load NER model. Error: {e}")
+         return
+
+     # --- 2. Define the Hazard Keywords to search for ---
+     # These are the exact phrases we will look for in the text.
+     hazard_keywords = [
+         'Tsunami', 'High Waves', 'Coastal Flooding', 'Storm Surge',
+         'Rip Current', 'Coastal Erosion', 'Algal Bloom',
+         'Marine Pollution', 'Cyclone', 'flood'  # Added "flood" as a common variation
+     ]
+
+     # --- 3. Prepare Example Tweets for Analysis ---
+     tweets_to_analyze = [
+         "Major coastal flooding reported in Chennai due to the storm surge. All residents advised to stay indoors.",
+         "Authorities have issued a tsunami warning for the entire Odisha coastline after the earthquake.",
+         "The recent cyclone has caused severe coastal erosion near Puri beach.",
+         "मुंबई में ऊंची लहरों की चेतावनी है, कृपया समुद्र तट से दूर रहें।",  # Hindi: "Warning of high waves in Mumbai, please stay away from the beach."
+         "Not a hazard: The sunset over the calm sea in Goa was beautiful today."
+     ]
+
+     print("\n--- Analyzing Tweets for Hazards and Locations ---")
+
+     for tweet in tweets_to_analyze:
+         try:
+             # --- Step 1: Extract Locations using the NER model ---
+             ner_results = ner_pipeline(tweet)
+             # Filter the results to get only the words identified as locations ('LOC').
+             locations = [entity['word'] for entity in ner_results if entity['entity_group'] == 'LOC']
+
+             # --- Step 2: Extract Hazard Keywords directly from the text ---
+             detected_hazards = []
+             tweet_lower = tweet.lower()
+             for hazard in hazard_keywords:
+                 # Check if the hazard keyword exists in the tweet (case-insensitive)
+                 if hazard.lower() in tweet_lower:
+                     detected_hazards.append(hazard)
+
+             # --- Print the structured results ---
+             print(f"Text: '{tweet}'")
+             print(f" -> Location(s): {locations if locations else 'None Detected'}")
+             print(f" -> Detected Hazard(s): {detected_hazards if detected_hazards else 'None Detected'}")
+             print("-" * 25)
+
+         except Exception as e:
+             print(f"Could not process tweet: '{tweet}'. Error: {e}")
+
+ if __name__ == "__main__":
+     extract_hazard_info()
+
pg_db.py ADDED
@@ -0,0 +1,133 @@
1
+ import os
2
+ from contextlib import contextmanager
3
+ from datetime import datetime
4
+
5
+ import psycopg2
6
+
7
+
8
+ def _load_env_file(path: str = ".env"):
9
+ if not os.path.isfile(path):
10
+ return
11
+ try:
12
+ with open(path, "r", encoding="utf-8") as f:
13
+ for line in f:
14
+ line = line.strip()
15
+ if not line or line.startswith("#"):
16
+ continue
17
+ if "=" in line:
18
+ key, value = line.split("=", 1)
19
+ key = key.strip()
20
+ value = value.strip().strip('"').strip("'")
21
+ if key and key not in os.environ:
22
+ os.environ[key] = value
23
+ except Exception:
24
+ pass
25
+
26
+
27
+ def _conn_params():
28
+ # Load .env into environment if present
29
+ _load_env_file()
30
+
31
+ # Check if we have Supabase URL (preferred method)
32
+ supabase_url = os.getenv('SUPABASE_URL')
33
+ if supabase_url:
34
+ # Extract connection details from Supabase URL
35
+ # Format: postgresql://postgres:[password]@[host]:[port]/postgres
36
+ import urllib.parse
37
+ parsed = urllib.parse.urlparse(supabase_url)
38
+
39
+ return dict(
40
+ host=parsed.hostname,
41
+ port=parsed.port or 5432,
42
+ dbname=parsed.path[1:] if parsed.path else 'postgres',
43
+ user=parsed.username or 'postgres',
44
+ password=parsed.password,
45
+ sslmode='require' # Supabase requires SSL
46
+ )
47
+ else:
48
+ # Fallback to individual environment variables
49
+ return dict(
50
+ host=os.getenv("PGHOST", "localhost"),
51
+ port=int(os.getenv("PGPORT", "5432")),
52
+ dbname=os.getenv("PGDATABASE", "postgres"),
53
+ user=os.getenv("PGUSER", "postgres"),
54
+ password=os.getenv("PGPASSWORD", ""),
55
+ sslmode='require' if os.getenv('PGHOST') and 'supabase' in os.getenv('PGHOST', '') else 'prefer'
56
+ )
57
+
58
+
59
+ @contextmanager
60
+ def get_conn():
61
+ conn = psycopg2.connect(**_conn_params())
62
+ try:
63
+ yield conn
64
+ conn.commit()
65
+ finally:
66
+ conn.close()
67
+
68
+
69
+ def init_db():
70
+ create_sql = """
71
+ CREATE TABLE IF NOT EXISTS hazardous_tweets (
72
+ id SERIAL PRIMARY KEY,
73
+ tweet_url TEXT UNIQUE,
74
+ hazard_type TEXT,
75
+ location TEXT,
76
+ sentiment_label TEXT,
77
+ sentiment_score DOUBLE PRECISION,
78
+ tweet_date DATE,
79
+ tweet_time TIME,
80
+ inserted_at TIMESTAMPTZ DEFAULT NOW()
81
+ );
82
+ """
83
+ try:
84
+ with get_conn() as conn:
85
+ with conn.cursor() as cur:
86
+ cur.execute(create_sql)
87
+ print("✅ Database table initialized successfully")
88
+ except Exception as e:
89
+ print(f"❌ Error initializing database: {e}")
90
+ print("💡 Check your SUPABASE_URL / PG* environment variables and database permissions")
91
+ raise
92
+
93
+
94
+ def upsert_hazardous_tweet(
95
+ *,
96
+ tweet_url: str,
97
+ hazard_type: str,
98
+ location: str,
99
+ sentiment_label: str,
100
+ sentiment_score: float,
101
+ tweet_date: str,
102
+ tweet_time: str,
103
+ ):
104
+ """
105
+ Insert if new; ignore duplicates based on tweet_url.
106
+ """
107
+ insert_sql = """
108
+ INSERT INTO hazardous_tweets (
109
+ tweet_url, hazard_type, location, sentiment_label, sentiment_score,
110
+ tweet_date, tweet_time, inserted_at
111
+ ) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
112
+ ON CONFLICT (tweet_url) DO NOTHING;
113
+ """
114
+ # Convert date/time strings to PostgreSQL-friendly formats
115
+ date_val = tweet_date if tweet_date else None
116
+ time_val = tweet_time if tweet_time else None
117
+ with get_conn() as conn:
118
+ with conn.cursor() as cur:
119
+ cur.execute(
120
+ insert_sql,
121
+ (
122
+ tweet_url,
123
+ hazard_type,
124
+ location,
125
+ sentiment_label,
126
+ float(sentiment_score),
127
+ date_val,
128
+ time_val,
129
+ datetime.utcnow().isoformat(timespec="seconds") + "Z",
130
+ ),
131
+ )
132
+
133
+
requirements.txt ADDED
@@ -0,0 +1,29 @@
1
+ # Core ML/NLP dependencies
2
+ transformers>=4.30.0
3
+ torch>=2.0.0
4
+ tokenizers>=0.13.0
5
+ sentencepiece>=0.1.99
6
+ protobuf>=3.20.0
7
+
8
+ # Database connectivity
9
+ psycopg2-binary>=2.9.0
10
+
11
+ # HTTP requests and API calls
12
+ requests>=2.28.0
13
+ fastapi>=0.100.0
14
+ uvicorn>=0.20.0
15
+
16
+ # Environment variable management
17
+ python-dotenv>=1.0.0
18
+
19
+ # Gradio for Hugging Face Spaces
20
+ gradio>=3.40.0
21
+
22
+ # Additional dependencies for specific models
23
+ # Required for XLM-RoBERTa and multilingual models
24
+ sacrebleu>=2.3.0
25
+ sacremoses>=0.0.53
26
+
27
+ # Additional utilities
28
+ numpy>=1.24.0
29
+ pandas>=2.0.0
scraper.py ADDED
@@ -0,0 +1,92 @@
1
+ import requests
2
+ import json
3
+ from datetime import date, timedelta
4
+
5
+ from dotenv import load_dotenv
6
+ import os
7
+
8
+ # Load values from .env into environment
9
+ load_dotenv()
10
+
11
+ # Access the API key
12
+ API_KEY = os.getenv("TWITTER_API_KEY")
13
+
14
+
15
+ def search_tweets(query, query_type="Latest", limit=20):
16
+ """
17
+ Searches for tweets using the twitterapi.io advanced search endpoint.
18
+ """
19
+ url = "https://api.twitterapi.io/twitter/tweet/advanced_search"
20
+ headers = {"X-API-Key": API_KEY}
21
+ params = {"query": query, "queryType": query_type, "limit": limit}
22
+
23
+ print(f"🔍 Executing search with query: {query}")
24
+ response = requests.get(url, headers=headers, params=params)
25
+
26
+ if response.status_code == 200:
27
+ return response.json()
28
+ else:
29
+ print(f"Error: {response.status_code}")
30
+ print(response.text)
31
+ return None
32
+
33
+ def extract_tweets(result_json):
34
+ """
35
+ Extracts a normalized list of tweets from the API result.
36
+ Returns a list of dicts with keys: tweet_url, location, created_at, text, hashtags
37
+ """
38
+ if not result_json or 'tweets' not in result_json:
39
+ return []
40
+ tweets = result_json.get('tweets', [])
41
+ extracted_data = []
42
+ for tweet in tweets:
43
+ tweet_url = tweet.get('url')
44
+ text = tweet.get('text')
45
+ created_at = tweet.get('createdAt')
46
+ location = tweet.get('author', {}).get('location', None)
47
+ hashtags = [tag['text'] for tag in tweet.get('entities', {}).get('hashtags', [])]
48
+ extracted_data.append({
49
+ 'tweet_url': tweet_url,
50
+ 'location': location,
51
+ 'created_at': created_at,
52
+ 'text': text,
53
+ 'hashtags': hashtags
54
+ })
55
+ return extracted_data
56
+
57
+ def build_default_query():
58
+ """
59
+ Builds the default hazard + India coastal locations + language + date query.
60
+ """
61
+ hazard_keywords = (
62
+ "(flood OR tsunami OR cyclone OR \"storm surge\" OR \"high tide\" OR \"high waves\" OR swell OR "
63
+ "\"coastal flooding\" OR \"rip current\" OR \"coastal erosion\" OR \"water discoloration\" OR "
64
+ "\"algal bloom\" OR \"marine debris\" OR pollution)"
65
+ )
66
+ location_keywords = (
67
+ "(Mumbai OR Chennai OR Kolkata OR Odisha OR Kerala OR Gujarat OR Goa OR \"Andhra Pradesh\" "
68
+ "OR \"West Bengal\" OR Vizag OR Puri OR \"Bay of Bengal\" OR \"Arabian Sea\")"
69
+ )
70
+ allowed_languages = [
71
+ "as", "bn", "brx", "doi", "gu", "hi", "kn", "ks", "kok", "ml", "mni",
72
+ "mr", "ne", "or", "pa", "sa", "sat", "sd", "ta", "te", "ur", "en", "bh"
73
+ ]
74
+ lang_query = "(" + " OR ".join([f"lang:{lang}" for lang in allowed_languages]) + ")"
75
+ yesterday = date.today() - timedelta(days=1)
76
+ date_filter = f"since:{yesterday.strftime('%Y-%m-%d')}"
77
+ full_query = f"{hazard_keywords} {location_keywords} {lang_query} {date_filter}"
78
+ return full_query
79
+
80
+ def fetch_hazard_tweets(limit=20):
81
+ """
82
+ Fetches tweets matching the default hazard query and returns extracted list.
83
+ """
84
+ query = build_default_query()
85
+ result = search_tweets(query=query, query_type="Latest", limit=limit)
86
+ return extract_tweets(result)
87
+
88
+ if __name__ == "__main__":
89
+ tweets = fetch_hazard_tweets(limit=20)
90
+ if tweets:
91
+ print("\nExtracted tweets:")
92
+ print(json.dumps(tweets, indent=2, ensure_ascii=False))
sentiment.py ADDED
@@ -0,0 +1,59 @@
1
+ from transformers import pipeline
2
+
3
+ _emotion_classifier = None
4
+
5
+ def get_emotion_classifier():
6
+ """
7
+ Load (lazily) and return a text-classification pipeline for emotions.
8
+ Uses the GoEmotions RoBERTa-base model; input is expected to be English (tweets are translated upstream).
9
+ """
10
+ global _emotion_classifier
11
+ if _emotion_classifier is not None:
12
+ return _emotion_classifier
13
+ model_name = "SamLowe/roberta-base-go_emotions"
14
+ _emotion_classifier = pipeline("text-classification", model=model_name, framework="pt")
15
+ return _emotion_classifier
16
+
17
+ def classify_emotion_text(text):
18
+ """
19
+ Classify a single text into one of: panic | calm | confusion | neutral | unknown
20
+ Returns dict: {label, score}
21
+ """
22
+ if not text or not text.strip():
23
+ return {"label": "unknown", "score": 0.0}
24
+
25
+ emotion_to_category = {
26
+ 'fear': 'panic', 'nervousness': 'panic', 'remorse': 'panic',
27
+ 'joy': 'calm', 'love': 'calm', 'admiration': 'calm', 'approval': 'calm',
28
+ 'caring': 'calm', 'excitement': 'calm', 'gratitude': 'calm', 'optimism': 'calm',
29
+ 'relief': 'calm', 'pride': 'calm',
30
+ 'confusion': 'confusion', 'curiosity': 'confusion', 'realization': 'confusion',
31
+ 'neutral': 'neutral',
32
+ 'anger': 'unknown', 'annoyance': 'unknown', 'disappointment': 'unknown',
33
+ 'disapproval': 'unknown', 'disgust': 'unknown', 'embarrassment': 'unknown',
34
+ 'grief': 'unknown', 'sadness': 'unknown', 'surprise': 'unknown', 'desire': 'unknown'
35
+ }
36
+
37
+ classifier = get_emotion_classifier()
38
+ try:
39
+ result = classifier(text)
40
+ top_label = result[0]['label']
41
+ top_score = float(result[0]['score'])
42
+ except Exception:
43
+ return {"label": "unknown", "score": 0.0}
44
+
45
+ mapped = emotion_to_category.get(top_label, 'unknown')
46
+ return {"label": mapped, "score": top_score}
47
+
48
+ if __name__ == "__main__":
49
+ # Simple demo
50
+ examples = [
51
+ "Cyclone warning issued; please evacuate immediately.",
52
+ "Beautiful calm sea today.",
53
+ "Why is the alert not clear?",
54
+ "Meeting at 3 PM.",
55
+ ]
56
+ clf = get_emotion_classifier()
57
+ for ex in examples:
58
+ print(ex, "->", classify_emotion_text(ex))
59
+
translate.py ADDED
@@ -0,0 +1,69 @@
1
+ from transformers import pipeline
2
+
3
+ _translator = None
4
+
5
+ def get_translator():
6
+ """
7
+ Lazily load and return a multilingual→English translation pipeline.
8
+ """
9
+ global _translator
10
+ if _translator is not None:
11
+ return _translator
12
+ model_name = "Helsinki-NLP/opus-mt-mul-en"
13
+ _translator = pipeline("translation", model=model_name)
14
+ return _translator
15
+
16
+ def translate_to_english(text):
17
+ """
18
+ Translate a single text to English. Returns the translated string.
19
+ If text is empty, returns the original text.
20
+ """
21
+ if not text or not text.strip():
22
+ return text
23
+ translator = get_translator()
24
+ try:
25
+ return translator(text)[0]['translation_text']
26
+ except Exception:
27
+ return text
28
+
29
+ def translate_indian_languages():
30
+ """
31
+ Loads a highly reliable multilingual model to translate text from various
32
+ languages into English.
33
+ """
34
+ # This model is from the Helsinki-NLP group, the standard for translation tasks.
35
+ # It handles multiple source languages automatically without needing special tags.
36
+ print("Loading translation model for demo...")
37
+ try:
38
+ translator = get_translator()
39
+ print("Model loaded successfully!")
40
+ except Exception as e:
41
+ print(f"Failed to load the model. Please check your internet connection and library installation. Error: {e}")
42
+ return
43
+
44
+ # --- Prepare a list of example sentences to translate ---
45
+ sentences_to_translate = [
46
+ "चेतावनी! चक्रवात तट के करीब आ रहा है, तुरंत खाली करें!", # Hindi
47
+ "আজ আবহাওয়া খুব মনোরম।", # Bengali
48
+ "మీరు ఎలా ఉన్నారు?", # Telugu
49
+ "The meeting is scheduled for 3 PM tomorrow.", # English (will be handled gracefully)
50
+ ]
51
+
52
+ print("\n--- Translating Sentences ---")
53
+
54
+ for sentence in sentences_to_translate:
55
+ try:
56
+ # --- SIMPLIFIED: No language detection needed ---
57
+ # This model automatically handles different source languages.
58
+ translated_text = translator(sentence)[0]['translation_text']
59
+
60
+ print(f"Original: '{sentence}'")
61
+ print(f"Translated: '{translated_text}'")
62
+ print("-" * 25)
63
+
64
+ except Exception as e:
65
+ print(f"Could not process sentence: '{sentence}'. Error: {e}")
66
+
67
+ if __name__ == "__main__":
68
+ translate_indian_languages()
69
+