
title: LinkScout Backend
emoji: 🔍
colorFrom: yellow
colorTo: red
sdk: docker
pinned: false

LinkScout - Smart Analysis. Simple Answers.

The Ultimate AI-Powered Misinformation Detection Extension

LinkScout combines Groq-powered AI analysis with pre-trained machine learning models to provide fact-checking and misinformation detection.

🚀 Features

Dual AI Analysis System

Groq AI Agent: Advanced natural language understanding and reasoning
Pre-trained Models: RoBERTa, Emotion Analysis, NER, Hate Speech Detection, Clickbait Detection, Bias Detection

Revolutionary Detection (8 Phases)

Linguistic Fingerprint Analysis: Detects manipulation patterns in text
Claim-by-Claim Verification: Verifies individual claims against databases
Source Credibility Analysis: Rates source reliability
Entity Verification: Validates people, organizations, places
Propaganda Detection: Identifies propaganda techniques
Contradiction Detection: Finds logical inconsistencies
Network Analysis: Detects bot/astroturfing patterns
Reinforcement Learning: Learns from user feedback to improve accuracy

User Interface Features

Smart Paragraph Highlighting: Color-coded suspicious content detection
Sidebar Analysis Report: Comprehensive results without blocking the page
Real-time Google Search Integration: Verifies claims with recent sources
Interactive Results Display: Organized tabs for overview, details, and sources
One-Click Analysis: Analyze entire pages or paste text/URLs

Technical Capabilities

Chunk-based Analysis: Analyzes content paragraph-by-paragraph for precision
Multi-language Support: English, Hindi, Marathi, and 15+ other Indian languages
Image Analysis: Detects AI-generated/manipulated images
Offline Database: Fast local verification of known false claims
Context-Aware Scoring: Adjusts detection based on content type and category
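
Chunk-based analysis can be pictured as splitting the text into paragraphs and handing each one to the models separately. The following is a minimal sketch, assuming paragraphs are separated by blank lines; `split_into_chunks` and the `min_chars` cutoff are hypothetical names used for illustration and are not taken from LinkScout's code.

```python
import re


def split_into_chunks(text, min_chars=40):
    """Split text on blank lines and drop very short fragments.

    Hypothetical helper: the real extension may chunk content differently.
    """
    paragraphs = re.split(r"\n\s*\n", text)
    chunks = []
    for index, paragraph in enumerate(paragraphs):
        # Collapse internal whitespace so lengths are comparable.
        cleaned = " ".join(paragraph.split())
        if len(cleaned) >= min_chars:
            chunks.append({"index": index, "text": cleaned})
    return chunks
```

Keeping the original paragraph index alongside the text lets a caller map each score back to the paragraph it came from.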

📦 Installation

Prerequisites

Python 3.8+
Node.js (optional, for development)
Google Chrome or Microsoft Edge browser

Backend Setup

Install Python Dependencies:

cd d:\mis_2\LinkScout
pip install -r requirements_mis.txt
pip install flask flask-cors requests beautifulsoup4 torch transformers pillow

Download AI Models (if not already cached):

# Models will auto-download to D:\huggingface_cache
# Requires ~5GB disk space
python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('hamzab/roberta-fake-news-classification', cache_dir=r'D:\huggingface_cache')"
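
The one-liner above fetches only the tokenizer files. If the model weights should also be cached ahead of the first server start, something like the following sketch could be used. It assumes the checkpoint loads as a sequence-classification model and that the server reads from the same cache directory; `download_model` is a hypothetical helper name.

```python
MODEL_ID = "hamzab/roberta-fake-news-classification"
CACHE_DIR = r"D:\huggingface_cache"


def download_model(model_id=MODEL_ID, cache_dir=CACHE_DIR):
    """Download tokenizer and model weights into cache_dir (hypothetical helper)."""
    # Imported lazily so this file can be loaded without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir=cache_dir)
    # Assumption: the checkpoint is a sequence-classification model.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, cache_dir=cache_dir
    )
    return tokenizer, model
```

Calling `download_model()` once while online should leave both the tokenizer and the weights in the cache directory.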

Configure Google Search (optional):
- Get a Google Custom Search API key from https://developers.google.com/custom-search
- Update google_config.json with your API key and CSE ID

Start the Server:

python combined_server.py

Server will start at http://localhost:5000

Extension Installation

Open Chrome/Edge
Navigate to Extensions: chrome://extensions or edge://extensions
Enable Developer Mode: Toggle in top-right corner
Load Unpacked: Click button and select d:\mis_2\LinkScout\extension folder
Pin Extension: Click puzzle icon and pin LinkScout for easy access

🎯 Usage

Method 1: Analyze Current Page

Navigate to any news article or webpage
Click the LinkScout extension icon
Click "Scan Page"
View results in popup and check highlighted suspicious content on page

Method 2: Paste Text or URL

Click the LinkScout extension icon
Paste text or URL in the input box
Click "Analyze"
Review comprehensive analysis results

Method 3: Highlight Suspicious Content

After scanning a page, click "Highlight" button
Suspicious paragraphs will be color-coded:
- 🔴 Red: High risk (>70% suspicious)
- 🟡 Yellow: Medium risk (40-70% suspicious)
- 🔵 Blue: Low risk (<40% suspicious)
Click "Clear" to remove highlights
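
The color thresholds listed above can be expressed as a small function. This is only a sketch in Python for illustration; the extension itself is JavaScript, and `highlight_color` is a hypothetical name.

```python
def highlight_color(suspicion_percent):
    """Map a paragraph's suspicion score (0-100) to a highlight color."""
    if not 0 <= suspicion_percent <= 100:
        raise ValueError("score must be between 0 and 100")
    if suspicion_percent > 70:
        return "red"      # high risk: above 70%
    if suspicion_percent >= 40:
        return "yellow"   # medium risk: 40-70%
    return "blue"         # low risk: below 40%
```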

Method 4: View Detailed Report

Analysis results appear in a sidebar on the right
Shows percentage score, verdict, summary, and flagged content
Includes Google search results for fact-checking

🔧 Configuration

Server Configuration

Edit combined_server.py:

# Groq API Key (for AI analysis)
GROQ_API_KEY = 'your_groq_api_key_here'

# Change port if needed
app.run(host='0.0.0.0', port=5000, debug=False)
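
Hard-coding the key in combined_server.py makes it easy to commit a secret by accident. One possible alternative, shown here as a sketch, is to read the settings from environment variables. `load_server_config` and the `LINKSCOUT_PORT` variable are assumptions introduced for this example; they do not exist in the project unless you add them.

```python
import os


def load_server_config(environ=os.environ):
    """Read server settings from environment variables, with fallbacks."""
    return {
        # Empty string when unset, so the caller can detect a missing key.
        "groq_api_key": environ.get("GROQ_API_KEY", ""),
        # LINKSCOUT_PORT is a hypothetical variable name; 5000 is the README default.
        "port": int(environ.get("LINKSCOUT_PORT", "5000")),
    }
```

The returned values could then be passed to `app.run(...)` in place of the literals.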

Extension Configuration

Edit extension/content.js:

const CONFIG = {
    API_ENDPOINT: 'http://localhost:5000/api/v1/analyze-chunks',
    REQUEST_TIMEOUT: 180000, // 3 minutes
    AUTO_SCAN_DELAY: 3000
};
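
For testing the backend without the browser, the same endpoint and timeout can be used from a script. The sketch below uses only the Python standard library. The JSON body shape (a `paragraphs` list) is an assumption made for illustration; check combined_server.py for the fields the endpoint actually expects.

```python
import json
import urllib.request

API_ENDPOINT = "http://localhost:5000/api/v1/analyze-chunks"
REQUEST_TIMEOUT_SECONDS = 180  # mirrors REQUEST_TIMEOUT (180000 ms)


def build_request(paragraphs):
    """Build a POST request; the payload schema here is hypothetical."""
    body = json.dumps({"paragraphs": paragraphs}).encode("utf-8")
    return urllib.request.Request(
        API_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=REQUEST_TIMEOUT_SECONDS) as response:
        return json.loads(response.read().decode("utf-8"))
```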

📊 How It Works

Analysis Pipeline

Content Extraction
- Extracts all paragraphs, headings, and article text
- Filters out navigation, ads, and boilerplate
Multi-Model Analysis
- RoBERTa: Fake news probability
- Emotion Model: Sentiment and emotional manipulation
- NER: Entity extraction and verification
- Hate Speech: Toxic content detection
- Clickbait: Sensationalism detection
- Bias: Political/ideological bias detection
Revolutionary Detection
- Linguistic patterns (sentence structure, word choice)
- Claim extraction and database verification
- Source credibility scoring
- Entity validation (real people/organizations)
- Propaganda technique identification
- Logical contradiction detection
- Bot/astroturfing pattern analysis
Google Research
- Searches recent sources for claims
- Compares against credible news outlets
- Provides links for manual verification
Scoring & Verdict
- Combines all signals into final score (0-100%)
- Determines verdict: FAKE, SUSPICIOUS, or REAL
- Generates human-readable explanation
Reinforcement Learning
- Learns from user feedback
- Improves accuracy over time
- Adapts to new misinformation patterns
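
The README does not say how the individual signals are combined into the final score. One common approach is a weighted average, sketched below. The signal names and weights are made up for the example, and `combine_scores` is a hypothetical function, not LinkScout's actual scoring code.

```python
def combine_scores(signals, weights):
    """Weighted average of per-model scores, each expected in 0-100.

    Hypothetical sketch: the real pipeline may weight or combine differently.
    """
    total_weight = sum(weights[name] for name in signals)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(signals[name] * weights[name] for name in signals)
    return weighted_sum / total_weight
```

For example, `combine_scores({"roberta": 80, "clickbait": 40}, {"roberta": 3, "clickbait": 1})` weights the RoBERTa signal three times as heavily as the clickbait signal.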

🎓 Understanding Results

Misinformation Percentage

0-30%: Low Risk - Mostly Credible
30-60%: Medium Risk - Verify Claims
60-100%: High Risk - Likely Misinformation
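
The ranges above share their boundary values (30% and 60% each appear in two bands). The sketch below resolves this by assigning a boundary score to the higher band; that choice is an assumption, and `risk_label` is a hypothetical name.

```python
def risk_label(misinformation_percent):
    """Map the overall misinformation percentage to a risk band."""
    if not 0 <= misinformation_percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if misinformation_percent < 30:
        return "Low Risk - Mostly Credible"
    if misinformation_percent < 60:
        return "Medium Risk - Verify Claims"
    return "High Risk - Likely Misinformation"
```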

Verdict Types

REAL: Content appears authentic and fact-checked
SUSPICIOUS: Mixed signals, requires verification
FAKE: Strong indicators of misinformation

Confidence Indicators

High confidence: Multiple models agree + external verification
Medium confidence: Some conflicting signals
Low confidence: Limited data or unclear content

🐛 Troubleshooting

Server Won't Start

Check if port 5000 is available: netstat -ano | findstr :5000
Ensure Python dependencies are installed
Check for errors in terminal output
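
The netstat command above is Windows-specific. As a cross-platform alternative, the port can be probed from Python. This is a small sketch; `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(5000)` returns True before the server is started, another process is already listening on that port.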

Extension Not Working

Verify server is running at http://localhost:5000
Check browser console for errors (F12 → Console)
Try reloading the extension
Ensure you're on a valid webpage (not chrome:// pages)

Models Not Loading

Check disk space (requires ~5GB)
Verify D:\huggingface_cache directory exists and is writable
Run download script manually if needed
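
The disk-space and directory checks above can be automated. The sketch below uses only the standard library; `check_cache_dir` is a hypothetical helper, and the 5 GB figure is the approximate requirement stated in this README.

```python
import os
import shutil
import tempfile

REQUIRED_BYTES = 5 * 1024**3  # roughly 5GB, per the README


def check_cache_dir(path, required_bytes=REQUIRED_BYTES):
    """Return a list of problems found with the model cache directory."""
    if not os.path.isdir(path):
        return [f"{path} does not exist"]
    problems = []
    try:
        # Creating a temporary file is a direct test of write access.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:
        problems.append(f"{path} is not writable")
    free = shutil.disk_usage(path).free
    if free < required_bytes:
        problems.append(
            f"only {free / 1024**3:.1f} GB free, about "
            f"{required_bytes / 1024**3:.1f} GB needed"
        )
    return problems
```

An empty list from `check_cache_dir(r"D:\huggingface_cache")` means the directory exists, is writable, and has enough free space.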

Slow Analysis

Large articles (>100 paragraphs) take 1-2 minutes
Check CPU/GPU usage
Note that REQUEST_TIMEOUT only limits how long the extension waits for the server; reducing it will not speed up analysis and may cause long articles to time out

🤝 Contributing

This project combines features from two advanced misinformation detection systems. To contribute:

Keep backend functionality intact - both systems are working correctly
Test thoroughly before committing changes
Maintain clean, organized frontend code
Update documentation for new features

📝 Credits

LinkScout combines:

MIS Extension: Groq AI agentic analysis, RL, image detection, revolutionary detection phases
MIS_2 Extension: Pre-trained models, chunk analysis, Google search, sidebar UI

Created by combining the best features of both systems into one powerful tool.

🔒 Privacy & Security

All analysis is performed locally or through your own API keys
No data is collected or stored by LinkScout
Google Search API (if configured) follows Google's privacy policy
Groq API usage follows Groq's terms of service

📄 License

For educational and research purposes. Please respect API usage limits and terms of service.

LinkScout - Smart Analysis. Simple Answers. 🔍✨