title: LinkScout Backend
emoji: 🔍
colorFrom: yellow
colorTo: red
sdk: docker
pinned: false

LinkScout - Smart Analysis. Simple Answers.

The Ultimate AI-Powered Misinformation Detection Extension

LinkScout combines LLM-based analysis through Groq with pre-trained machine learning models to provide fact-checking and misinformation detection.

🚀 Features

Dual AI Analysis System

  • Groq AI Agent: Advanced natural language understanding and reasoning
  • Pre-trained Models: RoBERTa, Emotion Analysis, NER, Hate Speech Detection, Clickbait Detection, Bias Detection

Revolutionary Detection (8 Phases)

  1. Linguistic Fingerprint Analysis: Detects manipulation patterns in text
  2. Claim-by-Claim Verification: Verifies individual claims against databases
  3. Source Credibility Analysis: Rates source reliability
  4. Entity Verification: Validates people, organizations, places
  5. Propaganda Detection: Identifies propaganda techniques
  6. Contradiction Detection: Finds logical inconsistencies
  7. Network Analysis: Detects bot/astroturfing patterns
  8. Reinforcement Learning: Learns from user feedback to improve accuracy

User Interface Features

  • Smart Paragraph Highlighting: Color-coded suspicious content detection
  • Sidebar Analysis Report: Comprehensive results without blocking the page
  • Real-time Google Search Integration: Verifies claims with recent sources
  • Interactive Results Display: Organized tabs for overview, details, and sources
  • One-Click Analysis: Analyze entire pages or paste text/URLs

Technical Capabilities

  • Chunk-based Analysis: Analyzes content paragraph-by-paragraph for precision
  • Multi-language Support: English, Hindi, Marathi, and 15+ Indian languages
  • Image Analysis: Detects AI-generated/manipulated images
  • Offline Database: Fast local verification of known false claims
  • Context-Aware Scoring: Adjusts detection based on content type and category
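
Chunk-based analysis depends on first splitting a page into paragraphs. A minimal sketch of that step, assuming paragraphs are separated by blank lines and that very short fragments are not worth scoring (the function name `split_into_chunks` and the `min_chars` cutoff are hypothetical, not taken from LinkScout's code):

```python
import re
from typing import List


def split_into_chunks(text: str, min_chars: int = 40) -> List[str]:
    """Split text into paragraph chunks on blank lines, dropping short fragments."""
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text):
        # Collapse internal whitespace so each chunk is one clean line.
        cleaned = " ".join(paragraph.split())
        if len(cleaned) >= min_chars:
            chunks.append(cleaned)
    return chunks
```

Each returned chunk can then be scored independently, which is what makes per-paragraph highlighting possible.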

📦 Installation

Prerequisites

  • Python 3.8+
  • Node.js (optional, for development)
  • Google Chrome or Microsoft Edge browser

Backend Setup

  1. Install Python Dependencies:
cd d:\mis_2\LinkScout
pip install -r requirements_mis.txt
pip install flask flask-cors requests beautifulsoup4 torch transformers pillow
  2. Download AI Models (if not already cached):
# Models will auto-download to D:\huggingface_cache
# Requires ~5GB disk space
python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('hamzab/roberta-fake-news-classification', cache_dir=r'D:\huggingface_cache')"
  3. Configure Google Search (optional):

  4. Start the Server:

python combined_server.py

The server starts at http://localhost:5000

Extension Installation

  1. Open Chrome/Edge
  2. Navigate to Extensions: chrome://extensions or edge://extensions
  3. Enable Developer Mode: Toggle in top-right corner
  4. Load Unpacked: Click "Load unpacked" and select the d:\mis_2\LinkScout\extension folder
  5. Pin Extension: Click puzzle icon and pin LinkScout for easy access

🎯 Usage

Method 1: Analyze Current Page

  1. Navigate to any news article or webpage
  2. Click the LinkScout extension icon
  3. Click "Scan Page"
  4. View results in popup and check highlighted suspicious content on page

Method 2: Paste Text or URL

  1. Click the LinkScout extension icon
  2. Paste text or URL in the input box
  3. Click "Analyze"
  4. Review comprehensive analysis results

Method 3: Highlight Suspicious Content

  1. After scanning a page, click the "Highlight" button
  2. Suspicious paragraphs will be color-coded:
    • 🔴 Red: High risk (>70% suspicious)
    • 🟡 Yellow: Medium risk (40-70% suspicious)
    • 🔵 Blue: Low risk (<40% suspicious)
  3. Click "Clear" to remove highlights
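
The colour bands in step 2 can be written as a small lookup. This is a sketch of the thresholds as stated above, not the extension's actual code; the function name `highlight_color` is hypothetical, and treating exactly 40% and exactly 70% as yellow follows from reading ">70%" and "<40%" literally.

```python
def highlight_color(suspicion_pct: float) -> str:
    """Map a paragraph's suspicion percentage (0-100) to a highlight colour."""
    if not 0 <= suspicion_pct <= 100:
        raise ValueError("suspicion_pct must be between 0 and 100")
    if suspicion_pct > 70:
        return "red"      # high risk
    if suspicion_pct >= 40:
        return "yellow"   # medium risk, both boundaries included
    return "blue"         # low risk
```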

Method 4: View Detailed Report

  • Analysis results appear in a sidebar on the right
  • Shows percentage score, verdict, summary, and flagged content
  • Includes Google search results for fact-checking

🔧 Configuration

Server Configuration

Edit combined_server.py:

# Groq API Key (for AI analysis)
GROQ_API_KEY = 'your_groq_api_key_here'

# Change port if needed
app.run(host='0.0.0.0', port=5000, debug=False)
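
Hardcoding the key works for local use, but it is easy to commit by accident. One alternative, sketched here under the assumption that you are free to change how `GROQ_API_KEY` is assigned in combined_server.py, is to read it from an environment variable. The helper `load_groq_api_key` is hypothetical and not part of LinkScout.

```python
import os


def load_groq_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Read the Groq API key from the environment and fail early if it is unset."""
    key = os.environ.get(env_var, "").strip()
    # Reject the README's placeholder value as well as an empty string.
    if not key or key == "your_groq_api_key_here":
        raise RuntimeError(f"Set the {env_var} environment variable to your Groq API key")
    return key
```

With this in place, the assignment would become `GROQ_API_KEY = load_groq_api_key()`, and the key is set in the shell before running `python combined_server.py`.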

Extension Configuration

Edit extension/content.js:

const CONFIG = {
    API_ENDPOINT: 'http://localhost:5000/api/v1/analyze-chunks',
    REQUEST_TIMEOUT: 180000, // 3 minutes
    AUTO_SCAN_DELAY: 3000
};

📊 How It Works

Analysis Pipeline

  1. Content Extraction

    • Extracts all paragraphs, headings, and article text
    • Filters out navigation, ads, and boilerplate
  2. Multi-Model Analysis

    • RoBERTa: Fake news probability
    • Emotion Model: Sentiment and emotional manipulation
    • NER: Entity extraction and verification
    • Hate Speech: Toxic content detection
    • Clickbait: Sensationalism detection
    • Bias: Political/ideological bias detection
  3. Revolutionary Detection

    • Linguistic patterns (sentence structure, word choice)
    • Claim extraction and database verification
    • Source credibility scoring
    • Entity validation (real people/organizations)
    • Propaganda technique identification
    • Logical contradiction detection
    • Bot/astroturfing pattern analysis
  4. Google Research

    • Searches recent sources for claims
    • Compares against credible news outlets
    • Provides links for manual verification
  5. Scoring & Verdict

    • Combines all signals into final score (0-100%)
    • Determines verdict: FAKE, SUSPICIOUS, or REAL
    • Generates human-readable explanation
  6. Reinforcement Learning

    • Learns from user feedback
    • Improves accuracy over time
    • Adapts to new misinformation patterns
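
Step 5 combines all signals into one score, but this README does not say how. One common approach is a weighted average; the sketch below assumes each model reports a score between 0 and 1. The signal names and weights are made up for illustration and are not LinkScout's actual values.

```python
from typing import Dict

# Hypothetical weights, for illustration only.
EXAMPLE_WEIGHTS = {"roberta_fake": 0.4, "clickbait": 0.2, "bias": 0.2, "propaganda": 0.2}


def combine_signals(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return a 0-100 score as the weighted average of the signals present."""
    weighted_sum = 0.0
    total_weight = 0.0
    for name, weight in weights.items():
        if name in signals:
            # Clamp each signal to [0, 1] so one bad value cannot skew the result.
            score = min(max(signals[name], 0.0), 1.0)
            weighted_sum += weight * score
            total_weight += weight
    if total_weight == 0:
        raise ValueError("no weighted signals provided")
    # Dividing by the weight actually used keeps the score comparable
    # when some models did not run.
    return 100.0 * weighted_sum / total_weight
```

For example, `combine_signals({"roberta_fake": 0.9, "clickbait": 0.5}, EXAMPLE_WEIGHTS)` averages only the two signals that are present.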

🎓 Understanding Results

Misinformation Percentage

  • 0-30%: Low Risk - Mostly Credible
  • 30-60%: Medium Risk - Verify Claims
  • 60-100%: High Risk - Likely Misinformation
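
The bands above share their boundary values (30% and 60% each appear in two ranges). A minimal sketch of the lookup, assuming a boundary value belongs to the higher band; that tie-breaking rule and the name `risk_band` are assumptions, not documented behaviour.

```python
from bisect import bisect_right

# Lower bounds of the medium and high bands, taken from the list above.
BAND_LOWER_BOUNDS = [30, 60]
BAND_LABELS = [
    "Low Risk - Mostly Credible",
    "Medium Risk - Verify Claims",
    "High Risk - Likely Misinformation",
]


def risk_band(misinformation_pct: float) -> str:
    """Map a 0-100 misinformation percentage to its risk band label."""
    if not 0 <= misinformation_pct <= 100:
        raise ValueError("misinformation_pct must be between 0 and 100")
    # bisect_right counts how many lower bounds are <= the score,
    # so a score of exactly 30 or 60 falls into the higher band.
    return BAND_LABELS[bisect_right(BAND_LOWER_BOUNDS, misinformation_pct)]
```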

Verdict Types

  • REAL: Content appears authentic and fact-checked
  • SUSPICIOUS: Mixed signals, requires verification
  • FAKE: Strong indicators of misinformation

Confidence Indicators

  • High confidence: Multiple models agree + external verification
  • Medium confidence: Some conflicting signals
  • Low confidence: Limited data or unclear content

🐛 Troubleshooting

Server Won't Start

  • Check if port 5000 is available: netstat -ano | findstr :5000
  • Ensure Python dependencies are installed
  • Check for errors in terminal output
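
The `netstat` command above is Windows-specific. As a cross-platform alternative, the following sketch uses Python's standard `socket` module to test whether anything is already accepting connections on the port. The function name `port_in_use` is hypothetical.

```python
import socket


def port_in_use(host: str = "127.0.0.1", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use()` returns True before the server is started, another process holds port 5000. The same call returning True after startup indicates that the server is reachable.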

Extension Not Working

  • Verify server is running at http://localhost:5000
  • Check browser console for errors (F12 → Console)
  • Try reloading the extension
  • Ensure you're on a valid webpage (not chrome:// pages)

Models Not Loading

  • Check disk space (requires ~5GB)
  • Verify D:\huggingface_cache directory exists and is writable
  • Run download script manually if needed
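
The first two checks can be scripted with the standard library. A minimal sketch, assuming the ~5GB figure above as the free-space threshold (the function name `check_cache_dir` is hypothetical):

```python
import os
import shutil
from typing import List

REQUIRED_BYTES = 5 * 1024**3  # roughly the 5GB mentioned above


def check_cache_dir(path: str, required_bytes: int = REQUIRED_BYTES) -> List[str]:
    """Return a list of problems found with the model cache directory."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable")
    free = shutil.disk_usage(path).free
    if free < required_bytes:
        problems.append(f"only {free / 1024**3:.1f} GB free, need about {required_bytes / 1024**3:.1f} GB")
    return problems
```

Calling `check_cache_dir(r"D:\huggingface_cache")` returns an empty list when the directory exists, is writable, and has enough free space.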

Slow Analysis

  • Large articles (>100 paragraphs) take 1-2 minutes
  • Check CPU/GPU usage
  • REQUEST_TIMEOUT only limits how long the extension waits for a response; lowering it aborts long analyses sooner rather than making them faster

🤝 Contributing

This project combines features from two advanced misinformation detection systems. To contribute:

  1. Keep backend functionality intact - both systems are working correctly
  2. Test thoroughly before committing changes
  3. Maintain clean, organized frontend code
  4. Update documentation for new features

📝 Credits

LinkScout combines:

  • MIS Extension: Groq AI agentic analysis, RL, image detection, revolutionary detection phases
  • MIS_2 Extension: Pre-trained models, chunk analysis, Google search, sidebar UI

Created by combining the best features of both systems into one powerful tool.

🔒 Privacy & Security

  • Analysis runs locally, apart from requests sent to Groq and (if configured) Google Search under your own API keys
  • No data is collected or stored by LinkScout
  • Google Search API (if configured) follows Google's privacy policy
  • Groq API usage follows Groq's terms of service

📄 License

For educational and research purposes. Please respect API usage limits and terms of service.


LinkScout - Smart Analysis. Simple Answers. 🔍✨