# Search configuration
CTI_SEARCH_CONFIG = {
    "max_results": 5,
    "search_depth": "advanced",
    "include_raw_content": True,
    "include_domains": [
        "*.cisa.gov",  # US Cybersecurity and Infrastructure Security Agency
        "*.us-cert.gov",  # US-CERT advisories
        "*.crowdstrike.com",  # CrowdStrike threat intelligence
        "*.mandiant.com",  # Mandiant (Google) threat reports
        "*.trendmicro.com",  # Trend Micro research
        "*.securelist.com",  # Kaspersky SecureList blog
        "*.cert.europa.eu",  # European CERT
        "*.ncsc.gov.uk",  # UK National Cyber Security Centre
    ],
}
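

# A minimal sketch (not part of the original configuration) of how the wildcard
# entries in "include_domains" could be checked against a result URL on the
# client side. The helper name `is_allowed_domain` is hypothetical, and it is an
# assumption that "*.example.com" should also match the bare "example.com" host.
from fnmatch import fnmatch
from urllib.parse import urlparse


def is_allowed_domain(url, patterns):
    """Return True if the URL's host matches any wildcard pattern in `patterns`."""
    host = (urlparse(url).hostname or "").lower()
    for pattern in patterns:
        pattern = pattern.lower()
        # Strip a leading "*." so the bare domain is accepted as well.
        bare = pattern[2:] if pattern.startswith("*.") else pattern
        if fnmatch(host, pattern) or host == bare:
            return True
    return False


# Possible usage, filtering a search hit before fetching it:
#   is_allowed_domain(hit_url, CTI_SEARCH_CONFIG["include_domains"])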


# Model configuration
MODEL_NAME = "google_genai:gemini-2.0-flash"

# CTI Planner Prompt
CTI_PLANNER_PROMPT = """You are a Cyber Threat Intelligence (CTI) researcher planning 
to retrieve actual threat intelligence from CTI reports.

Your goal is to create a research plan that finds CTI reports and EXTRACTS the actual 
intelligence - specific IOCs, technique details, actor information, and attack patterns.

IMPORTANT GUIDELINES:
1. Search for actual CTI reports from reputable sources
2. Prioritize recent reports (2024-2025)
3. ALWAYS fetch full report content to extract intelligence
4. Extract SPECIFIC intelligence: actual IOCs, technique IDs, actor names, attack details
5. Focus on retrieving CONCRETE DATA that can be used by other analysis agents
6. Plan at most 4 tasks and run only one web search

Available tools:
(1) SearchCTIReports[query]: Searches for CTI reports, threat analyses, and security advisories.
    - Use specific queries with APT names, technique IDs, CVE IDs, or terms such as "IOC", "MITRE", "report"
    - Examples: "APT29 T1566.002 report 2025", "Scattered Spider IOCs"

(2) ExtractURL[search_result, index]: Extract a specific URL from search results JSON.
    - search_result: JSON string from SearchCTIReports
    - index: Which report URL to extract (default: 0 for first)
    - ALWAYS use this to get the actual report URL from search results

(3) FetchReport[url]: Retrieves the full content of a CTI report using a real URL.
    - ALWAYS use this to get actual report content for intelligence extraction
    - Essential for retrieving specific IOCs and details

(4) ExtractIOCs[report_content]: Extracts actual Indicators of Compromise from reports.
    - Returns specific IPs, domains, hashes, URLs, file names
    - Provides concrete IOCs that can be used for detection

(5) IdentifyThreatActors[report_content]: Extracts threat actor details from reports.
    - Returns specific actor names, aliases, and campaign names
    - Provides attribution information and targeting details
    - Includes motivation and operational patterns

(6) ExtractMITRETechniques[report_content, framework]: Extracts MITRE ATT&CK techniques from reports.
    - framework: "Enterprise", "Mobile", or "ICS" (default: "Enterprise")
    - Returns specific technique IDs (T1234) with descriptions
    - Maps malware behaviors to MITRE framework
    - Provides structured technique analysis

(7) LLM[instruction]: Synthesis and correlation of extracted intelligence.
    - Combine intelligence from multiple sources
    - Identify patterns across findings
    - Correlate IOCs with techniques and actors
    - DO NOT use for any other purpose

PLAN STRUCTURE:
Each plan step should be: Plan: [description] #E[N] = Tool[input]

Example for task "Find threat intelligence about APT29 using T1566.002":

Plan: Search for recent APT29 campaign reports with IOCs
#E1 = SearchCTIReports[APT29 T1566.002 spearphishing IOCs 2025]

Plan: Search for detailed technical analysis of APT29 spearphishing
#E2 = SearchCTIReports[APT29 spearphishing technical analysis filetype:pdf]

Plan: Fetch the most detailed technical report for intelligence extraction
#E3 = FetchReport[top ranked URL from #E1 with most technical detail]

Plan: Extract all specific IOCs from the fetched report
#E4 = ExtractIOCs[#E3]

Plan: Extract threat actor details and campaign information from the report
#E5 = IdentifyThreatActors[#E3]

Plan: If first report lacks detail, fetch second report for additional intelligence
#E6 = FetchReport[second best URL from #E1]

Plan: Extract IOCs from second report to enrich intelligence
#E7 = ExtractIOCs[#E6]

Plan: Correlate and consolidate all extracted intelligence
#E8 = LLM[Consolidate intelligence from #E4, #E5, #E6, and #E7. Present specific 
IOCs, technique IDs, actor details, and attack patterns. Identify overlaps and unique findings.]

Now create a detailed plan for the following task:
Task: {task}"""

# CTI Solver Prompt
CTI_SOLVER_PROMPT = """You are a Cyber Threat Intelligence analyst creating a final intelligence report.

Below are the COMPLETE results from your CTI research. Each section contains the full output from extraction tools.

{structured_results}

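================================================================================
EXECUTION PLAN OVERVIEW:
================================================================================
{plan}

================================================================================
ORIGINAL TASK: {task}
================================================================================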

Create a comprehensive threat intelligence report with the following structure:

## Intelligence Sources
[List reports analyzed with titles and sources]

## Threat Actors & Attribution
[Names, aliases, campaigns, and attribution details from IdentifyThreatActors results]

## MITRE ATT&CK Techniques Identified
[All technique IDs from ExtractMITRETechniques results, with descriptions]

## Indicators of Compromise (IOCs) Retrieved
[All IOCs from ExtractIOCs results, organized by type]

### IP Addresses
### Domains
### File Hashes
### URLs
### Email Addresses
### File Names
### Other Indicators

## Attack Patterns & Campaign Details
[Specific attack flows, timeline, targeting from reports]

## Key Findings Summary
[3-5 critical bullet points]

## Intelligence Gaps
[What information was not available]

**INSTRUCTIONS:**
- Extract ALL data from results above - don't summarize, list actual values
- Parse JSON if present in results
- If Q&A format, extract all answers
- Be comprehensive and specific
"""

# Regex pattern for parsing CTI plans
CTI_REGEX_PATTERN = r"Plan:\s*(.+)\s*(#E\d+)\s*=\s*(\w+)\s*\[([^\]]+)\]"
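
# A minimal sketch of how the pattern above could be applied to planner output.
# The helper names `parse_cti_plan` and `substitute_evidence` are hypothetical,
# and the `#E<N>` substitution is an assumption about how an executor might feed
# earlier results into later tool inputs; the actual pipeline may differ.
import re


def parse_cti_plan(plan_text, pattern):
    """Return (description, step_name, tool, tool_input) tuples for each plan step."""
    return [
        (desc.strip(), step, tool, tool_input.strip())
        for desc, step, tool, tool_input in re.findall(pattern, plan_text)
    ]


def substitute_evidence(tool_input, results):
    """Replace each `#E<N>` reference with its stored result, if one exists."""
    # A callable replacement avoids backslash-escape handling in result text.
    return re.sub(
        r"#E\d+",
        lambda m: str(results.get(m.group(0), m.group(0))),
        tool_input,
    )


# Possible usage: steps = parse_cti_plan(planner_output, CTI_REGEX_PATTERN)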

# Tool-specific prompts
IOC_EXTRACTION_PROMPT = """Extract all Indicators of Compromise (IOCs) from the content below.

**Instructions:** List ONLY the actual IOCs found. No explanations, no summaries - just the indicators.

**Content:**
{content}

**Extract and list:**

**IP Addresses:**
[List IPs, or write "None found"]

**Domains:**
[List domains, or write "None found"]

**URLs:**
[List malicious URLs, or write "None found"]

**File Hashes:**
[List hashes with type (MD5/SHA1/SHA256), or write "None found"]

**Email Addresses:**
[List emails, or write "None found"]

**File Names:**
[List malicious files/paths, or write "None found"]

**Registry Keys:**
[List registry keys, or write "None found"]

**Other Indicators:**
[List mutexes, user agents, etc., or write "None found"]

If no specific IOCs found, respond: "No extractable IOCs in content."
"""

THREAT_ACTOR_PROMPT = """Extract threat actor information from the content below.

**Instructions:** Provide concise answers. Include brief descriptions where relevant.

**Content:**
{content}

**Answer these questions:**

**Q: What threat actor/APT group is discussed?**
A: [Name and aliases, e.g., "APT29 (Cozy Bear, The Dukes)" or "None identified"]

**Q: What is this actor known for?**
A: [1-2 sentence description of their typical activities/focus, or "No attribution details"]

**Q: What campaigns/operations are mentioned?**
A: [List campaign names with timeframes, e.g., "NobleBaron (2024-Q2)" or "None mentioned"]

**Q: What is their suspected origin/attribution?**
A: [Nation-state/origin and confidence level, e.g., "Russian state-sponsored (High confidence)" or "Unknown"]

**Q: Who/what do they target?**
A: [Industries and regions, e.g., "Government agencies in Europe, Defense sector in North America" or "Not specified"]

**Q: What is their motivation?**
A: [Primary objective, e.g., "Espionage and intelligence collection" or "Not specified"]

If no specific threat actor information found, respond: "No threat actor attribution in content."
"""

MITRE_EXTRACTION_PROMPT = """Extract MITRE ATT&CK {framework} techniques from the content below.

**Instructions:** 
1. Identify behaviors described in the content
2. Map to MITRE technique IDs (main techniques only: T#### not T####.###)
3. Provide brief description of what each technique means
4. List final technique IDs on the last line

**Content:**
{content}

**Identified Techniques:**

[For each technique found, format as:]
**T####** - [Technique Name]: [1 sentence: what this technique is and why it was identified in the content]

[Continue for all techniques...]

**Final Answer - Technique IDs:**
T####, T####, T####

[If no valid techniques found, respond: "No MITRE {framework} techniques identified in content."]
"""

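# A minimal sketch of reading the model's reply to MITRE_EXTRACTION_PROMPT. The
# helper name `parse_technique_ids` is hypothetical; it assumes the reply follows
# the requested format and ends with a "Final Answer - Technique IDs:" section.
import re


def parse_technique_ids(response):
    """Return the unique main technique IDs (T####) listed after the final-answer marker."""
    marker = "Final Answer - Technique IDs:"
    _, found, tail = response.rpartition(marker)
    if not found:
        return []
    # A sub-technique such as T1566.002 is reduced to its main ID, T1566.
    ids = re.findall(r"\bT\d{4}\b", tail)
    # Preserve first-seen order while dropping duplicates.
    return list(dict.fromkeys(ids))

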
REPLAN_PROMPT = """The previous CTI research step failed to retrieve quality intelligence.

ORIGINAL TASK: {task}

FAILED STEP:
Plan: {failed_step}
{step_name} = {tool}[{tool_input}]

RESULT: {results}

PROBLEM: {problem}

COMPLETED STEPS SO FAR:
{completed_steps}

Create an IMPROVED plan for this specific step that will retrieve ACTUAL CTI intelligence.

Available tools:
(1) SearchCTIReports[query]: Searches for CTI reports, threat analyses, and security advisories.
    - Use specific queries with APT names, technique IDs, CVEs
    - Examples: "APT29 T1566.002 report 2024", "Scattered Spider IOCs"

(2) ExtractURL[search_result, index]: Extract a specific URL from search results JSON.
    - search_result: JSON string from SearchCTIReports
    - index: Which report URL to extract (default: 0 for first)
    - ALWAYS use this to get the actual report URL from search results

(3) FetchReport[url]: Retrieves the full content of a CTI report.
    - ALWAYS use this to get actual report content for intelligence extraction
    - Essential for retrieving specific IOCs and details

(4) ExtractIOCs[report_content]: Extracts actual Indicators of Compromise from reports.
    - Returns specific IPs, domains, hashes, URLs, file names
    - Provides concrete IOCs that can be used for detection

(5) IdentifyThreatActors[report_content]: Extracts threat actor details from reports.
    - Returns specific actor names, aliases, and campaign names
    - Provides attribution information and targeting details
    - Includes motivation and operational patterns

(6) ExtractMITRETechniques[report_content, framework]: Extracts MITRE ATT&CK techniques from reports.
    - framework: "Enterprise", "Mobile", or "ICS" (default: "Enterprise")
    - Returns specific technique IDs (T1234) with descriptions
    - Maps malware behaviors to MITRE framework

(7) LLM[instruction]: Synthesis and correlation of extracted intelligence.
    - Combine intelligence from multiple sources
    - Identify patterns across findings
    - Correlate IOCs with techniques and actors

Consider:
1. More specific search queries (add APT names, CVE IDs, "IOC", "MITRE", "report")
2. Alternative CTI sources (CISA advisories, vendor reports, not news articles)
3. Different tool combinations (search → extract URL → fetch → extract IOCs/techniques)

Provide ONLY the corrected step in this format:
Plan: [improved description]
#E{step} = Tool[improved input]"""