# Search configuration
CTI_SEARCH_CONFIG = {
    "max_results": 5,
    "search_depth": "advanced",
    "include_raw_content": True,
    "include_domains": [
        "*.cisa.gov",  # US Cybersecurity and Infrastructure Security Agency
        "*.us-cert.gov",  # US-CERT advisories
        "*.crowdstrike.com",  # CrowdStrike threat intelligence
        "*.mandiant.com",  # Mandiant (Google) threat reports
        "*.trendmicro.com",  # Trend Micro research
        "*.securelist.com",  # Kaspersky SecureList blog
        "*.cert.europa.eu",  # European CERT (CERT-EU)
        "*.ncsc.gov.uk",  # UK National Cyber Security Centre
    ],
}

# Model configuration
MODEL_NAME = "google_genai:gemini-2.0-flash"

# CTI Planner Prompt
CTI_PLANNER_PROMPT = """You are a Cyber Threat Intelligence (CTI) researcher planning to retrieve actual threat intelligence from CTI reports.

Your goal is to create a research plan that finds CTI reports and EXTRACTS the actual intelligence: specific IOCs, technique details, actor information, and attack patterns.

IMPORTANT GUIDELINES:
1. Search for actual CTI reports from reputable sources
2. Prioritize recent reports (2024-2025)
3. ALWAYS fetch full report content to extract intelligence
4. Extract SPECIFIC intelligence: actual IOCs, technique IDs, actor names, attack details
5. Focus on retrieving CONCRETE DATA that can be used by other analysis agents
6. Use a maximum of 4 tasks, with only one round of web searching

Available tools:
(1) SearchCTIReports[query]: Searches for CTI reports, threat analyses, and security advisories.
    - Use specific queries: add APT names, technique IDs, CVE IDs, and terms such as "IOC", "MITRE", or "report"
    - Examples: "APT29 T1566.002 report 2025", "Scattered Spider IOCs"

(2) ExtractURL[search_result, index]: Extract a specific URL from search results JSON.
    - search_result: JSON string from SearchCTIReports
    - index: Which report URL to extract (default: 0 for first)
    - ALWAYS use this to get the actual report URL from search results

(3) FetchReport[url]: Retrieves the full content of a CTI report from its real URL.
    - ALWAYS use this to get actual report content for intelligence extraction
    - Essential for retrieving specific IOCs and details

(4) ExtractIOCs[report_content]: Extracts actual Indicators of Compromise from reports.
    - Returns specific IPs, domains, hashes, URLs, file names
    - Provides concrete IOCs that can be used for detection

(5) IdentifyThreatActors[report_content]: Extracts threat actor details from reports.
    - Returns specific actor names, aliases, and campaign names
    - Provides attribution information and targeting details
    - Includes motivation and operational patterns

(6) ExtractMITRETechniques[report_content, framework]: Extracts MITRE ATT&CK techniques from reports.
    - framework: "Enterprise", "Mobile", or "ICS" (default: "Enterprise")
    - Returns specific technique IDs (T1234) with descriptions
    - Maps malware behaviors to the MITRE framework
    - Provides structured technique analysis

(7) LLM[instruction]: Synthesis and correlation of extracted intelligence.
    - Combine intelligence from multiple sources
    - Identify patterns across findings
    - Correlate IOCs with techniques and actors
    - Use ONLY for synthesis and correlation; DO NOT use it for any other purpose

PLAN STRUCTURE:
Each plan step should be:
Plan: [description]
#E[N] = Tool[input]

Example for task "Find threat intelligence about APT29 using T1566.002":

Plan: Search for recent APT29 campaign reports with IOCs
#E1 = SearchCTIReports[APT29 T1566.002 spearphishing IOCs 2025]
Plan: Search for detailed technical analysis of APT29 spearphishing
#E2 = SearchCTIReports[APT29 spearphishing technical analysis filetype:pdf]
Plan: Fetch the most detailed technical report for intelligence extraction
#E3 = FetchReport[top ranked URL from #E1 with most technical detail]
Plan: Extract all specific IOCs from the fetched report
#E4 = ExtractIOCs[#E3]
Plan: Extract threat actor details and campaign information from the report
#E5 = IdentifyThreatActors[#E3]
Plan: If the first report lacks detail, fetch a second report for additional intelligence
#E6 = FetchReport[second best URL from #E1]
Plan: Extract IOCs from the second report to enrich intelligence
#E7 = ExtractIOCs[#E6]
Plan: Correlate and consolidate all extracted intelligence
#E8 = LLM[Consolidate intelligence from #E4, #E5, and #E7. Present specific IOCs, technique IDs, actor details, and attack patterns. Identify overlaps and unique findings.]

Now create a detailed plan for the following task:
Task: {task}"""

# CTI Solver Prompt
CTI_SOLVER_PROMPT = """You are a Cyber Threat Intelligence analyst creating a final intelligence report.

Below are the COMPLETE results from your CTI research. Each section contains the full output from extraction tools.
{structured_results}

================================================================================
EXECUTION PLAN OVERVIEW:
================================================================================
{plan}

================================================================================
ORIGINAL TASK:
{task}
================================================================================

Create a comprehensive threat intelligence report with the following structure:

## Intelligence Sources
[List reports analyzed with titles and sources]

## Threat Actors & Attribution
[Names, aliases, campaigns, and attribution details from IdentifyThreatActors results]

## MITRE ATT&CK Techniques Identified
[All technique IDs from ExtractMITRETechniques results, with descriptions]

## Indicators of Compromise (IOCs) Retrieved
[All IOCs from ExtractIOCs results, organized by type]
### IP Addresses
### Domains
### File Hashes
### URLs
### Email Addresses
### File Names
### Other Indicators

## Attack Patterns & Campaign Details
[Specific attack flows, timeline, targeting from reports]

## Key Findings Summary
[3-5 critical bullet points]

## Intelligence Gaps
[What information was not available]

**INSTRUCTIONS:**
- Extract ALL data from the results above; don't summarize, list actual values
- Parse JSON if present in the results
- If results are in Q&A format, extract all answers
- Be comprehensive and specific
"""

# Regex pattern for parsing CTI plans
CTI_REGEX_PATTERN = r"Plan:\s*(.+)\s*(#E\d+)\s*=\s*(\w+)\s*\[([^\]]+)\]"

# Tool-specific prompts
IOC_EXTRACTION_PROMPT = """Extract all Indicators of Compromise (IOCs) from the content below.

**Instructions:** List ONLY the actual IOCs found. No explanations, no summaries, just the indicators.
**Content:**
{content}

**Extract and list:**

**IP Addresses:** [List IPs, or write "None found"]
**Domains:** [List domains, or write "None found"]
**URLs:** [List malicious URLs, or write "None found"]
**File Hashes:** [List hashes with type (MD5/SHA1/SHA256), or write "None found"]
**Email Addresses:** [List emails, or write "None found"]
**File Names:** [List malicious files/paths, or write "None found"]
**Registry Keys:** [List registry keys, or write "None found"]
**Other Indicators:** [List mutexes, user agents, etc., or write "None found"]

If no specific IOCs are found, respond: "No extractable IOCs in content."
"""

THREAT_ACTOR_PROMPT = """Extract threat actor information from the content below.

**Instructions:** Provide concise answers. Include brief descriptions where relevant.

**Content:**
{content}

**Answer these questions:**

**Q: What threat actor/APT group is discussed?**
A: [Name and aliases, e.g., "APT29 (Cozy Bear, The Dukes)" or "None identified"]

**Q: What is this actor known for?**
A: [1-2 sentence description of their typical activities/focus, or "No attribution details"]

**Q: What campaigns/operations are mentioned?**
A: [List campaign names with timeframes, e.g., "NobleBaron (2024-Q2)" or "None mentioned"]

**Q: What is their suspected origin/attribution?**
A: [Nation-state/origin and confidence level, e.g., "Russian state-sponsored (High confidence)" or "Unknown"]

**Q: Who/what do they target?**
A: [Industries and regions, e.g., "Government agencies in Europe, Defense sector in North America" or "Not specified"]

**Q: What is their motivation?**
A: [Primary objective, e.g., "Espionage and intelligence collection" or "Not specified"]

If no specific threat actor information is found, respond: "No threat actor attribution in content."
"""

MITRE_EXTRACTION_PROMPT = """Extract MITRE ATT&CK {framework} techniques from the content below.

**Instructions:**
1. Identify behaviors described in the content
2. Map them to MITRE technique IDs (main techniques only: T#### not T####.###)
3. Provide a brief description of what each technique means
4. List the final technique IDs on the last line

**Content:**
{content}

**Identified Techniques:**
[For each technique found, format as:]
**T####** - [Technique Name]: [1 sentence: what this technique is and why it was identified in the content]

[Continue for all techniques...]

**Final Answer - Technique IDs:** T####, T####, T####

[If no valid techniques are found, respond: "No MITRE {framework} techniques identified in content."]
"""

REPLAN_PROMPT = """The previous CTI research step failed to retrieve quality intelligence.

ORIGINAL TASK: {task}

FAILED STEP:
Plan: {failed_step}
{step_name} = {tool}[{tool_input}]

RESULT:
{results}

PROBLEM: {problem}

COMPLETED STEPS SO FAR:
{completed_steps}

Create an IMPROVED plan for this specific step that will retrieve ACTUAL CTI intelligence.

Available tools:
(1) SearchCTIReports[query]: Searches for CTI reports, threat analyses, and security advisories.
    - Use specific queries with APT names, technique IDs, CVEs
    - Examples: "APT29 T1566.002 report 2024", "Scattered Spider IOCs"

(2) ExtractURL[search_result, index]: Extract a specific URL from search results JSON.
    - search_result: JSON string from SearchCTIReports
    - index: Which report URL to extract (default: 0 for first)
    - ALWAYS use this to get the actual report URL from search results

(3) FetchReport[url]: Retrieves the full content of a CTI report.
    - ALWAYS use this to get actual report content for intelligence extraction
    - Essential for retrieving specific IOCs and details

(4) ExtractIOCs[report_content]: Extracts actual Indicators of Compromise from reports.
    - Returns specific IPs, domains, hashes, URLs, file names
    - Provides concrete IOCs that can be used for detection

(5) IdentifyThreatActors[report_content]: Extracts threat actor details from reports.
    - Returns specific actor names, aliases, and campaign names
    - Provides attribution information and targeting details
    - Includes motivation and operational patterns

(6) ExtractMITRETechniques[report_content, framework]: Extracts MITRE ATT&CK techniques from reports.
    - framework: "Enterprise", "Mobile", or "ICS" (default: "Enterprise")
    - Returns specific technique IDs (T1234) with descriptions
    - Maps malware behaviors to MITRE framework

(7) LLM[instruction]: Synthesis and correlation of extracted intelligence.
    - Combine intelligence from multiple sources
    - Identify patterns across findings
    - Correlate IOCs with techniques and actors

Consider:
1. More specific search queries (add APT names, CVE IDs, "IOC", "MITRE", "report")
2. Alternative CTI sources (CISA advisories, vendor reports, not news articles)
3. Different tool combinations (search → extract URL → fetch → extract IOCs/techniques)

Provide ONLY the corrected step in this format:
Plan: [improved description]
#E{step} = Tool[improved input]"""
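

# ---------------------------------------------------------------------------
# Illustrative sketch (hypothetical helpers, not part of the original module).
# Assuming the planner answers in the "Plan: ... / #E<n> = Tool[input]" layout
# requested by CTI_PLANNER_PROMPT, a four-group pattern such as
# CTI_REGEX_PATTERN splits the response into
# (description, step name, tool, tool input) tuples via re.findall.
# resolve_references() shows one possible way an executor could replace
# "#E<n>" tokens in a tool input with the results of earlier steps.
# The names parse_cti_plan and resolve_references are illustrative only.
# ---------------------------------------------------------------------------
import re
from typing import Dict, List, Tuple

# (plan description, step name such as "#E1", tool name, raw tool input)
PlanStep = Tuple[str, str, str, str]


def parse_cti_plan(plan_text: str, pattern: str) -> List[PlanStep]:
    """Return every step in plan_text that matches the four-group pattern."""
    return [
        (description.strip(), step_name, tool, tool_input.strip())
        for description, step_name, tool, tool_input in re.findall(pattern, plan_text)
    ]


def resolve_references(tool_input: str, results: Dict[str, str]) -> str:
    """Replace each whole "#E<n>" token with its stored result.

    Matching whole tokens keeps "#E1" from clobbering the prefix of "#E10";
    tokens with no stored result are left unchanged.
    """
    return re.sub(
        r"#E\d+", lambda match: results.get(match.group(0), match.group(0)), tool_input
    )


# Hypothetical usage inside an executor loop (run_tool is not defined here):
#   results: Dict[str, str] = {}
#   for description, step_name, tool, tool_input in parse_cti_plan(
#       planner_output, CTI_REGEX_PATTERN
#   ):
#       resolved_input = resolve_references(tool_input, results)
#       results[step_name] = run_tool(tool, resolved_input)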
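

# ---------------------------------------------------------------------------
# Illustrative sketch (hypothetical helper, not part of the original module).
# One possible implementation of the ExtractURL[search_result, index] tool the
# prompts describe. ASSUMPTION: SearchCTIReports returns JSON shaped like a
# Tavily-style search response, i.e. {"results": [{"url": ..., ...}, ...]};
# a bare list of result objects is also accepted. Adjust the keys if the real
# tool returns a different shape. The name extract_url is illustrative only.
# ---------------------------------------------------------------------------
import json
from typing import Any, List


def extract_url(search_result: str, index: int = 0) -> str:
    """Return the "url" of result number `index` (0 = first) from JSON search output."""
    data: Any = json.loads(search_result)
    # Accept either {"results": [...]} or a bare list of result objects.
    results: List[Any] = data.get("results", []) if isinstance(data, dict) else data
    if not 0 <= index < len(results):
        raise IndexError(f"no search result at index {index}; got {len(results)} results")
    return results[index]["url"]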