{
 "cells": [
  {
   "metadata": {},
   "cell_type": "markdown",
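   "source": "# CTI Agent\n\nA ReWOO-style agent (plan, execute tools, solve) built with LangGraph, Gemini (`gemini-2.5-flash`) and Tavily search. It answers cyber threat intelligence (CTI) questions, such as finding recent reports on groups that use a given MITRE ATT&CK technique.",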
   "id": "1e014677902bc4a2"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
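   "source": "## Setup\n\nInstall the dependencies and provide the Gemini and Tavily API keys. You are prompted for any key that is not already set as an environment variable.",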
   "id": "57d21ad42c51b7bb"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:09:48.553649Z",
     "start_time": "2025-09-24T14:09:40.747722Z"
    }
   },
   "cell_type": "code",
   "source": [
    "%%capture --no-stderr\n",
    "%pip install --quiet -U langgraph langchain-community langchain-google-genai langchain-tavily"
   ],
   "id": "64e62b8be724effb",
   "outputs": [],
   "execution_count": 1
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:09:59.629541Z",
     "start_time": "2025-09-24T14:09:49.858591Z"
    }
   },
   "cell_type": "code",
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
     "def set_env_variable(var_name):\n",
     "    # Prompt for the value only if it is not already set in the environment\n",
     "    if var_name not in os.environ:\n",
    "        os.environ[var_name] = getpass.getpass(f\"{var_name}=\")\n",
    "\n",
    "set_env_variable(\"GEMINI_API_KEY\")\n",
    "set_env_variable(\"TAVILY_API_KEY\")"
   ],
   "id": "b9b8036f5182062b",
   "outputs": [],
   "execution_count": 2
  },
  {
   "metadata": {},
   "cell_type": "markdown",
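   "source": "## ReWOO Agent\n\nThe agent follows the ReWOO (Reasoning WithOut Observation) pattern: a planner writes the whole plan up front and names each tool call's output with an evidence variable (`#E1`, `#E2`, ...); an executor runs the tool calls in order, substituting earlier evidence into later inputs; and a solver answers the task from the plan and the collected evidence. The `ReWOO` state below carries these pieces between the graph nodes.",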
   "id": "b7ccb1c1f41b189"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:00.191781Z",
     "start_time": "2025-09-24T14:10:00.135222Z"
    }
   },
   "cell_type": "code",
   "source": [
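     "from typing import List, Tuple\n",
     "from typing_extensions import TypedDict\n",
     "\n",
     "class ReWOO(TypedDict):\n",
     "    task: str  # the CTI question to answer\n",
     "    plan_string: str  # raw plan text produced by the planner\n",
     "    steps: List[Tuple[str, str, str, str]]  # parsed (plan, \"#E<n>\", tool, tool_input) tuples\n",
     "    results: dict  # evidence variable (e.g. \"#E1\") -> tool output\n",
     "    result: str  # final answer from the solver"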
   ],
   "id": "1ff523d16a86a18c",
   "outputs": [],
   "execution_count": 3
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "### Planner",
   "id": "62b86e7dd440db74"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:30.386536Z",
     "start_time": "2025-09-24T14:10:00.376586Z"
    }
   },
   "cell_type": "code",
   "source": [
    "from langchain_google_genai import GoogleGenerativeAI\n",
    "\n",
    "llm = GoogleGenerativeAI(model=\"gemini-2.5-flash\", api_key=os.environ[\"GEMINI_API_KEY\"])"
   ],
   "id": "7ee558c30d4e1c2c",
   "outputs": [],
   "execution_count": 4
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:30.432069Z",
     "start_time": "2025-09-24T14:10:30.421360Z"
    }
   },
   "cell_type": "code",
   "source": [
     "prompt = \"\"\"For the following task, make plans that can solve the problem step by step. For each plan, indicate \\\n",
     "which external tool, together with its input, to use to retrieve evidence. You can store the evidence in a \\\n",
     "variable #E that can be referenced by later tools. (Plan, #E1, Plan, #E2, Plan, ...)\n",
     "\n",
     "Tools can be one of the following:\n",
     "(1) Google[input]: Worker that searches the web. Useful when you need to find short\n",
     "and succinct answers about a specific topic. The input should be a search query.\n",
     "(2) LLM[input]: A pretrained LLM like yourself. Useful when you need to act with general\n",
     "world knowledge and common sense. Prioritize it when you are confident in solving the problem\n",
     "yourself. Input can be any instruction.\n",
     "\n",
     "For example,\n",
     "Task: Thomas, Toby, and Rebecca worked a total of 157 hours in one week. Thomas worked x\n",
     "hours. Toby worked 10 hours less than twice what Thomas worked, and Rebecca worked 8 hours\n",
     "less than Toby. How many hours did Rebecca work?\n",
     "Plan: Given Thomas worked x hours, translate the problem into an algebraic equation and solve\n",
     "it. #E1 = LLM[Solve x + (2x − 10) + ((2x − 10) − 8) = 157]\n",
     "Plan: Find out the number of hours Thomas worked. #E2 = LLM[What is x, given #E1]\n",
     "Plan: Calculate the number of hours Rebecca worked. #E3 = LLM[Calculate (2 ∗ #E2 − 10) − 8]\n",
    "\n",
    "Begin!\n",
    "Describe your plans with rich details. Each Plan should be followed by only one #E.\n",
    "\n",
    "Task: {task}\"\"\""
   ],
   "id": "320871448adc80c",
   "outputs": [],
   "execution_count": 5
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:30.518680Z",
     "start_time": "2025-09-24T14:10:30.508496Z"
    }
   },
   "cell_type": "code",
   "source": "task = \"What are the latest CTI reports on APT groups that use the T1566.002 (Spearphishing Link) technique?\"",
   "id": "cfbfbc30cd1f2a2d",
   "outputs": [],
   "execution_count": 6
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:36.513049Z",
     "start_time": "2025-09-24T14:10:30.637595Z"
    }
   },
   "cell_type": "code",
   "source": "result = llm.invoke(prompt.format(task=task))",
   "id": "cb8c925be339d309",
   "outputs": [],
   "execution_count": 7
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:36.543369Z",
     "start_time": "2025-09-24T14:10:36.536547Z"
    }
   },
   "cell_type": "code",
   "source": "print(result)",
   "id": "77cfb38f9b210b50",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Plan: Search for the latest CTI reports that specifically mention ATP groups using the T1566.002: Spearphishing Links technique. I will prioritize recent publications.\n",
      "#E1 = Google[latest CTI reports ATP T1566.002 Spearphishing Links]\n",
      "Plan: Review the search results from #E1 to identify relevant reports from reputable cybersecurity intelligence sources. I will look for titles or snippets that indicate a focus on ATP activities and the specified MITRE ATT&CK technique. I will then extract the most pertinent information about the ATPs and their use of T1566.002.\n",
      "#E2 = LLM[Analyze the search results from #E1 to identify specific CTI reports (title, source, date) that discuss ATPs using T1566.002: Spearphishing Links. Summarize the key findings from these reports, mentioning any specific ATP groups identified.]\n"
     ]
    }
   ],
   "execution_count": 8
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "#### Planner Node",
   "id": "9e462bfcf2ec91f4"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:36.743644Z",
     "start_time": "2025-09-24T14:10:36.631943Z"
    }
   },
   "cell_type": "code",
   "source": [
    "import re\n",
    "\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
     "# Regex to match blocks of the form \"Plan: <text> #E<n> = <Tool>[<input>]\".\n",
     "# Each match is a (plan text, evidence variable, tool name, tool input) tuple.\n",
     "regex_pattern = r\"Plan:\\s*(.+)\\s*(#E\\d+)\\s*=\\s*(\\w+)\\s*\\[([^\\]]+)\\]\"\n",
     "prompt_template = ChatPromptTemplate.from_messages([(\"user\", prompt)])\n",
     "planner = prompt_template | llm\n",
     "\n",
     "\n",
     "def get_plan(state: ReWOO):\n",
     "    task = state[\"task\"]\n",
     "    # GoogleGenerativeAI is a text-completion LLM, so the chain returns a plain string\n",
     "    result = planner.invoke({\"task\": task})\n",
     "    # Extract every step from the plan text; text that does not match is skipped\n",
     "    matches = re.findall(regex_pattern, result)\n",
     "    return {\"steps\": matches, \"plan_string\": result}"
   ],
   "id": "5c3693b5fd44aefa",
   "outputs": [],
   "execution_count": 9
  },
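  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "To check what `get_plan` stores in `state[\"steps\"]` without calling the model, the sketch below runs the same regex over a short, hand-written plan (illustrative text, not real model output). Each match is a `(plan text, evidence variable, tool name, tool input)` tuple. `re.findall` silently skips any step that does not fit the `Plan: ... #E<n> = Tool[...]` shape, so it is worth comparing the extracted steps with `plan_string` when a run looks incomplete.",
   "id": "4f1c9a2e7b3d5e01"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": [
    "import re\n",
    "\n",
    "# Same pattern as in the planner node, repeated so this cell is self-contained\n",
    "regex_pattern = r\"Plan:\\s*(.+)\\s*(#E\\d+)\\s*=\\s*(\\w+)\\s*\\[([^\\]]+)\\]\"\n",
    "\n",
    "# Illustrative plan text in the format the planner prompt asks for (not real model output)\n",
    "sample_plan = (\n",
    "    \"Plan: Search for recent reports on T1566.002.\\n\"\n",
    "    \"#E1 = Google[latest CTI reports APT T1566.002 Spearphishing Link]\\n\"\n",
    "    \"Plan: Summarize the reports found in #E1.\\n\"\n",
    "    \"#E2 = LLM[Summarize the APT groups mentioned in #E1]\\n\"\n",
    ")\n",
    "\n",
    "steps = re.findall(regex_pattern, sample_plan)\n",
    "for plan, step_name, tool, tool_input in steps:\n",
    "    print(step_name, tool, \"->\", tool_input)"
   ],
   "id": "4f1c9a2e7b3d5e02",
   "outputs": [],
   "execution_count": null
  },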
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "### Executor",
   "id": "ca86ebf96a47fff6"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:36.918073Z",
     "start_time": "2025-09-24T14:10:36.775677Z"
    }
   },
   "cell_type": "code",
   "source": [
    "from langchain_tavily import TavilySearch\n",
    "\n",
     "# Backs the planner's \"Google\" tool. TavilySearch reads its API key from the\n",
     "# TAVILY_API_KEY environment variable set in the setup cell.\n",
     "search_config = {\n",
     "    \"max_results\": 10,\n",
     "    \"search_depth\": \"advanced\",\n",
     "    \"include_raw_content\": True\n",
     "}\n",
     "\n",
     "search = TavilySearch(**search_config)"
   ],
   "id": "b7367781aeac5c5",
   "outputs": [],
   "execution_count": 10
  },
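  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "With `include_raw_content` enabled and `max_results` set to 10, `str(result)` for a `Google` step can be very long, and that whole string is substituted into any later `LLM[...]` input that references it. One option is to keep only each hit's title, URL and snippet. The sketch below uses a hypothetical helper, `compact_search_results` (not part of LangChain or Tavily). It assumes the response is a dict with a `results` list whose items carry `url`, `title` and `content` keys, as in the `#E1` output printed at the end of this notebook; the sample response is made up.",
   "id": "4f1c9a2e7b3d5e03"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": [
    "from typing import Any, Dict\n",
    "\n",
    "\n",
    "def compact_search_results(response: Dict[str, Any], max_chars_per_result: int = 300) -> str:\n",
    "    \"\"\"Hypothetical helper: render a Tavily-style response as a short numbered list of hits.\"\"\"\n",
    "    lines = []\n",
    "    for i, item in enumerate(response.get(\"results\", []), 1):\n",
    "        # Use the short \"content\" snippet and ignore \"raw_content\" to keep prompts small\n",
    "        snippet = (item.get(\"content\") or \"\")[:max_chars_per_result]\n",
    "        lines.append(f\"[{i}] {item.get('title', '')} ({item.get('url', '')})\\n{snippet}\")\n",
    "    return \"\\n\\n\".join(lines)\n",
    "\n",
    "\n",
    "# Made-up response with the same top-level shape as the #E1 output shown at the end of this notebook\n",
    "fake_response = {\n",
    "    \"query\": \"T1566.002\",\n",
    "    \"results\": [\n",
    "        {\n",
    "            \"url\": \"https://attack.mitre.org/techniques/T1566/002/\",\n",
    "            \"title\": \"Phishing: Spearphishing Link\",\n",
    "            \"content\": \"Adversaries may send spearphishing emails with a malicious link.\",\n",
    "            \"raw_content\": \"...very long page text...\",\n",
    "        }\n",
    "    ],\n",
    "}\n",
    "print(compact_search_results(fake_response))"
   ],
   "id": "4f1c9a2e7b3d5e04",
   "outputs": [],
   "execution_count": null
  },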
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:36.964885Z",
     "start_time": "2025-09-24T14:10:36.953023Z"
    }
   },
   "cell_type": "code",
   "source": [
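     "def _get_current_task(state: ReWOO):\n",
     "    \"\"\"Return the 1-based index of the next step to execute, or None when all steps are done.\"\"\"\n",
     "    results = state.get(\"results\") or {}\n",
     "    if len(results) >= len(state[\"steps\"]):\n",
     "        return None\n",
     "    return len(results) + 1\n",
     "\n",
     "\n",
     "def tool_execution(state: ReWOO):\n",
     "    \"\"\"Worker node that executes the tools of a given plan.\"\"\"\n",
     "    # Copy the evidence collected so far instead of mutating the incoming state\n",
     "    _results = dict(state.get(\"results\") or {})\n",
     "    _step = _get_current_task(state)\n",
     "    if _step is None:\n",
     "        # No step left to run (e.g. the planner output matched no steps); _route then moves on to \"solve\"\n",
     "        return {\"results\": _results}\n",
     "    _, step_name, tool, tool_input = state[\"steps\"][_step - 1]\n",
     "    # Substitute earlier evidence (#E1, #E2, ...) into this step's input;\n",
     "    # longer keys go first so \"#E1\" does not rewrite the prefix of \"#E10\"\n",
     "    for k in sorted(_results, key=len, reverse=True):\n",
     "        tool_input = tool_input.replace(k, _results[k])\n",
     "    if tool == \"Google\":\n",
     "        result = search.invoke(tool_input)\n",
     "    elif tool == \"LLM\":\n",
     "        result = llm.invoke(tool_input)\n",
     "    else:\n",
     "        raise ValueError(f\"Unknown tool in plan: {tool!r}\")\n",
     "    _results[step_name] = str(result)\n",
     "    return {\"results\": _results}"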
   ],
   "id": "efb45424fa750ce5",
   "outputs": [],
   "execution_count": 11
  },
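  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "Evidence variables are filled in by plain string replacement, so the order of replacement matters once a plan has ten or more steps: replacing `#E1` first also rewrites the start of `#E10`. The self-contained sketch below contrasts a naive loop in insertion order with replacing longer keys first. `substitute_evidence` is a hypothetical helper written for this illustration, and the evidence values are placeholders.",
   "id": "4f1c9a2e7b3d5e05"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": [
    "def substitute_evidence(text: str, results: dict) -> str:\n",
    "    \"\"\"Hypothetical helper: replace evidence variables such as #E1 with their stored values.\n",
    "\n",
    "    Longer keys are replaced first, so \"#E1\" never rewrites the prefix of \"#E10\".\n",
    "    \"\"\"\n",
    "    for key in sorted(results, key=len, reverse=True):\n",
    "        text = text.replace(key, results[key])\n",
    "    return text\n",
    "\n",
    "\n",
    "# Placeholder evidence for a ten-step plan\n",
    "results = {f\"#E{i}\": f\"<evidence {i}>\" for i in range(1, 11)}\n",
    "\n",
    "naive = \"Compare #E10 with #E1\"\n",
    "for k, v in results.items():  # insertion order: \"#E1\" is replaced before \"#E10\"\n",
    "    naive = naive.replace(k, v)\n",
    "\n",
    "print(naive)  # → Compare <evidence 1>0 with <evidence 1>\n",
    "print(substitute_evidence(\"Compare #E10 with #E1\", results))  # → Compare <evidence 10> with <evidence 1>"
   ],
   "id": "4f1c9a2e7b3d5e06",
   "outputs": [],
   "execution_count": null
  },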
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "### Solver",
   "id": "4cf82df72d40e9cd"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:37.018935Z",
     "start_time": "2025-09-24T14:10:37.008762Z"
    }
   },
   "cell_type": "code",
   "source": [
     "solve_prompt = \"\"\"Solve the following task or problem. To solve the problem, we have made a step-by-step Plan and \\\n",
     "retrieved the corresponding Evidence for each Plan. Use them with caution, since long evidence might \\\n",
     "contain irrelevant information.\n",
     "\n",
     "{plan}\n",
     "\n",
     "Now solve the question or task according to the Evidence provided above. Respond with the answer\n",
     "directly with no extra words.\n",
     "\n",
     "Task: {task}\n",
     "Response:\"\"\"\n",
     "\n",
     "\n",
     "def solve(state: ReWOO):\n",
     "    _results = state.get(\"results\") or {}\n",
     "    # Longer keys first so \"#E1\" does not rewrite the prefix of \"#E10\"\n",
     "    keys = sorted(_results, key=len, reverse=True)\n",
     "    plan = \"\"\n",
     "    for _plan, step_name, tool, tool_input in state[\"steps\"]:\n",
     "        # Replace evidence variables with their values, so each step line carries the evidence it produced\n",
     "        for k in keys:\n",
     "            tool_input = tool_input.replace(k, _results[k])\n",
     "            step_name = step_name.replace(k, _results[k])\n",
     "        # The trailing newline keeps consecutive steps on separate lines\n",
     "        plan += f\"Plan: {_plan}\\n{step_name} = {tool}[{tool_input}]\\n\"\n",
     "    solve_input = solve_prompt.format(plan=plan, task=state[\"task\"])\n",
     "    result = llm.invoke(solve_input)\n",
     "    return {\"result\": result}"
   ],
   "id": "b545c04c30414789",
   "outputs": [],
   "execution_count": 12
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "### Define Graph",
   "id": "3b3fbec2f9880412"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:37.080389Z",
     "start_time": "2025-09-24T14:10:37.071333Z"
    }
   },
   "cell_type": "code",
   "source": [
    "def _route(state):\n",
    "    _step = _get_current_task(state)\n",
    "    if _step is None:\n",
    "        # We have executed all tasks\n",
    "        return \"solve\"\n",
    "    else:\n",
    "        # We are still executing tasks, loop back to the \"tool\" node\n",
    "        return \"tool\""
   ],
   "id": "6fee70503c849ab",
   "outputs": [],
   "execution_count": 13
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:37.812966Z",
     "start_time": "2025-09-24T14:10:37.134773Z"
    }
   },
   "cell_type": "code",
   "source": [
    "from langgraph.graph import END, StateGraph, START\n",
    "\n",
    "graph = StateGraph(ReWOO)\n",
    "graph.add_node(\"plan\", get_plan)\n",
    "graph.add_node(\"tool\", tool_execution)\n",
    "graph.add_node(\"solve\", solve)\n",
    "graph.add_edge(\"plan\", \"tool\")\n",
    "graph.add_edge(\"solve\", END)\n",
    "graph.add_conditional_edges(\"tool\", _route)\n",
    "graph.add_edge(START, \"plan\")\n",
    "\n",
    "app = graph.compile()"
   ],
   "id": "a10ad4abef949d17",
   "outputs": [],
   "execution_count": 14
  },
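  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "The graph wires `plan` into `tool`, loops on `tool` until `_route` finds no remaining step, then runs `solve`. The sketch below reproduces that control flow in plain Python with stub tools, so the loop can be checked without LangGraph or any API keys. The `Step` alias, `next_step`, `run_rewoo` and the stub tools are hypothetical names for this illustration. One difference from the compiled graph: this loop checks for remaining steps before the first tool call, whereas the graph always enters the `tool` node once after planning.",
   "id": "4f1c9a2e7b3d5e07"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": [
    "from typing import Callable, Dict, List, Optional, Tuple\n",
    "\n",
    "Step = Tuple[str, str, str, str]  # (plan text, \"#E<n>\", tool name, tool input)\n",
    "\n",
    "\n",
    "def next_step(steps: List[Step], results: Dict[str, str]) -> Optional[int]:\n",
    "    \"\"\"Same rule as _get_current_task: 1-based index of the next step, or None when done.\"\"\"\n",
    "    return None if len(results) >= len(steps) else len(results) + 1\n",
    "\n",
    "\n",
    "def run_rewoo(steps: List[Step], tools: Dict[str, Callable[[str], str]]) -> Tuple[Dict[str, str], List[str]]:\n",
    "    \"\"\"Hypothetical driver: run plan -> tool (loop) -> solve and record which nodes ran.\"\"\"\n",
    "    results: Dict[str, str] = {}\n",
    "    trace = [\"plan\"]\n",
    "    while (idx := next_step(steps, results)) is not None:\n",
    "        _, name, tool, tool_input = steps[idx - 1]\n",
    "        # Substitute earlier evidence, longer keys first\n",
    "        for key in sorted(results, key=len, reverse=True):\n",
    "            tool_input = tool_input.replace(key, results[key])\n",
    "        results[name] = tools[tool](tool_input)\n",
    "        trace.append(\"tool\")\n",
    "    trace.append(\"solve\")\n",
    "    return results, trace\n",
    "\n",
    "\n",
    "# Stub tools that just echo their input instead of calling Tavily or Gemini\n",
    "stub_tools = {\n",
    "    \"Google\": lambda query: f\"search({query})\",\n",
    "    \"LLM\": lambda instruction: f\"llm({instruction})\",\n",
    "}\n",
    "steps = [\n",
    "    (\"Search for reports\", \"#E1\", \"Google\", \"T1566.002 reports\"),\n",
    "    (\"Summarize them\", \"#E2\", \"LLM\", \"Summarize #E1\"),\n",
    "]\n",
    "results, trace = run_rewoo(steps, stub_tools)\n",
    "print(\" -> \".join(trace))  # → plan -> tool -> tool -> solve\n",
    "print(results[\"#E2\"])  # → llm(Summarize search(T1566.002 reports))"
   ],
   "id": "4f1c9a2e7b3d5e08",
   "outputs": [],
   "execution_count": null
  },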
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:10:37.864440Z",
     "start_time": "2025-09-24T14:10:37.849889Z"
    }
   },
   "cell_type": "code",
   "source": [
    "from typing import Dict, Any\n",
    "\n",
     "def format_output(state: Dict[str, Any]) -> str:\n",
     "    \"\"\"Format one `app.stream` chunk ({node_name: state_update}) of the CTI agent for readability.\"\"\"\n",
    "    output = []\n",
    "\n",
    "    for node_name, node_data in state.items():\n",
    "        output.append(f\"\\n🔹 **{node_name.upper()}**\")\n",
    "        output.append(\"=\" * 50)\n",
    "\n",
    "        if node_name == \"plan\":\n",
    "            if \"plan_string\" in node_data:\n",
    "                output.append(\"📋 **Generated Plan:**\")\n",
    "                output.append(node_data[\"plan_string\"])\n",
    "\n",
    "            if \"steps\" in node_data and node_data[\"steps\"]:\n",
    "                output.append(\"\\n📝 **Extracted Steps:**\")\n",
    "                for i, (plan, step_name, tool, tool_input) in enumerate(node_data[\"steps\"], 1):\n",
    "                    output.append(f\"  {i}. {plan}\")\n",
    "                    output.append(f\"     🔧 {step_name} = {tool}[{tool_input}]\")\n",
    "\n",
     "        elif node_name == \"tool\":\n",
     "            # `results` holds all evidence gathered so far, so earlier steps are printed again\n",
     "            if \"results\" in node_data:\n",
    "                output.append(\"🔍 **Execution Results:**\")\n",
    "                for step_name, result in node_data[\"results\"].items():\n",
    "                    output.append(f\"  {step_name}:\")\n",
    "                    # Truncate long results for readability\n",
    "                    result_str = str(result)\n",
    "                    if len(result_str) > 500:\n",
    "                        result_str = result_str[:500] + \"... [truncated]\"\n",
    "                    output.append(f\"    {result_str}\")\n",
    "\n",
    "        elif node_name == \"solve\":\n",
    "            if \"result\" in node_data:\n",
    "                output.append(\"✅ **Final Answer:**\")\n",
    "                output.append(node_data[\"result\"])\n",
    "\n",
    "        output.append(\"\")\n",
    "\n",
    "    return \"\\n\".join(output)\n"
   ],
   "id": "30f337a626e2fbf9",
   "outputs": [],
   "execution_count": 15
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-24T14:11:24.978749Z",
     "start_time": "2025-09-24T14:10:37.901866Z"
    }
   },
   "cell_type": "code",
   "source": [
    "print(\"**CTI Agent Execution**\")\n",
    "print(\"=\" * 60)\n",
    "\n",
    "for s in app.stream({\"task\": task}):\n",
    "    formatted_output = format_output(s)\n",
    "    print(formatted_output)\n",
    "    print(\"-\" * 60)"
   ],
   "id": "b45aa62c23719738",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**CTI Agent Execution**\n",
      "============================================================\n",
      "\n",
      "🔹 **PLAN**\n",
      "==================================================\n",
      "📋 **Generated Plan:**\n",
      "Plan: Search for the latest CTI reports that specifically mention ATPs and the MITRE ATT&CK technique T1566.002 (Spearphishing Links). I will use keywords to narrow down the search to recent publications.\n",
      "#E1 = Google[latest CTI reports ATP T1566.002 \"Spearphishing Links\" 2023 2024]\n",
      "Plan: Review the search results from #E1 to identify specific CTI reports from reputable sources (e.g., major cybersecurity vendors, government agencies) that discuss ATPs utilizing spearphishing links. Synthesize the key findings, including the names of ATPs and the context of their T1566.002 usage.\n",
      "#E2 = LLM[Based on the search results in #E1, identify and summarize the latest CTI reports that detail ATPs using T1566.002: Spearphishing Links. Include the names of the ATPs and a brief description of their activities related to this technique.]\n",
      "\n",
      "📝 **Extracted Steps:**\n",
      "  1. Search for the latest CTI reports that specifically mention ATPs and the MITRE ATT&CK technique T1566.002 (Spearphishing Links). I will use keywords to narrow down the search to recent publications.\n",
      "     🔧 #E1 = Google[latest CTI reports ATP T1566.002 \"Spearphishing Links\" 2023 2024]\n",
      "  2. Review the search results from #E1 to identify specific CTI reports from reputable sources (e.g., major cybersecurity vendors, government agencies) that discuss ATPs utilizing spearphishing links. Synthesize the key findings, including the names of ATPs and the context of their T1566.002 usage.\n",
      "     🔧 #E2 = LLM[Based on the search results in #E1, identify and summarize the latest CTI reports that detail ATPs using T1566.002: Spearphishing Links. Include the names of the ATPs and a brief description of their activities related to this technique.]\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n",
      "🔹 **TOOL**\n",
      "==================================================\n",
      "🔍 **Execution Results:**\n",
      "  #E1:\n",
      "    {'query': 'latest CTI reports ATP T1566.002 \"Spearphishing Links\" 2023 2024', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://attack.mitre.org/techniques/T1566/002/', 'title': 'Phishing: Spearphishing Link, Sub-technique T1566.002 - Enterprise', 'content': '| C0036 | Pikabot Distribution February 2024 | Pikabot Distribution February 2024 utilized emails with hyperlinks leading to malicious ZIP archive files containing scripts to download and install Pikabo... [truncated]\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n",
      "🔹 **TOOL**\n",
      "==================================================\n",
      "🔍 **Execution Results:**\n",
      "  #E1:\n",
      "    {'query': 'latest CTI reports ATP T1566.002 \"Spearphishing Links\" 2023 2024', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://attack.mitre.org/techniques/T1566/002/', 'title': 'Phishing: Spearphishing Link, Sub-technique T1566.002 - Enterprise', 'content': '| C0036 | Pikabot Distribution February 2024 | Pikabot Distribution February 2024 utilized emails with hyperlinks leading to malicious ZIP archive files containing scripts to download and install Pikabo... [truncated]\n",
      "  #E2:\n",
      "    Based on the provided search results, the following CTI reports detail APTs and campaigns using T1566.002 (Spearphishing Link) in 2023 and 2024:\n",
      "\n",
      "*   **Pikabot Distribution February 2024 (C0036):** This campaign, observed in **February 2024**, utilized emails with hyperlinks that led victims to malicious ZIP archive files. These archives contained scripts designed to download and install the Pikabot malware.\n",
      "*   **TA577 (G1037) / Latrodectus (S1160):** The threat group TA577, in campaigns report... [truncated]\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n",
      "🔹 **SOLVE**\n",
      "==================================================\n",
      "✅ **Final Answer:**\n",
      "The latest CTI reports of ATPs using the T1566.002 (Spearphishing Links) technique include:\n",
      "\n",
      "*   **Pikabot Distribution February 2024 (C0036):** This campaign, observed in February 2024, used emails with hyperlinks leading to malicious ZIP archive files for Pikabot malware distribution.\n",
      "*   **TA577 (G1037) / Latrodectus (S1160):** In April 2024, TA577 sent emails with malicious links to distribute Latrodectus malware via malicious JavaScript files.\n",
      "*   **Storm-1811 (G1046):** In May 2024, Storm-1811 distributed malicious links that redirected victims to EvilProxy-based phishing sites to harvest credentials.\n",
      "*   **OilRig (G0049) / APT34 / Earth Simnavaz:** This group continues to use spearphishing links. Recent activity under the name \"Earth Simnavaz\" was reported in October 2024, and \"Crambus\" (an associated group name) in October 2023.\n",
      "\n",
      "------------------------------------------------------------\n"
     ]
    }
   ],
   "execution_count": 16
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
    "codemirror_mode": {
     "name": "ipython",
     "version": 3
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}