bert-micro-cybersecurity

1. Model Details

Model description
"bert-micro-cybersecurity" is a compact transformer model adapted for cybersecurity text classification tasks (e.g., threat detection, incident reports, malicious vs benign content).

  • Model type: fine-tuned lightweight BERT variant
  • Languages: English & Indonesian
  • Finetuned from: boltuix/bert-micro
  • Status: Early version — trained on 53.61% of planned data.

Model sources

2. Uses

Direct use

You can use this model to classify cybersecurity-related text — for example, whether a given message, report or log entry indicates malicious intent, abnormal behaviour, or threat presence.
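
A minimal sketch of this direct use with the Hugging Face transformers text-classification pipeline; the example log lines are illustrative, and the printed labels depend on the fine-tuning label set.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a text-classification pipeline.
classifier = pipeline(
    "text-classification",
    model="codechrl/bert-micro-cybersecurity",
)

texts = [
    "Multiple failed SSH logins followed by a successful root login from an unknown IP.",
    "Scheduled nightly backup completed successfully.",
]

# Each prediction is a dict with the predicted label and a confidence score.
for text, pred in zip(texts, classifier(texts)):
    print(f"{pred['label']} ({pred['score']:.2f}) :: {text}")
```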

Downstream use

  • Embedding extraction for clustering (see the sketch after this list).
  • Named Entity Recognition on log or security data.
  • Classification of security data.
  • Anomaly detection in security logs.
  • As part of a pipeline for phishing detection, malicious email filtering, incident triage.
  • As a feature extractor feeding a downstream system (e.g., alert-generation, SOC dashboard).
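
As a sketch of the embedding-extraction use case above, the snippet below mean-pools the encoder's last hidden state over non-padding tokens; loading with AutoModel and using mean pooling are assumptions, not the only possible choices.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "codechrl/bert-micro-cybersecurity"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

texts = [
    "Outbound connection to a known C2 domain blocked by the firewall.",
    "User password changed via the self-service portal.",
]

with torch.no_grad():
    batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (batch, seq_len, hidden_size)
    mask = batch["attention_mask"].unsqueeze(-1)     # zero out padding positions
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling

print(embeddings.shape)  # (2, hidden_size); feed these vectors to clustering or a downstream system
```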

Out-of-scope use

  • Not meant for high-stakes automated blocking decisions without human review.
  • Not optimized for languages other than English and Indonesian.
  • Not tested for non-cybersecurity domains or out-of-distribution data.

Downstream use cases in development using this model

  • NER on security logs, botnet data, and JSON data.
  • Early classification of SIEM alerts and events.

3. Bias, Risks, and Limitations

Because the model is based on a small subset (53.61%) of the planned data, performance is preliminary and may degrade on unseen or specialized domains (industrial control systems, IoT logs, foreign-language text).

  • Inherits any biases present in the base model (boltuix/bert-micro) and in the fine-tuning data — e.g., over-representation of certain threat types, vendor or tooling-specific vocabulary.
  • Should not be used as sole authority for incident decisions; only as an aid to human analysts.

4. Training Details

Text Processing & Chunking

Since cybersecurity data often contains lengthy alert descriptions and execution logs that exceed BERT's 512-token limit, we implement an overlapping chunking strategy:

  • Max sequence length: 512 tokens
  • Stride: 32 tokens (overlap between consecutive chunks)
  • Chunking behavior: Long texts are split into overlapping segments. For example, with max_length=512 and stride=32, a 1000-token document becomes ~3 chunks with 32-token overlaps, preserving context across chunk boundaries (see the tokenizer sketch below).
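
A minimal sketch of this chunking step with the Hugging Face tokenizer: return_overflowing_tokens together with stride yields the overlapping segments described above. The sample text is a stand-in for a real alert or execution log.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codechrl/bert-micro-cybersecurity")

# Stand-in for a lengthy alert description or execution log.
long_log = " ".join(["process created by suspicious parent"] * 300)

encoded = tokenizer(
    long_log,
    max_length=512,                  # BERT's sequence limit
    stride=32,                       # tokens shared between consecutive chunks
    truncation=True,
    return_overflowing_tokens=True,  # keep every chunk instead of discarding the overflow
)

print("number of chunks:", len(encoded["input_ids"]))
for chunk_ids in encoded["input_ids"]:
    print(len(chunk_ids), tokenizer.decode(chunk_ids[:10]), "...")
```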

Training Hyperparameters

  • Base model: boltuix/bert-micro
  • Training epochs: 3
  • Learning rate: 5e-05
  • Batch size: 16
  • Weight decay: 0.01
  • Warmup ratio: 0.06
  • Gradient accumulation steps: 1
  • Optimizer: AdamW
  • LR scheduler: Linear with warmup
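
A hedged sketch of how these hyperparameters map onto a transformers Trainer configuration; the output directory, label count, and dataset objects are placeholders, and the actual training script may differ.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "boltuix/bert-micro"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL, num_labels=2)  # label count is an assumption

args = TrainingArguments(
    output_dir="bert-micro-cybersecurity",  # placeholder output path
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    weight_decay=0.01,
    warmup_ratio=0.06,
    gradient_accumulation_steps=1,
    lr_scheduler_type="linear",  # linear decay after warmup
    optim="adamw_torch",         # AdamW optimizer
)

# train_dataset / eval_dataset stand in for the tokenized, chunked cybersecurity corpus.
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```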

Training Data

  • Total database rows: 240,570
  • Rows processed (cumulative): 128,974 (53.61%)
  • Training date: 2025-10-29 04:46:42

Post-Training Metrics

  • Final training loss:
  • Rows→Samples ratio:

Model Size

  • Parameters: 4.42M (Safetensors)
  • Tensor type: F32