bert-micro-cybersecurity
1. Model Details
Model description
"bert-micro-cybersecurity" is a compact transformer model adapted for cybersecurity text classification tasks (e.g., threat detection, incident reports, malicious vs benign content).
- Model type: fine-tuned lightweight BERT variant
- Languages: English and Indonesian
- Finetuned from: boltuix/bert-micro
- Status: Early version, trained on 53.61% of the planned data.
Model sources
- Base model: boltuix/bert-micro
- Data: Cybersecurity Data
2. Uses
Direct use
You can use this model to classify cybersecurity-related text — for example, whether a given message, report or log entry indicates malicious intent, abnormal behaviour, or threat presence.
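A minimal inference sketch using the transformers pipeline API is shown below. The label names are not documented in this card, so whatever the checkpoint's config defines will be printed; the example texts are placeholders.

```python
# Hypothetical usage sketch: classify short security texts with this checkpoint.
# Label names come from the checkpoint's config and are not documented in this card.
from transformers import pipeline

classifier = pipeline("text-classification", model="codechrl/bert-micro-cybersecurity")

texts = [
    "Multiple failed SSH logins from 203.0.113.7 followed by a successful root login.",
    "Scheduled nightly backup completed successfully.",
]

for text, pred in zip(texts, classifier(texts)):
    print(f"{pred['label']} ({pred['score']:.3f})  {text}")
```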
Downstream use
- Embedding extraction for clustering (a pooling sketch follows this list).
- Named entity recognition on logs or other security data.
- Classification of security data.
- Anomaly detection in security logs.
- As part of a pipeline for phishing detection, malicious email filtering, or incident triage.
- As a feature extractor feeding a downstream system (e.g., alert-generation, SOC dashboard).
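For the embedding-extraction and feature-extractor use cases, a minimal sketch follows. Mean pooling over the last hidden state is an assumption chosen for illustration, not a method prescribed by this card; any pooling strategy that suits the downstream system can be substituted.

```python
# Sketch: extract sentence-level embeddings from security text for clustering
# or as features for a downstream detector. Mean pooling is an assumption.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "codechrl/bert-micro-cybersecurity"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)  # encoder only; classifier head is ignored
model.eval()

logs = [
    "powershell.exe spawned by winword.exe with an encoded command line",
    "User admin logged in from a known workstation during business hours",
]

with torch.no_grad():
    enc = tokenizer(logs, padding=True, truncation=True, max_length=512, return_tensors="pt")
    hidden = model(**enc).last_hidden_state            # (batch, seq_len, hidden)
    mask = enc["attention_mask"].unsqueeze(-1)         # (batch, seq_len, 1)
    embeddings = (hidden * mask).sum(1) / mask.sum(1)  # mean over non-padding tokens

print(embeddings.shape)  # (2, hidden_size)
```

The resulting vectors can be fed to any clustering or anomaly-detection routine, or stored as features for an alert-generation or SOC dashboard pipeline.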
Out-of-scope use
- Not meant for high-stakes automated blocking decisions without human review.
- Not optimized for languages other than English and Indonesian.
- Not tested for non-cybersecurity domains or out-of-distribution data.
Downstream use cases in development using this model
- NER on security logs, botnet data, and JSON data.
- Early classification of SIEM alerts and events.
3. Bias, Risks, and Limitations
Because the model has so far been trained on only a subset (53.61%) of the planned data, performance is preliminary and may degrade on unseen or specialized domains (e.g., industrial control systems, IoT logs, other languages).
- Inherits any biases present in the base model (boltuix/bert-micro) and in the fine-tuning data, e.g., over-representation of certain threat types or vendor- and tooling-specific vocabulary.
- Should not be used as the sole authority for incident decisions; only as an aid to human analysts.
4. Training Details
Text Processing & Chunking
Since cybersecurity data often contains lengthy alert descriptions and execution logs that exceed BERT's 512-token limit, we implement an overlapping chunking strategy:
- Max sequence length: 512 tokens
- Stride: 32 tokens (overlap between consecutive chunks)
- Chunking behavior: Long texts are split into overlapping segments. For example, with max_length=512 and stride=32, a 1000-token document becomes ~3 chunks with 32-token overlaps, preserving context across boundaries. A tokenizer-level sketch follows this list.
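This is a sketch of the chunking described above, assuming the standard Hugging Face fast-tokenizer stride/overflow interface; the input text is a placeholder.

```python
# Sketch: split a long alert into overlapping 512-token chunks using the
# tokenizer's built-in stride/overflow support (values from the list above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codechrl/bert-micro-cybersecurity")

long_alert = " ".join(["process created network connection to suspicious host"] * 200)

enc = tokenizer(
    long_alert,
    max_length=512,
    stride=32,                       # tokens shared between consecutive chunks
    truncation=True,
    return_overflowing_tokens=True,  # return every chunk, not just the first
)

print(f"{len(enc['input_ids'])} chunks of up to 512 tokens each")
```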
Training Hyperparameters
- Base model: boltuix/bert-micro
- Training epochs: 3
- Learning rate: 5e-05
- Batch size: 16
- Weight decay: 0.01
- Warmup ratio: 0.06
- Gradient accumulation steps: 1
- Optimizer: AdamW
- LR scheduler: Linear with warmup
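For illustration only, the hyperparameters above could be expressed with the Hugging Face Trainer API roughly as follows. The actual training script, dataset preparation, and label count are not part of this card, so those parts are assumptions.

```python
# Reconstruction of the listed hyperparameters with TrainingArguments.
# num_labels=2 and the dataset wiring are assumptions, not from this card.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained("boltuix/bert-micro", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("boltuix/bert-micro")

args = TrainingArguments(
    output_dir="bert-micro-cybersecurity",
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    weight_decay=0.01,
    warmup_ratio=0.06,
    gradient_accumulation_steps=1,
    lr_scheduler_type="linear",   # linear decay with warmup
    optim="adamw_torch",          # AdamW optimizer
)

# trainer = Trainer(model=model, args=args, train_dataset=..., tokenizer=tokenizer)
# trainer.train()
```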
Training Data
- Total database rows: 240,570
- Rows processed (cumulative): 128,974 (53.61%)
- Training date: 2025-10-29 04:46:42
Post-Training Metrics
- Final training loss:
- Rows→Samples ratio: