---
license: apache-2.0
datasets:
  - honicky/hdfs-logs-encoded-blocks
  - Kingslayer5437/BGL
language:
  - en
metrics:
  - f1
  - precision
  - recall
  - roc_auc
base_model:
  - distilbert/distilbert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
  - log-analysis
  - anomaly-detection
  - bert
  - huggingface
model-index:
  - name: CloudOpsBERT (distributed-storage)
    results:
      - task:
          type: text-classification
          name: Anomaly Detection
        dataset:
          name: HDFS
          type: honicky/hdfs-logs-encoded-blocks
          split: test
        metrics:
          - type: f1
            value: 0.571
          - type: precision
            value: 0.992
          - type: recall
            value: 0.401
          - type: roc_auc
            value: 0.73
          - type: threshold
            value: 0.5
  - name: CloudOpsBERT (HPC)
    results:
      - task:
          type: text-classification
          name: Anomaly Detection
        dataset:
          name: BGL
          type: Kingslayer5437/BGL
          split: test
        metrics:
          - type: f1
            value: 1
          - type: precision
            value: 1
          - type: recall
            value: 1
          - type: roc_auc
            value: 1
          - type: threshold
            value: 0.05
---

# CloudOpsBERT: Domain-Specific Language Models for Cloud Operations
CloudOpsBERT is an open-source project exploring domain-adapted transformer models for cloud operations log analysis: specifically anomaly detection, reliability monitoring, and cost optimization.

This project fine-tunes lightweight BERT variants (e.g., DistilBERT) on large-scale system log datasets (HDFS, BGL) and provides ready-to-use models for the research and practitioner community.
## Motivation
Modern cloud platforms generate massive amounts of logs. Detecting anomalies in these logs is crucial for:

- Ensuring reliability (catching failures early)
- Improving cost efficiency (identifying waste or misconfigurations)
- Supporting autonomous operations (AIOps)
Generic LLMs and BERT models are not optimized for this domain. CloudOpsBERT bridges that gap by:

- Training on real log datasets (HDFS, BGL)
- Addressing imbalanced anomaly detection with class weighting (see the sketch after this list)
- Publishing open-source checkpoints for reproducibility
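
Class weighting typically enters through the loss function. Below is a minimal sketch of that idea, not the project's actual training code: it subclasses the Hugging Face `Trainer` to apply weighted cross-entropy, and the weight values and model wiring are illustrative assumptions.

```python
# Sketch: class-weighted fine-tuning via a Trainer subclass.
# The weights below are placeholders, not CloudOpsBERT's actual values.
import torch
from torch import nn
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

class WeightedLossTrainer(Trainer):
    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights  # tensor([w_normal, w_anomaly])

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = nn.CrossEntropyLoss(
            weight=self.class_weights.to(outputs.logits.device)
        )
        loss = loss_fct(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
trainer = WeightedLossTrainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=None,  # plug in a tokenized log dataset here
    class_weights=torch.tensor([1.0, 10.0]),  # up-weight the rare anomaly class
)
```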
## Inference (Pretrained)
Predict anomaly probability for a single log line:
```bash
python src/predict.py \
  --model_dir vaibhav2507/cloudops-bert \
  --subfolder distributed-storage \
  --text "ERROR dfs.DataNode: Lost connection to namenode"
```
Batch inference (file with one log line per row):
```bash
python src/predict.py \
  --model_dir vaibhav2507/cloudops-bert \
  --subfolder distributed-storage \
  --file samples/sample_logs.txt \
  --threshold 0.5 \
  --jsonl_out predictions.jsonl
```
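
The exports can also be loaded directly with the `transformers` API. A minimal sketch (the `pipeline` wiring is an assumption; the exact label strings come from the checkpoint's config, listed on this card as `normal`/`anomaly`):

```python
# Sketch: load a CloudOpsBERT submodel straight from the Hub.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

repo = "vaibhav2507/cloudops-bert"
sub = "distributed-storage"  # HDFS-trained submodel

tokenizer = AutoTokenizer.from_pretrained(repo, subfolder=sub)
model = AutoModelForSequenceClassification.from_pretrained(repo, subfolder=sub)

clf = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(clf("ERROR dfs.DataNode: Lost connection to namenode"))
# e.g. [{'label': 'anomaly', 'score': ...}] -- label/score depend on the checkpoint
```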
## Results
- HDFS (in-domain, test set)
  - F1: 0.571
  - Precision: 0.992
  - Recall: 0.401
  - AUROC: 0.730
  - Threshold: 0.50 (tuneable; see the sketch after this list)
- Cross-domain (HDFS → BGL)
  - Performance degrades significantly due to dataset/domain shift (see paper).
- BGL (training in progress)
  - Will be released as cloudops-bert (subfolder bgl) once full training is complete.
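
Because the 0.50 threshold trades recall for precision, it is worth re-tuning on a validation split. A minimal sketch with scikit-learn, assuming you already have ground-truth labels and anomaly probabilities (e.g., from the predictions JSONL above):

```python
# Sketch: pick the F1-maximizing decision threshold on validation data.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])                    # placeholder labels
y_prob = np.array([0.1, 0.3, 0.8, 0.2, 0.55, 0.9, 0.05, 0.4])  # placeholder scores

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
# precision/recall have one more entry than thresholds; drop the final point.
f1 = 2 * precision[:-1] * recall[:-1] / np.clip(precision[:-1] + recall[:-1], 1e-12, None)
best = int(np.argmax(f1))
print(f"best threshold={thresholds[best]:.2f}  F1={f1[best]:.3f}")
```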
## Models
- vaibhav2507/cloudops-bert (Hugging Face Hub)
  - subfolder="distributed-storage" → HDFS-trained CloudOpsBERT
  - subfolder="hpc" → BGL-trained CloudOpsBERT
- Each export includes:
  - Model weights (pytorch_model.bin)
  - Config with label mappings (normal, anomaly; see the check after this list)
  - Tokenizer files
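
To confirm which index maps to which label in a given export, you can inspect its config directly (a quick check using standard `transformers` config fields):

```python
# Sketch: verify the label mapping shipped with an export.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(
    "vaibhav2507/cloudops-bert", subfolder="distributed-storage"
)
print(cfg.id2label)  # expected per this card: {0: 'normal', 1: 'anomaly'}
```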
 
## Quickstart (Scripts)
- Set up folders:

  ```bash
  bash scripts/setup_dirs.sh
  ```

- (Optional) Download a local copy of a submodel from Hugging Face:

  ```bash
  bash scripts/fetch_pretrained.sh                # downloads 'hdfs' by default
  SUBFOLDER=bgl bash scripts/fetch_pretrained.sh  # downloads 'bgl'
  ```

- Single-line prediction (directly from HF):

  ```bash
  bash scripts/predict_line.sh "ERROR dfs.DataNode: Lost connection to namenode" hdfs
  ```

- Batch prediction (using a local model folder):

  ```bash
  bash scripts/make_sample_logs.sh
  bash scripts/predict_file.sh samples/sample_logs.txt hdfs models/cloudops-bert-hdfs preds/preds_hdfs.jsonl
  ```
## Related Work
Several prior works have explored using BERT for log anomaly detection:

- Leveraging BERT and Hugging Face Transformers for Log Anomaly Detection
  - Tutorial-style blog post demonstrating how to fine-tune BERT on log data with Hugging Face. Useful as an introduction, but not intended as a reproducible research artifact.
- LogBERT (HelenGuohx/logbert)
  - Academic prototype from ~2019–2020 focusing on modeling log sequences with BERT. Demonstrates feasibility but is limited to in-domain experiments and lacks integration with modern Hugging Face tooling.
- AnomalyBERT (Jhryu30/AnomalyBERT)
  - Another exploratory repository showing BERT-based anomaly detection on logs, with dataset-specific preprocessing. Similar limitations in generalization and reproducibility.
## How CloudOpsBERT is different
- Domain-specific adaptation: explicitly trained for cloud operations logs (HDFS, BGL) with class-weighted loss.
- Cross-domain evaluation: includes in-domain and cross-domain benchmarks, highlighting generalization challenges.
- Reproducibility & usability: clean repo, scripts, and ready-to-use Hugging Face exports.
- Future directions: introduces MicroLM, compressed micro-language models for efficient edge/cloud hybrid inference.

In short: previous work showed that "BERT can work for logs." CloudOpsBERT operationalizes this idea into reproducible benchmarks, public models, and deployable tools for both researchers and practitioners.
## Citation
If you use CloudOpsBERT in your research or tools, please cite:
```bibtex
@misc{pandey2025cloudopsbert,
  title={CloudOpsBERT: Domain-Specific Transformer Models for Cloud Operations Anomaly Detection},
  author={Pandey, Vaibhav},
  year={2025},
  howpublished={GitHub, Hugging Face},
  url={https://github.com/vaibhav-research/cloudops-bert}
}
```
