---
license: apache-2.0
datasets:
- honicky/hdfs-logs-encoded-blocks
- Kingslayer5437/BGL
language:
- en
metrics:
- f1
- precision
- recall
- roc_auc
base_model:
- distilbert/distilbert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
- log-analysis
- anomaly-detection
- bert
- huggingface
model-index:
- name: CloudOpsBERT (distributed-storage)
  results:
  - task:
      type: text-classification
      name: Anomaly Detection
    dataset:
      name: HDFS
      type: honicky/hdfs-logs-encoded-blocks
      split: test
    metrics:
    - type: f1
      value: 0.571
    - type: precision
      value: 0.992
    - type: recall
      value: 0.401
    - type: auroc
      value: 0.73
    - type: threshold
      value: 0.5
- name: CloudOpsBERT (HPC)
  results:
  - task:
      type: text-classification
      name: Anomaly Detection
    dataset:
      name: BGL
      type: Kingslayer5437/BGL
      split: test
    metrics:
    - type: f1
      value: 1.00
    - type: precision
      value: 1.00
    - type: recall
      value: 1.00
    - type: auroc
      value: 1.00
    - type: threshold
      value: 0.05
---
# CloudOpsBERT: Domain-Specific Language Models for Cloud Operations

CloudOpsBERT is an open-source project exploring **domain-adapted transformer models** for **cloud operations log analysis**: specifically anomaly detection, reliability monitoring, and cost optimization.

This project fine-tunes lightweight BERT variants (e.g., DistilBERT) on large-scale system log datasets (HDFS, BGL) and provides ready-to-use models for the research and practitioner community.
---
## Motivation
Modern cloud platforms generate massive amounts of logs. Detecting anomalies in these logs is crucial for:
- Ensuring **reliability** (catching failures early),
- Improving **cost efficiency** (identifying waste or misconfigurations),
- Supporting **autonomous operations** (AIOps).
Generic LLMs and BERT models are not optimized for this domain. CloudOpsBERT bridges that gap by:
- Training on **real log datasets** (HDFS, BGL),
- Addressing **imbalanced anomaly detection** with class weighting,
- Publishing **open-source checkpoints** for reproducibility.
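
The class-weighting idea can be sketched in a few lines. The label counts below are illustrative placeholders, not the actual HDFS/BGL statistics:

```python
# Inverse-frequency class weights for an imbalanced log dataset.
# Counts are illustrative placeholders, not the real HDFS/BGL statistics.
counts = {"normal": 97_000, "anomaly": 3_000}

total = sum(counts.values())
num_classes = len(counts)

# weight_c = total / (num_classes * count_c): rare classes get larger weights
weights = {label: total / (num_classes * n) for label, n in counts.items()}

# During fine-tuning these would be passed to the loss, e.g.
# torch.nn.CrossEntropyLoss(weight=torch.tensor([weights["normal"], weights["anomaly"]]))
print(weights)  # anomaly weight ≈ 16.67, normal ≈ 0.52
```

With weighting, each missed anomaly contributes roughly 32x more to the loss than a missed normal line, which counteracts the skewed label distribution.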
---
## Inference (Pretrained)
Predict anomaly probability for a single log line:
```bash
python src/predict.py \
  --model_dir vaibhav2507/cloudops-bert \
  --subfolder distributed-storage \
  --text "ERROR dfs.DataNode: Lost connection to namenode"
```
Batch inference (file with one log line per row):
```bash
python src/predict.py \
  --model_dir vaibhav2507/cloudops-bert \
  --subfolder distributed-storage \
  --file samples/sample_logs.txt \
  --threshold 0.5 \
  --jsonl_out predictions.jsonl
```
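
Because the `--jsonl_out` records store probabilities, you can re-apply a different decision threshold afterwards without re-running the model. A minimal sketch, assuming each record carries an anomaly-probability field (the `p_anomaly` and `text` field names here are illustrative; check the actual `predict.py` output schema):

```python
import json

def relabel(jsonl_text: str, threshold: float = 0.5):
    """Re-apply a decision threshold to saved prediction records."""
    results = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        rec["label"] = "anomaly" if rec["p_anomaly"] >= threshold else "normal"
        results.append(rec)
    return results

# Two fake records standing in for predictions.jsonl
sample = "\n".join([
    json.dumps({"text": "INFO dfs.DataNode: block served", "p_anomaly": 0.02}),
    json.dumps({"text": "ERROR dfs.DataNode: Lost connection", "p_anomaly": 0.91}),
])
print([r["label"] for r in relabel(sample)])            # ['normal', 'anomaly']
print([r["label"] for r in relabel(sample, 0.01)])      # ['anomaly', 'anomaly']
```

Lowering the threshold trades precision for recall, which matters for the HDFS results below where recall is the limiting metric.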
## Results
* HDFS (in-domain, test set)
  * F1: 0.571
  * Precision: 0.992
  * Recall: 0.401
  * AUROC: 0.730
  * Threshold: 0.50 (tuneable)
* Cross-domain (HDFS → BGL)
  * Performance degrades significantly due to dataset/domain shift (see paper).
* BGL (training in progress)
  * Will be released as `cloudops-bert` (subfolder `bgl`) once full training is complete.
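
The reported F1 follows directly from the precision/recall pair via the harmonic mean, which is a quick sanity check on the numbers above and on any threshold you tune:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# HDFS test-set numbers from the list above
print(round(f1_score(0.992, 0.401), 3))  # 0.571
```

The harmonic mean is dominated by the smaller of the two values, which is why the low recall (0.401) pulls F1 down to 0.571 despite near-perfect precision.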
## Models
* vaibhav2507/cloudops-bert (Hugging Face Hub)
  * subfolder="distributed-storage" → HDFS-trained CloudOpsBERT
  * subfolder="hpc" → BGL-trained CloudOpsBERT
* Each export includes:
  * Model weights (pytorch_model.bin)
  * Config with label mappings (normal, anomaly)
  * Tokenizer files
## Quickstart (Scripts)
 1) Setup folders
```bash
bash scripts/setup_dirs.sh
```
 2) (optional) Download a local copy of a submodel from Hugging Face
```bash
bash scripts/fetch_pretrained.sh                # downloads 'hdfs' by default
SUBFOLDER=bgl bash scripts/fetch_pretrained.sh  # downloads 'bgl'
```
 3) Single-line prediction (directly from HF)
```bash
bash scripts/predict_line.sh "ERROR dfs.DataNode: Lost connection to namenode" hdfs
```
 4) Batch prediction (using local model folder)
```bash
bash scripts/make_sample_logs.sh
bash scripts/predict_file.sh samples/sample_logs.txt hdfs models/cloudops-bert-hdfs preds/preds_hdfs.jsonl
```
## Related Work
Several prior works have explored using BERT for log anomaly detection:
- **Leveraging BERT and Hugging Face Transformers for Log Anomaly Detection**
  - Tutorial-style blog post demonstrating how to fine-tune BERT on log data with Hugging Face. Useful as an introduction, but not intended as a reproducible research artifact.
- **LogBERT** (HelenGuohx/logbert)
  - Academic prototype from ~2019–2020 focusing on modeling log sequences with BERT. Demonstrates feasibility but limited to in-domain experiments and lacks integration with modern Hugging Face tooling.
- **AnomalyBERT** (Jhryu30/AnomalyBERT)
  - Another exploratory repository showing BERT-based anomaly detection on logs, with dataset-specific preprocessing. Similar limitations in generalization and reproducibility.
## How CloudOpsBERT is different
- Domain-specific adaptation: explicitly trained for cloud operations logs (HDFS, BGL) with class-weighted loss.
- Cross-domain evaluation: includes in-domain and cross-domain benchmarks, highlighting generalization challenges.
- Reproducibility & usability: clean repo, scripts, and ready-to-use Hugging Face exports.
- Future directions: introduces MicroLM, compressed micro-language models for efficient edge/cloud hybrid inference.

In short: previous work showed that "BERT can work for logs." CloudOpsBERT operationalizes this idea into reproducible benchmarks, public models, and deployable tools for both researchers and practitioners.
## Citation
If you use CloudOpsBERT in your research or tools, please cite:
```bibtex
@misc{pandey2025cloudopsbert,
  title={CloudOpsBERT: Domain-Specific Transformer Models for Cloud Operations Anomaly Detection},
  author={Pandey, Vaibhav},
  year={2025},
  howpublished={GitHub, Hugging Face},
  url={https://github.com/vaibhav-research/cloudops-bert}
}
```
