marcel/phi-2-openhermes-30k

This model was converted to MLX format from microsoft/phi-2. Refer to the original model card for more details on the model.

Use with mlx

```shell
pip install mlx
git clone https://github.com/ml-explore/mlx-examples.git
cd mlx-examples/llms/hf_llm
python generate.py --model marcel/phi-2-openhermes-30k --prompt "My name is"
```
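The `--prompt` string above is plain text completion. The transformers example in this card instead wraps the request in a `### Human:` / `### Assistant:` template, which appears to be the chat format this fine-tune expects. That is an assumption based only on that example. Here is a minimal sketch of two hypothetical helpers, `build_prompt` and `extract_reply` (not part of mlx or transformers). The first builds such a prompt. The second strips the echoed prompt from decoded output:

```python
# Hypothetical helpers: the template mirrors the input_text in this card's
# transformers example and is assumed (not confirmed) to be the training format.
HUMAN = "### Human:"
ASSISTANT = "### Assistant:"


def build_prompt(message: str) -> str:
    """Wrap a single user message in the Human/Assistant template."""
    return f"{HUMAN} {message.strip()}\n\n{ASSISTANT}"


def extract_reply(decoded: str) -> str:
    """Keep only the assistant's answer from a decoded generation.

    The decoded text echoes the prompt, so everything up to the first
    assistant marker is dropped. If the model goes on to start a new
    Human turn, that part is cut off too.
    """
    _, marker, reply = decoded.partition(ASSISTANT)
    if not marker:  # no template marker found: return the text unchanged
        return decoded.strip()
    return reply.split(HUMAN, 1)[0].strip()


if __name__ == "__main__":
    prompt = build_prompt("Give me a good recipe for a chinese dish")
    print(repr(prompt))
    print(extract_reply(prompt + " Try mapo tofu.\n\n### Human: Thanks!"))
```

The string from `build_prompt` can be used as the `--prompt` value above or as `input_text` in the transformers example.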

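Use with transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "marcel/phi-2-openhermes-30k"

# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16,  # load the F32 checkpoint in half precision
)
# Load the tokenizer from the same Hub repository as the model.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Prompt in the "### Human: ... ### Assistant:" format.
input_text = "### Human: Give me a good recipe for a chinese dish\n\n### Assistant:"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,             # passes input_ids and attention_mask
    max_length=1024,      # total length, prompt tokens included
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```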
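With `do_sample=True`, the `temperature=0.7` and `top_p=0.9` arguments shape how each next token is drawn. Here is a simplified, pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over one step's logits. It illustrates the technique only. It is not the transformers implementation, whose edge-case handling differs, and the function names are hypothetical:

```python
import math
import random


def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} after temperature scaling and top-p filtering."""
    # Temperature scaling: divide logits by T before the softmax (T < 1 sharpens).
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: keep the smallest set of most likely tokens
    # whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize the surviving probabilities so they sum to 1.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


def sample_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Draw one token index from the filtered distribution."""
    dist = top_p_filter(logits, temperature, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


toy_logits = [2.0, 1.0, 0.1, -1.0]  # made-up logits for a 4-token vocabulary
print(sorted(top_p_filter(toy_logits, temperature=1.0, top_p=0.9)))  # [0, 1, 2]
print(sample_token(toy_logits, rng=random.Random(0)))
```

Lowering the temperature concentrates probability on the top token, so fewer tokens survive the top-p cut. Raising `top_p` toward 1.0 keeps more of the tail.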

Open LLM Leaderboard Evaluation Results

Detailed results are available on the Open LLM Leaderboard.

| Metric | Value |
|---|---|
| Avg. | 60.37 |
| AI2 Reasoning Challenge (25-shot) | 61.01 |
| HellaSwag (10-shot) | 74.72 |
| MMLU (5-shot) | 57.17 |
| TruthfulQA (0-shot) | 45.38 |
| Winogrande (5-shot) | 74.90 |
| GSM8k (5-shot) | 49.05 |
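The Avg. row is the unweighted mean of the six benchmark scores. Recomputing it from the table confirms the reported value:

```python
# Scores copied from the Open LLM Leaderboard table in this card.
scores = {
    "AI2 Reasoning Challenge (25-shot)": 61.01,
    "HellaSwag (10-shot)": 74.72,
    "MMLU (5-shot)": 57.17,
    "TruthfulQA (0-shot)": 45.38,
    "Winogrande (5-shot)": 74.90,
    "GSM8k (5-shot)": 49.05,
}

# Simple arithmetic mean: every benchmark carries equal weight.
avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # 60.37
```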


Model size: 3B params
Tensor type: F32 (Safetensors)
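The tensor type determines how much memory the weights need. This rough estimate covers the weights alone and uses the rounded 3B parameter count shown above. It ignores activations, the KV cache and framework overhead. It shows why the transformers example loads the F32 checkpoint with `torch_dtype=torch.float16`. `weight_memory_gb` is a hypothetical helper:

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


params = 3e9  # rounded "3B params" from the model card
for dtype in ("float32", "float16"):
    print(dtype, f"{weight_memory_gb(params, dtype):.1f} GB")
```

Halving the bytes per parameter halves the estimate, from about 12 GB in F32 to about 6 GB in float16.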


Evaluation results

| Benchmark | Metric | Split | Score |
|---|---|---|---|
| AI2 Reasoning Challenge (25-shot) | normalized accuracy | test | 61.01 |
| HellaSwag (10-shot) | normalized accuracy | validation | 74.72 |
| MMLU (5-shot) | accuracy | test | 57.17 |
| TruthfulQA (0-shot) | mc2 | validation | 45.38 |
| Winogrande (5-shot) | accuracy | validation | 74.90 |
| GSM8k (5-shot) | accuracy | test | 49.05 |

Source: Open LLM Leaderboard.
