# ether0 Reward Model

[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Future-House/ether0)
[![arXiv](https://img.shields.io/badge/arXiv-2506.17238-b31b1b.svg)](https://arxiv.org/abs/2506.17238)
[![Project Status: Active](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)

[![Tests](https://github.com/Future-House/ether0/actions/workflows/lint-test.yaml/badge.svg)](https://github.com/Future-House/ether0/actions)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![python](https://img.shields.io/badge/python-3.11+-blue?style=flat&logo=python&logoColor=white)](https://www.python.org)
[![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-md-dark.svg)](https://huggingface.co/futurehouse/ether0)
[![Dataset on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/dataset-on-hf-md-dark.svg)](https://huggingface.co/datasets/futurehouse/ether0-benchmark)

![ether0 logo](docs/assets/ether0_logo.svg)

_ether0: a scientific reasoning model, dataset, and reward functions for chemistry._

This repo contains the reward model for evaluating ether0 and similar models,
along with utilities for working with the verifiable rewards in
[our benchmark](https://huggingface.co/datasets/futurehouse/ether0-benchmark).

## Overview

ether0 is a reasoning language model post-trained through a loop of:

1. Supervised fine-tuning (SFT) on long chain-of-thought reasoning traces,
   to elicit reasoning from a base model.
2. Reinforcement learning with verifiable rewards (RLVR)
   to improve reasoning on focused task groups, each progressing at its own pace.
   The resulting multitask models are referred to as 'specialists'.
3. Rejection sampling to filter specialists' reasoning
   for correctness and quality.
4. SFT on the base model again to make a 'generalist' reasoning model.
5. RLVR to recover any lost performance and push further in an all-task setting.

![ether0 training info](docs/assets/training_info.png)
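
As a sketch of how a verifiable reward plugs into steps 2 and 3, here is a
hypothetical rejection-sampling loop. `generate` stands in for a specialist
model and `reward_fn` for one of this repo's reward functions; both names are
illustrative and not part of the ether0 API.

```python
def rejection_sample(prompts, generate, reward_fn, threshold=1.0):
    """Keep (prompt, completion) pairs whose verifiable reward clears the threshold."""
    kept = []
    for prompt in prompts:
        completion = generate(prompt)  # sample a reasoning trace + answer
        if reward_fn(prompt, completion) >= threshold:
            kept.append((prompt, completion))
    return kept
```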

### Repo Structure

This repo contains several packages:

- `ether0`: reward functions, `rdkit` data utilities,
  dataset generation prompts, language model training prompts,
  and the associated data models.
- `ether0.remotes`: server code for ether0 reward functions that depend on
  exotic packages and/or third-party models.

> [!NOTE]
> This repo does not contain training code,
> although open-source frameworks like [NeMo-RL](https://github.com/NVIDIA/NeMo-RL)
> or [Hugging Face TRL](https://github.com/huggingface/trl)
> can handle the SFT and RL phases of training.

### Open Weights

Please see our open-source weights on Hugging Face:
<https://huggingface.co/futurehouse/ether0>

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("futurehouse/ether0")
tokenizer = AutoTokenizer.from_pretrained("futurehouse/ether0")
```
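
To sanity-check the weights, here is a minimal generation sketch continuing
from the load above. It assumes the tokenizer ships a chat template; the
chemistry prompt and `max_new_tokens` value are illustrative choices, not a
documented recipe.

```python
# Continues from the model and tokenizer loaded above
messages = [{"role": "user", "content": "Propose a SMILES string for an analog of aspirin."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1] :], skip_special_tokens=True))
```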

### Open Test Set

Please see our open-source benchmark (test set) on Hugging Face:
<https://huggingface.co/datasets/futurehouse/ether0-benchmark>

```python
from datasets import load_dataset

test_ds = load_dataset("futurehouse/ether0-benchmark", split="test")
```
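
Each row pairs a natural-language `problem` with a verifiable `solution`
(the reward-function metadata used in the benchmark example below).
A quick peek at one row:

```python
# Continues from the dataset loaded above
row = test_ds[0]
print(row["problem"])   # the question posed to the model
print(row["solution"])  # reward-function info used for scoring
```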

## Usage

### Installation

The easiest way to get started is a `pip install` from GitHub:

```bash
pip install git+https://github.com/Future-House/ether0.git
```

Or, if you want the full setup, clone the repo and use `uv`:

```bash
git clone https://github.com/Future-House/ether0.git
cd ether0
uv sync
```

### Reward Functions

Here is a basic example of how to use the reward functions:

```python
from ether0.rewards import valid_mol_eval

# Task: provide a valid completion of this molecule
partial_smiles = "O=C(OC1C(OC(=O)C=2C=CC=CC2)C3(O)C(C)(C)CCCC3(C)C4CC=5OC=CC5C(C)C14"

# Here are two model-proposed SMILES completions
invalid_completion_smiles = "CCC"
valid_completion_smiles = ")C=6C=CC=CC6"

# Evaluate the completions
assert not valid_mol_eval(invalid_completion_smiles, partial_smiles)
assert valid_mol_eval(valid_completion_smiles, partial_smiles)
```
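
`valid_mol_eval` is just one of the verifiable rewards. The evaluation reward
functions are also exposed through the `EVAL_FUNCTIONS` mapping (used in the
benchmark example below), so you can list what is available:

```python
from ether0.rewards import EVAL_FUNCTIONS

# EVAL_FUNCTIONS maps reward function names to their callables
print(sorted(EVAL_FUNCTIONS))
```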

### Visualization

If it helps, you can visualize the molecules:

```python
from ether0.data import draw_molecule

# See the reward functions demo above for where these came from
partial_smiles = "O=C(OC1C(OC(=O)C=2C=CC=CC2)C3(O)C(C)(C)CCCC3(C)C4CC=5OC=CC5C(C)C14"
invalid_completion_smiles = "CCC"
valid_completion_smiles = ")C=6C=CC=CC6"

valid_mol_text = draw_molecule(partial_smiles + valid_completion_smiles)
with open("valid_molecule.svg", "w") as f:
    f.write(valid_mol_text)
```

The output of `draw_molecule` can also be easily visualized using `IPython.display`,
or in your terminal via `chafa valid_molecule.svg`
([chafa docs](https://hpjansson.org/chafa/)).
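
For example, in a notebook (a minimal sketch):

```python
from IPython.display import SVG, display

# Render the SVG string returned by draw_molecule inline
display(SVG(valid_mol_text))
```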

![valid molecule](docs/assets/valid_molecule.svg)

### Benchmark

Here is a sample baseline of
[`ether0-benchmark`](https://huggingface.co/datasets/futurehouse/ether0-benchmark)
on `gpt-4o` using [`lmi`](https://github.com/Future-House/ldp/tree/main/packages/lmi).
To install `lmi`, please install `ether0` with the `baselines` extra
(for example `uv sync --extra baselines`).

We also need to run our remote rewards server via `ether0-serve`
(for more information, see [`ether0.remotes` docs](packages/remotes/README.md)):

```bash
ETHER0_REMOTES_API_TOKEN=abc123 ether0-serve
```

Next, start `ipython` with the relevant environment variables set:

```bash
ETHER0_REMOTES_API_BASE_URL="http://127.0.0.1:8000" ETHER0_REMOTES_API_TOKEN=abc123 \
    ipython
```

And run the following Python code:

```python
import itertools
import statistics
from collections import defaultdict

from aviary.core import Message
from datasets import load_dataset
from lmi import LiteLLMModel
from tqdm.asyncio import tqdm_asyncio as asyncio

from ether0.data import get_problem_category
from ether0.model_prompts import LOOSE_XML_ANSWER_USER_PROMPT, extract_answer_loose
from ether0.models import RewardFunctionInfo
from ether0.rewards import EVAL_FUNCTIONS

# Add an LLM prompt of your making to the dataset
test_ds = load_dataset("futurehouse/ether0-benchmark", split="test").map(
    lambda x: {"prompt": "\n\n".join((LOOSE_XML_ANSWER_USER_PROMPT, x["problem"]))}
)

# Prompt the LLM
model = LiteLLMModel(name="gpt-4o")
results = await asyncio.gather(
    *(model.acompletion([Message(content=row["prompt"])]) for row in test_ds),
    desc="Running evaluation",
)

# Compute rewards
per_category_rewards = defaultdict(list)
for row, result in zip(test_ds, results, strict=True):
    # NOTE: you can also use `ether0.rewards.accuracy_reward`,
    # but we decided to go a bit "lower level" for this demo
    reward_info = RewardFunctionInfo.model_validate(row["solution"])
    yhat = extract_answer_loose(result[0].text)
    reward = EVAL_FUNCTIONS[reward_info.fxn_name](
        yhat=yhat, y=reward_info.answer_info, test=True
    )
    per_category_rewards[get_problem_category(reward_info.problem_type)].append(reward)

for category, rewards in sorted(per_category_rewards.items()):
    print(
        f"In category {category!r} of {len(rewards)} questions,"
        f" average reward was {statistics.mean(rewards):.3f}."
    )
accuracy = statistics.mean(itertools.chain.from_iterable(per_category_rewards.values()))
print(f"Cumulative average reward across {len(test_ds)} questions was {accuracy:.3f}.")
```