jonahkall committed on
Commit 4c48175 · verified · 1 Parent(s): 4c346eb

Upload README.md

Files changed (1)
  1. README.md +9 -207
README.md CHANGED
@@ -1,207 +1,9 @@
- # ether0 Reward Model
-
- [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Future-House/ether0)
- [![arXiv](https://img.shields.io/badge/arXiv-2506.17238-b31b1b.svg)](https://arxiv.org/abs/2506.17238)
- [![Project Status: Active](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
- ![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)
-
- [![Tests](https://github.com/Future-House/ether0/actions/workflows/lint-test.yaml/badge.svg)](https://github.com/Future-House/ether0/actions)
- [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
- [![python](https://img.shields.io/badge/python-3.11+-blue?style=flat&logo=python&logoColor=white)](https://www.python.org)
- [![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-md-dark.svg)](https://huggingface.co/futurehouse/ether0)
- [![Dataset on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/dataset-on-hf-md-dark.svg)](https://huggingface.co/datasets/futurehouse/ether0-benchmark)
-
- ![ether0 logo](docs/assets/ether0_logo.svg)
-
- _ether0: a scientific reasoning model, dataset, and reward functions for chemistry._
-
- This repo contains the reward model for evaluating ether0 and similar models,
- along with utilities for working with the verifiable rewards in
- [our benchmark](https://huggingface.co/datasets/futurehouse/ether0-benchmark).
-
- ## Overview
-
- ether0 is a reasoning language model post-trained through a loop of:
-
- 1. Supervised fine-tuning (SFT) on long chain-of-thought reasoning traces,
-    to elicit reasoning from a base model.
- 2. Reinforcement learning with verifiable rewards (RLVR)
-    to improve reasoning on focused task groups, each progressing at its own pace.
-    These multi-task trained models are referred to as 'specialists'.
- 3. Rejection sampling to filter specialists' reasoning
-    for correctness and quality (a sketch of this filtering follows below).
- 4. SFT on the base model again to make a 'generalist' reasoning model.
- 5. RLVR to recover any lost performance and push further in an all-task setting.
-
- ![ether0 training info](docs/assets/training_info.png)
-
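- As an illustration of the rejection-sampling step, here is a minimal sketch of a filter built on this repo's verifiable reward functions. It is not the actual training pipeline: the `solution`/`completion` field names and the full-credit cutoff are assumptions for illustration, and the reward call simply mirrors the benchmark walkthrough further down this README.
-
- ```python
- from ether0.model_prompts import extract_answer_loose
- from ether0.models import RewardFunctionInfo
- from ether0.rewards import EVAL_FUNCTIONS
-
-
- def keep_trace(sample: dict) -> bool:
-     """Keep a sampled reasoning trace only if its answer earns full reward."""
-     # "solution" holds the reward-function metadata and "completion" the
-     # specialist's raw output; both field names are illustrative assumptions.
-     reward_info = RewardFunctionInfo.model_validate(sample["solution"])
-     yhat = extract_answer_loose(sample["completion"])
-     reward = EVAL_FUNCTIONS[reward_info.fxn_name](
-         yhat=yhat, y=reward_info.answer_info, test=True
-     )
-     return float(reward) >= 1.0  # full-credit cutoff, chosen for illustration
- ```
-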
- ### Repo Structure
-
- This repo contains several packages:
-
- - `ether0`: reward functions, `rdkit` data utilities,
-   dataset generation prompts, language model training prompts,
-   and their associated data models.
- - `ether0.remotes`: server code for ether0 reward functions involving
-   exotic packages and/or third-party models.
-
- > [!NOTE]
- > This repo does not contain training code,
- > although you can find open-source repositories like [NeMo-RL](https://github.com/NVIDIA/NeMo-RL)
- > or [Hugging Face TRL](https://github.com/huggingface/trl)
- > that can do the SFT and RL phases of training.
-
- ### Open Weights
-
- Please see our open-source weights on Hugging Face:
- <https://huggingface.co/futurehouse/ether0>
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model = AutoModelForCausalLM.from_pretrained("futurehouse/ether0")
- tokenizer = AutoTokenizer.from_pretrained("futurehouse/ether0")
- ```
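-
- A minimal generation sketch building on the objects above (the prompt, chat-template call, and sampling settings are illustrative assumptions, not recommendations from the model card):
-
- ```python
- messages = [{"role": "user", "content": "Propose a molecule with the formula C7H8O."}]
- input_ids = tokenizer.apply_chat_template(
-     messages, add_generation_prompt=True, return_tensors="pt"
- )
- output_ids = model.generate(input_ids, max_new_tokens=512)
- # Decode only the newly generated tokens
- print(tokenizer.decode(output_ids[0][input_ids.shape[-1] :], skip_special_tokens=True))
- ```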
-
- ### Open Test Set
-
- Please see our open-source benchmark (test set) on Hugging Face:
- <https://huggingface.co/datasets/futurehouse/ether0-benchmark>
-
- ```python
- from datasets import load_dataset
-
- test_ds = load_dataset("futurehouse/ether0-benchmark", split="test")
- ```
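-
- Each row pairs a natural-language problem with the metadata its verifiable reward needs. A quick peek at one row (a minimal sketch; the `problem` and `solution` field names are the ones used in the benchmark walkthrough below):
-
- ```python
- row = test_ds[0]
- print(row["problem"])   # the question posed to the model
- print(row["solution"])  # reward-function metadata, later parsed via RewardFunctionInfo
- ```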
-
- ## Usage
-
- ### Installation
-
- The easiest way to get started is a `pip install` from GitHub:
-
- ```bash
- pip install git+https://github.com/Future-House/ether0.git
- ```
-
- Or if you want the full setup, clone the repo and use `uv`:
-
- ```bash
- git clone https://github.com/Future-House/ether0.git
- cd ether0
- uv sync
- ```
-
- ### Reward Functions
-
- Here is a basic example of how to use the reward functions:
-
- ```python
- from ether0.rewards import valid_mol_eval
-
- # Task: provide a valid completion of this molecule
- partial_smiles = "O=C(OC1C(OC(=O)C=2C=CC=CC2)C3(O)C(C)(C)CCCC3(C)C4CC=5OC=CC5C(C)C14"
-
- # Here are two model-proposed SMILES completions
- invalid_completion_smiles = "CCC"
- valid_completion_smiles = ")C=6C=CC=CC6"
-
- # Evaluate the completions
- assert not valid_mol_eval(invalid_completion_smiles, partial_smiles)
- assert valid_mol_eval(valid_completion_smiles, partial_smiles)
- ```
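-
- Other verifiable rewards are exposed through the `EVAL_FUNCTIONS` registry, a name-to-function mapping used in the benchmark walkthrough below. A minimal sketch to list them:
-
- ```python
- from ether0.rewards import EVAL_FUNCTIONS
-
- # Print the names of the registered verifiable reward functions
- print(sorted(EVAL_FUNCTIONS))
- ```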
-
- ### Visualization
-
- If it helps, you can visualize the molecules:
-
- ```python
- from ether0.data import draw_molecule
-
- # See the reward functions demo above for where these came from
- partial_smiles = "O=C(OC1C(OC(=O)C=2C=CC=CC2)C3(O)C(C)(C)CCCC3(C)C4CC=5OC=CC5C(C)C14"
- invalid_completion_smiles = "CCC"
- valid_completion_smiles = ")C=6C=CC=CC6"
-
- valid_mol_text = draw_molecule(partial_smiles + valid_completion_smiles)
- with open("valid_molecule.svg", "w") as f:
-     f.write(valid_mol_text)
- ```
-
- The output of `draw_molecule` can also be easily visualized using `IPython.display`,
- or in your terminal via `chafa valid_molecule.svg`
- ([chafa docs](https://hpjansson.org/chafa/)).
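-
- For example, in a notebook (a minimal sketch; `SVG` simply renders the raw SVG string inline):
-
- ```python
- from IPython.display import SVG, display
-
- display(SVG(valid_mol_text))
- ```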
-
- ![valid molecule](docs/assets/valid_molecule.svg)
-
- ### Benchmark
-
- Here is a sample baseline of
- [`ether0-benchmark`](https://huggingface.co/datasets/futurehouse/ether0-benchmark)
- on `gpt-4o` using [`lmi`](https://github.com/Future-House/ldp/tree/main/packages/lmi).
- To install `lmi`, please install `ether0` with the `baselines` extra
- (for example `uv sync --extra baselines`).
-
- We also need to run our remote rewards server via `ether0-serve`
- (for more information, see [`ether0.remotes` docs](packages/remotes/README.md)):
-
- ```bash
- ETHER0_REMOTES_API_TOKEN=abc123 ether0-serve
- ```
-
- Next, start `ipython` with the relevant environment variables set:
-
- ```bash
- ETHER0_REMOTES_API_BASE_URL="http://127.0.0.1:8000" ETHER0_REMOTES_API_TOKEN=abc123 \
-   ipython
- ```
-
- And run the following Python code:
-
- ```python
- import itertools
- import statistics
- from collections import defaultdict
-
- from aviary.core import Message
- from datasets import load_dataset
- from lmi import LiteLLMModel
- from tqdm.asyncio import tqdm_asyncio as asyncio  # gather() with a progress bar
-
- from ether0.data import get_problem_category
- from ether0.model_prompts import LOOSE_XML_ANSWER_USER_PROMPT, extract_answer_loose
- from ether0.models import RewardFunctionInfo
- from ether0.rewards import EVAL_FUNCTIONS
-
- # Add an LLM prompt of your making to the dataset
- test_ds = load_dataset("futurehouse/ether0-benchmark", split="test").map(
-     lambda x: {"prompt": "\n\n".join((LOOSE_XML_ANSWER_USER_PROMPT, x["problem"]))}
- )
-
- # Prompt the LLM (top-level await works inside ipython)
- model = LiteLLMModel(name="gpt-4o")
- results = await asyncio.gather(
-     *(model.acompletion([Message(content=row["prompt"])]) for row in test_ds),
-     desc="Running evaluation",
- )
-
- # Compute rewards
- per_category_rewards = defaultdict(list)
- for row, result in zip(test_ds, results, strict=True):
-     # NOTE: you can also use `ether0.rewards.accuracy_reward`,
-     # but we decided to go a bit "lower level" for this demo
-     reward_info = RewardFunctionInfo.model_validate(row["solution"])
-     yhat = extract_answer_loose(result[0].text)
-     reward = EVAL_FUNCTIONS[reward_info.fxn_name](
-         yhat=yhat, y=reward_info.answer_info, test=True
-     )
-     per_category_rewards[get_problem_category(reward_info.problem_type)].append(reward)
-
- for category, rewards in sorted(per_category_rewards.items()):
-     print(
-         f"In category {category!r} of {len(rewards)} questions,"
-         f" average reward was {statistics.mean(rewards):.3f}."
-     )
- accuracy = statistics.mean(itertools.chain.from_iterable(per_category_rewards.values()))
- print(f"Cumulative average reward across {len(test_ds)} questions was {accuracy:.3f}.")
- ```
 
+ title: Ether0 Inference Server
+ emoji: 🧞‍♂️
+ colorFrom: red
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 5.44.0
+ app_file: app.py
+ pinned: false
+ short_description: Ether0