Model Card for neg_aware_qwen

This model is a fine-tuned version of Qwen/Qwen2.5-VL-3B-Instruct, trained with TRL on zai-org/VisionRewardDB-Image. It aims to evaluate image quality in the same way as VisionReward, but with a much smaller and faster model.

Quick start

To use this model, give it an image and ask it to rate the image on one of the dimensions used in VisionReward.

from datasets import load_dataset

# Load the dataset; any other images can be used instead
train_ds = load_dataset("zai-org/VisionRewardDB-Image", split='train[:40000]')
test_ds = load_dataset("zai-org/VisionRewardDB-Image", split='train[40000:]')

from transformers import pipeline
pipe = pipeline("image-text-to-text", model="weathon/qwen_2_5_vision_reward")

import re

import pandas as pd
from PIL import Image

# Scoring rubric with Dimension, Score, and Description columns
df = pd.read_csv("rules.csv")

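# Normalize the headers and fill the Dimension column down over blank cells
df.columns = df.columns.str.strip()
df['Dimension'] = df['Dimension'].ffill()

# Use the short name in parentheses as the key, falling back to the full dimension name
df['dim_key'] = df['Dimension'].str.extract(r'\((.*?)\)', expand=False).fillna(df['Dimension'])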

guide = {
    dim_key: {
        int(row['Score']): str(row['Description']).strip()
        for _, row in group.iterrows()
    }
    for dim_key, group in df.groupby('dim_key')
}

question = f"You need to rate the quality of an image, guideline: {guide}."
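To make the shape of the guide dictionary built above concrete, here is a minimal dependency-free sketch of the same transformation using only the standard library. The CSV rows, the dimension names, and the `build_guide` helper are hypothetical, since the contents of rules.csv are not shown in this card; only the column names (Dimension, Score, Description) come from the code above.

```python
import csv
import io
import re

# Hypothetical rules.csv content: the real rubric rows are not shown in this card.
# A blank Dimension cell means "same dimension as the row above".
SAMPLE_CSV = """Dimension,Score,Description
Composition (symmetry),1,Symmetrical
,0,Not symmetrical
Lighting (brightness),1,Well lit
,0,Too dark
"""


def build_guide(csv_text):
    """Build {dim_key: {score: description}} from rubric text (hypothetical helper)."""
    guide = {}
    current_dim = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        row = {key.strip(): (value or "").strip() for key, value in row.items()}
        # Forward-fill: reuse the last non-empty Dimension value.
        if row["Dimension"]:
            current_dim = row["Dimension"]
        # Prefer the short name in parentheses, else the full dimension name.
        match = re.search(r"\((.*?)\)", current_dim)
        dim_key = match.group(1) if match else current_dim
        guide.setdefault(dim_key, {})[int(row["Score"])] = row["Description"]
    return guide


guide = build_guide(SAMPLE_CSV)
print(guide)
```

The resulting nested dictionary is what gets interpolated into the prompt string, so the model sees every dimension together with the meaning of each score.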

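import json

def rate(image):
    """Return the total predicted score for a PIL image, summed over all dimensions."""
    messages = [
        {
            # The scoring guideline is passed as the system prompt
            "role": "system",
            "content": [{"type": "text", "text": question}],
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    # Resized to 512x512; the aspect ratio is not preserved
                    "image": image.resize((512, 512)),
                }
            ],
        },
    ]
    gen = pipe(text=messages, return_full_text=False)
    # The reply is expected to be a dict-like string of per-dimension scores;
    # single quotes are swapped for double quotes so it parses as JSON
    return sum(json.loads(gen[0]["generated_text"].replace("'", '"')).values())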

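# Predicted total score for one held-out image
rate(test_ds[3]["image"])

# Ground-truth total from the human annotation, for comparison
sum(test_ds[3]["annotation"].values())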
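The quote replacement in `rate` fails if the generated text is not a clean dictionary, for example when a key contains an apostrophe. A minimal sketch of a more defensive parser follows, assuming the model emits a dict-like string that maps dimension names to numeric scores. `parse_scores` and `total_score` are hypothetical helpers, not part of this model's code.

```python
import ast
import json


def parse_scores(generated_text):
    """Parse a dict-like string into {dimension: score}, or return None on failure."""
    try:
        # Works when the model emits valid JSON (double-quoted keys).
        parsed = json.loads(generated_text)
    except json.JSONDecodeError:
        try:
            # Works for Python-style literals such as "{'symmetry': 1}".
            parsed = ast.literal_eval(generated_text.strip())
        except (ValueError, SyntaxError):
            return None
    if not isinstance(parsed, dict):
        return None
    # Reject replies whose values cannot be summed.
    if not all(isinstance(value, (int, float)) for value in parsed.values()):
        return None
    return parsed


def total_score(generated_text):
    """Sum the per-dimension scores, or return None if the reply is unparseable."""
    scores = parse_scores(generated_text)
    return None if scores is None else sum(scores.values())


print(total_score("{'symmetry': 1, 'brightness': -1}"))
print(total_score("sorry, I cannot rate this image"))
```

Returning `None` instead of raising lets a batch evaluation skip malformed generations and count them separately.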

This model was trained with supervised fine-tuning (SFT).

Framework versions

  • TRL: 0.24.0.dev0
  • Transformers: 4.56.1
  • PyTorch: 2.8.0+cu126
  • Datasets: 4.0.0
  • Tokenizers: 0.22.0

Citations

We are still working on the paper; please check back for updates.
