Eigen-Banana-Qwen-Image-Edit: Fast Image Editing with Qwen-Image-Edit LoRA

⚡ Lightning Demo Website / 📄 Blog Post

Eigen-Banana-Qwen-Image-Edit is a LoRA (Low-Rank Adaptation) checkpoint for the Qwen-Image-Edit model, optimized for fast, high-quality, text-guided image editing. It is designed to run with fewer inference steps than the base model while maintaining visual quality.

Eigen-Banana-Qwen-Image-Edit was trained on Apple's Pico-Banana-400K dataset, a large-scale collection of ~400K text–image–edit triplets covering 35 edit operations across 8 semantic categories. It excels at a wide range of editing tasks, from object manipulation to stylistic transformations.

Model Details

Base Model: Qwen/Qwen-Image-Edit
Model Type: LoRA Fine-tuned Diffusion Transformer
Training Dataset: Pico-Banana-400K
Training Method: EigenTrain (LoRA fine-tuning)
Format: FP16 SafeTensors
License: Apache 2.0
Use Cases: Text-guided image editing, style transfer, object modification, scene transformation
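LoRA keeps each targeted base weight W frozen and learns a low-rank update, so the adapted weight is W' = W + (alpha / r) · B · A, where A is r × d_in and B is d_out × r. The pure-Python sketch below illustrates that merge on a toy matrix. The alpha value, the matrix sizes, and the helper names (matmul, merge_lora) are illustrative assumptions, not values or code from this checkpoint or from any particular library.

```python
# Minimal sketch of a LoRA merge on one linear layer (pure Python, no deps).
# Assumptions: toy sizes and alpha are illustrative, not this checkpoint's values.


def matmul(x, y):
    """Multiply matrices stored as lists of rows: (n x k) @ (k x m) -> (n x m)."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A).

    w: d_out x d_in frozen base weight
    a: r x d_in LoRA down-projection
    b: d_out x r LoRA up-projection
    """
    scale = alpha / r
    delta = matmul(b, a)  # d_out x d_in update with rank at most r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


w = [[1, 0, 0], [0, 1, 0]]  # d_out=2, d_in=3
a = [[1, 1, 0]]             # r=1
b = [[1], [2]]
print(merge_lora(w, a, b, alpha=2, r=1))  # [[3.0, 2.0, 0.0], [4.0, 5.0, 0.0]]
```

Because the update is simply added to W, an adapter can be merged into the base weights once before inference, so it adds no per-step cost. In common LoRA setups B starts at zero, which makes W' equal to W at the beginning of training.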

Features

✨ Fast Inference: Optimized for quick generation with distilled knowledge
🎨 High Quality: Maintains excellent visual quality with fewer steps
🌐 Multilingual: Supports both English and Chinese prompts
⚡ Efficient: Lightweight LoRA weights for easy deployment
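The "lightweight" claim follows from simple arithmetic: a rank-r adapter on a d_out × d_in layer stores r · (d_in + d_out) parameters instead of d_in · d_out. The sketch below uses the LoRA rank of 32 listed under Training Methodology; the hidden size of 3072 is a hypothetical layer width chosen for illustration, not a figure taken from this model card.

```python
# Parameter-count comparison for one linear layer: LoRA adapter vs. full update.
# Assumption: HIDDEN = 3072 is a hypothetical layer width for illustration only.


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters updated when fine-tuning the whole weight matrix (bias ignored)."""
    return d_in * d_out


HIDDEN = 3072    # hypothetical square layer size
RANK = 32        # LoRA rank reported in the training parameters
FP16_BYTES = 2   # bytes per parameter in FP16

lora = lora_param_count(HIDDEN, HIDDEN, RANK)
full = full_param_count(HIDDEN, HIDDEN)
print(f"LoRA: {lora:,} params (~{lora * FP16_BYTES / 1e6:.2f} MB in FP16)")
# → LoRA: 196,608 params (~0.39 MB in FP16)
print(f"Full: {full:,} params, ratio: {lora / full:.4f}")
# → Full: 9,437,184 params, ratio: 0.0208
```

For a square d × d layer the ratio reduces to 2r/d, so the relative savings grow as layers get wider.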

Training Dataset

This model was trained on Pico-Banana-400K, a large-scale dataset of ~400K text–image–edit triplets designed for text-guided image editing research.

Dataset Highlights

~257K single-turn text–image–edit triplets for supervised fine-tuning
35 edit operations across 8 semantic categories:
- Object-Level Semantic (35%): Add, remove, replace, or relocate objects
- Scene Composition & Multi-Subject (20%): Contextual transformations
- Human-Centric (18%): Clothing, expression, appearance edits
- Stylistic (10%): Domain and artistic style transfer
- Text & Symbol (8%): Edits involving visible text or signs
- Pixel & Photometric (5%): Brightness, contrast, tonal adjustments
- Scale & Perspective (2%): Zoom, viewpoint changes
- Spatial/Layout (2%): Outpainting, composition extension
Source Images: Open Images Dataset
Instruction Generation: Gemini-2.5-Flash for natural-language editing prompts
Quality Control: Automated evaluation using Gemini-2.5-Pro as a judge
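The category shares above can be turned into rough per-category sizes. The sketch below assumes, as an unverified simplification, that the percentages apply to the ~257K single-turn subset; since both the total and the shares are approximate, the resulting counts are ballpark figures, not reported dataset statistics. The helper name approx_counts is hypothetical.

```python
# Rough per-category sizes implied by the listed shares.
# Assumption: shares apply to the ~257K single-turn SFT subset (not confirmed).

CATEGORY_SHARE = {  # percent, as listed in Dataset Highlights
    "Object-Level Semantic": 35,
    "Scene Composition & Multi-Subject": 20,
    "Human-Centric": 18,
    "Stylistic": 10,
    "Text & Symbol": 8,
    "Pixel & Photometric": 5,
    "Scale & Perspective": 2,
    "Spatial/Layout": 2,
}


def approx_counts(total: int, shares: dict[str, int]) -> dict[str, int]:
    """Convert percentage shares into approximate example counts (rounded)."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100")
    return {name: round(total * pct / 100) for name, pct in shares.items()}


counts = approx_counts(257_000, CATEGORY_SHARE)
for name, n in counts.items():
    print(f"{name:<36} ~{n:,}")
```

The sum check also confirms that the eight listed shares add up to 100%.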

Training Methodology

The model was fine-tuned with EigenTrain, a training platform that unifies supervised fine-tuning (SFT), offline RL, and online RL for text LLMs and VLMs, and that provides first-class workflows for multimodal image and video generation. For this checkpoint, we used EigenTrain to run LoRA fine-tuning on Qwen-Image-Edit with the following key parameters:

LoRA Rank: 32
Target Modules: to_q, to_k, to_v, add_q_proj, add_k_proj, add_v_proj, to_out.0, to_add_out, img_mlp.net.2, img_mod.1, txt_mlp.net.2, txt_mod.1
Learning Rate: 1e-4
Training Data: ~257K high-quality text-image-edit triplets
Precision: FP16 for efficient deployment

This combination of high-quality training data and efficient LoRA adaptation enables fast, accurate image editing while maintaining the base model's strong capabilities.
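Target modules are given as short names rather than full module paths. The sketch below shows how such a list selects layers, assuming the suffix-matching convention that, as we understand it, PEFT applies to a list of target_modules: a module is adapted when its dotted name equals a target or ends with "." plus that target. The module paths are hypothetical diffusers-style names, not an exact dump of the Qwen-Image-Edit transformer.

```python
# Which modules would receive a LoRA adapter under suffix matching.
# Assumption: matching mirrors PEFT's list-of-targets convention; module names
# below are hypothetical examples, not read from the actual model.

TARGET_MODULES = [
    "to_q", "to_k", "to_v",
    "add_q_proj", "add_k_proj", "add_v_proj",
    "to_out.0", "to_add_out",
    "img_mlp.net.2", "img_mod.1",
    "txt_mlp.net.2", "txt_mod.1",
]


def is_lora_target(module_name: str, targets=TARGET_MODULES) -> bool:
    """True if the dotted module name equals a target or ends with '.<target>'."""
    return any(module_name == t or module_name.endswith("." + t) for t in targets)


names = [  # hypothetical module paths inside one transformer block
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.attn.to_out.0",
    "transformer_blocks.0.attn.to_out.1",
    "transformer_blocks.0.img_mlp.net.2",
    "transformer_blocks.0.img_mlp.net.0.proj",
    "transformer_blocks.0.txt_mod.1",
]
for n in names:
    print(f"{n:<42} {'LoRA' if is_lora_target(n) else 'frozen'}")
```

Matching on a "." boundary keeps a short target such as to_q from accidentally selecting unrelated modules whose names merely end in the same characters.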

Demo Images

We present several examples that qualitatively compare the base Qwen-Image-Edit model with Eigen-Banana-Qwen-Image-Edit.

Example 1 – Add a new object to the scene

example1-input

Prompt: Integrate a minimalist, dark-toned, rectangular gallery bench into the mid-ground, positioned slightly to the right of the central pillar and facing the right wall, ensuring its texture, lighting, and subtle shadows are consistent with the existing black and white aesthetic and diffused ambient light of the art gallery.

Outputs