AI & ML interests

None defined yet.

Recent Activity

IliaLarchenko posted an update 18 days ago
šŸ† BEHAVIOR Challenge 1st Place – Solution Summary

My team recently won 1st place in the BEHAVIOR Challenge at NeurIPS.
The competition focused on training a single policy to complete 50 long-horizon household tasks in simulation.

We built an end-to-end policy based on Pi0.5 with a bunch of custom modifications. Everything is open-sourced, and it should be useful for anyone exploring VLAs or adapting them to specific tasks.

Key Architecture Changes:
- Replaced language model with 50 trainable task embeddings (no text at all)
- Correlated noise for Flow Matching: ϵ ∼ N(0, 0.5I + 0.5Σ) using the dataset action covariance (see the sampling sketch after this list)
- Learnable mixed-layer attention: each action expert layer attends to a trainable mix of all VLM layers
- System 2 stage tracking: model predicts task stage, we smooth it with voting and feed it back as context
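
As a rough illustration of the correlated-noise bullet above: the standard Gaussian noise in the flow-matching objective is replaced by samples whose covariance is a 50/50 blend of the identity and the empirical action covariance Σ. This is a minimal sketch, not the exact training code; the flattening of action chunks into vectors and all shapes are my own assumptions.

```python
import torch

def action_covariance(actions: torch.Tensor) -> torch.Tensor:
    """actions: (num_samples, action_dim) flattened dataset action vectors."""
    centered = actions - actions.mean(dim=0, keepdim=True)
    return centered.T @ centered / (actions.shape[0] - 1)

def correlated_fm_noise(batch_size: int, sigma: torch.Tensor) -> torch.Tensor:
    """Draw eps ~ N(0, 0.5*I + 0.5*Sigma) via a Cholesky factor of the covariance."""
    dim = sigma.shape[0]
    cov = 0.5 * torch.eye(dim) + 0.5 * sigma
    chol = torch.linalg.cholesky(cov)      # cov = chol @ chol.T
    white = torch.randn(batch_size, dim)   # standard Gaussian samples
    return white @ chol.T                  # samples with covariance `cov`

# Usage with placeholder data (the real Sigma comes from the training actions):
actions = torch.randn(10_000, 32)          # stand-in for flattened dataset actions
eps = correlated_fm_noise(64, action_covariance(actions))
```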

Training:
- Multi-sample Flow Matching: 15 FM samples per VLM pass to reduce gradient variance (sketched after this list)
- Delta action space + per-timestamp normalization
- FAST auxiliary loss and stage prediction loss
- Trained on 224×224 RGB + proprioception only
- We use 4 fine-tuned checkpoints, all derived from a multi-task model trained on all 50 tasks
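
A rough sketch of the multi-sample flow-matching idea: the expensive VLM forward pass runs once, and several (noise, time) draws reuse the cached features, so the averaged loss has lower gradient variance at little extra cost. The function signature, the linear interpolation path, and the time convention below are illustrative assumptions and may differ from the actual Pi0.5-based code.

```python
import torch

def multi_sample_fm_loss(vlm_features, actions, action_expert, num_fm_samples=15):
    """Average the flow-matching loss over several (noise, time) draws that all
    reuse one cached VLM forward pass.

    vlm_features : (B, ctx, d)        output of a single VLM pass (assumed shape)
    actions      : (B, horizon, dim)  ground-truth action chunk
    action_expert: callable(noisy_actions, t, vlm_features) -> predicted velocity
    """
    losses = []
    for _ in range(num_fm_samples):
        t = torch.rand(actions.shape[0], 1, 1)        # per-sample flow time in [0, 1)
        eps = torch.randn_like(actions)               # i.i.d. noise; correlated noise could be swapped in
        noisy = (1.0 - t) * eps + t * actions         # linear interpolation path
        target = actions - eps                        # velocity target for this path
        pred = action_expert(noisy, t, vlm_features)  # cheap pass, VLM features reused
        losses.append(((pred - target) ** 2).mean())
    return torch.stack(losses).mean()
```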

Inference Optimizations:
- Soft inpainting: predict 30 actions, execute 26, use the remaining 4 as input for the next chunk (see the loop sketch after this list)
- Correlation-aware guidance of inpainting to keep action chunks smooth
- 1.3× speedup via cubic spline compression
- General correction rule: reopen gripper after failed grasps
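
A minimal sketch of the soft-inpainting loop, assuming a hypothetical `policy.predict_chunk(obs, inpaint_prefix=...)` interface and a simplified `env.step` that returns `(obs, done)`; the correlation-aware guidance that keeps consecutive chunks smooth is omitted here.

```python
CHUNK, EXECUTE = 30, 26
OVERLAP = CHUNK - EXECUTE   # 4 trailing actions reused as context

def run_episode(policy, env, max_steps=1000):
    """Soft-inpainting control loop: predict CHUNK actions, execute EXECUTE of them,
    and feed the remaining OVERLAP actions back as a prefix for the next prediction."""
    obs = env.reset()
    overlap = None            # unexecuted tail of the previous chunk
    steps = 0
    while steps < max_steps:
        chunk = policy.predict_chunk(obs, inpaint_prefix=overlap)  # (CHUNK, action_dim)
        for action in chunk[:EXECUTE]:
            obs, done = env.step(action)
            steps += 1
            if done or steps >= max_steps:
                return obs
        overlap = chunk[EXECUTE:]  # these OVERLAP actions constrain the next chunk's start
    return obs
```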

🔗 Code and Models:
- Code: https://github.com/IliaLarchenko/behavior-1k-solution
- Weights: IliaLarchenko/behavior_submission
- Paper: Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge (2512.06951)
IliaLarchenko posted an update 11 months ago
I am presenting the Decoder-Only Transformer (DOT) policy, a simple behavioral cloning policy that outperforms SOTA models on two simple benchmark tasks:

✅ PushT (pushing an object to a goal) – 84% success on keypoints, 74% on images (previous best: 75% / 69%)
✅ ALOHA Insert (precise bimanual insertion) – 30% success (previous best: ~21%)

The best part? DOT is much smaller (sometimes 100 times fewer parameters) than previous SOTA models, trains faster, and avoids complexity:
🚫 No generative models (Diffusion, VAE, GANs)
🚫 No discretization/tokenization of actions
🚫 No reinforcement learning or multi-stage training
✅ Just learns from human demos, plain and simple (a minimal architecture sketch follows below)
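
As a rough illustration of the kind of architecture this describes (not the exact DOT implementation; the learned action queries, layer sizes, and observation encoder are assumptions, see the repository for the real model), a decoder-only transformer can regress continuous action chunks directly with a plain behavior cloning loss:

```python
import torch
import torch.nn as nn

class TinyDecoderPolicy(nn.Module):
    """Illustrative decoder-style policy: regress a chunk of continuous actions from
    observation features with a transformer (no diffusion, no action tokenization)."""

    def __init__(self, obs_dim, action_dim, horizon=16, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, d_model)
        self.action_queries = nn.Parameter(torch.randn(horizon, d_model) * 0.02)
        layer = nn.TransformerDecoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, action_dim)

    def forward(self, obs_feats):
        # obs_feats: (B, n_obs_tokens, obs_dim), e.g. image patch + proprioception features
        memory = self.obs_proj(obs_feats)
        queries = self.action_queries.unsqueeze(0).expand(obs_feats.shape[0], -1, -1)
        horizon = queries.shape[1]
        causal = torch.triu(torch.ones(horizon, horizon, dtype=torch.bool,
                                       device=obs_feats.device), diagonal=1)
        out = self.decoder(queries, memory, tgt_mask=causal)
        return self.head(out)  # (B, horizon, action_dim) continuous actions

# Training is plain behavior cloning, e.g.
#   loss = torch.nn.functional.l1_loss(policy(obs_feats), expert_actions)
```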

This is still early — more complex real-life tasks need testing, and no guarantees it will actually work well there, but I think it's interesting to share. Sometimes, simpler approaches can be just as effective (or even better) than complex ones.

🔗 Open-source code and detailed description: https://github.com/IliaLarchenko/dot_policy

Trained models on Hugging Face:
IliaLarchenko/dot_pusht_keypoints
IliaLarchenko/dot_pusht_images
IliaLarchenko/dot_bimanual_insert