Fine Tuning Openvla Oft

Name: Fine Tuning Openvla Oft
Author: orchestra-research

orchestra-research/ai-research-skills

340 installs
11.2k repo stars
Updated June 16, 2026
fine-tuning-openvla-oft is a research agent skill that fine-tunes OpenVLA-OFT+ on ALOHA demonstrations and runs server-client robot inference for developers building embodied AI manipulation policies.

About

fine-tuning-openvla-oft is an orchestra-research/ai-research-skills guide for OpenVLA-OFT+ training and real-robot evaluation on the ALOHA stack. The workflow uses server-client inference: a server machine hosts the VLA model behind a /act endpoint via uvicorn and FastAPI, while a client machine controls the robot environment and requests actions. Setup creates separate conda environments with Python 3.10, installing torch, torchvision, torchaudio, and project dependencies on both sides. Developers reach for this skill when building embodied AI or manipulation policies that require fine-tuning on ALOHA demonstrations rather than generic LLM chat workflows. Outputs include configured training and inference environments, server endpoints, and client control paths for policy evaluation on physical or simulated robots.

Dual conda stacks: OpenVLA-OFT server (FastAPI /act) plus ALOHA client robot control
Preprocess and split raw ALOHA demos, then convert to unified RLDS via external builder flow
Register datasets in OXE configs, transforms, and mixtures; tune NUM_ACTIONS_CHUNK in constants
Server-client inference: VLA on server, environment and action requests on client machine

Fine Tuning Openvla Oft by the numbers

340 all-time installs (skills.sh)
+35 installs in the week ending Jul 18, 2026 (Skillselion tracking)
Ranked #544 of 2,066 Data Science & ML skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/orchestra-research/ai-research-skills --skill fine-tuning-openvla-oft

Installs	340
repo stars	★ 11.2k
Security audit	2 / 3 scanners passed
Last updated	June 16, 2026
Repository	orchestra-research/ai-research-skills ↗

How do you fine-tune OpenVLA-OFT on ALOHA robots?

Fine-tune OpenVLA-OFT+ on ALOHA demonstrations and run server-client robot inference when building embodied AI or manipulation policies.

Who is it for?

ML engineers building embodied manipulation policies who need OpenVLA-OFT+ fine-tuning and ALOHA server-client inference wiring.

Skip if: Developers working on text-only LLM apps, web frontends, or robotics projects outside the OpenVLA-OFT+ and ALOHA demonstration stack.

When should I use this skill?

The user asks to fine-tune OpenVLA-OFT+, train on ALOHA demonstrations, set up VLA server-client inference, or configure /act endpoints for robot control.

What you get

Trained OpenVLA-OFT+ weights, FastAPI /act server, ALOHA client environment, and conda Python 3.10 setups

fine-tuned OpenVLA-OFT+ weights
FastAPI inference server
ALOHA client control setup

By the numbers

Uses Python 3.10 conda environments on server and client machines
Exposes VLA inference through a FastAPI /act endpoint via uvicorn

Files

SKILL.md (Markdown)

OpenVLA-OFT

Fine-tuning and evaluation workflows for OpenVLA-OFT and OpenVLA-OFT+ from the official openvla-oft codebase. Covers blank-machine setup plus LoRA-based adaptation of OpenVLA for robot action generation with continuous action prediction heads.

Quick start

Clone the public repo, follow the official setup, then evaluate a pretrained LIBERO checkpoint:

git clone https://github.com/moojink/openvla-oft.git
cd openvla-oft
python experiments/robot/libero/run_libero_eval.py \
  --pretrained_checkpoint moojink/openvla-7b-oft-finetuned-libero-spatial \
  --task_suite_name libero_spatial \
  --center_crop True \
  --num_trials_per_task 50 \
  --seed 7

Core concepts

What OpenVLA-OFT changes: Standard OpenVLA tokenizes continuous actions into discrete bins, losing precision. OFT replaces this with dedicated continuous action heads (L1 regression or diffusion) while keeping the VLA backbone frozen and adapting via LoRA.
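
To make the precision point concrete, the sketch below quantizes a normalized action into uniform bins and measures the round-trip error. The bin count of 256, the [-1, 1] range, and the helpers `quantize` and `dequantize` are assumptions chosen for illustration; they are not taken from the openvla-oft codebase.

```python
# Illustrative only: uniform binning of a normalized action value.
# N_BINS and the [-1, 1] range are assumptions, not values read from the codebase.
N_BINS = 256
LOW, HIGH = -1.0, 1.0
BIN_WIDTH = (HIGH - LOW) / N_BINS


def quantize(action: float) -> int:
    """Map a continuous action to the index of the bin that contains it."""
    clipped = min(max(action, LOW), HIGH)
    return min(int((clipped - LOW) / BIN_WIDTH), N_BINS - 1)


def dequantize(index: int) -> float:
    """Map a bin index back to the center of that bin."""
    return LOW + (index + 0.5) * BIN_WIDTH


action = 0.1234
recovered = dequantize(quantize(action))
error = abs(recovered - action)
print(f"bin width: {BIN_WIDTH:.5f}, round-trip error: {error:.5f}")
```

Under these assumptions the round-trip error is bounded by half a bin width. A regression head emits the float directly, so this rounding step does not occur.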

OFT vs OFT+ variants:

Variant	FiLM	Images	Typical use
OFT	Off	2 (front + wrist)	LIBERO simulation
OFT+	On	3 (high + left + right wrist)	ALOHA real-world

Key architecture choices:

LoRA adaptation: Rank-32 LoRA on VLA backbone (no full fine-tuning needed)
Continuous actions: L1 regression head (default) or diffusion head
FiLM conditioning: Feature-wise Linear Modulation for stronger language grounding in OFT+
Multi-image input: Configurable 2 or 3 camera streams via num_images_in_input
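
As a rough illustration of the FiLM idea listed above, the sketch below applies a per-channel scale and shift to a feature vector. It uses plain Python lists and hard-coded placeholder values for `gamma` and `beta`; in a real model these would be predicted from the language embedding, and how OFT+ wires this into its vision backbone is not shown here.

```python
# Illustrative FiLM (feature-wise linear modulation) on plain Python lists.
# gamma and beta are placeholder values; a real model predicts them from language.
from typing import List


def film(features: List[float], gamma: List[float], beta: List[float]) -> List[float]:
    """Scale and shift each feature channel: out[c] = gamma[c] * x[c] + beta[c]."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma, and beta must have the same length")
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


visual_features = [0.5, -1.0, 2.0]
gamma = [1.0, 0.0, 2.0]   # placeholder per-channel scales
beta = [0.0, 0.3, -1.0]   # placeholder per-channel shifts
modulated = film(visual_features, gamma, beta)
print(modulated)
```

A scale of 0.0 suppresses a channel entirely, which is one way an instruction can steer which visual features reach the action head.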

Compute requirements

Task	GPU	VRAM	Notes
LIBERO evaluation	1x A100/A40	~16 GB	Single GPU
ALOHA evaluation	1x A100/A40	~18 GB	Single GPU
LIBERO fine-tuning	8x A100	~27 GB/GPU	Paper default
ALOHA fine-tuning (OFT+)	8x A100	~35 GB/GPU	FiLM + 3 images
LoRA merge	1x any GPU	~16 GB	One-time step

Expected performance benchmarks

Official results (paper setup, seed=7, 50 trials per task):

Task Suite	Task-Specific	Combined Policy	Notes
LIBERO-Spatial	97.2%	96.8%	Easiest suite
LIBERO-Object	97.4%	97.0%	Object manipulation
LIBERO-Goal	95.8%	95.4%	May peak at 50k-100k steps
LIBERO-10	98.0%	98.0%	Long-horizon tasks
Average	97.1%	96.8%	Near-equivalent

Reproduction notes: these results are tied to Python 3.10.14, PyTorch 2.2.0, an NVIDIA A100, and the custom Transformers fork (transformers-openvla-oft).

When to use vs alternatives

Use OpenVLA-OFT when:

The target task is robot action generation with visual and language conditioning
LoRA-based adaptation of openvla/openvla-7b is preferred
You need official LIBERO or ALOHA workflows from the OpenVLA-OFT paper
You want continuous action heads (L1 regression or diffusion) instead of tokenized actions

Use alternatives when:

You need a different VLA architecture (use fine-tuning-serving-openpi for pi0/pi0.5 models)
You need the NVIDIA Cosmos Policy stack (use evaluating-cosmos-policy)
You need general LLM fine-tuning without robot action heads

---

Workflow 1: Set up environment

Copy this checklist and track progress:

Setup Progress:
- [ ] Step 1: Create conda env and install PyTorch
- [ ] Step 2: Install openvla-oft package in editable mode
- [ ] Step 3: Install FlashAttention2
- [ ] Step 4: Verify critical versions

Step 1: Create conda env, clone repo, and install PyTorch

conda create -n openvla-oft python=3.10 -y
conda activate openvla-oft
git clone https://github.com/moojink/openvla-oft.git
cd openvla-oft
pip3 install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0
pip3 install robosuite==1.4.0

Step 2: Install package

pip install -e .

Step 3: Install FlashAttention2

pip install packaging ninja
pip install "flash-attn==2.5.5" --no-build-isolation

Step 4: Verify versions

import torch, transformers, peft
print(f"PyTorch: {torch.__version__}")         # Expected: 2.2.0
print(f"Transformers: {transformers.__version__}")
print(f"PEFT: {peft.__version__}")             # Expected: 0.11.1

---

Workflow 2: Evaluate pretrained checkpoints on LIBERO

LIBERO Eval Progress:
- [ ] Step 1: Install LIBERO dependencies
- [ ] Step 2: Choose checkpoint and task suite
- [ ] Step 3: Run evaluation
- [ ] Step 4: Parse and validate results

Step 1: Install LIBERO

git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
pip install -e LIBERO
pip install -r experiments/robot/libero/libero_requirements.txt

Step 2: Choose checkpoint

Checkpoint	Task suite
`moojink/openvla-7b-oft-finetuned-libero-spatial`	`libero_spatial`
`moojink/openvla-7b-oft-finetuned-libero-object`	`libero_object`
`moojink/openvla-7b-oft-finetuned-libero-goal`	`libero_goal`
`moojink/openvla-7b-oft-finetuned-libero-10`	`libero_10`
`moojink/openvla-7b-oft-finetuned-libero-spatial-object-goal-10`	Combined
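
The table above maps onto a small command builder, sketched here for the four task-specific checkpoints. It only assembles the argument lists and prints them; nothing is executed. The `CHECKPOINTS` dictionary and `build_eval_command` are hypothetical helper names, and the flags are the ones used in the evaluation command in this guide.

```python
# Build (but do not run) one evaluation command per task-specific checkpoint.
import shlex

CHECKPOINTS = {
    "libero_spatial": "moojink/openvla-7b-oft-finetuned-libero-spatial",
    "libero_object": "moojink/openvla-7b-oft-finetuned-libero-object",
    "libero_goal": "moojink/openvla-7b-oft-finetuned-libero-goal",
    "libero_10": "moojink/openvla-7b-oft-finetuned-libero-10",
}


def build_eval_command(suite: str, trials: int = 50, seed: int = 7) -> list:
    """Return the argv list for evaluating the checkpoint that matches `suite`."""
    return [
        "python", "experiments/robot/libero/run_libero_eval.py",
        "--pretrained_checkpoint", CHECKPOINTS[suite],
        "--task_suite_name", suite,
        "--center_crop", "True",
        "--num_trials_per_task", str(trials),
        "--seed", str(seed),
    ]


commands = {suite: build_eval_command(suite) for suite in CHECKPOINTS}
for suite, argv in commands.items():
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` from the repository root if you prefer to launch the suites from Python.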

Step 3: Run evaluation

python experiments/robot/libero/run_libero_eval.py \
  --pretrained_checkpoint moojink/openvla-7b-oft-finetuned-libero-spatial \
  --task_suite_name libero_spatial \
  --center_crop True \
  --num_trials_per_task 50 \
  --seed 7

Step 4: Parse results

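import re

def parse_libero_log(log_path):
    """Extract per-task success rates from a LIBERO eval log.

    Assumes log lines of the form "Task <name>: <k>/<n> successes";
    adjust the pattern if your log format differs.
    """
    with open(log_path) as f:
        content = f.read()
    rates = {}
    for task, successes, trials in re.findall(r"Task (.+?): (\d+)/(\d+) successes", content):
        if int(trials) == 0:
            continue  # Skip tasks with no completed trials
        rates[task] = int(successes) / int(trials)
        print(f"  {task}: {rates[task]:.0%} ({successes}/{trials})")
    return rates

parse_libero_log("experiments/logs/latest.log")  # Replace with your actual log file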

---

Workflow 3: Fine-tune on LIBERO

Detailed reference: See references/libero-workflow.md for the full LIBERO setup, checkpoint selection strategy, and LoRA merge instructions.

LIBERO Fine-Tune Progress:
- [ ] Step 1: Prepare RLDS dataset
- [ ] Step 2: Launch torchrun with OFT defaults
- [ ] Step 3: Evaluate intermediate and final checkpoints
- [ ] Step 4: Merge LoRA for deployment if needed

Step 1: Dataset

Use RLDS datasets: libero_spatial_no_noops, libero_object_no_noops, libero_goal_no_noops, libero_10_no_noops.

Step 2: Launch training

torchrun --standalone --nnodes 1 --nproc-per-node 8 vla-scripts/finetune.py \
  --vla_path openvla/openvla-7b \
  --data_root_dir /PATH/TO/RLDS/DATASETS/ \
  --dataset_name libero_spatial_no_noops \
  --run_root_dir /YOUR/CHECKPOINTS/ \
  --use_l1_regression True \
  --use_diffusion False \
  --use_film False \
  --num_images_in_input 2 \
  --use_proprio True \
  --batch_size 8 \
  --learning_rate 5e-4 \
  --num_steps_before_decay 100000 \
  --max_steps 150005 \
  --save_freq 10000 \
  --save_latest_checkpoint_only False \
  --image_aug True \
  --lora_rank 32 \
  --wandb_entity YOUR_WANDB_ENTITY \
  --wandb_project YOUR_WANDB_PROJECT
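
For a sense of scale, the arithmetic below estimates the global batch size implied by the launch commands in this guide. It assumes `--batch_size` is a per-GPU value, which you should confirm against the training script; `grad_accum_steps` is a hypothetical parameter included only to show where gradient accumulation would enter.

```python
# Illustrative arithmetic: global batch size for a multi-GPU torchrun launch.
# Assumes --batch_size is per GPU; grad_accum_steps is a hypothetical knob.


def effective_batch_size(per_gpu_batch: int, num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step across all GPUs."""
    return per_gpu_batch * num_gpus * grad_accum_steps


libero_global = effective_batch_size(per_gpu_batch=8, num_gpus=8)
aloha_global = effective_batch_size(per_gpu_batch=4, num_gpus=8)
print(f"LIBERO recipe: {libero_global} samples per step")
print(f"ALOHA recipe:  {aloha_global} samples per step")
```

If you train on fewer GPUs, this is the quantity to keep in mind when deciding whether to adjust the learning rate or step counts.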

Step 3: Evaluate checkpoints

Evaluate the 50k, 100k, and 150k checkpoints, since LIBERO-Goal may peak earlier than other suites. Keep the best checkpoint per suite by actual task success, not only training loss.
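
Selecting by task success can be sketched as follows. The success rates in `eval_results` are hypothetical placeholders, not measured numbers; replace them with the rates from your own evaluation runs.

```python
# Hypothetical success rates per checkpoint step; replace with your own eval results.
eval_results = {
    50_000: 0.94,
    100_000: 0.96,
    150_000: 0.95,
}


def best_checkpoint(results: dict) -> int:
    """Return the step with the highest success rate; ties go to the earlier step."""
    return min(results, key=lambda step: (-results[step], step))


best_step = best_checkpoint(eval_results)
print(f"Best checkpoint: step {best_step} ({eval_results[best_step]:.0%} success)")
```

Breaking ties toward the earlier step is a design choice of this sketch, on the reasoning that less training is preferable when success is equal.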

Step 4: Merge LoRA

python vla-scripts/merge_lora_weights_and_save.py \
  --base_checkpoint openvla/openvla-7b \
  --lora_finetuned_checkpoint_dir /PATH/TO/CHECKPOINT_DIR

---

Workflow 4: Train and evaluate OpenVLA-OFT+ on ALOHA

Detailed reference: See references/aloha-workflow.md for the full ALOHA server-client setup, data preprocessing, dataset registration, and troubleshooting.

ALOHA Progress:
- [ ] Step 1: Preprocess raw ALOHA demonstrations
- [ ] Step 2: Convert to RLDS and register dataset configs
- [ ] Step 3: Fine-tune OFT+ with FiLM and 3 images
- [ ] Step 4: Start VLA server on GPU machine
- [ ] Step 5: Run client-side robot evaluation

Step 1: Preprocess raw data

python experiments/robot/aloha/preprocess_split_aloha_data.py \
  --dataset_path /path/to/aloha_raw/task_name/ \
  --out_base_dir /path/to/aloha_preprocessed/ \
  --percent_val 0.05

Step 2: Register RLDS dataset

Add entries in:

prismatic/vla/datasets/rlds/oxe/configs.py
prismatic/vla/datasets/rlds/oxe/transforms.py
prismatic/vla/datasets/rlds/oxe/mixtures.py

Set ALOHA constants in prismatic/vla/constants.py:

# Expected defaults for ALOHA
NUM_ACTIONS_CHUNK = 25        # Match control frequency (25 Hz)
ACTION_DIM = 14               # 7 joints x 2 arms
PROPRIO_DIM = 14
ACTION_PROPRIO_NORMALIZATION_TYPE = "BOUNDS"  # Absolute joint angles

Step 3: Fine-tune OFT+

torchrun --standalone --nnodes 1 --nproc-per-node 8 vla-scripts/finetune.py \
  --vla_path openvla/openvla-7b \
  --data_root_dir /PATH/TO/RLDS/DATASETS/ \
  --dataset_name aloha_task_name \
  --run_root_dir /YOUR/CHECKPOINTS/ \
  --use_l1_regression True \
  --use_diffusion False \
  --use_film True \
  --num_images_in_input 3 \
  --use_proprio True \
  --batch_size 4 \
  --learning_rate 5e-4 \
  --num_steps_before_decay 50000 \
  --max_steps 100005 \
  --use_val_set True \
  --val_freq 10000 \
  --save_freq 10000 \
  --lora_rank 32

Step 4: Start VLA server (GPU machine)

python vla-scripts/deploy.py \
  --pretrained_checkpoint /PATH/TO/FINETUNED/CHECKPOINT/ \
  --use_l1_regression True \
  --use_film True \
  --num_images_in_input 3 \
  --use_proprio True \
  --center_crop True \
  --unnorm_key aloha_task_name

Server listens on http://<server-ip>:8777/act.

Step 5: Run client evaluation

python experiments/robot/aloha/run_aloha_eval.py \
  --center_crop True \
  --num_open_loop_steps 25 \
  --use_vla_server True \
  --vla_server_url http://<SERVER_IP>:8777 \
  --num_rollouts_planned 50 \
  --max_steps 1500
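
The flags above imply a rough per-rollout budget, worked out below. The sketch assumes a 25 Hz control frequency (as in the constants comment earlier) and that the client requests a new action chunk only after executing all open-loop steps; `rollout_budget` is a hypothetical helper.

```python
# Illustrative arithmetic for one ALOHA rollout under the flags above.
import math

CONTROL_HZ = 25            # assumed control frequency
NUM_OPEN_LOOP_STEPS = 25   # --num_open_loop_steps
MAX_STEPS = 1500           # --max_steps


def rollout_budget(max_steps: int, open_loop_steps: int, control_hz: float):
    """Return (max server queries per rollout, max rollout duration in seconds)."""
    queries = math.ceil(max_steps / open_loop_steps)
    duration_s = max_steps / control_hz
    return queries, duration_s


queries, duration_s = rollout_budget(MAX_STEPS, NUM_OPEN_LOOP_STEPS, CONTROL_HZ)
print(f"Up to {queries} server queries over {duration_s:.0f} s per rollout")
```

Lowering the open-loop step count raises the number of server round trips proportionally, which is where the latency side of the tradeoff comes from.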

---

Critical invariants

These flags must be consistent between training and inference. Some mismatches raise errors at load time; others fail silently and only show up as degraded actions:

Area	Required consistency	Failure if mismatched
Action head	`use_l1_regression` vs `use_diffusion`	Wrong head loading, invalid actions
FiLM	`use_film` across train/eval/deploy	Reduced language grounding
Image streams	`num_images_in_input` parity	Shape mismatch or performance drop
Proprio	`use_proprio` parity	State conditioning mismatch
LoRA rank	`lora_rank` parity	Adapter loading errors
Crop	`image_aug=True` in train → `center_crop=True` in eval	Significant success-rate drop
Action chunk	`num_open_loop_steps` ≈ `NUM_ACTIONS_CHUNK`	Latency/success tradeoff shifts
Unnorm key	`unnorm_key` present in checkpoint stats	Bad action scale

Quick validation:

# Verify config parity before long eval runs
train_flags = {"use_film": False, "num_images": 2, "use_proprio": True, "lora_rank": 32}
eval_flags  = {"use_film": False, "num_images": 2, "use_proprio": True, "lora_rank": 32}
for k in train_flags:
    assert train_flags[k] == eval_flags[k], f"Mismatch: {k}: {train_flags[k]} vs {eval_flags[k]}"
print("All flags consistent")

---

Common issues

Issue: Action quality drops after moving checkpoints across GPU types

Fix: re-merge LoRA adapter on the downstream device:

python vla-scripts/merge_lora_weights_and_save.py \
  --base_checkpoint openvla/openvla-7b \
  --lora_finetuned_checkpoint_dir /PATH/TO/CHECKPOINT_DIR

Issue: Wrong action scale or failed un-normalization

Fix: check --unnorm_key matches dataset statistics in checkpoint:

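import json
# Assumes the checkpoint directory contains dataset_statistics.json from fine-tuning
with open("/PATH/TO/CHECKPOINT_DIR/dataset_statistics.json") as f:
    norm_stats = json.load(f)
print("Available norm keys:", list(norm_stats.keys()))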

Issue: Eval success unexpectedly low

Fix: verify all invariants in the table above. Most common culprit: missing center_crop=True when trained with image_aug=True.

Issue: LIBERO eval crashes with `EOFError` asking for dataset path

Fix: set LIBERO_CONFIG_PATH and write a non-interactive config before headless eval.

Issue: ALOHA client ROS import fails with `libffi` symbol errors

Fix: conda install -c conda-forge libffi

Issue: `flash-attn` install fails

Fix: export TMPDIR and PIP_CACHE_DIR to the same filesystem, retry with --no-cache-dir.

Issue: EGL teardown logs show `EGL_NOT_INITIALIZED`

Fix: treat as teardown noise unless exit code is non-zero. Set EGL env vars:

export MUJOCO_GL=egl PYOPENGL_PLATFORM=egl
export CUDA_VISIBLE_DEVICES=0 MUJOCO_EGL_DEVICE_ID=0

---

For HPC/cluster users

On Slurm clusters, route caches to scratch to avoid filling /home quota:

export HF_HOME=/scratch/$USER/.cache/huggingface
export XDG_CACHE_HOME=/scratch/$USER/.cache
export PIP_CACHE_DIR=/scratch/$USER/.cache/pip
export TMPDIR=/scratch/$USER/tmp

Avoid stacking cluster Python modules when using conda. Typically module load cuda is sufficient.

---

Advanced topics

Paper summary and checkpoints: See references/paper-and-checkpoints.md
Detailed LIBERO workflow: See references/libero-workflow.md
Detailed ALOHA workflow: See references/aloha-workflow.md
Config map and troubleshooting matrix: See references/config-troubleshooting.md

Resources

Project website: https://openvla-oft.github.io/
Paper: https://arxiv.org/abs/2502.19645
Repository: https://github.com/moojink/openvla-oft
RLDS builder: https://github.com/moojink/rlds_dataset_builder

ALOHA Workflow

Scope

Use this guide for OpenVLA-OFT+ training and real-robot evaluation with the ALOHA stack.

The ALOHA path uses server-client inference:

Server machine hosts the VLA model and exposes /act.
Client machine controls robot env and requests actions from the server.

1) Prepare environments

Server-side environment:

conda create -n openvla-oft python=3.10 -y
conda activate openvla-oft
pip3 install torch torchvision torchaudio
pip install -e .
pip install uvicorn fastapi json-numpy

Client-side environment:

conda create -n openvla-oft-aloha python=3.10 -y
conda activate openvla-oft-aloha
pip3 install torch torchvision torchaudio
pip install -e .
pip install -r experiments/robot/aloha/requirements_aloha.txt

2) Preprocess and split raw demonstrations

python experiments/robot/aloha/preprocess_split_aloha_data.py \
  --dataset_path /path/to/aloha_raw/task_name/ \
  --out_base_dir /path/to/aloha_preprocessed/ \
  --percent_val 0.05

Repeat preprocessing for each object/task variant, then convert the results to a unified RLDS dataset using the RLDS builder flow.

RLDS builder reference: https://github.com/moojink/rlds_dataset_builder

3) Register dataset and constants

Add dataset entries in:

prismatic/vla/datasets/rlds/oxe/configs.py
prismatic/vla/datasets/rlds/oxe/transforms.py
prismatic/vla/datasets/rlds/oxe/mixtures.py

Set platform constants in prismatic/vla/constants.py:

Set NUM_ACTIONS_CHUNK to match control frequency (often 25 for 25 Hz).
Keep ALOHA normalization type for absolute joint-angle actions (BOUNDS).
Avoid clipping normalization for absolute-angle output.

4) Launch OFT+ training

torchrun --standalone --nnodes 1 --nproc-per-node 8 vla-scripts/finetune.py \
  --vla_path openvla/openvla-7b \
  --data_root_dir /PATH/TO/RLDS/DATASETS/ \
  --dataset_name aloha_task_name \
  --run_root_dir /YOUR/CHECKPOINTS/ \
  --use_l1_regression True \
  --use_diffusion False \
  --use_film True \
  --num_images_in_input 3 \
  --use_proprio True \
  --batch_size 4 \
  --learning_rate 5e-4 \
  --num_steps_before_decay 50000 \
  --max_steps 100005 \
  --use_val_set True \
  --val_freq 10000 \
  --save_freq 10000 \
  --save_latest_checkpoint_only False \
  --image_aug True \
  --lora_rank 32 \
  --wandb_entity YOUR_WANDB_ENTITY \
  --wandb_project YOUR_WANDB_PROJECT

High-impact knobs:

use_film=True for language grounding in OFT+.
num_images_in_input=3 for high + left wrist + right wrist streams.
LR decay timing relative to dataset size.

5) Deploy VLA server

On GPU server:

python vla-scripts/deploy.py \
  --pretrained_checkpoint /PATH/TO/FINETUNED/CHECKPOINT/ \
  --use_l1_regression True \
  --use_film True \
  --num_images_in_input 3 \
  --use_proprio True \
  --center_crop True \
  --unnorm_key aloha_task_name

Notes:

Default API endpoint: http://<server-ip>:8777/act
Ensure client can resolve vla_server_url.
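
One way to check reachability from the client before a rollout is a plain TCP probe, sketched below with the standard library. `server_reachable` is a hypothetical helper; it confirms only that something accepts connections on the host and port, not that the /act endpoint returns valid actions.

```python
# Minimal TCP reachability probe for the VLA server URL (standard library only).
import socket
from urllib.parse import urlparse


def server_reachable(vla_server_url: str, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(vla_server_url)
    host = parsed.hostname
    if host is None:
        return False  # Not a usable URL (for example, missing scheme)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False  # DNS failure, refused connection, or timeout


url = "http://127.0.0.1:8777"  # replace with your http://<SERVER_IP>:8777
print(f"{url} reachable: {server_reachable(url)}")
```

A False result points at DNS, firewall, or a server that is not running, which narrows the problem before any robot code is involved.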

6) Run client-side robot evaluation

python experiments/robot/aloha/run_aloha_eval.py \
  --center_crop True \
  --num_open_loop_steps 25 \
  --use_vla_server True \
  --vla_server_url http://<SERVER_IP>:8777 \
  --num_rollouts_planned 50 \
  --max_steps 1500

During rollout:

Script prompts operator to start.
Script asks for success label (y or n) after each rollout.
Logs and replay videos are saved locally.

7) Troubleshooting notes

ROS/libffi import issue on client:

conda install -c conda-forge libffi

Action quality issues:

Check server and training config parity (use_film, num_images_in_input, lora_rank).
Check unnorm_key against dataset stats.
Keep num_open_loop_steps aligned with trained chunk size.

Cross-device performance drop:

Merge LoRA on target hardware before final evaluation.

Configuration and Troubleshooting

Core files map

Training:

vla-scripts/finetune.py

Server deployment:

vla-scripts/deploy.py

LIBERO evaluation:

experiments/robot/libero/run_libero_eval.py

ALOHA evaluation:

experiments/robot/aloha/run_aloha_eval.py

Action/policy utilities:

experiments/robot/openvla_utils.py

Platform constants:

prismatic/vla/constants.py

High-risk configuration matrix

Area	Required consistency	Typical failure if mismatched
Action head mode	`use_l1_regression` vs `use_diffusion`	Wrong head loading, unstable or invalid action generation
FiLM usage	`use_film` in train/eval/deploy	Reduced language grounding, degraded policy quality
Image streams	`num_images_in_input` across train/eval/deploy	Shape mismatch or strong performance drop
Proprio input	`use_proprio` parity	State conditioning mismatch, action drift
LoRA rank	`lora_rank` parity	Adapter loading errors or wrong effective model
Crop behavior	`image_aug` in training implies `center_crop=True` in eval/deploy	Significant success-rate drop
Action chunk	`num_open_loop_steps` close to `NUM_ACTIONS_CHUNK`	Latency/success tradeoff shifts, lower success
Un-normalization key	`unnorm_key` present in checkpoint stats	Bad action scale or assertion failures

Constants behavior notes

prismatic/vla/constants.py auto-selects constants by command-line text (libero, aloha, bridge).

Implications:

If command path does not include expected platform tokens, constants may default to LIBERO.
For custom entrypoints or renamed scripts, verify selected platform constants in logs.

Expected defaults:

LIBERO: NUM_ACTIONS_CHUNK=8, ACTION_DIM=7, PROPRIO_DIM=8
ALOHA: NUM_ACTIONS_CHUNK=25, ACTION_DIM=14, PROPRIO_DIM=14
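
The selection behavior described above can be approximated with the sketch below. It is an illustrative reconstruction, not the contents of prismatic/vla/constants.py: `detect_platform` and `PLATFORM_CONSTANTS` are hypothetical names, and bridge is omitted because its defaults are not listed here.

```python
# Illustrative sketch of choosing platform constants from command-line text.
import sys

# Defaults copied from the list above; the lookup logic is an assumption.
PLATFORM_CONSTANTS = {
    "libero": {"NUM_ACTIONS_CHUNK": 8, "ACTION_DIM": 7, "PROPRIO_DIM": 8},
    "aloha": {"NUM_ACTIONS_CHUNK": 25, "ACTION_DIM": 14, "PROPRIO_DIM": 14},
}


def detect_platform(argv: list) -> str:
    """Pick a platform by searching the joined command line for a platform token."""
    command_text = " ".join(argv).lower()
    for platform in PLATFORM_CONSTANTS:
        if platform in command_text:
            return platform
    return "libero"  # Fallback when no token is found


platform = detect_platform(sys.argv)
print(f"Using {platform.upper()} constants: {PLATFORM_CONSTANTS[platform]}")
```

Under this sketch, a fine-tuning command is detected as ALOHA only if some argument, such as the dataset name, contains the token, which is why renamed scripts or datasets deserve a check of the launch logs.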

Sanity checks before long runs

Check package versions:

python -c "import torch, transformers, peft; print('torch', torch.__version__); print('transformers', transformers.__version__); print('peft', peft.__version__)"

Check detected constants in launch logs:

Using LIBERO constants: ... or Using ALOHA constants: ...

Dry-run one short evaluation before full benchmark:

python experiments/robot/libero/run_libero_eval.py \
  --pretrained_checkpoint moojink/openvla-7b-oft-finetuned-libero-spatial \
  --task_suite_name libero_spatial \
  --num_trials_per_task 2 \
  --seed 7

Frequent failures and precise fixes

Failure: `Action un-norm key ... not found in VLA norm_stats`

Cause: wrong unnorm_key or dataset stats not bundled with checkpoint.
Fix: use dataset-specific key and verify checkpoint directory contains normalization artifacts.

Failure: Large performance drop after moving from H100 to A100

Cause: merged adapter/model artifact mismatch across hardware/runtime stack.
Fix: re-merge LoRA on target machine, then evaluate with same runtime flags.

Failure: Poor LIBERO performance despite good training loss

Cause: eval config mismatch (center_crop, num_images_in_input, chunk settings).
Fix: align eval with paper-style inference defaults and verify constants output.

Failure: ALOHA client cannot query server

Cause: bad vla_server_url, networking, or server not running on 8777.
Fix: ensure vla-scripts/deploy.py is active, verify endpoint from client, check firewall and DNS.

Failure: ALOHA ROS import error with `libp11-kit` / `libffi`

Cause: binary dependency mismatch in client conda environment.
Fix: conda install -c conda-forge libffi

Decision hints for key training flags

Prefer use_l1_regression=True for the default paper-style OFT/OFT+ runs.
Enable use_film=True when tasks require stronger language grounding.
Keep use_diffusion=False unless intentionally exploring diffusion action heads.
Keep image_aug=True in training and center_crop=True in eval/deploy for consistency.

LIBERO Workflow

Scope

Use this guide for OpenVLA-OFT setup, evaluation, and fine-tuning on LIBERO simulation task suites.

Task suite names used by evaluator:

libero_spatial
libero_object
libero_goal
libero_10

1) Setup and dependencies

conda create -n openvla-oft python=3.10 -y
conda activate openvla-oft
pip3 install torch torchvision torchaudio
pip install -e .

git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
pip install -e LIBERO
pip install -r experiments/robot/libero/libero_requirements.txt

Optional dataset download from docs:

git clone git@hf.co:datasets/openvla/modified_libero_rlds

2) Evaluate official checkpoints

Example for LIBERO-Spatial:

python experiments/robot/libero/run_libero_eval.py \
  --pretrained_checkpoint moojink/openvla-7b-oft-finetuned-libero-spatial \
  --task_suite_name libero_spatial \
  --center_crop True \
  --num_trials_per_task 50 \
  --seed 7

Common changes:

--task_suite_name libero_object|libero_goal|libero_10
--num_trials_per_task for shorter sanity runs
--use_wandb True --wandb_project ... --wandb_entity ...

3) Fine-tune on LIBERO RLDS

Base recipe (paper-style command):

torchrun --standalone --nnodes 1 --nproc-per-node 8 vla-scripts/finetune.py \
  --vla_path openvla/openvla-7b \
  --data_root_dir /PATH/TO/RLDS/DATASETS/DIR/ \
  --dataset_name libero_spatial_no_noops \
  --run_root_dir /YOUR/CHECKPOINTS/AND/LOG/DIR/ \
  --use_l1_regression True \
  --use_diffusion False \
  --use_film False \
  --num_images_in_input 2 \
  --use_proprio True \
  --batch_size 8 \
  --learning_rate 5e-4 \
  --num_steps_before_decay 100000 \
  --max_steps 150005 \
  --save_freq 10000 \
  --save_latest_checkpoint_only False \
  --image_aug True \
  --lora_rank 32 \
  --wandb_entity YOUR_WANDB_ENTITY \
  --wandb_project YOUR_WANDB_PROJECT

Replace dataset_name with one of:

libero_spatial_no_noops
libero_object_no_noops
libero_goal_no_noops
libero_10_no_noops

4) Selection and validation strategy

Suggested checkpoint strategy:

Evaluate 50k, 100k, and 150k checkpoints.
Keep the best checkpoint per suite by actual task success, not only train loss.

Reason: docs report LIBERO-Goal may peak earlier than other suites.

Validation checks:

Confirm center_crop=True during eval if trained with image_aug=True.
Confirm num_open_loop_steps matches NUM_ACTIONS_CHUNK.
Confirm unnorm_key exists in model.norm_stats.

5) LoRA merge for deployment

Use this when serving or evaluating on different hardware:

python vla-scripts/merge_lora_weights_and_save.py \
  --base_checkpoint openvla/openvla-7b \
  --lora_finetuned_checkpoint_dir /PATH/TO/CHECKPOINT_DIR

If performance drops after migrating to a different GPU family:

Re-merge on target machine.
Re-run eval with matched runtime flags.

6) Logging locations

Default local logs: experiments/logs/
Training checkpoints: under run_root_dir
W&B (if enabled): user-defined entity/project

OpenVLA-OFT Paper and Checkpoints

Paper identity

Title: Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
Authors: Moo Jin Kim, Chelsea Finn, Percy Liang
Year: 2025
ArXiv: https://arxiv.org/abs/2502.19645
Project page: https://openvla-oft.github.io/
Summary video: https://youtu.be/T3Zkkr_NTSA

What OpenVLA-OFT changes

OpenVLA-OFT adapts OpenVLA for robot action generation with:

LoRA-based fine-tuning on VLA policies.
Continuous action prediction through dedicated action heads.
Optional FiLM conditioning for stronger language grounding (called OFT+ in ALOHA setup).
Multi-image and proprio input support via configurable model components.

Compute requirements from official docs

Inference:

LIBERO tasks: about 16 GB VRAM.
ALOHA tasks: about 18 GB VRAM.

Training:

1 to 8 GPUs, roughly 27 GB to 80 GB VRAM depending on batch size, feature toggles, and precision.

Reproduction-sensitive environment notes

For reported LIBERO numbers, docs recommend:

Python 3.10.14
PyTorch 2.2.0
OpenVLA-OFT custom Transformers fork (transformers-openvla-oft)
NVIDIA A100 when matching paper setup

If reproduction diverges, check:

Different GPU architecture
Dependency drift (torch, transformers, peft)
Inference mismatches (center_crop, action chunk settings, and un-normalization keys)

Official LIBERO checkpoints

Task-specific:

moojink/openvla-7b-oft-finetuned-libero-spatial
moojink/openvla-7b-oft-finetuned-libero-object
moojink/openvla-7b-oft-finetuned-libero-goal
moojink/openvla-7b-oft-finetuned-libero-10

Combined training across all four suites:

moojink/openvla-7b-oft-finetuned-libero-spatial-object-goal-10

Reported comparison note

The repository documentation reports comparable average success across four suites between:

task-specific policies: 97.1%
combined policy: 96.8%

Treat these as reference values tied to official setup and seeds.

Model mode selection: OFT vs OFT+

Typical defaults:

OFT (LIBERO): use_film=False, num_images_in_input=2, use_proprio=True.
OFT+ (ALOHA): use_film=True, num_images_in_input=3, use_proprio=True.

Always match training and inference flags for:

use_l1_regression / use_diffusion
use_film
num_images_in_input
use_proprio
lora_rank
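
One way to keep these flags matched is to derive both the training and deployment arguments from a single table, as sketched below. `MODE_DEFAULTS` and `mode_flags` are hypothetical names; the values are the typical defaults listed above.

```python
# Derive shared CLI flags for a model mode from one source of truth.
MODE_DEFAULTS = {
    "OFT": {"use_film": False, "num_images_in_input": 2, "use_proprio": True},
    "OFT+": {"use_film": True, "num_images_in_input": 3, "use_proprio": True},
}


def mode_flags(mode: str) -> list:
    """Return the flag/value pairs that must agree across train, eval, and deploy."""
    flags = []
    for name, value in MODE_DEFAULTS[mode].items():
        flags += [f"--{name}", str(value)]
    return flags


oft_plus_flags = mode_flags("OFT+")
print(" ".join(oft_plus_flags))
```

Appending the same list to the training, evaluation, and deployment commands removes one common source of train/inference drift.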

Citation block

@article{kim2025fine,
  title={Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success},
  author={Kim, Moo Jin and Finn, Chelsea and Liang, Percy},
  journal={arXiv preprint arXiv:2502.19645},
  year={2025}
}

FAQ

How does fine-tuning-openvla-oft run inference?

fine-tuning-openvla-oft uses server-client inference where a server hosts the VLA model on a /act FastAPI endpoint and an ALOHA client controls the robot environment and requests actions.

What environments does fine-tuning-openvla-oft require?

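fine-tuning-openvla-oft creates separate conda environments with Python 3.10 on the server and client machines. Both install torch, torchvision, torchaudio, and the openvla-oft package; the server adds uvicorn, fastapi, and json-numpy, and the client adds the ALOHA requirements from experiments/robot/aloha/requirements_aloha.txt.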

Is Fine Tuning Openvla Oft safe to install?

skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
