Tribev2 Brain Encoding

Name: Tribev2 Brain Encoding
Author: aradotso

aradotso/trending-skills

773 installs
66 repo stars
Updated July 9, 2026
aradotso/trending-skills

tribev2-brain-encoding is a Claude Code skill that runs Meta's TRIBE v2 multimodal foundation model to predict human fMRI brain responses from video, audio, and text naturalistic stimuli.

About

tribev2-brain-encoding is an aradotso trending-skills agent skill for Meta's TRIBE v2 foundation model that predicts cortical fMRI responses to naturalistic video, audio, and text inputs. The model combines LLaMA-based language understanding with multimodal encoders for in-silico neuroscience experiments without collecting new neuroimaging data. Developers reach for tribev2-brain-encoding when building brain encoding pipelines, running TRIBE v2 inference on stimulus sets, or exploring multimodal predictors of cortical activity. Trigger phrases include fMRI encoding models, brain response prediction from video, and tribev2 training or inference workflows for computational neuroscience research.

Multimodal foundation model combining LLaMA 3.2 text, V-JEPA2 video, and Wav2Vec-BERT audio encoders
Predicts cortical activity on fsaverage5 surface (~20k vertices) from naturalistic stimuli
Supports inference, visualization with PyVista & Nilearn, and full training pipelines
Pretrained weights available on Hugging Face with one-line loading
Three install variants: inference-only, plotting, and full training with PyTorch Lightning and W&B

Tribev2 Brain Encoding by the numbers

773 all-time installs (skills.sh)
+10 installs in the week ending Jul 22, 2026 (Skillselion tracking)
Ranked #1,326 of 16,659 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Jul 22, 2026 (Skillselion catalog sync)

npx skills add https://github.com/aradotso/trending-skills --skill tribev2-brain-encoding

Add your badge

Show developers this skill is listed on Skillselion. Paste this into your README.

[![Listed on Skillselion](https://skillselion.com/badge/skills/aradotso/trending-skills/tribev2-brain-encoding.svg)](https://skillselion.com/skills/aradotso/trending-skills/tribev2-brain-encoding)

Installs	773
repo stars	★ 66
Security audit	3 / 3 scanners passed
Last updated	July 9, 2026
Repository	aradotso/trending-skills ↗

How do you predict fMRI brain responses from video?

Run Meta's TRIBE v2 model that predicts human brain fMRI responses from video, audio, and text inputs.

Who is it for?

Computational neuroscientists and ML engineers integrating Meta TRIBE v2 for predicting brain responses to video, audio, and text without new scanner sessions.

Skip if: Clinical diagnostic workflows or real-time BCI applications requiring validated medical-device-grade neuroimaging pipelines.

When should I use this skill?

A developer asks to predict brain responses to video, run TRIBE v2 encoding models, or forecast fMRI activity from multimodal stimuli.

What you get

Predicted cortical fMRI response maps, encoding model outputs, and multimodal brain activity forecasts for naturalistic stimuli.

predicted fMRI response maps
encoding model outputs

Files

SKILL.mdMarkdownGitHub ↗

TRIBE v2 Brain Encoding Model

Skill by ara.so — Daily 2026 Skills collection

TRIBE v2 is Meta's multimodal foundation model that predicts fMRI brain responses to naturalistic stimuli (video, audio, text). It combines LLaMA 3.2 (text), V-JEPA2 (video), and Wav2Vec-BERT (audio) encoders into a unified Transformer architecture that maps multimodal representations onto the cortical surface (fsaverage5, ~20k vertices).

Installation

# Inference only
pip install -e .

# With brain visualization (PyVista & Nilearn)
pip install -e ".[plotting]"

# Full training dependencies (PyTorch Lightning, W&B, etc.)
pip install -e ".[training]"

Quick Start — Inference

Load pretrained model and predict from video

from tribev2 import TribeModel

# Load from HuggingFace (downloads weights to cache)
model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

# Build events dataframe from a video file
df = model.get_events_dataframe(video_path="path/to/video.mp4")

# Predict brain responses
preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices) on fsaverage5

Multimodal input — video + audio + text

from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

# All modalities together (text is auto-converted to speech and transcribed)
df = model.get_events_dataframe(
    video_path="path/to/video.mp4",
    audio_path="path/to/audio.wav",   # optional, overrides video audio
    text_path="path/to/script.txt",   # optional, auto-timed
)

preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices)

Text-only prediction

from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

df = model.get_events_dataframe(text_path="path/to/narration.txt")
preds, segments = model.predict(events=df)

Brain Visualization

from tribev2 import TribeModel
from tribev2.plotting import plot_brain_surface

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
df = model.get_events_dataframe(video_path="path/to/video.mp4")
preds, segments = model.predict(events=df)

# Plot a single timepoint on the cortical surface
plot_brain_surface(preds[0], backend="nilearn")   # or backend="pyvista"

Training a Model from Scratch

1. Set environment variables

export DATAPATH="/path/to/studies"
export SAVEPATH="/path/to/output"
export SLURM_PARTITION="your_slurm_partition"

2. Authenticate with HuggingFace (required for LLaMA 3.2)

huggingface-cli login
# Paste a HuggingFace read token when prompted
# Request access at: https://huggingface.co/meta-llama/Llama-3.2-3B

3. Local test run

python -m tribev2.grids.test_run

4. Full grid search on Slurm

# Cortical surface model
python -m tribev2.grids.run_cortical

# Subcortical regions
python -m tribev2.grids.run_subcortical

Key API — TribeModel

from tribev2 import TribeModel

# Load pretrained weights
model = TribeModel.from_pretrained(
    "facebook/tribev2",
    cache_folder="./cache"  # local cache for HuggingFace weights
)

# Build events dataframe (word-level timings, chunking, etc.)
df = model.get_events_dataframe(
    video_path=None,   # str path to .mp4
    audio_path=None,   # str path to .wav
    text_path=None,    # str path to .txt
)

# Run prediction
preds, segments = model.predict(events=df)
# preds: np.ndarray of shape (n_timesteps, n_vertices)
# segments: list of segment metadata dicts

Project Structure

tribev2/
├── main.py              # Experiment pipeline: Data, TribeExperiment
├── model.py             # FmriEncoder: Transformer multimodal→fMRI model
├── pl_module.py         # PyTorch Lightning training module
├── demo_utils.py        # TribeModel and inference helpers
├── eventstransforms.py  # Event transforms (word extraction, chunking)
├── utils.py             # Multi-study loading, splitting, subject weighting
├── utils_fmri.py        # Surface projection (MNI / fsaverage) and ROI analysis
├── grids/
│   ├── defaults.py      # Full default experiment configuration
│   └── test_run.py      # Quick local test entry point
├── plotting/            # Brain visualization backends
└── studies/             # Dataset definitions (Algonauts2025, Lahner2024, …)

Configuration — Defaults

Edit tribev2/grids/defaults.py or set environment variables:

# tribev2/grids/defaults.py (key fields)
{
    "datapath": "/path/to/studies",       # override with DATAPATH env var
    "savepath": "/path/to/output",        # override with SAVEPATH env var
    "slurm_partition": "learnfair",       # override with SLURM_PARTITION env var
    "model": "FmriEncoder",
    "modalities": ["video", "audio", "text"],
    "surface": "fsaverage5",              # ~20k vertices
}

Custom Experiment with PyTorch Lightning

from tribev2.main import Data, TribeExperiment
from tribev2.pl_module import TribePLModule
import pytorch_lightning as pl

# Configure experiment
experiment = TribeExperiment(
    datapath="/path/to/studies",
    savepath="/path/to/output",
    modalities=["video", "audio", "text"],
)

data = Data(experiment)
module = TribePLModule(experiment)

trainer = pl.Trainer(
    max_epochs=50,
    accelerator="gpu",
    devices=4,
)
trainer.fit(module, data)

Working with fMRI Surfaces

from tribev2.utils_fmri import project_to_fsaverage, get_roi_mask

# Project MNI coordinates to fsaverage5 surface
surface_data = project_to_fsaverage(mni_data, target="fsaverage5")

# Get a specific ROI mask (e.g., early visual cortex)
roi_mask = get_roi_mask(roi_name="V1", surface="fsaverage5")
v1_responses = preds[:, roi_mask]
print(v1_responses.shape)  # (n_timesteps, n_v1_vertices)

Common Patterns

Batch prediction over multiple videos

from tribev2 import TribeModel
import numpy as np

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

video_paths = ["video1.mp4", "video2.mp4", "video3.mp4"]
all_predictions = []

for vp in video_paths:
    df = model.get_events_dataframe(video_path=vp)
    preds, segments = model.predict(events=df)
    all_predictions.append(preds)

# all_predictions: list of (n_timesteps_i, n_vertices) arrays

Extract predictions for specific brain region

from tribev2 import TribeModel
from tribev2.utils_fmri import get_roi_mask

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
df = model.get_events_dataframe(video_path="video.mp4")
preds, segments = model.predict(events=df)

# Focus on auditory cortex
ac_mask = get_roi_mask("auditory_cortex", surface="fsaverage5")
auditory_responses = preds[:, ac_mask]  # (n_timesteps, n_ac_vertices)

Access segment timing metadata

preds, segments = model.predict(events=df)

for i, seg in enumerate(segments):
    print(f"Segment {i}: onset={seg['onset']:.2f}s, duration={seg['duration']:.2f}s")
    print(f"  Brain response shape: {preds[i].shape}")

Troubleshooting

LLaMA 3.2 access denied

# Must request access at https://huggingface.co/meta-llama/Llama-3.2-3B
# Then authenticate:
huggingface-cli login
# Use a HuggingFace token with read permissions

CUDA out of memory during inference

# Use CPU for inference on smaller machines
import torch
model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
model.to("cpu")

Missing visualization dependencies

pip install -e ".[plotting]"
# Installs pyvista and nilearn backends

Slurm training not submitting

# Check env vars are set
echo $DATAPATH $SAVEPATH $SLURM_PARTITION
# Or edit tribev2/grids/defaults.py directly

Video without audio track causes error

# Provide audio separately or use text-only mode
df = model.get_events_dataframe(
    video_path="silent_video.mp4",
    audio_path="separate_audio.wav",
)

Citation

@article{dAscoli2026TribeV2,
  title={A foundation model of vision, audition, and language for in-silico neuroscience},
  author={d'Ascoli, St{\'e}phane and Rapin, J{\'e}r{\'e}my and Benchetrit, Yohann and Brookes, Teon
          and Begany, Katelyn and Raugel, Jos{\'e}phine and Banville, Hubert and King, Jean-R{\'e}mi},
  year={2026}
}

Resources

Related skills

Setup Matt Pocock SkillsScaffold the per-repo configuration that Matt Pocock’s engineering agent skills rely on so they understand the issue tracker, triage labels, and domain documentation la462k185k

Lark Skill MakerQuickly turn any Lark/Feishu OpenAPI call or multi-step workflow into a reusable agent skill with its own SKILL.md.379k15.8k

CavemanSlash token usage by roughly 75% while keeping every technical detail intact when working with Claude Code, Cursor or similar agents.378k92.5k

Lark AppsConnect Claude, Cursor or custom agents directly to Lark (Feishu) for messaging, document automation, approval workflows and enterprise data access.375k

Running Claude Code Via Litellm CopilotRun Claude Code at a fraction of the cost by routing requests through LiteLLM to the GitHub Copilot Chat API.270k72

Codex PetGenerate a complete Codex Pet spritesheet and metadata from one reference image without needing an OpenAI key or Codex Pro.246k8

FAQ

What inputs does tribev2-brain-encoding accept?

tribev2-brain-encoding accepts multimodal naturalistic stimuli including video, audio, and text, feeding them into Meta's TRIBE v2 foundation model to predict corresponding fMRI cortical responses.

What is TRIBE v2 used for in tribev2-brain-encoding?

TRIBE v2 in tribev2-brain-encoding is Meta's multimodal foundation model combining LLaMA-based encoders to predict human brain fMRI responses, enabling in-silico neuroscience without new scanner data collection.

Is Tribev2 Brain Encoding safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

AI & Agent Buildingresearchllmautomation