Gemma Dev

Name: Gemma Dev
Author: google-gemma

google-gemma/gemma-skills

Spin up a local streaming Gradio chat UI on Google Gemma instruction-tuned models for quick model smoke-tests and demos.

Overview

gemma-dev is an agent skill for the Build phase that provides a Gradio plus transformers streaming chat template for Google Gemma models.

Install

npx skills add https://github.com/google-gemma/gemma-skills --skill gemma-dev

What is this skill?

Hugging Face text-generation pipeline with device_map auto and dtype auto
TextIteratorStreamer plus background Thread for token-by-token Gradio yields
Maps Gradio chat history into role/content messages for chat templates
GenerationConfig max_new_tokens=256 with swappable model_id (default google/gemma-4-E2B-it)
gr.ChatInterface scaffold titled Gemma Chatbot for one-file local demos
Default model_id google/gemma-4-E2B-it
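
The history-mapping step listed above can be sketched on its own, without loading a model. This is a minimal illustration, assuming Gradio passes history as a list of role/content dicts whose content is either a string or a list of typed parts, as the skill's script expects. The helper name `to_chat_messages` and the sample history are hypothetical and not part of the skill.

```python
from typing import Any


def to_chat_messages(history: list[dict[str, Any]], message: str) -> list[dict[str, str]]:
    """Flatten Gradio-style history into plain role/content messages."""
    messages = []
    for msg in history:
        content = msg["content"]
        if isinstance(content, list):
            # Keep only text parts; other part types are skipped.
            content = "".join(
                part["text"] for part in content if part.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    # The newest user turn always goes last.
    messages.append({"role": "user", "content": message})
    return messages


# Hypothetical sample history mixing both content shapes.
example_history = [
    {"role": "user", "content": [{"type": "text", "text": "Hi"}]},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
print(to_chat_messages(example_history, "What is Gemma?"))
```

Keeping the mapping in a pure function makes it easy to check the message shape before handing it to a chat template.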

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1 install on skills.sh; 529 GitHub stars.

What problem does it solve?

You want to talk to a Gemma model locally but do not have a small, copy-paste chat app that streams tokens correctly.

Who is it for?

Solo builders validating Gemma behavior on their machine with a few dozen lines of Python.

Skip if: your team needs production inference, autoscaling, billing, or hardened security around model access.

When should I use this skill?

When you need a minimal streaming Gradio chat wired to a Google Gemma Hugging Face checkpoint for local experimentation.

What do I get? / Deliverables

You get a runnable Gradio ChatInterface wired to Gemma with streamed replies so you can iterate on prompts and model IDs before harder integration work.

Runnable Gradio ChatInterface script targeting a Gemma model_id
Streaming chat loop with history-aware message formatting
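
The streaming loop can be illustrated without downloading any weights. The sketch below is not the skill's code: it swaps `TextIteratorStreamer` for a plain `queue.Queue` and uses a hypothetical `fake_generate` function in place of the model, so only the pattern carries over (a worker thread produces chunks while the generator yields the accumulated text).

```python
import queue
import threading
from typing import Iterator

_DONE = object()  # sentinel marking the end of generation


def fake_generate(prompt: str, out: queue.Queue) -> None:
    """Hypothetical stand-in for the model: pushes one chunk at a time."""
    for chunk in ["Echo", ": ", prompt]:
        out.put(chunk)
    out.put(_DONE)


def stream_reply(prompt: str) -> Iterator[str]:
    """Yield the reply so far after each new chunk arrives."""
    chunks: queue.Queue = queue.Queue()
    worker = threading.Thread(target=fake_generate, args=(prompt, chunks))
    worker.start()

    text = ""
    while True:
        chunk = chunks.get()  # blocks until the worker produces something
        if chunk is _DONE:
            break
        text += chunk
        yield text  # cumulative text, not just the latest chunk
    worker.join()


partials = list(stream_reply("hello"))
print(partials)
```

Each yield carries the whole reply so far, mirroring the script's `generated_text += new_text` followed by `yield generated_text`.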

Journey fit

Primary fit

Build: Agent skills & templates

Canonical shelf is Build because the artifact is runnable inference UI code, not distribution or ops. Agent-tooling fits Hugging Face pipelines, token streaming, and chat-shaped agent prototypes.

Also useful

Validate: Prototype & spike

How it compares

Use as a starter script rather than a managed inference API or full agent framework.

Common Questions / FAQ

Who is gemma-dev for?

Indie developers and agent builders who already use Python and want the fastest path to a streaming Gemma chat demo on Hugging Face stacks.

When should I use gemma-dev?

During Validate when you prototype model UX, and during Build when you wire agent-tooling or compare Gemma checkpoints before production APIs.
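
When comparing checkpoints, one option is to pick the model ID from the environment rather than editing the script each time. This is a sketch under stated assumptions: the `GEMMA_MODEL_ID` variable and the `resolve_model_id` helper are hypothetical and not part of the skill; the two model IDs are the ones that appear in the skill's own snippets.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL_ID = "google/gemma-4-E2B-it"  # default used by the skill's script


def resolve_model_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the checkpoint to load, preferring the GEMMA_MODEL_ID variable."""
    env = os.environ if env is None else env
    value = env.get("GEMMA_MODEL_ID", "").strip()
    # Fall back to the default when the variable is unset or blank.
    return value or DEFAULT_MODEL_ID


print(resolve_model_id({}))
print(resolve_model_id({"GEMMA_MODEL_ID": "google/gemma-4-31B-it"}))
```

The returned string could then replace the hard-coded `model_id` passed to `pipeline(...)`.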

Is gemma-dev safe to install?

Review the Security Audits panel on this Prism page and treat downloaded weights and local GPU access as sensitive before running in shared environments.

SKILL.md

SKILL.md - Gemma Dev

import gradio as gr
from transformers import pipeline, TextIteratorStreamer, GenerationConfig
from threading import Thread

# Load the pipeline
# Replace "google/gemma-4-E2B-it" with other available models
model_id = "google/gemma-4-E2B-it"

pipe = pipeline(
    "text-generation",
    model=model_id,
    device_map="auto",
    dtype="auto",
)

def chat(message, history):
    messages = []

    # Add conversation history
    for msg in history:
        role = msg["role"]

        # Extract text from the content list (e.g. [{'text': 'hello', 'type': 'text'}])
        if isinstance(msg["content"], list):
            content_text = "".join([item["text"] for item in msg["content"] if item["type"] == "text"])
        else:
            content_text = msg["content"]

        messages.append({"role": role, "content": content_text})

    # Add current user message
    messages.append({"role": "user", "content": message})

    streamer = TextIteratorStreamer(pipe.tokenizer, skip_prompt=True, skip_special_tokens=True)
    config = GenerationConfig(max_new_tokens=256)
    thread = Thread(target=pipe, args=(messages,), kwargs=dict(
        generation_config=config,
        streamer=streamer
    ))
    thread.start()

    # Generate response
    generated_text = ""
    for new_text in streamer:
        generated_text += new_text
        yield generated_text

# Create the ChatInterface
demo = gr.ChatInterface(
    fn=chat,
    title="Gemma Chatbot",
    description="Ask Gemma anything!",
)

if __name__ == "__main__":
    demo.launch()

import { pipeline, TextStreamer } from '@huggingface/transformers';
import cliProgress from 'cli-progress';
import inquirer from 'inquirer';

let generator;

async function initializeGemma() {
    console.log('Initializing Gemma model...');
    const progressBar = new cliProgress.SingleBar({}, cliProgress.Presets.shades_classic);
    progressBar.start(100, 0);

    generator = await pipeline('text-generation', 'onnx-community/gemma-4-E2B-it-ONNX', {
        device: 'webgpu',
        dtype: 'q4',
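        progress_callback: (progress) => {
            // Some callback events carry no numeric progress value; skip those
            if (typeof progress.progress === 'number') {
                progressBar.update(progress.progress);
            }
        },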
    });

    progressBar.stop();
    console.log('Gemma model initialized!');
}

// Streams the reply to stdout as it is generated; resolves when generation ends.
async function generate(question) {
    const messages = [
        { role: 'user', content: question },
    ];

    const prompt = generator.tokenizer.apply_chat_template(messages, {
        tokenize: false,
        add_generation_prompt: true,
    });

    const streamer = new TextStreamer(generator.tokenizer, {
        skip_prompt: true, // Don't stream the user's prompt back
        skip_special_tokens: true,
        // Write each decoded chunk straight to the terminal
        callback_function: (text) => process.stdout.write(text),
    });

    await generator(prompt, {
        max_new_tokens: 256,
        streamer,
    });
}

async function main() {
    console.clear();
    await initializeGemma();

    while (true) {
        const { question } = await inquirer.prompt({
            type: 'input',
            name: 'question',
            message: "Ask Gemma anything:",
        });

        if (question.trim().toLowerCase() === 'exit') {
            console.log('See you!');
            break;
        }

        console.log('\nGemma: ');

        // generate() yields nothing to iterate over; the streamer callback prints the tokens
        await generate(question);

        console.log('\n');
    }
}

main().catch(err => {
    console.error('An error occurred:', err);
});

import os
from transformers import AutoTokenizer
from google.cloud import aiplatform

PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")
LOCATION = os.environ.get("GOOGLE_CLOUD_LOCATION")
ENDPOINT_ID = os.environ.get("GOOGLE_CLOUD_ENDPOINT_ID")

MODEL_ID = "google/gemma-4-31B-it"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def predict_gemma(project: str, endpoint_id: str, prompt: str, location: str = "us-central1"):
    # Initialize the Vertex AI client
    aiplatform.init(project=project, location=location)
    
    # Reference the deployed endpoint
    endpoint = aiplatform.Endpoint(endpoint_id)
    
    # F
