
Transformers
Generate and decode text with Hugging Face causal LMs using greedy, sampling, and beam-search strategies under full `generate()` control.
Install
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill transformers
What is this skill?
- End-to-end `AutoModelForCausalLM` + `AutoTokenizer` generation and decode examples
- Greedy decoding (`do_sample=False`) for deterministic factual or translation-style output
- Sampling with temperature, top-k, and top-p for creative or diverse text
- Beam search for parallel hypothesis exploration
- Guidance to use Pipeline API for quick prototyping vs direct `model.generate()` for control
Adoption & trust: 560 installs on skills.sh; 27.6k GitHub stars; 2/3 security scanners passed (skills.sh audits).
Journey fit
Primary fit
The canonical shelf is Build because the skill documents model loading, tokenization, and inference code that you add to a product backend or ML service. The Backend category fits its causal LM `generate()` workflows, its decoding parameters, and its guidance on when to bypass Pipelines for custom preprocessing.
Common Questions / FAQ
Is Transformers safe to install?
skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md - Transformers
# Text Generation

## Overview

Generate text with language models using the `generate()` method. Control output quality and style through generation strategies and parameters. For quick prototyping, the [Pipeline API](pipelines.md) wraps tokenization and `generate()`; use `model.generate()` directly when you need custom preprocessing or decoding control.

## Basic Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Tokenize input
inputs = tokenizer("Once upon a time", return_tensors="pt")

# Generate
outputs = model.generate(**inputs, max_new_tokens=50)

# Decode
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

## Generation Strategies

### Greedy Decoding

Select highest probability token at each step (deterministic):

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=False  # Greedy decoding (default)
)
```

**Use for**: Factual text, translations, where determinism is needed.

### Sampling

Randomly sample from probability distribution:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)
```

**Use for**: Creative writing, diverse outputs, open-ended generation.

### Beam Search

Explore multiple hypotheses in parallel:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    num_beams=5,
    early_stopping=True
)
```

**Use for**: Translations, summarization, where quality is critical.

### Contrastive Search

Balance quality and diversity:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    penalty_alpha=0.6,
    top_k=4
)
```

**Use for**: Long-form generation, reducing repetition.
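
To make the difference between greedy decoding and beam search concrete, the following is a minimal pure-Python sketch over a hand-written next-token table. The table `NEXT_TOKEN_PROBS` and the helpers `greedy_decode` and `beam_decode` are hypothetical names invented for illustration; they are not part of the Transformers API, and the library's beam search also handles end-of-sequence tokens, length penalties, and batching, which this sketch omits.

```python
import math

# Hypothetical toy "model": next-token probabilities keyed by the previous token.
NEXT_TOKEN_PROBS = {
    "<s>": {"A": 0.5, "B": 0.4, "C": 0.1},
    "A": {"x": 0.4, "y": 0.3, "z": 0.3},
    "B": {"x": 0.9, "y": 0.05, "z": 0.05},
    "C": {"x": 0.4, "y": 0.3, "z": 0.3},
}


def greedy_decode(steps):
    """Pick the single most likely token at every step."""
    seq, logp = ["<s>"], 0.0
    for _ in range(steps):
        probs = NEXT_TOKEN_PROBS[seq[-1]]
        token = max(probs, key=probs.get)
        logp += math.log(probs[token])
        seq.append(token)
    return seq[1:], math.exp(logp)


def beam_decode(steps, num_beams):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams = [(0.0, ["<s>"])]
    for _ in range(steps):
        candidates = []
        for logp, seq in beams:
            for token, p in NEXT_TOKEN_PROBS[seq[-1]].items():
                candidates.append((logp + math.log(p), seq + [token]))
        # Sort by cumulative log-probability and prune to the beam width.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:num_beams]
    best_logp, best_seq = beams[0]
    return best_seq[1:], math.exp(best_logp)


greedy_seq, greedy_prob = greedy_decode(steps=2)
beam_seq, beam_prob = beam_decode(steps=2, num_beams=2)
print("greedy:", greedy_seq, round(greedy_prob, 2))
print("beam:  ", beam_seq, round(beam_prob, 2))
```

In this toy table, greedy decoding commits to the locally best first token and ends with a lower overall sequence probability than beam search, which keeps a second hypothesis alive long enough to find a better continuation.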
## Key Parameters

### Length Control

**max_new_tokens**: Maximum tokens to generate

```python
max_new_tokens=100  # Generate up to 100 new tokens
```

**max_length**: Maximum total length (input + output)

```python
max_length=512  # Total sequence length
```

**min_new_tokens**: Minimum tokens to generate

```python
min_new_tokens=50  # Force at least 50 tokens
```

**min_length**: Minimum total length (input + output)

```python
min_length=100
```

### Temperature

Controls randomness (only takes effect with sampling, i.e. `do_sample=True`):

```python
temperature=1.0  # Default, balanced
temperature=0.7  # More focused, less random
temperature=1.5  # More creative, more random
```

- Lower temperature → more deterministic
- Higher temperature → more random

### Top-K Sampling

Consider only the K most likely tokens:

```python
do_sample=True
top_k=50  # Sample from top 50 tokens
```

**Common values**: 40-100 for balanced output, 10-20 for focused output.

### Top-P (Nucleus) Sampling

Consider only the smallest set of most likely tokens whose cumulative probability reaches P:

```python
do_sample=True
top_p=0.95  # Sample from smallest set with 95% cumulative probability
```

**Common values**: 0.9-0.95 for balanced, 0.7-0.85 for focused.

### Repetition Penalty

Discourage repetition:

```python
repetition_penalty=1.2  # Penalize repeated tokens
```

**Values**: 1.0 = no penalty, 1.2-1.5 = moderate, 2.0+ = strong penalty.
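
The effect of temperature, top-k, and top-p on a next-token distribution can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper names (`softmax`, `top_k_filter`, `top_p_filter`, `renormalize`) and the example logits are made up, and the sketch filters probabilities for readability, whereas the library applies its filters to logits before the final softmax.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # Subtract the max for numerical stability.
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs, keep):
    """Zero out every index not in `keep` and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)


logits = [2.0, 1.0, 0.5, -1.0]  # Hypothetical logits for a 4-token vocabulary.
base = softmax(logits)
sharp = softmax(logits, temperature=0.7)  # Lower temperature: more peaked.
flat = softmax(logits, temperature=1.5)   # Higher temperature: flatter.
top2 = top_k_filter(base, k=2)
nucleus = top_p_filter(base, p=0.9)

for name, dist in [("base", base), ("T=0.7", sharp), ("T=1.5", flat),
                   ("top_k=2", top2), ("top_p=0.9", nucleus)]:
    print(name, [round(x, 3) for x in dist])
```

With these example logits, lowering the temperature shifts probability mass toward the most likely token, while top-k and top-p both remove the low-probability tail before sampling; top-p adapts the number of kept tokens to the shape of the distribution, whereas top-k always keeps a fixed count.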
### Beam Search Parameters

**num_beams**: Number of beams

```python
num_beams=5  # Keep 5 hypotheses
```

**early_stopping**: Stop when num_beams sentences are finished

```python
early_stopping=True
```

**no_repeat_ngram_size**: Prevent n-gram repetition

```python
no_repeat_ngram_size=3  # Don't repeat any 3-gram
```

### Output Control

**num_return_sequences**: Generate multiple outputs

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    num_beams=5,
    num_return_sequences=3  # Return 3 different sequences
)
```

**pad_token_id**: Specify padding token

```python
pad_token_id=tokenizer.eos_token_id
```

**eos_token_id**: Stop generation at specific token

```python
eos_token_id=tokenizer.eos_token_id
```

## Advanced Features

### Batch Generatio