Protein Sequence MSA

Name: Protein Sequence MSA
Author: google-deepmind

google-deepmind/science-skills

Submit a multi-sequence FASTA file to EBI Clustal Omega via the bundled script and retrieve the resulting multiple sequence alignment from within your agent.

Overview

Protein Sequence MSA is an agent skill for the Build phase that runs EBI Clustal Omega to produce a multiple sequence alignment from submitted protein sequences.

Install

npx skills add https://github.com/google-deepmind/science-skills --skill protein-sequence-msa

What is this skill?

Submits sequences to the EBI Clustal Omega REST API as a URL-encoded payload containing email, title, and sequence fields
Shared `HttpClient` with QPS=1 against the Clustal Omega services base URL
15-minute polling timeout (`_POLLING_TIMEOUT_SECS`) for long-running alignment jobs
Python ≥3.10 script with `scienceskillscommon` and `python-dotenv` dependencies
Command-line entry via argparse for agent-driven MSA runs
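
To make the QPS=1 setting above concrete, here is a minimal sketch of a client-side limiter that spaces calls at least one second apart. The class name `MinIntervalLimiter` and its injectable `clock` and `sleep` parameters are hypothetical and exist only for illustration; the real `HttpClient` in `scienceskillscommon` may enforce its limit differently.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
  """Hypothetical limiter that allows at most `qps` calls per second."""

  def __init__(
      self,
      qps: float,
      clock: Callable[[], float] = time.monotonic,
      sleep: Callable[[float], None] = time.sleep,
  ) -> None:
    self._min_interval = 1.0 / qps
    self._clock = clock
    self._sleep = sleep
    self._last_call: Optional[float] = None

  def wait(self) -> float:
    """Blocks until the next call is allowed and returns the seconds slept."""
    now = self._clock()
    delay = 0.0
    if self._last_call is not None:
      # Time still owed before the minimum interval has elapsed.
      delay = max(0.0, self._min_interval - (now - self._last_call))
    if delay > 0:
      self._sleep(delay)
    self._last_call = now + delay
    return delay
```

Calling `wait()` before each request keeps a tight polling loop from exceeding the configured rate, whatever the loop's own sleep interval is.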

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 546 installs on skills.sh; 1.7k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You have several protein sequences and need a reliable MSA from Clustal Omega, but manual EBI REST submission and job polling are error-prone in agent workflows.
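
One way to see where manual polling goes wrong is to write the loop out. The sketch below is an illustration under stated assumptions, not the skill's own code: `poll_until_finished` is a hypothetical helper, `fetch_status` stands in for whatever call returns the job status text, and `"TIMEOUT"` is a label invented here. The terminal status names and the 15-minute and 10-second defaults are taken from the bundled script; any other status is treated as still in progress.

```python
import time
from typing import Callable

# Failure statuses checked by the bundled script.
_FAILED = {"ERROR", "FAILURE", "NOT_FOUND"}


def poll_until_finished(
    fetch_status: Callable[[], str],
    *,
    timeout_secs: float = 15 * 60,
    interval_secs: float = 10,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
  """Returns "FINISHED", a failure status, or the hypothetical "TIMEOUT"."""
  deadline = clock() + timeout_secs
  while clock() < deadline:
    status = fetch_status().strip()
    if status == "FINISHED" or status in _FAILED:
      return status
    # Any other status is assumed to mean the job is still in progress.
    sleep(interval_secs)
  return "TIMEOUT"
```

Returning a status instead of exiting lets the caller decide how to report a failure or timeout in its own logs.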

Who is it for?

Indie bioinformatics or health-AI builders wiring agent-driven pipelines that need standard Clustal Omega alignments via EBI.

Skip if: you run large in-house alignment clusters, or your workflow requires local HMMER/MMseqs2 rather than public EBI job queues.

When should I use this skill?

You need a multiple sequence alignment from Clustal Omega for a file of protein sequences inside an automated or agent-driven workflow.
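
Before handing a file to the skill, it can help to check that it looks like a usable multi-sequence FASTA input. The sketch below is a hypothetical pre-check, not part of the skill: `count_fasta_records` and `validate_fasta` are names invented here. The limits of 2 to 4000 sequences mirror the bundled script, but the counting differs on purpose: the script counts every `>` character, while this sketch counts only lines that start with `>`.

```python
def count_fasta_records(text: str) -> int:
  """Counts lines that begin with '>', i.e. FASTA header lines."""
  return sum(1 for line in text.splitlines() if line.startswith(">"))


def validate_fasta(
    text: str, *, min_seqs: int = 2, max_seqs: int = 4000
) -> list[str]:
  """Returns a list of problems; an empty list means the input looks usable."""
  stripped = text.strip()
  if not stripped:
    return ["empty input"]
  problems = []
  if not stripped.startswith(">"):
    problems.append("first non-blank line is not a '>' header")
  n = count_fasta_records(stripped)
  if n < min_seqs:
    problems.append(f"at least {min_seqs} sequences required, found {n}")
  if n > max_seqs:
    problems.append(f"at most {max_seqs} sequences supported, found {n}")
  return problems
```

Counting header lines avoids over-counting when a description itself contains a `>` character, for example `>b desc -> note`.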

What do I get? / Deliverables

The agent executes the bundled alignment script, polls the EBI service within a 15-minute window, and returns the completed MSA output for downstream modeling or analysis.

Clustal Omega alignment result from completed EBI job
Job submission and poll trace suitable for pipeline logs
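
For downstream use, the returned alignment usually needs to be read back into memory. The following sketch assumes the output is aligned FASTA, as the script's docstring states; `parse_aligned_fasta` and `alignment_width` are hypothetical helpers written for illustration, and a repeated sequence identifier would overwrite the earlier record.

```python
def parse_aligned_fasta(text: str) -> dict[str, str]:
  """Maps each sequence identifier to its aligned sequence, gaps included."""
  records: dict[str, list[str]] = {}
  current = None
  for line in text.splitlines():
    line = line.strip()
    if not line:
      continue
    if line.startswith(">"):
      # Use the first whitespace-delimited token of the header as the id.
      fields = line[1:].split()
      current = fields[0] if fields else ""
      records[current] = []
    elif current is not None:
      records[current].append(line)
  return {name: "".join(parts) for name, parts in records.items()}


def alignment_width(msa: dict[str, str]) -> int:
  """Returns the shared row length, or raises if rows differ in length."""
  widths = {len(seq) for seq in msa.values()}
  if len(widths) != 1:
    raise ValueError(f"expected one alignment width, found {sorted(widths)}")
  return widths.pop()
```

Checking that every row has the same length is a cheap sanity test that the file really is an alignment and not the unaligned input.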

Journey fit

Primary fit

Build: Integrations & version control

Placed in Build because the skill implements an external bioinformatics API integration rather than early-stage ideation. The Integrations subphase reflects REST job submission, polling, and HTTP client usage against EBI Clustal Omega.

Also useful

Idea: Opportunity & market research

How it compares

Skill-packaged EBI Clustal Omega client, not a local MSA binary or generic sequence-editing template.

Common Questions / FAQ

Who is protein-sequence-msa for?

Solo developers and small teams building science agents who need programmatic Clustal Omega alignments through Google DeepMind science-skills patterns.

When should I use protein-sequence-msa?

During Build when integrating sequence alignment into a pipeline, preparing features for structure prediction, or automating homology steps from a multi-sequence input file.

Is protein-sequence-msa safe to install?

It runs network jobs against EBI and may require email parameters; review Security Audits on this page and treat sequence data as sensitive before sending to external services.
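
To review what would leave your machine before any job is submitted, the encoded request body can be decoded back into its fields. This is a minimal sketch using only the standard library: `build_payload` mirrors the script's `_prepare_payload`, while `describe_payload` is a hypothetical helper added here for inspection. The email address and sequences are placeholders.

```python
import urllib.parse


def build_payload(email: str, title: str, sequences: str) -> bytes:
  """URL-encodes the three form fields, as the bundled script does."""
  params = {"email": email, "title": title, "sequence": sequences}
  return urllib.parse.urlencode(params).encode("utf-8")


def describe_payload(payload: bytes) -> dict[str, str]:
  """Decodes a payload back into its fields for review (hypothetical helper)."""
  fields = urllib.parse.parse_qs(
      payload.decode("utf-8"), keep_blank_values=True
  )
  return {name: values[0] for name, values in fields.items()}


if __name__ == "__main__":
  payload = build_payload("user@example.org", "MSA", ">a\nMKT\n>b\nMKV")
  for name, value in describe_payload(payload).items():
    print(f"{name}: {value!r}")
```

The script's own `dry_run` option prints the raw encoded bytes; decoding them as above makes the email address and sequence text easier to check by eye.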

SKILL.md


# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "scienceskillscommon",
#   "python-dotenv",
# ]
# [tool.uv.sources]
# scienceskillscommon = { path = "../../scienceskillscommon" }
# ///

"""Runs EBI Clustal Omega for MSA computation.

Takes a file with multiple sequences and provides the alignment.
"""

import argparse
import os
import sys
import time
import urllib.parse

import dotenv
from science_skills.scienceskillscommon import http_client

_POLLING_TIMEOUT_SECS = 15 * 60  # 15 minutes.

_CLIENT = http_client.HttpClient(
    "https://www.ebi.ac.uk/Tools/services/rest/clustalo/", qps=1
)


def _prepare_payload(email: str, title: str, sequences: str) -> bytes:
  """Prepares the payload for the EBI Clustal Omega API."""
  params = {
      "email": email,
      "title": title,
      "sequence": sequences,
  }
  return urllib.parse.urlencode(params).encode("utf-8")


def _align_sequences(
    *, input_file: str, output_file: str, dry_run: bool = False
) -> None:
  """Runs EBI Clustal Omega alignment for sequences in a FASTA file.

  This function takes a FASTA formatted file, submits the sequences to the
  EBI Clustal Omega web service, polls for the alignment completion, and
  saves the resulting alignment in FASTA format to the specified output file.

  Args:
    input_file: Path to the input file containing sequences in FASTA format.
    output_file: Path where the resulting MSA in FASTA format will be saved.
    dry_run: If True, print the payload and exit without submitting the job.
  """
  if not os.path.exists(input_file):
    print(f"[!] Error: Input file not found: {input_file}")
    sys.exit(1)

  max_size_bytes = 4 * 1024 * 1024  # 4 MB
  file_size = os.path.getsize(input_file)
  if file_size > max_size_bytes:
    print(
        "[!] Error: At most 4 MB file size supported. Found"
        f" {file_size / (1024 * 1024):.2f} MB."
    )
    sys.exit(1)

  with open(input_file, "r") as f:
    sequences = f.read().strip()

  if not sequences:
    print("[!] Error: Empty input file.")
    sys.exit(1)

  num_sequences = sequences.count(">")
  if num_sequences < 2:
    print(f"[!] Error: At least 2 sequences required. Found {num_sequences}.")
    sys.exit(1)
  if num_sequences > 4000:
    print(
        f"[!] Error: At most 4000 sequences supported. Found {num_sequences}."
    )
    sys.exit(1)

  print("[*] Submitting sequences to EBI Clustal Omega API...")

  # 1. Submit Job
  user_email = os.environ.get("USER_EMAIL")
  if not user_email:
    print("[!] Error: USER_EMAIL environment variable is required.")
    sys.exit(1)
  data = _prepare_payload(user_email, "MSA", sequences)

  if dry_run:
    print(data)
    sys.exit(0)

  job_id = _CLIENT.fetch_text(
      "run", method="POST", data=data, headers={"Accept": "text/plain"}
  ).strip()
  print(f"[*] Job ID generated: {job_id}")

  # 2. Poll the server
  print("[*] Polling server for completion...")

  start_time = time.time()
  while time.time() - start_time < _POLLING_TIMEOUT_SECS:
    status = _CLIENT.fetch_text(
        f"status/{job_id}", headers={"Accept": "text/plain"}, timeout=20
    ).strip()
    sys.stdout.write(".")
    sys.stdout.flush()

    if status == "FINISHED":
      print("\n[*] Job marked as FINISHED.")
      break
    elif status in ["ERROR", "FAILURE", "NOT_FOUND"]:
      print(f"\n[!] Job failed with status: {status}")
      sys.exit(1)
    time.sleep(10)

  else:
    pr
