What is BioContext7

An overview of BioContext7 — the bioinformatics registry aggregator and pipeline generator

Overview

BioContext7 is a bioinformatics registry aggregator and self-healing pipeline generator designed for integration with Claude Code via the Model Context Protocol (MCP).

It addresses two linked problems in bioinformatics: discovering the right tools for a workflow, and generating correct, executable pipeline code from a natural-language description of that workflow.

Architecture

BioContext7 is composed of four layers:

Registry Layer

Aggregates tool metadata from multiple sources into a unified, searchable index:

bio.tools — 47,000+ curated bioinformatics tools with EDAM annotations
BioContainers — Docker and Singularity container images for reproducible execution
EDAM Ontology — Semantic terms for operations, topics, data types, and formats
UniProt — Protein sequences, annotations, and ID mapping
GA4GH Standards — Beacon (variant queries), VRS (variant representation), WES (workflow execution)
Metabolomics — HMDB, Metabolomics Workbench, MassBank, LIPID MAPS
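
To make the idea of a unified, searchable index concrete, here is a minimal sketch of merging per-source records into one entry per tool. The `ToolRecord` schema, the `merge_sources` and `search` functions, and the sample records are all assumptions made for illustration; they are not BioContext7's actual data model or API, and the container reference is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRecord:
    """One merged entry in the unified index (hypothetical schema)."""
    name: str
    description: str = ""
    edam_operations: set = field(default_factory=set)
    containers: list = field(default_factory=list)
    sources: set = field(default_factory=set)


def merge_sources(records_by_source):
    """Merge per-source records into one index keyed by lowercased tool name."""
    index = {}
    for source, records in records_by_source.items():
        for raw in records:
            key = raw["name"].lower()
            entry = index.setdefault(key, ToolRecord(name=raw["name"]))
            # Keep the first non-empty description; accumulate everything else.
            entry.description = entry.description or raw.get("description", "")
            entry.edam_operations.update(raw.get("edam_operations", []))
            entry.containers.extend(raw.get("containers", []))
            entry.sources.add(source)
    return index


def search(index, keyword):
    """Case-insensitive substring search over name, description, and EDAM terms."""
    needle = keyword.lower()
    hits = [
        rec for rec in index.values()
        if needle in rec.name.lower()
        or needle in rec.description.lower()
        or any(needle in op.lower() for op in rec.edam_operations)
    ]
    return sorted(hits, key=lambda rec: rec.name.lower())


# Made-up sample records; real registry payloads are richer than this.
SAMPLE = {
    "bio.tools": [
        {"name": "samtools",
         "description": "Utilities for SAM/BAM/CRAM alignment files",
         "edam_operations": ["Format conversion"]},
        {"name": "BWA",
         "description": "Short-read aligner",
         "edam_operations": ["Read mapping"]},
    ],
    "biocontainers": [
        {"name": "SAMtools", "containers": ["example.org/samtools:TAG"]},
    ],
}

index = merge_sources(SAMPLE)
```

Keying on the lowercased name is the simplest possible join; a real aggregator would more likely match on stable identifiers, since tool names are not unique across registries.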

Compiler Layer

Translates a language-agnostic intermediate representation (PipelineSpec) into target-specific pipeline code:

Target	Output
Nextflow DSL2	`main.nf` + `nextflow.config`
Snakemake	`Snakefile` + rule files
WDL	WDL task and workflow definitions
CWL	CWL workflow + tool definitions
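
The sketch below illustrates the general shape of compiling one intermediate representation to several targets. It is not the real `PipelineSpec` schema: the `Step` fields, the `{input}`/`{output}` placeholder convention, and the `compile_spec` function are hypothetical, and only two of the four targets are shown. Channel wiring and the `workflow` block are left out of the Nextflow output, and commands containing double quotes are not escaped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One pipeline step (hypothetical, heavily simplified)."""
    name: str
    command: str  # shell command using {input} and {output} placeholders
    input: str
    output: str


@dataclass(frozen=True)
class PipelineSpec:
    """Language-agnostic description of a pipeline (illustrative stand-in)."""
    name: str
    steps: tuple


def to_nextflow(spec):
    """Emit one Nextflow DSL2 process per step (no workflow block)."""
    lines = []
    for step in spec.steps:
        # Nextflow interpolates ${var} inside the script string.
        script = (step.command
                  .replace("{input}", "${infile}")
                  .replace("{output}", step.output))
        lines += [
            "process " + step.name.upper() + " {",
            "    input:",
            "    path infile",
            "",
            "    output:",
            '    path "' + step.output + '"',
            "",
            "    script:",
            '    """',
            "    " + script,
            '    """',
            "}",
            "",
        ]
    return "\n".join(lines)


def to_snakemake(spec):
    """Emit one Snakemake rule per step."""
    lines = []
    for step in spec.steps:
        # Snakemake's shell directive understands {input} and {output} natively.
        lines += [
            "rule " + step.name + ":",
            "    input:",
            '        "' + step.input + '"',
            "    output:",
            '        "' + step.output + '"',
            "    shell:",
            '        "' + step.command + '"',
            "",
        ]
    return "\n".join(lines)


BACKENDS = {"nextflow": to_nextflow, "snakemake": to_snakemake}


def compile_spec(spec, target):
    """Dispatch the same spec to the requested backend."""
    if target not in BACKENDS:
        raise ValueError("unsupported target: " + target)
    return BACKENDS[target](spec)


spec = PipelineSpec(
    name="demo",
    steps=(Step("count_lines", "wc -l {input} > {output}",
                "reads.fastq", "line_count.txt"),),
)
nextflow_code = compile_spec(spec, "nextflow")
snakemake_code = compile_spec(spec, "snakemake")
```

Because the spec holds no target-specific syntax, adding WDL or CWL in this sketch would mean adding one more function to `BACKENDS` rather than changing the spec.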

Healing Layer

Validates generated pipelines using Language Server Protocol (LSP) integration and automatically fixes errors through iterative correction loops:

Generate pipeline code
Run LSP validation (syntax, type checking)
Collect diagnostics
Apply auto-fixes
Re-validate until no diagnostics remain or the maximum number of iterations is reached
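
The loop above can be sketched as follows. This is a toy under stated assumptions: `toy_validate` stands in for a real language server, the `Diagnostic` fields are a simplification of LSP diagnostics, and the fixer registry keyed by diagnostic code is a hypothetical design rather than BioContext7's implementation.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    """Simplified stand-in for an LSP diagnostic."""
    line: int
    code: str
    message: str


@dataclass
class HealResult:
    source: str
    clean: bool
    history: list  # number of diagnostics seen at each validation pass


def heal(source, validate, fixers, max_iterations=5):
    """Validate, apply known fixes, and repeat until clean or out of budget."""
    history = []
    for _ in range(max_iterations):
        diagnostics = validate(source)
        history.append(len(diagnostics))
        if not diagnostics:
            return HealResult(source, True, history)
        fixed = source
        for diag in diagnostics:
            fixer = fixers.get(diag.code)
            if fixer is not None:
                fixed = fixer(fixed, diag)
        if fixed == source:
            break  # no fixer made progress, so further passes cannot help
        source = fixed
    return HealResult(source, not validate(source), history)


def toy_validate(source):
    """Toy checks: unclosed braces and trailing whitespace."""
    diags = []
    missing = source.count("{") - source.count("}")
    if missing > 0:
        diags.append(Diagnostic(source.count("\n"), "unbalanced-brace",
                                str(missing) + " unclosed brace(s)"))
    for number, line in enumerate(source.splitlines()):
        if line != line.rstrip():
            diags.append(Diagnostic(number, "trailing-whitespace",
                                    "trailing whitespace"))
    return diags


def fix_brace(source, diag):
    """Append one closing brace on its own line."""
    return source.rstrip("\n") + "\n}\n"


def fix_trailing(source, diag):
    """Strip trailing whitespace from the flagged line."""
    lines = source.splitlines()
    lines[diag.line] = lines[diag.line].rstrip()
    return "\n".join(lines) + "\n"


FIXERS = {"unbalanced-brace": fix_brace, "trailing-whitespace": fix_trailing}

broken = "process A {   \n    script:\n"
result = heal(broken, toy_validate, FIXERS)
```

The early exit when no fixer changes the source matters as much as the iteration cap: without it, a diagnostic that has no registered fix would burn the whole budget on identical passes.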

MCP Layer

Exposes BioContext7 capabilities as five MCP tools for Claude Code integration:

resolve-library-id — Search 47K+ bioinformatics tools by name or keyword
get-library-docs — Fetch versioned documentation for a specific tool (supports topic filtering, token budgets, and chunk size control)
find-skills — Semantic skill search with quality scoring (bc7score), install commands, and health signals. Supports compact mode for LLM-optimized responses and platform filtering.
recommend-tools — Get opinionated, ranked tool recommendations with benchmark references for 10 analysis patterns
report-snippet-quality — Relevance feedback loop that penalizes unhelpful snippets in subsequent retrievals
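
For orientation, MCP clients invoke tools with a JSON-RPC 2.0 request whose method is `tools/call` and whose params carry the tool `name` and an `arguments` object. The sketch below builds such a request for the tools listed above. The helper `build_tool_call` and the `query` argument name are assumptions for illustration; the real argument schema for each tool is whatever the server advertises, and is not stated here.

```python
import json

# Tool names as listed in this section.
TOOL_NAMES = frozenset({
    "resolve-library-id",
    "get-library-docs",
    "find-skills",
    "recommend-tools",
    "report-snippet-quality",
})


def build_tool_call(request_id, tool, arguments):
    """Build an MCP tools/call request as a plain dict (hypothetical helper)."""
    if tool not in TOOL_NAMES:
        raise ValueError("unknown tool: " + tool)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": dict(arguments)},
    }


# "query" is an assumed argument name, not a documented one.
request = build_tool_call(1, "resolve-library-id", {"query": "samtools"})
wire_message = json.dumps(request)
```

In practice Claude Code constructs these requests itself once the server is registered, so a sketch like this is mainly useful for testing a server by hand.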

Design Principles

Deterministic core — Data and specs produce build artifacts without LLM involvement
Provenance everywhere — Full tracking of inputs, tool versions, and outputs
Grounded text — Every output references concrete artifacts
Self-healing — LSP validation loops catch and fix errors automatically
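
One common way to combine a deterministic core with provenance tracking is to hash a canonical serialization of the inputs and record it next to a hash of the output. The sketch below shows that general technique only; the record layout, the function names, and the sample values are assumptions, not BioContext7's provenance format.

```python
import hashlib
import json


def canonical_digest(obj):
    """SHA-256 of a canonical JSON encoding (sorted keys, no extra whitespace)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def provenance_record(spec, tool_versions, artifact_text):
    """Tie a generated artifact to the exact inputs that produced it."""
    return {
        "spec_sha256": canonical_digest(spec),
        "tool_versions": dict(sorted(tool_versions.items())),
        "artifact_sha256": hashlib.sha256(
            artifact_text.encode("utf-8")).hexdigest(),
    }


# Sample values for illustration only.
spec_a = {"name": "demo", "steps": [{"tool": "samtools", "op": "sort"}]}
spec_b = {"steps": [{"op": "sort", "tool": "samtools"}], "name": "demo"}
versions = {"samtools": "1.17"}

record_a = provenance_record(spec_a, versions, "rule sort: ...")
record_b = provenance_record(spec_b, versions, "rule sort: ...")
```

Sorting keys before hashing means two specs that differ only in key order produce the same digest, which is what lets identical inputs be recognized as identical across runs.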