pip install esm@git+https://github.com/Biohub/esm.git@main
ESMC
ESMC is the latest in the ESM family of protein language models, establishing a new frontier in representation learning for protein biology. Trained on billions of evolutionary sequences, it learns representations that reflect a mechanistic reduction of protein structure and function.
Get Started
Quickstart Guide
Install the esm Python package
Create an API key
Connect to the Biohub Platform API
from esm.sdk.forge import ESMCForgeInferenceClient
client = ESMCForgeInferenceClient(model="esmc-6b-2024-12", url="https://biohub.ai", token="<your API token>")
Run your inference
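For the connection step above, one option is to read the API token from the environment instead of pasting it into source code. The sketch below is a minimal illustration: the environment variable name `BIOHUB_API_TOKEN` and the helper `forge_client_kwargs` are hypothetical, and only the `model`, `url`, and `token` arguments come from the quickstart snippet.

```python
import os


def forge_client_kwargs(env_var: str = "BIOHUB_API_TOKEN") -> dict:
    """Collect the client arguments from the quickstart, with the token from the environment.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token")
    return {
        "model": "esmc-6b-2024-12",
        "url": "https://biohub.ai",
        "token": token,
    }


# Usage, assuming the esm package is installed and the variable is set:
# from esm.sdk.forge import ESMCForgeInferenceClient
# client = ESMCForgeInferenceClient(**forge_client_kwargs())
```

Failing early with a clear message when the variable is missing tends to be easier to debug than an authentication error from the first request.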
Model Tutorials
Embedding sequences with ESMC
Embed protein sequences and explore how different transformer layers encode structural and functional information.
Zero-shot entropy and mutation analysis
Compute per-position entropy and log-likelihood ratios to identify constrained vs. mutation-tolerant sites.
Layer sweep for enzyme function classification
Sweep across all layers to find which one gives the best representations, using enzyme function classification as the task.
Understanding proteins with SAE features
Extract and visualize sparse autoencoder features, rank by peak activation, and map activations onto 3D structures.
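The zero-shot entropy and mutation analysis listed above comes down to two per-position quantities computed from the model's output logits. The sketch below shows both with NumPy on a placeholder logits array rather than real ESMC output. The function names, the 20-letter alphabet ordering, and the array shape are assumptions for illustration and are not taken from the tutorial itself.

```python
import numpy as np

# Assumed 20-letter amino-acid alphabet. The real ESMC tokenizer has its own
# vocabulary and ordering, so this mapping is illustrative only.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def positional_entropy(logits: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of the predicted distribution at each position."""
    log_p = log_softmax(logits)
    return -(np.exp(log_p) * log_p).sum(axis=-1)


def log_likelihood_ratios(logits: np.ndarray, sequence: str) -> np.ndarray:
    """LLR[i, a] = log p(a at position i) - log p(wild-type residue at position i)."""
    log_p = log_softmax(logits)
    wt_idx = np.array([ALPHABET.index(aa) for aa in sequence])
    wt_log_p = log_p[np.arange(len(sequence)), wt_idx]
    return log_p - wt_log_p[:, None]


# Placeholder logits of shape (sequence length, alphabet size).
sequence = "MKTAYIAK"
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(sequence), len(ALPHABET)))
logits[2, ALPHABET.index("T")] += 10.0  # make position 2 sharply peaked on its wild type

entropy = positional_entropy(logits)
llr = log_likelihood_ratios(logits, sequence)
```

Low entropy marks positions where the predicted distribution is sharply peaked, which is the signal for a constrained site. A negative log-likelihood ratio means the substitution is scored as less likely than the wild-type residue at that position.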
Model Details
Model Card
Version
2026-04
Architecture
Transformer
Supported Modalities
Sequence
Training Data
Up to 6 billion proteins
Intended Use
ESMC is designed for protein science research including structure prediction, function annotation, protein design, and understanding evolutionary relationships between proteins. It can generate novel proteins given partial sequence, structure, or functional constraints.
Limitations & Risks
Outputs should be validated experimentally. The model may generate proteins that are not synthesizable or functional. Not intended for clinical or therapeutic applications without further validation.
Explore the Model

INFERENCE TOOL
Fold
Predict high-resolution, all-atom 3D structures of biomolecular complexes directly from protein, DNA, and RNA sequences with ESMFold2.
ESM Atlas Data
| Dataset | Description | Size | CLI Command |
|---|---|---|---|
| Sequences | Protein sequences (6.8B proteins) | 2.20 GB | aws s3 sync s3://esm-protein-atlas/v1/sequences/ /mydrive |
| Structures | Protein structures (1B proteins) | 68.9 TB | aws s3 sync s3://esm-protein-atlas/v1/folds/ /mydrive |
| SAE features | Per protein and per-residue feature vectors (6.8B proteins) | 306 TB | aws s3 sync s3://esm-protein-atlas/v1/sae/data_shards/ /mydrive |
| SAE Clusters | Cluster-level organization based on SAE features (7.5M clusters) | 26.0 GB | aws s3 sync s3://esm-protein-atlas/v1/clusters/indexes/secondary/cluster_members/ /mydrive |
| HMM Results | Predicted pfam and taxonomy (6.8B proteins) | 653 MB | aws s3 sync s3://esm-protein-atlas/v1/clusters/data/representative_proteins.parquet /mydrive |
| Protein_to_accession | Mapping of protein IDs to accession numbers (6.8B proteins) | 162 GB | aws s3 sync s3://esm-protein-atlas/v1/shared_indexes/ /mydrive |
| Normalization | SAE feature normalization | 192 KB | aws s3 sync s3://esm-protein-atlas/v1/normalization/max_idf_log10.pkl /mydrive |
| All Data | Complete set of sequences, structures, features, and clusters | 377 TB | aws s3 sync s3://esm-protein-atlas/v1/ /mydrive |
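Several of the datasets in the table above are tens to hundreds of terabytes, so it can help to preview a transfer before starting it. The sketch below assembles the AWS CLI command for one dataset and appends `--dryrun` by default. The `DATASETS` keys and the `build_command` helper are hypothetical names for this example. One assumption is worth flagging: the AWS CLI documents `sync` as operating on directories and prefixes, so the sketch uses `aws s3 cp` for the two entries that point at a single file. Check that choice against the CLI's behavior in your environment.

```python
import shlex

BUCKET_ROOT = "s3://esm-protein-atlas/v1/"

# Paths copied from the table; a trailing "/" marks a prefix (directory).
# The dictionary keys are hypothetical short names for this sketch.
DATASETS = {
    "sequences": "sequences/",
    "structures": "folds/",
    "sae_features": "sae/data_shards/",
    "sae_clusters": "clusters/indexes/secondary/cluster_members/",
    "hmm_results": "clusters/data/representative_proteins.parquet",
    "protein_to_accession": "shared_indexes/",
    "normalization": "normalization/max_idf_log10.pkl",
}


def build_command(dataset: str, dest: str, dry_run: bool = True) -> list[str]:
    """Return the AWS CLI argument list for downloading one dataset."""
    key = DATASETS[dataset]
    if key.endswith("/"):
        # Prefix: mirror everything under it into dest.
        verb, target = "sync", dest
    else:
        # Single object (assumption: `cp` is the appropriate verb here).
        # The trailing slash asks the CLI to treat dest as a directory.
        verb, target = "cp", dest.rstrip("/") + "/"
    cmd = ["aws", "s3", verb, BUCKET_ROOT + key, target]
    if dry_run:
        cmd.append("--dryrun")  # list the planned transfers without downloading
    return cmd


print(shlex.join(build_command("sequences", "/mydrive")))
```

Returning an argument list rather than a string lets the result be passed straight to `subprocess.run` without shell quoting concerns. Set `dry_run=False` once the preview looks right.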