Get Started with ESM Models
Introduction
The Quickstart shows how to set up the ESM models by installing the esm Python package. Once the package is installed, you can choose ESMC, ESMFold2, the ESMC sparse autoencoders (SAEs), or ESM3 and begin fine-tuning or working with the model.
Quickstart guide
Install the esm Python package
pip install esm@git+https://github.com/Biohub/esm.git@main
Import the necessary libraries
from esm.sdk.forge import ESMCForgeInferenceClient
from esm.sdk import client
from esm.sdk.api import ESMProtein, ESMProteinError, LogitsConfig, LogitsOutput
Create an API key
Generate an API key and add it to your Biohub account. This API key manages your access to credits and tokens. The terms "API key" and "token" are used interchangeably in the documentation.
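One way to keep the key out of your source code is to read it from an environment variable. The snippet below is a minimal sketch of that approach. The helper `load_api_token` and the variable name `ESM_API_TOKEN` are hypothetical choices made for illustration and are not defined by the esm package.

```python
import os


def load_api_token(env_var: str = "ESM_API_TOKEN") -> str:
    """Return the API token stored in the environment variable `env_var`.

    Both this helper and the default variable name are assumptions made for
    this sketch; use whatever name fits your own setup.
    """
    # Treat a missing variable and a blank value the same way.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your API token."
        )
    return token
```

With this in place, you could pass `token=load_api_token()` when constructing the client instead of pasting the token as a string literal.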
Connect to the Biohub API
Instantiate the appropriate ESM client for your chosen model, replacing <your API token> with your token.
client = ESMCForgeInferenceClient(model="esmc-6b-2024-12", url="https://biohub.ai", token="<your API token>")
Embed multiple sequences at once
To embed multiple sequences with ESMC, we recommend submitting the requests from a thread pool and letting the API take care of batching and parallelization on the backend. The code below provides helper functions for batch embedding and for configuring the embedding outputs.
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence

from esm.sdk.api import (
    ESMCInferenceClient,
    ESMProtein,
    ESMProteinError,
    LogitsConfig,
    LogitsOutput,
    ProteinType,
)

# Configure which fields to output for each sequence with LogitsConfig:
#   sequence: return per-position amino acid logits
#   return_embeddings: return last-layer token embeddings
#   return_hidden_states: return all intermediate layer outputs
EMBEDDING_CONFIG = LogitsConfig(
    sequence=True, return_embeddings=True, return_hidden_states=True
)


# LogitsOutput contains the outputs requested by LogitsConfig
def embed_sequence(model: ESMCInferenceClient, sequence: str) -> LogitsOutput:
    protein = ESMProtein(sequence=sequence)
    protein_tensor = model.encode(protein)
    output = model.logits(protein_tensor, EMBEDDING_CONFIG)
    return output


def batch_embed(
    model: ESMCInferenceClient, inputs: Sequence[ProteinType]
) -> Sequence[LogitsOutput]:
    """The API supports auto-batching, so batch_embed() simply runs a collection
    of embed calls in parallel using a thread pool.

    A call that fails is reported as an ESMProteinError in the returned list.
    """
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(embed_sequence, model, protein) for protein in inputs
        ]
        results = []
        for future in futures:
            try:
                results.append(future.result())
            except Exception as e:
                results.append(ESMProteinError(500, str(e)))
    return results
Run your inference
Now you are ready to use your model. For examples of specific use cases or to append an SAE onto your ESMC model, check out our Tutorials.
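Because batch_embed() stores an ESMProteinError in place of any output that failed, it can help to separate successes from failures before using the results. The following is a minimal sketch of one way to do that. The helper `partition_results` and the `StandInError` class are hypothetical and exist only so the example runs without the esm package; in practice you would pass `ESMProteinError` as `error_type`.

```python
from typing import Any, Sequence


class StandInError:
    """Placeholder for ESMProteinError (an assumption for this sketch only)."""

    def __init__(self, code: int, message: str) -> None:
        self.code = code
        self.message = message


def partition_results(
    results: Sequence[Any], error_type: type
) -> tuple[list[tuple[int, Any]], list[tuple[int, Any]]]:
    """Split results into (successes, failures), keeping each input index."""
    successes: list[tuple[int, Any]] = []
    failures: list[tuple[int, Any]] = []
    for index, result in enumerate(results):
        # An entry counts as a failure if it is an instance of error_type.
        target = failures if isinstance(result, error_type) else successes
        target.append((index, result))
    return successes, failures
```

Keeping the index alongside each result lets you map a failure back to the input sequence that caused it, for example to retry only those sequences.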