Get Started with ESM Models

Introduction

The Quickstart shows how to set up the ESM models by installing the `esm` Python package. Once the package is installed, you can choose ESMC, ESMFold2, the ESMC sparse autoencoders (SAEs), or ESM3 and begin fine-tuning or working with the model.

Quickstart guide

Install the `esm` Python package

Shell

pip install esm@git+https://github.com/Biohub/esm.git@main
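A quick way to confirm the install before moving on is a standard-library check like the sketch below. It assumes the import name and the distribution name are both `esm`; the helper `installed_version` is hypothetical and is not part of the `esm` package.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(import_name: str, dist_name: Optional[str] = None) -> Optional[str]:
    """Return the installed version of a package, or None if it cannot be imported.

    Assumes the distribution name matches the import name unless dist_name is given.
    """
    if importlib.util.find_spec(import_name) is None:
        return None  # not importable from this environment
    try:
        return metadata.version(dist_name or import_name)
    except metadata.PackageNotFoundError:
        return "unknown"  # importable, but no distribution metadata was found


if __name__ == "__main__":
    version = installed_version("esm")
    print("esm is not installed" if version is None else f"esm version: {version}")
```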

Import the necessary libraries

Python

from esm.sdk.forge import ESMCForgeInferenceClient
from esm.sdk.api import ESMProtein, ESMProteinError, LogitsConfig, LogitsOutput

Create an API key

Generate an API key and add it to your Biohub account. The key controls your access to credits and tokens; throughout this documentation, the terms "API key" and "token" are used interchangeably.

Connect to the Biohub API

Instantiate the ESM client for the model you want to use, replacing <your API token> with your API token.

Python

client = ESMCForgeInferenceClient(model="esmc-6b-2024-12", url="https://biohub.ai", token="<your API token>")
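Rather than pasting the token into source files, you can read it from an environment variable. The sketch below is a minimal example; the variable name `BIOHUB_API_TOKEN` and the helper `load_api_token` are hypothetical choices, not something the SDK reads on its own.

```python
import os


def load_api_token(env_var: str = "BIOHUB_API_TOKEN") -> str:
    """Return the API token stored in env_var, raising if it is unset or blank.

    BIOHUB_API_TOKEN is a hypothetical variable name chosen for this example.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token.")
    return token
```

With the variable exported in your shell, the connection line becomes `client = ESMCForgeInferenceClient(model="esmc-6b-2024-12", url="https://biohub.ai", token=load_api_token())`.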

Embed multiple sequences at once

To embed many sequences with ESMC, we recommend submitting the requests concurrently from a thread pool and letting the API handle batching and parallelization on the backend. The code below defines helper functions for batch embedding and for configuring which outputs are returned.

Python

from concurrent.futures import ThreadPoolExecutor
from typing import Sequence

from esm.sdk.api import (
    ESMCInferenceClient,
    ESMProtein,
    ESMProteinError,
    LogitsConfig,
    LogitsOutput,
    ProteinType,
)

# Configure which outputs to return for each sequence with LogitsConfig:
#   sequence=True returns per-position amino acid logits
#   return_embeddings=True returns the last layer's per-token embeddings
#   return_hidden_states=True returns the outputs of all intermediate layers

EMBEDDING_CONFIG = LogitsConfig(
    sequence=True, return_embeddings=True, return_hidden_states=True
)

# LogitsOutput contains the outputs requested by LogitsConfig

def embed_sequence(model: ESMCInferenceClient, sequence: str) -> LogitsOutput:
    protein = ESMProtein(sequence=sequence)
    protein_tensor = model.encode(protein)
    output = model.logits(protein_tensor, EMBEDDING_CONFIG)
    return output

def batch_embed(
    model: ESMCInferenceClient, inputs: Sequence[str]
) -> Sequence[LogitsOutput]:
    """The API supports auto-batching, so batch_embed() simply runs a collection
    of embed_sequence() calls in parallel on a thread pool. Results are returned
    in input order; a failed call yields an ESMProteinError instead of raising.
    """
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(embed_sequence, model, sequence) for sequence in inputs
        ]
        results = []
        for future in futures:
            try:
                results.append(future.result())
            except Exception as e:
                # Record the failure so one bad request does not abort the batch
                results.append(ESMProteinError(500, str(e)))
    return results
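To see how the thread-pool pattern in `batch_embed` behaves without calling the API, the sketch below runs the same submit-then-collect loop against a stub. `fake_embed` stands in for `embed_sequence` and `StubError` for `ESMProteinError`; both are hypothetical and used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class StubError:
    """Hypothetical stand-in for ESMProteinError: an error code plus a message."""

    code: int
    message: str


def run_in_threads(fn: Callable[[str], object], inputs: Sequence[str]) -> List[object]:
    """Same pattern as batch_embed: one task per input, results kept in input order."""
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(fn, item) for item in inputs]
        results: List[object] = []
        for future in futures:  # iterate in submission order, not completion order
            try:
                results.append(future.result())
            except Exception as e:
                results.append(StubError(500, str(e)))
    return results


def fake_embed(sequence: str) -> int:
    """Hypothetical stand-in for embed_sequence: fails on an empty sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    return len(sequence)


if __name__ == "__main__":
    print(run_in_threads(fake_embed, ["MKTAYIAK", "", "MSEQ"]))
```

Running it prints `[8, StubError(code=500, message='empty sequence'), 4]`. Results come back in input order because the futures are collected in submission order, and the empty sequence yields an error object instead of aborting the whole batch.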

Run your inference

Now you are ready to use your model. For examples of specific use cases, or to attach an SAE to your ESMC model, see our Tutorials.
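Because `batch_embed` returns error objects in place of failed results rather than raising, it can help to separate successes from failures before using the outputs. A minimal sketch follows; `split_results` and `DemoError` are hypothetical names, and in real use you would pass `ESMProteinError` as `error_type`.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Pair = Tuple[str, object]


def split_results(
    inputs: Sequence[str], results: Sequence[object], error_type: type
) -> Tuple[List[Pair], List[Pair]]:
    """Pair each input with its result, then split the pairs into (successes, failures).

    Relies on results being in the same order as inputs, which holds for
    batch_embed because it collects futures in submission order.
    """
    if len(inputs) != len(results):
        raise ValueError("inputs and results must have the same length")
    successes: List[Pair] = []
    failures: List[Pair] = []
    for sequence, result in zip(inputs, results):
        target = failures if isinstance(result, error_type) else successes
        target.append((sequence, result))
    return successes, failures


@dataclass
class DemoError:
    """Hypothetical error type for this demo only; pass ESMProteinError in real use."""

    code: int
    message: str


if __name__ == "__main__":
    ok, failed = split_results(
        ["MKT", "", "MSEQ"], ["emb1", DemoError(500, "empty"), "emb2"], DemoError
    )
    print(len(ok), len(failed))
```

In a real run this might look like `successes, failures = split_results(sequences, batch_embed(client, sequences), ESMProteinError)`, where `sequences` is your list of amino acid strings.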
