Heroku Managed Inference and Agents

Reliable and Powerful Inference as a Service

Managed Inference

Managed Inference and Agents simplifies AI integration by providing access to powerful foundation models, including text, embedding, and diffusion models. Easily attach model resources to your Heroku app, and the add-on will automatically configure environment variables, enabling seamless API calls. Invoke models using the CLI plug-in or with API endpoints.

Agents

Extend Agents with tools that allow Large Language Models (LLMs) to execute actions within Heroku’s trusted environment. Deploy autonomous agents that can call APIs, run code, or interact with your app through tools like code_exec, http, or custom ones. Move from prototyping to production with optimized inference latency and minimal infrastructure management.

Model Context protocol

The Model Context Protocol (MCP) is an open standard that helps you extend Agents by connecting large language models to tools, services, and data sources. You can bring your own custom tools by deploying them as a heroku app and registering them by attaching the addon. Access all you mcp servers through a single toolkit.

Use Cases

Text Generation & Chat Use models like Claude-Sonnet to generate text, write code, or chat intelligently.

Retrieval-Augmented Generation (RAG) Bring your own data to power LLMs with up-to-date, domain-specific knowledge.

Personalize User Experiences: Leverage agents to deliver tailored content, recommendations, or support.

Data Analysis and Business Intelligence: Deploy agents that can analyze large datasets, identify trends, generate reports, and provide actionable insights.

Automate Complex Workflows and Tasks: Use agents to automate multi-step processes within your business.

Metered Billing

For those customers paying by credit card, Heroku Managed Inference and Agents uses metered billing, as set forth in the Plans & Pricing tables below

For enterprise customers, your usage of Heroku Managed Inference and Agents will consume your General Add-on Credits and/or Data Add-on Credits as set forth in the Plans & Pricing tables below.

Plans & Pricing

  • Common Runtime
  • Private Spaces
  • Claude-4-sonnet (us)

    Provisioned in: us

    A state-of-the-art large language model that supports chat and tool-calling.


    • Type Text → Text
    • API endpoint v1/chat/completions
    • v1/agents/heroku
    • Model Source Anthropic

    Metered usage amounts

    • Input Token $3 per million tokens
    • Output Token $15 per million tokens
    • Per image N/A

    Availability

    Region Available
    Dublin Available
    Frankfurt Available
    London Available
    Montreal Available
    Mumbai Available
    Oregon Available
    Singapore Available
    Sydney Available
    Tokyo Available
    Virginia Available

    This model only runs in the us region. Apps with a eu region by default cannot provision this model. To override this, apply the --region=us flag:

    This model only runs in the us region. Only private space apps in the oregon, virginia, or montreal regions can provision this model by default. To override this for apps in other private space regions, apply the --region=us flag:

    Install Heroku Managed Inference and Agents
    heroku addons:create heroku-inference:claude-4-sonnet -a $APP_NAME -- --region=us

    To provision, copy the snippet into your CLI or use the install button above.

  • Claude-3-7-sonnet (us)

    Provisioned in: us

    A state-of-the-art large language model that supports chat and tool-calling.


    • Type Text → Text
    • API endpoint v1/chat/completions
    • v1/agents/heroku
    • Model Source Anthropic

    Metered usage amounts

    • Input Token $3 per million tokens
    • Output Token $15 per million tokens
    • Per image N/A

    Availability

    Region Available
    Dublin Available
    Frankfurt Available
    London Available
    Montreal Available
    Mumbai Available
    Oregon Available
    Singapore Available
    Sydney Available
    Tokyo Available
    Virginia Available

    This model is hosted in both the us and eu regions. By default, apps with a us region provision the us plan, and apps with the eu region provision the eu plan. To create your model resource, run:

    This model is hosted in both the us and eu regions. oregon, virginia, and montreal private space apps provision the us plan by default. All other private space apps provision the eu plan by default. To create your model resource, run:

    Install Heroku Managed Inference and Agents
    heroku addons:create heroku-inference:claude-3-7-sonnet -a $APP_NAME

    To provision, copy the snippet into your CLI or use the install button above.

  • Claude-3-7-sonnet (eu)

    Provisioned in: eu

    A state-of-the-art large language model that supports chat and tool-calling.


    • Type Text → Text
    • API endpoint v1/chat/completions
    • v1/agents/heroku
    • Model Source Anthropic

    Metered usage amounts

    • Input Token $3 per million tokens
    • Output Token $15 per million tokens
    • Per image N/A

    Availability

    Region Available
    Dublin Available
    Frankfurt Available
    London Available
    Montreal Available
    Mumbai Available
    Oregon Available
    Singapore Available
    Sydney Available
    Tokyo Available
    Virginia Available

    This model is hosted in both the us and eu regions. By default, apps with a us region provision the us plan, and apps with the eu region provision the eu plan. To create your model resource, run:

    This model is hosted in both the us and eu regions. oregon, virginia, and montreal private space apps provision the us plan by default. All other private space apps provision the eu plan by default. To create your model resource, run:

    Install Heroku Managed Inference and Agents
    heroku addons:create heroku-inference:claude-3-7-sonnet -a $APP_NAME

    To provision, copy the snippet into your CLI or use the install button above.

  • Claude-3-5-sonnet-latest (us)

    Provisioned in: us

    A state-of-the-art large language model that supports chat and tool-calling.


    • Type Text → Text
    • API endpoint v1/chat/completions
    • v1/agents/heroku
    • Model Source Anthropic

    Metered usage amounts

    • Input Token $3 per million tokens
    • Output Token $15 per million tokens
    • Per image N/A

    Availability

    Region Available
    Dublin Available
    Frankfurt Available
    London Available
    Montreal Available
    Mumbai Available
    Oregon Available
    Singapore Available
    Sydney Available
    Tokyo Available
    Virginia Available

    This model only runs in the us region. Apps with a eu region by default cannot provision this model. To override this, apply the --region=us flag:

    This model only runs in the us region. Only private space apps in the oregon, virginia, or montreal regions can provision this model by default. To override this for apps in other private space regions, apply the --region=us flag:

    Install Heroku Managed Inference and Agents
    heroku addons:create heroku-inference:claude-3-5-sonnet-latest -a $APP_NAME -- --region=us

    To provision, copy the snippet into your CLI or use the install button above.

  • Claude-3-5-haiku (us)

    Provisioned in: us

    A faster, more affordable large language model that supports chat and tool-calling.


    • Type Text → Text
    • API endpoint v1/chat/completions
    • v1/agents/heroku
    • Model Source Anthropic

    Metered usage amounts

    • Input Token $0.8 per million tokens
    • Output Token $4 per million tokens
    • Per image N/A

    Availability

    Region Available
    Dublin Available
    Frankfurt Available
    London Available
    Montreal Available
    Mumbai Available
    Oregon Available
    Singapore Available
    Sydney Available
    Tokyo Available
    Virginia Available

    This model only runs in the us region. Apps with a eu region by default cannot provision this model. To override this, apply the --region=us flag:

    This model only runs in the us region. Only private space apps in the oregon, virginia, or montreal regions can provision this model by default. To override this for apps in other private space regions, apply the --region=us flag:

    Install Heroku Managed Inference and Agents
    heroku addons:create heroku-inference:claude-3-5-haiku -a $APP_NAME -- --region=us

    To provision, copy the snippet into your CLI or use the install button above.

  • Claude-3-haiku (eu)

    Provisioned in: eu

    A faster, more affordable large language model that supports chat and tool-calling.


    • Type Text → Text
    • API endpoint v1/chat/completions
    • v1/agents/heroku
    • Model Source Anthropic

    Metered usage amounts

    • Input Token $0.25 per million tokens
    • Output Token $1.25 per million tokens
    • Per image N/A

    Availability

    Region Available
    Dublin Available
    Frankfurt Available
    London Available
    Montreal Available
    Mumbai Available
    Oregon Available
    Singapore Available
    Sydney Available
    Tokyo Available
    Virginia Available

    This model only runs in the eu region. Apps with a us region by default cannot provision this model. To override this, apply the --region=eu flag:

    This model only runs in the eu region. Private space apps in the oregon, virginia, or montreal regions by default cannot provision this model. To override this, apply the --region=eu flag:

    Install Heroku Managed Inference and Agents
    heroku addons:create heroku-inference:claude-3-haiku -a $APP_NAME -- --region=eu

    To provision, copy the snippet into your CLI or use the install button above.

  • Cohere-embed-multilingual (us)

    Provisioned in: us

    A state-of-the-art embedding model that supports multiple languages. This model is helpful for developing Retrieval Augmented Generation (RAG) search.


    • Type Text → Embedding
    • API endpoint v1/embeddings
    • Model Source Cohere

    Metered usage amounts

    • Input Token $0.10 per million tokens
    • Output Token N/A
    • Per image N/A

    Availability

    Region Available
    Dublin Available
    Frankfurt Available
    London Available
    Montreal Available
    Mumbai Available
    Oregon Available
    Singapore Available
    Sydney Available
    Tokyo Available
    Virginia Available

    This model is hosted in both the us and eu regions. By default, apps with a us region provision the us plan, and apps with the eu region provision the eu plan. To create your model resource, run:

    This model is hosted in both the us and eu regions. oregon, virginia, and montreal private space apps provision the us plan by default. All other private space apps provision the eu plan by default. To create your model resource, run:

    Install Heroku Managed Inference and Agents
    heroku addons:create heroku-inference:cohere-embed-multilingual -a $APP_NAME

    To provision, copy the snippet into your CLI or use the install button above.

  • Cohere-embed-multilingual (eu)

    Provisioned in: eu

    A state-of-the-art embedding model that supports multiple languages. This model is helpful for developing Retrieval Augmented Generation (RAG) search.


    • Type Text → Embedding
    • API endpoint v1/embeddings
    • Model Source Cohere

    Metered usage amounts

    • Input Token $0.10 per million tokens
    • Output Token N/A
    • Per image N/A

    Availability

    Region Available
    Dublin Available
    Frankfurt Available
    London Available
    Montreal Available
    Mumbai Available
    Oregon Available
    Singapore Available
    Sydney Available
    Tokyo Available
    Virginia Available

    This model is hosted in both the us and eu regions. By default, apps with a us region provision the us plan, and apps with the eu region provision the eu plan. To create your model resource, run:

    This model is hosted in both the us and eu regions. oregon, virginia, and montreal private space apps provision the us plan by default. All other private space apps provision the eu plan by default. To create your model resource, run:

    Install Heroku Managed Inference and Agents
    heroku addons:create heroku-inference:cohere-embed-multilingual -a $APP_NAME

    To provision, copy the snippet into your CLI or use the install button above.

  • Stable-image-ultra (us)

    Provisioned in: us

    A state-of-the-art diffusion (image generation) model.


    • Type Text → Image
    • API endpoint v1/images/generations
    • Model Source Stability AI

    Metered usage amounts

    • Input Token N/A
    • Output Token N/A
    • Per image $0.14

    Availability

    Region Available
    Dublin Available
    Frankfurt Available
    London Available
    Montreal Available
    Mumbai Available
    Oregon Available
    Singapore Available
    Sydney Available
    Tokyo Available
    Virginia Available

    This model only runs in the us region. Apps with a eu region by default cannot provision this model. To override this, apply the --region=us flag:

    This model only runs in the us region. Only private space apps in the oregon, virginia, or montreal regions can provision this model by default. To override this for apps in other private space regions, apply the --region=us flag:

    Install Heroku Managed Inference and Agents
    heroku addons:create heroku-inference:stable-image-ultra -a $APP_NAME -- --region=us

    To provision, copy the snippet into your CLI or use the install button above.

Heroku Managed Inference and Agents Documentation