Foundry Agent Service quotas and limits

Note

This document refers to the Microsoft Foundry (classic) portal.

🔄 Switch to the Microsoft Foundry (new) documentation if you're using the new portal.

Note

This document refers to the Microsoft Foundry (new) portal.

This article describes the quotas and limits for Foundry Agent Service. Understanding these limits helps you design agents that scale reliably and avoid runtime errors in production.

How quotas and limits apply

Foundry Agent Service enforces limits in two places:

Agent Service limits. Limits for agent and thread artifacts, such as file uploads, vector store attachments, message counts, and tool registration.
Model limits. Quotas and rate limits for the model deployments your agents call.

If you're using threads and messages, see Threads, runs, and messages in Foundry Agent Service. If you're using file search, see Vector stores for file search.

Default quotas and limits for the service

The following table lists default limits enforced by the Agent Service. These limits apply to all Foundry projects regardless of subscription type or region.

Limit name	Limit value
Maximum number of files per agent/thread	10,000
Maximum file size for agents	512 MB
Maximum size for all uploaded files for agents	300 GB
Maximum file size in tokens for attaching to a vector store	2,000,000 tokens
Maximum number of messages per thread	100,000
Maximum size of `text` content per message	1,500,000 characters
Maximum number of tools registered per agent	128

Agent Service doesn't impose separate rate limits on API calls. Rate limiting is applied at the model deployment level. See Azure OpenAI quotas and limits for model-specific rate limits.

Handle limit errors

When you exceed a limit, the Agent Service returns an error. Handle these errors gracefully in your application.

Error scenario	HTTP status	Error code	Recommended action
File too large	400	`file_size_exceeded`	Split content into smaller files
Vector store token limit	400	`token_limit_exceeded`	Reduce file content or split files
Thread message cap	400	`message_limit_exceeded`	Create a new thread
Message content too large	400	`content_size_exceeded`	Use file search for large content
Too many tools	400	`tool_limit_exceeded`	Remove unused tools
Rate limit exceeded	429	`rate_limit_exceeded`	Implement exponential backoff

For example:

File exceeds the maximum size: Uploading the file fails. Split the content into smaller files or reduce file size before you upload.
Vector store token limit: Attaching a file to a vector store fails if the file exceeds the token limit. Reduce the file content or split it into multiple files.
Thread message cap: Adding messages can fail after a thread reaches the message limit. Create a new thread for a new conversation session, or archive and rotate threads as part of your application design.
Message content size: Creating a message can fail if the text content is too large. Send smaller messages, or move large content into files and use file search.
Tool registration cap: Creating or updating an agent can fail if you register too many tools. Register only the tools you need, and prefer fewer, reusable tools.

For file search scenarios, see Vector stores for file search for guidance on managing vector store growth.

Best practices to stay within limits

Use the following practices to reduce limit-related failures:

Keep files small and focused. Prefer multiple smaller documents over a single large document.
Avoid very large messages. Put long content in uploaded files and query it by using file search.
Plan for long conversations. Treat threads as session state and rotate to new threads when conversations become very long.
Register only required tools. Remove unused tools from agent definitions.
Monitor usage trends. Track agent activity using Foundry Agent Service metrics to identify growth before you hit limits.

Quotas and limits for models

Agents follow the quotas and rate limits for the model deployments they use.

For current model quotas and limits, see:

To view or request more model quota, see Manage and increase quotas for resources with Microsoft Foundry (Foundry projects).

Request a limit increase

The limits in this article are default values for Foundry Agent Service. If your workload requires higher limits:

Model quotas: You can request increases for model deployment quotas. See Manage and increase quotas for resources with Microsoft Foundry.
Agent Service limits: The file, message, and tool limits listed in this article are fixed service limits and can't be increased. Design your application to work within these constraints using the best practices described earlier.

Feedback

Was this page helpful?

Last updated on 2026-02-03