Flash is currently in beta. Join our Discord to provide feedback and get support.
- Standalone scripts: Add the @remote decorator to Python functions, and they’ll run automatically on Runpod’s cloud infrastructure when you run the script locally.
- API endpoints: Convert those functions into persistent endpoints that can be accessed via HTTP, scaling GPU/CPU resources automatically based on demand.
Get started
Follow the quickstart to create your first Flash function in minutes.
Examples
Check out our repository of prebuilt Flash applications.
Create remote functions
Learn about resource configuration, dependencies, and parallel execution.
Create an API endpoint
Build HTTP APIs with FastAPI and Flash.
Why use Flash?
Flash is the easiest and fastest way to test and deploy AI/ML workloads on Runpod. It’s designed for local development and live-testing workflows, but can also be used to deploy production-ready applications. When you run a @remote function, Flash:
- Automatically provisions resources on Runpod’s infrastructure.
- Installs your dependencies automatically.
- Runs your function on a remote GPU/CPU.
- Returns the result to your local environment.
Install Flash
Install Flash with pip:
Create a .env file and add your Runpod API key, replacing YOUR_API_KEY with your actual API key:
Concepts
Remote functions
The @remote decorator marks functions for execution on Runpod’s infrastructure. Code inside the decorated function runs remotely on a Serverless worker, while code outside the function runs locally on your machine.
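The split between local and remote code can be pictured with a small local sketch. The remote decorator defined below is a hypothetical stand-in, not Flash's implementation: it hands the call to a worker thread instead of a Serverless worker, and it assumes that a decorated function is awaited from async code. The function name square_sum is also made up for illustration.

```python
import asyncio
import functools
import threading


def remote(func):
    """Hypothetical stand-in for Flash's @remote decorator.

    The real decorator sends the call to a Runpod Serverless worker.
    A local worker thread plays that role here so the sketch runs anywhere.
    """
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        # Run the function body off the calling thread and await its result.
        return await asyncio.to_thread(func, *args, **kwargs)
    return wrapper


@remote
def square_sum(values):
    # "Remote" side: with Flash, this body would execute on the worker.
    return {
        "result": sum(v * v for v in values),
        "ran_on": threading.current_thread().name,
    }


async def main():
    # Local side: ordinary code that runs on your machine.
    values = [1, 2, 3]
    return await square_sum(values)


outcome = asyncio.run(main())
print(outcome["result"])  # 14
```

Only the body of square_sum leaves the calling thread. Building the input list and printing the result stay on the local side, which mirrors the inside/outside distinction described above.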
Resource configuration
Flash provides fine-grained control over hardware allocation through configuration objects. You can configure GPU types, worker counts, idle timeouts, environment variables, and more.
Dependency management
Specify Python packages in the decorator, and Flash installs them automatically on the remote worker:
Parallel execution
Run multiple remote functions concurrently using Python’s async capabilities:
How it works
Flash orchestrates workflow execution through a multi-step process:
- Function identification: The @remote decorator marks functions for remote execution, enabling Flash to distinguish between local and remote operations.
- Dependency analysis: Flash automatically analyzes function dependencies to construct an optimal execution order.
- Resource provisioning and execution: For each remote function, Flash:
  - Dynamically provisions endpoint and worker resources on Runpod’s infrastructure.
  - Serializes and securely transfers input data to the remote worker.
  - Executes the function on the remote infrastructure with the specified GPU or CPU resources.
  - Returns results to your local environment.
- Data orchestration: Results flow seamlessly between functions according to your local Python code structure.
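The data orchestration step can be illustrated with plain asyncio. In this sketch, two ordinary coroutines stand in for remote functions (preprocess for a CPU stage, embed for a GPU stage). Both names and their bodies are hypothetical, and the sketch assumes that remote calls are awaitable. Independent calls are started together with asyncio.gather, and the second stage begins only once the first has returned, because that is how the local code is written.

```python
import asyncio


async def preprocess(text):
    # Stand-in for a CPU-side remote function.
    await asyncio.sleep(0.01)
    return text.strip().lower()


async def embed(text):
    # Stand-in for a GPU-side remote function.
    await asyncio.sleep(0.01)
    return [len(word) for word in text.split()]


async def pipeline(documents):
    # Independent calls are started together and awaited as a group.
    cleaned = await asyncio.gather(*(preprocess(d) for d in documents))
    # The second stage consumes the first stage's results, so it starts afterwards.
    vectors = await asyncio.gather(*(embed(c) for c in cleaned))
    return dict(zip(cleaned, vectors))


results = asyncio.run(pipeline(["  Hello Flash ", "Remote GPU workers  "]))
print(results)  # {'hello flash': [5, 5], 'remote gpu workers': [6, 3, 7]}
```

asyncio.gather returns results in the order the calls were passed in, so each cleaned document lines up with its vector.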
Use cases
Flash is well-suited for a range of AI and data processing workloads:
- Multi-modal AI pipelines: Orchestrate unified workflows combining text, image, and audio models with GPU acceleration.
- Distributed model training: Scale training operations across multiple GPU workers for faster model development.
- AI research experimentation: Rapidly prototype and test complex model combinations without infrastructure overhead.
- Production inference systems: Deploy multi-stage inference pipelines for real-world applications.
- Data processing workflows: Process large datasets using CPU workers for general computation and GPU workers for accelerated tasks.
- Hybrid GPU/CPU workflows: Optimize cost and performance by combining CPU preprocessing with GPU inference.
Development workflow
A typical Flash development workflow looks like this:
- Write Python functions with the @remote decorator.
- Specify resource requirements and dependencies in the decorator.
- Run your script locally. Flash handles remote execution automatically.
For API endpoints, use flash init to create a project, then flash run to start your server. For a full walkthrough, see Create a Flash API endpoint.
Limitations
- Serverless deployments using Flash are currently restricted to the EU-RO-1 datacenter.
- Flash is designed primarily for local development and live-testing workflows.
- Endpoints created by Flash persist until manually deleted through the Runpod console. A flash undeploy command is currently in development to clean up unused endpoints.
- Be aware of your account’s maximum worker capacity limits. Flash can rapidly scale workers across multiple endpoints, and you may hit capacity constraints. Contact Runpod support to increase your account’s capacity allocation if needed.
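One client-side way to stay under a capacity limit is to cap how many calls are in flight at once. The sketch below uses asyncio.Semaphore for that. It is not a Flash feature: fake_remote_job is a hypothetical stand-in for a remote function, MAX_IN_FLIGHT is an arbitrary example value, and the sketch assumes that each concurrent call can occupy its own worker.

```python
import asyncio

MAX_IN_FLIGHT = 3  # hypothetical cap; choose a value below your account's worker limit


async def fake_remote_job(job_id, tracker):
    # Stand-in for a call to a remote function; records how many run at once.
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    await asyncio.sleep(0.01)
    tracker["active"] -= 1
    return job_id * 2


async def run_bounded(job_ids, limit=MAX_IN_FLIGHT):
    semaphore = asyncio.Semaphore(limit)
    tracker = {"active": 0, "peak": 0}

    async def guarded(job_id):
        # At most `limit` jobs hold the semaphore at any moment.
        async with semaphore:
            return await fake_remote_job(job_id, tracker)

    results = await asyncio.gather(*(guarded(j) for j in job_ids))
    return results, tracker["peak"]


results, peak = asyncio.run(run_bounded(range(10)))
print(results, peak)
```

All ten jobs complete, but the recorded peak never exceeds the cap, so the number of simultaneously busy workers stays bounded regardless of how many jobs are queued.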
Next steps
Quickstart
Get started with your first Flash function.
Configuration reference
Complete reference for resource configuration options.
Getting help
- Join the Runpod community on Discord for support and discussion.
Next steps
- View the resource configuration reference for all available options.
- Learn about pricing to optimize costs.
- Deploy Flash applications for production.