How pricing works
You’re billed from when a worker starts until it completes your request, plus any idle time before scaling down. If a worker is already warm, you skip the cold start and only pay for execution time.

Compute cost breakdown
Flash workers incur charges during these periods:

- Start time: The time required to initialize a worker and load models into GPU memory. This includes starting the container, installing dependencies, and preparing the runtime environment.
- Execution time: The time spent processing your request (running your `@remote`-decorated function).
- Idle time: The period a worker remains active after completing a request, waiting for additional requests before scaling down.
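As a worked example, the three billed periods simply add up. The per-second rate below is illustrative only, not an actual Runpod price; see the Serverless pricing page for current rates:

```python
# Hypothetical per-second rate, for illustration only.
rate_per_second = 0.00031  # USD/s (illustrative)

start_time = 12.0      # seconds: container start + model load (cold start)
execution_time = 4.5   # seconds: running the @remote function
idle_time = 5.0        # seconds: waiting before scaling down

billed_seconds = start_time + execution_time + idle_time
cost = billed_seconds * rate_per_second
print(f"Cold request: {billed_seconds:.1f}s billed, ${cost:.6f}")

# A warm worker skips the cold start, so only execution time
# (plus any idle time) is billed.
warm_cost = (execution_time + idle_time) * rate_per_second
print(f"Warm request: ${warm_cost:.6f}")
```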
Pricing by resource type
Flash supports both GPU and CPU workers. Pricing varies based on the hardware type:

- GPU workers: Use `LiveServerless` or `ServerlessEndpoint` with GPU configurations. Pricing depends on the GPU type (e.g., RTX 4090, A100 80GB).
- CPU workers: Use `LiveServerless` or `CpuServerlessEndpoint` with CPU configurations. Pricing depends on the CPU instance type.
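The hardware type is chosen in the resource configuration passed to `@remote`. A minimal sketch: the `tetra_rp` import path and constructor fields shown here are assumptions for illustration, not verified against the SDK reference:

```python
# Sketch only: import path and constructor fields are assumptions.
from tetra_rp import remote, LiveServerless, GpuGroup

# GPU worker: billed by GPU type.
gpu_config = LiveServerless(
    name="gpu-example",
    gpus=[GpuGroup.ADA_24],  # e.g., an RTX 4090-class GPU with 24 GB VRAM
)

@remote(resource_config=gpu_config)
def run_inference(prompt: str) -> str:
    ...  # executes on a GPU worker; billed at the GPU rate
```

A CPU endpoint is configured the same way, substituting a `CpuServerlessEndpoint` configuration, and is billed at the (lower) CPU instance rate.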
How to estimate and optimize costs
To estimate costs for your Flash workloads, consider:

- How long each function takes to execute.
- How many concurrent workers you need (`workersMax` setting).
- Which GPU or CPU types you’ll use.
- Your idle timeout configuration (`idleTimeout` setting).
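These inputs can be folded into a rough estimate. The `estimate_monthly_cost` helper below is a hypothetical back-of-envelope sketch, not Runpod's billing formula; it also charges every request the full idle timeout, which makes it an upper bound:

```python
def estimate_monthly_cost(
    seconds_per_request: float,
    requests_per_month: int,
    rate_per_second: float,           # taken from the Serverless pricing page
    cold_start_seconds: float = 0.0,
    cold_start_fraction: float = 0.1, # share of requests that hit a cold worker
    idle_timeout_seconds: float = 5.0,
) -> float:
    """Rough monthly cost: execution + amortized cold starts + idle time."""
    exec_cost = seconds_per_request * requests_per_month * rate_per_second
    start_cost = (cold_start_seconds * requests_per_month
                  * cold_start_fraction * rate_per_second)
    # Pessimistic: assumes each request is followed by a full idle period.
    idle_cost = idle_timeout_seconds * requests_per_month * rate_per_second
    return exec_cost + start_cost + idle_cost
```

For example, 1,000 requests/month at 2 s each, a 10 s cold start hit on 10% of requests, and a 5 s idle timeout at an illustrative $0.001/s comes to about $8/month.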
Cost optimization strategies
Choose appropriate hardware
Select the smallest GPU or CPU that meets your performance requirements. For example, if your workload fits in 24 GB of VRAM, use `GpuGroup.ADA_24` or `GpuGroup.AMPERE_24` instead of larger GPUs.
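One way to make that selection mechanical is to map each GPU group to its VRAM and pick the smallest that fits. The capacities below are nominal VRAM sizes and the helper is a hypothetical illustration, not part of the Flash SDK:

```python
# Nominal VRAM per GPU group; names mirror the GpuGroup enum members,
# capacities are the advertised VRAM sizes.
GPU_VRAM_GB = {
    "ADA_24": 24,
    "AMPERE_24": 24,
    "AMPERE_48": 48,
    "AMPERE_80": 80,
}

def smallest_fitting_gpu(required_vram_gb: float) -> str:
    """Return the smallest GPU group that fits the workload's VRAM need.

    Ties on VRAM are broken alphabetically by group name.
    """
    candidates = [(vram, name) for name, vram in GPU_VRAM_GB.items()
                  if vram >= required_vram_gb]
    if not candidates:
        raise ValueError("Workload exceeds available single-GPU VRAM")
    return min(candidates)[1]
```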
Configure idle timeouts
Balance responsiveness and cost by adjusting the `idleTimeout` parameter. Shorter timeouts reduce idle costs but increase cold starts for sporadic traffic.
Use CPU workers for non-GPU tasks
For data preprocessing, postprocessing, or other tasks that don’t require GPU acceleration, use CPU workers instead of GPU workers.

Limit maximum workers

Set `workersMax` to prevent runaway scaling and unexpected costs:
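A minimal sketch, assuming the `tetra_rp` import path and constructor fields (they are assumptions, not verified against the SDK reference):

```python
# Sketch only: import path and field names are assumptions.
from tetra_rp import remote, LiveServerless, GpuGroup

config = LiveServerless(
    name="capped-endpoint",
    gpus=[GpuGroup.ADA_24],
    workersMax=3,    # never scale beyond 3 concurrent workers
    idleTimeout=5,   # scale an idle worker down after 5 seconds
)

@remote(resource_config=config)
def process(batch: list[str]) -> list[str]:
    ...
```

Capping `workersMax` bounds your worst-case concurrent spend to `workersMax` times the per-worker rate, regardless of how much traffic arrives.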
Monitoring costs
Monitor your usage in the Runpod console to track:

- Total compute time across endpoints.
- Worker utilization and idle time.
- Cost breakdown by endpoint.
Next steps
- Create remote functions with optimized resource configurations.
- View Serverless pricing details for current rates.
- Configure resources for your workloads.