Build process and handler generation
When you run `flash build`, the following happens:
- Discovery: Flash scans your code for `@remote`-decorated functions (an example appears after this list).
- Grouping: Functions are grouped by their `resource_config`.
- Handler generation: For each resource config, Flash generates a lightweight handler file.
- Manifest creation: A `flash_manifest.json` file maps functions to their endpoints.
- Dependency installation: Python packages are installed with Linux `x86_64` compatibility.
- Packaging: Everything is bundled into `archive.tar.gz` for deployment.
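For reference, this is the kind of function the discovery step looks for. The import path and `ResourceConfig` fields below are illustrative assumptions, not the exact Flash API; consult the resource configuration reference for the real names.

```python
# A minimal sketch of a @remote-decorated function that `flash build` would
# discover. Import path and config fields are assumptions, not the exact API.
from flash import remote, ResourceConfig  # hypothetical import path

gpu_config = ResourceConfig(gpu="A100")   # hypothetical resource config

@remote(gpu_config)
def embed(texts: list[str]) -> list[list[float]]:
    # Worker-only dependencies are imported inside the function so they
    # resolve on the worker, not on your local machine.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts).tolist()
```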
Handler architecture
Flash uses a factory pattern for handlers to eliminate code duplication (a generic sketch follows the list):
- Single source of truth: All handler logic in one place.
- Easier maintenance: Bug fixes don’t require rebuilding projects.
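To make the factory idea concrete, here is a generic sketch of the pattern. It is not Flash's actual generated handler code, just the general shape of keeping handler logic in one shared function.

```python
# Generic illustration of a handler factory; not Flash's generated code.
from typing import Any, Callable

def make_handler(func: Callable[..., Any]) -> Callable[[dict], Any]:
    """Wrap a user function in the shared handler logic exactly once."""
    def handler(event: dict) -> Any:
        # Common concerns (input parsing, error handling) live here,
        # instead of being duplicated into every generated handler file.
        payload = event.get("input", {})
        return func(**payload)
    return handler

# Each generated per-resource handler file can then stay tiny, e.g.:
#   from my_app.workers import embed
#   handler = make_handler(embed)
```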
Cross-platform builds
Flash automatically handles cross-platform builds, ensuring your deployments work correctly regardless of your development platform:
- Automatic platform targeting: Dependencies are installed for Linux `x86_64` (required for Runpod Serverless), even when building on macOS or Windows. A rough pip equivalent is sketched after this list.
- Python version matching: The build uses your current Python version to ensure package compatibility.
- Binary wheel enforcement: Only pre-built binary wheels are used, preventing platform-specific compilation issues.
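For intuition, the effect is comparable to a platform-targeted, binary-only pip install like the one below. This is only an approximation of what the build does, not the exact command Flash runs, and the Python version and target directory shown are examples.

```bash
# Approximation of a platform-targeted, binary-only dependency install.
# Not the exact command Flash runs; shown only to illustrate the idea.
pip install \
  --platform manylinux2014_x86_64 \
  --python-version 3.11 \
  --only-binary=:all: \
  --target .flash/.build/deps \
  -r requirements.txt
```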
Cross-endpoint function calls
Flash enables functions on different endpoints to call each other, using the `flash_manifest.json` file generated during the build process to resolve each function's endpoint. This lets you build pipelines that use CPU workers for preprocessing and GPU workers for inference, optimizing costs by using the appropriate hardware for each task.
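As a sketch of such a pipeline (the import path and config names are assumptions, as in the earlier example):

```python
# A CPU worker that preprocesses input and then calls a GPU worker.
# Import path and config names are illustrative assumptions.
from flash import remote, ResourceConfig  # hypothetical import path

cpu_config = ResourceConfig(cpu=2)        # hypothetical CPU config
gpu_config = ResourceConfig(gpu="A100")   # hypothetical GPU config

@remote(gpu_config)
def run_inference(features: list[float]) -> dict:
    import torch  # heavy import stays inside the remote function
    return {"score": float(torch.tensor(features).sum())}

@remote(cpu_config)
def preprocess_and_predict(raw_text: str) -> dict:
    # Cheap preprocessing on the CPU endpoint...
    features = [float(len(token)) for token in raw_text.split()]
    # ...then a cross-endpoint call resolved via flash_manifest.json.
    return run_inference(features)
```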
Build artifacts
After running `flash build`, you’ll find these artifacts in the `.flash/` directory:
| Artifact | Description |
|---|---|
| `.flash/.build/` | Temporary build directory (removed unless `--keep-build`) |
| `.flash/archive.tar.gz` | Deployment package |
| `.flash/flash_manifest.json` | Service discovery configuration |
Managing bundle size
Runpod Serverless has a 500MB deployment limit. Exceeding this limit will cause your build to fail. Use `--exclude` to skip packages that are already included in your base worker image (see the example after this list):
- GPU resources use PyTorch as the base image, which has `torch`, `torchvision`, and `torchaudio` pre-installed.
- CPU resources use Python slim images, which have no ML frameworks pre-installed.
- Load-balancer resources use the same base image as their GPU/CPU counterparts.
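For example, when deploying to a GPU resource you might exclude the packages the PyTorch base image already provides. The exact flag syntax (repeated vs. comma-separated) may vary by Flash version, so verify it against `flash build --help`.

```bash
# Skip packages that the PyTorch base image already ships with.
# Flag syntax is assumed; verify against `flash build --help`.
flash build --exclude torch --exclude torchvision --exclude torchaudio
```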
Troubleshooting
No @remote functions found
If the build process can’t find your remote functions:
- Ensure your functions are decorated with `@remote(resource_config)`.
- Check that Python files are not excluded by `.gitignore` or `.flashignore`.
- Verify function decorators have valid syntax.
Handler generation failed
If handler generation fails:
- Check for syntax errors in your Python files (they should be logged in the terminal).
- Verify all imports in your worker modules are available.
- Ensure resource config variables (e.g., `gpu_config`) are defined before a function references them (see the sketch after this list).
- Use `--keep-build` to inspect generated handler files in `.flash/.build/`.
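The ordering issue looks like this in practice (import path and config constructor are assumptions, as in the earlier examples):

```python
# The resource config must exist before the decorator line that uses it.
# Import path and ResourceConfig fields are illustrative assumptions.
from flash import remote, ResourceConfig  # hypothetical import path

gpu_config = ResourceConfig(gpu="A100")   # defined first

@remote(gpu_config)                       # decorator can now resolve gpu_config
def generate(prompt: str) -> str:
    from transformers import pipeline     # worker-only import stays inside

    generator = pipeline("text-generation", model="gpt2")
    return generator(prompt, max_length=50)[0]["generated_text"]
```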
Build succeeded but deployment failed
If the build succeeds but deployment fails:
- Verify all function imports work in the deployment environment.
- Check that environment variables required by your functions are available.
- Review the generated `flash_manifest.json` for correct function mappings.
Dependency installation failed
If dependency installation fails during the build:
- If a package doesn’t have pre-built Linux `x86_64` wheels, the build will fail with an error.
- For newer Python versions (3.13+), some packages may require `manylinux_2_27` or higher.
- Ensure you have standard pip installed (`python -m ensurepip --upgrade`) for best compatibility.
- Check PyPI to verify the package supports your Python version on Linux.
Authentication errors
If you’re seeing authentication errors, verify your API key is set correctly:
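For example, in a shell session (the `RUNPOD_API_KEY` variable name follows the usual Runpod convention; adjust if your setup configures the key differently):

```bash
# Set the API key for the current shell session, then confirm it is present.
# RUNPOD_API_KEY is assumed to be the variable Flash reads.
export RUNPOD_API_KEY="your-api-key"
echo "${RUNPOD_API_KEY:+API key is set}"
```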
Import errors in remote functions
Remember to import packages inside remote functions:
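A minimal sketch, with the decorator and config names assumed as in the earlier examples:

```python
# Worker-only dependencies are imported inside the remote function so they
# resolve on the worker rather than on your local machine.
# Import path and config are illustrative assumptions.
from flash import remote, ResourceConfig  # hypothetical import path

gpu_config = ResourceConfig(gpu="A100")

@remote(gpu_config)
def classify(image_bytes: bytes) -> str:
    import io
    from PIL import Image  # imported here, not at module top level

    image = Image.open(io.BytesIO(image_bytes))
    return "portrait" if image.height >= image.width else "landscape"
```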
To optimize performance:- Set
workersMin=1to keep workers warm and avoid cold starts. - Use
idleTimeoutto balance cost and responsiveness. - Choose appropriate GPU types for your workload.
- Use
--auto-provisionwithflash runto eliminate cold-start delays during development.
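For instance, a warm-worker configuration might look like the following. The `workersMin` and `idleTimeout` names come from this page, but the `ResourceConfig` constructor and import path are assumptions; see the resource configuration reference for the exact fields.

```python
# Keep one worker warm and scale idle workers down after 60 seconds.
# Constructor and import path are assumptions; field names follow this page.
from flash import remote, ResourceConfig  # hypothetical import path

gpu_config = ResourceConfig(
    gpu="A100",
    workersMin=1,    # one worker always warm, so requests skip cold starts
    idleTimeout=60,  # seconds an idle worker waits before scaling down
)

@remote(gpu_config)
def infer(prompt: str) -> str:
    from transformers import pipeline  # worker-only import

    return pipeline("text-generation", model="gpt2")(prompt)[0]["generated_text"]
```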
Next steps
- View the resource configuration reference for all available options.
- Monitor and debug your deployments.
- Learn about pricing to optimize costs.