Build process and handler generation
When you run `flash build`, the following happens:
- Discovery: Flash scans your code for `@remote`-decorated functions (an example appears after this list).
- Grouping: Functions are grouped by their `resource_config`.
- Handler generation: For each resource config, Flash generates a lightweight handler file.
- Manifest creation: A `flash_manifest.json` file maps functions to their endpoints.
- Dependency installation: Python packages are installed with Linux `x86_64` compatibility.
- Packaging: Everything is bundled into `archive.tar.gz` for deployment.
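For reference, this is the kind of function the discovery step looks for. The import path and `ResourceConfig` fields below are illustrative assumptions, not the exact Flash API; consult the resource configuration reference for the real names.

```python
# A minimal sketch of a @remote-decorated function that `flash build` would
# discover. Import path and config fields are assumptions, not the exact API.
from flash import remote, ResourceConfig  # hypothetical import path

gpu_config = ResourceConfig(gpu="A100")   # hypothetical resource config

@remote(gpu_config)
def embed(texts: list[str]) -> list[list[float]]:
    # Worker-only dependencies are imported inside the function so they
    # resolve on the worker, not on your local machine.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts).tolist()
```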
Handler architecture
Flash uses a factory pattern for handlers to eliminate code duplication (a generic sketch follows the list):
- Single source of truth: All handler logic in one place.
- Easier maintenance: Bug fixes don’t require rebuilding projects.
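To make the factory idea concrete, here is a generic sketch of the pattern. It is not Flash's actual generated handler code, just the general shape of keeping handler logic in one shared function.

```python
# Generic illustration of a handler factory; not Flash's generated code.
from typing import Any, Callable

def make_handler(func: Callable[..., Any]) -> Callable[[dict], Any]:
    """Wrap a user function in the shared handler logic exactly once."""
    def handler(event: dict) -> Any:
        # Common concerns (input parsing, error handling) live here,
        # instead of being duplicated into every generated handler file.
        payload = event.get("input", {})
        return func(**payload)
    return handler

# Each generated per-resource handler file can then stay tiny, e.g.:
#   from my_app.workers import embed
#   handler = make_handler(embed)
```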
Cross-platform builds
Flash automatically handles cross-platform builds, ensuring your deployments work correctly regardless of your development platform:
- Automatic platform targeting: Dependencies are installed for Linux `x86_64` (required for Runpod Serverless), even when building on macOS or Windows. A rough pip equivalent is sketched after this list.
- Python version matching: The build uses your current Python version to ensure package compatibility.
- Binary wheel enforcement: Only pre-built binary wheels are used, preventing platform-specific compilation issues.
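For intuition, the effect is comparable to a platform-targeted, binary-only pip install like the one below. This is only an approximation of what the build does, not the exact command Flash runs, and the Python version and target directory shown are examples.

```bash
# Approximation of a platform-targeted, binary-only dependency install.
# Not the exact command Flash runs; shown only to illustrate the idea.
pip install \
  --platform manylinux2014_x86_64 \
  --python-version 3.11 \
  --only-binary=:all: \
  --target .flash/.build/deps \
  -r requirements.txt
```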
Cross-endpoint function calls
Flash enables functions on different endpoints to call each other, using the `flash_manifest.json` file generated during the build process to resolve each function's endpoint. This lets you build pipelines that use CPU workers for preprocessing and GPU workers for inference, optimizing costs by using the appropriate hardware for each task.
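As a sketch of such a pipeline (the import path and config names are assumptions, as in the earlier example):

```python
# A CPU worker that preprocesses input and then calls a GPU worker.
# Import path and config names are illustrative assumptions.
from flash import remote, ResourceConfig  # hypothetical import path

cpu_config = ResourceConfig(cpu=2)        # hypothetical CPU config
gpu_config = ResourceConfig(gpu="A100")   # hypothetical GPU config

@remote(gpu_config)
def run_inference(features: list[float]) -> dict:
    import torch  # heavy import stays inside the remote function
    return {"score": float(torch.tensor(features).sum())}

@remote(cpu_config)
def preprocess_and_predict(raw_text: str) -> dict:
    # Cheap preprocessing on the CPU endpoint...
    features = [float(len(token)) for token in raw_text.split()]
    # ...then a cross-endpoint call resolved via flash_manifest.json.
    return run_inference(features)
```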
Build artifacts
After running `flash build`, you’ll find these artifacts in the `.flash/` directory:
| Artifact | Description |
|---|---|
| `.flash/.build/` | Temporary build directory (removed unless `--keep-build`) |
| `.flash/archive.tar.gz` | Deployment package |
| `.flash/flash_manifest.json` | Service discovery configuration |
Managing bundle size
Runpod Serverless has a 500MB deployment limit. Exceeding this limit will cause your build to fail. Use `--exclude` to skip packages that are already included in your base worker image (see the example after this list):
- GPU resources use PyTorch as the base image, which has `torch`, `torchvision`, and `torchaudio` pre-installed.
- CPU resources use Python slim images, which have no ML frameworks pre-installed.
- Load-balancer resources use the same base image as their GPU/CPU counterparts.
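For example, when deploying to a GPU resource you might exclude the packages the PyTorch base image already provides. The exact flag syntax (repeated vs. comma-separated) may vary by Flash version, so verify it against `flash build --help`.

```bash
# Skip packages that the PyTorch base image already ships with.
# Flag syntax is assumed; verify against `flash build --help`.
flash build --exclude torch --exclude torchvision --exclude torchaudio
```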
Troubleshooting
No @remote functions found
If the build process can’t find your remote functions:
- Ensure your functions are decorated with `@remote(resource_config)`.
- Check that Python files are not excluded by `.gitignore` or `.flashignore`.
- Verify function decorators have valid syntax.
Handler generation failed
If handler generation fails:
- Check for syntax errors in your Python files (they should be logged in the terminal).
- Verify all imports in your worker modules are available.
- Ensure resource config variables (e.g., `gpu_config`) are defined before a function references them (see the sketch after this list).
- Use `--keep-build` to inspect generated handler files in `.flash/.build/`.
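The ordering issue looks like this in practice (import path and config constructor are assumptions, as in the earlier examples):

```python
# The resource config must exist before the decorator line that uses it.
# Import path and ResourceConfig fields are illustrative assumptions.
from flash import remote, ResourceConfig  # hypothetical import path

gpu_config = ResourceConfig(gpu="A100")   # defined first

@remote(gpu_config)                       # decorator can now resolve gpu_config
def generate(prompt: str) -> str:
    from transformers import pipeline     # worker-only import stays inside

    generator = pipeline("text-generation", model="gpt2")
    return generator(prompt, max_length=50)[0]["generated_text"]
```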
Build succeeded but deployment failed
If the build succeeds but deployment fails:
- Verify all function imports work in the deployment environment.
- Check that environment variables required by your functions are available.
- Review the generated `flash_manifest.json` for correct function mappings.
Dependency installation failed
If dependency installation fails during the build:
- If a package doesn’t have pre-built Linux `x86_64` wheels, the build will fail with an error.
- For newer Python versions (3.13+), some packages may require `manylinux_2_27` or higher.
- Ensure you have standard pip installed (`python -m ensurepip --upgrade`) for best compatibility.
- Check PyPI to verify the package supports your Python version on Linux.
Authentication errors
If you’re seeing authentication errors, verify your API key is set correctly:
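For example, in a shell session (the `RUNPOD_API_KEY` variable name follows the usual Runpod convention; adjust if your setup configures the key differently):

```bash
# Set the API key for the current shell session, then confirm it is present.
# RUNPOD_API_KEY is assumed to be the variable Flash reads.
export RUNPOD_API_KEY="your-api-key"
echo "${RUNPOD_API_KEY:+API key is set}"
```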
Import errors in remote functions
Remember to import packages inside remote functions:
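A minimal sketch, with the decorator and config names assumed as in the earlier examples:

```python
# Worker-only dependencies are imported inside the remote function so they
# resolve on the worker rather than on your local machine.
# Import path and config are illustrative assumptions.
from flash import remote, ResourceConfig  # hypothetical import path

gpu_config = ResourceConfig(gpu="A100")

@remote(gpu_config)
def classify(image_bytes: bytes) -> str:
    import io
    from PIL import Image  # imported here, not at module top level

    image = Image.open(io.BytesIO(image_bytes))
    return "portrait" if image.height >= image.width else "landscape"
```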
To optimize performance:- Set
workersMin=1to keep workers warm and avoid cold starts. - Use
idleTimeoutto balance cost and responsiveness. - Choose appropriate GPU types for your workload.
- Use
--auto-provisionwithflash runto eliminate cold-start delays during development.
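For instance, a warm-worker configuration might look like the following. The `workersMin` and `idleTimeout` names come from this page, but the `ResourceConfig` constructor and import path are assumptions; see the resource configuration reference for the exact fields.

```python
# Keep one worker warm and scale idle workers down after 60 seconds.
# Constructor and import path are assumptions; field names follow this page.
from flash import remote, ResourceConfig  # hypothetical import path

gpu_config = ResourceConfig(
    gpu="A100",
    workersMin=1,    # one worker always warm, so requests skip cold starts
    idleTimeout=60,  # seconds an idle worker waits before scaling down
)

@remote(gpu_config)
def infer(prompt: str) -> str:
    from transformers import pipeline  # worker-only import

    return pipeline("text-generation", model="gpt2")(prompt)[0]["generated_text"]
```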
Next steps
- View the resource configuration reference for all available options.
- Monitor and debug your deployments.
- Learn about pricing to optimize costs.