What you’ll learn
In this tutorial you’ll learn how to:
- Set up your development environment for Flash.
- Configure a Serverless endpoint using a `LiveServerless` object.
- Create and define remote functions with the `@remote` decorator.
- Deploy a GPU-based workload using Runpod resources.
- Pass data between your local environment and remote workers.
- Run multiple operations in parallel.
Requirements
- You’ve created a Runpod account.
- You’ve created a Runpod API key.
- You’ve installed Python 3.9 (or higher).
Step 1: Install Flash
Use `pip` to install Flash:
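A minimal install command, assuming Flash is published on PyPI as `tetra-rp` (an assumption; check the Flash documentation for the current package name):

```sh
# Install the Flash SDK (package name assumed).
pip install tetra-rp
```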
Step 2: Add your API key to the environment
Add your Runpod API key to your development environment before using Flash to run workloads. Run this command to create a `.env` file, replacing `YOUR_API_KEY` with your Runpod API key:
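A sketch of the command, assuming Flash reads the key from the `RUNPOD_API_KEY` environment variable (the variable name is an assumption):

```sh
# Write the API key to a .env file (variable name assumed).
echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env
```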
You can create this in your project’s root directory or in the `/examples` folder. Make sure your `.env` file is in the same folder as the Python file you create in the next step.
Step 3: Create your project file
Create a new file called `matrix_operations.py` in the same directory as your `.env` file:
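For example, on macOS or Linux:

```sh
# Create the empty project file.
touch matrix_operations.py
```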
Step 4: Add imports and load the .env file
Add the necessary import statements:
- `asyncio`: Python’s asynchronous programming library, which Flash uses for non-blocking execution.
- `dotenv`: Loads environment variables from your `.env` file, including your Runpod API key.
- `remote` and `LiveServerless`: The core Flash components for defining remote functions and their resource requirements.
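A sketch of these imports, assuming Flash’s Python module is named `tetra_rp` (an assumption; adjust to the import path in the Flash docs):

```python
import asyncio

from dotenv import load_dotenv
from tetra_rp import remote, LiveServerless  # module name assumed

# Read the Runpod API key from the .env file.
load_dotenv()
```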
`load_dotenv()` reads your API key from the `.env` file and makes it available to Flash.
Step 5: Add Serverless endpoint configuration
Define the Serverless endpoint configuration for your Flash workload. The `LiveServerless` object defines:
- `gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24]`: The GPUs that can be used by workers on this endpoint. This restricts workers to using any 24 GB GPU (L4, A5000, 3090, or 4090). See GPU pools for available GPU pool IDs. Removing this parameter allows the endpoint to use any available GPUs.
- `workersMax=3`: The maximum number of worker instances.
- `name="tetra_gpu"`: The name of the endpoint that will be created or reused in the Runpod console.
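A sketch of this configuration, assuming `GpuGroup` is importable alongside `LiveServerless` (an assumption):

```python
from tetra_rp import LiveServerless, GpuGroup  # module name assumed

# Endpoint configuration for GPU workers.
gpu_config = LiveServerless(
    gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24],  # any 24 GB GPU (L4, A5000, 3090, 4090)
    workersMax=3,  # maximum number of worker instances
    name="tetra_gpu",  # endpoint name shown in the Runpod console
)
```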
If your `LiveServerless` configuration is identical to a prior run, Runpod reuses your existing endpoint rather than creating a new one. However, if any configuration value has changed (not just the `name` parameter), a new endpoint will be created.
Step 6: Define your remote function
Define the function that will run on the GPU worker:
- `@remote`: The decorator that marks the function for remote execution on Runpod’s infrastructure.
- `resource_config=gpu_config`: The function runs using the GPU configuration defined earlier.
- `dependencies=["numpy", "torch"]`: Python packages that must be installed on the remote worker.
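A sketch of the function; the returned statistics and dictionary keys are illustrative:

```python
@remote(
    resource_config=gpu_config,  # run on the endpoint defined above
    dependencies=["numpy", "torch"],  # installed on the remote worker
)
def tetra_matrix_operations(size):
    # These imports run on the remote worker, not locally.
    import numpy as np
    import torch

    # Get GPU details using PyTorch's CUDA utilities.
    gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"

    # Create two large random matrices and multiply them.
    a = np.random.rand(size, size)
    b = np.random.rand(size, size)
    result = np.matmul(a, b)

    # Return statistics about the result and information about the GPU.
    return {
        "matrix_size": size,
        "result_mean": float(result.mean()),
        "result_max": float(result.max()),
        "gpu_name": gpu_name,
    }
```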
The `tetra_matrix_operations` function:
- Gets GPU details using PyTorch’s CUDA utilities.
- Creates two large random matrices using NumPy.
- Performs matrix multiplication.
- Returns statistics about the result and information about the GPU.
Note that `numpy` and `torch` are imported inside the function, not at the top of the file. These imports need to happen on the remote worker, not in your local environment.
Step 7: Add the main function
Add a `main` function to execute your GPU workload:
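A sketch, reusing the illustrative result keys returned by `tetra_matrix_operations` above:

```python
async def main():
    # Run the matrix operations on a remote GPU worker and wait for the result.
    result = await tetra_matrix_operations(1000)

    # Print the results of the matrix operations.
    print(f"Matrix size: {result['matrix_size']}")
    print(f"Result mean: {result['result_mean']:.4f}")
    print(f"Result max: {result['result_max']:.4f}")

    # Display information about the GPU that was used.
    print(f"GPU: {result['gpu_name']}")


if __name__ == "__main__":
    asyncio.run(main())
```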
The `main` function:
- Calls the remote function with `await`, which runs it asynchronously on Runpod’s infrastructure.
- Prints the results of the matrix operations.
- Displays information about the GPU that was used.
`asyncio.run(main())` is Python’s standard way to execute an asynchronous main function from synchronous code.
All code outside of the `@remote`-decorated function runs on your local machine. The `main` function acts as a bridge between your local environment and Runpod’s cloud infrastructure, allowing you to send input data to remote functions, wait for remote execution to complete without blocking your local process, and process returned results locally.
The `await` keyword pauses execution of the `main` function until the remote operation completes, but doesn’t block the entire Python process.
Step 8: Run your GPU example
Run the example:
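For example:

```sh
python matrix_operations.py
```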
Troubleshooting authentication issues
If you’re having trouble running your code due to authentication issues:
- Verify your `.env` file is in the same directory as your `matrix_operations.py` file.
- Check that the API key in your `.env` file is correct and properly formatted.
Step 9: Understand what’s happening
When you run this script:
- Flash reads your GPU resource configuration and provisions a worker on Runpod.
- It installs the required dependencies (NumPy and PyTorch) on the worker.
- Your `tetra_matrix_operations` function runs on the remote worker.
- The function creates and multiplies large matrices, then calculates statistics.
- Your local `main` function receives these results and displays them in your terminal.
Step 10: Run multiple operations in parallel
Flash makes it easy to run multiple remote operations in parallel. Replace your `main` function with this code:
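A sketch of the parallel version, using the same illustrative result keys as above:

```python
async def main():
    # Launch three matrix operations of different sizes simultaneously.
    results = await asyncio.gather(
        tetra_matrix_operations(500),
        tetra_matrix_operations(1000),
        tetra_matrix_operations(2000),
    )

    # Print the statistics returned by each worker.
    for result in results:
        print(
            f"Size {result['matrix_size']}: "
            f"mean={result['result_mean']:.4f}, GPU={result['gpu_name']}"
        )


if __name__ == "__main__":
    asyncio.run(main())
```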
This `main` function demonstrates Flash’s ability to run multiple operations in parallel using `asyncio.gather()`. Instead of running one matrix operation at a time, you launch three operations with different matrix sizes (500, 1000, and 2000) simultaneously. This parallel execution significantly improves efficiency when you have multiple independent tasks.
Run the example again:
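The same command as before:

```sh
python matrix_operations.py
```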
Next steps
You’ve successfully used Flash to run a GPU workload on Runpod. Now you can:
- Create more complex remote functions with custom dependencies and resource configurations.
- Build API endpoints using FastAPI.
- Deploy Flash applications for production use.
- Explore more examples on the runpod/flash-examples GitHub repository.