How to Deploy ML Solutions with FastAPI, Docker, and GCP

Step 1: Create API
FastAPI makes it super easy to take an existing Python script and turn it into an API with a few additional lines of code. Here’s what that looks like.
We’ll first import some handy libraries.
from fastapi import FastAPI
import polars as pl
from sentence_transformers import SentenceTransformer
from sklearn.metrics import DistanceMetric
import numpy as np
from app.functions import returnSearchResultIndexes
Next, we’ll define the building blocks of our semantic search function. Namely, the text embedding model, the title and transcript embeddings of the videos we want to search over, and a distance metric to evaluate which videos are most relevant to a user query. If you want a deeper dive into semantic search, I discussed it in a past article.
# define model info
model_name = 'all-MiniLM-L6-v2'

# load model
model = SentenceTransformer(model_name)
# load video index
df = pl.scan_parquet('app/data/video-index.parquet')
# create distance metric object
dist_name = 'manhattan'
dist = DistanceMetric.get_metric(dist_name)
Now, we define our API operations. Here, we will create 3 GET endpoints. The first one is shown in the code block below.
# create FastAPI object
app = FastAPI()

# API operations
@app.get("/")
def health_check():
    return {'health_check': 'OK'}
In the block above, we initialize a new FastAPI application using the FastAPI() class and then create a “health check” endpoint.
To do this, we define a Python function that takes in no inputs but returns a dictionary with the key “health_check” and a value of “OK.” To turn this function into an API endpoint, we simply add a decorator to it and specify the path of our endpoint. Here, we use the root, i.e., “/”.
Let’s see another example. Here, we’ll have an endpoint called info that returns more information about the API.
@app.get("/info")
def info():
    return {'name': 'yt-search', 'description': "Search API for Shaw Talebi's YouTube videos."}
We can see that this endpoint is very similar to the health check. This one, however, lives at the “/info” endpoint.
Finally, let’s make the search endpoint, which, given a user query, will return the titles and IDs of the most relevant videos.
@app.get("/search")
def search(query: str):
    idx_result = returnSearchResultIndexes(query, df, model, dist)
    return df.select(['title', 'video_id']).collect()[idx_result].to_dict(as_series=False)
For this endpoint, we require an input: the user’s query. This then gets passed to another Python function defined in another script that does all the math behind the search. While I won’t get into the details here, the curious reader can see the code on GitHub or the code walkthrough of the search function on YouTube.
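While the full implementation lives in app/functions.py (linked above), here is a minimal sketch of what such a function could look like. It assumes, purely for illustration, that the parquet file stores the embedding dimensions in every column other than title and video_id.

import numpy as np
import polars as pl

def returnSearchResultIndexes(query, df, model, dist, top_k=5):
    # embed the user query with the same model used to embed the videos
    query_embedding = model.encode(query).reshape(1, -1)

    # materialize the embedding columns from the lazy dataframe
    embeddings = df.select(pl.exclude(['title', 'video_id'])).collect().to_numpy()

    # compute the distance between the query and every video embedding
    distances = dist.pairwise(embeddings, query_embedding).flatten()

    # smaller distance = more relevant, so return the indexes of the top_k closest videos
    return np.argsort(distances)[:top_k]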
Since this function only returns the row numbers of the search results in the df dataframe, we need to use this output to grab the titles and video IDs we care about and then return them as a Python dictionary. It's important that all the outputs of our API endpoints be dictionaries because FastAPI serializes them to JSON, the standard response format for APIs.
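For example, the dictionary returned by the search endpoint gets serialized to JSON along these lines (the values here are placeholders):

{
  "title": ["<title of result 1>", "<title of result 2>"],
  "video_id": ["<id of result 1>", "<id of result 2>"]
}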
Notice that the code blocks above referenced two external files: app/functions.py and app/data/video-index.parquet, which implies the following directory structure.
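Based on the file paths above (plus app/main.py, which the uvicorn command below points to), that structure looks roughly like this:

.
└── app/
    ├── main.py
    ├── functions.py
    └── data/
        └── video-index.parquet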
To run this API locally, we can navigate to the root directory and run the following command.
uvicorn app.main:app --host 0.0.0.0 --port 8080
uvicorn is an ASGI web server for running Python web applications like the one we created with FastAPI. This command will serve the API locally at http://0.0.0.0:8080. The reason we used this host and port will become apparent later when we deploy on Google Cloud Run.
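With the server running, we can sanity-check the endpoints from another terminal. Here's a quick test using the requests library (the expected health check response comes straight from the code above):

import requests

# check the health endpoint
print(requests.get("http://localhost:8080/").json())
# {'health_check': 'OK'}

# try a search query
print(requests.get("http://localhost:8080/search", params={"query": "LLMs"}).json())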
Step 2: Create Docker Image
Yay, our API runs locally! Let’s take the next step toward running it in the cloud.
We do this by creating a Docker image for our API. This will require us to make 3 additional files: Dockerfile, requirements.txt, and app/__init__.py. Here’s what our directory should look like.
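.
├── Dockerfile
├── requirements.txt
└── app/
    ├── __init__.py
    ├── main.py
    ├── functions.py
    └── data/
        └── video-index.parquet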
The Dockerfile contains the step-by-step instructions for building our Docker image. requirements.txt specifies the Python libraries (with versions) needed to run our API. Finally, app/__init__.py marks the app folder as a Python package, ensuring that Python can find and properly import our API code when running in the container.
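For reference, a minimal requirements.txt for this API might look like the following. The version pins here are illustrative; pin whichever versions you develop against.

fastapi==0.104.1
uvicorn==0.24.0
polars==0.19.12
sentence-transformers==2.2.2
scikit-learn==1.3.2
numpy==1.26.2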
Here’s what the inside of the Dockerfile looks like.
# start from python base image
FROM python:3.10-slim

# change working directory
WORKDIR /code
# add requirements file to image
COPY ./requirements.txt /code/requirements.txt
# install python libraries
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
# add python code
COPY ./app/ /code/app/
# specify default commands
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
The first line bootstraps our image on top of an existing image that has Python 3.10 installed. Next, we change our working directory from the root to /code.
We then copy the requirements.txt file from our codebase to the Docker image. With this in place, we can install our requirements using pip.
Next, we copy over all the code for our API.
Note: we copied and installed the Python packages before copying the rest of the code because Docker caches each instruction as a layer. As long as requirements.txt doesn't change, the installation layer is reused, which avoids waiting a few minutes for the dependencies to install every time we rebuild the image during development.
Finally, we specify the default command which is the same thing we ran locally to run our API.
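Although we'll deploy from GitHub in the next step, we can sanity-check the image locally with something like the following (the image name here is arbitrary):

docker build -t yt-search-api .
docker run -p 8080:8080 yt-search-api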
Step 3: Deploy to Google Cloud
At this point, we can build our Docker image and push it to Docker Hub, from which we can readily deploy to many different cloud services. However, I’ll follow an alternative strategy here.
Instead, I will push all our code to GitHub, from which we can directly deploy the container to Google Cloud Run. This has two key benefits.
First, it avoids resolving system architecture discrepancies between your local system and what Google Cloud Run uses (this was specifically an issue for me because Mac runs on ARM64). Second, deploying from a GitHub repo allows for continuous deployment, so if we want to update our API, we can simply push new code to our repo, and a new container will be spun up automatically.
We start by creating a new GitHub repo.
Then, we clone the repo, add our code, and push it back to GitHub. The directory structure after adding the code is the same as in Step 2, now inside the cloned repo.
With the code in place, we can create a new Google Cloud Platform project. We do this by going to the GCP console, clicking on our project lists, and selecting “NEW PROJECT”.
Once our project has been created, we can open it and type “cloud run” into the search bar.
When that opens, we will click “CREATE SERVICE”. This will open up a page for us to configure our service.
First, we select the option to deploy our service from GitHub. Then click “SET UP WITH CLOUD BUILD”.
For the repository source, we will select GitHub and choose the repo we just made.
We will leave the branch as ^main$ and select the “Build Type” as Dockerfile.
Next, we return to the service configuration screen. We can name the service whatever we like (I leave it as the automatically generated one). For the region, I leave it as us-central1 because it’s a Tier 1 region, which offers the cheapest compute options (basically free for this example).
To keep things simple, I’ll “Allow unauthenticated invocations”. Of course, you will want to require authentication for most systems. Then, leave everything else as the default.
Finally, under “Container(s), Volumes, Networking, Security”, edit the container to have 1 GiB of memory. We must leave the PORT as 8080 because that’s what we configured in our Dockerfile.
We can leave everything else as default and hit “CREATE” at the bottom of the screen. After a few minutes, the container will be live!
We can then access the API using the URL specified near the top of the page. Clicking the link will open up the root endpoint, which was our health check.
We can manually run a GET request to the search API using the following URL: [your-app-url-here]/search?query=LLMs. This will search for videos relevant to LLMs.
Bonus: Integrating Into a UI
Once our API backend is set up, we can connect it to a user-friendly interface. I do that via Hugging Face Spaces, which hosts ML apps completely for free.
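To give a sense of what that wiring looks like, here is a minimal (hypothetical) Gradio app that calls the deployed API. The URL and output formatting are assumptions for illustration, not the code behind my actual Space.

import requests
import gradio as gr

API_URL = "https://your-cloud-run-url"  # assumption: replace with your Cloud Run service URL

def search_videos(query):
    # call the deployed search endpoint and format the results as links
    results = requests.get(f"{API_URL}/search", params={"query": query}).json()
    return "\n".join(
        f"{title}: https://youtu.be/{video_id}"
        for title, video_id in zip(results["title"], results["video_id"])
    )

demo = gr.Interface(fn=search_videos, inputs="text", outputs="text")
demo.launch()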
Here’s what that same search looks like through a UI. You can play around with the UI here and see the code here.
Data science is more than just training fancy models. It’s about solving problems and generating value. Often, this requires us to deploy models into settings where they can make the most impact. Here, we walked through a simple 3-step strategy for deploying ML models using FastAPI, Docker, and GCP.
While this concludes the Full Stack Data Science series, these articles are accompanied by a YouTube playlist with a bonus video on the experimentation involved in creating this search tool.