How to Deploy ML Solutions with FastAPI, Docker, and GCP


Step 1: Create API

FastAPI makes it super easy to take an existing Python script and turn it into an API with a few additional lines of code. Here’s what that looks like.

We’ll first import some handy libraries.

from fastapi import FastAPI
import polars as pl
from sentence_transformers import SentenceTransformer
from sklearn.metrics import DistanceMetric
import numpy as np
from app.functions import returnSearchResultIndexes

Next, we’ll define the building blocks of our semantic search function. Namely, the text embedding model, the title and transcript embeddings of the videos we want to search over, and a distance metric to evaluate which videos are most relevant to a user query. If you want a deeper dive into semantic search, I discussed it in a past article.

# define model info
model_name = 'all-MiniLM-L6-v2'

# load model
model = SentenceTransformer(model_name)

# load video index
df = pl.scan_parquet('app/data/video-index.parquet')

# create distance metric object
dist_name = 'manhattan'
dist = DistanceMetric.get_metric(dist_name)

Now, we define our API operations. Here, we will create three GET endpoints. The first one is shown in the code block below.

# create FastAPI object
app = FastAPI()

# API operations
@app.get("/")
def health_check():
return {'health_check': 'OK'}

In the block above, we initialize a new FastAPI application using the FastAPI() class and then create a “health check” endpoint.

To do this, we define a Python function that takes in no inputs but returns a dictionary with the key “health_check” and a value of “OK.” To turn this function into an API endpoint, we simply add a decorator to it and specify the path of our endpoint. Here, we use the root, i.e., “/”.
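As an optional sanity check, FastAPI ships with a TestClient that lets us call endpoints in plain Python without starting a server:

from fastapi.testclient import TestClient
from app.main import app

# create a test client wrapping our FastAPI app
client = TestClient(app)

# call the root endpoint and check the response
response = client.get("/")
assert response.json() == {'health_check': 'OK'}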

Let’s see another example. Here, we’ll have an endpoint called info that returns more information about the API.

@app.get("/info")
def info():
return {'name': 'yt-search', 'description': "Search API for Shaw Talebi's YouTube videos."}

We can see that this endpoint is very similar to the health check. This one, however, lives at the “/info” endpoint.

Finally, let’s make the search endpoint, which, given a user query, will return the titles and IDs of the most relevant videos.

@app.get("/search")
def search(query: str):
idx_result = returnSearchResultIndexes(query, df, model, dist)
return df.select(['title', 'video_id']).collect()[idx_result].to_dict(as_series=False)

For this endpoint, we require an input: the user’s query. The query then gets passed to a function defined in a separate script (app/functions.py) that does all the math behind the search. While I won’t get into the details here, the curious reader can see the code on GitHub or the code walkthrough of the search function on YouTube.

Since this function only returns the row numbers of the search results in the df dataframe, we need to use this output to grab the titles and video IDs we care about and then return them as a Python dictionary. It’s important that all the outputs of our API endpoints be dictionaries so that FastAPI can serialize them into JSON, the standard response format for APIs.
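While the full implementation lives in app/functions.py, here is a minimal sketch of what a function like returnSearchResultIndexes might do. The column layout and the top-k cutoff below are illustrative assumptions, not the exact code from the repo.

import numpy as np

def returnSearchResultIndexes(query, df, model, dist, k=5):
    # embed the user query with the same model used for the video index
    query_embedding = model.encode(query).reshape(1, -1)

    # assume (for this sketch) the parquet file stores one embedding
    # dimension per column, aside from 'title' and 'video_id'
    embeddings = df.collect().drop(['title', 'video_id']).to_numpy()

    # compute the Manhattan distance from the query to every video
    distances = dist.pairwise(query_embedding, embeddings).flatten()

    # return the row indices of the k closest videos
    return np.argsort(distances)[:k]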

Notice that the code blocks above referenced two external files: app/functions.py and app/data/video-index.parquet, which implies the following directory structure.

The directory structure of FastAPI API. Image by author.

To run this API locally, we can navigate to the root directory and run the following command.

uvicorn app.main:app --host 0.0.0.0 --port 8080

uvicorn is an ASGI web server for running Python web applications like the one we created with FastAPI. This command serves the API locally at http://0.0.0.0:8080. The reason we used this host and port will become apparent later when we deploy on Google Cloud Run.
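Once the server is up, we can sanity-check each endpoint from another terminal, e.g., with curl (the query value here is just an example):

curl http://0.0.0.0:8080/
curl http://0.0.0.0:8080/info
curl "http://0.0.0.0:8080/search?query=LLMs"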

Step 2: Create Docker Image

Yay, our API runs locally! Let’s take the next step toward running it in the cloud.

We do this by creating a Docker image for our API. This requires three additional files: Dockerfile, requirements.txt, and app/__init__.py. Here’s what our directory should look like.

The directory structure for Docker image creation. Image by author (no pun intended).

The Dockerfile contains the step-by-step instructions for building our Docker image. requirements.txt specifies the Python libraries (with versions) needed to run our API. Finally, app/__init__.py marks the app folder as a Python package, ensuring that Python can find and properly import our API code when running in the container.
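For reference, a minimal requirements.txt for this project lists the libraries we imported earlier. The exact versions to pin depend on your environment, so the entries below are left unpinned for illustration.

fastapi
uvicorn
polars
sentence-transformers
scikit-learn
numpy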

Here’s what the inside of the Dockerfile looks like.

# start from python base image
FROM python:3.10-slim

# change working directory
WORKDIR /code

# add requirements file to image
COPY ./requirements.txt /code/requirements.txt

# install python libraries
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt

# add python code
COPY ./app/ /code/app/

# specify default commands
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]

The first line bootstraps our image on top of an existing image that has Python 3.10 installed. Next, we change our working directory from the root to /code.

We then copy the requirements.txt file from our codebase to the Docker image. With this in place, we can install our requirements using pip.

Next, we copy over all the code for our API.

Note: we copy and install the Python packages before everything else because Docker caches each build layer. As long as requirements.txt doesn’t change, the dependency-installation layer is reused, so we avoid waiting a few minutes for the dependencies to reinstall every time we rebuild the image during development.

Finally, we specify the default command, which is the same command we ran locally to start our API.
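Although we’ll deploy straight from GitHub in the next step, we can build and run the image locally with the standard Docker CLI to confirm everything works (the yt-search tag is just an example name):

docker build -t yt-search .
docker run -p 8080:8080 yt-search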

Step 3: Deploy to Google Cloud

At this point, we could build our Docker image and push it to Docker Hub, from which it can readily be deployed to many different cloud services. However, I’ll follow an alternative strategy here.

Instead, I will push all our code to GitHub, from which we can directly deploy the container to Google Cloud Run. This has two key benefits.

First, it avoids system architecture discrepancies between your local machine and what Google Cloud Run uses (this was specifically an issue for me because my Mac runs on ARM64). Second, deploying from a GitHub repo allows for continuous deployment: if we want to update our API, we can simply push new code to the repo, and a new container will be spun up automatically.

We start by creating a new GitHub repo.

Creating GitHub repo. Image by author.

Then, we clone the repo, add our code, and push it back to GitHub. Here’s what the directory structure looks like after adding the code.

GitHub repo directory structure. Image by author.

With the code in place, we can create a new Google Cloud Platform project. We do this by going to the GCP console, clicking on our project lists, and selecting “NEW PROJECT”.

Creating a new GCP project. Image by author.

Once our project has been created, we can open it and type “cloud run” into the search bar.

Searching for Cloud Run service. Image by author.

When that opens, we will click “CREATE SERVICE”. This will open up a page for us to configure our service.

First, we select the option to deploy our service from GitHub. Then click “SET UP WITH CLOUD BUILD”.

Setting up continuous deployment of Cloud Run service from GitHub. Image by author.

For the repository source, we will select GitHub and choose the repo we just made.

Choosing our repository source. Image by author.

We will leave the branch as ^main$ and select the “Build Type” as Dockerfile.

Next, we return to the service configuration screen. We can name the service whatever we like (I leave it as the automatically generated one). For the region, I leave it as us-central1 because it’s a Tier 1 region, which offers the cheapest computing options (basically free for this example).

To keep things simple, I’ll select “Allow unauthenticated invocations”. Of course, you will want to require authentication for most real systems. Then, leave everything else as the default.

Configuring service. Image by author.

Finally, under “Container(s), Volumes, Networking, Security”, edit the container to have 1 GiB of memory. We must leave the PORT as 8080 because that’s what we configured in our Dockerfile.

Editing container. Image by author.

We can leave everything else as default and hit “CREATE” at the bottom of the screen. After a few minutes, the container will be live!

Service running on Cloud Run. Image by author.

We can then access the API using the URL specified near the top of the page. Clicking the link will open up the root endpoint, which was our health check.

Health check endpoint. Image by author.

We can manually send a GET request to the search endpoint using the following URL: [your-app-url-here]/search?query=LLMs. This will search for videos relevant to LLMs.

Making GET request to search API endpoint. Image by author.
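We can also query the deployed service programmatically, e.g., with the requests library. Here’s a minimal sketch; substitute the actual URL of your Cloud Run service for the placeholder.

import requests

# placeholder: replace with your Cloud Run service URL
url = "[your-app-url-here]/search"

# pass the query as a URL parameter, matching our /search endpoint
response = requests.get(url, params={"query": "LLMs"})

# the endpoint returns a JSON dictionary of titles and video IDs
print(response.json())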

Bonus: integrating into UI

Once our API backend is set up, we can connect it to a user-friendly interface. I do that via Hugging Face Spaces, which hosts ML apps completely for free.

Here’s what that same search looks like through a UI. You can play around with the UI here and see the code here.

Search UI on HF Spaces. Gif by author.
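The UI code is linked above, but to sketch the idea: a simple Gradio app (a common choice on HF Spaces; the actual implementation may differ) just forwards the user’s query to our API and displays the results.

import requests
import gradio as gr

# placeholder: replace with your Cloud Run service URL
API_URL = "[your-app-url-here]"

def search_videos(query: str) -> dict:
    # forward the query to our deployed /search endpoint
    response = requests.get(f"{API_URL}/search", params={"query": query})
    return response.json()

# minimal interface: a text box in, JSON results out
demo = gr.Interface(fn=search_videos, inputs="text", outputs="json")
demo.launch()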

Data science is more than just training fancy models. It’s about solving problems and generating value. Often, this requires us to deploy models into settings where they can make the most impact. Here, we walked through a simple 3-step strategy for deploying ML models using FastAPI, Docker, and GCP.

While this concludes the Full Stack Data Science series, these articles are accompanied by a YouTube playlist with a bonus video on the experimentation involved in creating this search tool.

More on Full Stack Data Science 👇

Shaw Talebi

Full Stack Data Science
