Until now, full-text search in Firestore meant relying on third-party services, but Firestore can finally do it on its own!!
This full-text search is built on vector search, and in this article I walk through the implementation steps.
Prerequisites
The feature covered here is in preview. The code below may stop working once the feature is officially released.
Reference: Search using vector embeddings: https://firebase.google.com/docs/firestore/vector-search
Cloud Functions and Vertex AI, both used here, incur usage fees. Please check the pricing before you start.
Reference: Vertex AI: https://cloud.google.com/vertex-ai/generative-ai/pricing?hl=en
Overview of Steps
I plan to cover vector search itself in a separate article. Here, we use Google’s “Vertex AI” to calculate the vectors that the search runs on. At the time of writing, vector search is only available from Python or JavaScript (Node.js), so the search runs in a Google Cloud Function.
Therefore, the following preparations are necessary to perform the search:
- Create Google Cloud Function
- Set up Vertex AI
- Obtain vectors
- Create an index
- Implement vector search code in Firestore
I will explain each step in detail.
Creating Google Cloud Function
The Cloud Functions environment created this time is as follows:
- 2nd gen
- HTTPS trigger
- Python 3.12
Also, the following is set in the runtime environment variables:
Name: GOOGLE_CLOUD_PROJECT
Value: <project ID> (Note that it is not the project name)
We use 2nd gen because Vertex AI is linked through Cloud Run, and a 2nd gen function creates a Cloud Run service when the function is created.
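The functions later in this article read GOOGLE_CLOUD_PROJECT with os.environ.get, which silently returns None when the variable is missing. As a minimal sketch, a fail-fast check could look like the following. The helper name require_project_id is hypothetical and not part of the original code.

```python
import os


def require_project_id(env=os.environ) -> str:
    """Return the project ID from GOOGLE_CLOUD_PROJECT, or raise if it is missing.

    `require_project_id` is a hypothetical helper for illustration only.
    """
    project_id = env.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if not project_id:
        raise RuntimeError(
            "GOOGLE_CLOUD_PROJECT is not set; set it to the project ID, not the project name"
        )
    return project_id
```

Calling this once at startup would surface a missing variable immediately instead of failing later inside the Vertex AI or Firestore client.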
Setting up Vertex AI Provided by Google
Link Vertex AI to the Cloud Run of the Function.
Reference: https://cloud.google.com/run/docs/integrate/vertex-ai?authuser=3&hl=en
For reference, I will also post the procedure here.
- Click the link below “Powered by Cloud Run” on the right side to go to Cloud Run
- Click the “Integrate” tab
- Click “Add Integration”
- Click “Vertex AI – Generative AI”, set any name, and click “submit”
  Note that an error will occur if the name does not follow certain rules. If you have no particular preference, the default value is fine.
- Approve if prompted to add permissions
Calculating Vectors for Vector Search
I ran everything with owner privileges, so I did not add any permissions. In real application development, you need to grant Vertex AI and Firestore permissions to the account that executes the Function.
Below is the code to calculate vectors using Vertex AI and store the data in Firestore.
Reference: https://firebase.google.com/docs/firestore/vector-search
requirements.txt
functions-framework==3.*
google-cloud-firestore
google-cloud-aiplatform
main.py
import functions_framework
import os
# Firestore
from google.cloud import firestore
from google.cloud.firestore_v1.vector import Vector
# Vertex AI
import vertexai
from vertexai.language_models import TextEmbeddingModel
# Project ID (obtained from environment variables)
MY_PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")
# Calculate the embedding vector for the given text
def text_embedding(text: str) -> Vector:
    # Set the location to your own region
    vertexai.init(project=MY_PROJECT_ID, location="asia-northeast1")
    # Embedding model; "textembedding-gecko@003" was the latest at the time of writing
    model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
    embeddings = model.get_embeddings([text])
    # One input text yields one embedding; wrap its values as a Firestore Vector
    return Vector(embeddings[0].values)
# Main processing
# (Function name is arbitrary)
@functions_framework.http
def hello_http(request):
    # Obtain the summary (description) of the article from the request
    request_json = request.get_json(silent=True)
    request_args = request.args
    if request_json and 'description' in request_json:
        description = request_json['description']
    elif request_args and 'description' in request_args:
        description = request_args['description']
    else:
        description = 'World'
    # Initialize Firestore client
    firestore_client = firestore.Client(project=MY_PROJECT_ID)
    # Reference the collection (the collection name is arbitrary; create the collection in advance if it does not exist)
    collection = firestore_client.collection("article_collection")
    # Calculate embedding
    embedding_vector = text_embedding(description)
    # Prepare the document to be added to Firestore
    doc = {
        "description": description,
        "embedding_field": embedding_vector
    }
    # Add the document
    collection.add(doc)
    return 'OK!'
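The function above looks for the description first in the JSON body, then in the query string, and falls back to a default. As a rough sketch, that lookup could be pulled into a small helper. The name get_param is hypothetical, and the helper only assumes that both sources behave like mappings.

```python
from typing import Mapping, Optional


def get_param(json_body: Optional[Mapping], query_args: Optional[Mapping],
              name: str, default: str) -> str:
    """Look up `name` in the JSON body first, then the query string, else use `default`.

    `get_param` is a hypothetical helper that mirrors the if/elif/else in the function above.
    """
    if json_body and name in json_body:
        return json_body[name]
    if query_args and name in query_args:
        return query_args[name]
    return default


# Example with plain dicts standing in for the request data
print(get_param({"description": "from body"}, {}, "description", "World"))
print(get_param(None, {}, "description", "World"))
```

Inside the function this would be called as get_param(request.get_json(silent=True), request.args, "description", "World").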
To keep things simple, I verified the function by running the CLI test command from the terminal.
curl -m 70 -X POST https://asia-northeast1-python-tool-001.cloudfunctions.net/vector_chenge \
-H "Authorization: bearer $(gcloud auth print-identity-token)" \
-H "Content-Type: application/json" \
-d '{ "description": "<any text>"}'
If the call succeeds, a document with the description and embedding_field fields is stored in Firestore.
Creating an Index for Vector Search
An index is essential for vector search. This time, I created the index by executing the following command from the console.
Reference: https://firebase.google.com/docs/firestore/vector-search
gcloud alpha firestore indexes composite create \
--collection-group=article_collection \
--query-scope=COLLECTION \
--field-config field-path=embedding_field,vector-config='{"dimension":"768", "flat": "{}"}' \
--database=<database ID>
- collection-group: Name of the collection to create the index on
- query-scope: Scope of the queries the index serves (COLLECTION for a single collection, COLLECTION_GROUP for every collection with the same ID)
- field-path: Name of the field storing the vector
- vector-config: Dimension of the vector (768 in this case) and the index type (flat)
- database: ID of the target database. Not required for the default database
After the command completes, the index is created in Firestore.
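Since the index is declared with 768 dimensions, the vectors written to embedding_field presumably need to have the same length to be served by it. A minimal sketch of a guard that could run before storing or querying is shown below. The names INDEX_DIMENSION and check_dimension are hypothetical.

```python
from typing import Sequence

# Assumption: must match the "dimension" value passed in --field-config
INDEX_DIMENSION = 768


def check_dimension(values: Sequence[float],
                    expected: int = INDEX_DIMENSION) -> Sequence[float]:
    """Return `values` unchanged if its length matches the index dimension, else raise."""
    if len(values) != expected:
        raise ValueError(
            f"embedding has {len(values)} dimensions, but the index expects {expected}"
        )
    return values


# Example with a dummy vector of the right length
check_dimension([0.0] * 768)
```

This becomes relevant if the embedding model is swapped for one that returns a different number of dimensions.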
Implementing Vector Search Code in Firestore
Now that the data is ready, let’s perform the actual search.
This time, I prepared the summary content of my blog articles as the search target data. I have omitted some parts for brevity.
No. | Title and summary |
---|---|
1 | How to Handle When freezed.dart is Not Created: When designing immutable classes using freezed, if “…” |
2 | What is pubspec.yaml in Flutter? Introducing its meaning and how to write!! YAML stands for YAML Ain’t Markup Language, and it represents data concisely… |
3 | What is MVVM that is often heard in app development? MVVM (Model-View-ViewModel) is a design pattern that aims to improve development efficiency and maintainability by separating application logic and UI… |
4 | What is Flutter?? Explaining the overview of Flutter. Flutter is recognized as convenient for developing mobile apps, but what exactly makes it convenient and popular? |
5 | What is Riverpod? Introducing the most popular state management in Flutter!! Although “StatefulWidget” introduced earlier is also a state management feature, when implementing an app with multiple screens or features, management becomes… |
The executed code is as follows:
import functions_framework
import os
# Firestore
from google.cloud import firestore
from google.cloud.firestore_v1.vector import Vector
from google.cloud.firestore_v1.base_vector_query import DistanceMeasure
# Vertex AI
import vertexai
from vertexai.language_models import TextEmbeddingModel
# Project ID (obtained from environment variables)
MY_PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")
# Calculate the embedding vector for the given text
def text_embedding(text: str) -> Vector:
    # Set the location to your own region
    vertexai.init(project=MY_PROJECT_ID, location="asia-northeast1")
    # Embedding model; "textembedding-gecko@003" was the latest at the time of writing
    model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
    embeddings = model.get_embeddings([text])
    # One input text yields one embedding; wrap its values as a Firestore Vector
    return Vector(embeddings[0].values)
# Main processing
# (Function name is arbitrary)
@functions_framework.http
def hello_http(request):
    # Obtain the search text (target) from the request
    request_json = request.get_json(silent=True)
    request_args = request.args
    if request_json and 'target' in request_json:
        target = request_json['target']
    elif request_args and 'target' in request_args:
        target = request_args['target']
    else:
        target = 'World'
    # Initialize Firestore client
    firestore_client = firestore.Client(project=MY_PROJECT_ID)
    # Reference the collection
    collection = firestore_client.collection("article_collection")
    # Calculate the embedding of the search text
    embedding_vector = text_embedding(target)
    # Perform vector search (the 3 nearest documents by cosine distance)
    docs = collection.find_nearest(
        vector_field="embedding_field",
        query_vector=embedding_vector,
        distance_measure=DistanceMeasure.COSINE,
        limit=3
    ).get()
    # Build a simple table-like string for the response
    output = "Description \n"
    output += "-" * 50 + "\n"
    # Output the contents of the documents obtained through vector search
    for doc in docs:
        doc_data = doc.to_dict()
        description = doc_data.get("description", "No description")
        # Add the first 100 characters of the description to the output
        output += f"{description[:100]} \n"
    return output
This time, I confirmed the operation by executing the CLI test command from the terminal. I will perform a search with “about Riverpod”!
curl -m 70 -X POST https://asia-northeast1-python-tool-001.cloudfunctions.net/vector_search \
-H "Authorization: bearer $(gcloud auth print-identity-token)" \
-H "Content-Type: application/json" \
-d '{
"target": "about Riverpod"
}'
Execution result
Description
--------------------------------------------------
What is Riverpod? Introducing the most popular state management in Flutter!! Although "StatefulWidget" introduced earlier is also a state management feature, when implementing an app with multiple screens or features, management becomes
What is Flutter?? Explaining the overview of Flutter. Flutter is recognized as convenient for developing mobile apps, but what exactly makes it convenient and popular?
What is MVVM that is often heard in app development? MVVM (Model-View-ViewModel) is a design pattern that aims to improve development efficiency and maintainability by separating application logic and UI...
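Although the code does no sorting of its own, the Riverpod article came up as the first result!! The nearest-neighbor search returns documents in order of vector distance, so the closest match is listed first. With only five documents and limit=3, the second and third results are simply the next nearest ones, so it was unclear whether they were actually close to the search query.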
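To get a feel for why the closest article comes first, here is a conceptual sketch of a nearest-neighbor search by cosine distance in plain Python. It is not Firestore's implementation. It assumes the usual definition of cosine distance as 1 minus cosine similarity, and the two-dimensional vectors are toy values I made up instead of real 768-dimensional embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, taken here as 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query: Sequence[float], docs: Dict[str, Sequence[float]],
            limit: int) -> List[Tuple[str, float]]:
    """Return the `limit` documents closest to `query`, nearest first."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in docs.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:limit]


# Toy vectors (hypothetical, for illustration only)
toy_docs = {
    "riverpod": [1.0, 0.1],
    "flutter": [0.7, 0.7],
    "mvvm": [0.1, 1.0],
}
for name, distance in nearest([1.0, 0.0], toy_docs, limit=3):
    print(f"{name}: {distance:.3f}")
```

The search always returns up to limit results, however far away they are, which is why looking at the distances themselves would help judge the second and third hits.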
Conclusion
Increasing the amount of data should make it easier to verify the search accuracy. The vector data is approximately 3 KB per document if we take a float to be 4 bytes (768 dimensions × 4 bytes). It fits well within the 1 MB document limit, but it is still a noticeable amount to add to every document.
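The size estimate can be worked through as a quick calculation. The 4 bytes per float is the assumption stated in the text, not a measured storage size, and the document limit is taken as 1 MiB (1,048,576 bytes).

```python
DIMENSIONS = 768                     # dimension of the embedding used in this article
BYTES_PER_FLOAT = 4                  # assumption from the text; not a measured value
DOCUMENT_LIMIT_BYTES = 1024 * 1024   # Firestore document size limit (1 MiB)

vector_bytes = DIMENSIONS * BYTES_PER_FLOAT
share = vector_bytes / DOCUMENT_LIMIT_BYTES

print(f"{vector_bytes} bytes ({vector_bytes / 1024:.1f} KiB), "
      f"{share:.2%} of the document limit")
# prints "3072 bytes (3.0 KiB), 0.29% of the document limit"
```

Changing BYTES_PER_FLOAT to 8 doubles the estimate, so the figure is best read as a rough order of magnitude.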
Being able to perform full-text search in Firestore, even though it is still a preview version, is great news. We look forward to future developments.