diff --git a/notebooks/integrations/gemini/google-vertex-ai-chat-completion-notebook.ipynb b/notebooks/integrations/gemini/google-vertex-ai-chat-completion-notebook.ipynb
new file mode 100644
index 000000000..40fd5862d
--- /dev/null
+++ b/notebooks/integrations/gemini/google-vertex-ai-chat-completion-notebook.ipynb
@@ -0,0 +1,465 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "IHR_5ZfW69Mq"
+   },
+   "source": [
+    "# Google Vertex AI Chat Completion with Elastic\n",
+    "\n",
+    "This notebook shows how to use the Elastic API to interact with Google Vertex AI models to perform chat completion tasks.\n",
+    "\n",
+    "You will need access to a Google Cloud project with the Vertex AI APIs enabled; the GCP console will walk you through enabling them. Please note that using Vertex AI may incur costs.\n",
+    "\n",
+    "For more info, please refer to:\n",
+    "\n",
+    "https://cloud.google.com/vertex-ai\n",
+    "\n",
+    "https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-googlevertexai"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "eq9qSQkFsa3H"
+   },
+   "source": [
+    "# Install dependencies"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "KROQ28gAsoqa"
+   },
+   "source": [
+    "**Install Python dependencies**\n",
+    "\n",
+    "We will use the `elasticsearch` Python library to create the inference endpoint and the `requests` library to make HTTP calls to the Elastic API."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "Vrs0aI1fstxJ",
+    "outputId": "3fc26e2b-e381-4c3d-fc3b-7c0334496229"
+   },
+   "outputs": [],
+   "source": [
+    "!pip install elasticsearch requests"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "OAfRaQwisu9j"
+   },
+   "source": [
+    "**Import Required Libraries**\n",
+    "\n",
+    "Now import the necessary modules, including `requests` for making HTTP calls, `json` for manipulating JSON payloads, and `getpass` for secure input of the username, password, and API keys.\n",
+    "\n",
+    "**In production, use a secure secret manager to handle sensitive data such as usernames, passwords, and API keys.**\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "3ukQv4R1s-dc"
+   },
+   "outputs": [],
+   "source": [
+    "from elasticsearch import Elasticsearch\n",
+    "from getpass import getpass\n",
+    "import json\n",
+    "import requests\n",
+    "from base64 import b64encode"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "9GUxCn0qsglg"
+   },
+   "source": [
+    "# Create Elastic client and Inference endpoint"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "UtXAKUbxtCB1"
+   },
+   "source": [
+    "**Instantiate the Elasticsearch Client**\n",
+    "\n",
+    "This section sets up your Elasticsearch client. For demonstration purposes, we're using a local Elasticsearch instance with default credentials. Adjust these settings for your specific environment.\n",
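+    "\n",
+    "If your deployment runs on Elastic Cloud, a minimal sketch of the alternative connection looks like this (`cloud_id` and `api_key` are standard parameters of the `elasticsearch` client; the values below are placeholders):\n",
+    "\n",
+    "```python\n",
+    "client = Elasticsearch(\n",
+    "    cloud_id=\"YOUR_CLOUD_ID\",  # placeholder\n",
+    "    api_key=\"YOUR_API_KEY\",  # placeholder\n",
+    ")\n",
+    "```"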
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "WQidM7qotF5U"
+   },
+   "outputs": [],
+   "source": [
+    "ELASTIC_USER = getpass(\"ELASTIC USER: \")\n",
+    "ELASTIC_PASSWORD = getpass(\"ELASTIC PASSWORD: \")\n",
+    "host = \"\"  # use your own host\n",
+    "\n",
+    "client = Elasticsearch(\n",
+    "    hosts=[f\"http://{host}/\"],\n",
+    "    basic_auth=(ELASTIC_USER, ELASTIC_PASSWORD),\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "fGf_duY5tPi2"
+   },
+   "source": [
+    "Confirm the client connected by getting its metadata:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "-vEgkrzytTy5",
+    "outputId": "d3b0cf48-315e-4ab7-b33c-1b67372f1ff4"
+   },
+   "outputs": [],
+   "source": [
+    "print(client.info())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "JZMLEKVjtUkM"
+   },
+   "source": [
+    "**Create an Inference Endpoint using Gemini**\n",
+    "\n",
+    "In this step we create the inference endpoint that handles chat completion tasks.\n",
+    "\n",
+    "For this you will need the service account key file from GCP.\n",
+    "\n",
+    "\n",
+    "**Get the service account credentials**\n",
+    "\n",
+    "You will need a SA (Service Account) and its credentials so the Elasticsearch server can access the service.\n",
+    "\n",
+    "Go to https://console.cloud.google.com/iam-admin/serviceaccounts\n",
+    "\n",
+    " 1. Click Create service account.\n",
+    " 2. Enter a suitable name.\n",
+    " 3. Click Create and continue.\n",
+    " 4. Grant the role Vertex AI User.\n",
+    " 5. Click `Add another role` and grant the role Service account token creator. This role allows the SA to generate the necessary access tokens.\n",
+    " 6. Click Done.\n",
+    "\n",
+    "After creating the service account you need to get the JSON key file:\n",
+    "\n",
+    "Go to https://console.cloud.google.com/iam-admin/serviceaccounts and click on the SA you just created.\n",
+    "\n",
+    "Go to the Keys tab and click Add key -> Create new key -> JSON -> Create.\n",
+    "\n",
+    "If you get the error message \"Service account key creation is disabled\", your administrator needs to change the organization policy *iam.disableServiceAccountKeyCreation* or grant an exception.\n",
+    "The service account key should be downloaded to your PC automatically.\n",
+    "\n",
+    "Once you download the JSON file, open it with your favorite editor and copy its contents. Paste the contents when prompted in the step below.\n",
+    "\n",
+    "\n",
+    "---\n",
+    "\n",
+    "**IMPORTANT**\n",
+    "\n",
+    "Note that using this service account may have an impact on your GCP billing.\n",
+    "\n",
+    "Service account keys can be vulnerable; remember to always:\n",
+    "\n",
+    "**KEEP SA KEYS SAFE**\n",
+    "\n",
+    "**ENFORCE LEAST PRIVILEGE**\n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "hkM9wey7rKhQ",
+    "outputId": "8f7e53b5-c9f0-4bbb-d2fc-416f51476e73"
+   },
+   "outputs": [],
+   "source": [
+    "GOOGLE_API_KEY = getpass(\"Paste the Google service account JSON key: \")\n",
+    "inference_id = \"chat_completion-notebook-test1\"  # set the inference ID for the endpoint\n",
+    "project_id = \"\"  # use your GCP project\n",
+    "location = \"\"  # set the location in which your Vertex AI models live, e.g. us-central1\n",
+    "\n",
+    "model_id = \"gemini-2.5-flash-preview-05-20\"  # choose the model; you can use any model available in your Vertex AI project."
+   ]
+  },
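+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Optionally, sanity-check the pasted key before using it. This is a minimal sketch; the fields checked below are the standard ones found in GCP service account key files."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Parse the pasted key and check that the standard service account fields are present.\n",
+    "sa_info = json.loads(GOOGLE_API_KEY)\n",
+    "for field in (\"type\", \"project_id\", \"client_email\", \"private_key\"):\n",
+    "    assert field in sa_info, f\"missing field: {field}\"\n",
+    "print(f\"Key looks valid for {sa_info['client_email']}\")"
+   ]
+  },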
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "jlFGs66Fuqth"
+   },
+   "source": [
+    "**Create the Inference Endpoint**\n",
+    "\n",
+    "Using the `elasticsearch` client, create the inference endpoint for the chat completion task."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "rchfBBa-tfHJ",
+    "outputId": "ed5fae6d-c8b7-40c9-91df-2505e4a25596"
+   },
+   "outputs": [],
+   "source": [
+    "client.inference.put(\n",
+    "    task_type=\"chat_completion\",\n",
+    "    inference_id=inference_id,\n",
+    "    body={\n",
+    "        \"service\": \"googlevertexai\",\n",
+    "        \"service_settings\": {\n",
+    "            \"service_account_json\": GOOGLE_API_KEY,\n",
+    "            \"model_id\": model_id,\n",
+    "            \"location\": location,\n",
+    "            \"project_id\": project_id\n",
+    "        },\n",
+    "    },\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "nrlBxff9spM9"
+   },
+   "source": [
+    "# Call the Inference API for Chat Completion"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "NP92IpAJ0usP"
+   },
+   "outputs": [],
+   "source": [
+    "# b64encode returns bytes; decode so the Authorization header is a plain string.\n",
+    "api_key = b64encode(f\"{ELASTIC_USER}:{ELASTIC_PASSWORD}\".encode()).decode(\"utf-8\")\n",
+    "\n",
+    "def extract_content(json_data) -> str:\n",
+    "    \"\"\"Extract the text delta from an OpenAI-style chat completion chunk.\"\"\"\n",
+    "    try:\n",
+    "        data = json.loads(json_data)\n",
+    "        if \"choices\" in data and len(data[\"choices\"]) > 0:\n",
+    "            choice = data[\"choices\"][0]\n",
+    "            if \"delta\" in choice and \"content\" in choice[\"delta\"]:\n",
+    "                return choice[\"delta\"][\"content\"]\n",
+    "    except json.JSONDecodeError:\n",
+    "        pass\n",
+    "    return \"\"\n",
+    "\n",
+    "def extract_content_sse(chunk: bytes) -> str:\n",
+    "    chunk_str = chunk.decode(\"utf-8\")\n",
+    "    if \"data: \" not in chunk_str:\n",
+    "        return \"\"\n",
+    "    _, data = chunk_str.split(\"data: \", 1)\n",
+    "    return extract_content(data)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "jlfINrhYuteF",
+    "outputId": "07aa9171-94f5-462e-8282-fbfd01c8bcaa"
+   },
+   "outputs": [],
+   "source": [
+    "url = f\"http://{host}/_inference/chat_completion/{inference_id}/_stream\"\n",
+    "headers = {\n",
+    "    \"Authorization\": f\"Basic {api_key}\",\n",
+    "    \"Content-Type\": \"application/json\",\n",
+    "}\n",
+    "data = {\n",
+    "    \"model\": model_id,\n",
+    "    \"messages\": [{\"role\": \"user\", \"content\": \"What is Elastic?\"}],\n",
+    "}\n",
+    "\n",
+    "post_response = requests.post(url, headers=headers, json=data, stream=True)\n",
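+    "\n",
+    "# The endpoint streams Server-Sent Events (SSE): each chunk carries an\n",
+    "# 'event: message' line and a 'data: {...}' JSON payload, from which\n",
+    "# extract_content_sse pulls the generated text delta.\n",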
"\n", + "for chunk in post_response.iter_content(chunk_size=None):\n", + " #extract_content_sse(chunk)\n", + " print(extract_content_sse(chunk), end=\"\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NPQz0Qmnp0qF" + }, + "source": [ + "**Call the Inference using Tools**\n", + "\n", + "You can also include the usage of tools on chat completion inference tasks." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "i-Oiyak3p6AH", + "outputId": "edffd50f-fd0f-4465-9ea6-60b4ffde079f" + }, + "outputs": [], + "source": [ + "url = f\"http://{host}/_inference/chat_completion/{inference_id}/_stream\"\n", + "headers = {\n", + " \"Authorization\": f\"Basic {api_key}\",\n", + " \"Content-Type\": \"application/json\",\n", + "}\n", + "data = {\n", + " \"model\": \"gemini-2.5-flash-preview-05-20\",\n", + " \"messages\": [{\"role\": \"user\", \"content\": \"What is the weather like in Boston today?\"}],\n", + " \"tools\": [\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"get_current_weather\",\n", + " \"description\": \"Get the current weather in a given location\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"location\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The city and state, e.g. San Francisco, CA\"\n", + " },\n", + " \"unit\": {\n", + " \"type\": \"string\",\n", + " \"enum\": [\"celsius\", \"fahrenheit\"]\n", + " }\n", + " },\n", + " \"required\": [\"location\"]\n", + " }\n", + " }\n", + " }\n", + " ],\n", + " \"tool_choice\": \"auto\"\n", + "}\n", + "\n", + "post_response = requests.post(url, headers=headers, json=data, stream=True)\n", + "\n", + "print(\"Post inference response:\")\n", + "for chunk in post_response.iter_content(chunk_size=None):\n", + " print(chunk.decode(\"utf-8\"), end=\"\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ntn_iCanrbV7" + }, + "source": [ + "**Calling the chat completion inference task with system messages**\n", + "\n", + "System messages can be included on the messages payload to give the agent more context regarding the conversation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "aJoZhHSDrilJ", + "outputId": "a11eb4a0-a3dd-4c3d-b46f-1c16c612ce38" + }, + "outputs": [], + "source": [ + "url = f\"http://{host}/_inference/chat_completion/{inference_id}/_stream\"\n", + "headers = {\n", + " \"Authorization\": f\"Basic {api_key}\",\n", + " \"Content-Type\": \"application/json\",\n", + "}\n", + "data = {\n", + " \"model\": \"gemini-2.5-flash-preview-05-20\",\n", + " \"messages\": [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": \"You are an AI travel assistant that can read images, call functions, and interpret structured data. 
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "Ntn_iCanrbV7"
+   },
+   "source": [
+    "**Calling the chat completion inference task with system messages**\n",
+    "\n",
+    "System messages can be included in the messages payload to give the model more context about the conversation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "aJoZhHSDrilJ",
+    "outputId": "a11eb4a0-a3dd-4c3d-b46f-1c16c612ce38"
+   },
+   "outputs": [],
+   "source": [
+    "url = f\"http://{host}/_inference/chat_completion/{inference_id}/_stream\"\n",
+    "headers = {\n",
+    "    \"Authorization\": f\"Basic {api_key}\",\n",
+    "    \"Content-Type\": \"application/json\",\n",
+    "}\n",
+    "data = {\n",
+    "    \"model\": model_id,\n",
+    "    \"messages\": [\n",
+    "        {\n",
+    "            \"role\": \"system\",\n",
+    "            \"content\": \"You are an AI travel assistant that can read images, call functions, and interpret structured data. Be helpful and accurate.\"\n",
+    "        },\n",
+    "        {\"role\": \"user\", \"content\": \"When is the best time to visit Japan?\"}\n",
+    "    ],\n",
+    "}\n",
+    "\n",
+    "post_response = requests.post(url, headers=headers, json=data, stream=True)\n",
+    "\n",
+    "print(\"Post inference response:\")\n",
+    "for chunk in post_response.iter_content(chunk_size=None):\n",
+    "    print(chunk.decode(\"utf-8\"), end=\"\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "gpuType": "T4",
+   "provenance": [],
+   "toc_visible": true
+  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.13.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/notebooks/integrations/gemini/google-vertex-ai-completion-notebook.ipynb b/notebooks/integrations/gemini/google-vertex-ai-completion-notebook.ipynb
new file mode 100644
index 000000000..711d8a81b
--- /dev/null
+++ b/notebooks/integrations/gemini/google-vertex-ai-completion-notebook.ipynb
@@ -0,0 +1,442 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "C0Rdc95b07J2"
+   },
+   "source": [
+    "# Google Vertex AI Completion with Elastic\n",
+    "\n",
+    "This notebook shows how to use the Elastic API to interact with Google Vertex AI models to perform completion tasks.\n",
+    "\n",
+    "You will need access to a Google Cloud project with the Vertex AI APIs enabled; the GCP console will walk you through enabling them. Please note that using Vertex AI may incur costs.\n",
+    "\n",
+    "For more info, please refer to:\n",
+    "\n",
+    "https://cloud.google.com/vertex-ai\n",
+    "\n",
+    "https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-googlevertexai\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "eq9qSQkFsa3H"
+   },
+   "source": [
+    "# Install dependencies\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "KROQ28gAsoqa"
+   },
+   "source": [
+    "**Install Python dependencies**\n",
+    "\n",
+    "We will use the `elasticsearch` Python library to create the inference endpoint and the `requests` library to make HTTP calls to the Elasticsearch API.\n",
+    "\n",
+    "You may choose a different HTTP library.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "Vrs0aI1fstxJ",
+    "outputId": "2d05d698-7f05-4e0a-99e9-b75d841abb59"
+   },
+   "outputs": [],
+   "source": [
+    "!pip install elasticsearch requests"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "OAfRaQwisu9j"
+   },
+   "source": [
+    "**Import Required Libraries**\n",
+    "\n",
+    "Now import the necessary modules, including `requests` for making HTTP calls, `json` for manipulating JSON payloads, and `getpass` for secure input of the username, password, and API keys.\n",
+    "\n",
+    "**In production, use a secure secret manager to handle sensitive data such as usernames, passwords, and API keys.**\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "3ukQv4R1s-dc"
+   },
+   "outputs": [],
+   "source": [
+    "from elasticsearch import Elasticsearch\n",
+    "from getpass import getpass\n",
+    "import json\n",
+    "import requests\n",
+    "from base64 import b64encode"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "9GUxCn0qsglg"
+   },
+   "source": [
+    "# Create Elastic client and Inference endpoint\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "UtXAKUbxtCB1"
+   },
+   "source": [
+    "**Instantiate the Elasticsearch Client**\n",
+    "\n",
+    "This section sets up your Elasticsearch client. For demonstration purposes, we're using a local Elasticsearch instance with default credentials. Adjust these settings for your specific environment.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "WQidM7qotF5U",
+    "outputId": "1c6b93e2-6d53-4b62-a25a-f2e55622a137"
+   },
+   "outputs": [],
+   "source": [
+    "ELASTIC_USER = getpass(\"ELASTIC USER: \")\n",
+    "ELASTIC_PASSWORD = getpass(\"ELASTIC PASSWORD: \")\n",
+    "host = \"\"  # use your Elastic API host\n",
+    "client = Elasticsearch(\n",
+    "    hosts=[f\"http://{host}/\"],\n",
+    "    basic_auth=(ELASTIC_USER, ELASTIC_PASSWORD),\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "fGf_duY5tPi2"
+   },
+   "source": [
+    "Confirm the Elasticsearch client connected by looking at its metadata:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "-vEgkrzytTy5",
+    "outputId": "14d4ac90-43e0-4c6c-a951-31720ca121db"
+   },
+   "outputs": [],
+   "source": [
+    "print(client.info())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "JZMLEKVjtUkM"
+   },
+   "source": [
+    "**Create an Inference Endpoint using Gemini**\n",
+    "\n",
+    "In this step we create the inference endpoint that handles completion tasks.\n",
+    "\n",
+    "For this you will need the service account key file from GCP.\n",
+    "\n",
+    "**Get the service account credentials**\n",
+    "\n",
+    "You will need a SA (Service Account) and its credentials so the Elasticsearch server can access the service.\n",
+    "\n",
+    "Go to https://console.cloud.google.com/iam-admin/serviceaccounts\n",
+    "\n",
+    "1. Click Create service account.\n",
+    "2. Enter a suitable name.\n",
+    "3. Click Create and continue.\n",
+    "4. Grant the role Vertex AI User.\n",
+    "5. Click `Add another role` and grant the role Service account token creator. This role allows the SA to generate the necessary access tokens.\n",
+    "6. Click Done.\n",
+    "\n",
+    "After creating the service account you need to get the JSON key file:\n",
+    "\n",
+    "Go to https://console.cloud.google.com/iam-admin/serviceaccounts and click on the SA you just created.\n",
+    "\n",
+    "Go to the Keys tab and click Add key -> Create new key -> JSON -> Create.\n",
+    "\n",
+    "If you get the error message \"Service account key creation is disabled\", your administrator needs to change the organization policy _iam.disableServiceAccountKeyCreation_ or grant an exception.\n",
+    "The service account key should be downloaded to your PC automatically.\n",
+    "\n",
+    "Once you download the JSON file, open it with your favorite editor and copy its contents. Paste the contents when prompted in the step below.\n",
+    "\n",
+    "---\n",
+    "\n",
+    "**IMPORTANT**\n",
+    "\n",
+    "Note that using this service account may have an impact on your GCP billing.\n",
+    "\n",
+    "Service account keys can be vulnerable; remember to always:\n",
+    "\n",
+    "**KEEP SA KEYS SAFE**\n",
+    "\n",
+    "**ENFORCE LEAST PRIVILEGE**\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "hkM9wey7rKhQ",
+    "outputId": "b958c5d4-b617-4f0f-e2fd-6e785e364f57"
+   },
+   "outputs": [],
+   "source": [
+    "GOOGLE_API_KEY = getpass(\"Paste the Google service account JSON key: \")\n",
+    "inference_id = \"completion-notebook-test-1\"  # set the inference ID for the endpoint\n",
+    "project_id = \"\"  # use your GCP project\n",
+    "location = \"us-central1\"  # set the region in which the model lives\n",
+    "\n",
+    "model_id = \"gemini-2.5-flash-preview-05-20\"  # choose the model; you can use any model available in your Vertex AI project.\n",
+    "# Available models per region are listed here: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations#google_model_endpoint_locations"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "jlFGs66Fuqth"
+   },
+   "source": [
+    "**Create the Inference Endpoint**\n",
+    "\n",
+    "Using the `elasticsearch` client, create the inference endpoint for the completion task.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "rchfBBa-tfHJ",
+    "outputId": "f2df2fc3-4587-40f4-a0dd-125c0d49fe90"
+   },
+   "outputs": [],
+   "source": [
+    "client.inference.put(\n",
+    "    task_type=\"completion\",\n",
+    "    inference_id=inference_id,\n",
+    "    body={\n",
+    "        \"service\": \"googlevertexai\",\n",
+    "        \"service_settings\": {\n",
+    "            \"service_account_json\": GOOGLE_API_KEY,\n",
+    "            \"model_id\": model_id,\n",
+    "            \"location\": location,\n",
+    "            \"project_id\": project_id,\n",
+    "        },\n",
+    "    },\n",
+    ")"
+   ]
+  },
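+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Optionally, confirm the endpoint exists by fetching its configuration. This is a minimal sketch using the Python client; the equivalent REST call is `GET /_inference/completion/{inference_id}`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fetch the endpoint configuration to confirm it was created.\n",
+    "print(client.inference.get(inference_id=inference_id))"
+   ]
+  },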
data_obj[\"completion\"], list\n", + " ):\n", + " for item in data_obj[\"completion\"]:\n", + " if \"delta\" in item:\n", + " extracted_deltas.append(item[\"delta\"])\n", + " except json.JSONDecodeError:\n", + " pass\n", + " elif line.startswith(\"event: message\"):\n", + " pass\n", + "\n", + " return \"\".join(extracted_deltas)\n", + "\n", + " except Exception as e:\n", + " return \"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "jlfINrhYuteF", + "outputId": "5d7c3747-4b8e-4e9e-e02d-f4a3bb91a7d5" + }, + "outputs": [], + "source": [ + "url_completion = f\"http://{host}/_inference/completion/{inference_id}\"\n", + "headers = {\"Authorization\": f\"Basic {api_key}\", \"content-type\": \"application/json\"}\n", + "data_completion = {\"input\": \"What is elastic?\"}\n", + "\n", + "try:\n", + " response = requests.post(url_completion, headers=headers, json=data_completion)\n", + " response.raise_for_status()\n", + "\n", + " print(f\"Status Code: {response.status_code}\")\n", + " print(\"Response Body:\")\n", + " print(json.dumps(response.json(), indent=2))\n", + "\n", + "except requests.exceptions.RequestException as e:\n", + " print(f\"Error during regular completion request: {e}\")\n", + " if hasattr(e, \"response\") and e.response is not None:\n", + " print(f\"Response content: {e.response.text}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NPQz0Qmnp0qF" + }, + "source": [ + "**Call the Inference using Streaming**\n", + "\n", + "The API will stream the LLM response.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "i-Oiyak3p6AH", + "outputId": "17b8f495-bf4a-4ccc-afbb-7363f3ff4a86" + }, + "outputs": [], + "source": [ + "url_stream_completion = f\"http://{host}/_inference/completion/{inference_id}/_stream\"\n", + "headers_stream = {\n", + " \"Authorization\": f\"Basic {api_key}\",\n", + " \"content-type\": \"application/json\",\n", + "}\n", + "data_stream_completion = {\"input\": \"What is Elastic? (use spongebob lore to explain)\"}\n", + "\n", + "try:\n", + "\n", + " post_response_stream = requests.post(\n", + " url_stream_completion,\n", + " headers=headers_stream,\n", + " json=data_stream_completion,\n", + " stream=True,\n", + " )\n", + " post_response_stream.raise_for_status()\n", + "\n", + " print(f\"Status Code (Stream): {post_response_stream.status_code}\")\n", + " print(\"Streaming Response:\")\n", + "\n", + " for chunk in post_response_stream.iter_content(chunk_size=None):\n", + " print(extract_content_sse(chunk), end=\"\")\n", + "\n", + " print(\"\\n--- End of Stream ---\")\n", + "\n", + "except requests.exceptions.RequestException as e:\n", + " print(f\"Error during streaming completion request: {e}\")\n", + " if hasattr(e, \"response\") and e.response is not None:\n", + " print(f\"Response content: {e.response.text}\")" + ] + } + ], + "metadata": { + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}