DA 1153 Autovec Unstructured Data #57

giriraj-singh-couchbase · 2025-09-26T19:37:49Z

This pull request introduces a new tutorial for using Couchbase Capella's AI Services auto-vectorization feature with LangChain, focusing on unstructured data workflows—especially data stored in S3 buckets. The changes add comprehensive documentation and a runnable Jupyter notebook that walks users through deploying models, configuring workflows, importing unstructured data, and performing semantic vector search with LangChain.

The most important changes are:

Documentation and Tutorial Content:

Added a detailed README.md explaining prerequisites, installation steps, and a quick start guide for the auto-vectorization tutorial.
Added frontmatter.md to provide metadata and summary information for the tutorial, including title, description, tags, and estimated duration.

Jupyter Notebook Tutorial:

Introduced autovec_unstructured.ipynb, a step-by-step notebook covering:
- Capella cluster and model deployment.
- Setting up access control and API keys for AI models.
- Creating and configuring auto-vectorization workflows to import and process unstructured data from S3.
- Performing semantic search using vector embeddings and LangChain integration, with code samples and explanations.

github-actions · 2025-09-26T19:37:59Z

Caution

Notebooks or Frontmatter Files Have Been Modified

Please ensure that a frontmatter.md file is accompanying the notebook file, and that the frontmatter is up to date.
These changes will be published to the developer portal tutorials only if frontmatter.md is included.
Proofread all changes before merging, as changes to notebook and frontmatter content will update the developer tutorial.

1 Notebook Files Modified:

Notebook File	Frontmatter Included?
`autovec_unstructured/autovec_unstructured.ipynb`	✅

1 Frontmatter Files Modified:

Frontmatter File
`autovec_unstructured/frontmatter.md`
Note: frontmatter will be checked and tested in the Test Frontmatter workflow.
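
The accompanying-file rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the workflow's actual implementation: the function name `frontmatter_report` and the idea of passing in a flat list of repository paths are hypothetical.

```python
from pathlib import PurePosixPath


def frontmatter_report(repo_files):
    """Map each notebook path to whether a frontmatter.md sits in the same folder.

    `repo_files` is a hypothetical flat list of repository-relative paths;
    how the real workflow discovers files is not shown in this thread.
    """
    known = {PurePosixPath(p) for p in repo_files}
    report = {}
    for path in sorted(known):
        if path.suffix == ".ipynb":
            # The notebook is considered publishable only if a sibling
            # file named exactly frontmatter.md exists.
            report[str(path)] = (path.parent / "frontmatter.md") in known
    return report


files = [
    "autovec_unstructured/autovec_unstructured.ipynb",
    "autovec_unstructured/frontmatter.md",
    "autovec_unstructured/README.md",
]
print(frontmatter_report(files))
```

Under this sketch, a file with any other name in the notebook's folder would not count as frontmatter, so the notebook would be reported as lacking it.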

gemini-code-assist · 2025-09-26T19:38:00Z

Summary of Changes

Hello @giriraj-singh-couchbase, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a new, comprehensive tutorial designed to educate users on leveraging Couchbase Capella's AI Services for auto-vectorization of unstructured data sourced from S3 buckets. The tutorial provides a practical, step-by-step guide, including a Jupyter notebook, to set up and execute workflows that transform raw data into vector embeddings, enabling advanced semantic search capabilities through LangChain integration. The primary goal is to empower users to efficiently manage and query unstructured data within the Capella ecosystem.

Highlights

New Tutorial Introduction: This pull request introduces a new tutorial demonstrating Couchbase Capella's AI Services auto-vectorization feature for unstructured data, specifically focusing on data stored in S3 buckets.
LangChain Integration: The tutorial showcases how to integrate with LangChain for performing semantic vector search on the auto-vectorized data.
Comprehensive Documentation: A detailed README.md and frontmatter.md have been added to provide prerequisites, installation steps, and metadata for the tutorial.
Interactive Jupyter Notebook: A runnable Jupyter notebook (autovec_unstructured.ipynb) is included, guiding users through Capella cluster and model deployment, access control setup, configuring auto-vectorization workflows for S3 data, and executing semantic search.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a new tutorial for using Couchbase Capella's AI Services auto-vectorization feature. While the tutorial is comprehensive, there are several areas that need improvement. Critically, the Jupyter notebook references images that are not included in the pull request, which will prevent users from following the visual steps. There are also significant structural issues, such as incorrect section numbering and confusing instructions that reference incorrect data sources. Additionally, there are opportunities to improve code quality by removing unused imports, using environment variables for credentials to promote security best practices, and fixing minor typos and grammatical errors. Addressing these points will greatly improve the quality and usability of the tutorial.
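
As an illustration of the environment-variable suggestion, a notebook cell could read its connection settings roughly as follows. This is a minimal sketch, not the notebook's code: the variable names `CB_ENDPOINT`, `CB_USERNAME`, and `CB_PASSWORD` and the helper `load_credentials` are assumptions chosen for the example.

```python
import os


def load_credentials(env=os.environ):
    """Read connection settings from the environment instead of hard-coding them.

    The variable names below are hypothetical; use whatever names the
    tutorial settles on.
    """
    required = ["CB_ENDPOINT", "CB_USERNAME", "CB_PASSWORD"]
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in required}
```

Failing early with the list of missing names keeps secrets out of the notebook while still telling the reader exactly what to set before running the remaining cells.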

autovec_unstructured/autovec_unstructured.ipynb

autovec_unstructured/README.md

autovec_unstructured/autovec_unstructured.ipynb

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

nithishr · 2025-10-02T14:07:33Z

autovec_unstructured/frontmatter.md

+title: Auto-Vectorization with Couchbase Capella AI Services and LangChain
+short_title: Auto-Vectorization with Couchbase and LangChain
+description:
+  - Learn how to use Couchbase Capella's AI Services auto-vectorization feature to automatically convert your data into vector embeddings.


convert your unstructured data into vector embeddings

autovec_unstructured/autovec_unstructured.ipynb

nithishr · 2025-10-07T15:14:36Z

autovec_unstructured/autovec_unstructured.ipynb

+    "query = \"How to setup java SDK?\"\n",
+    "results = vector_store.similarity_search(query, k=3)\n",
+    "\n",
+    "for rank, doc in enumerate(results, start=1):\n",


Can you print the score as well? You can use similarity_search_with_score
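
A minimal sketch of what the requested change could look like. `similarity_search_with_score` is the LangChain vector store method that returns `(document, score)` pairs; everything else here is an assumption for illustration: `FakeDoc` and `FakeVectorStore` are hypothetical stand-ins so the snippet runs without a Capella cluster, and the scores are made-up values. In the notebook, the real `vector_store` would be passed instead.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


class FakeVectorStore:
    """Hypothetical stand-in that mimics the (document, score) return shape."""

    def similarity_search_with_score(self, query, k=4):
        hits = [
            (FakeDoc("Install the Java SDK with Maven or Gradle."), 0.91),
            (FakeDoc("Connect to a cluster from Java."), 0.84),
            (FakeDoc("Python SDK quick start."), 0.52),
            (FakeDoc("Release notes."), 0.20),
        ]
        return hits[:k]


def format_results(vector_store, query, k=3):
    """Return one line per hit with its rank, score, and a short text preview."""
    results = vector_store.similarity_search_with_score(query, k=k)
    lines = []
    for rank, (doc, score) in enumerate(results, start=1):
        preview = doc.page_content[:80]
        lines.append(f"{rank}. score={score:.4f} | {preview}")
    return lines


for line in format_results(FakeVectorStore(), "How to setup java SDK?", k=3):
    print(line)
```

Whether a higher or lower score means a closer match depends on the similarity metric configured for the index, so the printed value is best read alongside that setting.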

autovec_unstructured/README.md

autovec_unstructured/frontmatter.md

nithishr · 2025-10-07T15:18:25Z

autovec_unstructured/frontmatter.md

Since the service will not be GA at merge time, we should not publish the tutorial yet. Can you rename this file to something else, like frontmatter.md, that we can change when we publish?

It's already frontmatter.md.

I meant to name it `__frontmatter__.md`, but the underscores got parsed by markdown. The rename ensures that merging does not publish the tutorial before the service is GA.

giriraj-singh-couchbase added 4 commits September 26, 2025 05:07

document updated

7088c3e

updated the document

eec1a72

fixed some issues

fa4de94

fixed text formatting

84832ae

gemini-code-assist bot reviewed Sep 26, 2025

giriraj-singh-couchbase and others added 2 commits September 27, 2025 01:20

Apply suggestions from code review

c5d565f

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

fixed index name

055b78f

nithishr reviewed Oct 2, 2025

nithishr mentioned this pull request Oct 2, 2025

Autovectorization Tutorial #54

Open

Updated search term

8e96eb3

nithishr reviewed Oct 7, 2025

