Add Elasticsearch semantic search getting started guide #1922

Open: wants to merge 17 commits into base `main`

Conversation

Contributor

@lcawl lcawl commented Jun 26, 2025

This PR aims to port content from https://www.elastic.co/getting-started/enterprise-search/build-a-semantic-search-experience into the https://www.elastic.co/docs/solutions/search/get-started section of the documentation.

Some significant changes include:

  • Rather than requiring a crawler to ingest sample documents, it uses the ones provided in the index creation workflow (since we want to defer the complexity of ingestion for early users). If the crawler example needs to be ported to the docs, that can be accomplished in a more advanced workflow.
  • Rather than covering both semantic search and hybrid search, it covers only the former. Hybrid search will be covered in a separate page.
  • In addition to the Query DSL examples, there's now an example of using ES|QL in Discover.
  • Since the steps apply to both Serverless and Stack contexts with minor variations, those variations are covered in tabs.

NOTE: To align with #1929, the new semantic search page is grouped into a quickstart section. I think after all the getting-started guides are migrated into this section we can revisit how/if the API quickstarts can also be included in this section.

@lcawl lcawl added the Team:Projects Issues owned by the Docs Org label Jun 26, 2025

github-actions bot commented Jun 26, 2025

🔍 Preview links for changed docs:

🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes.

@lcawl lcawl marked this pull request as ready for review June 28, 2025 05:55
@lcawl lcawl requested review from a team as code owners June 28, 2025 05:55
@lcawl lcawl requested review from jmikell821 and theletterf June 28, 2025 05:56
@theletterf
Contributor

Love the overall structure and the use of the stepper!

@florent-leborgne
Contributor

+1 on the stepper

You can host GIFs on Contentstack as well, in case we don't want overly heavy files stored in GH

Contributor

@szabosteve szabosteve left a comment

This is great, thank you for creating it! I left a couple of comments, mostly nits.

@lcawl
Contributor Author

lcawl commented Jun 30, 2025

> You can host GIFs on Contentstack as well, in case we don't want overly heavy files stored in GH

Thanks @florent-leborgne! I've switched to linking to Contentstack in 6afc299

Contributor

@leemthompo leemthompo left a comment

I just had a few ideas and observations reading through this :)

---
# Get started with semantic search

_Semantic search_ is a type of AI-powered search that enables you to use intuitive language in your queries.
Contributor

Suggested change
- _Semantic search_ is a type of AI-powered search that enables you to use intuitive language in your queries.
+ _Semantic search_ is a type of AI-powered search that enables you to use natural language in your queries.

nit: more common term

Contributor

Maybe we could have an introductory sentence that states the goal of the quickstart:

In this hands-on quickstart we blah blah blah.


## Prerequisites

::::{tab-set}
Contributor

Tabs feel almost like overkill here TBH

:sync: stack

- An {{es}} cluster for storing and searching your data, and {{kib}} for visualizing and managing your data. This quickstart is available for all [Elastic deployment models](/deploy-manage/deploy.md). The quickest way to get started is by using [{{es-serverless}}](/solutions/search/serverless-elasticsearch-get-started.md).
- If you want to add sample data, you must have authority to create an index, create documents, and view them. To use {{kib}}, you'll also need read authority for the **Discover**, **Dev Tools**, and **{{es}}** features at a minimum. For example, create a custom role that has `all` index privileges for the sample index ("semantic-index") and `read` authority for the specific {{kib}} features. To learn more, refer to [](/deploy-manage/users-roles/cluster-or-deployment-auth/user-roles.md).
Contributor

I worry that this might be a bit confusing and we could just assume that the majority of users (particularly if it's a trial) will have these perms already?
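
If the privileges bullet stays in the guide, a concrete request body might make it less abstract. The following is a minimal sketch, assuming a deployment where the security role API (`PUT /_security/role/<role_name>`) is available. It covers only the `all` index privileges on the sample index; the {{kib}} feature privileges mentioned in the bullet are deliberately omitted, and the role name is hypothetical.

```python
import json


def build_role_body(index_name="semantic-index"):
    """Build a request body for PUT /_security/role/<role_name>.

    Grants `all` index privileges on the sample index only.
    Kibana feature privileges are intentionally left out of this sketch.
    """
    return {
        "indices": [
            {"names": [index_name], "privileges": ["all"]},
        ]
    }


if __name__ == "__main__":
    # "semantic-search-quickstart" is a hypothetical role name.
    print("PUT /_security/role/semantic-search-quickstart")
    print(json.dumps(build_role_body(), indent=2))
```

The body can be pasted into Console after the `PUT` line; nothing here contacts a cluster.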


## Add data

% TBD: What type of data is ideal for semantic search?
Contributor

in brief: unstructured text (documents, articles, descriptions)

not ideal for exact keyword search over structured or timestamped data

:::{tab-item} {{serverless-short}}
:sync: serverless
There are some small data sets available for learning purposes.
Go to **{{es}} > Home**, select the semantic search workflow, and click **Create a semantic optimized index**.
Contributor

I worry that we're introducing a few too many options here. If we want to be opinionated, I'd stick to using Console

We also need some value-add over the in-product guides, otherwise we should just point users to them 🤔

Contributor

I could see an argument for this to not be another tabbed widget


What just happened? The content was transformed into a sparse vector, which involves two main steps.
First, the content is divided into smaller, manageable chunks to ensure that meaningful segments can be more effectively processed and searched. Then each chunk of text is transformed into a sparse vector representation using text expansion techniques.
By default, `semantic_text` fields leverage ELSER to transform the content.
Contributor

This sentence might be best placed in the create a semantic_text field section

Also seems like an important callout that if you want to use dense vectors (generally more common approach) then you'll have to do a bit more than simply create the semantic_text field
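
To illustrate the difference between the default and the dense-vector path, here is a minimal sketch of the two index-creation bodies. It assumes a field named `content` (the guide may use a different name) and a hypothetical inference endpoint ID, `my-dense-endpoint`, that would have to be created separately before the index. Without `inference_id`, the `semantic_text` field falls back to its default, which the text above describes as ELSER.

```python
import json


def build_index_body(field_name="content", inference_id=None):
    """Build a request body for PUT /<index> with one semantic_text field.

    With inference_id=None the field uses the default inference endpoint.
    Passing an ID points the field at a separately created endpoint,
    for example one that produces dense text embeddings.
    """
    field = {"type": "semantic_text"}
    if inference_id is not None:
        field["inference_id"] = inference_id
    return {"mappings": {"properties": {field_name: field}}}


if __name__ == "__main__":
    print("PUT /semantic-index")
    print(json.dumps(build_index_body(), indent=2))
    # "my-dense-endpoint" is a hypothetical endpoint ID, not one shipped by default.
    print(json.dumps(build_index_body(inference_id="my-dense-endpoint"), indent=2))
```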

{{es}} provides a variety of query languages for interacting with your data.
For an overview of their features and use cases, check out [](/explore-analyze/query-filter/languages.md).

You can search data that is stored in `semantic_text` fields by using a specific subset of queries, including `knn`, `match`, `semantic`, and `sparse_vector`. Refer to [Semantic text field type](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) for the complete list.
Contributor

general remark: again I worry that we're introducing a few too many things at once:

  • Multiple query languages
  • Multiple query types to target semantic_text fields (the happy path is the semantic query, especially if you're going down the default elser route)
  • Multiple interfaces for running your queries

Just thinking it might be a bit overwhelming for a new user :)


::::{step} Run a semantic query with Query DSL

Open the **{{index-manage-app}}** page from the navigation menu or return to the [guided index flow](/solutions/search/serverless-elasticsearch-get-started.md#elasticsearch-follow-guided-index-flow) to find code examples for searching the sample data.
Contributor

Again I worry that we're interleaving suggestions about querying sample data provided in the UI with the data we've ingested here ourselves. I'd stick to going straight to Console personally in the Query DSL option.


This is a [semantic](elasticsearch://reference/query-languages/query-dsl/query-dsl-semantic-query.md) query that is expressed in [Query Domain Specific Language](/explore-analyze/query-filter/languages/querydsl.md) (DSL), which is the primary query language for {{es}}.

The query is translated automatically into a vector representation and runs against the contents of the semantic text field.
Contributor

This sentence is true for any semantic query and is an important fact for a new user's mental model, so maybe it could be raised up a level to the query section introduction?

Importantly, the query is transformed using the same model that transformed the documents. The beauty of semantic_text is that it's smart about that. In the olden days, or in more manual workflows, the user has to make sure they're specifying the same model for both.
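
For reference, the request shape under discussion can be sketched as follows. This is an illustration only: the field name `content` and the query text are assumptions, not taken from the guide. The body names only a field and the query text, with no model or inference endpoint, which matches the point above that `semantic_text` resolves the model for you.

```python
import json


def build_semantic_query(query_text, field_name="content"):
    """Build a request body for POST /<index>/_search using the semantic query.

    No model is specified here; the field's own inference endpoint is used
    to turn the query text into a vector at search time.
    """
    return {"query": {"semantic": {"field": field_name, "query": query_text}}}


if __name__ == "__main__":
    # The query text is a hypothetical example.
    print("POST /semantic-index/_search")
    print(json.dumps(build_semantic_query("mountain hiking trails"), indent=2))
```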


In this example, the document related to Rocky Mountain National Park has the highest score.
::::
::::{step} Run a match query in ES|QL
Contributor

Suggested change
- ::::{step} Run a match query in ES|QL
+ ::::{step} Run a semantic query in ES|QL

Even though ES|QL reuses the match function for semantic queries, it's confusing to call it a match query 😄
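
A minimal sketch of what that step's query could look like, to show why the naming is easy to trip over: the function is `MATCH`, but against a `semantic_text` field it performs a semantic search. The field name `content`, the query text, and the use of `METADATA _score` for ranking are assumptions here; score support in ES|QL depends on the version in use.

```python
import json


def build_esql_query(query_text, index_name="semantic-index",
                     field_name="content", limit=5):
    """Build an ES|QL query string that runs MATCH against a semantic_text field."""
    # Escape backslashes and double quotes so the text is a valid ES|QL string literal.
    escaped = query_text.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"FROM {index_name} METADATA _score\n"
        f'| WHERE MATCH({field_name}, "{escaped}")\n'
        "| SORT _score DESC\n"
        f"| LIMIT {limit}"
    )


def build_esql_request(query_text, **kwargs):
    """Wrap the query string in the body expected by POST /_query."""
    return {"query": build_esql_query(query_text, **kwargs)}


if __name__ == "__main__":
    # The query text is a hypothetical example.
    print(build_esql_query("mountain hiking trails"))
    print(json.dumps(build_esql_request("mountain hiking trails"), indent=2))
```

The printed query string can be pasted into Discover in ES|QL mode, and the JSON body can be sent with `POST /_query` from Console.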

Labels
Team:Projects Issues owned by the Docs Org
5 participants