Add Elasticsearch semantic search getting started guide #1922
base: main
🔍 Preview links for changed docs:
🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes.
Love the overall structure and the use of the stepper!
+1 on the stepper. You can host GIFs on Contentstack as well, in case we don't want overly heavy files stored in GH.
This is great, thank you for creating it! I left a couple of comments, mostly nits.
solutions/search/serverless-elasticsearch-get-started-semantic.md
Co-authored-by: István Zoltán Szabó <[email protected]>
Thanks @florent-leborgne! I've switched to linking to Contentstack in 6afc299
I just had a few ideas and observations reading through this :)
--- | ||
# Get started with semantic search | ||
|
||
_Semantic search_ is a type of AI-powered search that enables you to use intuitive language in your queries. |
Suggested change:
- _Semantic search_ is a type of AI-powered search that enables you to use intuitive language in your queries.
+ _Semantic search_ is a type of AI-powered search that enables you to use natural language in your queries.
nit: "natural language" is the more common term
Maybe we could have an introductory sentence that states the goal of the quickstart: `In this hands-on quickstart we blah blah blah`.
|
||
## Prerequisites | ||
|
||
::::{tab-set} |
Tabs feel almost like overkill here TBH
:sync: stack | ||
|
||
- An {{es}} cluster for storing and searching your data, and {{kib}} for visualizing and managing your data. This quickstart is available for all [Elastic deployment models](/deploy-manage/deploy.md). The quickest way to get started is by using [{{es-serverless}}](/solutions/search/serverless-elasticsearch-get-started.md). | ||
- If you want to add sample data, you must have authority to create an index, create documents, and view them. To use {{kib}}, you'll also need read authority for the **Discover**, **Dev Tools**, and **{{es}}** features at a minimum. For example, create a custom role that has `all` index privileges for the sample index ("semantic-index") and `read` authority for the specific {{kib}} features. To learn more, refer to [](/deploy-manage/users-roles/cluster-or-deployment-auth/user-roles.md). |
I worry that this might be a bit confusing and we could just assume that the majority of users (particularly if it's a trial) will have these perms already?
|
||
## Add data | ||
|
||
% TBD: What type of data is ideal for semantic search? |
in brief: unstructured text (documents, articles, descriptions)
not ideal for exact keyword search over structured or timestamped data
:::{tab-item} {{serverless-short}} | ||
:sync: serverless | ||
There are some small data sets available for learning purposes. | ||
Go to **{{es}} > Home**, select the semantic search workflow, and click **Create a semantic optimized index**. |
I worry that we're introducing a few too many options here, if we want to be opinionated I'd stick to using Console
We also need some value-add over the in-product guides, otherwise we should just point users to them 🤔
I could see an argument for this to not be another tabbed widget
|
||
What just happened? The content was transformed into a sparse vector, which involves two main steps. | ||
First, the content is divided into smaller, manageable chunks to ensure that meaningful segments can be more effectively processed and searched. Then each chunk of text is transformed into a sparse vector representation using text expansion techniques. | ||
By default, `semantic_text` fields leverage ELSER to transform the content. |
This sentence might be best placed in the "create a `semantic_text` field" section.
It also seems like an important callout that if you want to use dense vectors (generally the more common approach), you'll have to do a bit more than simply create the `semantic_text` field.
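To make the sparse versus dense point concrete, here is a minimal sketch of the two index-creation bodies, built as plain Python dicts so it runs without a cluster. The field name `content` and the endpoint ID `my-dense-endpoint` are hypothetical placeholders, not names from this PR; the sketch assumes the `semantic_text` mapping accepts an optional `inference_id` and that you would send the body with a create-index request for `semantic-index`.

```python
import json
from typing import Optional


def build_semantic_mapping(field: str, inference_id: Optional[str] = None) -> dict:
    """Return an index-creation body with a single semantic_text field."""
    field_mapping = {"type": "semantic_text"}
    if inference_id is not None:
        # Only set when you do not want the default (ELSER, sparse) behavior.
        field_mapping["inference_id"] = inference_id
    return {"mappings": {"properties": {field: field_mapping}}}


# Default path: no inference endpoint named, so the field falls back to ELSER.
print(json.dumps(build_semantic_mapping("content"), indent=2))

# Dense path: "my-dense-endpoint" is a hypothetical inference endpoint that
# would have to be created separately before this mapping is applied.
print(json.dumps(build_semantic_mapping("content", "my-dense-endpoint"), indent=2))
```

The extra work for dense vectors is the part the sketch leaves out: creating the inference endpoint that the second mapping refers to.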
{{es}} provides a variety of query languages for interacting with your data. | ||
For an overview of their features and use cases, check out [](/explore-analyze/query-filter/languages.md). | ||
|
||
You can search data that is stored in `semantic_text` fields by using a specific subset of queries, including `knn`, `match`, `semantic`, and `sparse_vector`. Refer to [Semantic text field type](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) for the complete list. |
General remark: again, I worry that we're introducing a few too many things at once:
- Multiple query languages
- Multiple query types to target `semantic_text` fields (the happy path is the `semantic` query, especially if you're going down the default ELSER route)
- Multiple interfaces for running your queries
Just thinking this might be a bit overwhelming for a new user :)
|
||
::::{step} Run a semantic query with Query DSL | ||
|
||
Open the **{{index-manage-app}}** page from the navigation menu or return to the [guided index flow](/solutions/search/serverless-elasticsearch-get-started.md#elasticsearch-follow-guided-index-flow) to find code examples for searching the sample data. |
Again, I worry that we're interleaving suggestions about querying the sample data provided in the UI with the data we've ingested here ourselves. Personally, I'd stick to going straight to Console in the Query DSL option.
|
||
This is a [semantic](elasticsearch://reference/query-languages/query-dsl/query-dsl-semantic-query.md) query that is expressed in [Query Domain Specific Language](/explore-analyze/query-filter/languages/querydsl.md) (DSL), which is the primary query language for {{es}}. | ||
|
||
The query is translated automatically into a vector representation and runs against the contents of the semantic text field. |
This sentence is true for any semantic query and is an important fact for a new user's mental model, so maybe it could be raised up a level to the query section introduction?
Importantly, the query is transformed using the same model that transformed the documents. The beauty of semantic_text is that it's smart about that. In the olden days, or in more manual workflows, the user has to make sure they're specifying the same model for both.
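As an illustration of why this matters for the reader's mental model, the following sketch builds a `semantic` query body in Python. The request names only the field and the query text and does not specify a model, because the field's own inference configuration is reused at query time. The field name `content` and the query string are hypothetical; the body is meant for a search request against the `semantic-index` sample index.

```python
import json


def build_semantic_query(field: str, query_text: str, size: int = 5) -> dict:
    """Return a search body that runs a semantic query on one field."""
    return {
        "size": size,
        "query": {
            "semantic": {
                "field": field,       # a semantic_text field in the index
                "query": query_text,  # plain natural-language text, no vector
            }
        },
    }


# Hypothetical example query against a hypothetical "content" field.
body = build_semantic_query("content", "best park for hiking in the mountains")
print(json.dumps(body, indent=2))
```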
|
||
In this example, the document related to Rocky Mountain National Park has the highest score. | ||
:::: | ||
::::{step} Run a match query in ES|QL |
Suggested change:
- ::::{step} Run a match query in ES|QL
+ ::::{step} Run a semantic query in ES|QL
Even though ES|QL re-uses the match function for semantic queries, it's confusing to call it a match query 😄
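For reference, a semantic search in ES|QL might look like the sketch below, which builds the JSON body for the ES|QL query API in Python. It assumes the `MATCH` function and `METADATA _score` are available in the target version; the field name `content` and the query text are hypothetical, and `semantic-index` is the sample index name used in the guide.

```python
import json


def build_esql_request(index: str, field: str, query_text: str, limit: int = 5) -> dict:
    """Return an ES|QL query API body that searches one semantic_text field."""
    # Escape backslashes and double quotes so the text is a valid ES|QL string.
    escaped = query_text.replace("\\", "\\\\").replace('"', '\\"')
    esql = (
        f"FROM {index} METADATA _score\n"
        f'| WHERE MATCH({field}, "{escaped}")\n'
        "| SORT _score DESC\n"
        f"| LIMIT {limit}"
    )
    return {"query": esql}


# Hypothetical field name and query text.
request = build_esql_request("semantic-index", "content", "quiet trails with mountain views")
print(request["query"])
print(json.dumps(request))
```

The syntax is the same `MATCH` call used for lexical search; the semantic behavior comes from the field being mapped as `semantic_text`, which is why the step title is easy to get wrong.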
This PR aims to port content from https://www.elastic.co/getting-started/enterprise-search/build-a-semantic-search-experience into the https://www.elastic.co/docs/solutions/search/get-started section of the documentation.
Some significant changes include:
NOTE: To align with #1929, the new semantic search page is grouped into a quickstart section. I think after all the getting-started guides are migrated into this section we can revisit how/if the API quickstarts can also be included in this section.