Skip to content

Conversation

godber
Copy link
Member

@godber godber commented Aug 8, 2025

When Elasticsearch queries exceed the maxClauseCount limit (1024 by default), they fail with a search_phase_execution_exception containing too_many_clauses. Previously, these errors were not properly surfaced to users, causing jobs to appear "completed" instead of failed.

Root Cause:

  • Errors during slicer initialization (count operations) were being caught and treated as "no data found" scenarios
  • SpacesReaderClient error handling didn't properly format elasticsearch errors
  • Job lifecycle treated failed slice range creation as successful completion

Changes:

  1. Enhanced throwRequestError() to detect and properly format elasticsearch search_phase_execution_exception errors including too_many_clauses
  2. Added error handling wrapper in DateReaderAPISlicer.initialize() to prevent elasticsearch errors from being silently treated as "no data"
  3. Added comprehensive tests for both fetcher and slicer error scenarios

Impact:

  • Users now receive clear error messages when queries have too many clauses
  • Jobs fail properly instead of appearing to complete successfully
  • Error messages include specific elasticsearch error details

Fixes #1512

🤖 Generated with Claude Code

When Elasticsearch queries exceed the maxClauseCount limit (1024 by default),
they fail with a search_phase_execution_exception containing too_many_clauses.
Previously, these errors were not properly surfaced to users, causing jobs
to appear "completed" instead of failed.

**Root Cause:**
- Errors during slicer initialization (count operations) were being caught
  and treated as "no data found" scenarios
- SpacesReaderClient error handling didn't properly format elasticsearch errors
- Job lifecycle treated failed slice range creation as successful completion

**Changes:**
1. Enhanced throwRequestError() to detect and properly format elasticsearch
   search_phase_execution_exception errors including too_many_clauses
2. Added error handling wrapper in DateReaderAPISlicer.initialize() to
   prevent elasticsearch errors from being silently treated as "no data"
3. Added comprehensive tests for both fetcher and slicer error scenarios

**Impact:**
- Users now receive clear error messages when queries have too many clauses
- Jobs fail properly instead of appearing to complete successfully
- Error messages include specific elasticsearch error details

Fixes #1512

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@godber godber marked this pull request as draft August 8, 2025 17:22
godber and others added 4 commits August 8, 2025 10:29
- Remove trailing spaces
- Fix operator line breaks (move || to beginning of line)
- Ensure consistent code style

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Development commands (build, test, lint)
- Project structure overview
- Error handling architecture notes

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Mock both date range requests (asc and desc sort orders)
- Remove scope.isDone() check since slicer may not use all mocks
- Test now properly validates error handling during initialization

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Instead of checking for specific error types like 'too_many_clauses', now
ALL elasticsearch errors in the response.error field are properly formatted
and surfaced to users.

**Changes:**
- Handle ANY string error as an elasticsearch error
- Handle structured errors with type/reason fields generically
- JSON stringify complex error objects for visibility
- Mark all elasticsearch errors as 'unsafe' so they're not hidden
- Remove specific pattern matching for individual error types

**Benefits:**
- Fixes not just too_many_clauses but ALL elasticsearch errors
- More maintainable - no need to add specific error types
- Better user experience for any elasticsearch query failures
- Future-proof for new elasticsearch error types

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Copy link

github-actions bot commented Aug 8, 2025

@godber godber requested review from ciorg and Copilot August 12, 2025 17:45
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ciorg
Copy link
Member

ciorg commented Aug 12, 2025

I'm still seeing the same result.

Response from CLI is :

tjm update archive-details-data_ip.json --start

satarting job-1, e9e60c73-90f6-40d1-9131-2d6b254427ef
> Job has already completed
--------
job: job-1
id: e9e60c73-90f6-40d1-9131-2d6b254427ef
cluster: XXX
--------

I thought maybe the issue could be in the CLI, like if the CLI was eating the error. So I tried starting the job manually via the TS api:

curl -XPOST 'TS_URL/jobs/e9e60c73-90f6-40d1-9131-2d6b254427ef/_start' 
{
    "job_id": "e9e60c73-90f6-40d1-9131-2d6b254427ef",
    "ex_id": "bc476738-23cb-4744-bfc6-be34ce482c22"
}

But that didn't throw an error either.

@godber
Copy link
Member Author

godber commented Aug 12, 2025

Drat, thanks Charlie.

@godber
Copy link
Member Author

godber commented Aug 14, 2025

When starting the test job I can see all the execution controller and worker pods start up. In the execution controller logs I can see normal job startup stuff then this WARN three times

[2025-08-14T00:10:58.958Z]  WARN: teraslice/7 on ts-exc-job-1-e9e60c73-90f6-rc8xt: No data was found in index: index-1 using query: ip:(TERM_1 OR TERM_2 OR ... TERM_2023)

So it's getting No data was found in index back? Or maybe that's coming from the execution controller. But maybe spaces isn't returning:

search_phase_execution_exception: [too_many_clauses] Reason: too_many_clauses: maxClauseCount is set to 1024
  at onBody (/PATH/node_modules/opensearch2/lib/Transport.js:426:23)

like Elasticsearch is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error for too many search clauses not being exposed to the user
2 participants