
ggray-cb
Contributor

This PR includes docs for the following doc tickets:

Preview URL:
https://preview.docs-test.couchbase.com/docs-server-DOC-12191_allow_auto-failover_ephemeral_buckets_w-o_replica_reconcile

You will need the Docs Team credentials on Confluence.

Primary changes (with links to preview):

* Feature doc: Allow auto-failover for ephemeral buckets without replicas (DOC-12191)
* Support auto-failover for exceptionally slow/hanging disks (DOC-12073)

Manually ported changes over from the prior branch because attempts to merge resulted in a huge number of conflicts, potentially with Supritha's changes to the underlying docs.
…ng that somehow there was a conflict based on changes in the remote branch, **WHICH DID NOT CHANGE**, at least as far as GitHub showed me. Reverted my changes to this doc AGAIN and AGAIN manually applied them.
https://jira.issues.couchbase.com/browse/MB-34155[MB-34155] Support Auto-failover for exceptionally slow/hanging disks::
You can now configure Couchbase Server to trigger an auto-failover on a node if its data disk is slow to respond or is hanging.
Before version 8.0, you could only configure Couchbase Server to auto-failover a node if the data disk returned errors for a set period of time.
The new `failoverOnDataDiskNonResponsiveness` setting and correspinding settings in the Couchbase Web Console *Settings* page sets the nuber of seconds allowed for read or write oprtions to complete.

Typo "oprtions" -- operations

* In no circumstances where data-loss might result: for example, when a bucket has no replicas.
Therefore, even a single event may not trigger a response; and an administrator-specified maximum number of failed nodes may not be reached.
* By default, Couchbase Server does not allow an auto-failover if it may result in data loss.
For example, with default settings Couchbase Server does not allows the auto-failover of a node that contains a bucket with no replicas.

@hyunjuV, Aug 15, 2025

Line 85: Typo -- "does not allows"

Line 73: The sentence "Running a rebalance will reset the count value back to 0." is repeated (seen twice).


+
If the unreplicated ephemeral bucket is indexed, Couchbase Server rebuilds the index after it auto-fails over the node even if the index is not on the failed node.
After this type of failover, the index must be rebuild because it indexes data lost in the failed node's vBuckets.

Typo -- "the index must be rebuild"

@hyunjuV left a comment

Had a few typo comments.
Otherwise, LGTM.

https://jira.issues.couchbase.com/browse/MB-34155[MB-34155] Support Auto-failover for exceptionally slow/hanging disks::
You can now configure Couchbase Server to trigger an auto-failover on a node if its data disk is slow to respond or is hanging.
Before version 8.0, you could only configure Couchbase Server to auto-failover a node if the data disk returned errors for a set period of time.
The new `failoverOnDataDiskNonResponsiveness` setting and correspinding settings in the Couchbase Web Console *Settings* page sets the nuber of seconds allowed for read or write oprtions to complete.
@neelima32, Aug 15, 2025

Typos "correspinding" - corresponding, nuber - number

+
These parameters are _optional_, and are only supported by Couchbase Server Enterprise Edition.
These parameters and their values are ignored, if `enabled` is set to `false`.
If supply a value for this parameter while `failoverOnDataDiskIssues[enabled]` is `false`, Couchbase Server igtnores the setting.

If a value is supplied for this parameter?
Typo: igtnores
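The rule in the snippet above (the optional sub-parameters are ignored while the relevant `enabled` flag is `false`) can be illustrated from the client side. This is a minimal sketch, assuming the usual form-encoded body for `POST /settings/autoFailover`; the helper name `build_autofailover_body` is hypothetical, and it simply leaves out parameters that Couchbase Server would ignore instead of sending them.

```python
from typing import Optional
from urllib.parse import urlencode


def build_autofailover_body(enabled: bool,
                            timeout: int,
                            disk_issues_enabled: bool = False,
                            disk_issues_time_period: Optional[int] = None) -> str:
    """Hypothetical helper: form-encode auto-failover settings.

    Parameters that the docs say are ignored are omitted rather than sent.
    """
    params = {
        "enabled": str(enabled).lower(),
        "timeout": str(timeout),
    }
    # The optional Enterprise Edition parameters are ignored when `enabled`
    # is false, so only add them while auto-failover is switched on.
    if enabled:
        params["failoverOnDataDiskIssues[enabled]"] = str(disk_issues_enabled).lower()
        # timePeriod is ignored while failoverOnDataDiskIssues[enabled] is false.
        if disk_issues_enabled and disk_issues_time_period is not None:
            params["failoverOnDataDiskIssues[timePeriod]"] = str(disk_issues_time_period)
    # urlencode percent-encodes the square brackets in the keys (%5B, %5D).
    return urlencode(params)
```

For example, `build_autofailover_body(False, 120, True, 60)` yields just `enabled=false&timeout=120`, because the disk-issue settings would be ignored anyway.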


* If the value of `timeout` is incorrectly specified, `400 Bad Request` is returned, with the message `The value of "timeout" must be a positive integer in a range from 5 to 3600`.
You must have one of the following roles makes changes to the auto-failover settings:

to make changes?
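The `timeout` rule quoted above can be mirrored on the client before a request is sent. This is a minimal sketch that assumes only the documented range (5 to 3600) and error text; `validate_timeout` is a hypothetical helper, not part of any Couchbase SDK, and the server remains the authority that returns `400 Bad Request`.

```python
def validate_timeout(value: str) -> int:
    """Hypothetical check mirroring the documented `timeout` rule."""
    message = 'The value of "timeout" must be a positive integer in a range from 5 to 3600'
    try:
        seconds = int(value)  # rejects non-integers such as "abc" or "3.5"
    except ValueError:
        raise ValueError(message) from None
    if not 5 <= seconds <= 3600:  # also rejects zero and negative values
        raise ValueError(message)
    return seconds
```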

@@ -16,63 +16,56 @@ GET /settings/autoFailover

The `GET /settings/autoFailover` HTTP method and URI retrieve auto-failover settings for the cluster.

Auto-failover settings are global, and apply to all nodes in the cluster.
To read auto-failover settings, one of the following roles is required: Full Admin, Cluster Admin, Read-Only Admin, Backup Full Admin, Eventing Full Admin, Local User Security Admin, External User Security Admin.
Auto-failover settings are global applying to all nodes in the cluster.

global, applying
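To illustrate `GET /settings/autoFailover`, here is a minimal sketch that builds (but does not send) the request with HTTP basic authentication, and reads the `count` field from a JSON response body. It assumes the default REST port 8091; the host, credentials, and helper names are hypothetical, and the user must hold one of the read roles listed in the snippet.

```python
import base64
import json
from urllib.request import Request


def autofailover_settings_request(host: str, username: str, password: str) -> Request:
    """Build (but do not send) GET /settings/autoFailover with HTTP basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        f"http://{host}:8091/settings/autoFailover",  # 8091: default REST port
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def failed_over_count(response_body: bytes) -> int:
    """Read `count`: the number of nodes auto-failed over since the last reset."""
    return int(json.loads(response_body)["count"])
```

Passing the request to `urllib.request.urlopen` would send it; the sketch stops short of that so it has no network dependency.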


* `count`.
The number of nodes that Couchbase Server has auto-failed over.
COucbase Server resets this value to zero either when the cluster rebalances to remove or rejoin the failed nodes, or when an administrator manually resets the count (see xref:rest-api:rest-cluster-autofailover-reset.adoc[]).

Typo: Coucbase

+
--
** `enabled` indicates whether Couchbase Server initiates an auto-failover on a node when when its data disk has failed to complete an operation in the period set by `timePeriod`.
This value can be `true`, which enables the auto-failover, or teh default `false` which does not trigger a failover due to disk unresponsiveness.

Typo: teh


* `canAbortRebalance`.
Whether or not auto-failover can be triggered if a _rebalance_ is in progress.
Sets whether Couchbase Server can perform an auto-failover can while a rebalance is taking place.

can while a rebalance > while a rebalance

If an ephemeral bucket lacks replicas, it loses the data in vBuckets on any node that fails and restarts.
To prevent this data loss, by default Couchbase Server does not allow auto-failover of a node that contains vBuckets for an unreplicated ephemeral bucket.
In this case, you must manually fail over the node if it is unresponsive.
However, all of ephemeral bucket's data on the node is lost.

all of the ephemeral bucket's data
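The default behavior described in the snippet can be modeled as a simple predicate. This is an illustrative sketch only, not Couchbase Server's implementation: `BucketOnNode`, `auto_failover_allowed`, `indexes_to_rebuild`, and the `allow_unreplicated_ephemeral` flag are hypothetical names, the flag standing in for the new DOC-12191 setting whose actual parameter name does not appear in this conversation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BucketOnNode:
    """Hypothetical model of a bucket that has vBuckets on the node in question."""
    name: str
    bucket_type: str  # "couchbase" or "ephemeral"
    replicas: int


def auto_failover_allowed(buckets: List[BucketOnNode],
                          allow_unreplicated_ephemeral: bool = False) -> bool:
    """Return False if auto-failing over the node could lose data.

    By default any bucket with no replicas blocks auto-failover; the
    hypothetical flag lifts that restriction for ephemeral buckets only.
    """
    for bucket in buckets:
        if bucket.replicas > 0:
            continue  # a replica vBucket can be promoted, so no data is lost
        if bucket.bucket_type == "ephemeral" and allow_unreplicated_ephemeral:
            continue  # opted in: this bucket's data on the node is accepted as lost
        return False
    return True


def indexes_to_rebuild(buckets: List[BucketOnNode]) -> List[str]:
    """Unreplicated ephemeral buckets whose indexes (if any) need rebuilding."""
    return [b.name for b in buckets if b.bucket_type == "ephemeral" and b.replicas == 0]
```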
