Auto failover features for Morpheus #3854

ggray-cb · 2025-08-14T14:24:39Z

This PR includes docs for the following doc tickets:

DOC-12191 Doc for Allow auto-failover for ephemeral buckets even if no replica is configured
DOC-12073 Doc: Support auto-failover for exceptionally slow/hanging disks

Preview URL:
https://preview.docs-test.couchbase.com/docs-server-DOC-12191_allow_auto-failover_ephemeral_buckets_w-o_replica_reconcile

You will need the Docs Team credentials on Confluence.

Primary changes (with links to preview):

What's New entries
In Automatic Failover, added a description of unresponsive disks as an auto-failover trigger and a discussion of auto-failover and ephemeral buckets.
In the Node Availability section of the General Settings page, updated the procedures for setting auto-failover for ephemeral buckets w/o replicas and for slow disks.
Updated Enabling and Disabling Auto-Failover with the new settings.

* Allow auto-failover for ephemeral bucket w/o replica feature doc (DOC-12191)
* Support auto-failover for exceptionally slow/hanging disks (DOC-12073)

Manually ported over changes from the prior branch because attempts to merge resulted in huge numbers of conflicts, potentially with Supritha's changes to the underlying docs.

hyunjuV · 2025-08-14T22:32:00Z

modules/introduction/partials/new-features-80.adoc

+https://jira.issues.couchbase.com/browse/MB-34155[MB-34155] Support Auto-failover for exceptionally slow/hanging disks::
+You can now configure Couchbase Server to trigger an auto-failover on a node if its data disk is slow to respond or is hanging. 
+Before version 8.0, you could only configure Couchbase Server to auto-failover a node if the data disk returned errors for a set period of time.
+The new `failoverOnDataDiskNonResponsiveness` setting and correspinding settings in the Couchbase Web Console *Settings* page sets the nuber of seconds allowed for read or write oprtions to complete.


Typo "oprtions" -- operations

hyunjuV · 2025-08-15T05:04:38Z

modules/learn/pages/clusters-and-availability/automatic-failover.adoc

-* In no circumstances where data-loss might result: for example, when a bucket has no replicas.
-Therefore, even a single event may not trigger a response; and an administrator-specified maximum number of failed nodes may not be reached.
+* By default, Couchbase Server does not allow an auto-failover if it may result in data loss.  
+For example, with default settings Couchbase Server does not allows the auto-failover of a node that contains a bucket with no replicas.


Line 85: Typo -- "does not allows"

Line 73: The sentence "Running a rebalance will reset the count value back to 0." is repeated (seen twice).

hyunjuV · 2025-08-15T05:21:13Z

modules/learn/pages/clusters-and-availability/automatic-failover.adoc

+
+
+If the unreplicated ephemeral bucket is indexed, Couchbase Server rebuilds the index after it auto-fails over the node even if the index is not on the failed node.
+After this type of failover, the index must be rebuild because it indexes data lost in the failed node's vBuckets.


Typo -- "the index must be rebuild"

hyunjuV

Had a few typo comments.
Otherwise, LGTM.

neelima32 · 2025-08-15T21:44:01Z

modules/introduction/partials/new-features-80.adoc

+https://jira.issues.couchbase.com/browse/MB-34155[MB-34155] Support Auto-failover for exceptionally slow/hanging disks::
+You can now configure Couchbase Server to trigger an auto-failover on a node if its data disk is slow to respond or is hanging. 
+Before version 8.0, you could only configure Couchbase Server to auto-failover a node if the data disk returned errors for a set period of time.
+The new `failoverOnDataDiskNonResponsiveness` setting and correspinding settings in the Couchbase Web Console *Settings* page sets the nuber of seconds allowed for read or write oprtions to complete.


Typos "correspinding" - corresponding, nuber - number

neelima32 · 2025-08-15T21:53:32Z

modules/rest-api/pages/rest-cluster-autofailover-enable.adoc

-+
-These parameters are _optional_, and are only supported by Couchbase Server Enterprise Edition.
-These parameters and their values are ignored, if `enabled` is set to `false`.
+If supply a value for this parameter while `failoverOnDataDiskIssues[enabled]` is `false`, Couchbase Server igtnores the setting.


If a value is supplied for this parameter?
Typo: igtnores

neelima32 · 2025-08-15T21:54:33Z

modules/rest-api/pages/rest-cluster-autofailover-enable.adoc


-* If the value of `timeout` is incorrectly specified, `400 Bad Request` is returned, with the message `The value of "timeout" must be a positive integer in a range from 5 to 3600`.
+You must have one of the following roles makes changes to the auto-failover settings:


to make changes?
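
The removed line in this hunk documents the server's check on `timeout`. A minimal sketch of the same rule as a client-side pre-check is shown here, assuming the range of 5 to 3600 quoted in that line is inclusive and still applies. `check_timeout` is a hypothetical helper name and is not part of any Couchbase SDK.

```python
# Bounds taken from the error message quoted in the removed doc line.
TIMEOUT_MIN, TIMEOUT_MAX = 5, 3600

TIMEOUT_ERROR = (
    'The value of "timeout" must be a positive integer in a range from 5 to 3600'
)


def check_timeout(value):
    """Return None if value is acceptable, else the documented error message."""
    # bool is a subclass of int in Python, so reject it explicitly.
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if is_int and TIMEOUT_MIN <= value <= TIMEOUT_MAX:
        return None
    return TIMEOUT_ERROR
```

Treating both bounds as inclusive is an assumption; the quoted message does not say either way.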

neelima32 · 2025-08-15T21:56:45Z

modules/rest-api/pages/rest-cluster-autofailover-settings.adoc

@@ -16,63 +16,56 @@ GET /settings/autoFailover

 The `GET /settings/autoFailover` HTTP method and URI retrieve auto-failover settings for the cluster.

-Auto-failover settings are global, and apply to all nodes in the cluster.
-To read auto-failover settings, one of the following roles is required: Full Admin, Cluster Admin, Read-Only Admin, Backup Full Admin, Eventing Full Admin, Local User Security Admin, External User Security Admin.
+Auto-failover settings are global applying to all nodes in the cluster.


global, applying

neelima32 · 2025-08-15T21:57:37Z

modules/rest-api/pages/rest-cluster-autofailover-settings.adoc

+
+* `count`.
+The number of nodes that Couchbase Server has auto-failed over.
+COucbase Server resets this value to zero either when the cluster rebalances to remove or rejoin the failed nodes, or when an administrator manually resets the count (see xref:rest-api:rest-cluster-autofailover-reset.adoc[]). 


Typo: Coucbase

neelima32 · 2025-08-15T21:58:20Z

modules/rest-api/pages/rest-cluster-autofailover-settings.adoc

+
+--
+** `enabled` indicates whether Couchbase Server initiates an  auto-failover on a node when when its data disk has failed to complete an operation in the period set by `timePeriod`.
+This value can be `true`, which enables the auto-failover,  or teh default `false` which does not trigger a failover due to disk unresponsiveness.
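
For anyone checking the parameter names in this hunk, here is a minimal sketch of how a form-encoded body for `POST /settings/autoFailover` might be assembled. The bracketed keys `failoverOnDataDiskNonResponsiveness[enabled]` and `failoverOnDataDiskNonResponsiveness[timePeriod]` are assumptions modelled on the `failoverOnDataDiskIssues[enabled]` naming quoted elsewhere in this review, and `build_autofailover_form` is a hypothetical helper rather than a Couchbase API.

```python
from urllib.parse import urlencode


def build_autofailover_form(enabled, timeout, disk_nonresponsive_period=None):
    """Return a form-encoded settings body (hypothetical helper)."""
    fields = [
        ("enabled", "true" if enabled else "false"),
        ("timeout", str(timeout)),
    ]
    # Assumed key names, modelled on failoverOnDataDiskIssues[enabled].
    prefix = "failoverOnDataDiskNonResponsiveness"
    if disk_nonresponsive_period is None:
        # No period given: leave the slow-disk trigger at its default (off).
        fields.append((prefix + "[enabled]", "false"))
    else:
        fields.append((prefix + "[enabled]", "true"))
        fields.append((prefix + "[timePeriod]", str(disk_nonresponsive_period)))
    # safe="[]" keeps the brackets literal instead of percent-encoding them.
    return urlencode(fields, safe="[]")
```

Passing no period leaves the slow-disk trigger disabled, which matches the default `false` described in the hunk.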


neelima32 · 2025-08-15T22:13:17Z

modules/rest-api/pages/rest-cluster-autofailover-enable.adoc


 * `canAbortRebalance`.
-Whether or not auto-failover can be triggered if a _rebalance_ is in progress.
+Sets whether Couchbase Server can perform an auto-failover can while a rebalance is taking place.


can while a rebalance > while a rebalance

neelima32 · 2025-08-15T22:14:52Z

modules/learn/pages/clusters-and-availability/automatic-failover.adoc

+If an ephemeral bucket lacks replicas, it loses the data in vBuckets on any node that fails and restarts.
+To prevent this data loss, by default Couchbase Server does not allow auto-failover of a node that contains vBuckets for an unreplicated ephemeral bucket.
+In this case, you must manually fail over the node if it is unresponsive. 
+However, all of ephemeral bucket's data on the node is lost.


all of the ephemeral bucket's data

ggray-cb added 8 commits August 12, 2025 15:45

Trying to fix whatever git's insanely arcane mechanisms have done to …

7215841

…my files now.

Some minor typo fixes

66122f1

Trying to clean up yet more git-inspired calamity caused by it decidi…

4521116

…ng that somehow there was a conflict based on changes the remote branch **WHICH DID NOT CHANGE** at least as far as GitHub showed me. Reverted AGAIN my changes to this doc and AGAIN manually applied them.

Typo fix

5034f6c

Assorted cleanup.

c1cfba7

Fixes to links in What's New and some minor edits.

0c29479

Removing stray text.

c37439c

ggray-cb requested review from anuthan, hyunjuV and neelima32 August 14, 2025 14:24

hyunjuV reviewed Aug 14, 2025

hyunjuV reviewed Aug 15, 2025

hyunjuV approved these changes Aug 15, 2025

neelima32 reviewed Aug 15, 2025

neelima32 approved these changes Aug 15, 2025
