diff --git a/.gitignore b/.gitignore index a36a5b23..60e28e75 100644 --- a/.gitignore +++ b/.gitignore @@ -3,3 +3,4 @@ tmp .DS_Store _build __pycache__ +_test/*.html diff --git a/topics/cluster-tutorial.md b/topics/cluster-tutorial.md index f8bc909c..1af47aea 100644 --- a/topics/cluster-tutorial.md +++ b/topics/cluster-tutorial.md @@ -191,18 +191,29 @@ in the `valkey.conf` file. To create and use a Valkey Cluster, follow these steps: -* [Create a Valkey Cluster](#create-a-valkey-cluster) -* [Interact with the cluster](#interact-with-the-cluster) -* [Write an example app with redis-rb-cluster](#write-an-example-app-with-redis-rb-cluster) -* [Reshard the cluster](#reshard-the-cluster) -* [A more interesting example application](#a-more-interesting-example-application) -* [Test the failover](#test-the-failover) -* [Manual failover](#manual-failover) -* [Add a new node](#add-a-new-node) -* [Remove a node](#remove-a-node) -* [Replica migration](#replica-migration) -* [Upgrade nodes in a Valkey Cluster](#upgrade-nodes-in-a-valkey-cluster) -* [Migrate to Valkey Cluster](#migrate-to-valkey-cluster) +- [Valkey Cluster 101](#valkey-cluster-101) + - [Valkey Cluster TCP ports](#valkey-cluster-tcp-ports) + - [Valkey Cluster and Docker](#valkey-cluster-and-docker) + - [Valkey Cluster data sharding](#valkey-cluster-data-sharding) + - [Valkey Cluster primary-replica model](#valkey-cluster-primary-replica-model) + - [Valkey Cluster consistency guarantees](#valkey-cluster-consistency-guarantees) +- [Valkey Cluster configuration parameters](#valkey-cluster-configuration-parameters) +- [Create and use a Valkey Cluster](#create-and-use-a-valkey-cluster) + - [Requirements to create a Valkey Cluster](#requirements-to-create-a-valkey-cluster) + - [Create a Valkey Cluster](#create-a-valkey-cluster) + - [Interact with the cluster](#interact-with-the-cluster) + - [Write an example app with Valkey GLIDE](#write-an-example-app-with-valkey-glide) + - [Reshard the cluster](#reshard-the-cluster) + - [A more interesting example application](#a-more-interesting-example-application) + - [Test the failover](#test-the-failover) + - [Manual failover](#manual-failover) + - [Add a new node](#add-a-new-node) + - [Add a new node as a replica](#add-a-new-node-as-a-replica) + - [Remove a node](#remove-a-node) + - [Replica migration](#replica-migration) + - [Upgrade nodes in a Valkey Cluster](#upgrade-nodes-in-a-valkey-cluster) + - [Migrate to Valkey Cluster](#migrate-to-valkey-cluster) +- [Learn more](#learn-more) But, first, familiarize yourself with the requirements for creating a cluster. @@ -271,11 +282,11 @@ You can configure and execute individual instances manually or use the create-cl Let's go over how you do it manually. To create the cluster, run: - - valkey-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 \ - 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \ - --cluster-replicas 1 - +```bash +valkey-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 \ +127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \ +--cluster-replicas 1 +``` The command used here is **create**, since we want to create a new cluster. The option `--cluster-replicas 1` means that we want a replica for every primary created. @@ -297,7 +308,7 @@ system (but you'll not learn the same amount of operational details). Find the `utils/create-cluster` directory in the Valkey distribution. There is a script called `create-cluster` inside (same name as the directory -it is contained into), it's a simple bash script. In order to start +it is contained into), it's a bash script. In order to start a 6 nodes cluster with 3 primaries and 3 replicas just type the following commands: @@ -322,7 +333,7 @@ See the documentation for your [client of choice](../clients/) to determine its You can also test your Valkey Cluster using the `valkey-cli` command line utility: -``` +```bash $ valkey-cli -c -p 7000 127.0.0.1:7000> set foo bar -> Redirected to slot [12182] located at 127.0.0.1:7002 @@ -350,11 +361,11 @@ right node. The map is refreshed only when something changed in the cluster configuration, for example after a failover or after the system administrator changed the cluster layout by adding or removing nodes. -#### Write an example app with redis-rb-cluster +#### Write an example app with Valkey GLIDE Before going forward showing how to operate the Valkey Cluster, doing things like a failover, or a resharding, we need to create some example application -or at least to be able to understand the semantics of a simple Valkey Cluster +or at least to be able to understand the semantics of a Valkey Cluster client interaction. In this way we can run an example and at the same time try to make nodes @@ -362,102 +373,113 @@ failing, or start a resharding, to see how Valkey Cluster behaves under real world conditions. It is not very helpful to see what happens while nobody is writing to the cluster. -This section explains some basic usage of -[redis-rb-cluster](https://github.com/antirez/redis-rb-cluster) showing two -examples. -The first is the following, and is the -[`example.rb`](https://github.com/antirez/redis-rb-cluster/blob/master/example.rb) -file inside the redis-rb-cluster distribution: - -```ruby -require './cluster' - -if ARGV.length != 2 - startup_nodes = [ - {:host => "127.0.0.1", :port => 7000}, - {:host => "127.0.0.1", :port => 7001} - ] -else - startup_nodes = [ - {:host => ARGV[0], :port => ARGV[1].to_i} - ] -end - -rc = RedisCluster.new(startup_nodes,32,:timeout => 0.1) - -last = false - -while not last - begin - last = rc.get("__last__") - last = 0 if !last - rescue => e - puts "error #{e.to_s}" - sleep 1 - end -end - -((last.to_i+1)..1000000000).each{|x| - begin - rc.set("foo#{x}",x) - puts rc.get("foo#{x}") - rc.set("__last__",x) - rescue => e - puts "error #{e.to_s}" - end - sleep 0.1 -} -``` +This section showcases the core functionality of Valkey through a practical Node.js application. +For our example, we will use the [Node.js version of GLIDE,](https://github.com/valkey-io/valkey-glide/tree/main/node) an official Valkey client library that supports multiple languages. -The application does a very simple thing, it sets keys in the form `foo` to `number`, one after the other. So if you run the program the result is the -following stream of commands: +The following example demonstrates how to connect to a Valkey cluster and perform +basic operations. First, install the Valkey GLIDE client: -* SET foo0 0 -* SET foo1 1 -* SET foo2 2 -* And so forth... +```bash +npm install @valkey/valkey-glide +``` -The program looks more complex than it should usually as it is designed to -show errors on the screen instead of exiting with an exception, so every -operation performed with the cluster is wrapped by `begin` `rescue` blocks. +Here's the example code: + +```javascript +import { GlideClusterClient } from "@valkey/valkey-glide"; + +async function runExample() { + const addresses = [ + { + host: "localhost", + port: 6379, + }, + ]; + // Check `GlideClientConfiguration/GlideClusterClientConfiguration` for additional options. + const client = await GlideClusterClient.createClient({ + addresses: addresses, + // if the cluster nodes use TLS, you'll need to enable it. + // useTLS: true, + // It is recommended to set a timeout for your specific use case + requestTimeout: 500, // 500ms timeout + clientName: "test_cluster_client", + }); + + try { + console.log("Connected to Valkey cluster"); + + // Get the last counter value, or start from 0 + let last = false; + while (!last) { + try { + last = await client.get("__last__"); + last = last || 0; + } catch (error) { + console.log(`Error getting counter: ${error.message}`); + await new Promise(resolve => setTimeout(resolve, 1000)); + } + } + + console.log(`Starting from counter: ${last}`); + + // Write keys sequentially with verification + for (let x = parseInt(last) + 1; x <= 1000000000; x++) { + try { + // Set the key + await client.set(`foo${x}`, x.toString()); + + // Get and verify the value + const value = await client.get(`foo${x}`); + console.log(value); + + // Update the counter + await client.set("__last__", x.toString()); + + // Add delay equivalent to Ruby's sleep 0.1 + await new Promise(resolve => setTimeout(resolve, 100)); + + } catch (error) { + console.log(`Error: ${error.message}`); + } + } + } catch (error) { + console.log(`Connection error: ${error.message}`); + } finally { + client.close(); + } +} -The **line 14** is the first interesting line in the program. It creates the -Valkey Cluster object, using as argument a list of *startup nodes*, the maximum -number of connections this object is allowed to take against different nodes, -and finally the timeout after a given operation is considered to be failed. +runExample().catch(console.error); +``` -The startup nodes don't need to be all the nodes of the cluster. The important -thing is that at least one node is reachable. Also note that redis-rb-cluster -updates this list of startup nodes as soon as it is able to connect with the -first node. You should expect such a behavior with any other serious client. +The application writes keys in the format `foo` to `number`, one after the other. For each key, it performs a SET operation, immediately verifies the value with a GET operation, updates a counter, and includes a small delay between operations. -Now that we have the Valkey Cluster object instance stored in the **rc** variable, -we are ready to use the object like if it was a normal Valkey object instance. +The program includes comprehensive error handling to display errors instead of +crashing, so all cluster operations are wrapped in try-catch blocks. -This is exactly what happens in **line 18 to 26**: when we restart the example -we don't want to start again with `foo0`, so we store the counter inside -Valkey itself. The code above is designed to read this counter, or if the -counter does not exist, to assign it the value of zero. +The **client creation section** is the first key part of the program. It creates the +Valkey cluster client using a list of cluster *addresses* and configuration options +including a request timeout and client name. -However note how it is a while loop, as we want to try again and again even -if the cluster is down and is returning errors. Normal applications don't need -to be so careful. +The addresses don't need to be all the nodes of the cluster. The important +thing is that at least one node is reachable. A cluster-aware client, such as Valkey GLIDE automatically +discovers the complete cluster topology once it connects to any node. -**Lines between 28 and 37** start the main loop where the keys are set or -an error is displayed. +Now that we have the cluster client instance, we can use it like any other +Valkey client to perform operations across the cluster. -Note the `sleep` call at the end of the loop. In your tests you can remove -the sleep if you want to write to the cluster as fast as possible (relatively -to the fact that this is a busy loop without real parallelism of course, so -you'll get the usually 10k ops/second in the best of the conditions). +The **counter initialization section** reads a counter with retry logic so that when we restart the example +we don't start again with `foo0`, but continue from where we left off. +The counter is stored in Valkey itself using the key `__last__`. -Normally writes are slowed down in order for the example application to be -easier to follow by humans. +The **main processing loop** sets keys sequentially, immediately verifies each value with a GET operation, updates the counter after each key, and includes a 100ms delay between operations to match the original Ruby example's timing. Starting the application produces the following output: ``` -ruby ./example.rb +node example.js +Connected to Valkey cluster +Starting from counter: 0 1 2 3 @@ -477,9 +499,8 @@ is running. #### Reshard the cluster Now we are ready to try a cluster resharding. To do this, please -keep the example.rb program running, so that you can see if there is some -impact on the program running. Also, you may want to comment the `sleep` -call to have some more serious write load during resharding. +keep the example.js program running, so that you can see if there is some +impact on the program running. Resharding basically means to move hash slots from a set of nodes to another set of nodes. @@ -487,7 +508,7 @@ Like cluster creation, it is accomplished using the valkey-cli utility. To start a resharding, just type: - valkey-cli --cluster reshard 127.0.0.1:7000 + `valkey-cli --cluster reshard 127.0.0.1:7000` You only need to specify a single node, valkey-cli will find the other nodes automatically. @@ -532,7 +553,7 @@ during the resharding if you want. At the end of the resharding, you can test the health of the cluster with the following command: - valkey-cli --cluster check 127.0.0.1:7000 + `valkey-cli --cluster check 127.0.0.1:7000` All the slots will be covered as usual, but this time the primary at 127.0.0.1:7000 will have more hash slots, something around 6461. @@ -541,7 +562,7 @@ Resharding can be performed automatically without the need to manually enter the parameters in an interactive way. This is possible using a command line like the following: - valkey-cli --cluster reshard : --cluster-from --cluster-to --cluster-slots --cluster-yes + `valkey-cli --cluster reshard : --cluster-from --cluster-to --cluster-slots --cluster-yes` This allows to build some automatism if you are likely to reshard often, however currently there is no way for `valkey-cli` to automatically @@ -557,32 +578,32 @@ Note that this option can also be activated by setting the #### A more interesting example application The example application we wrote early is not very good. -It writes to the cluster in a simple way without even checking if what was +It writes to the cluster in a straightforward way without even checking if what was written is the right thing. From our point of view the cluster receiving the writes could just always write the key `foo` to `42` to every operation, and we would not notice at all. -So in the `redis-rb-cluster` repository, there is a more interesting application -that is called `consistency-test.rb`. It uses a set of counters, by default 1000, and sends `INCR` commands in order to increment the counters. +Now we can write a more interesting application for testing cluster behavior. +A comprehensive consistency checking application that uses a set of counters, by default 1000, and sends `INCR` commands to increment the counters. However instead of just writing, the application does two additional things: * When a counter is updated using `INCR`, the application remembers the write. * It also reads a random counter before every write, and check if the value is what we expected it to be, comparing it with the value it has in memory. -What this means is that this application is a simple **consistency checker**, +What this means is that this application is a **consistency checker**, and is able to tell you if the cluster lost some write, or if it accepted a write that we did not receive acknowledgment for. In the first case we'll see a counter having a value that is smaller than the one we remember, while in the second case the value will be greater. -Running the consistency-test application produces a line of output every +Running a consistency testing application produces a line of output every second: -``` -$ ruby consistency-test.rb +```bash +node consistency-test.js 925 R (0 err) | 925 W (0 err) | 5030 R (0 err) | 5030 W (0 err) | 9261 R (0 err) | 9261 W (0 err) | @@ -748,7 +769,7 @@ We'll show both, starting with the addition of a new primary instance. In both cases the first step to perform is **adding an empty node**. -This is as simple as to start a new node in port 7006 (we already used +This is as straightforward as starting a new node in port 7006 (we already used from 7000 to 7005 for our existing 6 nodes) with the same configuration used for the other nodes, except for the port number, so what you should do in order to conform with the setup we used for the previous nodes: @@ -807,8 +828,7 @@ having as a target the empty node. Adding a new replica can be performed in two ways. The obvious one is to use valkey-cli again, but with the --cluster-replica option, like this: - valkey-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000 --cluster-replica - +`valkey-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000 --cluster-replica` Note that the command line here is exactly like the one we used to add a new primary, so we are not specifying to which primary we want to add the replica. In this case, what happens is that valkey-cli will add the new @@ -817,8 +837,7 @@ node as replica of a random primary among the primaries with fewer replicas. However you can specify exactly what primary you want to target with your new replica with the following command line: - valkey-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000 --cluster-replica --cluster-master-id 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e - +`valkey-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000 --cluster-replica --cluster-master-id 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e` This way we assign the new replica to a specific primary. A more manual way to add a replica to a specific primary is to add the new @@ -849,9 +868,9 @@ The node 3c3a0c... now has two replicas, running on ports 7002 (the existing one To remove a replica node just use the `del-node` command of valkey-cli: - valkey-cli --cluster del-node 127.0.0.1:7000 `` + `valkey-cli --cluster del-node 127.0.0.1:7000 ` -The first argument is just a random node in the cluster, the second argument + `valkey-cli --cluster del-node 127.0.0.1:7000 ` is the ID of the node you want to remove. You can remove a primary node in the same way as well, **however in order to @@ -867,7 +886,7 @@ There is a special scenario where you want to remove a failed node. You should not use the `del-node` command because it tries to connect to all nodes and you will encounter a "connection refused" error. Instead, you can use the `call` command: - valkey-cli --cluster call 127.0.0.1:7000 cluster forget `` + `valkey-cli --cluster call 127.0.0.1:7000 cluster forget ` This command will execute `CLUSTER FORGET` command on every node. diff --git a/wordlist b/wordlist index dda5e4aa..af29be95 100644 --- a/wordlist +++ b/wordlist @@ -280,7 +280,7 @@ first-args firstkey FlameGraph fmt -foo[0-9] +foo[0-9]+ formatter fp_error france_location