A simple daemon that monitors the quality of a DoubleZero connection and detects subtle failure modes that the doublezerod daemon cannot detect on its own.
These scripts may modify nftables state and may interact with the validator. Make sure you understand what each script does in full before running it, so that it does not break anything on your system.
The monitor works by counting packets arriving at the validator. If no packets arrive from a sufficiently high percentage of stake, it disconnects DZ "just in case".
It does not trigger on minor DZ packet loss, only on substantial failures in the network configuration.
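The decision rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's actual code: it assumes per-sender packet counts for one interval (for example, read from nftables counters) and a map from sender IP to stake are already available. The names silent_stake_fraction and should_disconnect, and the 0.5 threshold, are hypothetical.

```python
from __future__ import annotations


def silent_stake_fraction(stake_by_ip: dict[str, int],
                          packets_by_ip: dict[str, int]) -> float:
    """Fraction of total stake whose senders delivered zero packets this interval."""
    total = sum(stake_by_ip.values())
    if total == 0:
        return 0.0  # nothing to judge against; never trip on an empty stake map
    silent = sum(stake for ip, stake in stake_by_ip.items()
                 if packets_by_ip.get(ip, 0) == 0)
    return silent / total


def should_disconnect(stake_by_ip: dict[str, int],
                      packets_by_ip: dict[str, int],
                      threshold: float = 0.5) -> bool:  # hypothetical default
    """Trip only when the silent share of stake reaches the threshold.

    A sender that delivers even one packet counts as reachable, so ordinary
    packet loss does not move the fraction; only wholesale silence does.
    """
    return silent_stake_fraction(stake_by_ip, packets_by_ip) >= threshold


if __name__ == "__main__":
    # Hypothetical stake map (IP -> stake) and per-interval packet counts.
    stake = {"192.0.2.1": 600, "192.0.2.2": 300, "192.0.2.3": 100}
    counts = {"192.0.2.2": 42}  # 192.0.2.1 and 192.0.2.3 sent nothing
    print(silent_stake_fraction(stake, counts))  # 0.7
    print(should_disconnect(stake, counts))      # True
```

Treating "any packet at all" as proof of reachability is what keeps this kind of check quiet under minor loss: a lossy path lowers packet counts but rarely drives a sender to exactly zero.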
On Debian-based systems (apt), Malbec Labs maintains a PPA that includes releases of this tool:
curl -1sLf https://dl.cloudsmith.io/public/malbeclabs/doublezero/setup.deb.sh | sudo -E bash
sudo apt-get install doublezero-monitor-tool
On RPM-based systems (yum), Malbec Labs maintains a repository that includes releases of this tool:
curl -1sLf https://dl.cloudsmith.io/public/malbeclabs/doublezero/setup.rpm.sh | sudo -E bash
sudo yum install doublezero-monitor-tool
You can run the script from an unprivileged user account or as root. To run it as an unprivileged user, grant that user sudo access to the nft and ip commands.
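One way a script can support both setups is to prefix its nft and ip invocations with sudo only when it is not already running as root. The sketch below is an assumption about how that could look, not how the shipped scripts do it; privileged and run_privileged are hypothetical names. Note that sudo -n only succeeds if the sudoers policy lets the user run these commands without a password.

```python
from __future__ import annotations

import os
import subprocess
from typing import Optional


def privileged(cmd: list[str], euid: Optional[int] = None) -> list[str]:
    """Return cmd unchanged when running as root, else prefixed with 'sudo -n'.

    'sudo -n' is non-interactive: it fails at once instead of prompting for a
    password, which suits an unattended monitor whose sudo rights are missing.
    """
    if euid is None:
        euid = os.geteuid()  # POSIX only
    return list(cmd) if euid == 0 else ["sudo", "-n", *cmd]


def run_privileged(cmd: list[str]) -> str:
    """Run an nft or ip command and return stdout; raise CalledProcessError on failure."""
    result = subprocess.run(privileged(cmd), capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(privileged(["nft", "list", "ruleset"], euid=1000))
    # ['sudo', '-n', 'nft', 'list', 'ruleset']
```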
Edit the config.py file to set the parameters to your liking.
Running the monitor in tmux or zellij and watching its output is a practical way to check that the parameters are set correctly.
For a permanent installation, it is recommended to configure a systemd service so that the monitor starts every time the host reboots.
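A systemd unit, doublezero_monitor.service, is provided; install it as appropriate for your system, for example:
sudo cp doublezero_monitor.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now doublezero_monitor.service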
Keep in mind that when it runs as a system service, the script runs as root.
For IBRL mode, use:
./monitor_ibrl.py
Once the script disconnects DZ, it does not reconnect it automatically: it has no way to test whether DZ has recovered short of switching the validator back to a potentially broken configuration.
When the script disconnects DZ, it does so by running the command ip link set doublezero0 down.
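The disconnect step amounts to a single command. A minimal sketch, assuming a pluggable "runner" so the logic can be exercised without root or a real doublezero0 interface; disconnect_dz and run_checked are hypothetical names, not the script's API. When running unprivileged, the command would additionally need a sudo prefix, as described above.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

# A "runner" executes one command; swapping it out lets the logic be tested
# without root or a real doublezero0 interface.
Runner = Callable[[Sequence[str]], object]


def run_checked(cmd: Sequence[str]) -> None:
    """Execute cmd, raising CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def disconnect_dz(runner: Runner = run_checked,
                  interface: str = "doublezero0") -> list[str]:
    """Take the DoubleZero interface down and return the command issued (for logging)."""
    cmd = ["ip", "link", "set", interface, "down"]
    runner(cmd)
    return cmd


if __name__ == "__main__":
    issued = []
    disconnect_dz(runner=issued.append)  # dry run: record instead of execute
    print(issued)  # [['ip', 'link', 'set', 'doublezero0', 'down']]
```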
To re-enable DZ after it has been disconnected, run ip link set doublezero0 up and restart the monitoring service.
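The recovery steps can be scripted in order. This sketch assumes the monitor runs as the provided doublezero_monitor.service systemd unit; if you run it in tmux or zellij instead, restart it by hand. The helper names are hypothetical. Bringing the link up before restarting the monitor means the monitor starts counting with DZ already active.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], object]


def run_checked(cmd: Sequence[str]) -> None:
    """Execute cmd, raising CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def recovery_commands(interface: str = "doublezero0",
                      unit: str = "doublezero_monitor.service") -> list[list[str]]:
    """Commands to bring DZ back, in order: link up first, then restart the monitor."""
    return [
        ["ip", "link", "set", interface, "up"],
        ["systemctl", "restart", unit],
    ]


def reenable_dz(runner: Runner = run_checked, **kwargs: str) -> None:
    """Run the recovery commands in sequence.

    With run_checked, a failing 'ip link' raises before the monitor is restarted.
    """
    for cmd in recovery_commands(**kwargs):
        runner(cmd)
```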
For multihoming, use:
./monitor.py
This requires configuring the validator for multihoming and keeping the list of IP addresses in the validator config in sync with the list in the script.
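Because the two IP lists must match, a small consistency check can catch drift before it causes trouble. A minimal sketch, assuming you have already extracted both lists as plain strings (how to read them from the validator config and the script's config is not covered here); ip_list_drift is a hypothetical helper. Parsing through the ipaddress module means formatting differences, such as IPv6 case, do not register as mismatches.

```python
from __future__ import annotations

import ipaddress
from typing import Iterable


def _normalize(addrs: Iterable[str]) -> set:
    """Parse addresses into ipaddress objects; raises ValueError on malformed entries."""
    return {ipaddress.ip_address(a.strip()) for a in addrs if a.strip()}


def ip_list_drift(validator_ips: Iterable[str],
                  monitor_ips: Iterable[str]) -> dict[str, list[str]]:
    """Report addresses present in one list but not the other."""
    v, m = _normalize(validator_ips), _normalize(monitor_ips)
    return {
        "missing_in_monitor": sorted(str(ip) for ip in v - m),
        "missing_in_validator": sorted(str(ip) for ip in m - v),
    }


if __name__ == "__main__":
    print(ip_list_drift(["192.0.2.10", "192.0.2.11"],
                        ["192.0.2.10", "198.51.100.7"]))
    # {'missing_in_monitor': ['192.0.2.11'], 'missing_in_validator': ['198.51.100.7']}
```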
PRs are welcome! Ideas for future work:
- Cascade the pings in active monitoring better to avoid bursts of traffic
- Switch to named counters in nftables?
- rewrite it in Rust (tm)
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.