Migration Best Practices
When we run a mainnet validator for a network, we always try to run a fully-synced non-validator node as a backup. Below are the details of how we practice swapping the validator role onto a fully-synced node.
Scenario
We have two nodes: Node A (validator) and Node B (fully-synced non-validator). Both are running under the management of cosmovisor. We have a backup copy of priv_validator_key.json from Node A. We want to move the validator role from Node A to Node B.
Best Practice
Double Check: Double-check that we have the correct backup of priv_validator_key.json.
Stop Node A: Stop the cosmovisor service on Node A. Double-check the service status to make sure it is off. On the pundiscan explorer, you should see the validator missing blocks.
Restart Node A As Non-Validator: Delete priv_validator_key.json from Node A and restart the cosmovisor service. Check three places to ensure there is no double-signing. First, the newly generated priv_validator_key.json should be different from the backup copy. Second, on the pundiscan explorer, you should still see the validator missing blocks. Third, run the BINARY status command and make sure VotingPower is 0.
Change Validator Role to Node B: Stop Node B's cosmovisor service. Replace Node B's priv_validator_key.json with the backup copy from Node A. Restart Node B's cosmovisor service. Make sure it is making blocks on the pundiscan explorer.
House Clean: The specifics below apply only to how we set up our cluster. You should adapt them to your own setup.
Swap the server names with "sudo hostname xxx" on each server (for example, between "fxcore_mainnet" and "fxcore_mainnet_backup").
Swap the server shortcuts in the ~/.ssh/config file used for each ssh login.
Swap the log name in the promtail.yml file on each server and restart promtail.
Swap the server hosts in the monitoring server deployment script so Prometheus and Grafana get the servers right, then redeploy the monitoring script.
Update the server names on the server provider (Hetzner, Contabo, AWS, GCP, Alicloud) to avoid confusion.
Update the inventory file in the cosmos server deployment script. This is not critical, since both servers are already deployed, but it avoids future confusion: both servers keep running, only with the validator and non-validator roles reversed.
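The key and voting-power checks in the steps above can be sketched as a few small shell helpers. This is a minimal sketch, not our actual tooling: the function names same_key, voting_power and is_non_validator are hypothetical, the key path in the comments is a placeholder, and the parser assumes the status output is JSON containing a field spelled VotingPower, as named above. Releases that spell the field differently would need the pattern adjusted.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the swap checks; names and paths are illustrative only.

# same_key FILE_A FILE_B
# Succeeds (exit 0) only when the two key files are byte-identical.
same_key() {
  cmp -s "$1" "$2"
}

# voting_power
# Reads status JSON on stdin and prints the number found in the
# "VotingPower" field (assumed field name; quoted or unquoted value).
voting_power() {
  sed -n 's/.*"VotingPower"[[:space:]]*:[[:space:]]*"\{0,1\}\([0-9][0-9]*\).*/\1/p' | head -n1
}

# is_non_validator
# Reads status JSON on stdin; succeeds only when VotingPower is exactly 0.
is_non_validator() {
  [ "$(voting_power)" = "0" ]
}

# Example use on Node A after restarting it as a non-validator
# (BINARY is the chain binary; the 2>&1 is a precaution in case the
# binary writes its status output to stderr):
#
#   same_key /path/to/backup/priv_validator_key.json \
#            /path/to/config/priv_validator_key.json \
#     && echo "STOP: Node A still holds the validator key"
#   BINARY status 2>&1 | is_non_validator \
#     && echo "Node A reports VotingPower 0"
```

On Node B, after copying the backup into place, same_key should succeed instead, which confirms the key was replaced before the service is restarted. These helpers only automate the local checks; confirming on the explorer that the validator is missing or making blocks remains a manual step.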