# Solana Validator Monitoring and Failover System
Ansible-based system for monitoring Solana validators and managing automatic failover between main and backup validators.
## Features
### Monitoring
- Real-time monitoring of validator delinquent status
- SSH connectivity checks for both validators
- Automatic tower file backups
- Detailed logging of system state
### Automatic Failover
Handles different scenarios of validator failures:
- Service failure (validator process not running)
- Server unavailability (server down/unreachable)
- Node lag (validator fell behind and was marked delinquent)
### Manual Rollback
Provides a safe, manual rollback procedure for returning to the main validator.
## Requirements
- Ansible 2.10+
- Solana CLI tools installed on the Sentry server
- SSH access to both validators
- Both validators set up with their staked and unstaked identity keypairs
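A quick way to confirm the Sentry server meets these requirements is a small shell check. This is a sketch, not part of the repository. The helper names `version_ge` and `check_requirements` are hypothetical, the comparison relies on GNU `sort -V`, and it assumes `ansible --version` prints its version on the first line (`ansible [core X.Y.Z]` on ansible-core, `ansible X.Y.Z` on older releases).

```bash
#!/usr/bin/env bash
# Hypothetical requirements check for the Sentry server (not shipped with the project).

# Succeeds if version $1 >= version $2, using GNU sort's version ordering:
# the smaller of the two sorts first, so $1 >= $2 when $2 comes out on top.
version_ge() {
  [[ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" == "$2" ]]
}

check_requirements() {
  local tool ver
  for tool in ansible-playbook ansible solana; do
    command -v "$tool" >/dev/null || { echo "missing: $tool" >&2; return 1; }
  done
  # Assumption: the first X.Y[.Z] number on the banner line is the Ansible version.
  ver="$(ansible --version | head -n1 | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1)"
  version_ge "$ver" 2.10 || { echo "Ansible $ver is older than 2.10" >&2; return 1; }
}
```

Source the file and call `check_requirements`; it returns non-zero and names the first missing piece.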
## Installation
1. Clone the repository:
   ```bash
   git clone https://github.com/your-repo/solana-validator-monitor.git
   cd solana-validator-monitor
   ```
2. Configure variables in `group_vars/all.yml`:
   ```yaml
   # Main Validator
   main_validator_ip: "YOUR_MAIN_IP"
   main_validator_ssh_port: YOUR_SSH_PORT
   main_validator_identity_key: "YOUR_VALIDATOR_PUBKEY"

   # Backup Validator
   backup_validator_ip: "YOUR_BACKUP_IP"
   backup_validator_ssh_port: YOUR_SSH_PORT
   ```
3. Place the SSH key in the `files/` directory:
   ```bash
   cp /path/to/your/ssh/key files/validator_key
   chmod 600 files/validator_key
   ```
4. Initialize the system:
   ```bash
   ansible-playbook playbooks/init.yml
   ```
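OpenSSH refuses to use a private key that group or others can access, so a key copied without `chmod 600` makes every SSH connection the playbooks open fail. A minimal pre-check, assuming GNU `stat` (on BSD/macOS the equivalent is `stat -f '%Lp'`); `check_key_perms` is a hypothetical helper, not part of the project:

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail if the private key has any group/other permission bits.
check_key_perms() {
  local key="${1:-files/validator_key}"
  local mode
  mode="$(stat -c '%a' "$key")" || return 1
  # Same rule OpenSSH applies: reject the key when (mode & 077) != 0.
  if (( (8#$mode & 8#077) != 0 )); then
    echo "$key has mode $mode; run: chmod 600 $key" >&2
    return 1
  fi
}
```

Run `check_key_perms` from the project root before `init.yml`; modes such as 600 or 400 pass, while 640 or 644 fail.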
## Usage

Run a single monitoring check:
```bash
ansible-playbook playbooks/monitor.yml
```
Add to crontab for continuous monitoring (every 5 minutes):
```bash
*/5 * * * * cd /path/to/project && ansible-playbook playbooks/monitor.yml
```
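With a five-minute schedule, a slow run (for example, one in the middle of a failover) could still be going when cron starts the next one. One common guard is `flock(1)` from util-linux. This is sketched here as an assumption, not something the project ships: the helper name `run_exclusive` and the lock path are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command only if no earlier instance holds the lock.
# Requires flock(1) from util-linux.
run_exclusive() {
  local lock_file="$1"
  shift
  # -n: exit immediately with status 1 instead of waiting when the lock is held.
  flock -n "$lock_file" "$@"
}
```

The same guard can go straight into the crontab entry: `*/5 * * * * cd /path/to/project && flock -n /tmp/solana-monitor.lock ansible-playbook playbooks/monitor.yml`.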
To return to the main validator after a failover, run the rollback playbook:
```bash
ansible-playbook playbooks/rollback.yml
```
## State Tracking

The system records the active validator in `/var/run/solana_validator/active_validator`:
- `main` - Main validator is active
- `backup` - Backup validator is active
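External scripts or alerts may want to read that state file too. A minimal reader, assuming the file holds just one of the two words above; `get_active_validator` is a hypothetical helper, not part of the repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the recorded active validator, rejecting anything
# other than "main" or "backup".
STATE_FILE_DEFAULT="/var/run/solana_validator/active_validator"

get_active_validator() {
  local state_file="${1:-$STATE_FILE_DEFAULT}"
  local state
  if [[ ! -r "$state_file" ]]; then
    echo "state file not readable: $state_file" >&2
    return 2
  fi
  # Remove all whitespace, including the trailing newline.
  state="$(tr -d '[:space:]' < "$state_file")"
  case "$state" in
    main|backup) printf '%s\n' "$state" ;;
    *) echo "unexpected state: '$state'" >&2; return 1 ;;
  esac
}
```

Distinct exit codes (2 for an unreadable file, 1 for unexpected contents) let a caller tell a missing file apart from a corrupted one.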
## Failover Process

1. Detects main validator delinquency
2. Changes the main validator's identity to unstaked
3. Transfers the tower file to the backup validator
4. Activates the backup validator with the staked identity
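The sketch below illustrates the order of steps 2-4 as shell commands run from the Sentry server, with a dry-run switch. It is not the repository's `failover` role. It follows the standard Agave identity-transition commands (`set-identity`, `--require-tower`, and tower files named `tower-1_9-<IDENTITY_PUBKEY>.bin` in the ledger directory). Everything else is a hypothetical assumption: the host names, the `sol` SSH user, the ledger path `/mnt/ledger`, the keypair paths, and the `agave-validator` binary name (older releases call it `solana-validator`).

```bash
#!/usr/bin/env bash
# Hypothetical failover sketch; hosts, user, and paths are placeholders.
MAIN_HOST="${MAIN_HOST:-main.example}"
BACKUP_HOST="${BACKUP_HOST:-backup.example}"
LEDGER="${LEDGER:-/mnt/ledger}"
UNSTAKED_KEY="${UNSTAKED_KEY:-/home/sol/unstaked-identity.json}"
STAKED_KEY="${STAKED_KEY:-/home/sol/staked-identity.json}"

# With DRY_RUN=1, print each command instead of executing it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

failover() {
  local staked_pubkey="$1"   # base58 pubkey of the staked identity
  [[ -n "$staked_pubkey" ]] || { echo "usage: failover <STAKED_PUBKEY>" >&2; return 1; }
  local tower="$LEDGER/tower-1_9-${staked_pubkey}.bin"

  # Step 2: main drops the staked identity so it stops voting with it.
  run ssh "sol@$MAIN_HOST" agave-validator -l "$LEDGER" set-identity "$UNSTAKED_KEY" || return 1
  # Step 3: copy main's tower to the backup, relayed through this host (-3).
  run scp -3 "sol@$MAIN_HOST:$tower" "sol@$BACKUP_HOST:$LEDGER/" || return 1
  # Step 4: backup takes the staked identity; --require-tower makes it refuse
  # if no saved tower for that identity is present.
  run ssh "sol@$BACKUP_HOST" agave-validator -l "$LEDGER" set-identity --require-tower "$STAKED_KEY"
}
```

`DRY_RUN=1 failover <STAKED_PUBKEY>` prints the three commands without contacting any host. If the main server is down, steps 2 and 3 cannot run against it. This is presumably where the system's automatic tower backups come in, but how the roles handle that case is not described here.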
## Rollback Process

1. Verifies that the backup validator is currently active
2. Checks that the main validator is accessible
3. Transfers the tower file back to the main validator
4. Restores the identity keys (staked on main, unstaked on backup)
5. Updates the system state
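The first two rollback steps act as a gate, and nothing should move until both pass. A minimal pre-flight check, assuming the one-word state file described above and key-based SSH with `files/validator_key` (relative to the project root). The function name `rollback_preflight` is hypothetical, and this is not the repository's `rollback` role:

```bash
#!/usr/bin/env bash
# Hypothetical rollback gate: proceed only if "backup" is recorded as active
# and the main validator answers over SSH.
rollback_preflight() {
  local state_file="$1" main_host="$2" ssh_port="${3:-22}"
  local state
  # Read the recorded active validator; fail if the file cannot be read.
  state="$(tr -d '[:space:]' 2>/dev/null < "$state_file")" || return 1
  if [[ "$state" != backup ]]; then
    echo "refusing rollback: active validator is '${state:-unknown}', not backup" >&2
    return 1
  fi
  # BatchMode stops ssh from prompting for a password; ConnectTimeout bounds
  # how long an unreachable host can stall the check.
  if ! ssh -i files/validator_key -o BatchMode=yes -o ConnectTimeout=5 \
       -p "$ssh_port" "$main_host" true; then
    echo "refusing rollback: cannot reach main validator at $main_host" >&2
    return 1
  fi
}
```

For example, `rollback_preflight /var/run/solana_validator/active_validator sol@YOUR_MAIN_IP YOUR_SSH_PORT` returns 0 only when both conditions hold.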
## Project Structure

```
.
├── inventory/
│   └── hosts.yml
├── group_vars/
│   └── all.yml
├── roles/
│   ├── monitoring/
│   ├── backup/
│   ├── failover/
│   ├── rollback/
│   └── ssh_setup/
├── playbooks/
│   ├── init.yml
│   ├── monitor.yml
│   └── rollback.yml
└── files/
    └── validator_key
```
## Logging

The system keeps its logs and state files in `/var/run/solana_validator/`:
- `last_check.log` - Latest monitoring status
- `failover.status` - Failover event records
- `active_validator` - Currently active validator
## Safety Features

- SSH connectivity verification
- Proper identity key management
- Prevention of duplicate active validators
- Safe tower file transfers
- Handling of multiple failure scenarios
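Preventing duplicate active validators comes down to one invariant: at most one node may hold the staked identity at a time. A hypothetical guard for that invariant follows. It takes the identity pubkey each node currently reports plus the staked pubkey, and leaves open how those values are collected. The name `classify_active` and its output words are illustrative, not the project's API.

```bash
#!/usr/bin/env bash
# Hypothetical guard: classify which node holds the staked identity.
# Prints main, backup, both or none; non-zero exit for the unsafe cases.
classify_active() {
  local main_id="$1" backup_id="$2" staked="$3"
  if [[ "$main_id" == "$staked" && "$backup_id" == "$staked" ]]; then
    # Two nodes voting with one identity risks conflicting (duplicate) votes.
    echo both
    return 1
  elif [[ "$main_id" == "$staked" ]]; then
    echo main
  elif [[ "$backup_id" == "$staked" ]]; then
    echo backup
  else
    # Neither node holds the staked identity, so the stake is not voting.
    echo none
    return 2
  fi
}
```

A failover or rollback step would call this before and after each identity change and abort on a non-zero status.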
## Contributing

1. Fork the repository
2. Create your feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request