Fault Management

With linux-ha, drbd, and LVS we deploy redundancy for UNIX systems, without a shared storage device if necessary. A linux-ha cluster provides failover when physical hardware fails, drbd keeps mirrors of your data on multiple machines, and LVS acts as a load balancer: it presents a single IP to users and spreads the load across all of your web or application servers.
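To make the load-balancing idea concrete, here is a minimal Python sketch of round-robin scheduling behind a single virtual IP. It is only an illustration of the concept, not how LVS itself is implemented or configured; the class name `VirtualService`, its methods, and all IP addresses are hypothetical. It assumes some external monitor calls `mark_down` when a server fails.

```python
class VirtualService:
    """Toy model of one virtual IP fronting several real servers."""

    def __init__(self, virtual_ip, real_servers):
        self.virtual_ip = virtual_ip            # the single IP users connect to
        self.real_servers = list(real_servers)  # web or application servers
        self.healthy = set(self.real_servers)   # servers currently in rotation
        self._next = 0                          # index of the next candidate

    def mark_down(self, server):
        """Take a failed server out of rotation (called by a health monitor)."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Return a repaired server to rotation."""
        if server in self.real_servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server in round-robin order."""
        for _ in range(len(self.real_servers)):
            server = self.real_servers[self._next]
            self._next = (self._next + 1) % len(self.real_servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy real servers behind " + self.virtual_ip)


if __name__ == "__main__":
    service = VirtualService("192.0.2.10", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    print([service.pick() for _ in range(4)])
    service.mark_down("10.0.0.2")
    print([service.pick() for _ in range(3)])
```

Users only ever see the virtual IP; a failed server is skipped until it is marked up again, so the remaining servers absorb its share of requests.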

We also use the open-source syslog-ng server. With a centralized server, all logs from your UNIX machines, switches, and routers go to the same place. Syslog-ng forwards alerts based on message priority. If it receives a system-level alert such as a failed disk, you receive an email right away, giving you time to replace the disk. If it receives a low-priority alert, it can forward it to an email address you only check once a week.
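The priority-based routing described above can be sketched in a few lines of Python. This is not syslog-ng configuration; it only illustrates the logic. It relies on the standard syslog convention that each message starts with a `<PRI>` header whose value is facility times 8 plus severity. The mailbox addresses, the function names, and the choice of `err` as the cutoff are assumptions made for the example.

```python
import re

# Standard syslog severity names, indexed by numeric severity (0 = most urgent).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# Hypothetical mailboxes, for illustration only.
URGENT_MAILBOX = "oncall@example.com"
WEEKLY_MAILBOX = "weekly-review@example.com"

PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def parse_priority(line):
    """Return (facility_number, severity_name) from a raw syslog line."""
    match = PRI_PATTERN.match(line)
    if match is None:
        raise ValueError("line has no <PRI> header")
    pri = int(match.group(1))
    if pri > 191:  # 23 facilities * 8 + 7 is the largest valid value
        raise ValueError("PRI value out of range")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]


def choose_mailbox(line, threshold="err"):
    """Send messages at or above the threshold severity to the urgent mailbox."""
    _, severity = parse_priority(line)
    urgent = SEVERITIES.index(severity) <= SEVERITIES.index(threshold)
    return URGENT_MAILBOX if urgent else WEEKLY_MAILBOX


if __name__ == "__main__":
    # Made-up sample messages: a kernel critical error and a daemon info notice.
    print(choose_mailbox("<2>kernel: I/O error on disk"))
    print(choose_mailbox("<30>daemon: configuration reloaded"))
```

Lowering or raising `threshold` changes which severities interrupt you immediately and which wait for the weekly review.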