Contingency planning - having a Plan B in case disaster strikes - is essential to making network monitoring truly effective.
An independent study of FCC outage reports found that nearly half of all reported outages are due to human error.
While people cause outages most often, natural disasters and network overloads are responsible for 62% of customer downtime.
That means over half of all customer downtime is caused by factors outside your control.
Accepting this fact is the first step toward building a more reliable network. The next step is taking action to mitigate the effects of disasters.
If you've ever had a flat tire, you can certainly appreciate why your car came equipped with a spare tire.
It's obvious that having a backup unit is essential, especially in network monitoring. AT&T and other leading RBOCs make it a policy to keep spare units on hand at various locations, just in case something unavoidable happens. These companies provide services to millions of people, including 911 services, government contracts, and other business-critical services. The major RBOCs know that network downtime will mean lost clients, lost revenue, and, in the case of 911 services, possibly even lost lives.
Other examples of systems that benefit from having backups are CRAC units in a remote temperature monitoring system, data center cooling systems, wireless sensors, air conditioners, and microwave towers.
Does your network design include backup systems? Have you identified the critical, single-point-of-failure segments of your network and planned accordingly? All it takes is one lightning storm or flood to destroy equipment that could have lasted for years. As remarkable as overnight delivery is, do you really want to wait 24 hours at best, plus installation time, for a spare unit to come online? You must protect your network by having spare units in stock.
Dual power supplies are great... if your network elements can handle them
Many companies have realized the value of having a backup power supply. But what if your equipment only has one power input?
The best practice is to always buy equipment that has dual power inputs (A/B power feeds) and can automatically switch to the alternate feed, so that if one power source fails, your equipment keeps running.
Geodiversity and redundant systems
Some companies take their backup plans a step further by having multiple master stations collect alarms in different parts of the country. This principle is called geodiversity. But geodiversity only works if all your master stations are synchronized with each other.
There are many ways to synchronize masters. For example, a master-slave relationship synchronizes network element configuration data. If the master should fail, a slave station immediately takes over. There is also passive polling, used in networks with copper-wire transport, in which a second station taps into the data stream of the first, creating a live backup.
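The takeover step in a master-slave arrangement can be sketched as follows. This is a minimal illustration, not any vendor's implementation: it assumes each station records the time of the last heartbeat received from the master, and the `Master` class, the station names, and the 30-second timeout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Master:
    name: str
    last_heartbeat: float  # time (in seconds) the last heartbeat was received


def active_master(primary: Master, standby: Master, now: float,
                  timeout: float = 30.0) -> Master:
    """Return the station that should be collecting alarms at time `now`.

    Assumption: the primary is considered failed once its heartbeat is
    older than `timeout` seconds; the standby then takes over.
    """
    if now - primary.last_heartbeat <= timeout:
        return primary
    return standby


# Hypothetical stations in two different regions.
primary = Master("master-east", last_heartbeat=100.0)
standby = Master("master-west", last_heartbeat=118.0)

print(active_master(primary, standby, now=120.0).name)  # heartbeat is 20 s old
print(active_master(primary, standby, now=140.0).name)  # heartbeat is 40 s old
```

With these values, the primary stays active at 120 seconds and the standby takes over at 140 seconds, once the primary's heartbeat has aged past the timeout.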
Dual masters need dual responders
If your redundancy plan is based on dual SNMP managers, make sure that your remote sources and monitoring devices can send traps to multiple SNMP managers. Many remotes do not support more than one SNMP manager. Even if you are currently using only one SNMP manager, it's still a good idea to make sure that your remotes can report to multiple managers, because you may want to implement a backup master as your network grows.
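As a rough sketch of what reporting to multiple managers means in practice, the function below delivers one alarm to every configured manager and records which deliveries failed. It does not encode real SNMP traps: the `send` callable stands in for whatever trap-sending mechanism your remote provides, and the host names are hypothetical. Port 162 is the standard SNMP trap port.

```python
from typing import Callable, Iterable

Manager = tuple[str, int]  # (host, UDP port)


def send_to_all(alarm: str, managers: Iterable[Manager],
                send: Callable[[Manager, str], None]) -> list[Manager]:
    """Send `alarm` to every manager; return the managers that failed.

    A failure at one manager must not stop delivery to the others.
    """
    failed = []
    for manager in managers:
        try:
            send(manager, alarm)
        except OSError:
            failed.append(manager)
    return failed


# Demo with a stand-in sender: the second manager is unreachable.
delivered = []


def fake_send(manager: Manager, alarm: str) -> None:
    if manager[0] == "nms-b.example.net":
        raise OSError("unreachable")
    delivered.append((manager, alarm))


managers = [("nms-a.example.net", 162), ("nms-b.example.net", 162)]
failed = send_to_all("door open", managers, fake_send)
print("delivered:", delivered)
print("failed:", failed)
```

The point of the design is that each manager is handled independently, so an unreachable backup master never blocks the report to the primary.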
Multiple notification methods
Redundancy means more than just having backup units. A best practice that is often overlooked is the need for backup notification and transport methods. If your internet connection is down, an email alert may never arrive, and you could miss a critical alarm.
Most network administrators prefer using a LAN/WAN transport layer, because it is faster and cheaper than a dial-up connection. Your LAN needs a backup, too. The solution is to select a remote monitoring device that can report alarms over both LAN and the Public Switched Telephone Network (PSTN), so that alarm data keeps flowing even when one transport is down.
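The fallback logic can be sketched in a few lines: try the LAN first, and fall back to dial-up only if the LAN attempt fails. This is an illustration under stated assumptions, not a description of any particular device; the transport names and the two stand-in sender functions are hypothetical.

```python
from typing import Callable, Optional, Sequence

Transport = tuple[str, Callable[[str], None]]  # (name, send function)


def report_alarm(alarm: str, transports: Sequence[Transport]) -> Optional[str]:
    """Try each transport in priority order.

    Returns the name of the first transport that delivered the alarm,
    or None if every transport failed.
    """
    for name, send in transports:
        try:
            send(alarm)
            return name
        except ConnectionError:
            continue  # this path is down; try the next one
    return None


# Demo with stand-in senders: the LAN is down, dial-up works.
dialup_log = []


def lan_send(alarm: str) -> None:
    raise ConnectionError("LAN unreachable")


def dialup_send(alarm: str) -> None:
    dialup_log.append(alarm)


used = report_alarm("rectifier failure", [("lan", lan_send), ("pstn", dialup_send)])
print(used)
```

Ordering the list by preference keeps the cheaper, faster LAN path as the default while still guaranteeing a second route for the alarm.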
Multiple device paging
Another often-overlooked necessity for a robust redundancy plan is multiple-personnel notification, which in effect gives you backup repair staff. If only one person is notified of a network fault, and that person is delayed by traffic, illness, or any other unforeseen event, the fault will not be repaired. You can increase the odds of someone responding to a critical event by selecting a system that can notify multiple people using multiple methods. At the very least, make sure your remote alarm monitoring device can send a message to the master station and one other person or device. The more recipients, the better.
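The minimum rule described here (the master station plus at least one other recipient) can be checked mechanically against a notification plan. The sketch below assumes a hypothetical plan format, a dictionary mapping each alarm to its list of recipients; the recipient labels are illustrative only.

```python
def coverage_gaps(notify_lists: dict[str, list[str]],
                  master: str = "master-station") -> list[str]:
    """Return the alarms whose recipient list breaks the minimum rule.

    An alarm passes only if it notifies the master station and at
    least one other person or device.
    """
    gaps = []
    for alarm, recipients in notify_lists.items():
        others = [r for r in recipients if r != master]
        if master not in recipients or not others:
            gaps.append(alarm)
    return gaps


# Hypothetical notification plan for three alarms.
plan = {
    "power-fail": ["master-station", "pager:oncall-1", "email:noc"],
    "door-open": ["master-station"],
    "high-temp": ["pager:oncall-1", "pager:oncall-2"],
}

print(coverage_gaps(plan))
```

In this example, "door-open" is flagged because nobody besides the master station is notified, and "high-temp" is flagged because the master station is missing from its list.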