A Network Operations Center (NOC) forms the core of any medium or large-scale network monitoring effort. In it, your staff will monitor for and respond to network problems. Your NOC forms the vital link between the detection of a network problem and the implementation of a solution (usually in the form of a technician dispatched to the remote site).
While many NOC centers (yes, the word "centers" is redundant, but it aids understanding) are open 7x24x365, this isn't always the case. Some companies are in a transitional phase of growth: their network is large enough to have warranted an investment in NOC center construction, but they can't yet justify the expense of staffing it outside of regular or possibly extended business hours. In this case, companies use after-hours alarm notifications (by email or phone) to alert on-call technicians to alarms in the network.
The core of any NOC is one (or sometimes more) central master console. This console accepts inputs from a handful, hundreds, or even thousands of remote devices in your network.
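As a rough illustration of that after-hours arrangement, the Python sketch below decides whether a new alarm should be left for operators at the staffed console or pushed to an on-call technician. The weekday 7:00 to 19:00 staffing window and the function name `route_alarm` are hypothetical assumptions, not part of any particular product; a real system would also need to handle holidays and time zones.

```python
from datetime import datetime, time

# Hypothetical staffing window; adjust to your NOC's actual hours.
STAFFED_START = time(7, 0)
STAFFED_END = time(19, 0)


def route_alarm(alarm_time: datetime) -> str:
    """Return where a new alarm should go: the staffed console or the on-call notifier."""
    is_weekday = alarm_time.weekday() < 5  # Monday=0 ... Friday=4
    in_hours = STAFFED_START <= alarm_time.time() < STAFFED_END
    if is_weekday and in_hours:
        return "console"
    # Outside staffed hours: notify the on-call technician (e.g. by email or phone).
    return "on_call"


if __name__ == "__main__":
    print(route_alarm(datetime(2024, 3, 6, 10, 30)))  # Wednesday morning
    print(route_alarm(datetime(2024, 3, 6, 22, 15)))  # Wednesday night
    print(route_alarm(datetime(2024, 3, 9, 10, 30)))  # Saturday morning
```

The console still receives and logs every alarm either way; the routing decision only controls who is actively alerted.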
Some remote devices report alarms on their own (SNMP-native devices, for example, are capable of sending SNMP trap messages over LAN when they experience a warning or failure condition). Switches, industrial routers, SONET/optical gear, and many other large telecom systems output alarms using some type of open standard.
Some older devices still in use in networks worldwide cannot report alarms on their own (at least not over LAN). These devices frequently latch contact closures when they have certain problems. They might also report alarms over a dedicated serial circuit.
To transmit these legacy alarm methods via LAN to your central console, you can deploy alarm remotes in your network. Alarm remotes are specialized monitoring devices that collect alarms from contact closures and sensors and send them back to your NOC. A single alarm remote can cover dozens of remote devices, depending on how many alarms each one outputs.
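To make the alarm remote's job more concrete, here is a minimal sketch of one common approach: scan the discrete (contact-closure) inputs repeatedly and report only the points whose state changed since the previous scan. The `AlarmEvent` class, the point labels, and the convention that a closed contact means an alarm are all hypothetical; real equipment may use the opposite polarity, and an actual alarm remote would forward these events to the NOC in its own protocol.

```python
from dataclasses import dataclass


@dataclass
class AlarmEvent:
    point: int        # discrete input number on the alarm remote
    description: str  # hypothetical label configured for that input
    state: str        # "ALARM" when the contact closes, "CLEAR" when it opens


def detect_changes(previous, current, labels):
    """Compare two scans of contact-closure inputs and report only the changes.

    previous/current are lists of booleans, True meaning the contact is closed
    (a hypothetical convention; some equipment signals alarms by opening a contact).
    """
    events = []
    for point, (was, now) in enumerate(zip(previous, current), start=1):
        if was != now:
            state = "ALARM" if now else "CLEAR"
            events.append(AlarmEvent(point, labels.get(point, f"Point {point}"), state))
    return events


if __name__ == "__main__":
    labels = {1: "Rectifier fail", 2: "Door open", 3: "High temp"}
    before = [False, False, True]
    after = [True, False, False]
    for event in detect_changes(before, after, labels):
        print(event)
```

Reporting only changes of state keeps traffic to the NOC low, since an input that stays in alarm is not re-sent on every scan.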
As you develop or upgrade your NOC, remember to avoid several common pitfalls that will negatively impact your performance.
You need to work hard to make sure that all alarms throughout your network can be integrated into one unified monitoring system. Otherwise, you're increasing the difficulty and staffing requirements associated with monitoring alarms. If you've never been cursed with having to monitor a lot of incompatible monitoring systems, you can't really appreciate how much of a hassle it is. You'll have to keep your head on a swivel, learn a lot of interfaces, and struggle to tie together related alarms from the different systems (which are divided by equipment compatibility, not by any logical division like geography).
You also need to make sure whatever central console you deploy in your NOC can filter nuisance alarms. Any network has its share of alarms that are good to log, but really don't require a response from the operator. The more of these you have in your NOC, the more you're training your staff not to pay attention to alert messages. A good central console can hide unimportant messages from staff, allowing the truly important messages to rise to the top of the list.
To make NOCs easier to understand, it will be helpful now to review an equipment example. I like to use the T/Mon LNX central console, since it embodies many of the concepts I just mentioned.
The most useful thing about T/Mon is its ability to understand many protocols (both modern and legacy). The count is currently around 25, and this enables T/Mon to avoid the multi-screen headaches I described above. It's very likely that you'll be able to put all of your alarms into one central system, allowing computers to do the busywork instead of your staff.
T/Mon can also intelligently filter your incoming alarm messages to keep your staff focused on the important alarms. You can configure simple rules that T/Mon will use to make a show/hide decision for each new alarm message. T/Mon still logs every inbound alarm received at your network operations center, so you can review all alarms received after an incident.
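The show/hide behavior described above can be sketched in a few lines. This is not T/Mon's actual rule syntax; it is a hypothetical Python model in which each rule is a function that returns True when an alarm should be hidden, and every alarm is logged whether or not it is displayed. The severity scale and the example rules are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    site: str
    description: str
    severity: str  # hypothetical scale: "critical", "major", "minor", "info"


@dataclass
class Console:
    hide_rules: list  # callables Alarm -> bool; True means "hide from operators"
    log: list = field(default_factory=list)
    display: list = field(default_factory=list)

    def receive(self, alarm: Alarm) -> None:
        self.log.append(alarm)  # every alarm is logged, shown or not
        if not any(rule(alarm) for rule in self.hide_rules):
            self.display.append(alarm)


if __name__ == "__main__":
    rules = [
        lambda a: a.severity == "info",  # hide purely informational messages
        lambda a: a.site == "HQ" and "door open" in a.description.lower(),  # staffed site
    ]
    console = Console(hide_rules=rules)
    for alarm in [
        Alarm("Fresno", "Rectifier fail", "major"),
        Alarm("HQ", "Door open", "minor"),
        Alarm("Clovis", "Heartbeat", "info"),
    ]:
        console.receive(alarm)
    print(len(console.log), "logged;", [a.site for a in console.display], "displayed")
```

Keeping the log separate from the display list is the point of the design: hiding a nuisance alarm never destroys the record you may need after an incident.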
When choosing a central console for your NOC, it's also important to choose one that has a convenient and intuitive interface. You don't want your staff to waste time trying to figure out what an alarm means when they could be reacting to it. Every minute wasted in your NOC means more expenses for you and a greater potential for a missed problem leading to extended network downtime.
T/Mon includes a pair of interfaces that meet this standard. The one most commonly used within the NOC is the T/GFX software. It runs on Microsoft Windows and uses MapPoint maps as a backdrop for your alarms. Because your alarms appear on actual geographic maps instead of a plain list of text messages, your staff, even those who have not been extensively trained, can quickly and easily see where alarms are occurring. This is especially helpful when you are trying to identify the root cause of many simultaneous alarms. When you can see that alarms are clustered around a single area, it becomes very obvious where the problem must lie.
Sometimes, however, you're not in the NOC. Sometimes you have to be out in the field. Wherever you have a PC workstation, including your laptop at a remote site with LAN access, you can access the T/Mon Web 2.0 interface. Designed for quick alarm review, this web interface uses color coding in place of geographic maps. The beauty of a web interface, of course, is that you don't have to install any software to use it. All you need to do is enter the IP address of your T/Mon into your web browser and press Enter. After a single initial page load lasting just a few seconds, you won't need to refresh the page again. This is the hallmark of Web 2.0 technology: the page updates itself, so you always have current alarm data without a conventional refresh.
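The visual reasoning operators do on a map, noticing that alarms are bunched in one area, can also be approximated in code. The sketch below is a deliberately simple neighbor count, not a production clustering algorithm such as DBSCAN, and it is not a T/GFX feature. For each alarming site it counts how many alarming sites lie within a radius (25 km, an arbitrary assumption) using the haversine great-circle distance, then returns the site with the most neighbors. Site names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def densest_cluster(alarms, radius_km=25.0):
    """Return (site, members): the alarming site with the most alarming sites
    within radius_km, and the list of those sites (including itself)."""
    best_site, best_members = None, []
    for site, location in alarms.items():
        members = [other for other, loc in alarms.items()
                   if distance_km(location, loc) <= radius_km]
        if len(members) > len(best_members):
            best_site, best_members = site, members
    return best_site, best_members


if __name__ == "__main__":
    # Hypothetical sites currently in alarm, keyed by name -> (latitude, longitude).
    alarms = {
        "Fresno A": (36.74, -119.79),
        "Fresno B": (36.80, -119.72),
        "Clovis": (36.83, -119.70),
        "San Diego": (32.72, -117.16),
    }
    print(densest_cluster(alarms))
```

Three of the four alarms fall within a few kilometers of each other, which points toward a shared cause in that area (for example, a common facility or power feed) rather than three independent failures.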
Discussing NOCs reminds me of a project DPS Telecom recently had with four contacts. The manager of the group was Fred.
At one point in the middle of the project, Fred called in to review the quote line by line and see what optimizations could be done. Fred wanted to see what could be eliminated or what could be added to make the quote more effective for their application.
In this project, there were two NOCs in different parts of California. In one NOC, there were two existing legacy systems from a single manufacturer. One was polled over digital RS-232, while the other was polled over analog 202/FSK. For the serial system, there was a single digital manager. Fred confirmed that they could terminate all circuits at both locations, so there would not be any problem running the polling legs to a redundant pair of T/Mon central consoles. Each T/Mon would have three polling legs connected to it: two RS-232 and one analog. The frequency of the analog leg needed to be confirmed so that T/Mon could be properly configured to poll these sites.
Fred said that he currently has 70 to 80 sites per leg and that he would like to break them down in the future. DPS recommended that they consider adding polling legs, with fewer remotes on each leg. They liked that idea because each T/Mon would use two protocol converters and two RS-232 cartridges for the digital side.
Fred wanted DPS to include the SNMP responder module for T/Mon on both systems. Fred also uses an LED bar that accepts ASCII input, and he wanted to set it up to capture specific data and display it. The goal was for T/Mon to interface with the LED bar over serial or TCP/IP. Basically, the LED bar takes any alarm stream over a serial connection and scans it for keywords. It can also accept input over TCP/IP and apply the same kind of scanning.
Here's another message DPS received after a client deployed DPS systems in their NOC. The key to success in this project was the successful conversion of legacy alarm protocols and signals into modern LAN-based transmission:
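The reasoning behind putting fewer remotes on each leg can be shown with simple arithmetic. Remotes on one polled serial leg are queried one after another, so the time to complete a polling cycle grows with the number of remotes on that leg. The sketch below assumes a fixed 0.5 s per poll and assumes separate legs are polled in parallel; both assumptions and the helper names are hypothetical, chosen only to show the trend.

```python
def split_into_legs(sites, num_legs):
    """Spread sites across polling legs as evenly as possible (round-robin)."""
    legs = [[] for _ in range(num_legs)]
    for index, site in enumerate(sites):
        legs[index % num_legs].append(site)
    return legs


def cycle_time_s(legs, seconds_per_poll):
    """Remotes on one leg are polled in turn; legs are assumed to run in parallel,
    so the longest leg sets how long a full polling cycle takes."""
    return max(len(leg) for leg in legs) * seconds_per_poll


if __name__ == "__main__":
    sites = [f"site-{n:02d}" for n in range(80)]  # hypothetical: 80 remotes on one leg today
    for num_legs in (1, 2, 4):
        legs = split_into_legs(sites, num_legs)
        print(num_legs, "leg(s):", cycle_time_s(legs, seconds_per_poll=0.5), "s per cycle")
```

Under these assumptions it prints 40.0, 20.0, and 10.0 seconds per cycle for one, two, and four legs, so spreading 80 sites across four legs cuts the worst-case wait before any given remote is polled by a factor of four.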
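The LED bar's keyword scanning can be modeled as a filter over lines of ASCII text. In this hypothetical sketch, the keyword list and message format are invented for illustration and do not reflect the LED bar's real configuration. The function matches keywords case-insensitively and yields the matched keyword together with the line that contained it. Because it accepts any iterable of lines, the same function could scan a serial stream or a TCP connection.

```python
import re

# Hypothetical keywords; a real LED bar would be configured with its own list.
KEYWORDS = ("MAJOR", "CRITICAL", "GENERATOR")


def scan_stream(lines, keywords=KEYWORDS):
    """Yield (keyword, line) for each line that contains one of the keywords."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    for line in lines:
        match = pattern.search(line)  # leftmost keyword in the line, if any
        if match:
            yield match.group(0).upper(), line.strip()


if __name__ == "__main__":
    stream = [
        "10:02 SITE 12 MINOR DOOR OPEN\n",
        "10:03 SITE 7 CRITICAL Generator fail\n",
    ]
    for keyword, line in scan_stream(stream):
        print(keyword, "|", line)
```

For a TCP/IP feed, one option is to wrap a connected socket with Python's `socket.makefile("r")` and pass the resulting file object, which iterates line by line.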
"Sorry for the delay, I have been extra busy these last couple weeks. Now that you caught me at my terminal let me illuminate the need we had that DPS filled rather well.
We have 6 gear buildings that serve both a Hybrid Fiber Coax television network, a fiber optic SONET network and GigE transport network. Equipment placed in these buildings is expected to be up 24x7 as you might expect. To that end, each site has a backup generator. Since the sites are not manned, some means of status monitoring and alarm reporting was needed for the genset at each site.
My research led me to various methods of remote surveillance of the gensets and each method failed to provide the needed visibility. I was left with a rather primitive temporary solution that only provided a single alarm point to show the position of the automatic transfer switch at each site. Not really what a state-of-the-art network needed for genset monitoring.
Enter DPS Telecom's Alarm Point Conditioner device!
Using this unit and some connecting cable between the genset and the gear building, the DPS Alarm Point Conditioner was used to simply monitor the voltage level of the LED status indications on the control panel of the genset. The DPS unit allowed the voltage changes of the LED (off or on) to be properly adapted for monitoring by the SONET transport gear. This gear is monitored by our 24hr NOC. Using the text descriptions available in the SONET gear allows the NOC technician to more easily monitor any alarms that the genset may raise.
Since the installation of the DPS gear, two separate incidents occurred and were resolved before any impact to the network occurred. Before DPS Telecom, these two genset problems would have not been noted until the weekly exercise did not happen, or the gensets were not available for an actual commercial power failure. We have reduced and in some cases eliminated technician responses to genset alarms since the actual alarm condition was reported remotely."
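The customer's setup hinges on turning an LED's voltage into a clean on/off alarm point. The sketch below does not describe how the DPS Alarm Point Conditioner works internally; it is a generic illustration of one standard technique, threshold detection with hysteresis. The point turns on only above one voltage and turns off only below a lower one, so electrical noise hovering near a single threshold can't make the alarm chatter. The 2.0 V and 1.0 V thresholds and the `LedSense` class are hypothetical.

```python
class LedSense:
    """Turn sampled LED voltages into a stable on/off alarm state.

    Two thresholds (hysteresis) keep noise near a single threshold from
    toggling the alarm repeatedly. Threshold values are hypothetical.
    """

    def __init__(self, on_above: float = 2.0, off_below: float = 1.0):
        self.on_above = on_above
        self.off_below = off_below
        self.active = False

    def update(self, volts: float) -> bool:
        """Feed one voltage sample; return the current alarm state."""
        if not self.active and volts > self.on_above:
            self.active = True
        elif self.active and volts < self.off_below:
            self.active = False
        return self.active


if __name__ == "__main__":
    sense = LedSense()
    samples = [0.2, 1.5, 2.4, 1.5, 0.8, 1.5]
    print([sense.update(v) for v in samples])
```

The 1.5 V samples leave the state unchanged in both directions, which is exactly the band where a single-threshold design would flicker.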
Network Management System