Server Room Temperature Monitoring and Maintenance

Equipment in your server room is consuming more power and generating more heat than ever before. High temperatures can cause equipment to function poorly or even fail outright, as internal components swell and pull away from each other or simply burn. Effective server room temperature monitoring and maintenance is critical to maintaining the safety and performance of your equipment.

Heating, ventilating and air conditioning (HVAC) units or computer room air conditioning (CRAC) units are typically expected to keep a controlled temperature between roughly 61 and 75 °F (about 16 to 24 °C). Chances are you won't want the room temperature to spend much time at the upper end of that range. It's also unlikely that your server room will maintain a homogeneous environment. Equipment in different parts of the room will generate different amounts of heat and require different amounts of cooling.
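As a rough illustration of how that range might be turned into alarm thresholds, here is a minimal Python sketch. Only the 61 and 75 °F limits come from the text above. The warning band (the top 20% of the range) and the function names are assumptions made for this example, not settings from any particular product.

```python
# Limits quoted in the text above, in degrees Fahrenheit.
LOW_F = 61.0
HIGH_F = 75.0
# Assumption for illustration: warn once a reading enters the top 20% of the range.
WARN_FRACTION = 0.8


def f_to_c(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def classify(temp_f: float) -> str:
    """Label a room temperature reading against the controlled range."""
    warn_at = LOW_F + WARN_FRACTION * (HIGH_F - LOW_F)
    if temp_f < LOW_F:
        return "too_cold"
    if temp_f > HIGH_F:
        return "too_hot"
    if temp_f >= warn_at:
        return "warning"
    return "ok"
```

With these assumed settings, a reading in the middle of the range is labeled "ok", while one just under the upper limit is labeled "warning" before the room ever leaves the controlled range.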

To combat high temperatures in the server room, two major cooling strategies have been developed: liquid cooling and air flow management.

Many server rooms employ liquid cooling in some form or another. HVACs / CRACs often use refrigerant to cool the air they put out. But typically the term "liquid cooling" is used to describe liquid-cooled racks, in which the same concept of using refrigerant to chill the air output is simply extended to the racks themselves.

Most liquid cooled systems involve rack-mounted chillers. Door-mounted liquid cooling systems cool the air leaving the rack to lower the rack's thermal contribution to the room. Other liquid cooled systems fully seal the rack's air flow and use their own heat exchanger to cool the air circulating inside.

Finally, there are also liquid cooled systems that actually run liquid coolant in tubes on or near equipment in the system. While this sort of system delivers the greatest cooling potential, it is quite expensive and creates a number of operational complexities (including humidity control and problems with scalability).

Air flow management involves maintaining a low temperature in the server room by manipulating air flow, circulating cold air in and hot air out. The most popular implementation of air flow management is referred to as a "hot aisle" strategy. This tactic involves arranging racks in aisles that alternate between racks facing each other and racks facing away from each other. In the aisles where the racks face each other, cold air is pumped from under the floor and into the air intakes of the racks. Hot air is pushed out the backs of the racks into the aisles where the racks face away from each other. CRAC units then push the air back into the ventilation system, where it is cooled again and released beneath the floor, creating a cycle that ensures a constant flow of cold air throughout the server room.

Server Room Air Flow Diagram
The "hot aisle" strategy is a closed system, maximizing the amount of cold air delivered to servers.

This strategy is relatively inexpensive compared to liquid cooling, but it depends heavily on all of the component systems working together to maintain constant air flow. The failure of any one component results in a failure of the system, which can cause problems for the whole server room. By contrast, the failure of a single liquid cooled system may only present a problem for a single rack. Still, the low cost of implementation and the flexibility of air flow management typically make it the preferred method of server room cooling.

Monitoring Systems Help You Control Server Room Temperature

Of course, no matter what method you choose, you're going to need to guarantee that your server room maintains a cool temperature. You can't afford to wait until a server melts to find out that a part of your server room's temperature management system has failed. It's expensive both in the cost of equipment and potentially the number of clients you may lose due to a service-affecting outage. So you'll need a proactive monitoring system to keep you informed of the goings-on within your server room, so you can take action before a catastrophic failure occurs.

You'll start by monitoring your rack mounted equipment directly. This is your first line of defense. If equipment in your server room fails due to high temperatures, you won't want to wait for an angry call from a client, colleague, or manager to know about it. A small alarm remote with a few analog sensors can monitor each rack quite easily, so you can stay informed about equipment conditions.

DPS Telecom recommends the NetGuardian 216 series remotes for this purpose. Each unit has an internal temperature sensor and an external probe sensor, and supports two additional general purpose analog inputs, so you can monitor air intake and exhaust, humidity, voltage, or anything else you may be worried about on a rack-by-rack basis. Each NetGuardian 216 also supports a number of discrete contacts, so you can monitor the equipment and environment in your racks with a single, small-form-factor RTU.
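To make the idea of per-rack intake and exhaust monitoring concrete, here is a small Python sketch of the kind of check you might run on a pair of readings. It does not use any NetGuardian interface. The function name, the problem labels, and the 30 °F maximum heat rise are assumptions chosen for illustration; only the 75 °F intake limit echoes the range mentioned earlier.

```python
def rack_status(
    intake_f: float,
    exhaust_f: float,
    max_intake_f: float = 75.0,  # upper end of the range quoted earlier
    max_rise_f: float = 30.0,    # assumed limit, tune for your equipment
) -> list[str]:
    """Return a list of problem labels for one rack (empty means healthy)."""
    problems = []
    if intake_f > max_intake_f:
        problems.append("intake_too_warm")
    if exhaust_f - intake_f > max_rise_f:
        problems.append("excessive_heat_rise")
    if exhaust_f < intake_f:
        # Exhaust cooler than intake usually points at swapped probes
        # or air moving the wrong way through the rack.
        problems.append("possible_reversed_airflow_or_sensor_swap")
    return problems
```

Returning a list rather than a single flag lets one rack report several conditions at once, such as a warm intake combined with an excessive rise across the equipment.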

TempDefender: Protector of the Server Room

Beyond simple rack monitoring, though, you'll need to monitor the temperature and environment of the server room at large. Proper environmental monitoring will clue you in to problems long before they become service-affecting, which will help you greatly improve your server room's uptime.

Of course, monitoring your server room's temperature management systems is no easy task. You'll need to monitor air flow in hot aisles, cold aisles, the air plenum under the floor, and possibly the plenum above the racks (if your server room has such a thing). You'll want to measure temperature in hot spots around the room and at the racks themselves to ensure that your particular configuration of computer room air conditioners doesn't have any blind spots. Monitoring your server room's environment will help you preempt server problems when part of your temperature control system fails. If air flow drops or one of your CRAC units isn't putting out air at the right temperature, monitoring your server room will tell you before you end up with a service-affecting problem, saving you from network downtime or equipment failure. While this means you need to collect and report a lot of information, collecting that information doesn't have to be terribly hard or expensive.
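One simple way to look for hot spots is to compare each intake-side reading against the room's typical value. The Python sketch below assumes you already have a set of named readings in °F. The sensor names, the 5 °F margin, and the use of the median as a baseline are all assumptions for illustration, not a method prescribed by any vendor.

```python
from statistics import median


def find_hot_spots(readings_f: dict[str, float], margin_f: float = 5.0) -> list[str]:
    """Return the names of sensors reading well above the room's median.

    readings_f maps a sensor name to its temperature in degrees Fahrenheit.
    margin_f is an assumed tolerance; a sensor is flagged only when it
    exceeds the median by more than this amount.
    """
    baseline = median(readings_f.values())
    return sorted(
        name for name, temp in readings_f.items() if temp - baseline > margin_f
    )


# Hypothetical cold-aisle intake readings.
intakes = {
    "rack_a_intake": 66.0,
    "rack_b_intake": 67.0,
    "rack_c_intake": 75.0,
    "rack_d_intake": 65.0,
}
hot = find_hot_spots(intakes)  # flags only rack_c_intake with these values
```

Compare like with like: mixing hot-aisle and cold-aisle sensors in one call would flag every hot-aisle sensor, since those are expected to read warmer.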

DPS Telecom recommends the TempDefender to collect data on your server room's temperature control systems. The TempDefender is a small, rack-mountable RTU designed to monitor up to 16 analog sensors, measuring temperature, air flow, and any other environmental factors you may be worried about in your server room. (With liquid cooling systems, for example, you may wish to deploy humidity sensors near your sensitive, liquid cooled components.)

The TempDefender tracks server room temperature
TempDefender monitors your server room's temperature to ensure the safety of your equipment and optimal equipment conditions. Preventing server room failures due to high temperatures prevents downtime.

The TempDefender's sensors connect via simple RJ11 connectors and are daisy-chainable up to 600 feet from the RTU, so you can run sensors to every corner of your data center without having to run 16 full sensor cables back to your RTU or, worse, having to place different RTUs in every corner of the data center to get the coverage you need. From this one centrally located RTU, you can run sensors to your room's hot spots and place sensors to measure air flow in your hot and cold aisles.
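When planning a sensor run, it can help to check the layout against those two limits before pulling cable. The Python sketch below takes the 600-foot and 16-sensor figures from the text. It assumes, for illustration only, that the 600 feet is measured as total cable length from the RTU to the last sensor on a single chain, with one cable segment per sensor. Check the product documentation for how the limit is actually defined.

```python
MAX_CHAIN_FEET = 600.0  # daisy-chain distance quoted in the text
MAX_SENSORS = 16        # analog sensor count quoted in the text


def check_chain(segment_feet: list[float]) -> tuple[bool, float]:
    """Check one planned daisy chain.

    segment_feet lists the cable length of each hop, starting with the hop
    from the RTU to the first sensor, so there is one entry per sensor.
    Returns (fits_within_limits, total_cable_feet).
    """
    total = sum(segment_feet)
    fits = len(segment_feet) <= MAX_SENSORS and total <= MAX_CHAIN_FEET
    return fits, total
```

For example, ten sensors spaced 50 feet apart use 500 feet of cable and fit, while thirteen at the same spacing would need 650 feet and would not.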

You can also use the TempDefender's 8 dry contact alarms to add monitoring within the cabinet housing the TempDefender. Or you can mount the TempDefender near your CRAC units and monitor them directly, raising an alarm immediately if a CRAC unit or heat exchanger fails.

Your monitoring systems will collect and report a lot of data. While both the NetGuardians and the TempDefender have a simple web interface and can send email notifications when alarms are triggered, it will be much easier to keep track of your server room with a master station collecting and reporting alarms on a single, consolidated interface. The master station will also need solid reporting services, so you can stay updated on what happens in your server room even when you're not around.
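To show what consolidating alarms from several remotes might look like in principle, here is a minimal Python sketch. It is not how any DPS Telecom master station works. The Alarm record, the three severity names, and the consolidate function are hypothetical, chosen only to illustrate merging several alarm feeds into one prioritized list.

```python
from dataclasses import dataclass

# Assumed severity levels, most urgent first.
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}


@dataclass(frozen=True)
class Alarm:
    source: str    # which remote reported the alarm
    point: str     # which sensor or contact is in alarm
    severity: str  # one of the keys in SEVERITY_ORDER


def consolidate(*feeds: list[Alarm]) -> list[Alarm]:
    """Merge alarm lists from several remotes into one view.

    Duplicate alarms are collapsed, and the result is sorted with the most
    severe alarms first. Unknown severities sort last.
    """
    merged = {alarm for feed in feeds for alarm in feed}
    return sorted(
        merged,
        key=lambda a: (
            SEVERITY_ORDER.get(a.severity, len(SEVERITY_ORDER)),
            a.source,
            a.point,
        ),
    )
```

Sorting by severity first means a failed CRAC unit appears above a slightly warm rack intake, regardless of which remote reported it or when.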

You don't want to wait for equipment to fail or a full-blown network outage to know your temperature control systems have failed. With the right reporting and alarm systems, you can catch environmental problems before they result in full-blown failures, maximizing your network's uptime and making your job a whole lot easier.
