Series of Mishaps Causes Network Problems in Late June/Early July
Posted: July 24, 2000 at 1:00 am
On June 29, between 5 and 6 a.m., Virginia Power workers inadvertently cut electrical service to Thompson Hall while working on a project that was not supposed to affect the building. Thompson Hall, located on the west side of the Fairfax Campus, contains the networking systems that allow George Mason computer users to send e-mail, access centralized computer applications, and log on to the Internet.
Although the building’s backup generator started, it failed to transfer backup power to the computer room panels, which power the air conditioning system for the computer room and network area. The networking equipment, however, has a separate backup power system and continued to work. With no air conditioning and the equipment still generating heat, the temperature in the area rose quickly. When University Computing and Information Systems (UCIS) staff members arrived less than two hours later, the temperature in the room had already reached 110 degrees Fahrenheit.
According to John Hanks, UCIS network engineering manager, several pieces of equipment were damaged, including two router processor cards and three Ethernet switches, which were later replaced. “We do not know for sure if surges from the power outages or the high temperature, or perhaps a combination of the two, actually caused the equipment damage,” says Joy Hughes, chief information officer and vice president for information technology.
Besides damaging equipment, the incident prevented George Mason users from accessing the LAN for most of the day. Around 2 p.m., “most of the known central problems were believed to have been resolved and the network staff began to address distributed smaller issues caused by the power outage in other buildings,” says Hughes.
But when it rains, it pours. Later that day, the Science and Tech I ATM network failed, causing all ATM network users in Science and Tech I, King Hall, SUB II, the Johnson Center, Patriot Center, the Performing Arts Building, and isolated labs in other locations on campus to lose access to the network. UCIS staff members worked late into the evening to restore that network. They spent Friday and the following Monday, July 3, correcting all outstanding problems.
When employees came back to work on July 5 and network traffic returned to normal levels, other problems became apparent. These problems were later traced to “NetBIOS broadcast storms.” NetBIOS (Network Basic Input/Output System) is an application programming interface used by PC networks. NetBIOS makes it easy to set up computers on a network, explains Hanks, because users do not have to manually configure each computer to recognize the network server. This is done automatically when the server sends out an information packet over the network.
This feature, however, can go awry, causing all computers to send out information packets, which quickly overload the system and cause it to crash. That’s what happened on Wednesday morning, July 5, when George Mason employees came back from their long holiday. Once UCIS workers identified the source of the problem, they were able to take care of it quickly. “No further problems have been reported or observed since noon on the seventh,” says Hughes.
“The UCIS technical staff worked many hours to identify and remedy all problems as quickly as possible,” says Keith Segerson, UCIS executive director. “Senior staff also met with the vice president of facilities and the director of the physical plant to review the events that led up to the failures so that procedures and processes could be developed to ensure that problems don’t reoccur,” he says. “Additionally, we met with the CIO and senior vice president for finance to look at long-term solutions for improved air conditioning support for these mission-critical facilities.”