Campus Undergoes Nearly Week-Long Server Outage Due to Network Disruptions
The campus experienced nearly a week of network outages beginning on Monday, Feb. 11, as the Information Technology (IT) department worked around the clock to test the campus-wide server and address varying disruptions that had caused outages since the beginning of the semester. IT is close to identifying and resolving the issues, said Chief Information Officer David Hamilton, and hopes to secure the network by the end of Friday, Feb. 15.
The campus community first became aware of server issues over the weekend, when the Amherst Wi-Fi connection was unstable and at times unavailable. On Monday, the server went down completely. Over the course of the day, the campus progressively lost access to Wi-Fi, email, Moodle, employee payroll, card-scanning systems, the college website and any content hosted on amherst.edu, including the Loeb Center’s Handshake platform for career planning. The campus was left without the server and without an explanation for the outage. The college released an official report on the outage on Tuesday, nearly 24 hours after the server first went down.
In the weeks before the shutdown, the network had experienced a series of disruptions that caused outages of varying lengths.
To diagnose the problem, the IT department decided to shut down the entire network on Monday and try to identify the cause of the disruptions. Though Hamilton’s first instinct was that it was a cyber attack, “we’ve ruled it out since,” he said at an open meeting held on Friday, Feb. 15, with President Biddy Martin, Dean of Faculty Catherine Epstein and Chief Student Affairs Officer Hikaru Kozuma.
By Friday, the IT department had identified three issues. The first was a configuration issue in the router that was causing elevated levels of traffic. The second was a series of MAC flap incidents, which occur when switching equipment at the core of the network saturates the network with messages and causes it to crash; cabling problems are a common cause. The third, as yet unconfirmed, is localized in Chapin Hall.
According to Hamilton, issues of this kind typically result from accidents or equipment failure. In his 12 years at the college, he had not encountered similar incidents, but he said that based on the IT department’s analysis, “it is a confluence of accidents that caused it.”
“I’ll caveat that and say we’re still testing to prove to ourselves that we’re down to the bottom of it at this point,” he added. The college has brought in outside expert teams, including ones from Cisco, Juniper and MIT. The MIT team has dealt with similar incidents and is confident that this was not an intentional attack on the network.
Multiple residential counselors (RCs) were told at an RC meeting in the middle of the week that the outage was due to an active hacking incident. Members of the IT department, however, disputed any possibility of a cyber attack.
The server outage caused some dormitory doors to default to unlocked at midnight, but the Amherst College Police Department began locking them manually once the issue was raised. The laundry payment system also failed, leaving students able to pay only with quarters. On Thursday, the college announced that laundry would be free for the duration of the outage.
With no Wi-Fi for the week, students had to rely on their personal data plans. According to Martin, the administration is aware that some students may have exceeded their data limits and paid overage charges out of pocket. “I don’t know how to figure out what needs to be done,” she said. “We’re aware, and we’re thinking about it.” Students who wish to speak with a member of the administration about this issue can contact Kozuma.
Professors had to restructure their classes and teach without Wi-Fi. Tekla Harms, chair of the geology department, said, “There is no circumstance that prepared us for being without Wi-Fi this whole week. I haven’t been able to receive personal emails about the health of my friends and family. I also couldn’t answer questions that students sent me through email. That’s a challenge, but we have survived that. We are very fortunate that this is what we have to worry about.”
As the week progressed, the Amherst Muck-Rake posted four photos on Instagram mocking the server shutdown, while speculative theories spread across campus. Martin acknowledged the anxieties that may have fueled the speculation, but she said there was no reason to worry and that the network would be restored. She noted that other institutions have experienced similar incidents. “Sometimes according to the people who reported this to me, they don’t ever find out what the problem was,” she said. “But it does happen.”
The college sent its first AC Alert about the network outage to the campus community on Tuesday, Feb. 12, and provided updates every few hours over the following days, with details posted in a linked Google Doc. An update on Feb. 13 announced that IT would bring in “several outside expert teams.” At 3:30 p.m. that day, a second update stated that employees would be paid on Friday. It also warned recipients to “be vigilant about phishing and fraudulent phone calls.”
On Feb. 14, an AC Alert update announced that Wi-Fi from Verizon hotspot equipment would be available in Frost Library and the Science Center.
The college also announced that Gmail would replace Microsoft Outlook as its email system. According to the Feb. 14 alert, the outage “has led us to make the transition immediately.” Martin and Hamilton said the college had been planning to move to Gmail for some time, but this week’s events hastened the decision. Instructions for configuring the new system were sent on Feb. 15.
The updates emphasized that staff would be paid on Friday. “Everyone will receive a payment in the same dollar amount as last week,” Chief Financial Officer Kevin Weinman wrote in one update.
“We are aware some [employees] worked more or less hours than the week prior,” Weinman added. “We are working diligently through a manual process to identify such situations.”
On Feb. 15, the alert announced that IT had identified three separate problems with the existing system and was working to fix them. “IT is working to restore services by moving them to the cloud. This is taking longer than expected because of the instability of the existing network,” the update said.
Hamilton said that the college plans to replace the entire network in the future. “Is it possible that the network will come down again? Yes,” he said. “Systems die. You have to be good at responding to that and getting it back up.” Over the next few years, the college also aims to move its central business systems to the cloud, which will be more secure and resilient than a local network.
According to Hamilton, the IT staff has received support from other institutions in the five-college community and beyond. The IT department will produce an incident report and a narrative for the larger community once all issues have been resolved.