Speedy and Smart

Written by Patricia Schnaidt and Shyamala Reddy, March 1991

The traditional Ethernet topology calls for subnetworks attached to a backbone via bridges or routers. The internetworking devices are sprawled all over campus. In such a dispersed setup, maintenance and troubleshooting are difficult propositions. Plus, as traffic increases, the limited bandwidth of the traditional backbone can become a bottleneck.

With the traditional topology, if a problem arises, technicians may have to go to multiple locations. This poses two problems. "It takes time for people to get to that remote building. Time is of the essence when you're having a network problem. Then when somebody gets out there, they never have the right tools, so he has to come back and get the right tools, and that takes up more time," says Leong.

If the network's essential components were centralized, then the issues of time and tools would be resolved. The backbone-in-a-box was born. With an inverted backbone, as it's also called, if Building A has an Ethernet internally, then it also has a fiber spine that runs back to the data concentration facility. Similarly, Building B's internal network has a spine that goes back to the central data facility.

The inverted backbone topology enabled CMU to put all bridges and routers into a single 19-inch rack in an air-conditioned room. "We have all the tools, facilities, and technical skill around that area. We can troubleshoot problems a lot easier in this central location," says Leong.

The backbone-in-a-box greatly simplifies network reconfiguration. When a router is upgraded, the old one can be swapped out and the new one installed, and only a few cables need to be patched. No longer does the CMU computer services staff have to coordinate reconfigurations with multiple people in multiple buildings.

CMU converted from the standard backbone to the inverted backbone over three months. The gradual installation minimized service disruptions. However, a new cable plant was installed before the conversion began. IBM Cabling System twisted-pair wiring is used inside the buildings and fiber-optic cable is used for the backbone spines. Each building on campus has at least one Ethernet, one Token Ring, and one AppleTalk, for a total of 1,200 Ethernet, 1,000 Token Ring, and 1,500 AppleTalk nodes.
BACKBONE IN A BUS

In April 1990, says Leong, the computer services team realized that the backplane of the SynOptics LattisNet hub was the network backbone. Data runs in parallel over a backplane or bus; over a cable it runs in serial. A backbone that could run data in parallel would be much faster than one that uses serial transmission.

CMU spent the summer of 1990 searching for a box that had a high-speed backplane and standard LAN interfaces to the world. To make the network even more efficient, the team wanted to integrate the routers and bridges, which sat in front of the wiring hub, into the backbone box. CMU discovered the AGS+ from Cisco Systems. By August, it was installed. "The AGS+ box did exactly what we wanted," says Leong.

The C-Bus, a proprietary architecture of the AGS+, runs at 530Mbps. A traditional Ethernet backbone could not carry a similar number of networks. If all 24 of the AGS+'s 10Mbps ports talked to each other at full speed, they would generate a 120Mbps aggregate data rate, which would overwhelm a 10Mbps Ethernet, but not CMU's 530Mbps backbone.

The backbone-in-a-box topology may fit your network needs if your campus is compact. Leong points out that the inverted backbone is also "great if you're working in a relatively small environment, like inside a building or you have high-speed, sustained bandwidth applications." If your campus is large, then the inverted backbone isn't efficient.
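The 120Mbps figure above can be reproduced with a little arithmetic. The article does not say how the number was derived; the sketch below assumes the 24 ports pair off into 12 simultaneous conversations, each running at the full 10Mbps port speed. The function and constant names are illustrative only.

```python
# Figures quoted in the article.
PORTS = 24
PORT_RATE_MBPS = 10
ETHERNET_RATE_MBPS = 10
CBUS_RATE_MBPS = 530


def aggregate_rate_mbps(ports: int, port_rate_mbps: int) -> int:
    """Aggregate load if the ports pair off and every pair talks at full speed.

    Assumption (not stated in the article): one conversation occupies two
    ports, so 24 ports give 12 concurrent 10Mbps conversations.
    """
    conversations = ports // 2
    return conversations * port_rate_mbps


aggregate = aggregate_rate_mbps(PORTS, PORT_RATE_MBPS)
fits_on_ethernet = aggregate <= ETHERNET_RATE_MBPS
fits_on_cbus = aggregate <= CBUS_RATE_MBPS

print(f"aggregate load: {aggregate}Mbps")
print(f"fits on a {ETHERNET_RATE_MBPS}Mbps Ethernet backbone: {fits_on_ethernet}")
print(f"fits on the {CBUS_RATE_MBPS}Mbps C-Bus: {fits_on_cbus}")
```

Under this assumption the load is twelve times what a shared 10Mbps Ethernet can carry, yet less than a quarter of the C-Bus rate.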
MANAGING IS PROBLEMATIC

"Building the network was a big challenge, but it isn't as problematic as managing the network," says Leong. CMU was using a hodgepodge of equipment to manage the 3,700-node network. "A lot of the network management we've found in the marketplace today is not very satisfactory," says Leong. CMU set out to build its own. CMU is integrating existing tools, such as the Simple Network Management Protocol (SNMP), monitors, databases, and a trouble ticketing system, into the pacesetter for network management systems. The system, developed entirely at CMU, is referred to as the "SNMP Console" but is officially unnamed.

CMU is starting to use the SNMP Console in production operations, although more development is planned. Within the next six months, the SNMP Console will become the primary means of management, says Steve Waldbusser, CMU's manager of network development.

Existing network management products simply report information. They don't interpret or act. "What are you supposed to do with all that information? If you monitor it all, you've got tons of information coming from everywhere. A lot may be useful, if you happen to be a highly skilled network engineer. They can take the information and make some conclusion," says Leong.

But networks aren't necessarily operated by experts. "The network operation people, who are on the front line and have to answer irate phone calls, also happen to be the people you are not paying a lot of money. These people are not particularly good at getting a lot of information and then trying to figure out what it all means," says Leong.

Network management needs to develop the capability to identify, diagnose, and eventually fix problems. "Our network management design goal is to have the big finger from the sky pushed into our network map, and saying, 'You have a problem and this is the way you fix it,'" says Leong.
PRACTICAL EXPERT SYSTEM

CMU first thought an expert system was the answer. It wasn't. To build an expert system, you need to find and debrief an expert and encode the rules. "A person who really knows how to do things may not be the person who can tell you what he's doing. These people aren't necessarily able to express themselves constructively," says Leong.

"All this represents an instant of expertise. In networking we find new symptoms and problems every day. Every six months, you have to grab the expert and work him over again. And hopefully, when you encode the new information you get, it doesn't totally mess up the old rules," he says.

Having realized that an expert system wasn't a panacea, CMU is using a low-tech but practical approach. Leong describes the approach, though the term may be an oxymoron, as a "practical expert system." "It turns out that in order to build something that's pretty smart about the network, you don't need fancy AI [Artificial Intelligence] techniques," says CMU's Waldbusser. "The great thing about this expertise is that it's related to people. The expertise is customized to your particular environment, because it is your history," says Leong.

CMU is using an intelligent trouble ticketing system built around an Ingres database and SNMP. Either a network monitor or a user alerts the network operator to a problem, and an electronic trouble ticket form pops up. When the trouble ticket pops up, the SNMP Console takes a snapshot of the network conditions as seen by the SNMP monitors. This snapshot is attached to the form. The operator can also tie other relevant data to the form, such as symptoms reported by users. The operator then decides what the problem is, dispatches a technician to fix the problem, and closes the trouble report. The developers write the rules to characterize the problems that technicians encounter.
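The ticket workflow described above might be sketched roughly as follows. This is an illustration only, not CMU's SNMP Console: the `TroubleTicket` and `Rule` classes, the `open_ticket` function, the snapshot format, and the device names are all hypothetical, and a plain dictionary stands in for both the SNMP monitors and the Ingres database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Snapshot = Dict[str, str]  # hypothetical format: device name -> observed status


@dataclass
class TroubleTicket:
    reported_by: str                    # "monitor" or "user"
    snapshot: Snapshot                  # network conditions when the ticket opened
    symptoms: List[str] = field(default_factory=list)
    diagnosis: Optional[str] = None
    suggested_fix: Optional[str] = None
    closed: bool = False


@dataclass
class Rule:
    """A developer-written rule: a test on the snapshot plus what to tell the operator."""
    matches: Callable[[Snapshot], bool]
    diagnosis: str
    suggested_fix: str


def open_ticket(reported_by: str,
                take_snapshot: Callable[[], Snapshot],
                rules: List[Rule]) -> TroubleTicket:
    """Open a ticket, attach a snapshot, and apply the first rule that matches."""
    ticket = TroubleTicket(reported_by=reported_by, snapshot=take_snapshot())
    for rule in rules:
        if rule.matches(ticket.snapshot):
            ticket.diagnosis = rule.diagnosis
            ticket.suggested_fix = rule.suggested_fix
            break
    return ticket


# Hypothetical stand-in for what the SNMP monitors would report.
def fake_snapshot() -> Snapshot:
    return {"router-a": "down", "host-1": "unreachable"}


rules = [
    Rule(matches=lambda snap: snap.get("router-a") == "down",
         diagnosis="router-a has failed",
         suggested_fix="Follow the restart procedure in the router manual"),
]

ticket = open_ticket("monitor", fake_snapshot, rules)
ticket.symptoms.append("users cannot reach host-1")  # data tied on by the operator
ticket.closed = True                                 # after the technician's fix
print(ticket.diagnosis, "|", ticket.suggested_fix)
```

Keeping each rule as a small test plus a canned answer means a new rule can be added whenever technicians meet a new problem, without touching the older rules.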
"We might write a rule that SNMP has detected a problem, diagnoses it, and at the same time presents a solution to the operator," says Waldbusser. That solution could be a page out of the user's manual. "It means that there isn't much room for not knowing how to fix a problem." The following day, the supervisor reviews the trouble tickets to see what problems arose and how the technicians handled them. The technicians also read the previous day's trouble reports.
CMU'S LAN MANAGEMENT PLANS

The next step is to try to find a pattern among problems, present and past. The SNMP Console will link multiple trouble tickets with similar characteristics. Trouble tickets may be matched according to their 25 fields, including time, date, machine type, machine name, and problem. An exact match will rarely appear. The match criteria may start out narrow and then broaden until, for example, five reports are found. The operator can browse through those trouble tickets to see if a similar problem has occurred and how the technician handled it.

"The next step is to correlate other failures with the current one," says Waldbusser. For example, a router fails and users report that they can't access four hosts. If the hosts are behind that router, then the failed router is the real problem. The SNMP Console will tell you that the hosts only appear to have failed because the router has failed.

Development plans call for a history of each machine's adds, moves, and changes to be attached to the trouble reports, according to Waldbusser. "I wouldn't be surprised to see a lot of progress [on this type of system] in three or four years' time, because it makes sense and it's easy to do," says CMU's Leong.
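Both planned features, widening the ticket match and correlating failures, can be sketched in a few lines. The code below is a hypothetical illustration, not the SNMP Console's logic. It assumes tickets are dictionaries, that the match fields are listed from most to least important so the match can be widened by dropping fields from the end, and that the topology is a simple map from each device to the device it sits behind.

```python
from typing import Dict, List, Set, Tuple

Ticket = Dict[str, str]


def find_similar(current: Ticket, history: List[Ticket],
                 fields: List[str], wanted: int = 5) -> Tuple[List[str], List[Ticket]]:
    """Start with every field required, then drop trailing fields until
    at least `wanted` past tickets match (or only one field is left)."""
    required: List[str] = list(fields)
    matches: List[Ticket] = list(history)
    for n in range(len(fields), 0, -1):
        required = list(fields[:n])
        matches = [t for t in history
                   if all(t.get(f) == current.get(f) for f in required)]
        if len(matches) >= wanted:
            break
    return required, matches


def correlate(failed: Set[str], behind: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Separate real failures from devices that only look failed because
    the device they sit behind has failed."""
    root_causes: List[str] = []
    explained: Dict[str, str] = {}
    for device in sorted(failed):
        upstream = behind.get(device)
        if upstream in failed:
            explained[device] = upstream
        else:
            root_causes.append(device)
    return root_causes, explained


# Hypothetical ticket history and topology.
history = [
    {"problem": "no link", "machine_type": "router", "machine_name": "r1"},
    {"problem": "no link", "machine_type": "router", "machine_name": "r2"},
    {"problem": "no link", "machine_type": "bridge", "machine_name": "b1"},
    {"problem": "slow", "machine_type": "router", "machine_name": "r1"},
]
current = {"problem": "no link", "machine_type": "router", "machine_name": "r3"}
used_fields, similar = find_similar(
    current, history, ["problem", "machine_type", "machine_name"], wanted=2)

behind = {h: "router-1" for h in ("host-a", "host-b", "host-c", "host-d")}
failed = {"router-1", "host-a", "host-b", "host-c", "host-d"}
root_causes, explained = correlate(failed, behind)

print(used_fields, len(similar))
print(root_causes, sorted(explained))
```

In the example, no past ticket matches on all three fields, so the machine name is dropped and two earlier router tickets are returned. The four unreachable hosts are attributed to the one failed router, as in the scenario Waldbusser describes.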
Reprinted with permission from LAN, March 1991