Microsoft Networking Problems at Carnegie Mellon
Erikas Aras Napjus
Network Architect
erikas@cmu.edu
Introduction
Since the introduction of Windows NT to the Carnegie Mellon campus in
early 1994, we have observed a number of network problems, some
serious and some minor, related to Windows NT and Windows 95. Many of
these problems are related to Microsoft's NetBIOS-based peer-to-peer
networking, but others stem from the network stacks themselves.
We would love to work with Microsoft or, if necessary, work on our own
to resolve these problems. For now, however, these difficulties have
severely restricted our ability to support these machines in our
heterogeneous computing environment.
Network Neighborhood / Browsing Problems
Browsing is our biggest problem area with peer-to-peer networking. On
some network segments, most often the smaller routed segments in the
residence halls, users will suddenly (and seemingly at random) stop
seeing most or all machines in their Network Neighborhood window. This
appears to happen when the master browsers for a network segment and
protocol aren't properly configured to talk to the WINS servers for
additional browsing information (e.g. Samba servers without WINS
support, Windows for Workgroups servers under NetBEUI, etc.).
The proposed solution to this problem is to place three machines with
sufficiently high browse priorities on every segment to ensure that no
improperly configured nodes take control of browsing. This is very
costly. An alternative would be configuring all machines properly and
administratively locking out users from making changes, but that isn't
practical in our environment.
It looks like Windows NT 4.0 Server's browse server might be unstable.
Recently, we've seen a few instances where the Network Neighborhood
becomes effectively segmented. Machines attached to the same segment
as our WINS servers (which are "super" master browsers for the campus)
see one another, and machines on remote networks see one another, but
the two groups lose sight of each other. Name resolution continues to
work, however, indicating that the WINS server processes are
functioning properly.
WINS Server Name Conflicts
The Microsoft WINS server doesn't appear to handle name conflicts
properly. If we have a long-standing server with a WINS name "lease"
and a new machine comes along and registers the same name, the new
machine "wins" (no pun intended) and takes control of the name,
effectively taking the server off the network unless it changes names.
This has even happened with the WINS servers themselves and
effectively prevents individuals from being able to run stable
services.
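
To make the difference concrete, here is a minimal Python sketch of
two registration policies. It is a toy model only, not the WINS
protocol or Microsoft's implementation: the NameTable class, the
defend_existing flag, and the names and addresses used are all
hypothetical. With defend_existing=False the newest registrant takes
the name, which is the behavior we appear to be seeing; with
defend_existing=True the existing holder keeps it, which is closer to
what we would want for long-standing servers.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NameTable:
    """Toy name registry (hypothetical; not the real WINS database)."""

    defend_existing: bool
    owners: Dict[str, str] = field(default_factory=dict)  # name -> address

    def register(self, name: str, address: str) -> bool:
        """Return True if `address` owns `name` after this request."""
        current = self.owners.get(name)
        if current is None or current == address:
            # Free name, or the current owner refreshing its own entry.
            self.owners[name] = address
            return True
        if self.defend_existing:
            # Conflict: the existing holder keeps the name.
            return False
        # Conflict: the newcomer silently takes over the name.
        self.owners[name] = address
        return True


if __name__ == "__main__":
    for policy in (False, True):
        table = NameTable(defend_existing=policy)
        table.register("FILESRV", "10.0.0.5")    # long-standing server
        table.register("FILESRV", "10.0.0.99")   # newly arrived machine
        print(policy, table.owners["FILESRV"])
```

Under the first policy the long-standing server loses its name to the
newcomer; under the second the newcomer's request is refused and the
server stays reachable.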
Broadcast Packets
All three NetBIOS transports for peer-to-peer networking are fairly
broadcast (or multicast) intensive while users are browsing the
network or resolving names. We've restricted users to only NBT
(NetBIOS over TCP/IP), so we've only been looking at that transport in
detail. Although Microsoft documentation indicates that a properly
configured Windows 95/NT machine that talks to a WINS server will
never broadcast on the local wire, we still see fairly frequent
broadcasts. NT Server appears to send out significantly more
broadcasts than NT Workstation or Windows 95.
Furthermore, we seem to be seeing an incredible number of browser
elections on our networks. These are probably caused by misconfigured
machines trying to take control of browsing (or losing control of
browsing). These elections involve large numbers of broadcasts from
many machines on our network. In the past, browser elections have
caused user machines to stop responding due to the number of IP
broadcasts on the wire.
In a bridged environment like ours, broadcast control is particularly
critical. Adding an extra 1,000 nodes running peer-to-peer to the
campus network would quite possibly break our current architecture.
(For example, when a bug in Cisco's routing code caused broadcasts
from the residence halls to be forwarded onto the campus network, we
ran into severe broadcast-related performance problems for some users;
that's a good example of what might happen if another two or three
hundred machines appeared on the campus network...)
Workgroup/Domain Scaling Problems
During both browser elections and regular operation of our network,
master browsers send out broadcast update messages, one for each
workgroup or domain registered in the Network Neighborhood. With
individual users defining their own workgroups and domains, the number
of broadcast packets began to severely impact performance. Therefore,
we've attempted to reduce the number of workgroups and domains
attached to the network and tried to standardize on one primary domain
for all machines.
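
As a rough illustration of why this scales badly, the following Python
sketch counts update broadcasts under a deliberately simple model:
every master browser sends one broadcast per workgroup or domain in
each update cycle, and every node on the bridged network has to
receive each one. The function names and the figures in the example
are hypothetical, not measurements from our network.

```python
def update_broadcasts(master_browsers: int, workgroups: int) -> int:
    """Broadcasts sent per update cycle: one per workgroup per master browser."""
    return master_browsers * workgroups


def packets_received(broadcasts: int, nodes: int) -> int:
    """Total receptions, assuming every node sees every broadcast."""
    return broadcasts * nodes


if __name__ == "__main__":
    # Hypothetical figures: 10 master browsers, 1,000 nodes.
    for workgroups in (1, 50, 200):
        sent = update_broadcasts(master_browsers=10, workgroups=workgroups)
        seen = packets_received(sent, nodes=1000)
        print(workgroups, sent, seen)
```

In this model the load grows linearly with the number of workgroups,
which is why collapsing them into a single primary domain helps.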
Furthermore, since browsing appears to operate separately for each
segment, protocol, and workgroup/domain, each domain needs three
stable master browsers; otherwise random machines may be selected as
master browsers and the domain may not be stable. Therefore, anyone
who wants a stable domain needs
a minimum of three servers. We'd much rather see support for providing
master browsing services for multiple domains from the same machine so
we can centrally "seed" the network with workgroup and domain
information (preferably without a broadcast per workgroup or domain as
described above).
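
The cost of the current arrangement can be estimated with simple
arithmetic. The sketch below assumes, as described above, that three
stable browsers are needed for every combination of segment, protocol,
and domain; the function name and the example figures are hypothetical
and are not counts from our network.

```python
def stable_browsers_needed(segments: int, protocols: int, domains: int,
                           per_scope: int = 3) -> int:
    """Machines needed if each (segment, protocol, domain) needs `per_scope`."""
    return segments * protocols * domains * per_scope


if __name__ == "__main__":
    # Hypothetical: 30 segments, NBT only (one protocol).
    print(stable_browsers_needed(segments=30, protocols=1, domains=1))
    print(stable_browsers_needed(segments=30, protocols=1, domains=10))
```

If one machine could act as master browser for several domains, the
domains factor would drop out of this product.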
IPX SAP Advertisement
Windows NT Workstation and Server boxes appear to advertise themselves
via IPX/SAP, rapidly populating the SAP tables on our IPX routers.
Although this should be configurable, we haven't been able to find a
reliable way of preventing NT machines from advertising themselves.
SAP table explosion on routers isn't a good thing, particularly when
these machines really don't need to be advertising themselves in the
first place.
Windows NT Server machines that are configured to serve information
via IPX (I believe -- this might be more general) have been known to
attract all IPX clients on a given network segment to themselves,
effectively causing IPX to break for all clients on the network. This
also causes machines with automatic IPX stacks, such as Windows 95, to
take a few minutes to boot while the connection to the IPX server
times out. We haven't been able to configure away this problem.
RAS Server DHCP Proxy
The RAS Server functionality built into Windows NT appears to use a
non-standard form of DHCP proxy to request addresses for remote
clients. These malformed DHCP packets not only add to the broadcast
load on the network (since the requests don't time out and are retried
fairly aggressively), but can also confuse DHCP servers that receive
them.
Conclusions
We have had some indications from Microsoft that a number of these
difficulties are going to be resolved in future versions of Windows
NT. Until those difficulties are resolved or significant development
resources are thrown at these problems, it will be difficult if not
impossible to support all aspects of Windows NT networking in our
environment.