2008/01/17

Spanning-Tree Tuning

These days I'm working with a colleague (a systems engineer) at a steel-making plant, doing an assessment of the physical network.

I'm learning a lot, and the most important lesson is that network tuning is hard!

I had never considered that when you plan a network topology you must avoid cyclic paths (loops): in graph-theory terms, the active topology must be a tree.

Diagram of a medium-complexity network; note the absence of cyclic paths

The trade-off is that a highly reliable network needs redundant links or paths, so cyclic paths are required and cannot be avoided!

Spanning Tree is a protocol supported by bridges and switches that prevents loops in a redundant network: it automatically disables the links that would cause cyclic paths.
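To make the idea concrete, here is a minimal Python sketch. It is not the real protocol (which elects a root bridge and exchanges BPDUs between switches); it only shows the graph-theory result: starting from a root, keep the links that reach each switch for the first time and block the rest. The function name `spanning_tree` and the four-switch ring are made up for illustration.

```python
from collections import deque

def spanning_tree(links, root):
    """Return (active, blocked) links of a loop-free tree rooted at `root`.

    `links` is a list of (switch_a, switch_b) pairs; names are hypothetical.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    visited = {root}
    active = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(neighbours[node]):
            if peer not in visited:
                # First time we reach this switch: keep the link.
                visited.add(peer)
                active.add(frozenset((node, peer)))
                queue.append(peer)

    # Every link that was not needed to reach a switch would close a loop.
    blocked = {frozenset(link) for link in links} - active
    return active, blocked

# Hypothetical four-switch ring: A-B-C-D-A
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
active, blocked = spanning_tree(ring, root="A")
print("active:", sorted(sorted(l) for l in active))
print("blocked:", sorted(sorted(l) for l in blocked))
```

In a ring of four switches exactly one link ends up blocked; it stays available as a spare and can be re-enabled if one of the active links fails.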

In our case we have a double-ring fiber-optic network built with Netgear 7212 (28 switches installed) and GSM 724 (13 switches installed).

  • The 7212s are the nodes on the fiber-optic network
  • The 724s are the switches that connect the PCs and servers, and they are always connected to a 7212 (the 7212s make up the backbone)

We turned on Spanning Tree (abbreviated to SPT here) on every 7212, but quite often we get a Network Topology Change, which means that SPT has detected a broken link and is changing the preferred paths on all the other 7212s.

This means that the network slows down for a couple of minutes (on the 7212s).

Looking for the causes, we discovered:

  • A fiber-optic link that is unreliable: the 7212 sometimes sees it as broken, which causes a Topology Change.
  • A 7212 (not on the ring) whose link was flickering up and down (every 2 minutes), causing further network topology changes.
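Finding these culprits by hand took us a while. A small script over the link-state events can help; the sketch below assumes you have already extracted events as `(timestamp_in_seconds, switch, port)` tuples from your own logs or traps. The event format, the switch names and the thresholds are all assumptions of mine, not something the 7212 provides in this form.

```python
from collections import Counter

def flapping_ports(events, window_s=600, min_events=3):
    """Return (switch, port) pairs with at least `min_events` link-state
    changes in the last `window_s` seconds of the event list.

    `events` is a list of (timestamp_s, switch, port) tuples (assumed format).
    """
    if not events:
        return []
    latest = max(t for t, _, _ in events)
    recent = Counter(
        (switch, port)
        for t, switch, port in events
        if latest - t <= window_s
    )
    return sorted(key for key, count in recent.items() if count >= min_events)

# Made-up sample data: one port changing state every 120 seconds,
# another with a single isolated event.
events = [
    (0, "sw-17", 3),
    (120, "sw-17", 3),
    (200, "sw-04", 1),
    (240, "sw-17", 3),
    (360, "sw-17", 3),
]
suspects = flapping_ports(events)
print(suspects)
```

A port that changes state every couple of minutes stands out immediately, while a single isolated event is ignored.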

Lessons Learned

  • Turn off SPT on switches outside the ring
    You cannot have cyclic paths on those connections, but if for some reason one of their links breaks, SPT will force a network topology change on every other switch on the ring!
     SPT
    The previous diagram shows our case: on the red switches (outside the ring) SPT must be switched off.
  • In SPT, define the cost of each link
    • Spend time finding the least reliable links, because whenever they are in use, every communication error on them causes a topology change.
    • Give these links the highest cost
  • Filter SPT traffic where different networks are connected
    • In production plants there are usually several networks (installed at different times by different vendors) that must be interconnected (supervision, chemical analysis, back office, etc.). When you connect them, take care to filter the traffic and, if feasible, use an L3 switch.
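The effect of the link costs can be seen with a small Python sketch. It is a simplification under my own assumptions: the real protocol also breaks ties using bridge and port identifiers, which I ignore here. Each switch keeps the link on its cheapest path to the root, and any link that is on nobody's cheapest path is blocked. The ring, the cost values and the function name `blocked_links` are hypothetical.

```python
import heapq

def blocked_links(costs, root):
    """Return the links left out of the least-cost tree rooted at `root`.

    `costs` maps (switch_a, switch_b) to a link cost (hypothetical values).
    """
    adj = {}
    for (a, b), cost in costs.items():
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))

    best = {root: 0}   # cheapest known path cost to the root
    parent = {}        # neighbour each switch uses to reach the root
    done = set()
    heap = [(0, root)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for peer, cost in adj[node]:
            new_dist = dist + cost
            if peer not in best or new_dist < best[peer]:
                best[peer] = new_dist
                parent[peer] = node
                heapq.heappush(heap, (new_dist, peer))

    used = {frozenset((child, par)) for child, par in parent.items()}
    return {frozenset(link) for link in costs} - used

# Hypothetical ring A-B-C-D-A with equal costs on every link.
even = {("A", "B"): 4, ("B", "C"): 4, ("C", "D"): 4, ("D", "A"): 4}
# Same ring, but A-B is the unreliable fiber, so it gets the top cost.
tuned = dict(even)
tuned[("A", "B")] = 100

print(sorted(sorted(l) for l in blocked_links(even, "A")))
print(sorted(sorted(l) for l in blocked_links(tuned, "A")))
```

With equal costs the blocked link is simply whichever one loses the tie. Once the unreliable link gets the top cost, it becomes the blocked one, so its errors no longer disturb the active topology.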
