Network Loop Problems
I have a customer with network loop problems. We have tried to narrow the problem down, but so far we have not been able to pin it down.
This is a wide-area (controller) network where 2 out of 3 controllers report loops on the secondary network, while the third one closed its primary network port, which has since opened again.
The network is provided by a third-party supplier, so we cannot fully control it. Wireshark captures of the traffic do not indicate that the primary and secondary networks have been swapped.
At the connectivity server end, RNRP is repeatedly opening ports and closing them again.
The controller logs say that messages are being received twice, etc.
Could it be the controller hardware itself? Can a faulty controller (in this case a PM861) cause the problems?
Very unlikely a controller problem.
Make sure Receive Side Scaling is DISABLED on the affected NICs in Microsoft Windows. As far as I know, an AC 800M cannot report a false loop.
Use Wireshark to continuously record UDP port 2423 traffic at the controller switch. You can use multiple files in a circular manner (a ring buffer) to avoid filling the hard disk.
Stop logging when RNRP detects the loop.
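As a rough sketch of the circular capture described above, the snippet below builds a command line for dumpcap, the capture tool that ships with Wireshark. The interface name, output file name, file size and file count are placeholders, not values from this thread, so adjust them to your site.

```python
def build_ring_capture_command(interface, out_file,
                               file_size_kb=100_000, file_count=10):
    """Return a dumpcap argument list for a circular capture of RNRP traffic.

    interface, out_file, file_size_kb and file_count are illustrative
    assumptions; only "udp port 2423" comes from the advice in this thread.
    """
    return [
        "dumpcap",
        "-i", interface,                   # capture interface
        "-f", "udp port 2423",             # capture filter: RNRP traffic only
        "-b", f"filesize:{file_size_kb}",  # switch to the next file after this many kB
        "-b", f"files:{file_count}",       # keep at most this many files (ring buffer)
        "-w", out_file,                    # base name for the capture files
    ]


# Example: print the command so it can be checked before it is run.
print(" ".join(build_ring_capture_command("Ethernet 2", "rnrp_ring.pcapng")))
```

Run the printed command from a prompt on the capture PC (or pass the list to subprocess.run) and stop it with Ctrl+C once RNRP has reported the loop.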
Open the capture and apply a display filter repeatedly to view telegrams from ONE source node at a time. Expect one transmission per second. When the first source node is "cleared", continue with the next until you find the culprit telegrams (>12 per second from the same node). Note that the last byte of the RNRP frame is a sequence number ramping from 0x00 to 0xFF. The same number is NEVER sent twice by RNRP.
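If clicking through one node at a time gets tedious, the same check can be sketched in a few lines of Python. This assumes you have already exported the capture into (timestamp, source IP, UDP payload) tuples by some means; the function name and the tuple layout are my own, and the only protocol fact used is the one stated above, that the last payload byte is the sequence number.

```python
from collections import defaultdict


def summarize_by_source(telegrams):
    """Summarize RNRP telegrams per source node.

    telegrams: iterable of (timestamp_seconds, source_ip, payload_bytes),
               with non-empty payloads (hypothetical export format).
    Returns {source_ip: {"max_per_second": int, "repeated_seq": [timestamps]}}.
    """
    per_source = defaultdict(list)
    for ts, src, payload in telegrams:
        # Last byte of the frame is taken as the sequence number.
        per_source[src].append((ts, payload[-1]))

    summary = {}
    for src, items in per_source.items():
        items.sort()                      # order by timestamp
        per_second = defaultdict(int)     # telegram count per whole second
        repeated = []                     # times where seq equals the previous seq
        prev_seq = None
        for ts, seq in items:
            per_second[int(ts)] += 1
            if seq == prev_seq:
                repeated.append(ts)
            prev_seq = seq
        summary[src] = {
            "max_per_second": max(per_second.values()),
            "repeated_seq": repeated,
        }
    return summary
```

A healthy node should come out with max_per_second close to 1 and an empty repeated_seq list; a node whose telegrams are being re-sent by the network shows a high count and many repeats.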
Show the capture to the network owner and demand a correction.
Perhaps they use STP/RSTP (which are too slow to prevent loop detection) or have some issue in their network causing frequent ring openings/closures (for which no protocol on Earth is fast enough to react). As few as 13 telegrams are enough for loop detection.
See also other threads and answers, eg
Attached to this answer you will find a capture of "UDP port 2423" during a network loop detection.
1. Unzip and open in Wireshark
2. "See mess" (too many telegrams per second to make sense to human eye...)
3. Right click any telegram from 172.17.0.27 and filter on this node only
4. "See light" - now one RNRP telegram per second from node 27
5. Notice what happens at packet #1021. More than one telegram per second and sequence number frozen...
When any RNRP node receives 13 or more such telegrams within one second, a loop will be detected and countermeasures will be initiated.
Find "your loop" - note that it is not likely the sending node that is the culprit. Somewhere along the network, some network equipment is resending this telegram more than once... actually more than 12 times...
This is what you must bring up with your network owner and ask to be corrected.
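To help find the exact moment in a long capture, the 13-telegrams-in-one-second rule described above can be approximated offline with a sliding window. This is only a sketch of the rule as stated in this thread, not the actual RNRP implementation; the function name and the (timestamp, source IP, sequence number) input format are assumptions.

```python
from collections import defaultdict, deque

LOOP_THRESHOLD = 13    # telegrams, the figure quoted in this thread
WINDOW_SECONDS = 1.0   # "within one second"


def find_loop_events(telegrams, threshold=LOOP_THRESHOLD, window=WINDOW_SECONDS):
    """Find where copies of one telegram reach the threshold inside the window.

    telegrams: iterable of (timestamp, source_ip, seq), sorted by timestamp.
    Returns a list of (timestamp, source_ip, seq), one entry per
    (source, seq) pair, at the moment the threshold was first reached.
    """
    recent = defaultdict(deque)   # (source, seq) -> timestamps inside the window
    reported = set()
    events = []
    for ts, src, seq in telegrams:
        key = (src, seq)
        stamps = recent[key]
        stamps.append(ts)
        # Drop copies older than the window; the newest entry always stays.
        while ts - stamps[0] >= window:
            stamps.popleft()
        if len(stamps) >= threshold and key not in reported:
            reported.add(key)
            events.append((ts, src, seq))
    return events
```

Each returned event gives the time, the source node and the frozen sequence number, which is the kind of evidence to hand to the network owner.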