Controller connection to client network lost
I have a strange problem on our system: we have no connection from the client network to one controller.
- The controller has been working for about three years
- The other 16 controllers do not have this issue
- If I reboot the controller, I can ping it about 15 times; after that: "Request timed out"
- OPC seems to work, no impact on production
- I cannot see live values in CB on the client
- I can see live values in CB on the Connectivity Server
- I can download in CB on the client!
- I can download in CB on the Connectivity Server
- OPC DA / AE: no problems
- Hardware and Tasks in CB on the client are red.
What I have tried so far:
- Rebooting the controller
- RNRP settings on the controller, set identically to the others
- Analysing RNRP
- Test response time (RNRP Fault Tracer, option 5) --> from CS OK / from client OK
RNRP Fault Tracer, option 2 (get configuration parameters and error logs from one node):
On the Connectivity Server:
Always: "No configuration error", "no system error"
Sometimes: "Error, no answer"
Sometimes: "No configuration error", "no system error"
On the Connectivity Server:
Max response time 40 ms
I am a little confused: is it a controller problem (PEC3) or an RNRP problem? All the other controllers work fine, so it might be a controller issue!?
Does anybody know a solution or a better way to analyse this?
- ping is lost, but you can still see OPC data and perform "Test response time"
- a download from CBM on the client/server network to the controller is possible, but no online values are presented.
RNRP works only by manipulating the IPv4 routing table to assist with reaching nodes beyond your local network. Any test made from the Control Network itself (e.g. from a Connectivity Server towards a controller) therefore rules out any influence from RNRP, as such tests do not involve any routing, only direct access on a common LAN.
A wild guess would be a problem with the ARP cache, e.g. from some kind of ARP flooding. ARP is used to resolve MAC addresses: on Ethernet, communication is carried over physical MAC addresses after resolving them from the (volatile) IP addresses. If the ARP table is flushed partially or completely, communication is no longer possible with the addresses that were removed.
Some additional information may be hidden in the Network Information, which you can pull using CBM -> Remote System -> Controller Analysis -> [Get] Network Information.
However, parts of the logfile you attached make me wonder whether PEC does not yet have a full implementation of the network info - "statistics not complete"?
W 2018-07-19 17:05:01.688 TODO: VxWorks6.8 - statistics not complete
172.28.80.21 at 00:50:56:a7:4c:db on motetsec0
172.28.80.22 at 00:50:56:a7:80:4b on motetsec0
172.28.80.158 at 00:03:2c:00:2d:0e on motetsec0
For a controller to be able to send a telegram to a remote node, its MAC address must be part of the table above (i.e. successfully resolved via ARP). I sense that 172.28.80.21 and .22 are the Connectivity Servers, which should be able to ping the controller.
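To make that membership check explicit, here is a small shell sketch run against the three-entry ARP dump pasted above (only an illustration; the address 172.28.80.60 is a hypothetical client node, not taken from your system):

```shell
# ARP dump as pulled from the controller's Network Information (see above).
arp_dump='172.28.80.21 at 00:50:56:a7:4c:db on motetsec0
172.28.80.22 at 00:50:56:a7:80:4b on motetsec0
172.28.80.158 at 00:03:2c:00:2d:0e on motetsec0'

# Report whether the controller has a resolved MAC entry for the given IP.
check_resolved() {
  if printf '%s\n' "$arp_dump" | grep -q "^$1 at "; then
    echo "$1: resolved"
  else
    echo "$1: NOT resolved - controller cannot address this node"
  fi
}

check_resolved 172.28.80.21    # Connectivity Server - present in the dump
check_resolved 172.28.80.60    # hypothetical client node - absent from the dump
```

If an address that should be able to reach the controller shows up as "NOT resolved", that points in the direction of the ARP-cache theory.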
However, a complete Network Information dump should contain more entries. Here is an example from a non-PEC controller running 5.1 firmware.
INET route table - vr: 0, table: 254
Destination Gateway Flags Use If Metric
127.0.0.0/8 localhost UR 0 lo0 0
localhost localhost UH 3720229 lo0 0
172.16.4.0/22 172.17.80.9 UGS 25388 cpm1 0
172.16.80.0/22 172.16.80.106 UC 78 cpm0 0
172.16.80.9 00:05:5d:64:97:9c UHL 3 cpm0 2
172.16.80.17 00:0d:88:52:fe:89 UHL 1156 cpm0 2
172.16.80.19 68:05:ca:33:dd:77 UHL 8785 cpm0 5
172.16.80.21 00:0d:88:52:fe:91 UHL 239 cpm0 2
172.16.80.32 00:15:17:0d:8b:84 UHL 99822 cpm0 2
172.16.80.51 00:1b:21:60:50:6d UHL 11039226 cpm0 2
172.16.80.52 00:1b:21:60:4d:65 UHL 324174246 cpm0 2
172.16.80.53 00:1b:21:62:86:7b UHL 519 cpm0 5
172.16.80.102 00:00:23:0a:11:98 UHL 11160347 cpm0 2
172.16.80.106 172.16.80.106 UH 21 lo0 0
172.16.80.107 00:00:23:0d:1a:4c UHL 41852735 cpm0 2
172.16.80.141 00:0c:29:bd:3f:ed UHL 82 cpm0 2
172.16.80.162 00:0c:29:c1:b8:a9 UHL 11221 cpm0 5
172.16.80.163 00:0c:29:75:56:30 UHL 11712 cpm0 5
172.16.84.0/22 172.17.80.9 UGS 0 cpm1 0
172.17.4.0/22 172.17.80.9 UGS 8 cpm1 0
172.17.80.0/22 172.17.80.106 UC 1459 cpm1 0
172.17.80.9 00:05:5d:64:97:9d UHL 40515 cpm1 5
172.17.80.17 00:0d:88:52:fe:88 UHL 230 cpm1 2
172.17.80.53 00:1b:21:62:86:7a UHL 93 cpm1 5
172.17.80.106 172.17.80.106 UH 0 lo0 0
172.17.80.141 00:0c:29:bd:3f:01 UHL 6 cpm1 5
172.17.84.0/22 172.17.80.9 UGS 0 cpm1 0
172.16.80.9 at 00:05:5d:64:97:9c on cpm0
172.16.80.17 at 00:0d:88:52:fe:89 on cpm0
172.16.80.19 at 68:05:ca:33:dd:77 on cpm0
172.16.80.21 at 00:0d:88:52:fe:91 on cpm0
172.16.80.32 at 00:15:17:0d:8b:84 on cpm0
172.16.80.51 at 00:1b:21:60:50:6d on cpm0
172.16.80.52 at 00:1b:21:60:4d:65 on cpm0
172.16.80.53 at 00:1b:21:62:86:7b on cpm0
172.16.80.102 at 00:00:23:0a:11:98 on cpm0
172.16.80.107 at 00:00:23:0d:1a:4c on cpm0
172.16.80.141 at 00:0c:29:bd:3f:ed on cpm0
172.16.80.162 at 00:0c:29:c1:b8:a9 on cpm0
172.16.80.163 at 00:0c:29:75:56:30 on cpm0
172.17.80.9 at 00:05:5d:64:97:9d on cpm1
172.17.80.17 at 00:0d:88:52:fe:88 on cpm1
172.17.80.53 at 00:1b:21:62:86:7a on cpm1
172.17.80.141 at 00:0c:29:bd:3f:01 on cpm1
As you can see, there are routing entries for the Client/Server network (172.16.4.0/22) and for a second Control Network area (172.16.84.0/22). These entries make it possible to reach nodes on those remote subnets.
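The same idea can be sketched for the route table: for a remote subnet to be reachable, a gateway (UGS) route for it must be present. A minimal grep-based check against an excerpt of the example listing above (only a sketch; 10.0.0.0/8 is a made-up subnet used to show the failure case):

```shell
# Excerpt of the INET route table from the example above.
route_table='172.16.4.0/22 172.17.80.9 UGS 25388 cpm1 0
172.16.80.0/22 172.16.80.106 UC 78 cpm0 0
172.16.84.0/22 172.17.80.9 UGS 0 cpm1 0'

# Report whether a gateway (UGS) route for the given subnet exists.
check_route() {
  if printf '%s\n' "$route_table" | grep -q "^$1 .* UGS "; then
    echo "$1: gateway route present"
  else
    echo "$1: no gateway route - subnet unreachable"
  fi
}

check_route 172.16.4.0/22     # Client/Server network - routed via 172.17.80.9
check_route 10.0.0.0/8        # hypothetical unrouted subnet
```

So if the PEC controller's Network Information ever becomes complete enough to show its route table, checking for a UGS entry covering the client network would be the first thing to look at.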
I suggest that you report this problem to the PEC support team; they may ask for help from SE if needed.