The controller PM864 suffered a fatal error
I encountered a fatal error on a PM864 controller. The primary and backup controllers and the communication interfaces (four CI864 modules) all went into failure. The controllers had to be initialized (init) before they returned to a normal state. This problem has occurred twice in the last five days.
Firmware Version: 5.0.1005.4
Below I have attached the controllers' logs, the CI logs, and information about MMS connections, firmware versions and data from the System Diagnostics faceplate.
Thank you everyone for the attention.
Without knowing when your controller actually stopped and what you were doing at the time, it's not really possible to relate the various CPU restarts to an actual fault.
I suggest you work through the controller logs and search for the text "This PM has been", which will find the various restart reasons. From a cursory glance, these include power failures, pressing the Init switch and the stall timer being exceeded.
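If the logs are long, the search can be scripted. The following is only a minimal sketch in Python, assuming the controller logs have been saved as plain text files; the function names and the way the files are passed in are my own choices, not part of any ABB tool.

```python
import sys
from pathlib import Path

# Text fragment suggested above for locating restart reasons.
MARKER = "This PM has been"


def find_restart_lines(log_text, marker=MARKER):
    """Return (line_number, line) pairs for every line containing the marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if marker in line:
            hits.append((number, line.strip()))
    return hits


def scan_log_file(path):
    """Read one log file (assumed to be plain text) and return its marker lines."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_restart_lines(text)


if __name__ == "__main__":
    # Usage: python find_restarts.py <log file> [<log file> ...]
    for path in sys.argv[1:]:
        for number, line in scan_log_file(path):
            print(f"{path}:{number}: {line}")
```

Running it over the primary and backup logs together makes it easier to line up the restart reasons by time and see which ones were not caused by a deliberate init or power cycle.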
There are no crash dumps, so the most likely reason for an unexpected halt is the "Stall Timer Exceeded" which usually means that one of your controller tasks has failed to complete in the time allocated to it.
Have you made any changes to the task scan rates, priorities, etc., or added significant amounts of new code?
What is your CPU load?
Are you trending your CPU load, and does it increase when some event in the plant happens?
Did you check 3BSE036351R5001?
As I remember, both redundant CPUs can stop if the connection to the ModuleBus is broken or both Control Network ports fail.
Most probably the ModuleBus to the I/O modules? If the ModuleBus is disconnected, the Primary CPU switches over to the Backup CPU. It is then not possible to restart the Primary CPU, even if the ModuleBus is recovered. Then, if the ModuleBus disappears again... (see the note below)
Automatic Switch-Over to Backup CPU
In a redundant configuration an automatic switch-over from the Primary CPU to the
Backup CPU occurs in the following situations, provided they are in synchronized
state (DUAL LED is lit):
• Memory error in the Primary CPU.
• Other HW-error in the Primary CPU, which causes CPU crash.
• Severe communication errors on the Control network, that is, loss of both
network ports in the Primary CPU.
• Severe communication errors on the ModuleBus (if ModuleBus is part of the
HW configuration) that is, loss of clusters in the Primary CPU.
Note that a Backup CPU with severe communication errors on the ModuleBus
will be rejected (if ModuleBus is part of the HW configuration) and
synchronized state will never be reached as long as error remains.
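One way to read the conditions quoted above is as a simple boolean rule. The sketch below is only my interpretation of that excerpt in Python; the class and field names are hypothetical and do not correspond to any ABB API or diagnostic signal.

```python
from dataclasses import dataclass


@dataclass
class PrimaryStatus:
    """Hypothetical status flags for the Primary CPU (names are illustrative)."""
    synchronized: bool              # DUAL LED is lit
    memory_error: bool              # memory error in the Primary CPU
    hw_crash: bool                  # other HW error causing a CPU crash
    both_network_ports_lost: bool   # loss of both Control Network ports
    modulebus_configured: bool      # ModuleBus is part of the HW configuration
    modulebus_clusters_lost: bool   # loss of clusters on the ModuleBus


def switch_over_expected(s):
    """Return True if the quoted conditions call for an automatic switch-over."""
    if not s.synchronized:
        # The excerpt only promises a switch-over in synchronized state.
        return False
    modulebus_fault = s.modulebus_configured and s.modulebus_clusters_lost
    return (
        s.memory_error
        or s.hw_crash
        or s.both_network_ports_lost
        or modulebus_fault
    )
```

For example, a ModuleBus cluster loss with the DUAL LED lit gives True, while the same fault without synchronization gives False, which matches the situation where the backup cannot take over.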
You can check whether the Backup CPU is working correctly (3BSE036351R5001):
To check that the redundancy is working correctly, perform a manual switch-over
from the Primary CPU to the Backup CPU. This should be performed with caution,
and consideration to possible impact on the process.
The RCU Link Cable must NEVER be removed from the primary Processor Unit
during redundant operation. Removal of the cable may cause the unit to stop.