AC460 redundancy failure
We have 800xA 5.1 FP4 with MOD300 at a customer's site with AC460 controllers.
The customer's system generated a diagnostic message for the first redundant CPU pair on one subsystem: "static RAM section 2 has been modified" (see picture RAM 6801.png).
We checked the recommended action, which was: "if repeated, replace RAM memory".
It did not repeat, but after 14 days both redundant CPUs failed at the same time, so all I/Os, serials, etc. failed (see pictures CPU 1 failrue.jpg, 800xa_system_event.png, All_Msg_Text.png).
We reset both CPUs, and one week later we replaced both.
After the replacement we found that the TO LED is ON everywhere except on the replaced CPU pair (CPU 1 replaced.jpg).
We found out that this LED indicates a bus timeout, so we assume it could cause a failure of the remaining 2 redundant CPUs.
Right now we are discussing an online reset of the controllers ("failover"), but it seems a little hazardous.
Any suggestions?
I suggest you check the status/functionality of your DSBC cards and transceivers (if you are using e-DCN) before doing an online reset (please consider the process implications and do a risk assessment before going for an online reset).
A timeout generally happens if you have issues with your bus coupler cards.