800xA 5.1 Rev E with AC450 - Redundancy Problem
A customer reported a problem with system redundancy. The problem occurs when the system loses redundancy (CS primary or secondary). If we lose only a PU410, nothing happens.
Test 1: Turn off the redundant PU410. The system switches the services (MB300) over perfectly.
Test 2: Same as test 1, but turn off the primary PU410. The services switch over perfectly.
Test 3: Turn off the secondary CS server. The switchover takes more than 2 minutes and OPC is lost. The same MB300 network also has an AS500, and it loses its dynamic references too.
Test 4: Same as test 3, but turn off the primary CS server. The problem was the same.
When the CS server was turned off, something was happening on MB300.
I don't know yet whether this problem is an "MB300 backpressure". Next week I will collect more details, such as:
FTCCB ,,, on the RTA board
Log from the RTA manager tool
Is an incompatibility between 800xA and the AS500 system possible? The customer did not run the test with the AS500 removed.
Attached are the MB300 switch configurations (both switches show the same status). On both the primary and the secondary switch, CRC errors are above 5% (the recommended limit).
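To compare switch counters against the 5% figure mentioned above, a minimal sketch follows. It assumes the percentage is CRC-errored frames divided by total received frames on a port; the function names and the way counters are read from the switch are hypothetical and not taken from any ABB or switch-vendor tool.

```python
def crc_error_percent(crc_errors: int, frames_received: int) -> float:
    """Return CRC errors as a percentage of received frames (assumed definition)."""
    if frames_received <= 0:
        raise ValueError("frames_received must be positive")
    return 100.0 * crc_errors / frames_received


def exceeds_limit(crc_errors: int, frames_received: int,
                  limit_percent: float = 5.0) -> bool:
    """True when the error rate is strictly above the limit (5% taken from the post)."""
    return crc_error_percent(crc_errors, frames_received) > limit_percent


if __name__ == "__main__":
    # Example counter values are made up for illustration only.
    print(crc_error_percent(51, 1000), exceeds_limit(51, 1000))
```

Running the same calculation on counter deltas taken before and after a failover test, rather than on lifetime totals, would show whether the errors coincide with the switchover.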
RTA System Messages?
- Before/during tests #3 and #4, please log in to the server that is to be kept up and running and check whether any System Messages are output in the RTA Board Configuration tool (remember to verify that message output to screen is enabled, or else they will only be saved to the log). Report messages with full detail.
What is the revision of the controllers?
- versions older than AC450 2.3/5 may stop sending data to ALL RTAs during backpressure.
What is the subscription load?
- with a high number of subscriptions running, a failover can be a very stressful event.
What is the RTA CPU load?
- for smooth failover, RTA CPU load should be kept below 50%, preferably below 40%.
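The CPU load guidance above can be written down as a small check. This is only a sketch: the 40% and 50% thresholds come from the reply, while the function name, the labels, and the assumption that a load reading is available as a percentage are illustrative.

```python
def classify_rta_cpu_load(load_percent: float) -> str:
    """Classify an RTA CPU load reading against the thresholds quoted in the thread."""
    if not 0.0 <= load_percent <= 100.0:
        raise ValueError("load_percent must be between 0 and 100")
    if load_percent < 40.0:
        return "preferred"    # below 40%: preferred for smooth failover
    if load_percent < 50.0:
        return "acceptable"   # below 50%: still within the stated limit
    return "too high"         # 50% or more: failover may not be smooth


if __name__ == "__main__":
    for reading in (35.0, 45.0, 60.0):  # made-up sample readings
        print(reading, classify_rta_cpu_load(reading))
```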
Remember: The PU410 connection with the host computer is hardcoded to 100 Mbit full duplex on the PU410 side. Some NICs don't handle a fixed speed setting well. Attempting to auto-negotiate from the host side could result in the host NIC selecting 100 *half* duplex. At higher throughputs, a duplex mismatch could lead to major losses. Verify the communication between the PU410 and the host. Consider inserting a managed switch between them so that you can monitor port error counters (at full duplex, expect more or less zero errors).
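The duplex check described above can be sketched as a comparison of the host NIC's actual link setting against the fixed PU410 setting, plus a look at how port error counters change during a test. The `LinkSetting` type and both function names are hypothetical; reading the real values from the NIC or the managed switch is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSetting:
    speed_mbit: int
    full_duplex: bool


# Fixed PU410-side setting as stated in the thread: 100 Mbit, full duplex.
PU410_FIXED = LinkSetting(speed_mbit=100, full_duplex=True)


def link_problems(host: LinkSetting, pu410: LinkSetting = PU410_FIXED) -> list:
    """Return a list of human-readable mismatches between host NIC and PU410."""
    problems = []
    if host.speed_mbit != pu410.speed_mbit:
        problems.append(f"speed mismatch: host {host.speed_mbit}, PU410 {pu410.speed_mbit}")
    if host.full_duplex != pu410.full_duplex:
        problems.append("duplex mismatch: host and PU410 disagree on full/half duplex")
    return problems


def new_port_errors(before: int, after: int) -> int:
    """Errors added between two counter readings (expected to be near zero at full duplex)."""
    return max(0, after - before)


if __name__ == "__main__":
    # A host NIC that fell back to 100 half duplex after failed auto-negotiation.
    print(link_problems(LinkSetting(100, False)))
```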
Both CS should be live at the same time. So the fact that you get bad OPC data implies that your 800xA clients are "stuck" on the stopped CS and have not failed over to the backup CS properly. When they do finally fail over, everything works OK. This sounds like an "800xA" problem, not an AC450 / Masterbus problem.
A 2-minute delay implies RNRP isn't working properly and the clients are waiting for a base TCP/IP timeout before realizing the server is unavailable.
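To put a number on the delay in each test, one option is to poll data quality from a client at a fixed interval and compute the longest outage afterwards. The sketch below assumes a list of `(timestamp_seconds, ok)` samples sorted by time; how the samples are collected (for example, by reading an OPC item and checking its quality) is not covered, and the function name is hypothetical.

```python
def longest_outage(samples):
    """Return the longest span, in seconds, from the first failed probe
    to the next successful one. An outage still open at the end of the
    samples is not counted."""
    longest = 0.0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp              # outage begins
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None                   # outage ended
    return longest


if __name__ == "__main__":
    # Made-up probe log: data goes bad at t=1 s and recovers at t=125 s.
    probes = [(0, True), (1, False), (2, False), (125, True), (126, True)]
    print(longest_outage(probes))
```

An outage of a couple of minutes would be consistent with clients waiting on a transport-level timeout, while a working redundancy protocol would be expected to give a much shorter gap.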
- Check the RNRP configuration for each CS. Ensure that both Plant Networks are configured correctly and that your Plant Networks and Control Networks are in separate areas. Note that the MB300 areas (i.e. the connection between the CS and the RTA) are local to the PC and should be local RNRP networks.
- Check Server affinity and load sharing settings. Check whether all of the clients are attached to CS1 at the same time or spread over both CS1 and CS2.
- You described that you stop the RTA on CS2 first, which forces clients to use CS1, then stop CS1, which forces all the clients back to CS2. What happens when you do this test in "reverse order" - i.e. after a long period of stable running, you begin with the RTA on CS1