Aspect Servers losing Sync every few minutes
I have an Aspect server that loses connection to the backup AS every few minutes. The connection drop lasts only a few seconds, but that is long enough for the system status display to see all the services drop momentarily. Lots of errors appear in the system message log, and the 800xA system gets all upset for a few moments until it sorts itself out and starts working again.
There are NO errors in RNRP and I can continually ping the backup AS while this connection drop happens ... So “ping” is not affected and it does not look like there’s any error with the network connection.
BUT
It’s not only ABB services that are affected. I initially thought this was a time sync error, so I sorted out a bunch of issues with time settings, disabled the VMware time sync settings entirely, got the physical server BIOS times set correctly, and then made sure all the server NTP times were within a few seconds of each other. To check it all, I ran a “w32tm /stripchart” command to see if the time was drifting between AS1 and AS2. And the w32tm service ALSO loses connection between the time services on CS1 and AS1 for a few seconds. So I’m assuming this isn’t an ABB problem, it’s a Windows problem. Services running on AS1 briefly lose connections to services running on AS2. There’s NOTHING in the Windows event logs to say there’s a problem. I’m stuck and Google doesn’t help.
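Since ping only exercises ICMP while the affected services talk over TCP and UDP, a timestamped TCP probe might help pin down exactly when each drop starts and ends, so it can be lined up against ESXi or switch logs. The following is only a rough Python sketch of that idea, not something taken from ABB or VMware documentation. The function names `tcp_probe`, `find_outages` and `monitor` are made up for illustration, and the target host and port are assumptions to be replaced with a service that is actually listening on the other node.

```python
import socket
import time
from typing import List, Tuple


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def find_outages(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse runs of failed samples into (first_failure, last_failure) spans."""
    outages = []
    start = None
    last = None
    for timestamp, ok in samples:
        if not ok:
            if start is None:
                start = timestamp
            last = timestamp
        elif start is not None:
            outages.append((start, last))
            start = None
    if start is not None:
        # The run of failures was still ongoing when sampling stopped.
        outages.append((start, last))
    return outages


def monitor(host: str, port: int, interval: float = 0.5,
            duration: float = 60.0) -> List[Tuple[float, bool]]:
    """Probe host:port every `interval` seconds for `duration` seconds."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.time(), tcp_probe(host, port)))
        time.sleep(interval)
    return samples
```

As a usage example, running `find_outages(monitor("AS2", 135, duration=600))` from AS1 would return the wall-clock start and end of each drop seen over ten minutes. Here "AS2" is a placeholder host name and 135 is the Windows RPC endpoint mapper port, chosen only because it is normally open on a Windows server. If the probe shows drops while ping stays clean, that would at least suggest the problem sits above plain IP reachability.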
- The system is 800xA System Version 5.1 running on VMware Virtualized servers.
- There are 2 * Combined AS/DC and 2 * Combined Advant Master and AC800M Connectivity Servers (Yes I know this is a bad idea. Wasn't my decision)
- Controllers are 4 * MP200/1's plus an AC800M. There are also 2 standalone AC800M controllers connected only over Masterbus.
- The 800xA System has been in place for a year and was running fine until a few weeks ago.
Any ideas?
Voted best answer
Patch (build) level of ESXi and the choice of virtual network adapter have caused issues in the past - but one may ask why now after a period of successful running?
See if there is a more recent ESXi patch/build - check its release notes for known/solved problems.
I assume you are running with E1000 as the virtual network adapter, as the ABB virtualization guide recommends, but in this situation trying the others (VMXNETx, E1000E, etc.) would not hurt.
I don’t know how the virtual NIC is treated by Microsoft; maybe VMware Tools and MS hotfixes have an influence?