Server Redundancy Problem (ASCS1/ASCS2)
ASCS1 -- Primary Aspect & Connectivity Server
ASCS2 -- Secondary Aspect & Connectivity Server
OS1 and OS2 -- Client workstations
Observations and Activities:
Both servers were running with all services in a healthy condition. Redundancy had been checked previously and was working.
On Saturday, 13.04.2013, in the afternoon, we observed a blue dot for ASCS2 in the loop details (ASCS2 trying to connect). In the Task Manager Processes tab on ASCS2 we searched for 'AdvLoopServer.exe', and it was not there. On Monday morning (15.04.2013) we checked again and found the same problem, including in Task Manager.
So we restarted ASCS2. At this point ASCS1, OS1 and OS2 were functioning properly. We use Operator Workplace on all four systems (both servers and clients). On the Operator Workplace graphic pages the values were updating, and the faceplates were also operating properly.
After ASCS2 rebooted, all of its services returned to a healthy condition. We checked the OPCDA_Connector service provider for ASCS2, and it was in service. Up to this point everything was OK: ASCS1, ASCS2, OS1 and OS2 were all working fine, meaning we could view live data from every system, and faceplates were working. After 5 to 7 minutes, however, no values were available on the graphic pages of ASCS2, OS1 and OS2. '?' marks appeared, indicating bad data, and the faceplates stopped working. Changing pages did not help on those three systems. ASCS1 was still working, and we were able to operate from there.
In this condition we checked the OPCDA_Connector service for ASCS2 again, and it was in service. We checked the MOD OPC Statistics aspect for the same service provider and subscribed for live data. We got a count of 818 with status Good. We checked the PAS services, and all were started. We also removed the affinity setting we had made previously.
We restarted ASCS1. After rebooting it came back into service: values were updating and faceplates were working on ASCS1. The other three systems still did not come back online. We then disabled the OPCDA_Connector service for ASCS2, and data started coming to all systems, including ASCS2 itself.
Before we disabled the OPCDA service on ASCS2, we observed that the Event Collector for MOD OPC on ASCS2 had gone into Synchronization mode.
We kept the system in this condition for half an hour, and it ran without any disturbance. We then re-enabled the OPCDA service for ASCS2. Again it worked for 5 to 7 minutes, and then the same problem described above occurred, so we disabled the service for ASCS2 again.
This morning (16.04.2013) we decided to disconnect ASCS2 from ASCS1 and reconnect it. To do this, we put ASCS2 into maintenance stop using the Configuration Wizard, disconnected ASCS2 from ASCS1 using the Configuration Wizard on ASCS1, and then restarted ASCS2. After rebooting we checked the PAS services on ASCS2. All services had started except the HISTORY service, which showed Failed. We uninstalled and reinstalled the PAS service and redid the PAS settings, but the HISTORY service still did not come online. All other services had started.
In the meantime we restarted ASCS2 three or four times. At about 3:50 PM (UTC+5:30) a failure occurred: ASCS1, OS1 and OS2 were showing the wrong status for some pumps, i.e. not their actual status. We confirmed this from the UNIX OS, where those pumps showed the correct status, matching the field.
Then, at about 4:00 PM (UTC+5:30), we reconnected ASCS2 as the redundant server of ASCS1. All services on ASCS2 came up healthy except the OPCDA_Connector service, which had been removed during the disconnect. We added that service again from the Control Structure, it came into service, and the MOD OPC Statistics aspect was created for the service provider. Everything on ASCS2 was now displaying correctly. We then restarted the OPCDA_Connector service for ASCS1, and it also showed a healthy status. After that, OS1 and OS2 also started getting correct values.
The system ran until 9:00 PM (UTC+5:30). After that, data hung again on all systems except ASCS1, with '?' marks appearing. This time rebooting ASCS2 cleared the issue, and data updating resumed on all systems. We disabled the OPCDA_Connector service provider for ASCS2. We checked PAS on ASCS2 again and found that all services had started, including HISTORY.
We have some other observations as well.
After restarting ASCS2, we receive the following MOD Diagnostics system messages:
i) DBMS NEVER FINISHED TRANSACTION 00 1D 8B AF for ASCS1
ii) DBMS NEVER FINISHED TRANSACTION 00 1D 28 F8 for ASCS1
iii) DBMS NEVER FINISHED TRANSACTION 00 1D 8B F7 for ASCS1
iv) DATABASE SUCCESSFULLY DOWNLOADED for ASCS2
If we restart ASCS1 after ASCS2 has rebooted, only 'DATABASE SUCCESSFULLY DOWNLOADED' appears for ASCS1 and the other messages stop. Restarting ASCS1 by itself does not produce any DBMS TRANSACTION messages. This afternoon, when the problem occurred on ASCS1 and the other clients, we received the same type of messages.
There is currently no redundancy; the ASCS2 OPCDA_Connector service is disabled.
I have the following queries:
1) If there is a problem with ASCS2, how is it affecting the other clients?
I have an explanation for this; please correct me if I am wrong. For whatever reason, OS1 and OS2 were both talking to ASCS2 first, i.e. they were taking data from ASCS2. Data reaches ASCS2 through the OPCDA_Connector service provider. If this service goes down or is disabled, both clients should switch to ASCS1. Here, because the ASCS2 OPCDA service was still reporting healthy, the clients kept talking to ASCS2 as before. But ASCS2 itself was not getting any data through OPCDA. That is why three systems (ASCS2, OS1, OS2) showed wrong values or hung data at the same time.
2) Why, when we subscribe for live data in the MOD OPC Statistics aspect of ASCS2, do we get a count even though it is not actually providing any data to the server (ASCS2)?
3) Why do those DBMS messages appear when we restart ASCS2? What is their impact, such that data communication on ASCS1 stops?
4) We have checked RNRP; it is OK on all systems.
5) How do we configure RTA through RNRP?
6) If there is a problem with time synchronization, what is the proper time synchronization configuration for the SRF plant? The system configuration of this plant was provided earlier. On the DCN there are four systems: the ES, the old UNIX OS, ASCS1 and ASCS2. On the plant network there are also four systems (ASCS1, ASCS2, OS1, OS2). The 800xA system uses a workgroup configuration, not a domain configuration.
7) Which configurations should be checked or implemented? We have already sent all the details we collected.
8) How can this issue be resolved?
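The hypothesis in query 1 can be expressed as a small toy model. It assumes that a client picks the first server in a preference list whose OPCDA_Connector provider reports healthy, and that this health flag is independent of whether data is actually flowing. The `Server` class, `select_server` and `client_view` are hypothetical names, and this is only a sketch of the reasoning, not the actual 800xA provider-selection logic.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    provider_healthy: bool  # status reported for the OPCDA_Connector service provider
    data_flowing: bool      # whether fresh values actually arrive (hypothetical flag)


def select_server(servers):
    """Hypothetical rule: use the first server whose provider reports healthy."""
    for server in servers:
        if server.provider_healthy:
            return server
    return None


def client_view(servers):
    """What a client (OS1/OS2) would display under the assumed selection rule."""
    server = select_server(servers)
    if server is None:
        return "no server"
    # A healthy-looking provider with no data produces '?' on the graphics.
    return "live data" if server.data_flowing else "?"


# ASCS2 preferred, reports healthy, but receives no data: clients show '?'.
stuck = [Server("ASCS2", True, False), Server("ASCS1", True, True)]
print(client_view(stuck))

# ASCS2 provider disabled: clients fall back to ASCS1 and get live data.
disabled = [Server("ASCS2", False, False), Server("ASCS1", True, True)]
print(client_view(disabled))
```

Under these assumptions the model reproduces both observations: '?' on ASCS2, OS1 and OS2 while the ASCS2 provider looks healthy, and recovery as soon as that provider is disabled.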