Issue: You are setting up HANA system replication, and replication fails or stays stuck in UNKNOWN status.
Logs: You see the following entries in the name server or index server log files on the secondary or primary HANA server.
Error on the target (secondary) HANA server:
[416549]{-1}[-1/-1] 2021-05-06 11:20:50.422625 e sr_dataaccess DisasterRecoveryProtocol.cpp(01226) : V 3: HT_Sec (CT_Log[0/0]): void DataAccess::ReplicationProtocolHandler::setError(const ltt::exception&), pHandler=0x00007f3405eb9000, error=exception 2110001: Generic stream error: getsockopt, Event=EPOLLERR - , rc=111: Connection refused; $Context$=[c8f12ac7b0250001,172.XX.XXX.39:49817,10.XX.XXX.25:44503,UNK,0]; $channel$={<NetworkChannelSSLFilter>={<NetworkChannelBase>={this=139861414813720, fd=43, refCnt=1, idx=8, local=172.XX.XXX.39/49817_tcp, remote=10.XX.XXX.25/44503_tcp, state=ConnectWait, pending=[---c]}}}
(Basis/IO/Stream/impl/NetworkChannelCompletion.cpp:581; 2110001)
Error on the primary HANA server:
[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167514 i sr_dataaccess DisasterRecoveryPrimaryImpl.cpp(01136) : Start listen to global interface port:44503
[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167739 e sr_dataaccess DisasterRecoveryPrimaryImpl.cpp(01171) : Listener cannot be started, because port 44503 is already in use!
[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167743 e sr_dataaccess DisasterRecoveryPrimaryImpl.cpp(01172) : A system replication primary uses replication ports in the range of instance number(s) from 45 to 45
[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167745 e sr_dataaccess DisasterRecoveryPrimaryImpl.cpp(01173) : Please check, that there is no other system on this machine using instancenr 45! This is just a hint and possibly not the root cause ..
[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167747 e sr_dataaccess DisasterRecoveryPrimaryImpl.cpp(01174) : In general the port range 44500-44599 must not be used by any other process when system replication is turned on!
[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167750 e sr_dataaccess DisasterRecoveryPrimaryImpl.cpp(01175) : You may need to set ip_local_port_range as Multitenant Database, please check "System Replication with Tenant Databases" section in admin guide and SAP note 2382421, 401162
[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167765 e sr_dataaccess DisasterRecoveryPrimaryImpl.cpp(01104) : checkAndStartListener(): listener start failed: exception 1: no.2110008 (Basis/IO/Stream/impl/NetworkChannel.cpp:1261)
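The port can be any 4<NN><XX> port, where NN is the SAP HANA instance number and XX identifies the service port. In the log above, 44503 breaks down as 4, 45 and 03, and the affected range is 44500-44599.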
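The numbering can be worked out mechanically. The following is a minimal sketch, assuming NN is the two-digit number reported in the primary's log (45 above); the function name sr_port_range is made up for illustration and is not an SAP tool.

```shell
# Hypothetical helper: print the 4<NN>00-4<NN>99 port range for a
# two-digit number NN, following the 4<NN><XX> pattern.
sr_port_range() {
    nn=${1#0}        # strip one leading zero so "05" is not parsed as octal
    nn=${nn:-0}      # "0" becomes empty after stripping; restore it
    printf '4%02d00-4%02d99\n' "$nn" "$nn"
}

sr_port_range 45
```

For 45 this gives 44500-44599, which matches the range named in the primary's log.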
Cause: The port is already held by another process. This can be a non-SAP application, another HANA instance on the same server, or, in a multi-tier replication setup, one of the other HANA servers in the chain.
Resolution: First find out which process is using the port. Run the following command to get its process ID.
[root@youHANAServer ~]# fuser 44503/tcp
44503/tcp: 3202
Process 3202 is holding the port. Next, check what that process is.
[root@youHANAServer ~]# ps -eaf|grep -i 3202
root 3202 1 0 00:01 ? 00:00:00 /opt/CA/AccessControl/bin/serevu
We can see that this process, which belongs to a backup agent (avatar), was blocking our port.
The same check can also be done with lsof:
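The fuser and ps lookups above can be chained so that every PID holding the port is described in one pass. This is a rough sketch: who_holds_port is an illustrative name, and it assumes fuser (from psmisc) and ps are installed and that you run it as root so that other users' sockets are visible.

```shell
# Hypothetical helper: describe every process holding a TCP port.
who_holds_port() {
    port=$1
    # fuser writes only the PIDs to stdout; the "<port>/tcp:" label goes
    # to stderr, so discarding stderr leaves a plain list of PIDs.
    for pid in $(fuser "${port}/tcp" 2>/dev/null); do
        # Print PID, owner and full command line without a header row.
        ps -o pid= -o user= -o args= -p "$pid"
    done
}

# Usage:
#   who_holds_port 44503
```

Looking the PID up with ps -p avoids the false matches that grep on a bare number can produce, for example when 3202 also appears inside another PID or a command argument.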
[root@youHANAServer ~]# lsof -i:44503
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
hdbindexs 37342 prdadm 50u IPv4 363652 0t0 TCP *:44503 (LISTEN)
serevu 3202 root 486u 3u 75930953 37100 TCP localhost.localdo:44503 localhost.localdom:8891 CLOSE_WAIT
In our case /opt/CA/AccessControl/bin/serevu was the process occupying the port. We worked with our OS team to kill that backup agent and reconfigure it to use another port.
We have also seen a similar issue in multi-tier replication (A - B - C), where the secondary and tertiary servers were affected because primary server A was holding a port on secondary server B. We did a clean stop and start of the secondary server and then set up replication again. That fixed the issue.
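When several processes show up on the port, it helps to separate HANA's own services from everything else. The filter below is a sketch that assumes HANA service commands start with hdb (as hdbindexs does in the output above); non_hana_holders is an illustrative name, not an SAP or lsof feature.

```shell
# Hypothetical filter: read `lsof -i` output on stdin and print
# "PID COMMAND" for every process whose command does not start with "hdb".
non_hana_holders() {
    # NR > 1 skips the header row; $1 is COMMAND and $2 is PID.
    awk 'NR > 1 && $1 !~ /^hdb/ { print $2, $1 }' | sort -u
}

# Usage:
#   lsof -i:44503 | non_hana_holders
```

Anything this prints is a candidate for being stopped or moved to another port; the HANA listener itself is left out of the list.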
SAP recommends the following SAP Notes for reserving these ports so that only HANA can use them.
The notes explain how to use the Linux parameter ip_local_port_range and the SAP Host Agent to reserve SAP ports.
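A quick check related to this parameter is whether a replication port falls inside the kernel's range for dynamically assigned local ports, because a port inside that range can be handed to an unrelated outgoing connection. The sketch below reads the current range from /proc/sys/net/ipv4/ip_local_port_range. The names port_in_range and report_port are illustrative, and the values to configure should be taken from the SAP Notes, not from this example.

```shell
# Hypothetical helper: succeed if PORT lies within LOW..HIGH (inclusive).
port_in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Hypothetical helper: report where PORT sits relative to the current range.
report_port() {
    port=$1
    # The file holds two numbers: the lowest and highest dynamic local port.
    read -r low high < /proc/sys/net/ipv4/ip_local_port_range || return 1
    if port_in_range "$port" "$low" "$high"; then
        echo "port $port is inside the dynamic range $low-$high"
    else
        echo "port $port is outside the dynamic range $low-$high"
    fi
}

# Usage:
#   report_port 44503
```

On many Linux systems the default range is 32768 to 60999, in which case 44503 would be reported as inside it.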
Reference: SAP notes
401162 - Linux: Avoiding TCP/IP port conflicts and start problems
2712064 - SAP HANA System Replication Error port 4#### already in use