SAP HANA System Replication: "Listener cannot be started" or "listener start failed: exception"

Issue: You are setting up SAP HANA system replication, and the replication fails or is stuck in UNKNOWN status.

Logs: You see the following entries in the nameserver or indexserver trace files on the secondary or primary HANA server.

Error on the target (secondary) HANA server:
[416549]{-1}[-1/-1] 2021-05-06 11:20:50.422625 e sr_dataaccess    DisasterRecoveryProtocol.cpp(01226) : V 3: HT_Sec (CT_Log[0/0]): void DataAccess::ReplicationProtocolHandler::setError(const ltt::exception&), pHandler=0x00007f3405eb9000, error=exception 2110001: Generic stream error: getsockopt, Event=EPOLLERR - , rc=111: Connection refused; $Context$=[c8f12ac7b0250001,172.XX.XXX.39:49817,10.XX.XXX.25:44503,UNK,0]; $channel$={<NetworkChannelSSLFilter>={<NetworkChannelBase>={this=139861414813720, fd=43, refCnt=1, idx=8, local=172.XX.XXX.39/49817_tcp, remote=10.XX.XXX.25/44503_tcp, state=ConnectWait, pending=[---c]}}}

(Basis/IO/Stream/impl/NetworkChannelCompletion.cpp:581; 2110001)

 

Error on the primary HANA server:
[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167514 i sr_dataaccess    DisasterRecoveryPrimaryImpl.cpp(01136) : Start listen to global interface port:44503

[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167739 e sr_dataaccess    DisasterRecoveryPrimaryImpl.cpp(01171) : Listener cannot be started, because port 44503 is already in use!

[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167743 e sr_dataaccess    DisasterRecoveryPrimaryImpl.cpp(01172) : A system replication primary uses replication ports in the range of instance number(s) from 45 to 45

[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167745 e sr_dataaccess    DisasterRecoveryPrimaryImpl.cpp(01173) : Please check, that there is no other system on this machine using instancenr 45! This is just a hint and possibly not the root cause ..

[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167747 e sr_dataaccess    DisasterRecoveryPrimaryImpl.cpp(01174) : In general the port range 44500-44599 must not be used by any other process when system replication is turned on!

[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167750 e sr_dataaccess    DisasterRecoveryPrimaryImpl.cpp(01175) : You may need to set ip_local_port_range as Multitenant Database, please check "System Replication with Tenant Databases" section in admin guide and SAP note 2382421, 401162

[23685]{-1}[-1/-1] 2021-05-06 11:23:55.167765 e sr_dataaccess    DisasterRecoveryPrimaryImpl.cpp(01104) : checkAndStartListener(): listener start failed: exception  1: no.2110008  (Basis/IO/Stream/impl/NetworkChannel.cpp:1261)  
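To find the conflicting port quickly, you can scan the trace files for the "already in use" message. This is a minimal sketch that assumes the message wording shown above; conflict_ports is a hypothetical helper name, and the file patterns in the example assume you run it as <sid>adm from the trace directory (for example after cdtrace).

```shell
#!/bin/sh
# conflict_ports FILE... : print each port reported as "already in use", once.
# Assumes the trace message format "... because port NNNNN is already in use!".
conflict_ports() {
  # -E extended regex, -h no file names, -o print only the matching text;
  # awk keeps the second word (the port), sort -un de-duplicates numerically
  grep -Eho 'port [0-9]+ is already in use' "$@" | awk '{ print $2 }' | sort -un
}

# Example (as <sid>adm, after cdtrace):
#   conflict_ports nameserver_*.trc indexserver_*.trc
```

For the primary log above, this prints 44503.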


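The affected port can be any 4<NN><XX> port, where NN is your SAP HANA instance number and XX is the service-specific port offset. For instance number 45, this is the range 44500-44599 named in the log above.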
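The port arithmetic can be sketched in POSIX shell as follows. The helper names decode_sr_port and sr_port_range are hypothetical; the 4<NN>00-4<NN>99 range is taken from the log message above.

```shell
#!/bin/sh
# decode_sr_port PORT : split a 4<NN><XX> port into instance number and offset.
decode_sr_port() {
  case $1 in
    4[0-9][0-9][0-9][0-9]) ;;
    *) echo "not a 4<NN><XX> port: $1" >&2; return 1 ;;
  esac
  local rest=${1#4}                 # 44503 -> 4503
  echo "instance=${rest%??} offset=${rest#??}"
}

# sr_port_range NN : whole 4<NN>00-4<NN>99 range for instance number NN.
sr_port_range() {
  local nn=${1#0}                   # "05" -> "5", so printf does not parse it as octal
  printf '4%02d00-4%02d99\n' "${nn:-0}" "${nn:-0}"
}
```

For example, decode_sr_port 44503 prints instance=45 offset=03, and sr_port_range 45 prints 44500-44599.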

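Cause: The replication port is already in use, so the primary cannot start its replication listener, and the secondary's connection attempts to that port are refused (rc=111: Connection refused). The port can be held by a non-SAP application, by another HANA instance on the same server, or, in multi-tier replication, by one of the other HANA servers in the chain.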

    Resolution:
    First, find out which process is using the port. Run the following command as root to list the IDs of the processes holding it:
      [root@youHANAServer ~]# fuser 44503/tcp
      44503/tcp:            3202


      The output shows that process 3202 is holding the port. Next, check which program runs under that PID:

       

      [root@youHANAServer ~]# ps -eaf|grep -i 3202

      root       3202      1  0 00:01 ?        00:00:00 /opt/CA/AccessControl/bin/serevu
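A slightly more precise variant of the two steps above, as a sketch: ps -eaf | grep -i 3202 also matches any line that merely contains the string 3202 (including the grep command itself), whereas reading /proc/<PID>/cmdline looks up exactly the PIDs that fuser reports. It assumes Linux with /proc and fuser from psmisc, run as root; port_owners is a hypothetical helper name.

```shell
#!/bin/sh
# port_owners PORT : list each process holding TCP port PORT with its command line.
# Assumes Linux (/proc) and fuser from psmisc; run as root to see all processes.
port_owners() {
  case $1 in
    ''|*[!0-9]*) echo "invalid port: $1" >&2; return 2 ;;
  esac
  if [ "$1" -lt 1 ] || [ "$1" -gt 65535 ]; then
    echo "invalid port: $1" >&2; return 2
  fi
  local pid
  # fuser writes "PORT/tcp:" to stderr and only the PIDs to stdout
  for pid in $(fuser "$1/tcp" 2>/dev/null); do
    # /proc/PID/cmdline separates arguments with NUL bytes; print them space-separated
    printf '%s\t%s\n' "$pid" "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
  done
}

# Example: port_owners 44503
```

Each output line holds a PID and its command line, separated by a tab.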



      The process belongs to a third-party agent (/opt/CA/AccessControl/bin/serevu), not to HANA, and it was blocking our port.
      You can also confirm this with lsof, which lists every process that has the port open:



      [root@youHANAServer ~]# lsof -i:44503

      COMMAND     PID   USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
      hdbindexs 37342 prdadm   50u  IPv4   363652      0t0  TCP *:44503 (LISTEN)
      serevu      3202 root  486u  3u 75930953   37100  TCP localhost.localdo:44503 localhost.localdom:8891 CLOSE_WAIT
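When the lsof output is long, a small awk filter can hide the HANA rows and show only the foreign processes. This is a sketch that assumes HANA process names start with hdb (lsof truncates them, e.g. hdbindexs); non_hana_holders is a hypothetical helper name.

```shell
#!/bin/sh
# non_hana_holders : read lsof output on stdin and print COMMAND, PID and USER
# of every row whose command does not start with "hdb" (assumed HANA prefix).
non_hana_holders() {
  # NR > 1 skips the header row, NF > 0 skips blank lines
  awk 'NR > 1 && NF > 0 && $1 !~ /^hdb/ { print $1, $2, $3 }'
}

# Example (-n: no host name lookup, -P: numeric ports):
#   lsof -nP -iTCP:44503 | non_hana_holders
```

For output like the one above, only the serevu row (command, PID 3202, user root) remains.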



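      In our case, /opt/CA/AccessControl/bin/serevu was occupying the port. In the lsof output it appears to use 44503 as the local end of an outgoing connection to port 8891, that is, the kernel handed 44503 out as an ephemeral port. We worked with our OS team to stop the agent and reconfigure it to use another port.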


      We have also seen a similar issue in multi-tier replication (A - B - C): the secondary (B) and tertiary (C) servers reported this error because primary server A was holding a port on secondary server B. A clean stop and start of secondary server B, followed by setting up the replication again, fixed the issue.



      SAP recommends reserving these ports so that only HANA can use them; the SAP notes below describe how.
      They explain how to use the Linux parameter ip_local_port_range and the SAP Host Agent to reserve SAP ports.
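To see whether an instance is exposed to this problem, you can compare the kernel's ephemeral port range (net.ipv4.ip_local_port_range) with the 4<NN>00-4<NN>99 replication range. A minimal sketch with hypothetical helper names: ports listed in net.ipv4.ip_local_reserved_ports are not assigned as ephemeral ports even inside that range, and this sketch does not check that parameter, so a WARNING is a hint rather than proof of a conflict.

```shell
#!/bin/sh
# ranges_overlap LO1 HI1 LO2 HI2 : succeed if the two closed ranges intersect.
ranges_overlap() {
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# check_sr_overlap NN : compare the ephemeral port range with 4<NN>00-4<NN>99.
# Linux only; ignores net.ipv4.ip_local_reserved_ports (see lead-in).
check_sr_overlap() {
  local nn lo hi sr_lo sr_hi
  nn=${1#0}                                  # "05" -> "5"
  sr_lo=$((40000 + ${nn:-0} * 100))          # 45 -> 44500
  sr_hi=$((sr_lo + 99))                      # 45 -> 44599
  read -r lo hi < /proc/sys/net/ipv4/ip_local_port_range
  if ranges_overlap "$lo" "$hi" "$sr_lo" "$sr_hi"; then
    echo "WARNING: ephemeral ports $lo-$hi include SR ports $sr_lo-$sr_hi"
    return 1
  fi
  echo "OK: ephemeral ports $lo-$hi do not include SR ports $sr_lo-$sr_hi"
}

# Example: check_sr_overlap 45
```

With the common Linux default range of 32768-60999, instance 45 (44500-44599) falls inside the ephemeral range, which matches the hint in the primary log.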

      Reference: SAP notes

      401162 - Linux: Avoiding TCP/IP port conflicts and start problems
      2712064 - SAP HANA System Replication Error port 4#### already in use



      <Placeholder for How to check HANA replication status>

      <Placeholder for SAP HANA ports>