This year, I’m primarily focusing on Replication Server. I’m keenly interested in improving both its performance and the stability of replication. There are a number of outstanding issues with Replication Server that I’m hoping to address. The main one is an issue with firewalls.
This is the environment (very common in large corporations with segmented networks):
ASE (repAgent) tcp/ip connection <-> firewall <-> tcp/ip connection (repagent descriptor) RepServer -> ….
After a period of inactivity, the tcp/ip connection is closed by the firewall:
ASE (repAgent) tcp/ip connection <X> firewall X> tcp/ip connection <X> (repagent descriptor) RepServer -> ….
The tcp/ip connection is closed, which notifies the operating systems hosting both the primary ASE and the Replication Server.
On the primary Sybase ASE side:
- os notifies ASE of the disconnected connection. This is most commonly reported as a 1608 error (client connection unexpectedly disappeared).
- RepAgent is notified within ASE and attempts to reconnect
On the Replication server side:
- os notifies RepServer of the disconnected connection
- RepServer either doesn’t handle the message from the os or doesn’t release the repagent descriptor correctly.
When the RepAgent attempts to connect to the Replication Server:
- RepAgent connection is denied because Replication Server says that the RepAgent is still in the process of disconnecting. So it *appears* to be handling the message from the os but not completely freeing the repagent descriptor for some reason.
This can be easily reproduced with the help of someone who can set up a firewall that closes connections due to both inactivity and maximum time allowed.
Using a heartbeat of, say, 30 seconds does reduce the occurrences of this issue, but the security group mandates a maximum time that *any* connection can stay open, regardless of activity. This partial workaround is already in place. Another partial workaround is to have the RepAgent itself disconnect after 20 seconds of inactivity, but we still run into the maximum connection time limit.
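To make it clearer why the heartbeat is only a partial workaround, here is a small Python sketch of a firewall that enforces both an idle timeout and a maximum connection lifetime. The `FirewallPolicy` class, the `seconds_until_drop` function, and the 300-second and 8-hour values are assumptions made up for illustration; they are not the actual firewall settings in this environment.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FirewallPolicy:
    """Hypothetical firewall rules; the real values come from the security group."""
    idle_timeout: float   # seconds of silence before the connection is dropped
    max_lifetime: float   # hard cap on connection age, regardless of activity


def seconds_until_drop(policy: FirewallPolicy,
                       heartbeat_interval: Optional[float] = None) -> Tuple[float, str]:
    """Return (seconds, reason) for when an otherwise idle connection is closed."""
    heartbeat_helps = (heartbeat_interval is not None
                       and heartbeat_interval < policy.idle_timeout)
    if not heartbeat_helps and policy.idle_timeout <= policy.max_lifetime:
        # No traffic arrives in time to reset the idle timer.
        return (policy.idle_timeout, "idle")
    # The heartbeat keeps resetting the idle timer, but the lifetime cap still applies.
    return (policy.max_lifetime, "max_lifetime")


# Assumed example values: 5 minute idle timeout, 8 hour maximum lifetime.
policy = FirewallPolicy(idle_timeout=300, max_lifetime=8 * 3600)
print(seconds_until_drop(policy))                         # no heartbeat
print(seconds_until_drop(policy, heartbeat_interval=30))  # 30 second heartbeat
```

In this model the heartbeat only changes the reason for the disconnect from idle timeout to maximum lifetime. The connection is still closed eventually, so the reconnect path still has to work.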
As it is, I’m forced to restart the repserver every 18 to 24 hours. Regardless of how the connection is closed, repserver should release the repagent descriptor fully so that a reconnect from the repagent will go through.
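The behavior I’m seeing versus the behavior I’m asking for can be sketched with a toy state model in Python. This is not how Replication Server is actually implemented, since I have no visibility into its internals; `RepAgentDescriptor`, `DescriptorState`, and the `release_on_close` flag are hypothetical names used only to illustrate the symptom.

```python
from enum import Enum


class DescriptorState(Enum):
    FREE = "free"
    CONNECTED = "connected"
    DISCONNECTING = "disconnecting"


class RepAgentDescriptor:
    """Toy model of a repagent descriptor; names and states are assumptions."""

    def __init__(self, release_on_close: bool):
        self.state = DescriptorState.FREE
        # False models the observed symptom, True models the desired behavior.
        self.release_on_close = release_on_close

    def connect(self) -> bool:
        """A connect attempt succeeds only if the descriptor is free."""
        if self.state is not DescriptorState.FREE:
            return False  # e.g. "RepAgent is in the process of disconnecting"
        self.state = DescriptorState.CONNECTED
        return True

    def on_connection_closed(self) -> None:
        """Called when the os reports that the tcp/ip connection went away."""
        self.state = DescriptorState.DISCONNECTING
        if self.release_on_close:
            # Desired: finish the cleanup so a reconnect can go through.
            self.state = DescriptorState.FREE


observed = RepAgentDescriptor(release_on_close=False)
observed.connect()
observed.on_connection_closed()
print(observed.connect())  # reconnect is refused, descriptor stays stuck

desired = RepAgentDescriptor(release_on_close=True)
desired.connect()
desired.on_connection_closed()
print(desired.connect())   # reconnect goes through
```

In the stuck case the only way out in this model is to rebuild the descriptor, which corresponds to restarting the repserver.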