Questions on Config-Sync between mates in HA Group and Replication

rdesoju Member Posts: 66

Hi,
It appears that when config-sync is enabled between mates in an HA group and between replication mates, it provisions an additional Message VPN and queues. I have a few questions about this:

  1. Since it provisions the additional VPN and queues, does that mean config-sync between mates in an HA group and in a replication group is asynchronous?
  2. If it is asynchronous, how durable is it when one of the nodes/sites crashes and fails over to the other node/site? Should we expect message loss?
  3. If it is synchronous, does it acknowledge only when the message is persisted on the other node/site?
  4. Is it storage-layer replication or network-layer replication?

Thanks,
Raghu

Answers

  • rdesoju
    rdesoju Member Posts: 66

    It appears that the mate-link service is responsible for syncing message queues between both nodes in an HA group.
    Does it sync to the other node asynchronously or synchronously? Is it storage-layer replication or network-layer replication?
    Please clarify.
    Thanks,
    Raghu

  • TomF
    TomF Member, Employee Posts: 406 Solace Employee

    @raghu the synchronisation across a mate link has to be synchronous, otherwise you will end up with windows during which message loss can occur. This is why we recommend you keep the brokers close to each other, since performance can be adversely affected by long mate link round trip times. We will only acknowledge the producer once the message has been persisted on both the backup and primary.
    We do not use storage layer replication, which is a bad solution to this problem since the backup broker would have to load the entire storage volume on fail over. We replicate on a per-message basis.
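
To make the acknowledgement ordering described above concrete, here is a minimal Python sketch. It is a toy model only, not Solace's implementation or API: `Broker`, `SyncReplicatedPair`, `persist`, and `publish` are hypothetical names, the in-memory list stands in for the message spool, and the `capacity` limit is an assumption added just to simulate a backup that cannot persist.

```python
class Broker:
    """Toy broker; the list is a stand-in for a persistent message spool."""

    def __init__(self, name, capacity=None):
        self.name = name
        self.capacity = capacity  # hypothetical limit, used to simulate a full disk
        self.spool = []

    def persist(self, message):
        """Store one message; return False if the spool is full."""
        if self.capacity is not None and len(self.spool) >= self.capacity:
            return False
        self.spool.append(message)
        return True


class SyncReplicatedPair:
    """Per-message, synchronous replication between two toy brokers."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup

    def publish(self, message):
        """Return True (the producer ack) only if both brokers persisted the message."""
        if not self.primary.persist(message):
            return False
        # Replicate this single message to the mate and wait for the result.
        # What a real broker does when the mate cannot persist is out of scope
        # here; this sketch simply withholds the acknowledgement.
        if not self.backup.persist(message):
            return False
        return True


pair = SyncReplicatedPair(Broker("primary"), Broker("backup"))
acked = pair.publish("order-1")
print(acked, pair.primary.spool, pair.backup.spool)
```

The point of the sketch is only the ordering: the producer's acknowledgement is the last step, so an acknowledged message exists on both brokers. Because each message is copied individually, the backup already holds the acknowledged messages at failover time rather than having to load a replicated storage volume.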

  • rdesoju
    rdesoju Member Posts: 66

    Hi @TomF
    Thanks for your insights. While testing failover behavior, I simulated a disk space issue on the secondary while the primary was active, so that the secondary's spool could not grow along with the primary's in a slow-consumer scenario. The primary was able to continue processing independently. When I then deliberately failed the primary, the secondary did not take over activity, because (my best guess) it was already out of sync with the primary.
    Could you please provide your insights on this situation?
    Is it honoring durability over availability?
    What would be the possible outcome if the same issue occurred because of network degradation between the primary and secondary?
    What would be the role of the monitoring node in this situation? Does it detect which node is current and help decide which node to make active?

    Thanks

  • rdesoju
    rdesoju Member Posts: 66

    Another question: since the mate link is synchronous, what happens if the mate link is down? Does it detect this and stop syncing with the other node? If yes, how does it determine that the mate link is down? Does it rely on the monitoring node in this case?