How to create a Solace HA cluster manually

Susi Dman
Susi Dman Member Posts: 10
edited February 2022 in PubSub+ Event Broker #1

Hi Team
I would like to create a Solace software broker cluster manually on my VMs. My VMs are on Azure and run Red Hat. (I'm aware that there is an Azure template that does this easily, but for various reasons I can't use the automated approach.)

From the documents I came across, I understand we need 3 nodes: solace1 (active-primary), solace2 (active-backup) and solace3 (monitoring node). I was able to create standalone single-node Solace installations by following https://docs.solace.com/Solace-SW-Broker-Set-Up/Docker-Containers/Set-Up-Docker-Container-CentOS-Azure.htm

Then I tried to follow https://docs.solace.com/Configuring-and-Managing/Configuring-HA-Groups.htm to configure the redundancy group. While configuring the last part I get this error:

(configure/redundancy)# no shutdown
ERROR: Invalid redundancy group configuration, could not determine mate router-name
Command Failed

Another question: should we configure a load balancer for this? I could not find any docs on configuring a load balancer manually.

I would appreciate it if someone could guide me on how to create the HA cluster manually, given that we have 3 nodes.

Thank you
Susi

Answers

  • arih
    arih Member, Employee Posts: 125 Solace Employee

    Hi Susi,
    Just to confirm: you now have 3 CentOS VMs, each running its own Docker instance of Solace PubSub+, right? And the error you showed is still from the primary broker configuration?
    I would also recommend looking at the option of creating container instances directly from Azure, and then only adjusting the Docker environment variables. If you're interested, I can give it a try and share the steps on Monday.

  • arih
    arih Member, Employee Posts: 125 Solace Employee

    My bad, I just tried to set this up using Azure Container Instances, but it was not successful because they don't support setting shm yet. Instead, we can set up HA by passing Docker environment variables in the docker run step when creating the broker on your CentOS VMs.

    For example, on the primary node, use this docker run command instead.
    sudo docker run \
    --network=host \
    --uts=host \
    --shm-size=2g \
    --ulimit core=-1 \
    --ulimit memlock=-1 \
    --ulimit nofile=2448:42192 \
    --restart=always \
    --detach=true \
    --memory-swap=-1 \
    --memory-reservation=0 \
    --env 'username_admin_globalaccesslevel=admin' \
    --env 'username_admin_password=admin' \
    --env 'nodetype=message_routing' \
    --env 'routername=primary' \
    --env 'redundancy_matelink_connectvia=[backup node IP/hostname]' \
    --env 'redundancy_activestandbyrole=primary' \
    --env 'redundancy_group_password=password' \
    --env 'redundancy_enable=yes' \
    --env 'redundancy_group_node_primary_nodetype=message_routing' \
    --env 'redundancy_group_node_primary_connectvia=[primary node IP/hostname]' \
    --env 'redundancy_group_node_backup_nodetype=message_routing' \
    --env 'redundancy_group_node_backup_connectvia=[backup node IP/hostname]' \
    --env 'redundancy_group_node_monitor_nodetype=monitoring' \
    --env 'redundancy_group_node_monitor_connectvia=[monitor node IP/hostname]' \
    --env 'configsync_enable=yes' \
    -v /opt/vmr/internalSpool:/usr/sw/internalSpool \
    -v /opt/vmr/diags:/var/lib/solace/diags \
    -v /opt/vmr/jail:/usr/sw/jail \
    -v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb \
    -v /opt/vmr/var:/usr/sw/var \
    -v /opt/vmr/adb:/usr/sw/adb \
    --name=solacePrimary solace-pubsub-standard

    For all the *_connectvia variables, you can use a hostname, public IP, or internal IP. The important thing is that each node must be able to resolve the other nodes via that hostname/IP.
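A quick way to confirm the resolution requirement before starting the containers could look like the sketch below. The helper names `check_resolves` and `report_unresolved` are made up for illustration, and the check only covers name resolution, not whether the redundancy ports are reachable.

```shell
# Sketch: verify that each connectvia value resolves from the current host.
# check_resolves and report_unresolved are hypothetical helpers,
# not part of Solace or Docker.
check_resolves() {
  # getent ahosts goes through getaddrinfo, so it accepts both
  # hostnames and literal IP addresses.
  getent ahosts "$1" >/dev/null 2>&1
}

# Print every host/IP that does not resolve.
# The exit status is the number of failures (0 means all resolved).
report_unresolved() {
  local host failures=0
  for host in "$@"; do
    if ! check_resolves "$host"; then
      printf 'cannot resolve: %s\n' "$host"
      failures=$((failures + 1))
    fi
  done
  return "$failures"
}
```

Running something like `report_unresolved solace1 solace2 solace3` on each of the three VMs, with your own hostnames or IPs, should print nothing if every node can resolve the others.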

    Now, for the backup node, similarly use the following command:
    sudo docker run \
    --network=host \
    --uts=host \
    --shm-size=2g \
    --ulimit core=-1 \
    --ulimit memlock=-1 \
    --ulimit nofile=2448:42192 \
    --restart=always \
    --detach=true \
    --memory-swap=-1 \
    --memory-reservation=0 \
    --env 'username_admin_globalaccesslevel=admin' \
    --env 'username_admin_password=admin' \
    --env 'nodetype=message_routing' \
    --env 'routername=backup' \
    --env 'redundancy_matelink_connectvia=[primary node IP/hostname]' \
    --env 'redundancy_activestandbyrole=backup' \
    --env 'redundancy_group_password=password' \
    --env 'redundancy_enable=yes' \
    --env 'redundancy_group_node_primary_nodetype=message_routing' \
    --env 'redundancy_group_node_primary_connectvia=[primary node IP/hostname]' \
    --env 'redundancy_group_node_backup_nodetype=message_routing' \
    --env 'redundancy_group_node_backup_connectvia=[backup node IP/hostname]' \
    --env 'redundancy_group_node_monitor_nodetype=monitoring' \
    --env 'redundancy_group_node_monitor_connectvia=[monitor node IP/hostname]' \
    --env 'configsync_enable=yes' \
    -v /opt/vmr/internalSpool:/usr/sw/internalSpool \
    -v /opt/vmr/diags:/var/lib/solace/diags \
    -v /opt/vmr/jail:/usr/sw/jail \
    -v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb \
    -v /opt/vmr/var:/usr/sw/var \
    -v /opt/vmr/adb:/usr/sw/adb \
    --name=solaceBackup solace-pubsub-standard

    And, lastly for the monitor node, use this command:

    sudo docker run \
    --network=host \
    --uts=host \
    --shm-size=2g \
    --ulimit core=-1 \
    --ulimit memlock=-1 \
    --ulimit nofile=2448:42192 \
    --restart=always \
    --detach=true \
    --memory-swap=-1 \
    --memory-reservation=0 \
    --env 'username_admin_globalaccesslevel=admin' \
    --env 'username_admin_password=admin' \
    --env 'nodetype=monitoring' \
    --env 'routername=monitor' \
    --env 'redundancy_activestandbyrole=monitor' \
    --env 'redundancy_group_password=password' \
    --env 'redundancy_enable=yes' \
    --env 'redundancy_group_node_primary_nodetype=message_routing' \
    --env 'redundancy_group_node_primary_connectvia=[primary node IP/hostname]' \
    --env 'redundancy_group_node_backup_nodetype=message_routing' \
    --env 'redundancy_group_node_backup_connectvia=[backup node IP/hostname]' \
    --env 'redundancy_group_node_monitor_nodetype=monitoring' \
    --env 'redundancy_group_node_monitor_connectvia=[monitor node IP/hostname]' \
    --env 'configsync_enable=yes' \
    -v /opt/vmr/internalSpool:/usr/sw/internalSpool \
    -v /opt/vmr/diags:/var/lib/solace/diags \
    -v /opt/vmr/jail:/usr/sw/jail \
    -v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb \
    -v /opt/vmr/var:/usr/sw/var \
    -v /opt/vmr/adb:/usr/sw/adb \
    --name=solaceMonitor solace-pubsub-standard

    For your case, if you want to debug your current setup, please share the commands you have used when setting up the HA. Otherwise, if you want to try the commands I shared, then just make sure the docker container names are not the same as your current ones.
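Since the six redundancy_group_node_* settings are identical in all three docker run commands above, one way to avoid copy-paste mistakes is to generate them from a single place. This is only a sketch: `group_env_flags` is a hypothetical helper, and the addresses in the usage note are placeholders.

```shell
# Sketch: print the redundancy-group flags that are the same on all three nodes.
# group_env_flags is a hypothetical helper; the variable names are the ones
# used in the docker run commands in this thread.
# The --env=key=value form keeps each flag in a single shell word.
group_env_flags() {
  local primary=$1 backup=$2 monitor=$3
  printf '%s\n' \
    "--env=redundancy_group_node_primary_nodetype=message_routing" \
    "--env=redundancy_group_node_primary_connectvia=${primary}" \
    "--env=redundancy_group_node_backup_nodetype=message_routing" \
    "--env=redundancy_group_node_backup_connectvia=${backup}" \
    "--env=redundancy_group_node_monitor_nodetype=monitoring" \
    "--env=redundancy_group_node_monitor_connectvia=${monitor}"
}
```

For example, `group_env_flags 10.0.0.4 10.0.0.5 10.0.0.6` prints six flags that can be pasted into each node's docker run command in place of the hand-written lines. The node-specific settings (nodetype, routername, redundancy_activestandbyrole, redundancy_matelink_connectvia) still have to be set per node.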

  • Susi Dman
    Susi Dman Member Posts: 10

    Hi Arih
    Thanks for the quick response. I have 3 Red Hat VMs. (They come from Azure, but let's ignore where they come from and treat them as just 3 Red Hat VMs to keep it simple.) And yes, each is running its own Docker instance of Solace PubSub+.
    I will try to redo the setup using the docker commands you provided and get back to you.
    Another question: I assume I should create an external LB to balance the load between the Solace nodes. Could you give some input on this too? Should it cover only port 55555 or other ports as well, and what if I want to use TLS/SSL encryption?

  • arih
    arih Member, Employee Posts: 125 Solace Employee

    Hi Susi,
    For the LB setup, you can refer to this page for the default ports that Solace PubSub+ uses. For the encrypted SMF port, the default is 55443. To enable it, you need to set up the server certificate and then enable the secured port for the service that you want. You can start from this page for that.
    One more detail concerns the load balancer health check; please refer to this page for that. Note that these are HTTP-based checks, and you should refer to the Redundancy Configured column.
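As a rough illustration of what an HTTP-based health check involves, the sketch below probes one broker and maps the HTTP status to a state. The port (5550), the path (/health-check/guaranteed-active) and the meaning of the 200 and 503 responses are assumptions based on the Solace health-check documentation, so please verify them against the docs for your broker version. `health_state` and `probe_broker` are made-up helper names.

```shell
# Sketch: map the HTTP status of a broker health check to a state.
# Assumption: 200 means this broker is the active one, 503 means it is not.
health_state() {
  case "$1" in
    200) echo "active" ;;
    503) echo "not-active" ;;
    *)   echo "unknown" ;;
  esac
}

# probe_broker is a hypothetical wrapper around curl.
# Assumption: the health check listens on port 5550 at this path.
probe_broker() {
  local host=$1 code
  # -m 5 caps the request at five seconds; if the connection fails,
  # curl exits non-zero and we fall back to the 000 status.
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' \
    "http://${host}:5550/health-check/guaranteed-active") || code=000
  health_state "$code"
}
```

Calling `probe_broker` with the primary and backup addresses should, under these assumptions, report one node as active at any time. A load balancer health probe would make the same kind of request and route traffic only to the node that answers 200.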

  • Susi Dman
    Susi Dman Member Posts: 10

    Hi Arih
    I was able to start the primary and backup nodes with the commands you gave. However, the monitoring node did not start successfully; when I run the command it gives me the following error.
    INFO repairDatabase.py: processing database (currDbPath: /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000, nextDbPath: /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000085)
    INFO Processing baseline /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000/dbBaseline
    WARN Baseline /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000/dbBaseline: does not exist
    WARN Database /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000: backing up database to jail/configs/db.corrupt.db.00000000.1574914975
    Failed to repair database, exiting (rc = 1)

    Any idea about this? Do we need a monitoring node, or can we get by with only the primary and backup nodes?
    Thanks
    Susi

  • arih
    arih Member, Employee Posts: 125 Solace Employee

    Hi Susi,
    The monitoring node is mandatory. As a quick fix, I'd try creating a new node and running the same command.

  • Susi Dman
    Susi Dman Member Posts: 10

    One more question on the same topic: what would be the best way to see the Solace logs from the host OS for debugging purposes? Does Solace write its logs to the docker run command output?

  • arih
    arih Member, Employee Posts: 125 Solace Employee

    Hi Susi,
    You can check the Docker container logs, but for the complete logs you can go inside the container, where you have access to the complete set of the broker's log files.
    $ docker exec -it <container-name> /bin/bash
    $ cd logs

    Or, you can stream the broker's logs to your existing syslog server. More on that here.

  • Susi Dman
    Susi Dman Member Posts: 10

    Hi Arih
    Thanks for getting back to me on the logs. Did you have a chance to look at the Solace monitoring node startup issue?
    Thanks
    Susi

  • arih
    arih Member, Employee Posts: 125 Solace Employee

    Hi Susi,
    For your monitoring node, it looks like the storage of that node was gone or corrupted. Since this is a new node, it would be easier to just recreate the node and rerun the monitor node setup commands :)

  • Susi Dman
    Susi Dman Member Posts: 10

    Hi Arih
    Thanks for the support. I was able to set up the monitor node after cleaning the file system. Could you please point me to a document on how to check the cluster status? The current status is:
    Monitoring node
      default VPN: Status Down, Replication Off, DMR Off

    Backup node
      config-sync: Status Down, Replication Off, DMR Off
      default: Status Down, Replication Off, DMR On

    Primary node
      config-sync: Status Up, Replication Off, DMR Off
      default: Status Up, Replication Off, DMR On

    Does the above status indicate a healthy cluster?

  • arih
    arih Member, Employee Posts: 125 Solace Employee

    You can run "show redundancy" from the CLI on either the primary or the backup node. The output tells you whether the redundancy status is up or down.

  • hemaprasad
    hemaprasad Member Posts: 2

    Hi arih,
    I'm also getting the same issue on the monitoring node. I deleted the file system as well, but I still get the same error. Please help with this issue.
    Failed to render config files, exiting (rc = 1)
    Host Boot ID: 88a01798-b61d-4922-a213-fbad2e032039
    Starting VMR Docker Container: Wed Jan 15 06:11:06 UTC 2020
    SolOS Version: soltr_9.3.1.5
    INFO repairDatabase.py: processing database (currDbPath: /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000, nextDbPath: /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000001)
    INFO Processing baseline /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000/dbBaseline
    WARN Baseline /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000/dbBaseline: does not exist
    WARN Database /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000: backing up database to jail/configs/db.corrupt.db.00000000.1579068729
    Failed to repair database, exiting (rc = 1)
    Command I'm using:
    docker run -d -p 8080:8080 -p 55555:55555 -p 55443:55443 -p 55556:55556 -p 55003:55003 \
    --network=host \
    --uts=host \
    --shm-size=2g \
    --ulimit core=-1 \
    --ulimit memlock=-1 \
    --ulimit nofile=2448:42192 \
    --restart=always \
    --detach=true \
    --memory-reservation=0 \
    --env 'username_admin_globalaccesslevel=admin' \
    --env 'username_admin_password=admin' \
    --env 'nodetype= monitoring' \
    --env 'routername= monitor' \
    --env 'redundancy_activestandbyrole= monitor' \
    --env 'redundancy_group_password=password' \
    --env 'redundancy_enable=yes' \
    --env 'redundancy_group_node_primary_nodetype=message_routing' \
    --env 'redundancy_group_node_primary_connectvia=[primaryip]' \
    --env 'redundancy_group_node_backup_nodetype=message_routing' \
    --env 'redundancy_group_node_backup_connectvia=[backupip]' \
    --env 'redundancy_group_node_monitor_nodetype=monitoring' \
    --env 'redundancy_group_node_monitor_connectvia=[monitoring ip]' \
    --env 'configsync_enable=yes' \
    -v /opt/vmr/internalSpool:/usr/sw/internalSpool:rw \
    -v /opt/vmr/diags:/var/lib/solace/diags:rw \
    -v /opt/vmr/jail:/usr/sw/jail:rw \
    -v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb:rw \
    -v /opt/vmr/var:/usr/sw/var:rw \
    -v /opt/vmr/adb:/usr/sw/adb:rw \
    --name=solaceMonitor \
    f327d7a752a6

  • hemaprasad
    hemaprasad Member Posts: 2

    Hi All,
    Issue resolved: there was a space in the nodetype and routername values.
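
Since a stray space after the "=" was the cause here, a small check like the one below can catch it before running docker. It is a minimal sketch: `find_padded_env` is a hypothetical helper that takes the key=value pairs exactly as they would be passed to --env.

```shell
# Sketch: flag key=value pairs whose value starts with whitespace,
# e.g. 'nodetype= monitoring'. find_padded_env is a hypothetical helper.
# Returns 0 if every pair is clean, 1 if at least one is padded.
find_padded_env() {
  local pair bad=0
  for pair in "$@"; do
    # ${pair#*=} strips everything up to and including the first '='.
    case "${pair#*=}" in
      [[:space:]]*)
        printf 'leading whitespace in value: %s\n' "$pair"
        bad=1
        ;;
    esac
  done
  return "$bad"
}
```

For example, `find_padded_env 'nodetype= monitoring' 'routername=monitor'` reports the first pair and leaves the second alone.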