How to create a Solace HA cluster manually

Hi Team
I would like to create a Solace software cluster manually on my VMs. My VMs are on Azure and they run Red Hat. (Note that I’m aware there is an Azure template to achieve this easily, but for some reasons I can’t use this automated way.)

As per some documents I came across, I’m aware we need 3 nodes: solace1 (active-primary), solace2 (active-backup) and solace3 (monitoring node). I was able to create standalone single-node Solace installations by following Setting Up Container Images.

Then I tried to follow Configuring High-availability (HA) Redundancy Groups to configure the redundancy group. While configuring the last part I’m getting this error:

(configure/redundancy)# no shutdown
ERROR: Invalid redundancy group configuration, could not determine mate router-name
Command Failed

Another question: should we configure a load balancer for this? I could not find any docs on manual configuration of a load balancer.

I would appreciate it if someone could guide me on how to create the HA cluster manually, given that we have 3 nodes.

Thank you
Susi

Hi Susi,
Just to confirm, so now you have 3 CentOS VMs, each running its own docker version of Solace PubSub+, right? And the error you showed is still with the primary broker configuration?
I would also recommend looking at the option to create container instances directly from Azure, and then only play around with the docker environment variables. If you’re interested, I can give it a try and share the steps on Monday.

My bad, I just tried to set this up using Azure Container Instances, but it was not successful since they don’t support setting shm yet. Instead, we can pass docker environment variables at the docker run step when creating the broker on your CentOS VMs to set up HA.

For example, on the primary node, use this docker run command instead.
sudo docker run \
  --network=host \
  --uts=host \
  --shm-size=2g \
  --ulimit core=-1 \
  --ulimit memlock=-1 \
  --ulimit nofile=2448:42192 \
  --restart=always \
  --detach=true \
  --memory-swap=-1 \
  --memory-reservation=0 \
  --env 'username_admin_globalaccesslevel=admin' \
  --env 'username_admin_password=admin' \
  --env 'nodetype=message_routing' \
  --env 'routername=primary' \
  --env 'redundancy_matelink_connectvia=[backup node IP/hostname]' \
  --env 'redundancy_activestandbyrole=primary' \
  --env 'redundancy_group_password=password' \
  --env 'redundancy_enable=yes' \
  --env 'redundancy_group_node_primary_nodetype=message_routing' \
  --env 'redundancy_group_node_primary_connectvia=[primary node IP/hostname]' \
  --env 'redundancy_group_node_backup_nodetype=message_routing' \
  --env 'redundancy_group_node_backup_connectvia=[backup node IP/hostname]' \
  --env 'redundancy_group_node_monitor_nodetype=monitoring' \
  --env 'redundancy_group_node_monitor_connectvia=[monitor node IP/hostname]' \
  --env 'configsync_enable=yes' \
  -v /opt/vmr/internalSpool:/usr/sw/internalSpool \
  -v /opt/vmr/diags:/var/lib/solace/diags \
  -v /opt/vmr/jail:/usr/sw/jail \
  -v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb \
  -v /opt/vmr/var:/usr/sw/var \
  -v /opt/vmr/adb:/usr/sw/adb \
  --name=solacePrimary solace-pubsub-standard

For all the *connectvia variables, you can use a hostname, public IP, or internal IP. The important thing is that each node must be able to reach the other nodes via that hostname/IP.

Now, for the backup node, similarly use the following command:
sudo docker run \
  --network=host \
  --uts=host \
  --shm-size=2g \
  --ulimit core=-1 \
  --ulimit memlock=-1 \
  --ulimit nofile=2448:42192 \
  --restart=always \
  --detach=true \
  --memory-swap=-1 \
  --memory-reservation=0 \
  --env 'username_admin_globalaccesslevel=admin' \
  --env 'username_admin_password=admin' \
  --env 'nodetype=message_routing' \
  --env 'routername=backup' \
  --env 'redundancy_matelink_connectvia=[primary node IP/hostname]' \
  --env 'redundancy_activestandbyrole=backup' \
  --env 'redundancy_group_password=password' \
  --env 'redundancy_enable=yes' \
  --env 'redundancy_group_node_primary_nodetype=message_routing' \
  --env 'redundancy_group_node_primary_connectvia=[primary node IP/hostname]' \
  --env 'redundancy_group_node_backup_nodetype=message_routing' \
  --env 'redundancy_group_node_backup_connectvia=[backup node IP/hostname]' \
  --env 'redundancy_group_node_monitor_nodetype=monitoring' \
  --env 'redundancy_group_node_monitor_connectvia=[monitor node IP/hostname]' \
  --env 'configsync_enable=yes' \
  -v /opt/vmr/internalSpool:/usr/sw/internalSpool \
  -v /opt/vmr/diags:/var/lib/solace/diags \
  -v /opt/vmr/jail:/usr/sw/jail \
  -v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb \
  -v /opt/vmr/var:/usr/sw/var \
  -v /opt/vmr/adb:/usr/sw/adb \
  --name=solaceBackup solace-pubsub-standard

And lastly, for the monitor node, use this command:

sudo docker run \
  --network=host \
  --uts=host \
  --shm-size=2g \
  --ulimit core=-1 \
  --ulimit memlock=-1 \
  --ulimit nofile=2448:42192 \
  --restart=always \
  --detach=true \
  --memory-swap=-1 \
  --memory-reservation=0 \
  --env 'username_admin_globalaccesslevel=admin' \
  --env 'username_admin_password=admin' \
  --env 'nodetype=monitoring' \
  --env 'routername=monitor' \
  --env 'redundancy_activestandbyrole=monitor' \
  --env 'redundancy_group_password=password' \
  --env 'redundancy_enable=yes' \
  --env 'redundancy_group_node_primary_nodetype=message_routing' \
  --env 'redundancy_group_node_primary_connectvia=[primary node IP/hostname]' \
  --env 'redundancy_group_node_backup_nodetype=message_routing' \
  --env 'redundancy_group_node_backup_connectvia=[backup node IP/hostname]' \
  --env 'redundancy_group_node_monitor_nodetype=monitoring' \
  --env 'redundancy_group_node_monitor_connectvia=[monitor node IP/hostname]' \
  --env 'configsync_enable=yes' \
  -v /opt/vmr/internalSpool:/usr/sw/internalSpool \
  -v /opt/vmr/diags:/var/lib/solace/diags \
  -v /opt/vmr/jail:/usr/sw/jail \
  -v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb \
  -v /opt/vmr/var:/usr/sw/var \
  -v /opt/vmr/adb:/usr/sw/adb \
  --name=solaceMonitor solace-pubsub-standard

For your case, if you want to debug your current setup, please share the commands you have used when setting up the HA. Otherwise, if you want to try the commands I shared, then just make sure the docker container names are not the same as your current ones.
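
The redundancy_group_node_* variables are identical on all three nodes, so one way to avoid copy/paste mistakes is to generate them from the three addresses. The following is only a sketch: the function name group_env_flags and the sample addresses are made up for illustration, and it assumes the addresses contain no spaces.

```shell
# group_env_flags PRIMARY BACKUP MONITOR  (hypothetical helper name)
# Prints the redundancy-group --env flags that are the same on every node,
# one flag per line, using the variable names from the commands in this thread.
group_env_flags() {
  for entry in "primary:message_routing:$1" \
               "backup:message_routing:$2" \
               "monitor:monitoring:$3"; do
    name=${entry%%:*}      # node label: primary, backup or monitor
    rest=${entry#*:}
    type=${rest%%:*}       # message_routing or monitoring
    addr=${rest#*:}        # IP or hostname passed by the caller
    printf '%s\n' "--env redundancy_group_node_${name}_nodetype=${type}"
    printf '%s\n' "--env redundancy_group_node_${name}_connectvia=${addr}"
  done
}
```

It could then be spliced into each docker run command unquoted, for example `sudo docker run ... $(group_env_flags 10.0.0.4 10.0.0.5 10.0.0.6) ...`, so that only the node-specific variables (nodetype, routername, role, mate link) are typed by hand.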

Hi Arih
Thanks for the quick response. I have 3 Red Hat VMs. (They are from Azure, but let’s forget where they come from and consider them just 3 Red Hat VMs to keep it simple.) And yes, each is running its own docker version of Solace PubSub+.
I will try to redo the setup using the docker commands you provided and get back.
Another question: I assume I should create an external LB to balance the load between the Solace nodes. Could you give some input on this too? Should it be only for port 55555 or for other ports as well, and what if I want to use TLS/SSL encryption?

Hi Susi,
For the LB setup, you can refer to this page for the default ports that Solace PubSub+ uses. For the encrypted SMF port, the default is 55443. To enable that, you basically need to set up the server certificate and then enable the secured port for the service that you want. You can start from this page for that.
One more detail is the load balancer health check; please refer to this page for that. Note that these are HTTP-based checks and you should refer to the Redundancy Configured column.
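
To get a feel for what the load balancer would see, the health check can be probed by hand from the shell. This is a rough sketch under assumptions: port 5550 and the /health-check/guaranteed-active path are what I recall for the HTTP health check and should be confirmed against the health-check page for your broker version; the function names and the sample address are made up.

```shell
# probe_status HOST: print the HTTP status code returned by the health check.
# ASSUMPTION: port 5550 and the /health-check/guaranteed-active path; verify
# both against the health-check documentation before relying on them.
probe_status() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' \
    "http://$1:5550/health-check/guaranteed-active"
}

# should_route STATUS: succeed only when the status is 200, which is how an
# HTTP-based LB probe would pick the node to send traffic to. Any other code
# is treated as "do not route here".
should_route() {
  [ "$1" = "200" ]
}
```

For example, `should_route "$(probe_status 10.0.0.4)" && echo "LB would send traffic here"` run against each message routing node should succeed on only one of them at a time if the redundancy group is healthy.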

Hi Arih
I was able to start the primary node and the backup node with the commands you gave. However, the monitoring node did not start successfully; when I give the run command it gives me the following error.
INFO repairDatabase.py: processing database (currDbPath: /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000, nextDbPath: /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000085)
INFO Processing baseline /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000/dbBaseline
WARN Baseline /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000/dbBaseline: does not exist
WARN Database /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000: backing up database to jail/configs/db.corrupt.db.00000000.1574914975
Failed to repair database, exiting (rc = 1)

Any idea on this? Must we have a monitoring node, or can we get by with only the primary and backup nodes?
Thanks
Susi

Hi Susi,
The monitoring node is mandatory. As a quick answer, I’d try to create a new node and run the same command.

One more question on the same topic: what would be the best way to see the Solace logs from the host OS for debugging purposes? Does Solace write its logs to the docker run command output?

Hi Susi,
You can check the docker container logs, but for the complete logs you can go inside the container and access the complete set of the broker’s log files.
$ docker exec -it [container name] /bin/bash
$ cd logs

Or, you can stream the broker’s logs to your existing syslog server. More on that in here.

Hi Arih
Thanks for getting back on the logs. Did you have a chance to look at the Solace monitoring node startup issue?
Thanks
Susi

Hi Susi,
For your monitoring node, it looks like the storage of that node was gone or corrupted. Since this is a new node, it would be easier just to create a new node and rerun the monitor node setup commands.

Hi Arih
Thanks for the support. I was able to set up the monitor node after cleaning the file system. Could you please point me to a document on how to check the cluster status? Currently the status is:
Monitoring node
default VPN Status: Down
Replication: Off
DMR: Off

Backup Node
#config-sync Status: Down
Replication: Off
DMR: Off
default Status: Down
Replication: Off
DMR: On

PrimaryNode
#config-sync Status: Up
Replication: Off
DMR: Off
default Status: Up
Replication: Off
DMR: On

Does the above status show a healthy cluster?

You can run “show redundancy” from the CLI on either the primary or the backup node. The result will tell you whether the redundancy status is up or down.

Hi Arih,
I’m also getting the same issue for the monitoring node. I deleted the file system as well, but I still get the same error. Please help with this issue.
Failed to render config files, exiting (rc = 1)
Host Boot ID: 88a01798-b61d-4922-a213-fbad2e032039
Starting VMR Docker Container: Wed Jan 15 06:11:06 UTC 2020
SolOS Version: soltr_9.3.1.5
INFO repairDatabase.py: processing database (currDbPath: /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000, nextDbPath: /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000001)
INFO Processing baseline /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000/dbBaseline
WARN Baseline /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000/dbBaseline: does not exist
WARN Database /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000: backing up database to jail/configs/db.corrupt.db.00000000.1579068729
Failed to repair database, exiting (rc = 1)
Command I’m using:
docker run -d -p 8080:8080 -p 55555:55555 -p 55443:55443 -p 55556:55556 -p 55003:55003 \
  --network=host \
  --uts=host \
  --shm-size=2g \
  --ulimit core=-1 \
  --ulimit memlock=-1 \
  --ulimit nofile=2448:42192 \
  --restart=always \
  --detach=true \
  --memory-reservation=0 \
  --env 'username_admin_globalaccesslevel=admin' \
  --env 'username_admin_password=admin' \
  --env 'nodetype= monitoring' \
  --env 'routername= monitor' \
  --env 'redundancy_activestandbyrole= monitor' \
  --env 'redundancy_group_password=password' \
  --env 'redundancy_enable=yes' \
  --env 'redundancy_group_node_primary_nodetype=message_routing' \
  --env 'redundancy_group_node_primary_connectvia=[primaryip]' \
  --env 'redundancy_group_node_backup_nodetype=message_routing' \
  --env 'redundancy_group_node_backup_connectvia=[backupip]' \
  --env 'redundancy_group_node_monitor_nodetype=monitoring' \
  --env 'redundancy_group_node_monitor_connectvia=[monitoring ip]' \
  --env 'configsync_enable=yes' \
  -v /opt/vmr/internalSpool:/usr/sw/internalSpool:rw \
  -v /opt/vmr/diags:/var/lib/solace/diags:rw \
  -v /opt/vmr/jail:/usr/sw/jail:rw \
  -v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb:rw \
  -v /opt/vmr/var:/usr/sw/var:rw \
  -v /opt/vmr/adb:/usr/sw/adb:rw \
  --name=solaceMonitor \
  f327d7a752a6

Hi All,
Issue resolved: there was a space after the “=” in the nodetype and routername values.
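
For anyone hitting the same thing, a stray space after the “=” is easy to miss by eye, so here is a small sketch of a check you could run over a saved docker run command before executing it. The function name find_padded_env is made up for illustration; it only looks for a name, an “=”, whitespace, and then a word on the same line.

```shell
# find_padded_env (hypothetical helper): read a docker run command on stdin
# and print every NAME= VALUE assignment where whitespace follows the "=",
# for example "nodetype= monitoring" or "name= solaceMonitor".
# Returns success even when nothing is found, so it is safe under "set -e".
find_padded_env() {
  grep -oE '[A-Za-z_]+=[[:space:]]+[A-Za-z_]+' || true
}
```

Usage could look like `find_padded_env < run-monitor.sh` (the file name is just an example); no output means no padded values were found.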