How to create Solace HA cluster Manually

Susi Dman · November 2019

Hi Team,
I would like to create a Solace software cluster manually on my VMs. My VMs are on Azure and they run Red Hat. (Note that I'm aware there is an Azure template to achieve this easily, but for some reasons I can't use this automated way.)

As per some documents I came across, I'm aware we need 3 nodes: solace1 (active-primary), solace2 (active-backup) and solace3 (monitoring node). I was able to create standalone single-node Solace installations by following https://docs.solace.com/Solace-SW-Broker-Set-Up/Docker-Containers/Set-Up-Docker-Container-CentOS-Azure.htm

Then I tried to follow https://docs.solace.com/Configuring-and-Managing/Configuring-HA-Groups.htm to configure the redundancy group. While configuring the last part I'm getting this error:

(configure/redundancy)# no shutdown
ERROR: Invalid redundancy group configuration, could not determine mate router-name
Command Failed

Another question: should we configure a load balancer for this? I could not find any docs on manual configuration of a load balancer.

I'd appreciate it if someone could guide me on how to create an HA cluster manually, given that we have 3 nodes.

Thank you
Susi

arih · November 2019

Hi Susi,
Just to confirm, you now have 3 CentOS VMs, each running its own Docker instance of Solace PubSub+, right? And the error you showed is still with the primary broker configuration?
I would also recommend looking at the option of creating container instances directly from Azure, and then only playing around with the Docker environment variables. If you're interested, I can give it a try and share the steps on Monday.

arih · November 2019

My bad, I just tried to set this up using Azure Container Instances, but it was not successful since they don't support setting shm yet. Instead, we can pass Docker environment variables in the docker run step when creating the broker on your CentOS VMs to set up HA.

For example, on the primary node, use this docker run command instead:
sudo docker run \
--network=host \
--uts=host \
--shm-size=2g \
--ulimit core=-1 \
--ulimit memlock=-1 \
--ulimit nofile=2448:42192 \
--restart=always \
--detach=true \
--memory-swap=-1 \
--memory-reservation=0 \
--env 'username_admin_globalaccesslevel=admin' \
--env 'username_admin_password=admin' \
--env 'nodetype=message_routing' \
--env 'routername=primary' \
--env 'redundancy_matelink_connectvia=[backup node IP/hostname]' \
--env 'redundancy_activestandbyrole=primary' \
--env 'redundancy_group_password=password' \
--env 'redundancy_enable=yes' \
--env 'redundancy_group_node_primary_nodetype=message_routing' \
--env 'redundancy_group_node_primary_connectvia=[primary node IP/hostname]' \
--env 'redundancy_group_node_backup_nodetype=message_routing' \
--env 'redundancy_group_node_backup_connectvia=[backup node IP/hostname]' \
--env 'redundancy_group_node_monitor_nodetype=monitoring' \
--env 'redundancy_group_node_monitor_connectvia=[monitor node IP/hostname]' \
--env 'configsync_enable=yes' \
-v /opt/vmr/internalSpool:/usr/sw/internalSpool \
-v /opt/vmr/diags:/var/lib/solace/diags \
-v /opt/vmr/jail:/usr/sw/jail \
-v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb \
-v /opt/vmr/var:/usr/sw/var \
-v /opt/vmr/adb:/usr/sw/adb \
--name=solacePrimary solace-pubsub-standard

For all the *_connectvia variables, you can use a hostname, public IP, or internal IP. The important thing is that each node must be able to resolve the other nodes via that hostname/IP.
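
A quick way to check this before starting the containers is to confirm that each name or address resolves on every VM. The following is only a minimal sketch and not part of the original steps: the check_resolves helper and the sample hosts are hypothetical, and it assumes a glibc-based Linux host (such as Red Hat or CentOS) where getent is available.

```shell
#!/bin/sh
# Hypothetical helper: report whether a hostname or IP resolves on this VM.
# `getent ahosts` goes through the system resolver (/etc/hosts, DNS).
check_resolves() {
    if getent ahosts "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "unresolved: $1"
        return 1
    fi
}

# Sample values only; replace them with your three connectvia values.
for host in 127.0.0.1 localhost; do
    check_resolves "$host" || true
done
```

Resolution alone does not prove that the redundancy ports are reachable between the VMs, so firewall rules still need to be checked separately.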

Now, for the backup node, similarly use the following command:
sudo docker run \
--network=host \
--uts=host \
--shm-size=2g \
--ulimit core=-1 \
--ulimit memlock=-1 \
--ulimit nofile=2448:42192 \
--restart=always \
--detach=true \
--memory-swap=-1 \
--memory-reservation=0 \
--env 'username_admin_globalaccesslevel=admin' \
--env 'username_admin_password=admin' \
--env 'nodetype=message_routing' \
--env 'routername=backup' \
--env 'redundancy_matelink_connectvia=[primary node IP/hostname]' \
--env 'redundancy_activestandbyrole=backup' \
--env 'redundancy_group_password=password' \
--env 'redundancy_enable=yes' \
--env 'redundancy_group_node_primary_nodetype=message_routing' \
--env 'redundancy_group_node_primary_connectvia=[primary node IP/hostname]' \
--env 'redundancy_group_node_backup_nodetype=message_routing' \
--env 'redundancy_group_node_backup_connectvia=[backup node IP/hostname]' \
--env 'redundancy_group_node_monitor_nodetype=monitoring' \
--env 'redundancy_group_node_monitor_connectvia=[monitor node IP/hostname]' \
--env 'configsync_enable=yes' \
-v /opt/vmr/internalSpool:/usr/sw/internalSpool \
-v /opt/vmr/diags:/var/lib/solace/diags \
-v /opt/vmr/jail:/usr/sw/jail \
-v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb \
-v /opt/vmr/var:/usr/sw/var \
-v /opt/vmr/adb:/usr/sw/adb \
--name=solaceBackup solace-pubsub-standard

And, lastly for the monitor node, use this command:

sudo docker run \
--network=host \
--uts=host \
--shm-size=2g \
--ulimit core=-1 \
--ulimit memlock=-1 \
--ulimit nofile=2448:42192 \
--restart=always \
--detach=true \
--memory-swap=-1 \
--memory-reservation=0 \
--env 'username_admin_globalaccesslevel=admin' \
--env 'username_admin_password=admin' \
--env 'nodetype=monitoring' \
--env 'routername=monitor' \
--env 'redundancy_activestandbyrole=monitor' \
--env 'redundancy_group_password=password' \
--env 'redundancy_enable=yes' \
--env 'redundancy_group_node_primary_nodetype=message_routing' \
--env 'redundancy_group_node_primary_connectvia=[primary node IP/hostname]' \
--env 'redundancy_group_node_backup_nodetype=message_routing' \
--env 'redundancy_group_node_backup_connectvia=[backup node IP/hostname]' \
--env 'redundancy_group_node_monitor_nodetype=monitoring' \
--env 'redundancy_group_node_monitor_connectvia=[monitor node IP/hostname]' \
--env 'configsync_enable=yes' \
-v /opt/vmr/internalSpool:/usr/sw/internalSpool \
-v /opt/vmr/diags:/var/lib/solace/diags \
-v /opt/vmr/jail:/usr/sw/jail \
-v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb \
-v /opt/vmr/var:/usr/sw/var \
-v /opt/vmr/adb:/usr/sw/adb \
--name=solaceMonitor solace-pubsub-standard

For your case, if you want to debug your current setup, please share the commands you used when setting up HA. Otherwise, if you want to try the commands I shared, just make sure the Docker container names are not the same as your current ones.

Susi Dman · November 2019

Hi Arih,
Thanks for the quick response. I have 3 Red Hat VMs. (They are from Azure, but let's forget where they come from and treat them as just 3 Red Hat VMs to keep it simple.) And yes, each is running its own Docker instance of Solace PubSub+.
I will try to redo the setup using the docker commands you provided and get back to you.
Another question: I assume I should create an external LB to balance the load between the Solace nodes. Could you give some input on this too? Should it be only for port 55555 or for other ports as well, and what if I want to use TLS/SSL encryption?

arih · November 2019

Hi Susi,
For the LB setup, you can refer to this page for the default ports that Solace PubSub+ uses. For the encrypted SMF port, the default is 55443. To enable it, you basically need to set up the server certificate and then enable the secured port for the service that you want. You can start from this page for that.
For more detail on the load balancer health check, please refer to this page. Note that these are HTTP-based checks and you should refer to the Redundancy Configured column.
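
To illustrate what an HTTP-based health check looks like from the load balancer's side, here is a rough sketch. Everything specific in it is an assumption that does not come from this thread: port 5550 and the /health-check/guaranteed-active path are my recollection of the Solace documentation, as is the convention that 200 means the node is active and 503 means it is not. The classify_health and probe_broker helpers are hypothetical names. Please verify the details against the health-check page before relying on them.

```shell
#!/bin/sh
# Hypothetical helper: map an HTTP status code to a node state.
# Assumption: 200 = active, 503 = not active; anything else is unknown.
classify_health() {
    case "$1" in
        200) echo "active" ;;
        503) echo "not-active" ;;
        *)   echo "unknown" ;;
    esac
}

# Hypothetical helper: probe one broker host and print its state.
# Assumption: health checks are served on port 5550 at this path.
probe_broker() {
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' \
        "http://$1:5550/health-check/guaranteed-active" || true)
    classify_health "$code"
}

# Usage (run from a machine that can reach the brokers):
#   probe_broker primary-host
#   probe_broker backup-host
```

With an HA pair you would expect only one of the two message routing nodes to report as active at any time, which is what lets the load balancer send traffic to the right node.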

Susi Dman · November 2019

Hi Arih,
I was able to start the primary and backup nodes with the commands you gave. However, the monitoring node did not start successfully; when I run the command it gives me the following error:
INFO repairDatabase.py: processing database (currDbPath: /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000, nextDbPath: /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000085)
INFO Processing baseline /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000/dbBaseline
WARN Baseline /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000/dbBaseline: does not exist
WARN Database /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000: backing up database to jail/configs/db.corrupt.db.00000000.1574914975
Failed to repair database, exiting (rc = 1)

Any idea on this? Do we need a monitoring node, or can we get by with only the primary and backup nodes?
Thanks
Susi

arih · November 2019

Hi Susi,
The monitoring node is mandatory. As a quick fix, I'd try creating a new node and running the same command.

Susi Dman · November 2019

One more question on the same topic: what would be the best way to see the Solace logs from the host OS for debugging purposes? Does Solace write its logs to the docker run command output?

arih · December 2019

Hi Susi,
You can check the Docker container logs, but for the complete logs you can go inside the container, where you have access to the complete set of the broker's log files.
$ docker exec -it <container name> /bin/bash
$ cd logs

Or, you can stream the broker's logs to your existing syslog server. More on that here.

Susi Dman · December 2019

Hi Arih,
Thanks for getting back on the logs. Did you have a chance to look at the Solace monitoring node startup issue?
Thanks
Susi

arih · December 2019

Hi Susi,
For your monitoring node, it looks like the storage of that node was gone or corrupted. Since this is a new node, it would be easier just to recreate the node and rerun the monitor node setup commands.

Susi Dman · December 2019

Hi Arih,
Thanks for the support, I was able to set up the monitor node after cleaning the file system. Could you please point me to some documentation on how to check the cluster status? Currently the status is:

Monitoring node
default VPN: Status: Down, Replication: Off, DMR: Off

Backup node
config-sync: Status: Down, Replication: Off, DMR: Off
default: Status: Down, Replication: Off, DMR: On

Primary node
config-sync: Status: Up, Replication: Off, DMR: Off
default: Status: Up, Replication: Off, DMR: On

Does the above status indicate a healthy cluster?

arih · December 2019

You can run "show redundancy" from the CLI on either the primary or the backup node. The output tells you whether the redundancy status is up or down.

hemaprasad · January 2020

Hi arih,
I'm also getting the same issue for the monitoring node. I deleted the file system as well, but I still get the same error. Please help with this issue.
Failed to render config files, exiting (rc = 1)
Host Boot ID: 88a01798-b61d-4922-a213-fbad2e032039
Starting VMR Docker Container: Wed Jan 15 06:11:06 UTC 2020
SolOS Version: soltr_9.3.1.5
INFO repairDatabase.py: processing database (currDbPath: /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000, nextDbPath: /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000001)
INFO Processing baseline /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000/dbBaseline
WARN Baseline /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000/dbBaseline: does not exist
WARN Database /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000: backing up database to jail/configs/db.corrupt.db.00000000.1579068729
Failed to repair database, exiting (rc = 1)
Command I'm using:
docker run -d -p 8080:8080 -p 55555:55555 -p 55443:55443 -p 55556:55556 -p 55003:55003 \
--network=host \
--uts=host \
--shm-size=2g \
--ulimit core=-1 \
--ulimit memlock=-1 \
--ulimit nofile=2448:42192 \
--restart=always \
--detach=true \
--memory-reservation=0 \
--env 'username_admin_globalaccesslevel=admin' \
--env 'username_admin_password=admin' \
--env 'nodetype= monitoring' \
--env 'routername= monitor' \
--env 'redundancy_activestandbyrole= monitor' \
--env 'redundancy_group_password=password' \
--env 'redundancy_enable=yes' \
--env 'redundancy_group_node_primary_nodetype=message_routing' \
--env 'redundancy_group_node_primary_connectvia=[primaryip]' \
--env 'redundancy_group_node_backup_nodetype=message_routing' \
--env 'redundancy_group_node_backup_connectvia=[backupip]' \
--env 'redundancy_group_node_monitor_nodetype=monitoring' \
--env 'redundancy_group_node_monitor_connectvia=[monitoring ip]' \
--env 'configsync_enable=yes' \
-v /opt/vmr/internalSpool:/usr/sw/internalSpool:rw \
-v /opt/vmr/diags:/var/lib/solace/diags:rw \
-v /opt/vmr/jail:/usr/sw/jail:rw \
-v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb:rw \
-v /opt/vmr/var:/usr/sw/var:rw \
-v /opt/vmr/adb:/usr/sw/adb:rw \
--name=solaceMonitor \
f327d7a752a6

hemaprasad · January 2020

Hi All,
Issue resolved: there was a space in the nodetype and routername values.
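
For anyone hitting the same startup failure, a stray space after the "=" in an --env setting is easy to miss. Below is a minimal sketch of a check you could run over your settings before calling docker run. The check_env_value helper is hypothetical (it is not a Docker or Solace tool), and it only looks for leading or trailing spaces in the value, not tabs.

```shell
#!/bin/sh
# Hypothetical helper: warn when the value part of a KEY=VALUE setting
# starts or ends with a space, as in 'nodetype= monitoring'.
check_env_value() {
    value=${1#*=}    # strip everything up to and including the first '='
    case "$value" in
        " "*|*" ")
            echo "WARN: whitespace around value in '$1'"
            return 1
            ;;
    esac
    return 0
}

# Sample settings; the first two reproduce the mistake from the command above.
for setting in 'nodetype= monitoring' 'routername= monitor' 'redundancy_enable=yes'; do
    check_env_value "$setting" || true
done
```

Run as is, this prints a warning for the nodetype and routername settings and stays silent for redundancy_enable.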
