How to create a Solace HA cluster manually
Hi Team
I would like to create a Solace software cluster manually on my VMs. My VMs are on Azure and run Red Hat. (Note that I'm aware there is an Azure template to achieve this easily, but for some reasons I can't use this automated way.)
As per some documents I came across, I'm aware we need 3 nodes: solace1 (active-primary), solace2 (active-backup), and solace3 (monitoring node). I was able to create standalone single-node Solace installations by following https://docs.solace.com/Solace-SW-Broker-Set-Up/Docker-Containers/Set-Up-Docker-Container-CentOS-Azure.htm
Then I tried to follow https://docs.solace.com/Configuring-and-Managing/Configuring-HA-Groups.htm to configure the redundancy group. While configuring the last part I'm getting this error:
(configure/redundancy)# no shutdown
ERROR: Invalid redundancy group configuration, could not determine mate router-name
Command Failed
Another question: should we configure a load balancer for this? I could not find any docs on manual configuration of a load balancer.
I would appreciate it if someone could guide me on how to create an HA cluster manually, given that we have 3 nodes.
Thank you
Susi
Answers
Hi Susi,
Just to confirm: you now have 3 CentOS VMs, each running its own Docker instance of Solace PubSub+, right? And the error you showed is still with the primary broker configuration?
I would also recommend looking at the option to create container instances directly from Azure, and then only adjusting the Docker environment variables. If you're interested, I can give it a try and share the steps on Monday.
My bad, I just tried to set this up using Azure Container Instances, but it was not successful since they don't support setting shm yet. Instead, we can set Docker environment variables at the docker run step when creating the broker on your CentOS VMs to set up HA.
For example, on the primary node, use this docker run command instead.
sudo docker run \
--network=host \
--uts=host \
--shm-size=2g \
--ulimit core=-1 \
--ulimit memlock=-1 \
--ulimit nofile=2448:42192 \
--restart=always \
--detach=true \
--memory-swap=-1 \
--memory-reservation=0 \
--env 'username_admin_globalaccesslevel=admin' \
--env 'username_admin_password=admin' \
--env 'nodetype=message_routing' \
--env 'routername=primary' \
--env 'redundancy_matelink_connectvia=[backup node IP/hostname]' \
--env 'redundancy_activestandbyrole=primary' \
--env 'redundancy_group_password=password' \
--env 'redundancy_enable=yes' \
--env 'redundancy_group_node_primary_nodetype=message_routing' \
--env 'redundancy_group_node_primary_connectvia=[primary node IP/hostname]' \
--env 'redundancy_group_node_backup_nodetype=message_routing' \
--env 'redundancy_group_node_backup_connectvia=[backup node IP/hostname]' \
--env 'redundancy_group_node_monitor_nodetype=monitoring' \
--env 'redundancy_group_node_monitor_connectvia=[monitor node IP/hostname]' \
--env 'configsync_enable=yes' \
-v /opt/vmr/internalSpool:/usr/sw/internalSpool \
-v /opt/vmr/diags:/var/lib/solace/diags \
-v /opt/vmr/jail:/usr/sw/jail \
-v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb \
-v /opt/vmr/var:/usr/sw/var \
-v /opt/vmr/adb:/usr/sw/adb \
--name=solacePrimary solace-pubsub-standard
For all the *_connectvia variables, you can use a hostname, public IP, or internal IP. The important thing is that each node must be able to resolve the other nodes via that hostname/IP.
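A quick way to verify this on each VM before starting the containers is to check that every name or address you plan to use actually resolves. This is a minimal sketch, assuming getent is available (it is on typical Red Hat and CentOS installs). The function name and the host names in the usage line are hypothetical, and the check covers name resolution only, not port reachability.

```shell
#!/usr/bin/env bash
# check_resolves prints "ok <name>" if the given hostname or IP resolves
# on this machine, and "FAIL <name>" otherwise.
check_resolves() {
  if getent ahosts "$1" >/dev/null 2>&1; then
    echo "ok $1"
  else
    echo "FAIL $1"
  fi
}

# Check every hostname/IP passed on the command line.
for node in "$@"; do
  check_resolves "$node"
done
```

Run it on all three VMs with the same values you put in the *_connectvia variables, for example: bash check_resolves.sh solace-primary solace-backup solace-monitor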
Now, for the backup node, similarly use the following command:
sudo docker run \
--network=host \
--uts=host \
--shm-size=2g \
--ulimit core=-1 \
--ulimit memlock=-1 \
--ulimit nofile=2448:42192 \
--restart=always \
--detach=true \
--memory-swap=-1 \
--memory-reservation=0 \
--env 'username_admin_globalaccesslevel=admin' \
--env 'username_admin_password=admin' \
--env 'nodetype=message_routing' \
--env 'routername=backup' \
--env 'redundancy_matelink_connectvia=[primary node IP/hostname]' \
--env 'redundancy_activestandbyrole=backup' \
--env 'redundancy_group_password=password' \
--env 'redundancy_enable=yes' \
--env 'redundancy_group_node_primary_nodetype=message_routing' \
--env 'redundancy_group_node_primary_connectvia=[primary node IP/hostname]' \
--env 'redundancy_group_node_backup_nodetype=message_routing' \
--env 'redundancy_group_node_backup_connectvia=[backup node IP/hostname]' \
--env 'redundancy_group_node_monitor_nodetype=monitoring' \
--env 'redundancy_group_node_monitor_connectvia=[monitor node IP/hostname]' \
--env 'configsync_enable=yes' \
-v /opt/vmr/internalSpool:/usr/sw/internalSpool \
-v /opt/vmr/diags:/var/lib/solace/diags \
-v /opt/vmr/jail:/usr/sw/jail \
-v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb \
-v /opt/vmr/var:/usr/sw/var \
-v /opt/vmr/adb:/usr/sw/adb \
--name=solaceBackup solace-pubsub-standard
And, lastly, for the monitor node, use this command:
sudo docker run \
--network=host \
--uts=host \
--shm-size=2g \
--ulimit core=-1 \
--ulimit memlock=-1 \
--ulimit nofile=2448:42192 \
--restart=always \
--detach=true \
--memory-swap=-1 \
--memory-reservation=0 \
--env 'username_admin_globalaccesslevel=admin' \
--env 'username_admin_password=admin' \
--env 'nodetype=monitoring' \
--env 'routername=monitor' \
--env 'redundancy_activestandbyrole=monitor' \
--env 'redundancy_group_password=password' \
--env 'redundancy_enable=yes' \
--env 'redundancy_group_node_primary_nodetype=message_routing' \
--env 'redundancy_group_node_primary_connectvia=[primary node IP/hostname]' \
--env 'redundancy_group_node_backup_nodetype=message_routing' \
--env 'redundancy_group_node_backup_connectvia=[backup node IP/hostname]' \
--env 'redundancy_group_node_monitor_nodetype=monitoring' \
--env 'redundancy_group_node_monitor_connectvia=[monitor node IP/hostname]' \
--env 'configsync_enable=yes' \
-v /opt/vmr/internalSpool:/usr/sw/internalSpool \
-v /opt/vmr/diags:/var/lib/solace/diags \
-v /opt/vmr/jail:/usr/sw/jail \
-v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb \
-v /opt/vmr/var:/usr/sw/var \
-v /opt/vmr/adb:/usr/sw/adb \
--name=solaceMonitor solace-pubsub-standard
For your case, if you want to debug your current setup, please share the commands you used when setting up HA. Otherwise, if you want to try the commands I shared, just make sure the Docker container names are not the same as your current ones.
Hi Arih
Thanks for the quick response. I have 3 Red Hat VMs. (They are from Azure, but to keep it simple let's ignore where they come from and treat them as just 3 Red Hat VMs.) And yes, each is running its own Docker instance of Solace PubSub+.
I will try to redo the setup using the docker create commands you provided and get back.
Another question: I assume I should create an external LB to balance the load between the Solace nodes. Could you give some input on this too? Should it be only for port 55555 or for other ports as well, and what if I want to use TLS/SSL encryption?
Hi Susi,
For the LB setup, you can refer to this page for the default ports that Solace PubSub+ uses. For the encrypted SMF port, the default is 55443. To enable that, you basically need to set up the server certificate and then enable the secured port for the service that you want. You can start from this page for that.
For more detail on the load balancer health check, please refer to this page. Note that these are HTTP-based checks, and you should refer to the Redundancy Configured column.
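To make the HTTP-based check concrete, here is a minimal sketch of how a probe could tell the active broker from the others. It assumes the health-check endpoint is served on port 5550 at /health-check/guaranteed-active and answers 200 on the active node and 503 otherwise; please confirm both against the health-check page referenced above. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# interpret_health maps the HTTP status code returned by the health check
# to a short label.
interpret_health() {
  case "$1" in
    200) echo "active" ;;
    503) echo "not-active" ;;
    *)   echo "unknown" ;;
  esac
}

# probe queries one broker. Port 5550 and the path are assumptions;
# adjust them to match your broker's health-check configuration.
probe() {
  local code
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' \
    "http://$1:5550/health-check/guaranteed-active")
  interpret_health "$code"
}

# Probe every host passed on the command line.
for host in "$@"; do
  echo "$host: $(probe "$host")"
done
```

A load balancer configured with the same check would then send traffic only to the node that reports active, which is why the health check matters more than simple round-robin balancing in an HA pair.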
Hi Arih
I was able to start the primary and backup nodes with the commands you gave. However, the monitoring node did not start successfully; when I run the command it gives me the following error:
INFO repairDatabase.py: processing database (currDbPath: /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000, nextDbPath: /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000085)
INFO Processing baseline /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000/dbBaseline
WARN Baseline /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000/dbBaseline: does not exist
WARN Database /usr/sw/var/soltr_9.2.0.14/.dbHistory/db.00000000: backing up database to jail/configs/db.corrupt.db.00000000.1574914975
Failed to repair database, exiting (rc = 1)
Any idea on this? Should we have a monitoring node, or can we survive with only the primary and backup nodes?
Thanks
Susi
Hi Susi,
You can check the Docker container logs, but for the complete logs you can go inside the container and access the complete set of the broker's log files.
$ docker exec -it <container-name> /bin/bash
$ cd logs
Or, you can stream the broker's logs to your existing syslog server. More on that in here.
Hi Arih
Thanks for the support. I was able to set up the monitor node after cleaning the file system. Could you please point me to some documentation on how to check the cluster status? Currently the status is:
Monitoring node
default VPN Status: Down
Replication: Off
DMR: Off
Backup Node
config-sync Status: Down
Replication: Off
DMR: Off
default Status: Down
Replication: Off
DMR: On
Primary Node
config-sync Status: Up
Replication: Off
DMR: Off
default Status: Up
Replication: Off
DMR: On
Does the above status indicate a healthy cluster?
Hi Arih,
I'm also getting the same issue for the monitoring node. I deleted the file system as well, but I still get the same error. Please help with this issue.
Failed to render config files, exiting (rc = 1)
Host Boot ID: 88a01798-b61d-4922-a213-fbad2e032039
Starting VMR Docker Container: Wed Jan 15 06:11:06 UTC 2020
SolOS Version: soltr_9.3.1.5
INFO repairDatabase.py: processing database (currDbPath: /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000, nextDbPath: /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000001)
INFO Processing baseline /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000/dbBaseline
WARN Baseline /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000/dbBaseline: does not exist
WARN Database /usr/sw/var/soltr_9.3.1.5/.dbHistory/db.00000000: backing up database to jail/configs/db.corrupt.db.00000000.1579068729
Failed to repair database, exiting (rc = 1)
Command I'm using:
docker run -d -p 8080:8080 -p 55555:55555 -p 55443:55443 -p 55556:55556 -p 55003:55003 \
--network=host \
--uts=host \
--shm-size=2g \
--ulimit core=-1 \
--ulimit memlock=-1 \
--ulimit nofile=2448:42192 \
--restart=always \
--detach=true \
--memory-reservation=0 \
--env 'username_admin_globalaccesslevel=admin' \
--env 'username_admin_password=admin' \
--env 'nodetype= monitoring' \
--env 'routername= monitor' \
--env 'redundancy_activestandbyrole= monitor' \
--env 'redundancy_group_password=password' \
--env 'redundancy_enable=yes' \
--env 'redundancy_group_node_primary_nodetype=message_routing' \
--env 'redundancy_group_node_primary_connectvia=[primaryip]' \
--env 'redundancy_group_node_backup_nodetype=message_routing' \
--env 'redundancy_group_node_backup_connectvia=[backupip]' \
--env 'redundancy_group_node_monitor_nodetype=monitoring' \
--env 'redundancy_group_node_monitor_connectvia=[monitoring ip]' \
--env 'configsync_enable=yes' \
-v /opt/vmr/internalSpool:/usr/sw/internalSpool:rw \
-v /opt/vmr/diags:/var/lib/solace/diags:rw \
-v /opt/vmr/jail:/usr/sw/jail:rw \
-v /opt/vmr/softAdb:/usr/sw/internalSpool/softAdb:rw \
-v /opt/vmr/var:/usr/sw/var:rw \
-v /opt/vmr/adb:/usr/sw/adb:rw \
--name=solaceMonitor \
f327d7a752a6
Hi All,
Issue resolved: there was a space in the nodetype and routername values.
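A quick way to catch this kind of problem before starting a container is to scan the command for padded assignments. This is a minimal sketch, assuming the docker run command is saved in a file; the function name and the file name in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# find_padded_env reads a docker run command on stdin and prints every
# --env line that has whitespace directly before or after the "=" sign,
# e.g. --env 'nodetype= monitoring'.
find_padded_env() {
  grep -E -- "--env +'?[A-Za-z_]+( +=|= +)" || true
}
```

Usage: find_padded_env < run-monitor.sh prints the offending lines, and prints nothing when every assignment is written as key=value with no spaces around the equals sign.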