HA redundancy - Monitoring node issue

I followed the steps documented in the HA redundancy section; both the primary and backup nodes are online, and I can see the redundancy working.
The monitoring node, however, is offline, and show redundancy group on the monitoring node shows all nodes as offline.
Configuring High-availability (HA) Redundancy Groups
I have tried multiple times and the behaviour is consistent; I'm not sure what I am missing.
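For reference, the monitoring-node part of the group config was done roughly as below, paraphrasing from the doc (the exact prompts and values here are from memory, so please treat this as a sketch rather than my literal config):

solace> enable
solace# configure
solace(configure)# redundancy
solace(configure/redundancy)# group
solace(configure/redundancy/group)# node <monitor-router-name>
solace(configure/redundancy/group/node)# node-type monitoring
solace(configure/redundancy/group/node)# connect-via <monitor-node-ip>

with equivalent node entries for the primary and backup nodes configured on each broker.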
Thanks
Madhu

Hi @madhu
It'd be helpful to share more details, like what your setup looks like. Is it Kubernetes-based, or are you running multiple VMs or EC2 instances?
And a few screenshots or log snippets would be great too.

Thanks Arih for the quick reply.
It's 3 machines, all EC2 instances:
Primary (screenshot)

Backup (screenshot)

Monitoring Node (screenshot)

Primary node (screenshot)

Backup node (screenshot)

Debug on monitoring node (screenshot)

Thanks
Madhu

That's awesome!
Can you also share the content of command.log from the monitor node?
Also, did you change the hostname or IP address of the monitoring node at some point after setup?

No, it's a new setup; no changes to IP or hostname.

Versions I have tried: 10.2.1.51 and 10.3.0.32, using the AMIs from the community standard instances.

aws-marketplace/solace-pubsub-standard-10.3.0.32-amzn2-15.2.0-ac3bbfe4-a7d2-4591-bbc5-f43908c43764

Thanks
Madhu

Noted. Can you show us the output of this command from the monitor node as well as from one of the primary/backup nodes?
solace> show ip vrf management

Monitoring node (screenshot)

Primary node (screenshot)

Hmm, that looks correct.
Can we go back to command.log and show the full content from the beginning?

Command log (screenshot)

Thanks
Madhu

That looks good as well… I'm running out of ideas :slight_smile:
From the monitor node's Linux shell, are you able to ping 10.0.0.126 and 10.0.0.128?

Yes, they are reachable. Are there any specific ports needed other than those in the documentation…? Or is this "offline" status for the monitor node some kind of bug?

Thanks
Madhu

I observe the below error message in the monitoring node debug log. I'm not exactly sure what it means; I just performed the steps provided in the documentation, with no customization on any node.

2023-04-26T05:52:56.174+00:00 <local0.err> ip-10-0-0-126 appuser[387]: /usr/sw            ConsulFSM.cpp:860           (REDUNDANCY  - 0x00000000) ConsulProxyThread(9)@controlplane(10)     ERROR  Could not determine self node configuration
2023-04-26T05:52:57.175+00:00 <local0.err> ip-10-0-0-126 appuser[387]: /usr/sw            ConsulFSM.cpp:860           (REDUNDANCY  - 0x00000000) ConsulProxyThread(9)@controlplane(10)     ERROR  Could not determine self node configuration
2023-04-26T05:52:58.176+00:00 <local0.err> ip-10-0-0-126 appuser[387]: /usr/sw            ConsulFSM.cpp:860           (REDUNDANCY  - 0x00000000) ConsulProxyThread(9)@controlplane(10)     ERROR  Could not determine self node configuration
2023-04-26T05:52:59.177+00:00 <local0.err> ip-10-0-0-126 appuser[387]: /usr/sw            ConsulFSM.cpp:860           (REDUNDANCY  - 0x00000000) ConsulProxyThread(9)@controlplane(10)     ERROR  Could not determine self node configuration

Thanks
Madhu

I'm not sure either. You tried two different versions and they both had the same issue? Are you sure all the firewall rules are correct…? TCP & UDP?
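If it helps, a quick way to sanity-check the redundancy ports from the monitor node's Linux shell is something like the below (I'm assuming the documented defaults of 8741 for the HA mate-link and 8300-8302 for the group communication; adjust if you've customized them, and note that nc's UDP check is only a rough indicator):

nc -zv 10.0.0.126 8741
nc -zv 10.0.0.126 8300
nc -zv 10.0.0.128 8741
nc -zvu 10.0.0.128 8301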

Hey @madhu,
Can you check one more thing by issuing the below command on all nodes:
solace> show router name

Wow Arih, that did the trick.
My mistake; for some reason I missed updating the router-name on the monitoring node, and I did that consistently in all my attempts :(.
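For anyone else who lands here: the fix was simply to set the router name on the monitoring node back to the node name used in the redundancy group configuration. If I recall the CLI correctly it is roughly the below, but check the router-name section of the docs for the exact command and any prerequisites:

solace> enable
solace# configure
solace(configure)# router-name <node-name-used-in-the-redundancy-group>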

Thanks
Madhu

Hahaha, no worries, that’s muscle memory I guess :wink:
Good to hear it got solved!

As an added note, there are options to use Helm charts, Docker Compose, CloudFormation, etc. that might help automate these steps. Or even better, just use solace.com/cloud if you need a broker that just runs :wink:

@madhu glad you got it resolved!!
What was it before? And what did you have to change it to? I'm surprised there wasn't an easier error/status notification somewhere in the CLI showing a mismatch or something.

I think I found the pattern: it seems that when we change a node into a monitoring node, it overwrites the node-name.

@Aaron, there is no error on the monitoring node. On a messaging node we do get an error if the node-name doesn't match.
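So the quick sanity check after converting a node to monitoring seems to be to run both of these on every node and confirm the router names line up with the node names configured in the group:

solace> show router name
solace> show redundancy group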