HA redundancy - Monitoring node issue

madhu Member Posts: 13

I followed the steps documented in the HA redundancy section. Both the primary and backup nodes are online and I can see redundancy working between them.

The monitoring node, however, is offline, and show redundancy group on the monitoring node shows all nodes as offline.


I have tried multiple times and the behaviour is consistent; I'm not sure what I am missing.
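
For reference, this is the check I'm running on the monitoring node; the output below is a rough sketch from memory, not an exact capture (column layout may differ):

    solace> show redundancy group

    Node Name      Node Type          Address       Status
    -------------  -----------------  ------------  -------
    <primary>      Message-Routing    <ip>          Offline
    <backup>       Message-Routing    <ip>          Offline
    <monitor>      Monitoring         <ip>          Offline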

Thanks

Madhu

Comments

  • arih Member, Employee Posts: 125 Solace Employee

    Hi @madhu

    It'd be helpful to share more details, like what your setup looks like. Is it Kubernetes-based, or are you running multiple VMs or EC2 instances?

    And a few screenshots or log snippets would be great too.

  • madhu Member Posts: 13

    Thanks Arih for the quick reply.

    It's 3 EC2 instances: primary, backup, and monitoring.

    Primary [screenshot]

    Backup [screenshot]

    Monitoring Node [screenshot]

    Primary node [screenshot]

    Backup node [screenshot]

    Debug on monitoring node [screenshot]



    Thanks

    Madhu

  • arih Member, Employee Posts: 125 Solace Employee

    that's awesome!

    Can you also share the content of command.log from the monitor node?

    Also, did you change the hostname or IP address of the monitoring node at some point after setup?
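
    In case it helps, on a machine-image install I'd expect command.log under the broker's logs directory; a quick way to pull it (the path is an assumption based on the default install layout):

        [sysadmin@monitor ~]$ tail -n 100 /usr/sw/jail/logs/command.log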

  • madhu Member Posts: 13

    No, it's a new setup; no changes to IP or hostname.



  • madhu Member Posts: 13

    Versions I have tried are 10.2.1.51 and 10.3.0.32, using the standard AMIs from the marketplace:


    aws-marketplace/solace-pubsub-standard-10.3.0.32-amzn2-15.2.0-ac3bbfe4-a7d2-4591-bbc5-f43908c43764


    Thanks

    Madhu

  • arih Member, Employee Posts: 125 Solace Employee

    Noted. Can you show us the output of this command from the monitor node as well as one of the primary/backup nodes?

    solace> show ip vrf management
    


  • madhu Member Posts: 13

    Monitoring node: [screenshot]

    Primary node: [screenshot]



  • arih Member, Employee Posts: 125 Solace Employee

    hmm, that looks correct.

    Can we go back to command.log and look at the full content from the beginning?

  • madhu Member Posts: 13

    Command log: [screenshot]



    Thanks

    Madhu

  • arih Member, Employee Posts: 125 Solace Employee

    that looks good as well... I'm running out of ideas :)

    From the monitor node's Linux shell, are you able to ping 10.0.0.126 and 10.0.0.128?

  • madhu Member Posts: 13

    Yes, they are reachable. Are there any specific ports needed beyond what the documentation provides? Or is this offline status for the monitor node some kind of bug?


    Thanks

    Madhu

  • madhu Member Posts: 13

    I observe the error message below in the monitoring node's debug log. I'm not exactly sure what it means; I just performed the steps provided in the documentation, with no customization on any node.


    2023-04-26T05:52:56.174+00:00 <local0.err> ip-10-0-0-126 appuser[387]: /usr/sw ConsulFSM.cpp:860 (REDUNDANCY - 0x00000000) ConsulProxyThread(9)@controlplane(10) ERROR Could not determine self node configuration
    2023-04-26T05:52:57.175+00:00 <local0.err> ip-10-0-0-126 appuser[387]: /usr/sw ConsulFSM.cpp:860 (REDUNDANCY - 0x00000000) ConsulProxyThread(9)@controlplane(10) ERROR Could not determine self node configuration
    2023-04-26T05:52:58.176+00:00 <local0.err> ip-10-0-0-126 appuser[387]: /usr/sw ConsulFSM.cpp:860 (REDUNDANCY - 0x00000000) ConsulProxyThread(9)@controlplane(10) ERROR Could not determine self node configuration
    2023-04-26T05:52:59.177+00:00 <local0.err> ip-10-0-0-126 appuser[387]: /usr/sw ConsulFSM.cpp:860 (REDUNDANCY - 0x00000000) ConsulProxyThread(9)@controlplane(10) ERROR Could not determine self node configuration



    Thanks

    Madhu

  • Aaron Member, Administrator, Moderator, Employee Posts: 634 admin

    I'm not sure either. You tried two different versions and they both had the same issue? Are you sure all the firewall rules are correct? TCP and UDP?
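
    If you want to rule the firewall out from the monitor node's shell, something like this could help; the port list is an assumption based on the default HA/redundancy ports in the docs, so adjust to whatever your security groups actually allow:

        # TCP reachability to a messaging node (substitute your mate's IP)
        for p in 8741 8300 8301 8302; do nc -zv 10.0.0.128 $p; done
        # 8301/8302 are also used over UDP
        for p in 8301 8302; do nc -zuv 10.0.0.128 $p; done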

  • arih Member, Employee Posts: 125 Solace Employee

    hey @madhu ,

    Can you check one more thing by issuing the command below on all nodes? The name each node reports should match the node names you configured in the redundancy group.

    solace> show router name

  • madhu Member Posts: 13

    Wow arih, that did the trick.

    My mistake: for some reason I missed updating the router-name on the monitoring node, and I managed to do that consistently in all my attempts :(.


    Thanks

    Madhu

  • arih Member, Employee Posts: 125 Solace Employee

    Hahaha, no worries, that's muscle memory I guess ;)

    Good to hear it got solved!


    As an added note, there are options to use Helm charts, Docker Compose, CloudFormation, etc. that can help automate these steps. Or even better, just use solace.com/cloud if you need a broker that just runs ;)

  • Aaron Member, Administrator, Moderator, Employee Posts: 634 admin

    @madhu glad you got it resolved!! 🙌🏼

    What was it before? And what did you have to change it to? I'm surprised there wasn't an easier error/status notification somewhere in the CLI showing a mismatch or something.

  • madhu Member Posts: 13

    I think I found the pattern: it seems that when we change the node to a monitoring node, it overwrites the node-name.

  • madhu Member Posts: 13

    @Aaron, there's no error on the monitoring node. On a messaging node, we get an error if the node-name doesn't match.

  • Aaron Member, Administrator, Moderator, Employee Posts: 634 admin

    Looking at your CLI output, I can see you setting the router-name and then reloading the default config, which essentially replaces all configuration with whatever is stored internally as the default. So yeah, it would have overwritten the router name that you set.
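
    So it's really just the ordering: reload first, then assert the router-name. A rough sketch of the corrected sequence (the router-name value is a placeholder; use the exact commands from the HA setup guide you followed):

        solace# reload default-config
        solace# configure
        solace(configure)# router-name <monitoring-router-name>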

    I've raised a documentation enhancement for the page you cited in your first post.