Unable to set up SSL-based replication between two HA triplets
Hi,
I have two HA triplets and I am trying to set up SSL-based native Solace Replication (async) between them.
Attempt 1:
I generated a server certificate with the following commands:
openssl req -x509 -newkey rsa:4096 -keyout certs/solace_server.key -out certs/solace_server.crt -days 365
cat certs/solace_server.key certs/solace_server.crt > certs/solace_server.pem
I loaded it onto both triplets (primaries and secondaries) using the following CLI command:
enable configure ssl server-certificate file solace_server.pem
I also generated a client certificate using the following commands:
./keytool -genKey -keyalg RSA -alias client -keystore certs/client.keystore -storepass <pwd> -validity 365 -startdate -1d -keysize 4096
./keytool -keystore certs/client.keystore -export -alias client > certs/client.crt
openssl x509 -out certs/client.pem -outform pem -text -in certs/client.crt -inform der
and loaded it onto both HA triplets (primaries and secondaries) from the CLI as follows:
enable configure authentication create certificate-authority client certificate file client.pem
When I enable replication between HA triplets 1 and 2, I get the following error:
2020-10-08T16:46:09.784+00:00 <local4.info> ip-x-x-x-x event: SYSTEM: SYSTEM_SSL_CONNECTION_REJECTED: - - SSL Connection rejected: reason (certificate verify failed: self signed certificate); connection to y.y.y.y:55443 from x.x.x.x:33282
Note: I have masked the IP addresses. x.x.x.x is the primary of HA triplet 1. y.y.y.y is the primary of HA triplet 2.
Attempt 2:
I generated root CA and leaf certificates using the following commands (two certs in the chain, self-signed CA):
openssl genrsa -out root.key 4096
openssl req -new -key root.key -out root.csr -config root_req.config
openssl ca -in root.csr -out root.pem -config root.config -selfsign -extfile ca.ext -days 1095
openssl genrsa -out leaf.key 4096
openssl req -new -key leaf.key -out leaf.csr -config leaf_req.config
openssl ca -in leaf.csr -out leaf.pem -config root.config -extfile ca.ext -days 1095
Loaded leaf.pem as follows in both triplets:
enable configure ssl server-certificate file leaf.pem
Loaded root.pem as follows in both triplets:
enable configure authentication create certificate-authority solace_ca certificate file root.pem
With this setup, the primary node in HA triplet 1 gets the following error while connecting to the primary node of HA triplet 2:
2020-10-15T17:23:42.852+00:00 <local4.info> ip-x.x.x.x event: SYSTEM: SYSTEM_SSL_CONNECTION_REJECTED: - - SSL Connection rejected: reason (certificate verify failed: not trusted common name); connection to y.y.y.y:55443 from x.x.x.x:40027
I enabled debug logging to see what was wrong and found the following logs:
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw authenticationThread.cpp:614 (MP_AUTH - 0x00000000) AuthenticationThread(10)@mgmtplane(9) DEBUG Received IPC message MSGTYPE_SSL_CERT_VERIFICATION_REQUEST
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw authClientCertificate.cpp:851 (MP_AUTH - 0x00000001) AuthenticationThread(10)@mgmtplane(9) DEBUG X509 peer certificate processing request chain size=1267 client id=1 conn type = 59
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw authClientCertificate.cpp:892 (MP_AUTH - 0x00000001) AuthenticationThread(10)@mgmtplane(9) DEBUG X509 peer certificate about to verify chain size=1267, chainLengthFromPeer=1
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw authClientCertificate.cpp:909 (MP_AUTH - 0x00000001) AuthenticationThread(10)@mgmtplane(9) DEBUG X509 peer certificate verification succeed
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw authClientCertificate.cpp:922 (MP_AUTH - 0x00000001) AuthenticationThread(10)@mgmtplane(9) DEBUG X509 peer certificate username=Solace Leaf
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw authClientCertificate.cpp:1018 (MP_AUTH - 0x00000001) AuthenticationThread(10)@mgmtplane(9) DEBUG X509 certificate fail to get valid SAN
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw authClientCertificate.cpp:1640 (MP_AUTH - 0x00000001) AuthenticationThread(10)@mgmtplane(9) DEBUG Authenticate SSL bridge[1]: CN = Solace Leaf, isValid = 1, chain len 2
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw authClientCertificate.cpp:1689 (MP_AUTH - 0x00000001) AuthenticationThread(10)@mgmtplane(9) DEBUG Authenticate SSL bridge[1]: No match for common name Solace Leaf
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw ipcMsg.cpp:1707 (BASE_IPC - 0x00000000) AuthenticationThread(10)@mgmtplane(9) DEBUG Attempt to send message len 1971 to linecard
I tried almost all of the instructions in the documentation. In particular, the following note from the official documentation confuses me, and I am struggling to set up replication with SSL:
After TLS/SSL is enabled on the replication Config-Sync bridges, for authentication using SSL to succeed, the following must also be configured:
- an SSL server certificate on the remote event broker
- a matching trusted CA on the local event broker
- the connect port used for the replication mate must be set as SSL
When SSL is enabled for the bridge, the replication mates that you set must use SSL connect ports (see Configuring Replication Mates).
Here is the link to the documentation:
https://docs.solace.com/Configuring-and-Managing/Replication-Sys-Level-Settings.htm#SSL
My attempt 2 was based on the above explanation and the note. I am not sure I understood the documentation properly.
Could someone please help me understand what I'm doing wrong, and show me the right way to do it, from generating the self-signed CA and the certificates/keys to loading them properly onto both triplets?
Thanks,
Raghu
Comments
-
Hi @rdesoju,
About your 1st attempt: You loaded a client certificate as a certificate authority. That's clearly wrong.
Let me talk about TLS basics:
1. There is a CA, a certificate authority. This CA signs server and client certs. To trust a CA, you have to load the CA's certificate. In a Java app, this is done in the trust store. In a Solace broker, this is done via "create certificate-authority". You can have multiple CAs (for example, because some of your communication mates use different CAs).
2. If you want to use TLS in your broker, you need a server certificate, and you're absolutely correct: you have to cat the key and the cert into a .pem file. If broker A is communicating with broker B, then broker B needs a certificate-authority for the CA of broker A and vice versa. (If both brokers use certs signed by the same CA, you only need to configure that one CA on both brokers.)
3. The Solace broker is able to use its server certificate as a client certificate.
4. TLS means both sides of the communication validate the certificate of the other (given that you use a cert on the client side, not just user/password).
Just imagine broker A starts a TLS connection to broker B (maybe to build up a bridge):
1. Broker B sends its server cert to broker A. Broker A wants to validate broker B's cert and needs the CA for cert B to do that.
2. Broker A sends its client cert (and we know that this is the server cert of broker A, used here as a client cert) back to broker B. Broker B wants to validate broker A's cert and needs the CA for cert A to do that.
If the CA needed to validate a cert is not available, you see an error like "unable to get issuer certificate".
If you see "not trusted common name", you have to configure the CN (common name) from your server cert on the communication mate as a trusted-common-name.
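The two checks described above (chain validation against a CA, then the common-name lookup) can be tried offline with plain openssl before loading anything onto a broker. This is only a minimal sketch: it builds a throwaway root CA and leaf certificate in a temporary directory, and every file name and subject name in it (root.pem, leaf.pem, "Example Root CA", "Example Leaf") is hypothetical rather than taken from the thread. It also shows that a self-signed certificate checked without a matching CA is rejected, which is the situation behind the "self signed certificate" error in attempt 1.

```shell
#!/usr/bin/env bash
# Sketch only: throwaway certs with hypothetical names, not the poster's files.
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Self-signed root CA (plays the role of the cert loaded as certificate-authority).
openssl req -x509 -newkey rsa:2048 -nodes -keyout root.key -out root.pem \
  -subj "/CN=Example Root CA" -days 30 2>/dev/null

# Leaf key and signing request (plays the role of the broker's server certificate).
openssl req -newkey rsa:2048 -nodes -keyout leaf.key -out leaf.csr \
  -subj "/CN=Example Leaf" 2>/dev/null

# Sign the leaf with the root CA.
openssl x509 -req -in leaf.csr -CA root.pem -CAkey root.key -CAcreateserial \
  -out leaf.pem -days 30 2>/dev/null

# Check 1: does the leaf chain up to the CA the peer trusts?
verify_output="$(openssl verify -CAfile root.pem leaf.pem)"
echo "$verify_output"

# Check 2: which common name would the peer have to list as trusted?
leaf_cn="$(openssl x509 -in leaf.pem -noout -subject -nameopt multiline \
  | awk -F' = ' '/commonName/ {print $2}')"
echo "leaf CN: $leaf_cn"

# Counter-example: a self-signed cert with no matching CA supplied is rejected.
if openssl verify root.pem >/dev/null 2>&1; then
  selfsigned_status="accepted"
else
  selfsigned_status="rejected"
fi
echo "self-signed cert without its CA: $selfsigned_status"
```

Running the same `openssl verify -CAfile <ca> <server cert>` and subject lookup against the real files shows whether the CA loaded on one broker actually matches the server certificate presented by the other, and which CN would need to be trusted.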
Uli
-
Hi @uherbst
Thank you for detailed and valuable information.
I think my attempt 2, as explained in the initial post, followed your suggestion on loading the certificates. I turned off "enforce-trusted-common-name" to avoid common name validation using the command below:
configure replication config-sync bridge ssl-server-certificate-validation no enforce-trusted-common-name
I then started seeing the following errors:
2020-10-16T19:11:04.319+00:00 <local4.notice> ip-x.x.x.x event: VPN: VPN_BRIDGING_LINK_REJECTED: #config-sync - Message VPN (108) #config-sync Bridge #CFGSYNC_REPLICATION_BRIDGE from VPN #config-sync rejected: Service Unavailable
2020-10-16T19:11:07.324+00:00 <local4.notice> ip-x.x.x.x event: VPN: VPN_BRIDGING_LINK_REJECTED: #config-sync - Message VPN (108) #config-sync Bridge #CFGSYNC_REPLICATION_BRIDGE from v:solace100 VPN #config-sync rejected: Bad Request
I verified that config-sync is running on both HA triplets A and B.
Thanks,
Raghu
-
Replication is working fine after redoing the steps cleanly.
Once replication is up and running, I see the following on the primaries of the ACTIVE and STANDBY sites:
ip-x-x-x-x(configure/redundancy)# show message-vpn default replication detail
Message VPN: default
Admin Status: enabled
Config Status: active
Local Bridge:
State: n/a
Name: n/a
Queue State: n/a
Authentication:
Scheme: Basic
Basic:
Client Username: default
Password Configured: No
Client Certificate:
Certificate File:
Using Server Certificate: Yes
Compressed: No
SSL: Yes
Message Spool:
Window Size: 255
Unidirectional:
Client Profile: #client-profile
Retry Delay: 3
Remote Bridge:
State: up
Name: #bridge/v:solace100/default/1
Queue:
State: bound
Quota (MB): 1500
Reject Msg to Sender on Discard: Yes
Ack Propagation:
Interval in Messages: 20
Sync Replication:
Eligible: yes
Duration: 0d 0h 1m 4s
---Press any key to continue, or `q' to quit---
Mate Flow Congested: no
Duration: 0d 0h 0m 0s
Reject Msg When Sync Ineligible: No
Transaction Replication Mode: async
ip-y-y-y-y(configure/message-vpn/replication)# show message-vpn default replication detail
Message VPN: default
Admin Status: enabled
Config Status: standby
Local Bridge:
State: up
Name: #MSGVPN_REPLICATION_BRIDGE
Queue State: bound
Authentication:
Scheme: Basic
Basic:
Client Username: default
Password Configured: No
Client Certificate:
Certificate File:
Using Server Certificate: Yes
Compressed: No
SSL: Yes
Message Spool:
Window Size: 255
Unidirectional:
Client Profile: #client-profile
Retry Delay: 3
Remote Bridge:
State: n/a
Name: n/a
Queue:
State: n/a
Quota (MB): 1500
Reject Msg to Sender on Discard: Yes
Ack Propagation:
Interval in Messages: 20
Sync Replication:
Eligible: n/a
Duration: n/a
---Press any key to continue, or `q' to quit---
Mate Flow Congested: n/a
Duration: n/a
Reject Msg When Sync Ineligible: No
Transaction Replication Mode: async
I then shut down the ACTIVE HA triplet and expected that the STANDBY HA triplet would become ACTIVE automatically.
However, it is still in the STANDBY state:
ip-y-y-y-y# show message-vpn default replication detail
Message VPN: default
Admin Status: enabled
Config Status: standby
Local Bridge:
State: down
Name: #MSGVPN_REPLICATION_BRIDGE
Queue State: unbound
Authentication:
Scheme: Basic
Basic:
Client Username: default
Password Configured: No
Client Certificate:
Certificate File:
Using Server Certificate: Yes
Compressed: No
SSL: Yes
Message Spool:
Window Size: 255
Unidirectional:
Client Profile: #client-profile
Retry Delay: 3
Remote Bridge:
State: n/a
Name: n/a
Queue:
State: n/a
Quota (MB): 1500
Reject Msg to Sender on Discard: Yes
Ack Propagation:
Interval in Messages: 20
Sync Replication:
Eligible: n/a
Duration: n/a
---Press any key to continue, or `q' to quit---
Mate Flow Congested: n/a
Duration: n/a
Reject Msg When Sync Ineligible: No
Transaction Replication Mode: async
Isn't it supposed to become ACTIVE once it loses connection with primary HA triplet?
Could someone please clarify?
Thanks,
Raghu
-
Hi @rdesoju - DR replication behaves differently. With DR replication, you have to switch over manually, since a DR failover typically involves several applications, infrastructure components, etc. that have to fail over as well. Here is what our docs say:
The fail-over of a replication site is often an action that cannot be performed at the messaging layer only—typically there are servers, critical applications, and other infrastructure that must be switched as part of the fail-over. Therefore the fail-over is a co-ordinated operation that must be performed by network administrators. It does not happen automatically.
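As a rough sketch of what that manual switch might look like on the standby site, assuming the per-VPN replication state is what needs changing: the `(configure/message-vpn/replication)` mode appears in the prompts earlier in this thread, but the `state active` command itself is an assumption recalled from the Solace replication documentation, not something shown in the thread, so check it against the docs for your broker version before using it.

```
enable
configure
message-vpn default
replication
state active
```

If the former active site comes back, its replication state would presumably need to be set to standby so that the two sites are not both active for the same Message VPN.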