Unable to set up SSL-based replication between two HA triplets

rdesoju Member Posts: 66
edited February 2022 in PubSub+ Event Broker #1

Hi,
I have two HA triplets and I am trying to set up SSL-based native Solace Replication (async) between them.
Attempt 1:
I generated a server certificate with the following commands:

openssl req -x509 -newkey rsa:4096 -keyout certs/solace_server.key -out certs/solace_server.crt -days 365
cat certs/solace_server.key certs/solace_server.crt > certs/solace_server.pem

I loaded it onto both triplets (primaries and secondaries) using the following CLI command:

enable configure ssl server-certificate file solace_server.pem

I also generated a client certificate using the following commands:

./keytool -genkey -keyalg RSA -alias client -keystore certs/client.keystore -storepass <pwd> -validity 365 -startdate -1d -keysize 4096
./keytool -keystore certs/client.keystore -export -alias client > certs/client.crt
openssl x509 -out certs/client.pem -outform pem -text -in certs/client.crt -inform der

and loaded it onto both HA triplets (primaries and secondaries) from the CLI as follows:

enable configure authentication create certificate-authority client
certificate file client.pem

When I enable replication between HA triplets 1 and 2, I get the following error:

2020-10-08T16:46:09.784+00:00 <local4.info> ip-x-x-x-x event: SYSTEM: SYSTEM_SSL_CONNECTION_REJECTED: - - SSL Connection rejected: reason (certificate verify failed: self signed certificate); connection to y.y.y.y:55443 from x.x.x.x:33282

Note: I have masked the IP addresses. x.x.x.x is the primary of HA triplet 1. y.y.y.y is the primary of HA triplet 2.

Attempt 2:
I generated root CA and leaf certificates using the following commands (two certs in the chain, self-signed CA):

openssl genrsa -out root.key 4096
openssl req -new -key root.key -out root.csr -config root_req.config
openssl ca -in root.csr -out root.pem -config root.config -selfsign -extfile ca.ext -days 1095

openssl genrsa -out leaf.key 4096
openssl req -new -key leaf.key -out leaf.csr -config leaf_req.config
openssl ca -in leaf.csr -out leaf.pem -config root.config -extfile ca.ext -days 1095

Loaded leaf.pem as follows in both triplets:

enable configure ssl server-certificate file leaf.pem

Loaded root.pem as follows in both triplets:

enable configure authentication create certificate-authority solace_ca
certificate file root.pem

With this setup, the primary node of HA triplet 1 gets the following error while connecting to the primary node of HA triplet 2:

2020-10-15T17:23:42.852+00:00 <local4.info> ip-x.x.x.x event: SYSTEM: SYSTEM_SSL_CONNECTION_REJECTED: - - SSL Connection rejected: reason (certificate verify failed: not trusted common name); connection to y.y.y.y:55443 from x.x.x.x:40027

I enabled debug logging to see what's wrong and found the following logs:

2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw                        authenticationThread.cpp:614          (MP_AUTH     - 0x00000000) AuthenticationThread(10)@mgmtplane(9)         DEBUG    Received IPC message MSGTYPE_SSL_CERT_VERIFICATION_REQUEST
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw                        authClientCertificate.cpp:851         (MP_AUTH     - 0x00000001) AuthenticationThread(10)@mgmtplane(9)         DEBUG    X509 peer certificate processing request chain size=1267 client id=1 conn type = 59
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw                        authClientCertificate.cpp:892         (MP_AUTH     - 0x00000001) AuthenticationThread(10)@mgmtplane(9)         DEBUG    X509 peer certificate about to verify chain size=1267, chainLengthFromPeer=1
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw                        authClientCertificate.cpp:909         (MP_AUTH     - 0x00000001) AuthenticationThread(10)@mgmtplane(9)         DEBUG    X509 peer certificate verification succeed
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw                        authClientCertificate.cpp:922         (MP_AUTH     - 0x00000001) AuthenticationThread(10)@mgmtplane(9)         DEBUG    X509 peer certificate username=Solace Leaf
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw                        authClientCertificate.cpp:1018        (MP_AUTH     - 0x00000001) AuthenticationThread(10)@mgmtplane(9)         DEBUG    X509 certificate fail to get valid SAN
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw                        authClientCertificate.cpp:1640        (MP_AUTH     - 0x00000001) AuthenticationThread(10)@mgmtplane(9)         DEBUG    Authenticate SSL bridge[1]: CN = Solace Leaf, isValid = 1, chain len 2
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw                        authClientCertificate.cpp:1689        (MP_AUTH     - 0x00000001) AuthenticationThread(10)@mgmtplane(9)         DEBUG    Authenticate SSL bridge[1]: No match for common name Solace Leaf
2020-10-15T15:51:20.210+00:00 <local0.debug> ip-x.x.x.x mgmtplane: /usr/sw                        ipcMsg.cpp:1707                       (BASE_IPC     - 0x00000000) AuthenticationThread(10)@mgmtplane(9)         DEBUG    Attempt to send message len 1971 to linecard

I have tried almost all of the instructions in the documentation. The following note from the official documentation confuses me, and I am racking my brain over how to set up replication with SSL:

After TLS/SSL is enabled on the replication Config-Sync bridges, for authentication using SSL to succeed, the following must also be configured:
an SSL server certificate on the remote event broker
a matching trusted CA on the local event broker
the connect port used for the replication mate must be set as SSL
When SSL is enabled for the bridge, the replication mates that you set must use SSL connect ports (see Configuring Replication Mates).

Here is the link to the documentation:
https://docs.solace.com/Configuring-and-Managing/Replication-Sys-Level-Settings.htm#SSL

Attempt 2 was based on the explanation and note above. I am not sure I have understood the documentation correctly.

Could someone please help me understand what I'm doing wrong, and show the right way to do it, from generating the self-signed CA and the certificates/keys to loading them properly onto both triplets?

Thanks,
Raghu

Comments

  • uherbst
    uherbst Member, Employee Posts: 123 Solace Employee

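    Hi @rdesoju,
    About your 1st attempt: you loaded a client certificate as a certificate authority. That cannot work, because a certificate-authority entry has to hold the certificate of the CA that signed the peer's certificate.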
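
The failure mode of the 1st attempt can be reproduced offline with plain `openssl`, as a rough sketch. The two certificates below are throwaway stand-ins for solace_server.crt and client.pem; the subject names, key size, and temp directory are assumptions made for illustration, and `openssl verify` only approximates what the broker checks.

```shell
#!/bin/sh
# Sketch only: subject names and paths are placeholders, not values from the thread.
set -eu
work=$(mktemp -d)

# Self-signed server certificate, in the spirit of Attempt 1.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=broker-server.example" \
  -keyout "$work/server.key" -out "$work/server.crt" 2>/dev/null

# Unrelated self-signed "client" certificate, standing in for client.pem.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=client.example" \
  -keyout "$work/client.key" -out "$work/client.crt" 2>/dev/null

# Trusting the client cert does not help: it never signed the server cert.
if openssl verify -CAfile "$work/client.crt" "$work/server.crt" >/dev/null 2>&1; then
  wrong_ca=accepted
else
  wrong_ca=rejected
fi

# Trusting the certificate that actually issued it (here: itself) verifies.
if openssl verify -CAfile "$work/server.crt" "$work/server.crt" >/dev/null 2>&1; then
  right_ca=accepted
else
  right_ca=rejected
fi

echo "client cert as CA: $wrong_ca; issuer as CA: $right_ca"
```

The point of the comparison is that only the issuer's certificate belongs in the trust list; any other certificate, valid or not, leaves the peer's certificate unverifiable.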

    Let me talk about TLS basics:
    1. There is a CA, a certificate authority. This CA signs server and client certs. To trust a CA, you have to load the CA's certificate. In a Java app, this is done in the trust store. In a Solace broker, this is done via "create certificate-authority". You can have multiple CAs (for example, because some of your communication mates use different CAs).
    2. If you want to use TLS in your broker, you need a server certificate, and you're absolutely correct: you have to concatenate the key and the cert into a .pem file. If broker A is communicating with broker B, then broker B needs a certificate-authority for the CA of broker A and vice versa. (If both brokers use certs signed by the same CA, you need to configure that CA just once on both brokers.)
    3. The Solace broker is able to use its server certificate as a client certificate.
    4. With TLS, both sides of the communication validate the certificate of the other (given that you use a cert on the client side, not just user/password).

    Just imagine: broker A starts a TLS connection to broker B (maybe to build up a bridge):
    1. Broker B sends its server cert to broker A. Broker A wants to validate that broker-B cert and needs the CA for cert B to do that.
    2. Broker A sends its client cert (and we know: that is the server cert of broker A, used here as a client cert) back to broker B. Broker B wants to validate that broker-A cert and needs the CA for cert A to do that.

    If the CA needed to validate a cert is not available, you see an error like "unable to get issuer certificate".
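
As a minimal sketch of the arrangement described above, assuming a single shared CA for both sites, the certificates can be generated and cross-checked offline before anything is loaded onto a broker. All names here (brokerA, brokerB, the CA subject, the temp directory) are hypothetical, and `openssl x509 -req` is used in place of the `openssl ca` setup from the original post to keep the example free of config files.

```shell
#!/bin/sh
# Sketch only: file names and subject names below are placeholders.
set -eu
work=$(mktemp -d)

# 1. One self-signed CA that both sites will trust.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -subj "/CN=Example Replication CA" \
  -keyout "$work/ca.key" -out "$work/ca.crt" 2>/dev/null

# 2. A key and a CA-signed certificate per broker, bundled key-first into a .pem.
for site in brokerA brokerB; do
  openssl req -new -newkey rsa:2048 -nodes \
    -subj "/CN=$site.example" \
    -keyout "$work/$site.key" -out "$work/$site.csr" 2>/dev/null
  openssl x509 -req -in "$work/$site.csr" -days 30 \
    -CA "$work/ca.crt" -CAkey "$work/ca.key" -CAcreateserial \
    -out "$work/$site.crt" 2>/dev/null
  cat "$work/$site.key" "$work/$site.crt" > "$work/$site.pem"
done

# 3. With the shared CA, each side can validate the other's certificate.
if openssl verify -CAfile "$work/ca.crt" \
     "$work/brokerA.crt" "$work/brokerB.crt" >/dev/null 2>&1; then
  shared_ca=accepted
else
  shared_ca=rejected
fi

# 4. With a CA that did not sign the certificate, validation fails.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -subj "/CN=Unrelated CA" \
  -keyout "$work/other.key" -out "$work/other.crt" 2>/dev/null
if openssl verify -CAfile "$work/other.crt" "$work/brokerB.crt" >/dev/null 2>&1; then
  other_ca=accepted
else
  other_ca=rejected
fi

echo "shared CA: $shared_ca; unrelated CA: $other_ca"
```

In this sketch, each brokerX.pem would play the role of the server certificate file and ca.crt the role of the certificate-authority file loaded on both sites, following the CLI commands quoted earlier in the thread.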

    If you see "not trusted common name", you have to configure the CN (common name) from your server cert as a trusted-common-name on the communication mate.
    Uli.
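
To find the exact common name that the mate will compare, the certificate can be inspected with `openssl x509`. The following is a small sketch: it builds a throwaway self-signed certificate whose CN, "Solace Leaf", is taken from the debug log above, as a stand-in for the real leaf.pem (which is CA-signed), and then reads back the CN and checks for a subjectAltName extension. The `sed` expression assumes the CN is the only or first component printed in the subject.

```shell
#!/bin/sh
# Sketch only: leaf.crt here is a self-signed stand-in for the real leaf.pem.
set -eu
work=$(mktemp -d)

openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=Solace Leaf" \
  -keyout "$work/leaf.key" -out "$work/leaf.crt" 2>/dev/null

# Print the subject in RFC 2253 form and keep only the CN value.
cn=$(openssl x509 -in "$work/leaf.crt" -noout -subject -nameopt RFC2253 \
       | sed -n 's/^subject= *CN=\([^,]*\).*/\1/p')

# Check whether the certificate carries a subjectAltName extension at all.
if openssl x509 -in "$work/leaf.crt" -noout -text \
     | grep -q "Subject Alternative Name"; then
  has_san=yes
else
  has_san=no
fi

echo "CN: $cn"
echo "SAN present: $has_san"
```

Running the same two inspection commands against the real leaf.pem shows the value to register as the trusted common name, spaces included.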

  • rdesoju
    rdesoju Member Posts: 66

    Hi @uherbst,
    Thank you for the detailed and valuable information.
    I think attempt 2, as described in my initial post, follows your suggestion on loading the certificates. I turned off "enforce-trusted-common-name" to skip common name validation using the commands below.

    configure replication config-sync bridge ssl-server-certificate-validation 
    no enforce-trusted-common-name
    

    I then started seeing the following errors:

    2020-10-16T19:11:04.319+00:00 <local4.notice> ip-x.x.x.x event: VPN: VPN_BRIDGING_LINK_REJECTED: #config-sync - Message VPN (108) #config-sync Bridge #CFGSYNC_REPLICATION_BRIDGE from  VPN #config-sync rejected: Service Unavailable
    2020-10-16T19:11:07.324+00:00 <local4.notice> ip-x.x.x.x event: VPN: VPN_BRIDGING_LINK_REJECTED: #config-sync - Message VPN (108) #config-sync Bridge #CFGSYNC_REPLICATION_BRIDGE from v:solace100 VPN #config-sync rejected: Bad Request
    

    I verified that Config-Sync is running on both HA triplets, A and B.
    Thanks,
    Raghu

  • rdesoju
    rdesoju Member Posts: 66

    Replication is working fine after redoing the steps cleanly.
    With replication up and running, I see the following on the primaries of the ACTIVE and STANDBY sites:

    ip-x-x-x-x(configure/redundancy)# show message-vpn default replication detail
    Message VPN:                       default
    Admin Status:                      enabled
    Config Status:                     active
    Local Bridge:
      State:                           n/a
      Name:                            n/a
      Queue State:                     n/a
      Authentication:
        Scheme:                        Basic
        Basic:
          Client Username:             default
          Password Configured:         No
        Client Certificate:
          Certificate File:
          Using Server Certificate:    Yes
      Compressed:                      No
      SSL:                             Yes
      Message Spool:
        Window Size:                   255
      Unidirectional:
        Client Profile:                #client-profile
      Retry Delay:                     3
    Remote Bridge:
      State:                           up
      Name:                            #bridge/v:solace100/default/1
    Queue:
      State:                           bound
      Quota (MB):                      1500
      Reject Msg to Sender on Discard: Yes
    Ack Propagation:
      Interval in Messages:            20
    Sync Replication:
      Eligible:                        yes
        Duration:                      0d 0h 1m 4s
      Mate Flow Congested:             no
        Duration:                      0d 0h 0m 0s
      Reject Msg When Sync Ineligible: No
    Transaction Replication Mode:      async
    
    ip-y-y-y-y(configure/message-vpn/replication)# show message-vpn default replication detail
    
    Message VPN:                       default
    Admin Status:                      enabled
    Config Status:                     standby
    Local Bridge:
      State:                           up
      Name:                            #MSGVPN_REPLICATION_BRIDGE
      Queue State:                     bound
      Authentication:
        Scheme:                        Basic
        Basic:
          Client Username:             default
          Password Configured:         No
        Client Certificate:
          Certificate File:
          Using Server Certificate:    Yes
      Compressed:                      No
      SSL:                             Yes
      Message Spool:
        Window Size:                   255
      Unidirectional:
        Client Profile:                #client-profile
      Retry Delay:                     3
    Remote Bridge:
      State:                           n/a
      Name:                            n/a
    Queue:
      State:                           n/a
      Quota (MB):                      1500
      Reject Msg to Sender on Discard: Yes
    Ack Propagation:
      Interval in Messages:            20
    Sync Replication:
      Eligible:                        n/a
        Duration:                      n/a
      Mate Flow Congested:             n/a
        Duration:                      n/a
      Reject Msg When Sync Ineligible: No
    Transaction Replication Mode:      async
    

    I then shut down the ACTIVE HA triplet, expecting the STANDBY HA triplet to become ACTIVE automatically.
    However, it is still in the STANDBY state:

    ip-y-y-y-y# show message-vpn default replication detail
    
    Message VPN:                       default
    Admin Status:                      enabled
    Config Status:                     standby
    Local Bridge:
      State:                           down
      Name:                            #MSGVPN_REPLICATION_BRIDGE
      Queue State:                     unbound
      Authentication:
        Scheme:                        Basic
        Basic:
          Client Username:             default
          Password Configured:         No
        Client Certificate:
          Certificate File:
          Using Server Certificate:    Yes
      Compressed:                      No
      SSL:                             Yes
      Message Spool:
        Window Size:                   255
      Unidirectional:
        Client Profile:                #client-profile
      Retry Delay:                     3
    Remote Bridge:
      State:                           n/a
      Name:                            n/a
    Queue:
      State:                           n/a
      Quota (MB):                      1500
      Reject Msg to Sender on Discard: Yes
    Ack Propagation:
      Interval in Messages:            20
    Sync Replication:
      Eligible:                        n/a
        Duration:                      n/a
      Mate Flow Congested:             n/a
        Duration:                      n/a
      Reject Msg When Sync Ineligible: No
    Transaction Replication Mode:      async
    

    Isn't it supposed to become ACTIVE once it loses its connection to the ACTIVE HA triplet?
    Could someone please clarify?
    Thanks,
    Raghu

  • himanshu
    himanshu Member, Employee Posts: 67 Solace Employee
    Options

    Hi @rdesoju - DR replication behaves differently. With DR replication, you have to switch over manually, since a DR failover typically involves several applications, infrastructure components, etc. that have to fail over as well. Here is what our docs say:

    The fail-over of a replication site is often an action that cannot be performed at the messaging layer only—typically there are servers, critical applications, and other infrastructure that must be switched as part of the fail-over. Therefore the fail-over is a co-ordinated operation that must be performed by network administrators. It does not happen automatically.