StaleSessionException - Tried to call receive on a stopped message consumer

Kaliappans
Kaliappans Member Posts: 24
edited November 2022 in PubSub+ Event Broker #1
{"timestmp":"2022-10-14_07:26:46.822+0000","logLevel":"WARN","msg":"Received error while trying to read message from endpoint QUE_APP2_SEND","exception":"{stacktrace=at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.throwClosedException(FlowHandleImpl.java:1957)
at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:899)
at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:866)
at com.solace.spring.cloud.stream.binder.util.FlowReceiverContainer.receive(FlowReceiverContainer.java:279)
at com.solace.spring.cloud.stream.binder.util.FlowReceiverContainer.receive(FlowReceiverContainer.java:211)
at com.solace.spring.cloud.stream.binder.inbound.InboundXMLMessageListener.receive(InboundXMLMessageListener.java:93)
at com.solace.spring.cloud.stream.binder.inbound.InboundXMLMessageListener.run(InboundXMLMessageListener.java:73)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834), exception_class=com.solacesystems.jcsmp.StaleSessionException, exception_message=Tried to call receive on a stopped message consumer.}","threadID":"pool-6-thread-3","sourceHost":"cardholder-ia-verify-uat2-6-nql8r","logVersion":"1.5","category":"com.solace.spring.cloud.stream.binder.inbound.InboundXMLMessageListener"}


I am using the Spring Cloud Stream API with Solace. It was working fine, but when Solace fails over or any brief outage happens, the app tries to re-establish the Solace connection. Since the reconnect does not succeed, the log lines above are printed continuously, flooding our log files and filling up the server's disk space. We need your immediate help to solve this issue. Below are the Solace properties set in my YAML files.

 

type: solace
environment:
  solace:
    java:
      clientUsername: username
      connectRetries: 3
      connectRetriesPerHost: 0
      reconnectRetries: 3
      host: 'tcps://domain:port'
      msgVpn: VPN-NAME1


Comments

  • Kaliappans
    Kaliappans Member Posts: 24

    Can I have an update on this issue, please?

  • giri
    giri Member, Administrator, Employee Posts: 108 admin

    Hi @Kaliappans, we will get back to you on this.

  • giri
    giri Member, Administrator, Employee Posts: 108 admin

    Hi @Kaliappans

    The reconnectRetries being set to 3 is causing the issue. By default, the binder will attempt to reconnect until it connects successfully - which I believe is the behavior you want based on your message.

    Let us know if this change allows your application to reconnect when the fail-over or disconnect-reconnect occurs.

    However, the infinite loop of exceptions when reconnectRetries is set to a finite value is an issue in itself. We will report it to the dev team for suggestions and to see whether there is a bug.

  • marc
    marc Member, Administrator, Moderator, Employee Posts: 923 admin
    edited October 2022 #5

    Hi @Kaliappans,

    As @giri mentioned, you'll want to tune your reconnect retries so the app keeps retrying long enough to reconnect and resume processing messages after a failover. -1 will retry forever ;)

    That said, the logging issue you've seen is also not ideal, and I believe it's already being tracked here: https://github.com/SolaceProducts/solace-spring-cloud/issues/174

    (Edit) and Giri opened another issue here to capture what you ran into: https://github.com/SolaceProducts/solace-spring-cloud/issues/179

  • Kaliappans
    Kaliappans Member Posts: 24

    Thanks @giri for your response. Our expectation is to retry only 3 times. Even though we set reconnectRetries to 3, Solace behaves the opposite way: it keeps trying until it gets connected. But the connection is never actually established with Solace. Because of that, our logging infrastructure is getting flooded with Solace connection exception traces. So the questions are:


    1. Why was Solace not able to re-establish the connection?

    2. Why does Solace keep trying to reconnect after the third attempt when the property reconnectRetries is set to 3?

  • giri
    giri Member, Administrator, Employee Posts: 108 admin

    Hi @Kaliappans - The only reason I can think of is that the broker had not come back up when the 3 retries were made. It might have come up later, but by then the application was stuck in a different thread, which is what I have escalated on the Git issue.

    I hope we get a resolution on this soon. Until then, how about setting reconnectRetries to -1? This will ensure that the application reconnects whenever the broker becomes available and doesn't enter the logging loop. Will this work?

    Feel free to chime in on the issue thread https://github.com/SolaceProducts/solace-spring-cloud/issues/179

  • marc
    marc Member, Administrator, Moderator, Employee Posts: 923 admin

    Hi @Kaliappans,

    It looks like you're using an HA pair and want to ensure your app reconnects after a failover. Our recommendation for that is to retry for 5 minutes, as it can sometimes take ~30 seconds to a minute for the failover to happen. This gives you some leeway. With your reconnect retries at 3 and the connect retries per host at 0, you are only trying to reconnect for 9 seconds. You can see the recommended minimum settings in the docs here: https://docs.solace.com/API/API-Developer-Guide/Configuring-Connection-T.htm. That said, those recommended settings are actually the defaults in the Spring Cloud Stream binder, so you shouldn't have to set them yourself.


    The logging part is a bug that we will address in the issue that Giri mentioned above (#179). After the retries expire, the JCSMP session is no longer actually trying to reconnect, but the binder keeps trying to use resources from that session, which will never come back. At this point, as mentioned in the GitHub issue, the health of the binder goes to DOWN and will not recover.


    Hope that clarifies things!
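
The 9-second figure above can be reproduced with a rough back-of-the-envelope model. This is only a sketch, not how JCSMP computes anything internally: it assumes a 3000 ms wait between attempts (believed to be the default for reconnectRetryWaitInMillis, but check your version), a single host in the host list, and it ignores connect timeouts. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.util.Optional;

// Rough model only: ignores connect timeouts and TCP/TLS handshake time.
class RetryWindow {

    /**
     * Approximates how long a client keeps trying to reconnect before giving up.
     * Returns Optional.empty() when either retry count is negative (-1 = retry forever).
     */
    static Optional<Duration> approxReconnectWindow(int reconnectRetries,
                                                    int connectRetriesPerHost,
                                                    int hostCount,
                                                    long retryWaitMillis) {
        if (reconnectRetries < 0 || connectRetriesPerHost < 0) {
            return Optional.empty(); // unbounded: the client never gives up
        }
        // Assumption: each pass tries every host (connectRetriesPerHost + 1) times.
        long attemptsPerPass = (long) hostCount * (connectRetriesPerHost + 1L);
        long totalMillis = reconnectRetries * attemptsPerPass * retryWaitMillis;
        return Optional.of(Duration.ofMillis(totalMillis));
    }

    public static void main(String[] args) {
        // Settings from the original post: reconnectRetries=3, connectRetriesPerHost=0.
        System.out.println(approxReconnectWindow(3, 0, 1, 3000L)
                .map(d -> d.getSeconds() + " s").orElse("unbounded")); // prints "9 s"
        // reconnectRetries=-1 keeps retrying until the broker is reachable again.
        System.out.println(approxReconnectWindow(-1, 0, 1, 3000L)
                .map(d -> d.getSeconds() + " s").orElse("unbounded")); // prints "unbounded"
    }
}
```

Under the same assumptions, covering the recommended 5 minutes with one host and a 3-second wait needs reconnectRetries × (connectRetriesPerHost + 1) to be at least 100.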

  • Tamimi
    Tamimi Member, Administrator, Employee Posts: 499 admin

    Hey @Kaliappans! Following up on this and checking to see whether the above recommendation resolved your issue?

  • Kaliappans
    Kaliappans Member Posts: 24

    @Tamimi @marc The retry configuration above fixed the issue, and the logs are now clean. Thanks, all, for your valuable help.