How to prevent endless loop of timeouts when the Solace queue is not available

MickeyZ
MickeyZ Member Posts: 14 ✭✭

Our microservices were not able to connect to clients respective Solace queues. Although it seems microservices were able to connect to Solace, the queues themselves were not available, causing a huge volume of time outs.
At the moment the configuration has a set number of retries for the connection to Solace, but no provision for what should occur when a queue is unavailable. The result in this instance was a near endless loop of timeouts that caused the log files to grow to 20+ terabytes.
Although the queue unavailability is a client side issue, we need to have a contingency in place to prevent this from happening in future.
Please could you advise how can we prevent this indefinite attempts to connect to the queue that is not available.
The example of the error is bellow:

2024-12-12 WARN com.solace.spring.cloud.stream.binder.inbound.InboundXMLMessageListener [pool-9-thread-4] Received error while trying to read message from endpoint HOSTNAME.LOCATES.SECURITY.PRICE
com.solacesystems.jcsmp.StaleSessionException: Tried to call receive on a stopped message consumer.
at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.throwClosedException(FlowHandleImpl.java:2042)
at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:920)
at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:887)
at com.solace.spring.cloud.stream.binder.util.FlowReceiverContainer.receive(FlowReceiverContainer.java:326)
at com.solace.spring.cloud.stream.binder.util.FlowReceiverContainer.receive(FlowReceiverContainer.java:258)
at com.solace.spring.cloud.stream.binder.inbound.InboundXMLMessageListener.receive(InboundXMLMessageListener.java:123)
at com.solace.spring.cloud.stream.binder.inbound.InboundXMLMessageListener.run(InboundXMLMessageListener.java:96)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.solacesystems.jcsmp.JCSMPTransportException: (JCSMPTransportException) Error communicating with the router. (KeepAlive)
at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel$ClientChannelReconnect.call(TcpClientChannel.java:2493)
... 4 common frames omitted
Caused by: com.solacesystems.jcsmp.JCSMPTransportException: (Client name: client-lon-name/nnn/nnn/dddd Local port: -1 Remote addr: n.n.n.n Remote port: nnn) - Timeout happened when reading response from the router.
at com.solacesystems.jcsmp.protocol.impl.TcpChannel.executePostOnce(TcpChannel.java:243)
at com.solacesystems.jcsmp.protocol.impl.ChannelOpStrategyClient.performOpen(ChannelOpStrategyClient.java:99)
at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel.performOpenSingle(TcpClientChannel.java:426)
at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel.access$800(TcpClientChannel.java:122)
at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel$ClientChannelReconnect.call(TcpClientChannel.java:2329)
... 4 common frames omitted
Caused by: java.net.SocketTimeoutException: null
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:129)
at com.solacesystems.jcsmp.protocol.smf.SimpleSmfClient.open(SimpleSmfClient.java:905)
at com.solacesystems.jcsmp.protocol.smf.ZSmfClient.open(ZSmfClient.java:62)
at com.solacesystems.jcsmp.protocol.smf.SSLSmfClient.open(SSLSmfClient.java:90)
at com.solacesystems.jcsmp.protocol.smf.SSLSmfClient.doPostNoResponse(SSLSmfClient.java:241)
at com.solacesystems.jcsmp.protocol.smf.SimpleSmfClient.doPost(SimpleSmfClient.java:235)
at com.solacesystems.jcsmp.protocol.impl.TcpChannel.executePostOnce(TcpChannel.java:199)
... 8 common frames omitted

My config is:

 webService:
url: http://localhost:nnnn/
user: userName
password: userPassword
maxRetries: 6
retryIntervalSecs: 120
readTimeoutSecs: 120
connectTimeoutSecs: 120
poolSize: 10
server:
port: nnn
spring:
cloud:
function:
definition: myFunction
stream:
binders:
local-solace:
type: solace
environment:
solace:
java:
host: tcp://n.n.n.n:nnn
msgVpn: default
clientUsername: default
clientPassword: default
connectRetries: 3
connectRetriesPerHost: 0
reconnectRetries: 3
bindings:
#The bindings section is used to define your input and output channels.
myFunction-in-0:
destination: client.hostName.securities
#The presence of "group" tells the binder to follow the "consumer group" pattern; if not present the "publish-subscribe" pattern will be used.
group: hostName
consumer:
concurrency: 5
solace:
bindings:
myFunction-in-0:
consumer:
#Avoid the scst prefix etc
queueNameExpression: destination
autoBindErrorQueue: false
#Queue should not be created by the application
provisionDurableQueue: false
provisionErrorQueue: false
addDestinationAsSubscriptionToQueue: false

Answers

  • marc
    marc Member, Administrator, Moderator, Employee Posts: 972 admin
    edited January 2 #2

    Hi @MickeyZ,

    Do you know how to reproduce it? Do we need to delete the queues after the apps are running or something like that?

    Also, what version of solace-spring-cloud are you using?

  • MickeyZ
    MickeyZ Member Posts: 14 ✭✭
    edited January 2 #3

    Hi Marc,

    Thanks for your answer.

    The issue is that client queues were not available and even we connected to thier Solace PubSub+ broker, our application were trying to connect over long period of time and each time was logging that unsucceful attempt.
    So eventually it got to 20TB of logging.
    I hope that there is a way how to prevent it using configurarion. That is why I posted my config file. Or is there some other way how to prevent this indefinite number of re-tries?

  • marc
    marc Member, Administrator, Moderator, Employee Posts: 972 admin

    Thanks @MickeyZ , do you have the solace-spring-cloud version?

  • MickeyZ
    MickeyZ Member Posts: 14 ✭✭
    edited January 2 #5

    Hi Marc - we are now using com.solace.spring.cloud:spring-cloud-starter-stream-solace:4.2.0
    That is the only version that is working with client's PubSub+ Broker. We tried using version 5.n and 3.n without the success.
    The pubsub+ broker is 9.2.1.17
    That is on the client site and we can not upgrade it even though it is EOL

    However, the error message that I posted previously was produced by client version 3.4.0 running Java 8 and SpringBoot 2.
    As we moved to Java 17 and Spring Boot 3, the only Solace client verison that works is 4.2.0 (with PubSub+ broker 9.2.1.17)
    Example of the error message using non-existing queue:

    2025-01-02 16:39:22,578 WARN  [main] c.s.s.c.s.b.p.SolaceQueueProvisioner: Failed to connect test consumer flow to queue non-existing.queue.name. Provisioning is disabled, queue was not provisioned nor was its configuration validated.
    com.solacesystems.jcsmp.JCSMPErrorResponseException: 503: Unknown Queue
    at com.solacesystems.jcsmp.impl.flow.BindRequestTask.execute(BindRequestTask.java:213)
    at com.solacesystems.jcsmp.impl.flow.SubFlowManagerImpl.handleAssuredCtrlMessage(SubFlowManagerImpl.java:578)
    at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel.handleAssuredCtrlMsg(TcpClientChannel.java:1840)
    at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel.handleMessage(TcpClientChannel.java:1805)
    at com.solacesystems.jcsmp.protocol.nio.impl.SubscriberMessageReader.processRead(SubscriberMessageReader.java:98)
    at com.solacesystems.jcsmp.protocol.nio.impl.SubscriberMessageReader.read(SubscriberMessageReader.java:140)
    at com.solacesystems.jcsmp.protocol.smf.SimpleSmfClient.read(SimpleSmfClient.java:1206)
    at com.solacesystems.jcsmp.protocol.nio.impl.SyncEventDispatcherReactor.processReactorChannels(SyncEventDispatcherReactor.java:206)
    at com.solacesystems.jcsmp.protocol.nio.impl.SyncEventDispatcherReactor.eventLoop(SyncEventDispatcherReactor.java:157)
    at com.solacesystems.jcsmp.protocol.nio.impl.SyncEventDispatcherReactor$SEDReactorThread.run(SyncEventDispatcherReactor.java:338)
    at java.base/java.lang.Thread.run(Thread.java:840)
    2025-01-02 16:39:22,578 ERROR [main] o.s.c.s.b.BindingService: Failed to create consumer binding; retrying in 30 seconds
    org.springframework.cloud.stream.provisioning.ProvisioningException: Failed to connect test consumer flow to queue non-existing.queue.name. Provisioning is disabled, queue was not provisioned nor was its configuration validated.
    at com.solace.spring.cloud.stream.binder.provisioning.SolaceQueueProvisioner.provisionQueue(SolaceQueueProvisioner.java:195)
    at com.solace.spring.cloud.stream.binder.provisioning.SolaceQueueProvisioner.provisionQueue(SolaceQueueProvisioner.java:150)
    at com.solace.spring.cloud.stream.binder.provisioning.SolaceQueueProvisioner.provisionConsumerDestination(SolaceQueueProvisioner.java:127)
    at com.solace.spring.cloud.stream.binder.provisioning.SolaceQueueProvisioner.provisionConsumerDestination(SolaceQueueProvisioner.java:33)
    at org.springframework.cloud.stream.binder.AbstractMessageChannelBinder.doBindConsumer(AbstractMessageChannelBinder.java:512)
    at org.springframework.cloud.stream.binder.AbstractMessageChannelBinder.doBindConsumer(AbstractMessageChannelBinder.java:102)
    at org.springframework.cloud.stream.binder.AbstractBinder.bindConsumer(AbstractBinder.java:144)
    at org.springframework.cloud.stream.binding.BindingService.doBindConsumer(BindingService.java:186)
    at org.springframework.cloud.stream.binding.BindingService.bindConsumer(BindingService.java:139)
    at org.springframework.cloud.stream.binding.AbstractBindableProxyFactory.createAndBindInputs(AbstractBindableProxyFactory.java:98)
    at org.springframework.cloud.stream.binding.InputBindingLifecycle.doStartWithBindable(InputBindingLifecycle.java:58)
    at java.base/java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:647)
    at org.springframework.cloud.stream.binding.AbstractBindingLifecycle.start(AbstractBindingLifecycle.java:57)
    at org.springframework.cloud.stream.binding.InputBindingLifecycle.start(InputBindingLifecycle.java:34)
    at org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:285)
    at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.start(DefaultLifecycleProcessor.java:472)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:257)
    at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:202)
    at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:990)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:628)
    at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:146)
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:754)
    at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:456)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:335)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1363)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1352)
    at com.tradingapps.microservices.eventbridge.Application.main(Application.java:16)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at org.springframework.boot.loader.launch.Launcher.launch(Launcher.java:102)
    at org.springframework.boot.loader.launch.Launcher.launch(Launcher.java:64)
    at org.springframework.boot.loader.launch.PropertiesLauncher.main(PropertiesLauncher.java:580)
    Caused by: com.solacesystems.jcsmp.JCSMPErrorResponseException: 503: Unknown Queue
    at com.solacesystems.jcsmp.impl.flow.BindRequestTask.execute(BindRequestTask.java:213)
    at com.solacesystems.jcsmp.impl.flow.SubFlowManagerImpl.handleAssuredCtrlMessage(SubFlowManagerImpl.java:578)
    at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel.handleAssuredCtrlMsg(TcpClientChannel.java:1840)
    at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel.handleMessage(TcpClientChannel.java:1805)
    at com.solacesystems.jcsmp.protocol.nio.impl.SubscriberMessageReader.processRead(SubscriberMessageReader.java:98)
    at com.solacesystems.jcsmp.protocol.nio.impl.SubscriberMessageReader.read(SubscriberMessageReader.java:140)
    at com.solacesystems.jcsmp.protocol.smf.SimpleSmfClient.read(SimpleSmfClient.java:1206)
    at com.solacesystems.jcsmp.protocol.nio.impl.SyncEventDispatcherReactor.processReactorChannels(SyncEventDispatcherReactor.java:206)
    at com.solacesystems.jcsmp.protocol.nio.impl.SyncEventDispatcherReactor.eventLoop(SyncEventDispatcherReactor.java:157)
    at com.solacesystems.jcsmp.protocol.nio.impl.SyncEventDispatcherReactor$SEDReactorThread.run(SyncEventDispatcherReactor.java:338)
    at java.base/java.lang.Thread.run(Thread.java:840)

    That logging pattern repeats over and over untill there are resources available

  • MickeyZ
    MickeyZ Member Posts: 14 ✭✭

    One way to replicate the issue is to put not existing queue name in config for subscriber:
    destination: non-existing.queue.name


    The subscriber would be trying to connect to the non-existing queue untill the resources are available

  • marc
    marc Member, Administrator, Moderator, Employee Posts: 972 admin

    Thanks @MickeyZ, I'm hoping to get the team to take a look. Note that if it is a bug they would likely fix it in the latest version of the SCSt. binder.

    If your main concern is the amount of logs you could also disable WARN logs for that class, but just note that would also suppress other WARN log entries as well. I'd suggest looking through the class and/or your logs to see if you need the other log entries.

    logging.level.com.solace.spring.cloud.stream.binder.inbound.InboundXMLMessageListener=(ERROR|FATAL|OFF)
    

    More importantly: this is obviously an issue of too many logs, but I imagine you're already solving the issue of why this is occurring? Make sure to nail down your queue provisioning process in production. That sounds like a concerning issue if you have apps trying to connect to queues that don't exist!

  • Aaron
    Aaron Member, Administrator, Moderator, Employee Posts: 664 admin
    edited January 3 #8

    Hi @MickeyZ , I hear you about the excessive amount of logging, due to the full stacktrace being logged each time. That seems unnecessary. However, I'm pretty sure that fix would not be back-ported to 4.2.0. So: even if fixed, it would probably be in the latest release, and it appears you can't use this due to your client's EOL broker version.

    Why can't the broker be updated?

    Another question: why are your microservices deployed for so long if the queues they need are not configured yet? Surely it might have been easier to create the queues, but leave the queues' ingress disabled to prevent message reception; but queue egress enabled, which would allow the clients to bind. Or just deploy the microservices when ready?

    I'd also recommend checking your log rotation settings for your clients… either in log4j or whatever framework you're using, and set the max log file size and max number retained to something reasonable. I.e. set the max value for the rollover strategy: https://logging.apache.org/log4j/2.x/manual/appenders/rolling-file.html#DefaultRolloverStrategy That seems like a config escape (and monitoring failure) if log files were allowed to grow to 20TB..! 🤯

    If you want to take a crack at fixing the binder yourself, you can clone the repo here: https://github.com/SolaceProducts/solace-spring-cloud , checkout the stage-4.0.0 branch, and update the following lines of code by removing the ",e" from the log line, which will eliminate the full stacktraces for the two exceptions you noted above:

    https://github.com/SolaceProducts/solace-spring-cloud/blob/stage-4.0.0/solace-spring-cloud-stream-binder/solace-spring-cloud-stream-binder-core/src/main/java/com/solace/spring/cloud/stream/binder/inbound/InboundXMLMessageListener.java#L131-L135

    https://github.com/SolaceProducts/solace-spring-cloud/blob/stage-4.0.0/solace-spring-cloud-stream-binder/solace-spring-cloud-stream-binder-core/src/main/java/com/solace/spring/cloud/stream/binder/provisioning/SolaceQueueProvisioner.java#L194

    Hope that helps.

  • mpatel
    mpatel Member, Employee Posts: 14 Solace Employee
    edited January 3 #9

    @MickeyZ The binding services tries to reconnect the consumer every 30 seconds.

    2025-01-02 16:39:22,578 ERROR [main] o.s.c.s.b.BindingService: Failed to create consumer binding; retrying in 30 seconds
    

    How is it causing terabytes of logs ? Just trying to understand.

    Do you have lots of queues or lot of instances of the same application ?

  • MickeyZ
    MickeyZ Member Posts: 14 ✭✭

    @marc @Aaron @mpatel Thank you for your replies.
    FYI we got a new client that is not yet in the production - although we hope to be very soon. We need to iron out couple of outstanding issues, including the one I mentioned here on the thread.

    @marc - we need logging. Regarding the issues with the queues on the client side, they do exist and we were using them for couple of months, but they were not available at that weekend. We did not get explanation why that was the case.
    In order to reproduce the issue I gave the example of non existing queue. That has the same effect as not existing queues. The subscirber is trying to connect untill there are resources available. As @Aaron said due to the full stacktrace being logged each time, the ammount of logging adds up.
    I wonder if there is a flag that could be set up in configuration to limit the subsciber's attempts to connect to non-avilable queues?

    @Aaron - we did notify the clients that PubSub+ Broker is EOL and that would be adventageous for them to upgrade the broker, but that is all we can do.

    We do have log rotation, but we are required to keep the logs (no overwritting for 60 days), so they added up to 20 TB for that case when client's queues were not available.

    @mpatel - due to the full stacktrace being logged each time, the MB become TB over the weekend :-)
    The configuration file is at the begging of the thread. Please let me know if I need to change anything.

    I am adding extra logging in the case that might be usefull to you:

    2025-01-02 16:39:22,431 INFO  [main] c.s.j.p.i.TcpClientChannel: Client-1: Connecting to host 'orig=tcp://n.n.n.n:nnn, scheme=tcp://, host=n.n.n.n, port=nnn' (host 1 of 1, smfclient 1, attempt 1 of 3, this_host_attempt: 1 of 1)
    2025-01-02 16:39:22,467 INFO [main] c.s.j.i.JCSMPBasicSession: Client-1: AdCtrl version changes: 2->4
    2025-01-02 16:39:22,467 INFO [main] c.s.j.p.i.TcpClientChannel: Client-1: Connected to host 'orig=tcp://n.n.n.n:nnn, scheme=tcp://, host=n.n.n.n, port=nnn' (smfclient 1)
    2025-01-02 16:39:22,546 INFO [main] o.s.c.s.b.DefaultBinderFactory: Caching the binder: local-solace
    2025-01-02 16:39:22,546 INFO [main] c.s.s.c.s.b.p.SolaceQueueProvisioner: Creating durable queue clientName.hostName_test.securities for consumer group hostName
    2025-01-02 16:39:22,546 INFO [main] c.s.s.c.s.b.p.SolaceQueueProvisioner: Durable queue provisioning is disabled, clientName.hostName_test.securities will not be provisioned nor will its configuration be validated
    2025-01-02 16:39:22,546 INFO [main] c.s.s.c.s.b.p.SolaceQueueProvisioner: Testing consumer flow connection to queue clientName.hostName_test.securities (will not start it)
    2025-01-02 16:39:22,578 INFO [Context_2_ReactorThread] c.s.j.i.f.BindRequestTask: Client-1: Got BIND ('clientName.hostName_test.securities') Error Response (503) - Unknown Queue
    2025-01-02 16:39:22,578 WARN [main] c.s.s.c.s.b.p.SolaceQueueProvisioner: Failed to connect test consumer flow to queue clientName.hostName_test.securities. Provisioning is disabled, queue was not provisioned nor was its configuration validated.
    com.solacesystems.jcsmp.JCSMPErrorResponseException: 503: Unknown Queue
    at com.solacesystems.jcsmp.impl.flow.BindRequestTask.execute(BindRequestTask.java:213)
    at com.solacesystems.jcsmp.impl.flow.SubFlowManagerImpl.handleAssuredCtrlMessage(SubFlowManagerImpl.java:578)
    ...
    2025-01-02 16:39:22,578 ERROR [main] o.s.c.s.b.BindingService: Failed to create consumer binding; retrying in 30 seconds
    org.springframework.cloud.stream.provisioning.ProvisioningException: Failed to connect test consumer flow to queue clientName.hostName_test.securities. Provisioning is disabled, queue was not provisioned nor was its configuration validated.
    at com.solace.spring.cloud.stream.binder.provisioning.SolaceQueueProvisioner.provisionQueue(SolaceQueueProvisioner.java:195)
    at com.solace.spring.cloud.stream.binder.provisioning.SolaceQueueProvisioner.provisionQueue(SolaceQueueProvisioner.java:150)
    ...
    Caused by: com.solacesystems.jcsmp.JCSMPErrorResponseException: 503: Unknown Queue
    at com.solacesystems.jcsmp.impl.flow.BindRequestTask.execute(BindRequestTask.java:213)
    at com.solacesystems.jcsmp.impl.flow.SubFlowManagerImpl.handleAssuredCtrlMessage(SubFlowManagerImpl.java:578)
    ...
    2025-01-02 16:39:22,587 INFO [main] c.t.m.e.Application: Started Application in 5.156 seconds (process running for 5.73)
    2025-01-02 16:39:52,589 INFO [scheduling-1] c.s.s.c.s.b.p.SolaceQueueProvisioner: Creating durable queue clientName.hostName_test.securities for consumer group hostName
    2025-01-02 16:39:52,589 INFO [scheduling-1] c.s.s.c.s.b.p.SolaceQueueProvisioner: Durable queue provisioning is disabled, clientName.hostName_test.securities will not be provisioned nor will its configuration be validated
    2025-01-02 16:39:52,589 INFO [scheduling-1] c.s.s.c.s.b.p.SolaceQueueProvisioner: Testing consumer flow connection to queue clientName.hostName_test.securities (will not start it)
    2025-01-02 16:39:52,600 INFO [Context_2_ReactorThread] c.s.j.i.f.BindRequestTask: Client-1: Got BIND ('clientName.hostName_test.securities') Error Response (503) - Unknown Queue
    2025-01-02 16:39:52,608 WARN [scheduling-1] c.s.s.c.s.b.p.SolaceQueueProvisioner: Failed to connect test consumer flow to queue clientName.hostName_test.securities. Provisioning is disabled, queue was not provisioned nor was its configuration validated.
    com.solacesystems.jcsmp.JCSMPErrorResponseException: 503: Unknown Queue
    at com.solacesystems.jcsmp.impl.flow.BindRequestTask.execute(BindRequestTask.java:213)
    ...
    2025-01-02 16:39:52,608 ERROR [scheduling-1] o.s.c.s.b.BindingService: Failed to create consumer binding; retrying in 30 seconds
    org.springframework.cloud.stream.provisioning.ProvisioningException: Failed to connect test consumer flow to queue clientName.hostName_test.securities. Provisioning is disabled, queue was not provisioned nor was its configuration validated.
    at com.solace.spring.cloud.stream.binder.provisioning.SolaceQueueProvisioner.provisionQueue(SolaceQueueProvisioner.java:195)
    at com.solace.spring.cloud.stream.binder.provisioning.SolaceQueueProvisioner.provisionQueue(SolaceQueueProvisioner.java:150)
    ...
    Caused by: com.solacesystems.jcsmp.JCSMPErrorResponseException: 503: Unknown Queue
    at com.solacesystems.jcsmp.impl.flow.BindRequestTask.execute(BindRequestTask.java:213)
    ...
    2025-01-02 16:40:22,622 INFO [scheduling-1] c.s.s.c.s.b.p.SolaceQueueProvisioner: Creating durable queue clientName.hostName_test.securities for consumer group hostName
    2025-01-02 16:40:22,622 INFO [scheduling-1] c.s.s.c.s.b.p.SolaceQueueProvisioner: Durable queue provisioning is disabled, clientName.hostName_test.securities will not be provisioned nor will its configuration be validated
    ...


  • MickeyZ
    MickeyZ Member Posts: 14 ✭✭

    @mpatel, @Aaron, @marc I wonder if you have any other suggestion how to limit the subscriber's attempts to connect to non-available queues?

  • mpatel
    mpatel Member, Employee Posts: 14 Solace Employee

    @MickeyZ,

    Here is my analysis of the exception traces you've shared:

    Referring to log statements...

    2024-12-12 WARN com.solace.spring.cloud.stream.binder.inbound.InboundXMLMessageListener [pool-9-thread-4] Received error while trying to read message from endpoint HOSTNAME.LOCATES.SECURITY.PRICE
    .....
    .....
    Caused by: com.solacesystems.jcsmp.JCSMPTransportException: (Client name: client-lon-name/nnn/nnn/dddd   Local port: -1   Remote addr: n.n.n.n  Remote port: nnn) - Timeout happened when reading response from the router.
    .....
    .....
    

    Here are some important observations:

    1. The root cause for JCSMPTransportException is connectivity issue, rather than a missing queue. The Solace broker is not reachable/resolvable from the application host for some reason.
    2. The log message indicating Received error while trying to read message from endpoint ..... suggests that the consumer had initially established a successful connection, which would only be possible if the queue existed.
    3. If the Queue is missing during startup of application, with older version (v3.4.0) you would see the same exception as you are seeing with the newer version (v4.2.0) i.e. o.s.c.s.b.BindingService: Failed to create consumer binding; retrying in 30 seconds

    I noticed your configuration includes:

    stream:
          binders:
            local-solace:
              type: solace
              environment:
                solace:
                  java:
                    host: tcp://n.n.n.n:nnn
                    msgVpn: default 
                    clientUsername: default 
                    clientPassword: default 
                    connectRetries: 3
                    connectRetriesPerHost: 0
                    reconnectRetries: 3
    

    could you help me understand:

    • Whether this exact configuration was in use?
    • If there's a possibility that reconnectRetries might be set to -1?
    • If you have any monitoring systems that might be automatically restarting the application?
    • The frequency of these exception traces in your logs? can you share how frequently JCSMPTransportException was logged..

    The reconnectRetries=3, means the reconnection attempts should have been stopped after 3 attempts, and application should have required restart, but which is not the case if the log files were flooded.

    A helpful tip: You can adjust the timing between reconnection attempts using the solace.java.reconnectRetryWaitInMillis property (default is 3 seconds). This should help minimizing the logging at the cost of delayed reconnect attempts.

    Regarding the second trace you shared:
    The error com.solacesystems.jcsmp.JCSMPErrorResponseException: 503: Unknown Queue does indicate a missing queue scenario.

    To help manage this, you might consider increasing the consumer binding retry time using:
    spring.cloud.stream.bindingRetryInterval=300 (this extends the retry window to 5 minutes, default is 30 seconds). It is a standard spring properties and applies to all consumer bindings.

    To summarize the key points:

    1. The log flooding appears to be related to connectivity issues rather than a missing queue
    2. There seems to be a discrepancy between the configuration and the observed behavior
    3. I'd recommend reviewing these configuration properties to ensure they align with your needs:
    reconnectRetryWaitInMillis:
    connectRetries: 
    reconnectRetries: 
    connectRetriesPerHost: 
    

  • mpatel
    mpatel Member, Employee Posts: 14 Solace Employee

    @MickeyZ reply directly to me with below details (as he is not able to reply or post further comments he said).

    After further consultation with colleagues who initially reported the issue we agree that the issue was not the wrong name of the queue, as we are using that name for couple of months already.

    We did never get the full explanation from the client of the reason why did happen - they just mentioned a glitch on their side.

    So even though the connection to broker (on the client side) was established, the queues were not available. Would it be possible that queues were full?

    You are absolutely correct in your conclusion that:

    "The root cause for JCSMPTransportException is connectivity issue, rather than a missing queue" and

    "that the consumer had initially established a successful connection, which would only be possible if the queue existed.".

    Regarding the second trace (I used non-existing queue names to demonstrate that application is trying to connect infinite number of times, beside the config settings that should limit the number of attempts):

    The error "com.solacesystems.jcsmp.JCSMPErrorResponseException: 503: Unknown Queue" does indicate a missing queue scenario.

    That message is clear, however even though I have used: "reconnectRetries: 3", the application did not stop - it was just trying to connect infinite number of times.

    That is similar to the previous case when the queues were not available (possibly full).

    In answer to your questions:

    Whether this exact configuration was in use?

    => correct - that was the configuration used

    If there's a possibility that reconnectRetries might be set to -1?

    => No, the configuration would never have been set to -1 at any point.

    If you have any monitoring systems that might be automatically restarting the application?

    => No, that was not the case

    The frequency of these exception traces in your logs? can you share how frequently JCSMPTransportException was logged..

    => The frequency of the exceptions was hundreds / thousands of times a second.

  • mpatel
    mpatel Member, Employee Posts: 14 Solace Employee

    @MickeyZ Thanks for confirmation about the cause of the 1st issue.

    The "reconnectRetries: 3" applies only in case of the connection issues.

    When connection is successful, and queue is not found, the Spring Cloud Stream BindingService retries creating consumer infinitely. It's not Solace specific issue.

    Also, thanks for including response to my other questions.

    The JCSMPTransportException thousands of times a second is definitely weird. The reconnection attempt should be respecting reconnectRetryWaitInMillis between reconnection attempts. Without more detailed logs it's very difficult to tell what exactly could be cause.

    With newer version, you can probably simulate the connection loss issue in your test environment and see what to expect for robustness if it ever happens in production system. I said test with new version as you are already migrating to newer version, and it includes bug fixes on top of older version.