Solace fail-over test - affected consuming of event
Hi,
While doing Solace fail-over test, once the test was initiated, the SolaceHealthIndicator Status change in following chronological order : DOWN -> RECONNECTING -> UP
We didn't observed and pod restart.
However, the consumer was in stale state, where no events were being consumed.
Note that the event consumer health was up .
Debug logs didnt help much. what might be happening. all event-services required manual re-deployment to overcome this and start consuming incoming events.
Logs
{"timestamp":"2024-10-02 09:20:045.419","level":"DEBUG","thread":"Context_2_ConsumerDispatcher","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is DOWN"},"context":"event-service-1"}
{"timestamp":"2024-10-02 09:20:045.318","level":"DEBUG","thread":"Context_2_Thread_reconnect_service","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is RECONNECTING"},"context":"event-service-1"}
{"timestamp":"2024-10-02 07:20:33.545","level":"DEBUG","thread":"Context_2_Thread_reconnect_notify_service","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is UP"}
Technologies / protocal used:
Java. Spring Webflux, Spring Integration, Spring-Cloud-Stream, JCSMP, tcp
We are publishing to a TopicEndpoint/topic and consuming from Queue/topic
We use below configuration:
consumer:
queueAdditionalSubscriptions: "SOME_TOPIC_SUBSCRIPTION"
errorQueueNameExpression: "SOME_ERROR_QUEUE"
queueNameExpression: "SOME_INPUT_QUEUE_NAME'"
queueAccessType: 0
queuePermission: 4
errorQueueAccessType: 0
errorQueuePermission: 4
errorQueueRespectsMsgTtl: true
errorQueueMaxDeliveryAttempts: 1
errorQueueMaxMsgRedelivery: 1
errorMsgDmqEligible: true
errorMsgTtl: 1000
errorQueueDiscardBehaviour: 1
provisionDurableQueue: false
addDestinationAsSubscriptionToQueue: false
autoBindErrorQueue: false
provisionErrorQueue: false
selector: SOME_MESSAGE_SELECTOR
Ref:
https://docs.solace.com/API-Developer-Online-Ref-Documentation/java/com/solacesystems/jcsmp/JCSMPChannelProperties.html#setReadTimeoutInMillis(int)
https://docs.solace.com/API/API-Developer-Guide/Configuring-Connection-T.htm
https://docs.solace.com/API/API-Developer-Guide/Java-API-Best-Practices.htm
Answers
-
reffered : https://solace.community/discussion/3478/flowid-not-matching-with-incoming-message-and-silent-ignoring
enabling DEBUG logs
{"timestamp":"2024-10-05 11:18:14.798","level":"DEBUG","thread":"main","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is UP"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.831","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.RequestResponseTask","message":{"input":"RequestResponseTask ([BRT resource=SOME_QUEUE existingFH=null flowType=CONSUMER counter=0]) startTimer "},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.850","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.protocol.impl.SmfUhUtil","message":{"input":"Ignored one or more unknown parameters in SMF header."},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.850","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.flow.SubFlowManagerImpl","message":{"input":"Client-1: Received AdCtrlMes, corrtag: 3"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.850","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.flow.BindRequestTask","message":{"input":"Client-1: Executing response handler."},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.850","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.RequestResponseTask","message":{"input":"RequestResponseTask ([BRT resource=SOME_QUEUE existingFH=null flowType=CONSUMER counter=0]) cancelTimer "},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.855","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.timers.SubAckTimedTask","message":{"input":"Creating SubAckTimedTask."},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.855","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.flow.FlowHandleImpl","message":{"input":"Flow 0: Clear AD timer"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.855","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.flow.BindRequestTask","message":{"input":"Client-1: Got OK bindresponse, flowName=null, flowId=499, QueueAccessType=NON_EXCLUSIVE, explicitlyActive=false"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.855","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.flow.SubFlowManagerImpl","message":{"input":"Client-1: regActiveFlow id=499"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.855","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.flow.SubFlowManagerImpl","message":{"input":"Client-1: addManagedFlow Flowid=499"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.856","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.flow.FlowHandleImpl","message":{"input":"Client-1:SubFlow-499: Flow 499: Clear AD timer"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.856","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.flow.FlowHandleImpl","message":{"input":"Client-1:SubFlow-499: Flow 499: window size 0 due to startState=STOPPED"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.856","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.flow.FlowHandleImpl","message":{"input":"Client-1:SubFlow-499: Flow 499: getWindowSize()=0"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.856","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.flow.FlowSmfUtil","message":{"input":"tpCreateAck: flow_d: 499, TpMsgId: 0, windowSz: 0"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.856","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.flow.FlowHandleImpl","message":{"input":"Client-1:SubFlow-499: tpCreateAck: flow_d: 499, tt_lastInOrderTpMsg: 0, tt_windowSz: 0"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.856","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.queues.UnackedMessageList2","message":{"input":"UNACKLIST-ack>>> reason=flow-closing flow=499"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.857","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.protocol.impl.TcpClientChannel","message":{"input":"Client-1: sendUnbindRequest (smfclient 1) sendAdCtrlRequest Response 0 for flowId 499"},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:18:14.857","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.RequestResponseTask","message":{"input":"RequestResponseTask ([UBRT resource=SOME_QUEUE flowId=499 counter=0]) startTimer "},"context":"SpringBootApp"}
{"timestamp":"2024-10-05 11:32:12.469","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.protocol.impl.TcpChannel","message":{"input":"Scheduling KA timer in 3000ms"},"context":"SpringBootApp"}0 -
@imdadareeph, can you please confirm the Spring Boot, Solace Binder and JCSMP version ?
Thanks for sharing the consumer properties, can you also share the solace binder reconnection settings ? from
solace.java.reconnect-retries
etc.If I understand correctly, after the Solace Broker failover you were expecting the Solace binder to reconnect and consumer to continue consuming the events where it was stalled and you had to manually restart the application ??
Looking at the logs shared in your first message, your interpretation of the events is actually opposite to what actually happened.
Actual sequence of event is: UP (at 07:20:33.545) → RECONNECTING (09:20:045.318) → DOWN (09:20:045.419)
{"timestamp":"2024-10-02 09:20:045.419","level":"DEBUG","thread":"Context_2_ConsumerDispatcher","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is DOWN"},"context":"event-service-1"} {"timestamp":"2024-10-02 09:20:045.318","level":"DEBUG","thread":"Context_2_Thread_reconnect_service","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is RECONNECTING"},"context":"event-service-1"} {"timestamp":"2024-10-02 07:20:33.545","level":"DEBUG","thread":"Context_2_Thread_reconnect_notify_service","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is UP"}
I suspect your
solace.java.reconnect-retries
values are configured low.If you haven't already tried with the
solace.java.reconnect-retries=-1
please try it.0