Solace fail-over test - affected consuming of event

imdadareeph
imdadareeph Member Posts: 3
edited October 2024 in General Discussions #1

Hi,

While doing Solace fail-over test, once the test was initiated, the SolaceHealthIndicator Status change in following chronological order : DOWN -> RECONNECTING -> UP

We didn't observed and pod restart.

However, the consumer was in stale state, where no events were being consumed.

Note that the event consumer health was up .

Debug logs didnt help much. what might be happening. all event-services required manual re-deployment to overcome this and start consuming incoming events.


Logs
{"timestamp":"2024-10-02 09:20:045.419","level":"DEBUG","thread":"Context_2_ConsumerDispatcher","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is DOWN"},"context":"event-service-1"}
{"timestamp":"2024-10-02 09:20:045.318","level":"DEBUG","thread":"Context_2_Thread_reconnect_service","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is RECONNECTING"},"context":"event-service-1"}
{"timestamp":"2024-10-02 07:20:33.545","level":"DEBUG","thread":"Context_2_Thread_reconnect_notify_service","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is UP"}


Technologies / protocal used:
Java. Spring Webflux, Spring Integration, Spring-Cloud-Stream, JCSMP, tcp

We are publishing to a TopicEndpoint/topic and consuming from Queue/topic
We use below configuration:
consumer:
queueAdditionalSubscriptions: "SOME_TOPIC_SUBSCRIPTION"
errorQueueNameExpression: "SOME_ERROR_QUEUE"
queueNameExpression: "SOME_INPUT_QUEUE_NAME'"
queueAccessType: 0
queuePermission: 4
errorQueueAccessType: 0
errorQueuePermission: 4
errorQueueRespectsMsgTtl: true
errorQueueMaxDeliveryAttempts: 1
errorQueueMaxMsgRedelivery: 1
errorMsgDmqEligible: true
errorMsgTtl: 1000
errorQueueDiscardBehaviour: 1
provisionDurableQueue: false
addDestinationAsSubscriptionToQueue: false
autoBindErrorQueue: false
provisionErrorQueue: false
selector: SOME_MESSAGE_SELECTOR


Ref:
https://docs.solace.com/API-Developer-Online-Ref-Documentation/java/com/solacesystems/jcsmp/JCSMPChannelProperties.html#setReadTimeoutInMillis(int)
https://docs.solace.com/API/API-Developer-Guide/Configuring-Connection-T.htm

https://docs.solace.com/API/API-Developer-Guide/Java-API-Best-Practices.htm

Answers

  • imdadareeph
    imdadareeph Member Posts: 3
    edited October 2024 #2

    reffered : https://solace.community/discussion/3478/flowid-not-matching-with-incoming-message-and-silent-ignoring

    enabling DEBUG logs

    {"timestamp":"2024-10-05 11:18:14.798","level":"DEBUG","thread":"main","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is UP"},"context":"SpringBootApp"}


    {"timestamp":"2024-10-05 11:18:14.831","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.RequestResponseTask","message":{"input":"RequestResponseTask ([BRT resource=SOME_QUEUE existingFH=null flowType=CONSUMER counter=0]) startTimer "},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.850","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.protocol.impl.SmfUhUtil","message":{"input":"Ignored one or more unknown parameters in SMF header."},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.850","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.flow.SubFlowManagerImpl","message":{"input":"Client-1: Received AdCtrlMes, corrtag: 3"},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.850","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.flow.BindRequestTask","message":{"input":"Client-1: Executing response handler."},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.850","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.RequestResponseTask","message":{"input":"RequestResponseTask ([BRT resource=SOME_QUEUE existingFH=null flowType=CONSUMER counter=0]) cancelTimer "},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.855","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.timers.SubAckTimedTask","message":{"input":"Creating SubAckTimedTask."},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.855","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.flow.FlowHandleImpl","message":{"input":"Flow 0: Clear AD timer"},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.855","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.flow.BindRequestTask","message":{"input":"Client-1: Got OK bindresponse, flowName=null, flowId=499, QueueAccessType=NON_EXCLUSIVE, explicitlyActive=false"},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.855","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.flow.SubFlowManagerImpl","message":{"input":"Client-1: regActiveFlow id=499"},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.855","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.impl.flow.SubFlowManagerImpl","message":{"input":"Client-1: addManagedFlow Flowid=499"},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.856","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.flow.FlowHandleImpl","message":{"input":"Client-1:SubFlow-499: Flow 499: Clear AD timer"},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.856","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.flow.FlowHandleImpl","message":{"input":"Client-1:SubFlow-499: Flow 499: window size 0 due to startState=STOPPED"},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.856","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.flow.FlowHandleImpl","message":{"input":"Client-1:SubFlow-499: Flow 499: getWindowSize()=0"},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.856","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.flow.FlowSmfUtil","message":{"input":"tpCreateAck: flow_d: 499, TpMsgId: 0, windowSz: 0"},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.856","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.flow.FlowHandleImpl","message":{"input":"Client-1:SubFlow-499: tpCreateAck: flow_d: 499, tt_lastInOrderTpMsg: 0, tt_windowSz: 0"},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.856","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.queues.UnackedMessageList2","message":{"input":"UNACKLIST-ack>>> reason=flow-closing flow=499"},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.857","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.protocol.impl.TcpClientChannel","message":{"input":"Client-1: sendUnbindRequest (smfclient 1) sendAdCtrlRequest Response 0 for flowId 499"},"context":"SpringBootApp"}
    {"timestamp":"2024-10-05 11:18:14.857","level":"DEBUG","thread":"main","logger":"com.solacesystems.jcsmp.impl.RequestResponseTask","message":{"input":"RequestResponseTask ([UBRT resource=SOME_QUEUE flowId=499 counter=0]) startTimer "},"context":"SpringBootApp"}


    {"timestamp":"2024-10-05 11:32:12.469","level":"DEBUG","thread":"Context_2_ReactorThread","logger":"com.solacesystems.jcsmp.protocol.impl.TcpChannel","message":{"input":"Scheduling KA timer in 3000ms"},"context":"SpringBootApp"}

  • mpatel
    mpatel Member, Employee Posts: 14 Solace Employee

    @imdadareeph, can you please confirm the Spring Boot, Solace Binder and JCSMP version ?

    Thanks for sharing the consumer properties, can you also share the solace binder reconnection settings ? from solace.java.reconnect-retries etc.

    If I understand correctly, after the Solace Broker failover you were expecting the Solace binder to reconnect and consumer to continue consuming the events where it was stalled and you had to manually restart the application ??

    Looking at the logs shared in your first message, your interpretation of the events is actually opposite to what actually happened.

    Actual sequence of event is: UP (at 07:20:33.545) → RECONNECTING (09:20:045.318) → DOWN (09:20:045.419)

    {"timestamp":"2024-10-02 09:20:045.419","level":"DEBUG","thread":"Context_2_ConsumerDispatcher","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is DOWN"},"context":"event-service-1"}
    {"timestamp":"2024-10-02 09:20:045.318","level":"DEBUG","thread":"Context_2_Thread_reconnect_service","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is RECONNECTING"},"context":"event-service-1"}
    {"timestamp":"2024-10-02 07:20:33.545","level":"DEBUG","thread":"Context_2_Thread_reconnect_notify_service","logger":"com.solace.spring.cloud.stream.binder.health.base.SolaceHealthIndicator","message":{"input":"Solace connection/flow status is UP"}
    

    I suspect your solace.java.reconnect-retries values are configured low.

    If you haven't already tried with the solace.java.reconnect-retries=-1 please try it.