Best Practices for handling message processing failures (Go API)

Lunastryke
Lunastryke Member Posts: 3

Hi, just wanted to know what is the best practice for handling failures when processing consumed events from a queue?

i.e. Upon consumption of an event from a queue subscribed to multiple topics, I would call an external API to send an email to a person based on the topic. However there is a chance that this API call would fail, and I would like to retry this event.

What would be the best practices in this case to attempt a retry a certain number of times before publishing to a dead letter queue?

var notificationReceiver solace.PersistentMessageReceiver

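func initNotificationSubscriber(msgService solace.MessagingService) error {
    var err error
    // Use "=" rather than ":=" so the package-level notificationReceiver is assigned, not shadowed.
    // strategy, ts and queue are assumed to be defined elsewhere in the package.
    notificationReceiver, err = msgService.CreatePersistentMessageReceiverBuilder().
        WithMissingResourcesCreationStrategy(strategy).
        WithMessageClientAcknowledgement().
        WithSubscriptions(ts...).
        Build(queue)
    return err
}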

func notificationMessageHandler(message message.InboundMessage) {
    var messageBody []byte
    if payload, ok := message.GetPayloadAsBytes(); ok {
        messageBody = payload
    }
    receivedTopic := message.GetDestinationName()
    slice := strings.Split(receivedTopic, "/")
    // Guard against topics with fewer than two levels before indexing slice[1].
    if len(slice) >= 2 && slice[0] == "payment" {
        switch slice[1] {
        case "failed":
            if err := sendEmail(messageBody, "failed"); err != nil {
                // Want to retry this here
            }
        case "success":
            if err := sendEmail(messageBody, "success"); err != nil {
                // Want to retry this here
            }
        }
    }
    notificationReceiver.Ack(message)
}

Answers

  • arih
    arih Member, Employee Posts: 125 Solace Employee
    edited April 2023 #2

    Hi @Lunastryke ,

    For starters, using a DMQ usually means applying an expiry/TTL to the message or setting a max redelivery count. Leah did a great post on this subject here: https://solace.com/blog/pubsub-message-handling-features-dead-message-queues/

    In that blog post, the approach is not to retry within the application, but to focus on processing a single event, ACK only when processing has completed successfully, and leave the retry mechanism to the broker. If you don't ACK, that message will eventually[1] be put back on the queue and redelivered to a consumer application.

    You can also run a retry loop within your function as application logic, either indefinitely or up to a certain number of times. Again, you'll only ACK the message once the API call you need has succeeded.

    So it depends on how you'd like to do (and maintain) the retry :) how many times, what intervals, order of processing, etc.

    Ari


    [1] https://solace.community/discussion/1160/clientack-behavior-in-case-ack-is-not-sent-timeout
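To make the "retry loop up to a certain number of times" option concrete, here is a minimal sketch of a bounded retry with a doubling delay. It uses only the Go standard library; retry, errUnavailable and the stand-in sendEmail are illustrative names, not part of the Solace API. The idea is that the handler ACKs only when retry returns nil.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable is a hypothetical error standing in for a failed API call.
var errUnavailable = errors.New("email API unavailable")

// retry calls op up to maxAttempts times, sleeping between failed attempts
// and doubling the delay each time. It returns nil on the first success,
// or the last error (wrapped) once the attempts are exhausted.
func retry(maxAttempts int, initialDelay time.Duration, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	delay := initialDelay
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	// Stand-in for the external email API: fails twice, then succeeds.
	sendEmail := func() error {
		calls++
		if calls < 3 {
			return errUnavailable
		}
		return nil
	}

	if err := retry(5, 10*time.Millisecond, sendEmail); err != nil {
		// In the real handler: do not ACK, so the broker can redeliver or expire the message.
		fmt.Println("not acknowledging:", err)
		return
	}
	// In the real handler: this is where the message would be ACKed.
	fmt.Println("succeeded after", calls, "calls")
}
```

Keep in mind that sleeping inside the message handler holds up the messages behind it, so the attempt count and delays are a trade-off against throughput and ordering.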

  • Lunastryke
    Lunastryke Member Posts: 3

    Hi @arih,

    Thanks for the response. So from what I understand, the broker will handle any redelivery due to exceptions within the application (i.e. the application crashed mid-processing and failed to ACK).

    If within the handler I encounter any failed operations such as an API call and want to retry the call after a certain amount of time, this should be handled within the handler code. If at the end of the handler I have decided that this message has been successfully processed, I will ACK it.

    However, what happens if after all my retries I decide that the message cannot be successfully processed? Can I simply call publish() to publish the event to a DMQ?

  • arih
    arih Member, Employee Posts: 125 Solace Employee
    edited April 2023 #4

    > the broker will handle any redelivery due to exceptions within the application (i.e. the application crashed mid processing and failed to ack).

    Yep, simply not ACKing means the message goes back to the queue for the next available consumer. The referenced blog post mentions the prerequisites, such as marking the message DMQ-eligible and setting the TTL or redelivery count. Do note that if no TTL is set, a message that is never ACKed can stay in the 'unacknowledged' state indefinitely, until the client disconnects or unbinds.

    > If within the handler i encounter any failed operations such as an API call and want to retry the call after a certain amount of time, this should be handled within the handler code.

    That is one option, but not necessarily the only way or the correct way :)

    > However, what happens if after all my retries I decided that the message cannot be successfully processed? Can i simply call publish( ) to publish the event to a DMQ ?

    In general, you can just let the message expire and be moved to the DMQ.

    There is also an upcoming feature to send a Reject or Failed ACK, so basically a Negative ACK. This might be useful if you need the message sent to the DMQ immediately and can't afford to wait for expiration.
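If waiting for expiry is not acceptable, one application-level alternative to the "can I simply publish it myself" question is to publish a copy of the message to an error topic or queue and ACK the original only once that publish has succeeded. The sketch below shows just that ordering. The publisher and acker interfaces, the inbound type and fakeBroker are hypothetical stand-ins so the example runs without a broker; they are not the Solace Go API types.

```go
package main

import (
	"errors"
	"fmt"
)

// inbound is a hypothetical, simplified view of a received message.
type inbound struct {
	topic   string
	payload []byte
}

// publisher and acker are hypothetical seams over the messaging client.
type publisher interface {
	Publish(topic string, payload []byte) error
}

type acker interface {
	Ack(msg inbound) error
}

// parkAndAck republishes msg to errorTopic and ACKs the original only if the
// publish succeeded. On publish failure the original stays unacknowledged,
// so it is not lost.
func parkAndAck(pub publisher, rcv acker, msg inbound, errorTopic string) error {
	if err := pub.Publish(errorTopic, msg.payload); err != nil {
		return fmt.Errorf("republish failed, original left unacknowledged: %w", err)
	}
	return rcv.Ack(msg)
}

// fakeBroker records calls so the sketch can run without a broker.
type fakeBroker struct {
	failPublish bool
	published   []string
	acked       []string
}

func (f *fakeBroker) Publish(topic string, payload []byte) error {
	if f.failPublish {
		return errors.New("publish rejected")
	}
	f.published = append(f.published, topic)
	return nil
}

func (f *fakeBroker) Ack(msg inbound) error {
	f.acked = append(f.acked, msg.topic)
	return nil
}

func main() {
	broker := &fakeBroker{}
	msg := inbound{topic: "payment/failed", payload: []byte("order 42")}
	err := parkAndAck(broker, broker, msg, "errors/payment/failed")
	fmt.Println(err, broker.published, broker.acked)
}
```

With a real persistent publisher, "publish succeeded" should mean the broker has confirmed the message, not just that the call returned; otherwise the original could be ACKed while the copy is lost.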

  • Aaron
    Aaron Member, Administrator, Moderator, Employee Posts: 579 admin

    I kind of agree with @Lunastryke, there are different types of error I might want to do different things with. If my consumer is trying to POST to some downstream system, and it's temporarily offline, I'd probably go into a retry loop within my consumer and just wait to see if it comes back... rather than disconnecting from the queue or NACKing the message back right away. If my consumer can't even deserialize the message, then that new feature (Reject) would come in handy, to force to DMQ.

    Note that in Spring Cloud Stream binder, for Rabbit and I think for Solace too, the app has the option of publishing the message (a copy of the message anyway) somewhere else (DMQ, error handling queue) and ACKing the original message. This is more application-level... but I'd prefer to let the retries happen inside Solace.

    Note that for the message to be made available for redelivery (without using the new Failed/Reject functionality), the consumer needs to unbind from the queue or disconnect from the broker. This doesn't always happen, even when an exception is thrown. I had a Java JCSMP app where I threw an unchecked exception out of my onReceive() method, which stopped me getting new messages, but also didn't unbind me. I had to add some code in the onException() to unbind from the queue to allow the message to get redelivered.
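The distinction above between a temporarily offline downstream system and a message that cannot even be deserialized can be sketched as a small decision function. This is only an illustration using the standard library: the action values, errPermanent, errDownstreamOffline and the payment struct are made-up names, and what "reject" maps to in practice (a negative ACK, or letting the message expire to the DMQ) depends on what your API version and broker support.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// action is what the handler should do with the message next.
type action int

const (
	actionAck    action = iota // processed successfully
	actionRetry                // transient failure: retry in the consumer
	actionReject               // permanent failure: retrying cannot help
)

// Hypothetical sentinel errors used to classify failures.
var (
	errPermanent         = errors.New("permanent failure")
	errDownstreamOffline = errors.New("downstream system offline")
)

// payment is a hypothetical payload shape.
type payment struct {
	Email string `json:"email"`
}

// decode marks malformed payloads as permanent failures.
func decode(body []byte) (payment, error) {
	var p payment
	if err := json.Unmarshal(body, &p); err != nil {
		return p, fmt.Errorf("%w: %v", errPermanent, err)
	}
	return p, nil
}

// decide maps an error to the next action; anything not marked permanent
// is treated as transient and therefore retryable.
func decide(err error) action {
	switch {
	case err == nil:
		return actionAck
	case errors.Is(err, errPermanent):
		return actionReject
	default:
		return actionRetry
	}
}

func main() {
	_, err := decode([]byte("{not json"))
	fmt.Println("malformed payload rejected:", decide(err) == actionReject)
	fmt.Println("offline downstream retried:", decide(errDownstreamOffline) == actionRetry)
}
```

Treating unknown errors as retryable is a design choice; the opposite default may suit consumers where a stuck message is worse than a dropped one.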