
Slow subscriber causing Solace spool quota blow-up

Rajesh Unconfirmed, Member Posts: 2

Hi Solace experts - I am new to the Solace world and still getting to know things. We have Solace set up in our application platform and it has been running fine for a few weeks. As the number of client applications has increased, we see some slow subscribers in the SolAdmin tool. The delivery mode is set to "DIRECT" and the client apps subscribe to topics like t/env/loc/app/pub and t/env/loc/app/ssn.
In the ideal scenario, when a client application is closed it unsubscribes from all its subscribed topics and its connections disappear from SolAdmin; all good in this case. However, on some machines we see a slow subscriber: when the apps are set to auto shutdown at night (say 11:00 p.m.), not all client apps close gracefully, and one of them appears under the "slow subscribers" column in SolAdmin.
This caused the message queue to build up overnight and exceed the spool quota, which caused a much bigger problem the next morning for other consumer clients, who couldn't consume data from Solace. Everything had to be restarted to get back to normal.
I'm sure that not all client apps can be expected to unsubscribe and close connections gracefully.
So could you please tell me how to handle clients that are unable to consume messages for whatever reason, and how I can keep Solace from building up the queue and exceeding the spool quota? Ideally, I would like to identify such slow consumers and disconnect them from the topic channel in Solace itself (better still if Solace can disconnect them automatically), and not queue up messages for them, to avoid the spool quota blow-up. Sorry for the long description, but please advise...




  • TomF Member, Employee Posts: 119 Solace Employee

    Hi @Rajesh, welcome to the wonderful world of Solace!

    TL;DR: I don't think you want to persist messages, but you are. Make sure your subscribers are set up to receive DIRECT messages and that they aren't reading from queues or Topic Endpoints.

    I think the most important thing to get clear here is the interaction of DIRECT messages, PERSISTENT messages, topics and queues. So my first question is: are you using JMS?
    There is some confusing terminology to get to grips with. A DIRECT message is stored in memory - it doesn't survive a broker restart. Note I said DIRECT message - not subscriber. From your description it seems like you don't want subscribers that have closed getting current messages later when they re-connect - in other words you don't want the messages to be queued (persisted). This is what DIRECT messages are meant for. If messages are being queued and quota exceeded, then it's clear your incoming messages are being attracted to a persistent endpoint somewhere - this is what is causing the problem.

    So, are your applications getting messages from a topic or a queue? This is where the distinction between JMS and everything else becomes important. JMS has the idea of a durable subscriber. This creates a special type of persistent endpoint called a Topic Endpoint. When the subscriber is offline, the Topic Endpoint will persist messages on that topic. If you aren't using JMS, then you must have a queue endpoint somewhere that is subscribed to your topics.

    If you don't use a JMS durable subscriber or a queue, you will get exactly the behaviour you want with DIRECT messages. PubSub+ keeps track of all subscribers and flags any that it detects as slow. If PubSub+ starts to run out of the memory allocated to a client (see Message Delivery Resources), it will start to discard messages for that particular client. We're nice and set a flag for you in subsequent messages (see Message Discard Notification). If things get worse and PubSub+ starts to run out of memory because there are many slow subscribers, it will start to disconnect the slow subscribers (see Egress Buffer Management).

    In summary: I don't think you want to persist messages, but you are. Make sure your subscribers are set up to receive DIRECT messages and that they aren't reading from queues or Topic Endpoints.

  • Rajesh Unconfirmed, Member Posts: 2

    Hi Tom - thanks for the description and sorry for using confusing terms as I'm getting used to these now :).
    I checked with my admin and he confirmed that we are not using JMS. What we are using is "Direct" mode of messaging with non-durable queues, and the client applications are subscribing to topics.
    The problem we are facing is this: the client applications run fine during the day, but when one is shut down at 11:00 pm, it calls Unsubscribe() for all the topics it has subscribed to and then kills the exe.
    During this Unsubscribe() attempt, one of the client instances appears as a slow subscriber in SolAdmin and the connection appears to be open, but in the client app logs we see that the Unsubscribe() call is complete, the connection is closed and the exe is killed. The connection remains open on the broker the next day until we disconnect it manually from SolAdmin.
    In the above case, the message spool quota was breached due to the egress discards.
    My main concern here is: when clientApp.exe is killed, why do the connections remain open on the Solace side?
    How do I ensure that nothing remains on Solace when my exe is closed? We are using SolClient.Messaging.dll to call the Unsubscribe() method in a .Net client application.
    How can we configure Solace so that, if any client process shows up as a slow subscriber, Solace disconnects that client automatically?
    Thanks in advance

  • TomF Member, Employee Posts: 119 Solace Employee

    @Rajesh don't apologise for our industry's confusing terminology! :smile:
    Thanks for confirming all of this, I now have a better understanding of what you're seeing. I'm surprised the connection to your closed app is remaining open for so long: there are keepalives at both the TCP level and in the .Net API which should detect the client has disconnected. There is clearly something going on at the network level since that's how PubSub+ detects a slow subscriber (it looks at the network connection and sees congestion to the client.)

    So, to answer your question, PubSub+ does not automatically disconnect a slow subscriber unless there are many slow subscribers and it starts to run out of memory. All is not lost, though. PubSub+ has a management and monitoring API called SEMP which we can use to detect this condition and perform the disconnection.

    As an example, to return all the clients flagged as slow subscribers in the message VPN, use the following SEMP URI:
    "http://<pubsub+ ip:port>/SEMP/v2/monitor/msgVpns/<MsgVpn>/clients?where=slowSubscriber==true"
    So for instance for the broker running on my Macbook in Docker:
    curl -X GET -u admin:<password> "http://localhost:8080/SEMP/v2/monitor/msgVpns/default/clients?where=slowSubscriber==true"

    With this list you can then tell the SEMP API to disconnect each of them by sending an HTTP PUT to the action URI:
    "http://<pubsub+ ip:port>/SEMP/v2/action/msgVpns/<MsgVpn>/clients/<client name>/disconnect"
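
    The two SEMP calls above could be combined into a small script and run on a schedule. The following is only a rough sketch in Python, not an official Solace tool. The function names are made up for illustration, and it assumes that the monitor response lists clients under a top-level "data" array with a "clientName" field and that the disconnect action accepts a PUT with an empty JSON body; please check both against the SEMP v2 reference for your broker version. It does not follow paging links, so a long list of slow subscribers may need more than one pass.

```python
import base64
import json
import urllib.parse
import urllib.request


def slow_subscriber_url(base, vpn):
    """Monitor URI listing clients flagged as slow subscribers (same URI as above)."""
    vpn_q = urllib.parse.quote(vpn, safe="")
    return f"{base}/SEMP/v2/monitor/msgVpns/{vpn_q}/clients?where=slowSubscriber==true"


def disconnect_url(base, vpn, client_name):
    """Action URI for one client. Client names often contain '/' and '#',
    so the name is percent-encoded before it goes into the path."""
    vpn_q = urllib.parse.quote(vpn, safe="")
    client_q = urllib.parse.quote(client_name, safe="")
    return f"{base}/SEMP/v2/action/msgVpns/{vpn_q}/clients/{client_q}/disconnect"


def client_names(response_body):
    """Extract client names from a monitor response.
    Assumption: clients are listed under "data", each with a "clientName"."""
    payload = json.loads(response_body)
    return [client["clientName"] for client in payload.get("data", [])]


def semp_request(url, user, password, method="GET", body=None):
    """Send one SEMP request with HTTP basic auth and return the response text."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Content-Type": "application/json"}
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode()


def disconnect_slow_subscribers(base, vpn, user, password):
    """Look up slow subscribers and disconnect each one; returns their names."""
    names = client_names(semp_request(slow_subscriber_url(base, vpn), user, password))
    for name in names:
        # Assumption: the disconnect action takes a PUT with an empty JSON object.
        semp_request(disconnect_url(base, vpn, name), user, password,
                     method="PUT", body={})
    return names
```

    For the Docker broker above this would be called as disconnect_slow_subscribers("http://localhost:8080", "default", "admin", "<password>"). Logging the returned names before disconnecting anything is probably wise, since a client that is only briefly congested will be disconnected too.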

    This will fix the immediate problem of forcing the disconnection of these rogue clients. Next, we should concentrate on finding out what's happening to cause the problem. I suspect that something is causing the client operating system to not close the TCP connection properly. What we would need to do is perform a packet capture that starts just before 11pm for say 30 minutes, identify which client connection has the problem, and then we can see what's happening at the network level.
