How to check the publisher of a TOPIC?

SasikumarSP
SasikumarSP Member Posts: 31 ✭✭

Hi All,
Can you please share a command or SEMP query to get the publisher client details for a particular topic?
For a queue, we can identify the sender through the ingress flow on the client, but I don't see an equivalent option for a topic publisher in SolAdmin or PubSub+ Manager.

Comments

  • uherbst
    uherbst Member, Employee Posts: 127 Solace Employee

    Hi @SasikumarSP,

    have you tried

    show client *

    or

    show client * detail

    ?
    What exactly do you want to know ?

    Uli

  • SasikumarSP
    SasikumarSP Member Posts: 31 ✭✭
    edited September 2021 #3

    @uherbst
    Thanks for the reply.
    I want to know which client is publishing on a given topic, or, given a client name, which topics that client is publishing to.

    e.g.,
    1. Solace/Test/01 is a topic name. I want to know the name of the client that is publishing messages on this topic.
    or
    2. sasi-server-01/3456/#000f8d is the name of a client connected to a message VPN to publish messages on a topic. I want to check which topic this client is publishing to.

    When I open a client connection in SolAdmin or PubSub+ Manager, there is an Ingress/Egress (or producer/consumer) tab. This helps me identify the queue name if the client sends messages to or consumes messages from a queue. For a topic publisher, however, I can see only the ingress flow name; there is no topic name (not even in the subscription tab).

    show client * does not show which topic a topic publisher sends messages to either.

  • uherbst
    uherbst Member, Employee Posts: 127 Solace Employee

    Short answer: You can't map topics to publishers.

    You can check the publish ACLs (if set) to limit the number of clients that are able to publish to a specific topic. But you can't identify which message or which topic originates from which client.

  • SasikumarSP
    SasikumarSP Member Posts: 31 ✭✭

    Thanks for the update @uherbst
    Is there any roadmap to add this feature in an upcoming release?

  • ChristianHoltfurth
    ChristianHoltfurth Member, Employee Posts: 73 Solace Employee

    Hi @SasikumarSP ,

    I'm curious to understand what you are trying to achieve. Could you explain in a bit more detail why you need to know the publisher of a topic and which part of your system would like to collect/check that information?

    The reason I'm asking is that in a pub/sub system you typically want some decoupling between the publishers and the subscribers, so the subscribers shouldn't really care about who is publishing the data that they are receiving.
    To enforce that only authorised systems publish data to certain topics, you would use ACLs, e.g. to enforce that only the master data system can send updates on topics related to the objects in the master data catalog.

    Do you have auditing requirements that demand you collect the publisher information?
    If so, could distributed tracing & OpenTelemetry help in your case?
    See https://solace.com/blog/what-is-distributed-tracing-and-how-does-opentelemetry-work/ for some more information on that topic.

    Best regards,
    Christian

  • TomF
    TomF Member, Employee Posts: 409 Solace Employee

    @SasikumarSP, another point is that in Solace, topics are completely dynamic. They are just a tag - an address - on a message. Given suitable permissions, any publisher can publish on any topic. This means it isn't possible to know what publishers are publishing on what topic, because the next message a given publisher sends could be to any topic it has permissions to publish on.

    So PubSub+ could tell you on what topics a publisher has published messages - but you'd have to track every single message that publisher has sent - which is why Christian pointed you at OpenTelemetry. But even then, you can't tell what the next topics your publishers will publish to. For that, you'd have to ask the publishers themselves.
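
As a rough illustration of what "tracking every single message" would involve, here is a minimal Python sketch that tallies topics per publisher from a list of observed messages. It does not use any Solace API. The `ObservedMessage` record and its `publisher` field are hypothetical and assume that some identifier for the publisher is available on each observed message, which the broker does not provide by itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedMessage:
    """One message seen by an audit consumer (hypothetical record shape)."""
    publisher: str  # publisher identifier, "" if the message carries none
    topic: str      # destination topic the message was published to


def topics_by_publisher(messages):
    """Return {publisher: {topic: message_count}} for the observed messages."""
    seen = defaultdict(lambda: defaultdict(int))
    for msg in messages:
        # Messages without an identifier cannot be attributed to anyone.
        publisher = msg.publisher or "<unknown>"
        seen[publisher][msg.topic] += 1
    return {publisher: dict(topics) for publisher, topics in seen.items()}


observed = [
    ObservedMessage("sasi-server-01", "Solace/Test/01"),
    ObservedMessage("sasi-server-01", "Solace/Test/01"),
    ObservedMessage("sasi-server-01", "Solace/Test/02"),
    ObservedMessage("", "Solace/Test/01"),
]
report = topics_by_publisher(observed)
```

The result only describes messages that were already observed; it says nothing about which topic a publisher will use for its next message.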

  • Tamimi
    Tamimi Member, Administrator, Employee Posts: 531 admin

    And to add to Tom's point, since Solace topics are completely dynamic, you can include some metadata in the topic during topic design that gives you an idea of the publisher, for example <topic_level1>/<topic_level2>/{sensorID} or <topic_level1>/<topic_level2>/{organization}.
    You might find this page helpful: Topic Architecture Best Practices
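
If the publisher hint is carried in a fixed topic level as suggested above, a consumer can read it back from the destination topic. The following Python sketch assumes the hint sits in the third level (index 2); the function name `publisher_hint` and the sample topic are made up for illustration.

```python
def publisher_hint(topic, level_index=2):
    """Return the topic level assumed to identify the publisher, or None.

    level_index is an assumption about the topic design; adjust it to
    wherever your hierarchy places the sensor ID or organization.
    """
    levels = topic.split("/")
    if level_index >= len(levels):
        return None  # topic is too short to contain the hint level
    return levels[level_index] or None  # treat an empty level as missing


hint = publisher_hint("plant/telemetry/sensor-42")
missing = publisher_hint("plant/telemetry")
```

This only works if every publisher follows the naming convention, so it is a design-time agreement rather than something the broker checks for you.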

  • SasikumarSP
    SasikumarSP Member Posts: 31 ✭✭

    Thanks all for sharing the information!

    We often receive this question from downstream applications: who is publishing messages on this topic?

    Let's take the example of a data warehouse application. It just consumes messages from a queue endpoint and does not know who is publishing messages to the topic that the queue subscribes to. There is no information about the clients publishing on that topic in Solace PubSub+ Manager or in the event.log.

    If event.log recorded the topic or queue name when an ingress or egress flow is opened, it would help us identify the clients connected to topics and queues. It would also help us get stats for the topic (total ingress and egress).

    For an application that uses direct transport to publish messages, the topic name could be added to the client_connect event log.

    I understand that topics are dynamic and that the client connection string changes every time the client connects. The Solace log already records the application's connect event, and it would be a great help if the destination name were included in the same log event.

    Thanks!

  • ChristianHoltfurth
    ChristianHoltfurth Member, Employee Posts: 73 Solace Employee
    edited September 2021 #10

    Hi @SasikumarSP ,

    Thanks for the additional information. You have given us the reason why you are asking to get that information. What I'm not clear on yet is why the downstream applications are asking for the information?

    Let's look at this problem/question from various angles:

    Runtime aspect:
    In a decoupled, EDA-based system, consumers shouldn't need to care, message by message, who explicitly published each one, in my opinion. As long as a publisher is authorised to publish to a specific topic, it should be treated as a valid source.
    If they really need to know, then maybe that information should become part of the actual message itself or it could be added to the topic space?

    Governance aspect:
    For operational and governance reasons you actually do want to know who's publishing and receiving certain messages, so you know who's affecting who if payloads need to change etc.
    For that you may want to look at Solace's Event Portal as it will help you design, discover (via discovery scans on your broker) and document your event flows on your EDA system.
    There are also scans that you can run to discover which topics are the busiest in a given timeframe.

    Statistics & Billing aspects:
    If you are looking for stats on how many messages a publisher is sending or a subscriber is receiving, you might want to look at the stats on the clients themselves or on their respective endpoints (queues).
    My question would be though why you would need rates on specific topics?
    And how would you cut your topics?
    If you designed your topic space with Solace's hierarchical and dynamic topic support in mind, you might have just one message (or a few) per topic if you include very dynamic attributes such as object or payment IDs.
    Topics are very dynamic in Solace; there's no long-lived administrative object associated with them.
    Maybe what you are really looking for is how many messages match a certain subscription?
    e.g. if you are publishing to something like
    <domain>/<business_area>/<business_object_type>/<business_object_id>/<status>
    Then maybe you want to know all messages that were published to something like
    sales&marketing/sales/product/product_xyz/> to count the number of messages (maybe correlating to sales in this case) for a specific product?
    There are several ways how you could go about this.
    Maybe you could look at the queue stats of a queue that has a subscription matching the above to get those numbers.
    Or you could create another audit queue that matches that specific subscription.
    Or you could go down the OpenTelemetry route and collect stats for everything that is happening in your system.
    Or you could run periodic discovery scans to get an idea of the activity during specific time windows...

    Hope I've given you some ideas that help!
    Let me know, if you have any further questions.

    Best regards,
    Christian
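
To make the "how many messages match a certain subscription" idea concrete, here is a small Python sketch that counts published topics matching a subscription. It is a simplified model of Solace wildcard matching, not the broker's implementation: it only handles `*` as a whole level and `>` as the final level (matching one or more remaining levels), and it ignores prefix wildcards such as `abc*`. The function names and the sample topics are invented for illustration.

```python
def matches(subscription, topic):
    """Simplified Solace-style match: whole-level '*' and trailing '>' only."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    if sub_levels[-1] == ">":
        prefix = sub_levels[:-1]
        # '>' needs at least one more level after the prefix.
        if len(topic_levels) <= len(prefix):
            return False
        # Compare only the levels in front of the '>'.
        topic_levels = topic_levels[:len(prefix)]
        sub_levels = prefix
    if len(sub_levels) != len(topic_levels):
        return False
    return all(s == "*" or s == t for s, t in zip(sub_levels, topic_levels))


def count_matching(subscription, topics):
    """Count how many of the given topics match the subscription."""
    return sum(1 for topic in topics if matches(subscription, topic))


published = [
    "sales&marketing/sales/product/product_xyz/sold",
    "sales&marketing/sales/product/product_xyz/returned",
    "sales&marketing/sales/product/product_abc/sold",
    "sales&marketing/sales/product/product_xyz",
]
count = count_matching("sales&marketing/sales/product/product_xyz/>", published)
```

On a real broker you would not reimplement matching; a queue with that subscription and its message counters gives you the same number, as described above.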

  • SasikumarSP
    SasikumarSP Member Posts: 31 ✭✭

    Thanks for the detailed information @ChristianHoltfurth

    Most of the use cases I came across that asked for this information use static topics. We can consider the topic naming structure for new development, but systems that have been running for ages may not change immediately.

    Let's consider a scenario with the data warehouse app:
    The data warehouse application faced an issue with its consumer process, which may not be up for a couple of hours. Based on the business use case, it doesn't have a DR setup. There are more than 100 topics and all of them are static. Now they want to inform only the publishers that are sending messages to a particular topic. It is not possible to find them using live connections or logs. The queue can be shut down, but it's still not worth sending millions of messages when no one consumes them. The queue spool quota cannot be increased to store all messages because that appliance is used by 20+ applications. If the CLIENT_CONNECT event included the topic name, it would be easy to get the list of servers and inform them.

    Discovery agent:
    It has a great feature for finding the topic structure even if the wildcard (>) is used. I have seen the agent report the subscriber ID for the consumer; it also subscribes directly to the topic name and then finds the message rate. As far as I know, the discovery agent will not find the publisher connection. Please correct me if I am wrong.

    Stats:
    It's possible to get the stats from the queue, but sometimes we get a query about message loss. The publisher says it sent all messages, but the receiver did not get the same count. We know this is not a Solace issue (from checks on queue stats and logs), but we cannot find the publisher of that particular topic to get its stats. There could be 20+ topics and 600+ connections (both consumer and producer) during that time frame.

    MNR:
    This is a direct message delivery setup, but we still get queries from users who want the stats for a topic. Applications use 100+ topic strings to publish messages, and they want to know the message count for a particular topic during a time frame.
    Message VPN stats help with the overall count and rate, but not with a topic pattern.

    I feel it's worth having a feature that records the destination name (topic and endpoint names) in the event log with the CLIENT_CONNECT event, especially for a publisher.

    Thanks!

  • ChristianHoltfurth
    ChristianHoltfurth Member, Employee Posts: 73 Solace Employee

    Hi @SasikumarSP ,

    Sorry for the delayed response. I have been on holiday for a bit and only managed to get back to your post now.
    Thanks for the additional background on your use case, very insightful.

    I think what you probably need is a combination of good documentation of your flows in Event Portal + additional stats on topics at runtime once Solace has added OpenTelemetry support into the product.

    For now, you could also log sender information from each message, if applications are adding something meaningful here. See https://solace.com/blog/inside-a-solace-message-using-header-properties/ for some details on the headers of a message, in particular the senderId field.

    For your data warehouse use case:
    Does your data warehouse application go down or fall behind regularly?
    Are other applications receiving the same messages?

    In a publish/subscribe architecture it can be tricky to deal with one consumer falling way behind or being offline for a long time, if others are receiving the same messages and want to keep receiving messages.
    Typically your data warehouse application being offline should not justify stopping other critical business processes from receiving their real-time data.
    Typically you would try to size the queues for your data warehouse application accordingly to buffer all the data. But if that is not possible, because you are looking at buffering gigabytes or terabytes of data (Solace currently supports up to 6 TB spool size), then you may want to look at staging your data in another cheaper and highly available storage solution before processing it with your data warehouse application.
    Solace is very good for real-time data distribution, but slow, lower-priority consumers might be better off catching up from a storage solution that is designed for large volumes and longer-term storage (think several terabytes of data).

  • SasikumarSP
    SasikumarSP Member Posts: 31 ✭✭

    Hi @ChristianHoltfurth

    Thanks for the input!

    The data warehouse app does not go down frequently, but when it does, it may take time to get it up and running again, depending on the issue. The consumer app asks us to stop the publisher, so it's a bit tricky for us to identify the existing application flow.

    We will consider the documentation and add publisher details in the message.

    We have another use case very similar to the Event Portal: say we have 100+ applications already using Solace, and the flows are not documented. Now we want to create an end-to-end flow diagram based on the live connections or the event log history. Is there a way to identify the publisher server/IP for each topic? As mentioned, I don't think the discovery agent gets the publisher details of a topic. Please let me know if that feature has been added.

  • ChristianHoltfurth
    ChristianHoltfurth Member, Employee Posts: 73 Solace Employee

    Hi @SasikumarSP ,
    You are right in that it's not possible to determine the publisher of a message easily at the moment. I mentioned using header fields like senderId to determine who published a message, but that field is optional and requires your publishers to provide it.
    If they don't, then you have no way of determining who published the message.

    The exception is if you have the publisher ID built into your topic hierarchy and are enforcing this via ACLs. This is described in a topic best practices blog post by Ken Barr and referred to as SourceId here: https://solace.com/blog/topic-hierarchy-best-practices/

    If you don't have either of these options available to you right now, then distributed tracing might provide you with a way of acquiring this information in the future when that feature gets added to our brokers.
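
The SourceId approach can be pictured with a short Python sketch of the rule such an ACL would enforce: a client may only publish to topics whose source level equals its own identity. This is a model of the idea only, not broker configuration or a Solace API. The function name, the position of the source level, and the sample client names and topic are assumptions.

```python
def publish_allowed(client_username, topic, source_level=1):
    """Return True if the topic's source level equals the client's username.

    source_level is an assumed position of the SourceId in the hierarchy.
    """
    levels = topic.split("/")
    return len(levels) > source_level and levels[source_level] == client_username


topic = "catalog/masterdata/product/updated"
owner_ok = publish_allowed("masterdata", topic)
other_ok = publish_allowed("webshop", topic)
```

With a rule like this enforced on the broker, a consumer can trust that the source level of a received topic names the actual publisher.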