C# library is allocation heavy on the Send() for message arrays, any chance for improvement?

Aleksei
Aleksei Member Posts: 14

I have a market data app that uses a 3rd-party C++ library, which I wrapped in a native C interface and consume from a C# app. I exchange memory between native and managed code with zero allocations and copies, using all the latest and greatest that C# has to offer at the .NET 6 level. But the Solace C# client library is lighting up my application with allocations like a Christmas tree, and it is the only source of allocations after the initial startup of the app.

The allocations appear to come from two places in Send(IMessage[], int, int).

  1. A Dictionary used to check the uniqueness of the passed messages. It is allocated without considering that the message limit is 50, so it reallocates several times as it grows. If the limit is known to be 50 and the array passed in carries its own length, why is the dictionary not at least allocated with enough capacity up front? There is room for improvement: the dictionary can be replaced by a stackalloc array of 50 elements with a basic linear search on each insertion of an id. At worst the last insertion scans 49 elements, which is nothing in CPU terms compared to GC. Alternatively, ArrayPool<Int32> can be used to borrow memory on the heap if stackalloc is not possible for some reason.
  2. BitConverter.GetBytes(long) is used in both Send methods during the call to TagMsgAtomically. Since .NET Standard 2.1 there is an alternative, BitConverter.TryWriteBytes(Span<byte>, long), which can be used with a stackalloc or ArrayPool buffer to convert a long into bytes without allocating.
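
The first suggestion could look roughly like the sketch below. It is only an illustration: the library's internal notion of message identity is not public, so the `int` ids and the names `UniquenessCheck` and `AllUnique` are hypothetical.

```csharp
using System;

public static class UniquenessCheck
{
    // Batch limit quoted above for Send(IMessage[], int, int).
    private const int MaxBatch = 50;

    // Returns true when 'ids' contains no duplicates.
    // Hypothetical helper: the ids stand in for whatever the library keys on.
    public static bool AllUnique(ReadOnlySpan<int> ids)
    {
        if (ids.Length > MaxBatch)
            throw new ArgumentOutOfRangeException(nameof(ids));

        // Scratch space on the stack, so nothing is left for the GC to clean up.
        Span<int> seen = stackalloc int[MaxBatch];
        int count = 0;
        foreach (int id in ids)
        {
            // Linear scan of the ids recorded so far.
            if (seen.Slice(0, count).IndexOf(id) >= 0)
                return false;
            seen[count++] = id;
        }
        return true;
    }
}
```

Safe `stackalloc` into a `Span<int>` needs C# 7.2 or later; on older targets the same loop would work over a pooled heap array instead.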

As a workaround, I'm currently trying to find good moments in the market stream to run GC cycles manually, rather than waiting for garbage to pile up and be cleaned in long-pause cleanups at 30-minute intervals. Unfortunately, the only garbage in my app is generated by the C# client library, and the 3rd-party C++ lib I use doesn't let me see transparently when a good moment for a manual GC is, due to some threading design decisions made by its authors and it being a black box. So improvements to the Solace C# client library are very welcome.
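
For readers unfamiliar with the workaround, inducing a collection at a chosen moment might look like the minimal sketch below. `QuietMomentGc` and `CollectNow` are hypothetical names, and the choice of a full, blocking collection is an assumption rather than something taken from the app described here.

```csharp
using System;

public static class QuietMomentGc
{
    // Forces a full blocking collection and reports how many gen-0
    // collections were counted while it ran.
    public static int CollectNow()
    {
        int before = GC.CollectionCount(0);
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, blocking: true);
        return GC.CollectionCount(0) - before;
    }
}
```

The hard part, as described above, is not the call itself but knowing when the stream is quiet enough to make it.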

Answers

  • nicholasdgoodman
    nicholasdgoodman Member, Employee Posts: 43 Solace Employee

    Very interesting observations - and absent any direct changes to the SDK library, I am looking to see if there are any other workarounds apart from your manual garbage collection scheduling.

    Can you clarify, the performance issues you observe are not necessarily on the allocations themselves, but on the garbage collection that results from it? Or are both impacting the observed performance?

  • Aleksei
    Aleksei Member Posts: 14

    Hello @nicholasdgoodman,

    In C# allocations are cheap, so my problem is the payback in GC, which normally wouldn't be needed if the allocations didn't happen in the first place. I found a good place to induce GC manually, but to be honest I'm not happy with the result: the total time the app spends in pauses is much longer, although the pauses themselves are now at least of predictable length. This isn't a workaround I would like to keep long term.

    The profiler also shows some additional allocations of temporary arrays such as IntPtr[] and Int64[] during sends (the array version, maybe the single one too), which can probably be avoided. But the ones I mentioned are several times more frequent and can easily be worked around with backward-compatible changes to the library code.

    I'm a long-time user of Solace products, I must admit mostly in C++, but C# is becoming a high-performance language with all the memory management improvements of late. So I'm happy to sign an NDA just for a chance to contribute to the C# client API, or at least to assist you in any way without having access to the source code.

  • nicholasdgoodman
    nicholasdgoodman Member, Employee Posts: 43 Solace Employee
    edited January 2023 #4

    I see. There are some internal discussions going on about this and, as you note, the optimizations are fairly straightforward apart from the fact a number of Solace users are still leveraging significantly older (perhaps even out-of-support) versions of .NET Framework. It would be possible to address this by changing what versions of .NET are supported or by using conditional compilation.
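
    As an illustration of the conditional-compilation route, here is a minimal sketch, not the SDK's actual code. `LongBytes.Write` is a hypothetical helper, and the preprocessor symbols are the ones recent .NET SDKs define automatically per target framework; older toolchains may need them defined by hand.

    ```csharp
    using System;

    public static class LongBytes
    {
        // Writes 'value' into 'destination' (at least 8 bytes) and returns the bytes written.
        public static int Write(long value, byte[] destination)
        {
    #if NETSTANDARD2_1_OR_GREATER || NETCOREAPP2_1_OR_GREATER || NET5_0_OR_GREATER
            // Modern targets: write straight into the caller's buffer, no temporary array.
            if (!BitConverter.TryWriteBytes(destination, value))
                throw new ArgumentException("Destination too small.", nameof(destination));
    #else
            // Older targets: BitConverter.GetBytes allocates an 8-byte array on every call.
            byte[] tmp = BitConverter.GetBytes(value);
            Buffer.BlockCopy(tmp, 0, destination, 0, tmp.Length);
    #endif
            return sizeof(long);
        }
    }
    ```

    Both branches produce the same bytes, so legacy targets keep today's behaviour while newer ones avoid the per-call allocation.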

    @Aleksei, that being said, while thinking about other workarounds and the general problem, you have got me curious about minimizing the impact of the GC, and I wanted to ask how are you handling the actual IMessage instances? Although you can manually control the unmanaged resources (alloc and free calls) via the provided .Dispose() implementation, are you avoiding GC on the managed instances somehow -- or are you re-using the messages and simply updating their payloads and headers?

    Assuming there's no immediate fix to the package, are you interested in some (possibly hacky) workarounds to this issue? It may be possible to "rework" some of these methods by using reflection to grab some private IntPtr fields and P/Invoke (DllImport) to call the C API directly.

    Even if not, if I can get a working example, I will share it (with the requisite caveats).

  • Aleksei
    Aleksei Member Posts: 14

    @nicholasdgoodman you are right about older frameworks: updating to .NET Standard 2.1, if I remember correctly, would throw the classic .NET Framework out of the window. But stackalloc should be available in 2.0, and in the worst case, if it isn't, it should be possible to use a ConcurrentBag to store heap arrays of 50 elements for the uniqueness validation in the array version of Send(), replacing the dictionary. I use ConcurrentBag to store protobuf object instances for market data in this app. I put a link at the bottom with an example of building an object pool with it.

    With regard to IMessage instances, I Reset() them after Send accepts them on the wire and reuse them on the next Send. Acknowledgements come with a correlation tag set to my internal objects, so I don't need the IMessage instance beyond the return of the Send call.

    Regarding the P/Invoke, this one is interesting. If I had the pointers from the message instances and a session pointer, I could call solClient_session_sendMultipleMsg directly and still use most of the C# lib. Do you have an example, or an idea whether this is possible without modifying the library to expose those internal IntPtr fields? The last time I did trickery like that was a decade ago, and it involved reflection.

    https://learn.microsoft.com/en-us/dotnet/standard/collections/thread-safe/how-to-create-an-object-pool
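
    The ConcurrentBag idea could be sketched as follows, modelled on the linked object-pool article. `ScratchArrayPool` is a hypothetical name and the `int` element type is an assumption; the point is only that the 50-element array is allocated once and then recycled.

    ```csharp
    using System;
    using System.Collections.Concurrent;

    // Hypothetical pool of fixed-size scratch arrays for the uniqueness check.
    public sealed class ScratchArrayPool
    {
        private const int ArrayLength = 50; // batch limit quoted earlier in the thread
        private readonly ConcurrentBag<int[]> _arrays = new ConcurrentBag<int[]>();

        // Hands out a pooled array, allocating a new one only when the bag is empty.
        public int[] Rent() => _arrays.TryTake(out var array) ? array : new int[ArrayLength];

        // Puts an array back in the bag so the next Rent() can reuse it.
        public void Return(int[] array)
        {
            if (array == null || array.Length != ArrayLength)
                throw new ArgumentException("Not an array from this pool.", nameof(array));
            _arrays.Add(array);
        }
    }
    ```

    Returned arrays are not cleared, so a caller would track how many slots it has filled rather than rely on zeroed contents.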

  • nicholasdgoodman
    nicholasdgoodman Member, Employee Posts: 43 Solace Employee
    edited January 2023 #6

    So, I will start this with a disclaimer: the following is merely sample code which shows a hypothetical "how it could be done", and it bypasses many of the internal sanity checks, validations, etc. that the full .NET SDK (C wrapper) provides. Also, because it uses reflection to obtain private fields, it could break any time you upgrade the .NET SDK package.

    With that out of the way, here is a very bare-bones helper class which allows a "mostly C#" application to directly invoke the C send multiple API. Note: as we discussed, it assumes that individual IMessage instances are going to be re-used. (I would be curious to hear how you are handling ITopic references as well.)

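    using System;
    using System.Linq;
    using System.Reflection;
    using System.Runtime.InteropServices;
    using System.Security;
    using SolaceSystems.Solclient.Messaging;

    class MessageBatchSender: IDisposable
    {
        IntPtr sessionPtr;
        IntPtr[] messagePtrs;

        public MessageBatchSender(ISession session)
        {
            // Pre-create 50 reusable messages and cache their native pointers once
            this.Messages = Enumerable.Range(0, 50).Select(_ => session.CreateMessage()).ToArray();
            // Lambda (not a method group) so the IMessage overload is picked unambiguously
            this.messagePtrs = this.Messages.Select(m => this.GetPrivateIntPtr(m)).ToArray();

            this.Session = session;
            this.sessionPtr = this.GetPrivateIntPtr(session);
        }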
    
        public IMessage[] Messages { get; }
        public ISession Session { get; }
    
        public ReturnCode SendMessages(uint length, out uint messagesSent)
        {
            return SolClientSessionSendMultipleMsg(this.sessionPtr, this.messagePtrs, length, out messagesSent);
        }
    
        // This method could stop working if an SDK upgrade changes the internal implementation
        private IntPtr GetPrivateIntPtr(IMessage message)
        {
            var messageImplType = message.GetType();
            var messagePtrField = messageImplType.GetField("m_opaqueMessagePtr", BindingFlags.NonPublic | BindingFlags.Instance)
                ?? throw new MissingFieldException(messageImplType.FullName, "m_opaqueMessagePtr");
            return (IntPtr)messagePtrField.GetValue(message);
        }

        // This method could stop working if an SDK upgrade changes the internal implementation
        private IntPtr GetPrivateIntPtr(ISession session)
        {
            var sessionImplType = session.GetType();
            var sessionPtrField = sessionImplType.GetField("m_opaque", BindingFlags.NonPublic | BindingFlags.Instance)
                ?? throw new MissingFieldException(sessionImplType.FullName, "m_opaque");
            return (IntPtr)sessionPtrField.GetValue(session);
        }
    
        public void Dispose()
        {
            // Release the native resources held by each pre-created IMessage
            foreach (var message in this.Messages)
            {
                message.Dispose();
            }
        }
    
        [DllImport("libsolclient", CharSet = CharSet.Ansi, EntryPoint = "solClient_session_sendMultipleMsg", ExactSpelling = true)]
        [SuppressUnmanagedCodeSecurity]
        static extern ReturnCode SolClientSessionSendMultipleMsg(IntPtr opaqueSession, [MarshalAs(UnmanagedType.LPArray)] IntPtr[] opaqueMessages, uint msgArrayLength, out uint numMsgsSent);
    }
    

    And it can be used in this manner:

    // Create context and session instances
    using (var context = ContextFactory.Instance.CreateContext(contextProperties, null))
    using (var session = context.CreateSession(sessionProperties, null, null))
    {
        // Connect to the Solace messaging router
        Console.WriteLine($"Connecting as {username}@{vpnname} on {host}...");
        var connectResult = session.Connect();
    
        if (connectResult == ReturnCode.SOLCLIENT_OK)
        {
            Console.WriteLine("Session successfully connected.");
    
            // Create the batch publisher and the topic to publish to
            using (var publisher = new Helpers.MessageBatchSender(session))
            using (var topic = ContextFactory.Instance.CreateTopic("tutorial/topic"))
            {
                // This example assumes all messages have the same topic
                publisher.Messages[0].Destination = topic;
                publisher.Messages[0].BinaryAttachment = Encoding.UTF8.GetBytes("Msg 1");
                publisher.Messages[1].Destination = topic;
                publisher.Messages[1].BinaryAttachment = Encoding.UTF8.GetBytes("Msg 2");
                publisher.Messages[2].Destination = topic;
                publisher.Messages[2].BinaryAttachment = Encoding.UTF8.GetBytes("Msg 3");
                publisher.Messages[3].Destination = topic;
                publisher.Messages[3].BinaryAttachment = Encoding.UTF8.GetBytes("Msg 4");
    
                Console.WriteLine("Publishing messages...");
                var sendResult = publisher.SendMessages(4, out var messagesSent);
                                                         
                if (sendResult == ReturnCode.SOLCLIENT_OK)
                {
                    Console.WriteLine($"Done. Sent {messagesSent} messages.");
                }
                else
                {
                    Console.WriteLine($"Publishing failed, return code: {sendResult}");
                }
            }
        }
        else
        {
            Console.WriteLine($"Error connecting, return code: {connectResult}");
        }
    }
    
  • rehan_azam786
    rehan_azam786 Member Posts: 3
    edited November 2023 #7

    Are there any plans to expose these IntPtr fields as properties in a later version so we do not have to rely on reflection? Also, are there any plans on using {ReadOnly}Span/{ReadOnly}Memory in place of byte[] for BinaryAttachment? I am using pooled arrays to serialize our messages, but they are, most of the time, larger than the message size. It would be nice to not have to allocate arrays for setting onto messages.

    The only workaround I have is to stackalloc a byte buffer, copy into it from the Span (into which the data was serialized), and then P/Invoke the setting of the binary attachment using a pointer to the stack-allocated buffer.

    We are using C# and .NET Framework 4.8.1. The business message library I am writing is being compiled as a .NET Standard 2.0 library.

  • nicholasdgoodman
    nicholasdgoodman Member, Employee Posts: 43 Solace Employee

    There are no current plans to do so.

    The use of the multi-message Send(IMessage[], int, int) API is not particularly common or beneficial from a performance perspective except in very special circumstances, and the general recommendation is to use the single-message Send(IMessage) API unless there is a demonstrated benefit to the former.

    It would be more likely, in my opinion, for the Solace SDK to merely fix the unnecessary allocations (per the OP) internally rather than leverage a hybrid C / C# API via IntPtr properties as shared in the sample code, above.

    (Note: the code provided here is not intended for production use and is provided for demonstration purposes only.)

    Regarding the use of Span / ReadOnlySpan / Memory / ReadOnlyMemory - improvements like these would be possible and are driven by customer asks.

    One of the biggest challenges with maintaining an SDK with a very wide range of users is maintaining supportability for users who rely on legacy or even deprecated technologies. To date, Solace has maintained support for very old .NET technologies, and is still compatible with .NET Framework 2.0!

    Theoretically, a next-gen Solace .NET API would be .NET Standard compliant, and include modern language features such as Span and async / await style APIs. But until then…

  • rehan_azam786
    rehan_azam786 Member Posts: 3

    Of course, it isn't the proper way of doing this, but it does open up possibilities to interact with Solace native code without reflection and without worrying about field name changes.

    We aren't using the multi-message API. I wanted to find a way to set the binary attachment on a single IMessage without having to create a new byte[].

    To do so, I am serializing my object into an off-heap array wrapped as a Span and then passing the IntPtr of the off-heap array to the IMessage via MessageSetBinaryAttachment (below). Not proud of this code, but it is working without allocating (from my side). Do you see any challenges with this (other than the ugliness of it)?

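    // Unmanaged (off-heap) buffer wrapped as a Span; the pointer-based
    // Span<byte> constructor requires an unsafe context
    var alloc = Marshal.AllocHGlobal(length);
    var span = new Span<byte>(alloc.ToPointer(), length);

    // Binding to the native solClient_msg_setBinaryAttachment entry point
    [DllImport("libsolclient", EntryPoint = "solClient_msg_setBinaryAttachment", CharSet = CharSet.Ansi)]
    public static extern ReturnCode MessageSetBinaryAttachment(
        IntPtr opaqueMessage,
        IntPtr opaqueBuffer,
        uint bufferSize);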

  • nicholasdgoodman
    nicholasdgoodman Member, Employee Posts: 43 Solace Employee

    Very interesting approach, @rehan_azam786. Is the goal here to use a fixed or pre-allocated byte buffer which can be used multiple times as an IMessage.BinaryAttachment value? Without seeing the entire code base, I assume that you are only doing a single allocation with AllocHGlobal(…) and then reusing the unmanaged memory buffer for many messages. Is this correct?

    Not sure how you can get the IntPtr opaqueMessage reference without reflection, though; but assuming you have that value, calling the native method is essentially what the .NET SDK is doing anyway, with the added perk of being able to specify the buffer size.

    It would be nice if the SDK contained an IMessage.SetBinaryAttachment(byte[] buffer, int start, int length) method, which would more or less accomplish the same and allow re-using the same (managed) memory array over and over for the entire application lifecycle, assuming it was initialized with an appropriate initial size.

  • rehan_azam786
    rehan_azam786 Member Posts: 3

    Yup, that is exactly it - I want to pre-allocate a byte buffer and reuse it multiple times. There is also the option of using an ArrayPool or MemoryPool, but both can return a buffer larger than expected, in which case I would need to specify the start and length. Without the SDK allowing for this, either via the method you specified or via the use of Spans, ArraySegments, etc., I have to use this workaround. Hopefully, these new methods and support for Spans can be added to the SDK.
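
    To make the oversized-buffer point concrete, here is a small sketch using only standard .NET APIs. `PooledPayload.Encode` is a hypothetical helper, and nothing in it touches the Solace SDK.

    ```csharp
    using System;
    using System.Buffers;
    using System.Text;

    public static class PooledPayload
    {
        // Encodes 'text' into a rented buffer and returns the segment actually used.
        // The caller must hand segment.Array back to ArrayPool<byte>.Shared when done.
        public static ArraySegment<byte> Encode(string text)
        {
            int needed = Encoding.UTF8.GetByteCount(text);
            // Rent guarantees at least 'needed' bytes, but the array may be longer.
            byte[] rented = ArrayPool<byte>.Shared.Rent(needed);
            int written = Encoding.UTF8.GetBytes(text, 0, text.Length, rented, 0);
            return new ArraySegment<byte>(rented, 0, written);
        }
    }
    ```

    Because `segment.Array.Length` can exceed `segment.Count`, assigning the rented array directly as a whole-array attachment would include the unused tail; an overload taking a start and a length (or a Span) would accept `segment.Offset` and `segment.Count` as they are.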