Questions on sdkperf

dreamoka

I have some questions on the sdkperf latency result.

1) Why are the 95th and 99th percentile latencies N/A?

2) How does the bucket size affect the latency measurement?

3) How does sdkperf calculate the Minimum and Average Latency for subs? Is the time based on the data packet's travel from the client to the message broker and back?



Comments

  • Aaron

    Hi @dreamoka. Happy to answer some questions about SdkPerf latency testing. Its default latency configuration is tuned for testing the hardware broker over a very short LAN (latencies usually around 20 microseconds), and I can see from your average latency of about 20ms that you're going over the WAN. So we'll definitely have to configure the buckets to be bigger.

    Also, it looks like you only received 19 latency measurement messages! That's not enough. Generally, you'd want to publish for maybe 10-30 seconds, maybe at a rate of 1000 msgs/s, with a short warmup beforehand. Make sure to add extra messages to the number you publish to account for the warmup. E.g. 10 seconds of latency measurements + 5 seconds of warmup = 15 seconds total, at 1000 msgs/s == 15000 messages total (-mn=15000 -mr=1000).
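    The message-count arithmetic above is easy to script. This is a hypothetical helper (total_messages is not part of SdkPerf); only the -mn (number of messages) and -mr (rate) switches come from the post. It simply multiplies the total run time, warmup included, by the publish rate.

```python
import math


def total_messages(warmup_s: float, measure_s: float, rate_msgs_per_s: int) -> int:
    # Publish enough messages to cover the warmup AND the measurement window,
    # since warmup samples are thrown away.
    return math.ceil((warmup_s + measure_s) * rate_msgs_per_s)


mn = total_messages(warmup_s=5, measure_s=10, rate_msgs_per_s=1000)
print(f"-mn={mn} -mr=1000")  # prints -mn=15000 -mr=1000
```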

    I recently did a YouTube stream on SdkPerf, but didn't get into latency testing that much. Perhaps I should do a whole hour on latency? Anyhow, check this: https://www.youtube.com/watch?v=BayQH6RhQuQ starting around minute 58. Maybe it will help a bit. Actually, watch the whole thing to not miss any of the basics! haha

    So let me try to answer your questions:

    1. They are N/A because your buckets don't cover a wide enough range (not enough buckets, or buckets that are too small) to capture enough of the measurements to calculate these percentiles. The default bucket size is about 1 microsecond, and the default number of buckets is 1024 (maximum 4096). So for your WAN testing, we definitely need to make the buckets bigger.
    2. SdkPerf doesn't keep track of the exact value of every single latency measurement it receives. Instead, it calculates the delta and puts it into one of 1024 (default) buckets, essentially producing a histogram. So you need to make sure your buckets cover the range of latency values you will encounter.
    3. Minimum (and maximum) is easy: it just keeps track of the lowest (highest) value it has seen. Same with the average, which is just a running total of deltas divided by the number of measurements. For the percentiles, it needs the histogram information. The measurements run from the publisher application, through the API, over the network to the broker, back out over the network to the consumer, and are taken when the app receives the message. So it's app-to-app. Note that going over the WAN like you are, you're mostly testing the network, not the broker performance.
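    To make answers 1 to 3 concrete, here is a simplified Python model of bucketed latency statistics. It is a sketch under stated assumptions, not SdkPerf's actual code: the class LatencyHistogram is hypothetical, and reporting a percentile as the upper edge of its bucket (or as None, i.e. N/A, when the percentile sample fell past the last bucket) is an assumption about how such a tool could behave. Min, max, and average are tracked exactly; only the percentiles depend on the buckets.

```python
import math


class LatencyHistogram:
    """Simplified model of bucketed latency statistics (hypothetical, not SdkPerf's code)."""

    def __init__(self, bucket_width: float, num_buckets: int = 1024):
        self.bucket_width = bucket_width
        self.counts = [0] * num_buckets
        self.overflow = 0  # samples too large for the last bucket
        self.n = 0
        self.total = 0.0
        self.min = math.inf
        self.max = -math.inf

    def record(self, delta: float) -> None:
        # Min, max and the running total are exact, independent of bucket sizing.
        self.n += 1
        self.total += delta
        self.min = min(self.min, delta)
        self.max = max(self.max, delta)
        # Only the percentile data is bucketed.
        idx = int(delta // self.bucket_width)
        if idx < len(self.counts):
            self.counts[idx] += 1
        else:
            self.overflow += 1

    def average(self) -> float:
        return self.total / self.n

    def percentile(self, p: float):
        """Upper edge of the bucket holding the p-th percentile sample,
        or None ("N/A") if that sample landed past the last bucket."""
        target = math.ceil(self.n * p / 100)
        seen = 0
        for i, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                return (i + 1) * self.bucket_width
        return None


samples_us = [19500, 20100, 20300, 21000, 35000]  # made-up WAN-like latencies
small = LatencyHistogram(bucket_width=1)   # 1024 x 1 us buckets, ~1 ms range
big = LatencyHistogram(bucket_width=64)    # 1024 x 64 us buckets, ~65 ms range
for us in samples_us:
    small.record(us)
    big.record(us)

print(small.min, small.average())  # prints 19500 23180.0
print(small.percentile(95))        # prints None
print(big.percentile(95))          # prints 35008
```

    With 1 μs buckets every ~20 ms sample lands past the last bucket, so the 95th percentile comes back N/A even though the minimum and average are still exact. With 64 μs buckets the same samples fit, at the cost of coarser (64 μs) percentile resolution.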

    To make your buckets larger, we need to use the -lg switch. Now, the weird thing is that SdkPerf Java API and SdkPerf C API measure bucket sizes in different units (Java == microseconds, C == clock cycles). And the size of each bucket is actually 2 to the power of the number you pass to the -lg switch.

    In Java, the default for -lg is 0, so 2^0 == 1, i.e. a bucket size of 1μs. With 1024 buckets, your buckets only cover a range of 1024μs, or about 1ms. Definitely not enough for your WAN testing with latencies of ~20ms. So now we do a bit of math. If -lg=8, each bucket would be 2^8 == 256μs, and 1024 buckets would cover a range up to 256*1024μs == ~262ms. So maybe -lg=6 would be better? 2^6 * 1024μs == ~65ms. That should definitely cover any weird / long latencies if your average is around 20ms. Of course, you could also increase the number of buckets to the max (-lb=4096), and then make your buckets a bit smaller while still covering the same range.
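    The Java bucket math above can be checked with a few hypothetical helpers (the function names are illustrative, not part of SdkPerf). They assume, as described, that a Java bucket is 2^lg microseconds wide and that the covered range is bucket width times bucket count. The 60 ms target below is a made-up safety margin of roughly 3× a 20 ms average.

```python
def java_bucket_width_us(lg: int) -> int:
    # Java SdkPerf: each bucket is 2**lg microseconds wide.
    return 2 ** lg


def coverage_ms(lg: int, num_buckets: int = 1024) -> float:
    # Total range spanned by all buckets, in milliseconds.
    return java_bucket_width_us(lg) * num_buckets / 1000


def smallest_lg(max_latency_ms: float, num_buckets: int = 1024) -> int:
    # Smallest -lg value whose buckets still span max_latency_ms.
    lg = 0
    while coverage_ms(lg, num_buckets) < max_latency_ms:
        lg += 1
    return lg


for lg in (0, 6, 8):
    print(f"-lg={lg}: {java_bucket_width_us(lg)} us buckets, {coverage_ms(lg)} ms range")
# -lg=0: 1 us buckets, 1.024 ms range
# -lg=6: 64 us buckets, 65.536 ms range
# -lg=8: 256 us buckets, 262.144 ms range

print(smallest_lg(60))                    # prints 6 (1024 buckets)
print(smallest_lg(60, num_buckets=4096))  # prints 4 (-lb=4096)
```

    The last line shows the trade-off mentioned above: quadrupling the bucket count lets -lg drop by 2 (16 μs buckets instead of 64 μs) while covering the same ~65 ms range.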

    I'll just mention that with C API SdkPerf, the latencies are counted in clock cycles. The default for -lg is 11, which is 2^11 == 2048 clock cycles. So on a 2GHz processor, each bucket is 2048 / 2,000,000,000 seconds wide, or ~1μs. A faster processor will have smaller buckets due to its higher clock speed. So if testing with the C API, make sure you scale up -lg from 11, not from 0 like Java.
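    The same calculation for the C API has to go through the clock rate. This is a sketch with hypothetical helper names, assuming a bucket is 2^lg cycles of a counter running at the nominal CPU frequency; the exact counter rate SdkPerf's C timer uses on your machine is an assumption here, so treat the results as estimates.

```python
def c_bucket_width_us(lg: int, clock_hz: float) -> float:
    # C SdkPerf: each bucket is 2**lg clock cycles; convert cycles to microseconds.
    return (2 ** lg) * 1e6 / clock_hz


def c_coverage_ms(lg: int, clock_hz: float, num_buckets: int = 1024) -> float:
    # Total range spanned by all buckets, in milliseconds.
    return c_bucket_width_us(lg, clock_hz) * num_buckets / 1000


def c_smallest_lg(max_latency_ms: float, clock_hz: float,
                  num_buckets: int = 1024, default_lg: int = 11) -> int:
    # Scale up from the C default of 11, not from 0 as in Java.
    lg = default_lg
    while c_coverage_ms(lg, clock_hz, num_buckets) < max_latency_ms:
        lg += 1
    return lg


print(c_bucket_width_us(11, 2e9))  # prints 1.024 (about 1 us at 2 GHz)
print(c_smallest_lg(60, 2e9))      # prints 17
print(c_smallest_lg(60, 3e9))      # prints 18: faster clock, smaller buckets
```

    At 2 GHz, -lg=17 gives ~65.5 μs buckets, roughly matching Java's -lg=6; on a 3 GHz machine one more step is needed to cover the same (hypothetical) 60 ms range.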

    Wow, this is a long response.

    Finally, make sure you're sending and receiving on the SAME computer! That's because of clock synchronization: the send and receive timestamps must come from the same clock. If you want to test end-to-end across the WAN from one app to another, then configure the remote side as a reflector/replier:

        -cm=string      Client mode.  One of 'reply' or 'sink'.  Default is 'sink'.
                        In reply, all messages are reflected by the replyTo topic is used as the destination when sending.
    

    ... and still capture the latency measurements on the same computer, then divide the latencies by 2 to estimate the one-way latency from the round-trip time (RTT).
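    The round-trip approach boils down to one subtraction and one division. This is a hypothetical sketch, not SdkPerf code: the function name and timestamps are made up. It assumes both timestamps are read from the publisher host's clock (the remote side only reflects in -cm=reply mode), and that the outbound and return paths take equal time, which is only an approximation on a WAN.

```python
def estimate_one_way_us(send_ts_ns: int, recv_ts_ns: int) -> float:
    # Both timestamps come from the SAME host's clock (publisher sends, the
    # remote reflector sends it back, publisher receives), so any offset
    # between the two hosts' clocks never enters the delta.
    rtt_us = (recv_ts_ns - send_ts_ns) / 1000
    # Halving assumes the outbound and return paths take equal time.
    return rtt_us / 2


sent_ns = 1_000_000_000         # hypothetical send timestamp
back_ns = sent_ns + 40_200_000  # reflected copy arrives 40.2 ms later
print(estimate_one_way_us(sent_ns, back_ns))  # prints 20100.0
```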

    Hope that helps. Let us know your command line and which API you're testing with, and what exactly you're trying to test, and we can help further. 👍🏼