Publish a Unicode string using solclientjs
Hi Experts,
I am trying to publish a non-ASCII string to a Solace topic using the solclientjs library in Node.js and consume it using the Go (golang) Solace messaging library, and vice versa (i.e. Go to Node.js).
Example string: abc-nonascii_ãçï_স
I understand I have three options to do this in solclientjs: as a BinaryAttachment, as XMLContent, or as an SDT field.
However, the first option, BinaryAttachment, seems to support only latin1 encoding, so the non-ASCII characters are not sent correctly. The third option, an SDT field of ByteArray type, seems to have the same issue. XMLContent seems to be the only option that works; however, I read that it is a legacy type, and the Go library does not offer it as an explicit option either.
I have tried changing the SolClientFactory profile to version10_5, which makes the consumer decode the Unicode bytes correctly, but the publisher still fails to encode the string correctly. The docs describe the same behavior.
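For reference, the latin1 limitation can be checked in plain JavaScript, with no Solace calls involved. This is a sketch with made-up helper names (fitsInLatin1, latin1Bytes): a latin1 string stores one byte per character, so it cannot carry 'স' (U+09B8) at all, and the characters that do fit (ã, ç, ï) become one byte each instead of the two-byte UTF-8 sequences a Go consumer would expect.

```javascript
// Hypothetical helper: true if every UTF-16 code unit fits in one latin1 byte (0x00-0xFF).
function fitsInLatin1(s) {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

// Hypothetical helper: the byte values a latin1 "one char = one byte" string carries.
function latin1Bytes(s) {
  return Array.from({ length: s.length }, (_, i) => s.charCodeAt(i));
}

// The example string from the question, written with escapes to avoid editor encoding issues.
const sample = "abc-nonascii_\u00e3\u00e7\u00ef_\u09b8"; // abc-nonascii_ãçï_স
const utf8 = new TextEncoder();

console.log(fitsInLatin1(sample));              // false ('স' is U+09B8)
console.log(latin1Bytes("\u00e3"));             // [ 227 ]       i.e. 0xE3
console.log(Array.from(utf8.encode("\u00e3"))); // [ 195, 163 ]  i.e. 0xC3 0xA3
console.log(sample.length, utf8.encode(sample).length); // 18 23
```

So the same "ã" is one byte in a latin1 string but two bytes in UTF-8, which is why a consumer decoding UTF-8 can't read a latin1 payload.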
Hence I have two questions:
- What is the recommended way to transfer Unicode strings with the Solace client libraries?
- Is it expected/desirable for the publisher to behave differently from the consumer when using the same factory profile version?
Thanks
Best Answer
-
Hi there @soham. Thanks for trying to research this out yourself and post a well-thought-out question. 👍🏼
First: don't use the XMLContent portion of the payload. That's old and legacy.
Second: I'd suggest using either a regular TextMessage or a BytesMessage (binary); either way works. A TextMessage is a special type of SDT with just a single field (the text). It's (supposed to be!) UTF-8 encoded, so it definitely (should?) handle non-ASCII text. Since JavaScript strings are UTF-16, the API should handle the conversion of the text for you. I think! This is not true for a plain binary attachment (more on that below).
For JS sending / receiving a TextMessage:
// send
var message = solace.SolclientFactory.createMessage();
message.setSdtContainer(solace.SDTField.create(solace.SDTFieldType.STRING, "here is my text."));
// receive
var payload;
if (message.getType() === solace.MessageType.TEXT) { // in case someone sends a text message
    try {
        payload = JSON.parse(message.getSdtContainer().getValue()); // assumes the text is JSON
    } catch (e) {
        subscriber.log(e);
        return;
    }
}
If you're sending a JS string as a plain binary attachment, you need to convert it from UTF-16 to UTF-8 yourself. The same goes for receiving a binary-attachment string from another app. I ran into this myself with a little JS app that was trying to show my colleagues' names written in non-Latin scripts (e.g. Chinese, Japanese, ...). These are the little helper methods I ended up finding and using:
// http://ecmanaut.blogspot.hk/2006/07/encoding-decoding-utf8-in-javascript.html
function decode_utf8(s) {
    return decodeURIComponent(escape(s));
}
function encode_utf8(s) {
    return unescape(encodeURIComponent(s));
}
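Since escape()/unescape() turn out to be deprecated (see the update at the end of this thread), here's a rough equivalent built on TextEncoder/TextDecoder instead. It's a sketch, not a Solace API: encodeUtf8BinaryString and decodeUtf8BinaryString are hypothetical names, and they produce/consume the same one-character-per-byte string as encode_utf8/decode_utf8, so they could stand in for those helpers. One difference: on invalid input they substitute U+FFFD rather than throwing a URIError.

```javascript
// Sketch: TextEncoder/TextDecoder-based alternatives to encode_utf8/decode_utf8
// that avoid the deprecated escape()/unescape(). Like the originals, they work on
// "binary strings": one JS character per byte, every char code in 0x00-0xFF.
const utf8Encoder = new TextEncoder();
const utf8Decoder = new TextDecoder("utf-8"); // invalid sequences become U+FFFD

// JS (UTF-16) string -> UTF-8 bytes, packed one byte per character.
function encodeUtf8BinaryString(s) {
  let out = "";
  for (const b of utf8Encoder.encode(s)) out += String.fromCharCode(b);
  return out;
}

// Binary string (one char per byte) -> JS string, reading the bytes as UTF-8.
// Char codes above 0xFF should not occur in a binary string; they are masked to 8 bits.
function decodeUtf8BinaryString(bin) {
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i) & 0xff;
  return utf8Decoder.decode(bytes);
}

// Usage: same result as the escape-based encode_utf8, and it round-trips.
const original = "\u00e3\u00e7\u00ef \u09b8 \u00a3 \ud83c\udf89"; // "ãçï স £ 🎉"
const packed = encodeUtf8BinaryString(original);
console.log(packed === unescape(encodeURIComponent(original))); // true
console.log(decodeUtf8BinaryString(packed) === original);       // true
```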
Then the JS code for sending/receiving a BytesMessage with just a UTF-8 string as attachment should look like:
// send
var message = solace.SolclientFactory.createMessage();
message.setDestination(solace.SolclientFactory.createTopic(topic));
var jsonPayload = JSON.stringify(payload);             // still UTF-16
message.setBinaryAttachment(encode_utf8(jsonPayload)); // change to UTF-8
// or just: message.setBinaryAttachment(encode_utf8("my js string"));
// receive (decode_utf8 expects the latin1 string that getBinaryAttachment()
// returns with the older, pre-version10_5 factory profiles)
var payload;
if (message.getType() === solace.MessageType.BINARY) {
    try {
        payload = JSON.parse(decode_utf8(message.getBinaryAttachment()));
        // or just: var str = decode_utf8(message.getBinaryAttachment());
    } catch (e) {
        subscriber.log(e);
        return;
    }
}
I think that should run. I just copied/pasted from some old examples I have, so hopefully this just works. Let us know!
EDIT: let me know if the JS Text/SDT approach works. I'll test it myself eventually if you don't get back to me. You might need to do that encode/decode thing for the TextMessage as well..?
Answers
-
Hi @Aaron,
Appreciate your detailed response. Below are my findings:
- Publishing using the SDTContainer with SDTFieldType.STRING is working for me, and I am able to receive the Unicode string correctly as a binary attachment at both the Go and Node.js consumers (using factory profile version10_5).
- The encode/decode approach may not work for all characters, as discussed in the blog comments (http://disq.us/p/fg74xj). Also, it would require the conversion to be done at both producer and consumer, and any external or non-JavaScript client that can't do the same conversion will not work.
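One concrete case where the escape()-based helpers break is easy to reproduce in plain JavaScript (whether it's the exact case discussed in the linked comments isn't confirmed here): a string with a lone surrogate, i.e. half of an emoji, makes encodeURIComponent() throw a URIError, and bytes that aren't valid UTF-8 make decodeURIComponent() throw too. TextEncoder/TextDecoder don't throw; they substitute U+FFFD instead, which avoids the exception but still loses the bad character. throwsUriError is a made-up helper for the demo.

```javascript
// The escape()-based helpers from the answer above.
function encode_utf8(s) {
  return unescape(encodeURIComponent(s));
}
function decode_utf8(s) {
  return decodeURIComponent(escape(s));
}

// Hypothetical helper: true if calling fn() throws a URIError ("URI malformed").
function throwsUriError(fn) {
  try {
    fn();
    return false;
  } catch (e) {
    return e instanceof URIError;
  }
}

const loneSurrogate = "ok \ud83c"; // high surrogate with no low surrogate after it
const notUtf8 = "\xff";            // byte 0xFF never appears in valid UTF-8

console.log(throwsUriError(() => encode_utf8(loneSurrogate))); // true
console.log(throwsUriError(() => decode_utf8(notUtf8)));       // true

// TextEncoder/TextDecoder replace the bad input with U+FFFD instead of throwing.
console.log(Array.from(new TextEncoder().encode(loneSurrogate))); // [ 111, 107, 32, 239, 191, 189 ]
console.log(new TextDecoder().decode(new Uint8Array([0xff])) === "\ufffd"); // true
```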
Thanks
-
An update:
In the Node.js --> Solace --> Node.js flow, when publishing using the SDTContainer and consuming with getBinaryAttachment(), I found that a stray inverted comma (') is prefixed to the message. Could this be a bug? @Aaron
Using message.getSdtContainer().getValue() instead avoids the issue. Hence I had to include conditions in my consumer code to check for all 3 types of messages, for compatibility with diverse clients.
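A consumer that may receive either kind of message could branch on the message type, roughly like this. It's a sketch: payloadAsString is a hypothetical name, it assumes binary attachments carry UTF-8, and it ignores XML content. getType(), getSdtContainer().getValue() and getBinaryAttachment() are the same calls already used in this thread; the Uint8Array-vs-string check follows the factory-profile behavior described later in the thread.

```javascript
// Sketch of a consumer-side helper (hypothetical name) that returns the payload
// as a JS string for both text (SDT) messages and binary attachments.
// Assumes binary attachments carry UTF-8; XML content is not handled.
const utf8Decoder = new TextDecoder("utf-8");

function payloadAsString(message, solace) {
  if (message.getType() === solace.MessageType.TEXT) {
    // Text message: read the single SDT STRING field, which avoids the
    // SDT header bytes that show up when reading it as a binary attachment.
    return message.getSdtContainer().getValue();
  }
  const attachment = message.getBinaryAttachment();
  if (attachment == null) return null;
  if (attachment instanceof Uint8Array) {
    // version10_5 profile: raw bytes.
    return utf8Decoder.decode(attachment);
  }
  // Older profiles: a latin1 string, one character per byte.
  const bytes = new Uint8Array(attachment.length);
  for (let i = 0; i < attachment.length; i++) bytes[i] = attachment.charCodeAt(i) & 0xff;
  return utf8Decoder.decode(bytes);
}
```

It would be called from the session's message handler, passing the solclientjs module as `solace`.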
-
Yeah, that's because you're serializing the message as an SDT (with extra bytes defining how big the SDT field is) and deserializing it as raw binary. You can see this in the dump() of Text messages, like this one from a JCSMP sample:
Destination: Topic 'solace/samples/jcsmp/hello/aaron'
Priority: 4
Class Of Service: USER_COS_1
DeliveryMode: DIRECT
Message Id: 5
Binary Attachment: len=26
1c 1a 48 65 6c 6c 6f 20 57 6f 72 6c 64 20 66 72    ..Hello.World.fr
6f 6d 20 41 61 72 6f 6e 21 00                      om.Aaron!.
Note the 1c 1a at the beginning of the text field. That's the SDT encoding of a "text message". If I were to just take a UTF-8 string and stick it in as the binary payload, it wouldn't have that.
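To see where the extra characters come from when a text message is read with getBinaryAttachment(), the bytes in that dump can be decoded by hand. The layout used below (what looks like a type byte 0x1c, then 0x1a = 26 as the total field length, then the text, then a 0x00 terminator) is inferred from this one dump, not taken from an SDT spec, so treat it as an illustration only; real code should read text messages with getSdtContainer().getValue() rather than parse SDT itself. sdtStringBody is a made-up name.

```javascript
// Bytes copied from the JCSMP dump above (len=26).
const dump = Uint8Array.from([
  0x1c, 0x1a, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64, 0x20, 0x66, 0x72,
  0x6f, 0x6d, 0x20, 0x41, 0x61, 0x72, 0x6f, 0x6e, 0x21, 0x00,
]);
const utf8 = new TextDecoder("utf-8");

// Reading the whole field as plain bytes keeps the header and the NUL terminator.
console.log(JSON.stringify(utf8.decode(dump))); // "\u001c\u001aHello World from Aaron!\u0000"

// Assumed layout, inferred from this dump only: [type][total length][text bytes...][0x00]
function sdtStringBody(bytes) {
  const totalLength = bytes[1]; // 0x1a = 26, matching len=26
  return utf8.decode(bytes.subarray(2, totalLength - 1)); // skip 2 header bytes, drop the NUL
}
console.log(sdtStringBody(dump)); // Hello World from Aaron!
```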
Hello hello! I am updating this thread..! 🎉 I recently stumbled onto this particular issue again: a JavaScript publisher was sending a String as a raw binary attachment, and it contained the GBP symbol £. It wasn't getting encoded properly into UTF-8; it was being sent as a byte that isn't valid UTF-8 on its own. I've done some research and thought I'd post an update.
I started off using my approach above for converting to a UTF-8 string:
function encode_utf8(s) { return unescape(encodeURIComponent(s)); }
And it still works great, as expected. But I did some research, and it turns out escape()/unescape() have been deprecated for a LONG time. So while this approach still works, it is not current best practice.
I found some other posts that talk about the TextEncoder object, and it seems to work well. TextEncoder can either generate a new Uint8Array on each encode() invocation (fine when performance isn't critical), or you can preallocate the array and reuse it with encodeInto(string, array) to avoid memory thrashing.
However! I noticed that I could not get my subscriber to properly detect when I was sending a raw array in the binary attachment… it was always returning a type of String. So I checked the docs and noticed that with "older" factory profiles of our JavaScript API, it always returns a latin1 string. To fix this, all I had to do was update the factory profile to the newer 10.5: 👈🏼
var factoryProps = new solace.SolclientFactoryProperties();
factoryProps.profile = solace.SolclientFactoryProfiles.version10_5;
solace.SolclientFactory.init(factoryProps);
Then my subscriber's call to getBinaryAttachment() was returning an array as expected. Unexpectedly, I didn't even have to use the TextDecoder on the other side; JavaScript just knew that it was a UTF-8 String!?
Hopefully this helps anyone stumbling onto this in the future. The publisher should do something like:
const weirdText = "Hello World! £¥→ÐĞ🎅🏼🎉";
const encoder = new TextEncoder(); // probably best to make this global and reuse
const u8array = encoder.encode(weirdText);
message.setBinaryAttachment(u8array);
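If allocating a new array per message matters, the encodeInto() variant mentioned above could replace encode(). Its actual signature is encodeInto(string, uint8Array), returning { read, written }. A rough sketch (encodeReusing and the initial buffer size are made up): the returned view is overwritten by the next call, and whether setBinaryAttachment() copies the bytes isn't covered in this thread, so use view.slice() (a copy) unless you've confirmed it does.

```javascript
// Sketch: reuse one scratch buffer with TextEncoder.encodeInto() instead of
// allocating a new Uint8Array per encode() call.
const encoder = new TextEncoder();
let scratch = new Uint8Array(1024); // initial size is an arbitrary choice

function encodeReusing(s) {
  // Each UTF-16 code unit needs at most 3 UTF-8 bytes, so this is always enough.
  const needed = s.length * 3;
  if (scratch.length < needed) scratch = new Uint8Array(needed);
  const { written } = encoder.encodeInto(s, scratch);
  // A view into the shared buffer (no copy): it is overwritten by the next call.
  return scratch.subarray(0, written);
}

const view = encodeReusing("\u00a3\u00a5\u2192"); // "£¥→"
console.log(Array.from(view)); // [ 194, 163, 194, 165, 226, 134, 146 ]
```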
Then on the subscriber side, make sure you're using SolclientFactoryProfiles.version10_5 and the String will pop out properly formatted as expected..! 🙌🏼
If the factory profile version is left at 10, it looks like this:
[19:09:36] solace/js/test/topic: Hello World! £¥âÃÄð ð
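The â, Ã, Ä and ð in that line match the first byte of the UTF-8 encodings of →, Ð, Ğ and the emoji when each byte is shown as a latin1 character; the following bytes are mostly unprintable control characters in latin1. A small reproduction of that effect (latin1View is a made-up helper that mimics the one-char-per-byte string):

```javascript
// Hypothetical helper: view bytes the way a latin1 string would show them,
// one character per byte.
function latin1View(bytes) {
  let s = "";
  for (const b of bytes) s += String.fromCharCode(b);
  return s;
}

for (const ch of ["\u2192", "\u00d0", "\u011e", "\ud83c\udf85"]) { // → Ð Ğ 🎅
  const view = latin1View(new TextEncoder().encode(ch));
  // First character is the printable lead byte; the rest are continuation bytes.
  console.log(ch, "->", view[0], Array.from(view, (c) => c.charCodeAt(0).toString(16)));
}
// → -> â [ 'e2', '86', '92' ]
// Ð -> Ã [ 'c3', '90' ]
// Ğ -> Ä [ 'c4', '9e' ]
// 🎅 -> ð [ 'f0', '9f', '8e', '85' ]
```

Decoding the same bytes with new TextDecoder().decode(...) gives the original characters back.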