Python client reconnection failure
Hi,
I use the Python package solace-pubsubplus v1.2.0 to connect to Solace. My client has the following reconnection settings: reconnect retries = 20 and reconnect retry wait = 3000 ms. Here is my code:
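from solace.messaging.config.retry_strategy import RetryStrategy
from solace.messaging.messaging_service import MessagingService

messaging_service = MessagingService.builder().from_properties(broker_props) \
    .with_reconnection_retry_strategy(RetryStrategy.parametrized_retry(20, 3000)) \
    .build()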
But each time Solace performs a scheduled HA failover, my client fails to reconnect to the server. Below are the error messages from my error handler (it just prints the error reason and message). Even though I have reconnection settings configured, I don't see any reconnection attempts on the client side after the failure. Any ideas how to resolve this issue?
2022-03-23 08:00:05,423 ERROR [error_handlers.py:on_reconnecting():48] - Error on_reconnecting
2022-03-23 08:00:05,424 ERROR [error_handlers.py:on_reconnecting():49] - Error cause: {'caller_description': 'From service event callback', 'return_code': 'Ok', 'sub_code': 'SOLCLIENT_SUBCODE_COMMUNICATION_ERROR', 'error_info_sub_code': 14, 'error_info_contents': 'TCP: Could not read from socket 13, error = Connection reset by peer (104)'}
2022-03-23 08:00:05,424 ERROR [error_handlers.py:on_reconnecting():50] - Message: TCP: Could not read from socket 13, error = Connection reset by peer (104)
2022-03-23 08:00:09,699 ERROR [error_handlers.py:on_reconnecting():48] - Error on_reconnecting
2022-03-23 08:00:09,699 ERROR [error_handlers.py:on_reconnecting():49] - Error cause: {'caller_description': 'From service event callback', 'return_code': 'Ok', 'sub_code': 'SOLCLIENT_SUBCODE_COMMUNICATION_ERROR', 'error_info_sub_code': 14, 'error_info_contents': 'TCP: Could not read from socket 14, error = Connection reset by peer (104)'}
2022-03-23 08:00:09,699 ERROR [error_handlers.py:on_reconnecting():50] - Message: TCP: Could not read from socket 14, error = Connection reset by peer (104)
2022-03-23 08:00:10,082 ERROR [error_handlers.py:on_service_interrupted():53] - Error on_service_interrupted
2022-03-23 08:00:10,082 ERROR [error_handlers.py:on_service_interrupted():54] - Error cause: {'caller_description': 'From service event callback', 'return_code': 'Unknown (503)', 'sub_code': 'SOLCLIENT_SUBCODE_SERVICE_UNAVAILABLE', 'error_info_sub_code': 115, 'error_info_contents': 'Service Unavailable'}
2022-03-23 08:00:10,082 ERROR [error_handlers.py:on_service_interrupted():55] - Message: Service Unavailable
2022-03-23 08:00:13,091 ERROR [error_handlers.py:on_service_interrupted():53] - Error on_service_interrupted
2022-03-23 08:00:13,091 ERROR [error_handlers.py:on_service_interrupted():54] - Error cause: {'caller_description': 'From service event callback', 'return_code': 'Unknown (503)', 'sub_code': 'SOLCLIENT_SUBCODE_SERVICE_UNAVAILABLE', 'error_info_sub_code': 115, 'error_info_contents': 'Service Unavailable'}
2022-03-23 08:00:13,091 ERROR [error_handlers.py:on_service_interrupted():55] - Message: Service Unavailable
Thanks,
Sergiy
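The error handler that produced the log above is not shown. For anyone reproducing this, below is a minimal sketch of a handler that logs all three service events, including the on_reconnected callback, so a successful reconnect shows up as clearly as a failed one. It is modelled on the ServiceEventHandler used in the Solace Python samples; the solace listener base classes are deliberately left out so the sketch runs on its own, and the `history` list is a hypothetical addition for inspecting the event sequence.

```python
import logging

logger = logging.getLogger("solace_service_events")


class ServiceEventHandler:
    """Logs reconnect attempts, successful reconnects and interruptions.

    Sketch only: in real code this class would also subclass
    ReconnectionListener, ReconnectionAttemptListener and
    ServiceInterruptionListener from solace.messaging.messaging_service.
    They are omitted here so the example runs without solace-pubsubplus.
    """

    def __init__(self):
        # Hypothetical in-memory record of (event kind, message) pairs.
        self.history = []

    def _record(self, kind, event):
        cause = event.get_cause()
        message = event.get_message()
        self.history.append((kind, message))
        logger.warning("%s: cause=%s message=%s", kind, cause, message)

    def on_reconnecting(self, event):
        # Connection was lost and the API is trying to reconnect.
        self._record("on_reconnecting", event)

    def on_reconnected(self, event):
        # A reconnect attempt succeeded.
        self._record("on_reconnected", event)

    def on_service_interrupted(self, event):
        # The API has given up on the connection.
        self._record("on_service_interrupted", event)


# Registration on a built MessagingService, as done in the Solace samples:
# handler = ServiceEventHandler()
# messaging_service.add_reconnection_listener(handler)
# messaging_service.add_reconnection_attempt_listener(handler)
# messaging_service.add_service_interruption_listener(handler)
```

If on_reconnecting entries are followed only by on_service_interrupted and never by on_reconnected, the reconnect attempts were made but none succeeded.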
Comments
-
Hi,
I also have a similar issue: every time we fail over to the HA mate, our Python client gets disconnected and fails to reconnect. We attempt to reconnect every 3 seconds for 5 minutes using RetryStrategy.parametrized_retry, but to no avail.
The errors are:
2022-10-15 07:00:04.882 INFO: on_reconnecting. Error cause: {'caller_description': 'From service event callback', 'return_code': 'Ok', 'sub_code': 'SOLCLIENT_SUBCODE_COMMUNICATION_ERROR', 'error_info_sub_code': 14, 'error_info_contents': 'TCP: Could not read from socket 15, error = Connection reset by peer (104)'} Message: TCP: Could not read from socket 15, error = Connection reset by peer (104)
2022-10-15 07:00:04.908 INFO: on_reconnecting. Error cause: {'caller_description': 'From service event callback', 'return_code': 'Ok', 'sub_code': 'SOLCLIENT_SUBCODE_COMMUNICATION_ERROR', 'error_info_sub_code': 14, 'error_info_contents': 'TCP: Could not read from socket 13, error = Connection reset by peer (104)'} Message: TCP: Could not read from socket 13, error = Connection reset by peer (104)
RETURNING losrttp {t1}losrttp|1665176523219|1665666481128|Peq1IUXQXpyyGxx9ItM4wnQe4uU= None
2022-10-15 07:05:08,409 [WARNING] solace.messaging.receiver: [_message_receiver.py:270] [[SERVICE: 0x7fb905cdab30] [RECEIVER: 0x7fb9064264a8]] Unable to Receive message. Messaging service disconnected.
2022-10-15 07:05:08.409 INFO: on_service_interrupted. Error cause: {'caller_description': 'From service event callback', 'return_code': 'Unknown (401)', 'sub_code': 'SOLCLIENT_SUBCODE_LOGIN_FAILURE', 'error_info_sub_code': 19, 'error_info_contents': 'Unauthorized'} Message: Unauthorized
2022-10-15 07:05:08.409 INFO: SolaceQueueReaderSource-id(140432642918048)thread(140434170173248) raised exception Unable to Receive message. Messaging service disconnected.
2022-10-15 07:05:08.409 INFO: _complete_run raised exception. Exception was Unable to Receive message. Messaging service disconnected.
Traceback (most recent call last):
  snip snip
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/solace/messaging/receiver/_impl/_message_receiver.py", line 272, in _is_message_service_connected
    raise IllegalStateError(UNABLE_TO_RECEIVE_MESSAGE_MESSAGE_SERVICE_NOT_CONNECTED)
solace.messaging.errors.pubsubplus_client_error.IllegalStateError: Unable to Receive message. Messaging service disconnected.
2022-10-15 07:05:08.710 INFO: on_service_interrupted. Error cause: {'caller_description': 'From service event callback', 'return_code': 'Unknown (401)', 'sub_code': 'SOLCLIENT_SUBCODE_LOGIN_FAILURE', 'error_info_sub_code': 19, 'error_info_contents': 'Unauthorized'} Message: Unauthorized
The code to build the connection is below.
from typing import Dict, Optional

from solace.messaging.config.retry_strategy import RetryStrategy
from solace.messaging.messaging_service import MessagingService

_KEEP_TRYING_TO_CONNECT_FOR_NO_MORE_THAN_MINUTES = 5  # 5 Minutes should be enough to ride out a badly timed reboot of the solace server.
_INTERVAL_BETWEEN_CONNECTION_ATTEMPTS_MS = 3000
# We try to reconnect for 5 minutes, every 3000 Milliseconds so max connection attempts is 5 * 60 / (3000/1000) = 100
_MAX_CONNECTION_ATTEMPTS = int(
    _KEEP_TRYING_TO_CONNECT_FOR_NO_MORE_THAN_MINUTES * 60 / (_INTERVAL_BETWEEN_CONNECTION_ATTEMPTS_MS / 1000)
)
_DEFAULT_RECONNECT_STRATEGY = RetryStrategy.parametrized_retry(_MAX_CONNECTION_ATTEMPTS, _INTERVAL_BETWEEN_CONNECTION_ATTEMPTS_MS)
_DEFAULT_CONNECT_STRATEGY = RetryStrategy.parametrized_retry(_MAX_CONNECTION_ATTEMPTS, _INTERVAL_BETWEEN_CONNECTION_ATTEMPTS_MS)


class SolaceMessagingService:
    # ServiceEventHandler is defined elsewhere in the application (not shown).
    def __init__(self, broker_props: Dict,
                 service_event_handler=ServiceEventHandler(),
                 connection_retry_strategy: Optional['RetryStrategy'] = None,
                 reconnection_retry_strategy: Optional['RetryStrategy'] = None):
        # Note: The reconnections strategy could also be configured using the broker properties object
        self.messaging_service = MessagingService.builder().from_properties(broker_props) \
            .with_reconnection_retry_strategy(reconnection_retry_strategy or _DEFAULT_RECONNECT_STRATEGY) \
            .with_connection_retry_strategy(connection_retry_strategy or _DEFAULT_CONNECT_STRATEGY) \
            .build()
It would be wonderful to hear back with any suggestions!
-
Hi,
In my case, I resolved this issue by using reconnection parameters of 20 retries and a 3000 ms wait, together with the default connection strategy:
messaging_service = MessagingService.builder().from_properties(broker_props) \
    .with_reconnection_retry_strategy(RetryStrategy.parametrized_retry(20, 3000)) \
    .build()
Note, though, that the best practices guide recommends the following:
4.4. High Availability Failover and Reconnect Retries
4.4 Reconnect duration should be set to at least 300 seconds when designing applications for High Availability (HA) support.
All APIs
When using a High Availability (HA) redundant Solace router setup, a failover from one Solace router to its mate will typically occur within 30 seconds. However, an application should attempt to reconnect for at least five minutes. Below is an example of setting the reconnect duration to five minutes using the following session property values:
• Connect retries = 1
• Reconnect retries = 5
• Reconnect retry wait = 3000ms
• Connect retries per host = 20
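The five-minute figure comes from multiplying the reconnect values in that list (Connect retries only governs the initial connection). A rough sketch of the arithmetic, assuming a single host entry and ignoring the time each failed connect attempt itself takes, so the real window is somewhat longer; the function name is hypothetical:

```python
def reconnect_window_seconds(reconnect_retries: int,
                             connect_retries_per_host: int,
                             reconnect_retry_wait_ms: int) -> float:
    """Rough reconnect duration for a session with a single host entry.

    Assumes every connection attempt is followed by the full retry wait and
    ignores TCP/connect timeout time spent on each failed attempt.
    """
    attempts = reconnect_retries * connect_retries_per_host
    return attempts * reconnect_retry_wait_ms / 1000


# Best-practice example: 5 reconnect retries x 20 connect retries per host x 3 s wait
print(reconnect_window_seconds(5, 20, 3000))  # → 300.0
```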
You can find all these properties in solace.messaging.config.solace_properties.transport_layer_properties and configure them as part of broker_props.
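A sketch of the four best-practice values as broker_props entries. The constant names used here (CONNECTION_RETRIES, RECONNECTION_ATTEMPTS, RECONNECTION_ATTEMPTS_WAIT_INTERVAL, CONNECTION_RETRIES_PER_HOST) are an assumption based on the transport_layer_properties API reference and should be checked against the installed solace-pubsubplus version; `broker_props` is assumed to already hold the host, VPN and credential settings.

```python
from solace.messaging.config.solace_properties import transport_layer_properties
from solace.messaging.messaging_service import MessagingService

# Assumption: constant names as listed in the transport_layer_properties
# API reference; verify them against your installed version.
reconnect_props = {
    transport_layer_properties.CONNECTION_RETRIES: 1,                      # Connect retries
    transport_layer_properties.RECONNECTION_ATTEMPTS: 5,                   # Reconnect retries
    transport_layer_properties.RECONNECTION_ATTEMPTS_WAIT_INTERVAL: 3000,  # Reconnect retry wait (ms)
    transport_layer_properties.CONNECTION_RETRIES_PER_HOST: 20,            # Connect retries per host
}

# broker_props is assumed to already contain host, VPN and credentials.
broker_props.update(reconnect_props)
messaging_service = MessagingService.builder().from_properties(broker_props).build()
```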
-
Hey @sergkr21 and @AlexanderJHall, thanks for raising this concern. We have a lot of fixes coming up in v1.4.0 of the API, so stay tuned. Some of the issues we saw were around hangs and deadlocks, which might be different from your interruption issues. Could you please raise a ticket with the details of your issue by sending it to support@solace.com, so we can track it for the next release if it is not already fixed? Thanks for raising the issues you face!
-
@sergkr21 Thanks for the suggestion, this worked:
messaging_service = (
    MessagingService.builder()
    .from_properties(broker_props)
    .with_reconnection_retry_strategy(RetryStrategy.parametrized_retry(20, 3000))
    .build()
)
For completeness, configuring it through broker_props didn't work:
"solace.messaging.transport.reconnect-retries": 10,  # Number of reconnection attempts
"solace.messaging.transport.reconnect-retry-wait-time": 5000,  # Time to wait between reconnection attempts (in milliseconds)
"solace.messaging.transport.reconnect-retry-wait-time-increment": 2000,  # Incremental wait time between reconnection attempts (in milliseconds)
"solace.messaging.transport.reconnect-retry-wait-time-max": 30000,  # Maximum wait time between reconnection attempts (in milliseconds)
Hi @Tamimi, here are the packages in use, if it's any help. Thanks.
pip freeze | grep solace
solace==0.1.22
solace-pubsubplus==1.9.01