Solace in Docker boot problems
Hello,
About a week ago I noticed that my Solace in Docker no longer boots. I suspect the problem is that I am running quite a fresh Linux kernel (5.11.15-zen - Arch Linux default). Here is the boot log:
Host Boot ID: 91c90137-fb8e-4af8-bbf4-179fb9048351
Starting VMR Docker Container: Tue Apr 20 11:06:02 UTC 2021
Setting umask to 077
SolOS Version: soltr_9.8.1.29
2021-04-20T11:06:03.376+00:00 <syslog.info> 59ef275173f2 rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="101" x-info="https://www.rsyslog.com"] start
2021-04-20T11:06:04.366+00:00 <local6.info> 59ef275173f2 appuser[99]: rsyslog startup
2021-04-20T11:06:05.389+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT INFO: Log redirection enabled, beginning playback of startup log buffer
2021-04-20T11:06:05.400+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT INFO: /usr/sw/var/soltr_9.8.1.29/db/dbBaseline does not exist, generating from confd template
2021-04-20T11:06:05.431+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT INFO: repairDatabase.py: no database to process
2021-04-20T11:06:05.450+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT INFO: Finished playback of log buffer
2021-04-20T11:06:05.471+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT INFO: Updating dbBaseline with dynamic instance metadata
2021-04-20T11:06:05.632+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT INFO: Generating SSH key
ssh-keygen: generating new host keys: RSA1 RSA DSA ECDSA ED25519
2021-04-20T11:06:05.872+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT INFO: Starting solace process
2021-04-20T11:06:07.131+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT INFO: Launching solacedaemon: /usr/sw/loads/soltr_9.8.1.29/bin/solacedaemon --vmr -z -f /var/lib/solace/config/SolaceStartup.txt -r -1
2021-04-20T11:06:07.855+00:00 <local0.info> 59ef275173f2 appuser[185]: /usr/sw/loads/soltr_9.8.1.29/scripts/post:69 WARN Unable to read /sys/fs/cgroup/memory/memory.limit_in_bytes
2021-04-20T11:06:07.856+00:00 <local0.info> 59ef275173f2 appuser[185]: /usr/sw/loads/soltr_9.8.1.29/scripts/post:69 WARN Unable to read /sys/fs/cgroup/memory/memory.kmem.limit_in_bytes
2021-04-20T11:06:07.856+00:00 <local0.info> 59ef275173f2 appuser[185]: /usr/sw/loads/soltr_9.8.1.29/scripts/post:69 WARN Unable to read /sys/fs/cgroup/memory/memory.kmem.tcp.limit_in_bytes
2021-04-20T11:06:07.857+00:00 <local0.info> 59ef275173f2 appuser[185]: /usr/sw/loads/soltr_9.8.1.29/scripts/post:69 WARN Unable to read /sys/fs/cgroup/memory/memory.swappiness
2021-04-20T11:06:07.861+00:00 <local0.info> 59ef275173f2 appuser[185]: /usr/sw/loads/soltr_9.8.1.29/scripts/post:69 WARN Unable to read /sys/fs/cgroup/cpuset/cpuset.cpus
Traceback (most recent call last):
  File "/usr/sw/loads/soltr_9.8.1.29/scripts/post", line 1020, in <module>
    results, fatal = performPlatformAudit(results)
  File "/usr/sw/loads/soltr_9.8.1.29/scripts/post", line 687, in performPlatformAudit
    ContainerCpuIds = ContainerCpuset.splitRanges()
  File "/usr/sw/loads/soltr_9.8.1.29/scripts/post", line 76, in splitRanges
    for rng in self.value.split(','):
AttributeError: 'NoneType' object has no attribute 'split'
2021-04-20T11:06:07.873+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw main.cpp:1007 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Child terminated with failure status: command: '$SOLENV_ORIG_CURRENTLOAD_REALPATH/scripts/post -a -f /var/lib/solace/config/sol-platform-audit.json' PID: 185 status: 512 sigRxd: 0
2021-04-20T11:06:07.873+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw main.cpp:754 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Determining platform type: [ FAIL ]
2021-04-20T11:06:07.933+00:00 <local0.info> 59ef275173f2 appuser[196]: /usr/sw/loads/soltr_9.8.1.29/scripts/vmr-solredswitch:11 WARN Running vmr-solredswitch
2021-04-20T11:06:07.938+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw main.cpp:754 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Monitoring SolOS processes: [ OK ]
2021-04-20T11:06:07.943+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw Generated_commonReturnCodes.cpp:135 (BASE - 0x00000000) main(0)@solacedaemon WARN Unknown exit value 1, defaulting it to 'fail'.
2021-04-20T11:06:07.943+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw main.cpp:1094 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Child terminated with failure status: command: 'pkill -P $PPID dataplane-linux' PID: 198 rc: fail status: 256 sigRxd: 0
2021-04-20T11:06:07.948+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw main.cpp:3542 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Syncing filesystem before shutdown ...
2021-04-20T11:06:08.039+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw main.cpp:3547 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Shutting down router
2021-04-20T11:06:08.039+00:00 <local0.err> 59ef275173f2 appuser[1]: /usr/sw main.cpp:3526 (SOLDAEMON - 0x00000001) main(0)@solacedaemon ERROR ######## System shutdown complete (Version 9.8.1.29) ########
I suspect that the problem is the content of my /sys/fs/cgroup directory (it does not contain the files that Solace tries to read). Am I missing something here?
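For anyone hitting the same traceback: a quick way to confirm whether the missing cgroup files are due to the host running a unified (v2-only) cgroup hierarchy is to check the filesystem type mounted at /sys/fs/cgroup. This is a small sketch, not Solace tooling; it assumes a Linux host with GNU coreutils:

```shell
# "cgroup2fs" means a unified cgroups v2 hierarchy: the v1 files the
# container expects (memory.limit_in_bytes, cpuset.cpus, ...) will not exist.
# "tmpfs" means a v1/hybrid hierarchy, where those files are available.
stat -fc %T /sys/fs/cgroup/

# The same symptom seen from inside the container:
ls /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null \
  || echo "cgroups v1 memory controller not mounted"
```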
Thanks, Pavel
Comments
-
Hi @Pavel Kryl,
You mentioned that it "does not boot anymore". Was it previously working for you on the same machine / kernel version or did this start to happen after an OS/kernel update?
-
Hi @Pavel Kryl ,
There are two versions of cgroups (v1 and v2). PubSub+ supports cgroups v1 (as most major distros still support v1). I suspect that your new kernel does not support cgroups v1, which is why those files are missing. Is it possible to enable cgroups v1 in your kernel? That would probably get you going until we are able to add support for cgroups v2 to PubSub+.
Best Regards,
Paul
-
Hi @marc, well, it was booting about a month ago. However, I cannot tell exactly which kernel version I was using then (I think it was already the 5.11 line, but I am not sure).
Hi @pkondrat, I've changed my kernel parameters to enforce cgroups v1 (https://wiki.archlinux.org/index.php/cgroups). The problem was that this only worked for the LTS kernel (5.10 line), not the zen kernel (it got stuck on boot). I do not know exactly why, but the suggested workaround helped. Thank you!
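For readers following along: on systemd-based distros the switch described above is the systemd.unified_cgroup_hierarchy=0 kernel command-line parameter. A minimal sketch of applying it, assuming GRUB as the bootloader (the Arch wiki page linked above covers other setups):

```shell
# /etc/default/grub -- append the parameter to the kernel command line,
# keeping whatever parameters are already there:
GRUB_CMDLINE_LINUX="... systemd.unified_cgroup_hierarchy=0"

# Then regenerate the GRUB config and reboot:
sudo grub-mkconfig -o /boot/grub/grub.cfg
sudo reboot
```

After the reboot, /sys/fs/cgroup should again contain the per-controller v1 directories (memory/, cpuset/, ...) that the container reads.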
-
Excellent, thanks for the update @Pavel Kryl, and thanks @pkondrat for the fix.
-
Hey,
I stumbled upon the same problem when running the docker-compose file from this repo: https://github.com/SolaceLabs/solace-single-docker-compose.
I am using Fedora 33, so this is presumably the cgroups v2 issue.
So I wonder, @pkondrat: what are the plans to support that?
Thanks!
Ivan
-
Hi @ivannov! Did you attempt the workaround that was suggested by @Pavel Kryl, enforcing cgroups v1 in your kernel?
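For Fedora specifically (which defaults to cgroups v2 since Fedora 31), the equivalent of that workaround is usually applied with grubby rather than by editing GRUB files directly. A sketch, assuming the stock Fedora boot setup:

```shell
# Add the parameter to all installed kernels' command lines:
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
sudo reboot
```

To undo it later, the same command with --remove-args="systemd.unified_cgroup_hierarchy=0" restores the v2 default.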