Solace in Docker boot problems

Pavel Kryl
Pavel Kryl Member Posts: 2
edited April 2021 in General Discussions #1

Hello,

I noticed a week ago that my Solace in Docker no longer boots. I suspect the cause is that I am running a fairly recent Linux kernel (5.11.15-zen - Arch Linux default). Here is the boot log:

Host Boot ID: 91c90137-fb8e-4af8-bbf4-179fb9048351
Starting VMR Docker Container: Tue Apr 20 11:06:02 UTC 2021
Setting umask to 077
SolOS Version: soltr_9.8.1.29
2021-04-20T11:06:03.376+00:00 <syslog.info> 59ef275173f2 rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="101" x-info="https://www.rsyslog.com"] start
2021-04-20T11:06:04.366+00:00 <local6.info> 59ef275173f2 appuser[99]: rsyslog startup
2021-04-20T11:06:05.389+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT  INFO: Log redirection enabled, beginning playback of startup log buffer
2021-04-20T11:06:05.400+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT  INFO: /usr/sw/var/soltr_9.8.1.29/db/dbBaseline does not exist, generating from confd template
2021-04-20T11:06:05.431+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT  INFO: repairDatabase.py: no database to process
2021-04-20T11:06:05.450+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT  INFO: Finished playback of log buffer
2021-04-20T11:06:05.471+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT  INFO: Updating dbBaseline with dynamic instance metadata
2021-04-20T11:06:05.632+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT  INFO: Generating SSH key
ssh-keygen: generating new host keys: RSA1 RSA DSA ECDSA ED25519 
2021-04-20T11:06:05.872+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT  INFO: Starting solace process
2021-04-20T11:06:07.131+00:00 <local0.info> 59ef275173f2 appuser: EXTERN_SCRIPT  INFO: Launching solacedaemon: /usr/sw/loads/soltr_9.8.1.29/bin/solacedaemon --vmr -z -f /var/lib/solace/config/SolaceStartup.txt -r -1
2021-04-20T11:06:07.855+00:00 <local0.info> 59ef275173f2 appuser[185]: /usr/sw/loads/soltr_9.8.1.29/scripts/post:69    WARN   Unable to read /sys/fs/cgroup/memory/memory.limit_in_bytes
2021-04-20T11:06:07.856+00:00 <local0.info> 59ef275173f2 appuser[185]: /usr/sw/loads/soltr_9.8.1.29/scripts/post:69    WARN   Unable to read /sys/fs/cgroup/memory/memory.kmem.limit_in_bytes
2021-04-20T11:06:07.856+00:00 <local0.info> 59ef275173f2 appuser[185]: /usr/sw/loads/soltr_9.8.1.29/scripts/post:69    WARN   Unable to read /sys/fs/cgroup/memory/memory.kmem.tcp.limit_in_bytes
2021-04-20T11:06:07.857+00:00 <local0.info> 59ef275173f2 appuser[185]: /usr/sw/loads/soltr_9.8.1.29/scripts/post:69    WARN   Unable to read /sys/fs/cgroup/memory/memory.swappiness
2021-04-20T11:06:07.861+00:00 <local0.info> 59ef275173f2 appuser[185]: /usr/sw/loads/soltr_9.8.1.29/scripts/post:69    WARN   Unable to read /sys/fs/cgroup/cpuset/cpuset.cpus
Traceback (most recent call last):
  File "/usr/sw/loads/soltr_9.8.1.29/scripts/post", line 1020, in <module>
    results, fatal = performPlatformAudit(results)
  File "/usr/sw/loads/soltr_9.8.1.29/scripts/post", line 687, in performPlatformAudit
    ContainerCpuIds = ContainerCpuset.splitRanges()
  File "/usr/sw/loads/soltr_9.8.1.29/scripts/post", line 76, in splitRanges
    for rng in self.value.split(','):
AttributeError: 'NoneType' object has no attribute 'split'
2021-04-20T11:06:07.873+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw                        main.cpp:1007                         (SOLDAEMON    - 0x00000000) main(0)@solacedaemon                          WARN     Child terminated with failure status: command: '$SOLENV_ORIG_CURRENTLOAD_REALPATH/scripts/post -a -f /var/lib/solace/config/sol-platform-audit.json' PID: 185 status: 512 sigRxd: 0
2021-04-20T11:06:07.873+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw                        main.cpp:754                          (SOLDAEMON    - 0x00000000) main(0)@solacedaemon                          WARN     Determining platform type: [ FAIL ]
2021-04-20T11:06:07.933+00:00 <local0.info> 59ef275173f2 appuser[196]: /usr/sw/loads/soltr_9.8.1.29/scripts/vmr-solredswitch:11    WARN   Running vmr-solredswitch
2021-04-20T11:06:07.938+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw                        main.cpp:754                          (SOLDAEMON    - 0x00000000) main(0)@solacedaemon                          WARN     Monitoring SolOS processes: [  OK  ]
2021-04-20T11:06:07.943+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw                        Generated_commonReturnCodes.cpp:135   (BASE         - 0x00000000) main(0)@solacedaemon                          WARN     Unknown exit value 1, defaulting it to 'fail'.
2021-04-20T11:06:07.943+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw                        main.cpp:1094                         (SOLDAEMON    - 0x00000000) main(0)@solacedaemon                          WARN     Child terminated with failure status: command: 'pkill -P $PPID dataplane-linux' PID: 198 rc: fail status: 256 sigRxd: 0
2021-04-20T11:06:07.948+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw                        main.cpp:3542                         (SOLDAEMON    - 0x00000000) main(0)@solacedaemon                          WARN     Syncing filesystem before shutdown ...
2021-04-20T11:06:08.039+00:00 <local0.warning> 59ef275173f2 appuser[1]: /usr/sw                        main.cpp:3547                         (SOLDAEMON    - 0x00000000) main(0)@solacedaemon                          WARN     Shutting down router
2021-04-20T11:06:08.039+00:00 <local0.err> 59ef275173f2 appuser[1]: /usr/sw                        main.cpp:3526                         (SOLDAEMON    - 0x00000001) main(0)@solacedaemon                          ERROR    ######## System shutdown complete (Version 9.8.1.29) ########

I suspect the problem is the content of my /sys/fs/cgroup directory: it does not contain the files that Solace tries to read. Am I missing something here?

Thanks, Pavel
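
The suspicion above can be checked directly on the host. The following is a minimal Python sketch, not part of Solace's tooling, that classifies the cgroup mount: a cgroups v2 (unified) hierarchy exposes a `cgroup.controllers` file at its root, while cgroups v1 mounts one directory per controller (such as `memory/` and `cpuset/`, the paths that appear in the log). The function name `detect_cgroup_layout` is hypothetical.

```python
from pathlib import Path


def detect_cgroup_layout(root: Path = Path("/sys/fs/cgroup")) -> str:
    """Classify the cgroup mount at `root` as 'v2', 'v1', or 'unknown'."""
    # cgroups v2: a single unified hierarchy with cgroup.controllers at its root.
    if (root / "cgroup.controllers").is_file():
        return "v2"
    # cgroups v1: one directory per controller, e.g. memory/ and cpuset/.
    if (root / "memory").is_dir() or (root / "cpuset").is_dir():
        return "v1"
    return "unknown"


if __name__ == "__main__":
    print(detect_cgroup_layout())
```

If this reports `v2`, the v1 files named in the warnings (for example `memory/memory.limit_in_bytes`) would not be expected to exist.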

Comments

  • marc
    marc Member, Administrator, Moderator, Employee Posts: 914 admin

    Hi @Pavel Kryl,

    You mentioned that it "does not boot anymore". Was it previously working for you on the same machine / kernel version or did this start to happen after an OS/kernel update?

  • pkondrat
    pkondrat Member, Employee Posts: 24 Solace Employee

    Hi @Pavel Kryl ,

    There are two versions of cgroups (v1 and v2). PubSub+ supports cgroups v1 (as most major distros still support v1). I suspect that your new kernel does not support cgroups v1 and that is why those files are missing. Is it possible to enable cgroups v1 in your kernel? That would probably get you going until we are able to add support for cgroups v2 to PubSub+.

    Best Regards,
    Paul
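
The traceback in the original post fits this explanation: when `cpuset/cpuset.cpus` cannot be read, the value is `None`, and calling `.split(',')` on it raises `AttributeError`. The sketch below is an illustration only, not the code of the Solace `post` script. It shows one way to tolerate a missing file and to fall back to `cpuset.cpus.effective`, which is assumed here to be the cgroups v2 counterpart visible inside the container. All function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def read_first(root: Path, candidates: List[str]) -> Optional[str]:
    """Return the stripped content of the first readable candidate file, else None."""
    for rel in candidates:
        try:
            return (root / rel).read_text().strip()
        except OSError:
            continue
    return None


def split_ranges(value: Optional[str]) -> List[int]:
    """Expand a cpuset string such as '0-2,5' into [0, 1, 2, 5].

    A missing value (None) or an empty string yields an empty list
    instead of raising AttributeError.
    """
    if not value:
        return []
    cpus: List[int] = []
    for rng in value.split(","):
        lo, _, hi = rng.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def container_cpu_ids(root: Path = Path("/sys/fs/cgroup")) -> List[int]:
    # Try the v1 path from the log first, then the assumed v2 file.
    raw = read_first(root, ["cpuset/cpuset.cpus", "cpuset.cpus.effective"])
    return split_ranges(raw)
```

Returning an empty list keeps the caller in control of whether a missing cpuset is fatal, rather than failing with an unrelated exception.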

  • Pavel Kryl
    Pavel Kryl Member Posts: 2

    Hi @marc, it was booting about a month ago. However, I cannot tell exactly which kernel version I was using then (I think it was already the 5.11 lineup, but I am not sure).

    Hi @pkondrat, I've changed my kernel parameters to enforce cgroups v1 (https://wiki.archlinux.org/index.php/cgroups). The catch was that this only worked with the LTS kernel (5.10 lineup), not with the zen kernel (which got stuck on boot). I do not know exactly why, but the suggested workaround helped. Thank you!
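
The post does not say which kernel parameter was used. Assuming it was `systemd.unified_cgroup_hierarchy=0` (the systemd switch that selects the legacy cgroups v1 hierarchy), the following hedged sketch checks whether a kernel command line contains it. The function name `forces_cgroup_v1` is hypothetical, and the parser is deliberately simple: it splits on whitespace and does not handle quoted values.

```python
def forces_cgroup_v1(cmdline: str) -> bool:
    """Return True if the command line disables systemd's unified cgroup hierarchy."""
    # Turn "key=value" tokens into a dict; bare flags map to an empty string.
    params = dict(tok.partition("=")[::2] for tok in cmdline.split())
    # Assumption: systemd treats these spellings as boolean false.
    return params.get("systemd.unified_cgroup_hierarchy") in ("0", "no", "false", "off")


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel.
    with open("/proc/cmdline") as fh:
        print(forces_cgroup_v1(fh.read()))
```

Running this after a reboot gives a quick confirmation that the boot loader change actually took effect.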

  • marc
    marc Member, Administrator, Moderator, Employee Posts: 914 admin

    Excellent, thanks for the update @Pavel Kryl, and thanks @pkondrat for the fix.

  • ivannov
    ivannov Member Posts: 1

    Hey,

    I stumbled upon the same problem when running the docker-compose file from this repo https://github.com/SolaceLabs/solace-single-docker-compose.

    I am using Fedora 33, which uses cgroups v2 by default, so the cause is presumably the missing cgroups v2 support.

    So, I wonder @pkondrat, what are the plans to support that?

    Thanks!
    Ivan

  • Tamimi
    Tamimi Member, Administrator, Employee Posts: 491 admin

    Hi @ivannov! Did you try the workaround suggested by @Pavel Kryl, enforcing cgroups v1 in your kernel?

  • pkondrat
    pkondrat Member, Employee Posts: 24 Solace Employee

    Hi @ivannov

    We are working on support for cgroups v2. I am hoping we will release that this fall.

    Best Regards,
    Paul

  • olly
    olly Member Posts: 4

    Any news on cgroups v2 support? It is becoming more common on Linux, and Docker Desktop now uses it by default on the Mac as well. Cgroups v2 has been around in some form or other since 2016.

  • olly
    olly Member Posts: 4

    I can confirm that cgroups v2 actually works with version 9.12.0.15, which is good news.

  • marc
    marc Member, Administrator, Moderator, Employee Posts: 914 admin

    Thanks for confirming @olly 🙏