Bug#953111: pacemaker: post-start got lost

Lukas Straub
Package: pacemaker
Version: 2.0.3-3
Severity: normal

Dear Maintainer,
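I have a master-slave resource that relies on the post-start notification to start replication
to the slave. This usually works, but occasionally the post-start notification is not delivered
to the master. In the log below it is discarded on tele-clu-03 (the master) right after
tele-clu-01 rejoins the cluster and transition 12 is aborted.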
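
For context, here is a minimal sketch of how an agent can depend on this notification. It is
not the actual agent code: the function names and the start_replication callback are
hypothetical. It assumes only the notify meta variables that Pacemaker exports to agents
(OCF_RESKEY_CRM_meta_notify_type, OCF_RESKEY_CRM_meta_notify_operation and
OCF_RESKEY_CRM_meta_notify_start_uname).

```python
import os

OCF_SUCCESS = 0


def notify_action(env):
    """Combine notify type and operation into a label such as 'post-start'."""
    ntype = env.get("OCF_RESKEY_CRM_meta_notify_type", "")
    nop = env.get("OCF_RESKEY_CRM_meta_notify_operation", "")
    return f"{ntype}-{nop}"


def on_notify(env=None, start_replication=print):
    """Hypothetical notify handler: begin replication once a peer has started.

    start_replication is a placeholder callback, called once per started node.
    """
    if env is None:
        env = os.environ
    if notify_action(env) == "post-start":
        # The variable holds a space-separated list of nodes that were started.
        for peer in env.get("OCF_RESKEY_CRM_meta_notify_start_uname", "").split():
            start_replication(peer)
    return OCF_SUCCESS
```

If the post-start notify never reaches the master, a handler shaped like this is never called
there and replication is never started, which matches the symptom described above.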

Note the line
 Discarding attempt to perform action notify on colo_test in state S_INTEGRATION
below.

Mar 04 16:28:50 tele-clu-03 python[8351]: (colo_test) DEBUG: notify called: action: pre-start, master_uname: tele-clu-03, start_uname: tele-clu-02, stop_uname: tele-clu-01, shutdown_guest: False
Mar 04 16:28:50 tele-clu-03 pacemaker-controld[538]:  notice: Result of notify operation for colo_test on tele-clu-03: 0 (ok)
Mar 04 16:28:50 tele-clu-03 pacemaker-controld[538]:  notice: Initiating start operation colo_test_start_0 on tele-clu-02
Mar 04 16:28:54 tele-clu-03 corosync[481]:   [KNET  ] rx: host: 1 link: 0 is up
Mar 04 16:28:54 tele-clu-03 corosync[481]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 0)
Mar 04 16:28:54 tele-clu-03 corosync[481]:   [TOTEM ] A new membership (1.1cc0) was formed. Members joined: 1
Mar 04 16:28:54 tele-clu-03 corosync[481]:   [CPG   ] downlist left_list: 0 received
Mar 04 16:28:54 tele-clu-03 corosync[481]:   [CPG   ] downlist left_list: 0 received
Mar 04 16:28:54 tele-clu-03 corosync[481]:   [CPG   ] downlist left_list: 0 received
Mar 04 16:28:55 tele-clu-03 corosync[481]:   [QUORUM] Members[3]: 1 2 3
Mar 04 16:28:55 tele-clu-03 corosync[481]:   [MAIN  ] Completed service synchronization, ready to provide service.
Mar 04 16:28:55 tele-clu-03 pacemaker-controld[538]:  notice: Node tele-clu-01 state is now member
Mar 04 16:28:55 tele-clu-03 pacemakerd[532]:  notice: Node tele-clu-01 state is now member
Mar 04 16:28:55 tele-clu-03 pacemaker-based[533]:  notice: Node tele-clu-01 state is now member
Mar 04 16:28:55 tele-clu-03 pacemaker-attrd[536]:  notice: Node tele-clu-01 state is now member
Mar 04 16:28:55 tele-clu-03 pacemaker-attrd[536]:  notice: Setting #attrd-protocol[tele-clu-01]: (unset) -> 2
Mar 04 16:28:55 tele-clu-03 pacemaker-fenced[534]:  notice: Node tele-clu-01 state is now member
Mar 04 16:28:55 tele-clu-03 pacemaker-controld[538]:  notice: Transition 12 aborted: Peer Halt
Mar 04 16:28:55 tele-clu-03 pacemaker-controld[538]:  notice: Initiating notify operation colo_test_post_notify_start_0 locally on tele-clu-03
Mar 04 16:28:55 tele-clu-03 pacemaker-controld[538]:  notice: Discarding attempt to perform action notify on colo_test in state S_INTEGRATION (shutdown=false)
Mar 04 16:28:55 tele-clu-03 pacemaker-controld[538]:  notice: Initiating notify operation colo_test_post_notify_start_0 on tele-clu-02
Mar 04 16:28:55 tele-clu-03 pacemaker-controld[538]:  notice: Transition 12 (Complete=22, Pending=0, Fired=0, Skipped=1, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-warn-93.bz2): Stopped
Mar 04 16:28:57 tele-clu-03 pacemaker-schedulerd[537]:  notice: Watchdog will be used via SBD if fencing is required and stonith-watchdog-timeout is nonzero
Mar 04 16:28:57 tele-clu-03 pacemaker-schedulerd[537]:  notice: Calculated transition 13, saving inputs in /var/lib/pacemaker/pengine/pe-input-1569.bz2
Mar 04 16:28:57 tele-clu-03 pacemaker-controld[538]:  notice: Initiating monitor operation colo_test_monitor_0 on tele-clu-01
Mar 04 16:28:57 tele-clu-03 pacemaker-controld[538]:  notice: Initiating monitor operation colo_test_monitor_10000 on tele-clu-02
Mar 04 16:28:58 tele-clu-03 pacemaker-controld[538]:  notice: Initiating monitor operation colo_small_test_monitor_0 on tele-clu-01
Mar 04 16:28:58 tele-clu-03 pacemaker-controld[538]:  notice: Transition 13 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1569.bz2): Complete
Mar 04 16:28:58 tele-clu-03 pacemaker-controld[538]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Mar 04 16:29:08 tele-clu-03 pacemaker-controld[538]:  notice: High CPU load detected: 2.750000
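
To check how often this happens, the log can be scanned for the discard message. The
following is only a rough sketch: the regular expression is derived from the single log line
quoted above, and the helper name find_discarded is made up for illustration.

```python
import re

# Pattern based on the one message seen in this report; other Pacemaker
# versions may word it differently.
DISCARD_RE = re.compile(
    r"Discarding attempt to perform action (?P<action>\S+) "
    r"on (?P<rsc>\S+) in state (?P<state>\S+)"
)


def find_discarded(lines):
    """Return (action, resource, controller state) for each discarded action."""
    hits = []
    for line in lines:
        m = DISCARD_RE.search(line)
        if m:
            hits.append((m["action"], m["rsc"], m["state"]))
    return hits
```

Feeding it the lines of a saved log (for example from an open file object) yields one tuple
per discarded action, so lost notifies can be counted across several failovers.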

-- System Information:
Debian Release: bullseye/sid
  APT prefers testing
  APT policy: (990, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 5.3.0-2-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages pacemaker depends on:
ii  corosync                   3.0.2-1+b1
ii  dbus                       1.12.16-2
ii  init-system-helpers        1.57
ii  libc6                      2.29-10
ii  libcfg7                    3.0.2-1+b1
ii  libcib27                   2.0.3-3
ii  libcmap4                   3.0.2-1+b1
ii  libcorosync-common4        3.0.2-1+b1
ii  libcrmcluster29            2.0.3-3
ii  libcrmcommon34             2.0.3-3
ii  libcrmservice28            2.0.3-3
ii  libglib2.0-0               2.62.5-1
ii  libgnutls30                3.6.12-2
ii  liblrmd28                  2.0.3-3
ii  libpacemaker1              2.0.3-3
ii  libpam0g                   1.3.1-5
ii  libpe-rules26              2.0.3-3
ii  libpe-status28             2.0.3-3
ii  libqb0                     1.0.5-1
ii  libstonithd26              2.0.3-3
ii  lsb-base                   11.1.0
ii  pacemaker-common           2.0.3-3
ii  pacemaker-resource-agents  2.0.3-3
ii  python3                    3.7.5-3

Versions of packages pacemaker recommends:
pn  fence-agents         <none>
ii  pacemaker-cli-utils  2.0.3-3

Versions of packages pacemaker suggests:
ii  cluster-glue  1.0.12-15
ii  crmsh         4.2.0-2

-- no debconf information