Sunday, February 4, 2024

Behavior in PostgreSQL after "kill -9" on backend server process

In PostgreSQL, an abnormal exit of a server (backend) process causes the postmaster to terminate all other server processes and reinitialize the instance through crash recovery. This impacts all existing connections, not just the database client served by the crashed process.
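
Whether the postmaster reinitializes the instance after such a crash is controlled by the restart_after_crash parameter, which defaults to on. A quick way to confirm the setting from any session (the output shown below is illustrative of the default):

-- restart_after_crash defaults to on; output below is illustrative
mytest=# show restart_after_crash;
 restart_after_crash
---------------------
 on
(1 row)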

Check the two existing sessions and their corresponding server (backend) processes.

-- Session 1
mytest=# select pg_backend_pid();
 pg_backend_pid
----------------
          89176
(1 row)
-- Session 2
mytest=# select pg_backend_pid();
 pg_backend_pid
----------------
          89182
(1 row)
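
Both backends can also be listed from either session through pg_stat_activity. A minimal query along these lines works; the result shown here is illustrative, with the PIDs matching the two sessions above:

-- illustrative: list client backends only
mytest=# select pid, state, query from pg_stat_activity where backend_type = 'client backend';
  pid  | state  |                            query
-------+--------+--------------------------------------------------------------
 89176 | active | select pid, state, query from pg_stat_activity where backe...
 89182 | idle   | select pg_backend_pid();
(2 rows)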

Terminate process 89176 with kill -9.

Because SIGKILL cannot be caught or handled, the backend process ends abruptly with no chance to clean up.

myhost$ kill -9 89176
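
For comparison, ending a single backend the supported way does not force this instance-wide restart: pg_terminate_backend() sends SIGTERM, the targeted session reports "FATAL:  terminating connection due to administrator command", and all other sessions keep running. A sketch, run from another session against the same PID used above:

-- safer alternative to kill -9; terminates only the targeted backend
mytest=# select pg_terminate_backend(89176);
 pg_terminate_backend
----------------------
 t
(1 row)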

PostgreSQL error log contents

2024-02-04 19:28:45.573 +08 [59773] DEBUG:  server process (PID 89176) was terminated by signal 9: Killed: 9
2024-02-04 19:28:45.573 +08 [59773] DETAIL:  Failed process was running: select * from pg_stat_activity where pid=pg_backend_pid();
2024-02-04 19:28:45.573 +08 [59773] LOG:  server process (PID 89176) was terminated by signal 9: Killed: 9
2024-02-04 19:28:45.573 +08 [59773] DETAIL:  Failed process was running: select * from pg_stat_activity where pid=pg_backend_pid();
2024-02-04 19:28:45.573 +08 [59773] LOG:  terminating any other active server processes
2024-02-04 19:28:45.573 +08 [59773] DEBUG:  sending SIGQUIT to process 59780
2024-02-04 19:28:45.573 +08 [59773] DEBUG:  sending SIGQUIT to process 89182
2024-02-04 19:28:45.573 +08 [59773] DEBUG:  sending SIGQUIT to process 84277
2024-02-04 19:28:45.574 +08 [59773] DEBUG:  sending SIGQUIT to process 59776
2024-02-04 19:28:45.574 +08 [59773] DEBUG:  sending SIGQUIT to process 59775
2024-02-04 19:28:45.575 +08 [59773] DEBUG:  sending SIGQUIT to process 59778
2024-02-04 19:28:45.575 +08 [59773] DEBUG:  sending SIGQUIT to process 59779
2024-02-04 19:28:45.575 +08 [59773] DEBUG:  server process (PID 89182) exited with exit code 2
2024-02-04 19:28:45.575 +08 [59773] DETAIL:  Failed process was running: select * from pg_stat_activity where pid=pg_backend_pid();
2024-02-04 19:28:45.578 +08 [59773] DEBUG:  server process (PID 84277) exited with exit code 2
2024-02-04 19:28:45.578 +08 [59773] DETAIL:  Failed process was running: select * from pg_database;
2024-02-04 19:28:45.578 +08 [59773] LOG:  all server processes terminated; reinitializing
2024-02-04 19:28:45.595 +08 [59773] DEBUG:  cleaning up orphaned dynamic shared memory with ID 1898246468
2024-02-04 19:28:45.595 +08 [59773] DEBUG:  cleaning up dynamic shared memory control segment with ID 3888812834
2024-02-04 19:28:45.608 +08 [59773] DEBUG:  dynamic shared memory system will support 674 segments
2024-02-04 19:28:45.608 +08 [59773] DEBUG:  created dynamic shared memory control segment 2163855276 (26976 bytes)
2024-02-04 19:28:45.612 +08 [89227] LOG:  database system was interrupted; last known up at 2024-02-04 12:00:25 +08
2024-02-04 19:28:45.612 +08 [89227] DEBUG:  removing all temporary WAL segments
2024-02-04 19:28:45.612 +08 [89228] DEBUG:  checkpointer updated shared memory configuration values
2024-02-04 19:28:46.094 +08 [89227] DEBUG:  checkpoint record is at 1/102B4200
2024-02-04 19:28:46.094 +08 [89227] DEBUG:  redo record is at 1/102B41C8; shutdown false
2024-02-04 19:28:46.094 +08 [89227] DEBUG:  next transaction ID: 1232; next OID: 25618
2024-02-04 19:28:46.094 +08 [89227] DEBUG:  next MultiXactId: 1; next MultiXactOffset: 0
2024-02-04 19:28:46.094 +08 [89227] DEBUG:  oldest unfrozen transaction ID: 722, in database 1
2024-02-04 19:28:46.094 +08 [89227] DEBUG:  oldest MultiXactId: 1, in database 5
2024-02-04 19:28:46.094 +08 [89227] DEBUG:  commit timestamp Xid oldest/newest: 0/0
2024-02-04 19:28:46.094 +08 [89227] LOG:  database system was not properly shut down; automatic recovery in progress
2024-02-04 19:28:46.094 +08 [89227] DEBUG:  transaction ID wrap limit is 2147484369, limited by database with OID 1
2024-02-04 19:28:46.094 +08 [89227] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 5
2024-02-04 19:28:46.097 +08 [89227] DEBUG:  starting up replication slots
2024-02-04 19:28:46.097 +08 [89227] DEBUG:  restoring replication slot from "pg_replslot/my_logical_slot/state"
2024-02-04 19:28:46.098 +08 [89227] DEBUG:  xmin required by slots: data 0, catalog 1232
2024-02-04 19:28:46.098 +08 [89227] DEBUG:  starting up replication origin progress state
2024-02-04 19:28:46.098 +08 [89227] DEBUG:  didn't need to unlink permanent stats file "pg_stat/pgstat.stat" - didn't exist
2024-02-04 19:28:46.099 +08 [89227] DEBUG:  resetting unlogged relations: cleanup 1 init 0
2024-02-04 19:28:46.102 +08 [59773] DEBUG:  postmaster received pmsignal signal
2024-02-04 19:28:46.103 +08 [89227] LOG:  redo starts at 1/102B41C8
2024-02-04 19:28:46.105 +08 [89227] LOG:  invalid record length at 1/102B6128: expected at least 24, got 0
2024-02-04 19:28:46.106 +08 [89227] LOG:  redo done at 1/102B60F0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-02-04 19:28:46.106 +08 [89227] DEBUG:  resetting unlogged relations: cleanup 0 init 1
2024-02-04 19:28:46.112 +08 [89227] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 5
2024-02-04 19:28:46.112 +08 [89227] DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1
2024-02-04 19:28:46.113 +08 [89228] LOG:  checkpoint starting: end-of-recovery immediate wait
2024-02-04 19:28:46.113 +08 [89228] DEBUG:  performing replication slot checkpoint
2024-02-04 19:28:46.113 +08 [89228] DEBUG:  removing snapbuild snapshot pg_logical/snapshots/1-102B4190.snap
2024-02-04 19:28:46.113 +08 [89228] DEBUG:  removing snapbuild snapshot pg_logical/snapshots/1-102B41C8.snap
2024-02-04 19:28:46.117 +08 [89228] DEBUG:  checkpoint sync: number=1 file=base/16384/2619 time=0.218 ms
2024-02-04 19:28:46.117 +08 [89228] DEBUG:  checkpoint sync: number=2 file=pg_multixact/offsets/0000 time=0.070 ms
2024-02-04 19:28:46.117 +08 [89228] DEBUG:  checkpoint sync: number=3 file=base/16384/2619_fsm time=0.055 ms
2024-02-04 19:28:46.117 +08 [89228] DEBUG:  checkpoint sync: number=4 file=pg_xact/0000 time=0.054 ms
2024-02-04 19:28:46.118 +08 [89228] DEBUG:  attempting to remove WAL segments older than log file 00000000000000010000000F
2024-02-04 19:28:46.118 +08 [89228] LOG:  checkpoint complete: wrote 4 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.006 s; sync files=4, longest=0.001 s, average=0.001 s; distance=7 kB, estimate=7 kB; lsn=1/102B6128, redo lsn=1/102B6128
2024-02-04 19:28:46.123 +08 [59773] DEBUG:  starting background worker process "logical replication launcher"
2024-02-04 19:28:46.124 +08 [89231] DEBUG:  autovacuum launcher started
2024-02-04 19:28:46.124 +08 [59773] LOG:  database system is ready to accept connections
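
The DEBUG lines above only show up when log_min_messages is lowered from its default of warning. The exact level used for this test is not stated, so the snippet below is an assumed postgresql.conf sketch rather than the actual configuration:

# postgresql.conf -- assumed settings for reproducing this level of log detail
log_min_messages = debug2     # assumption: surfaces the DEBUG lines shown above
log_line_prefix = '%m [%p] '  # assumption: matches the timestamp/PID prefix in the log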

Status and error messages for the two PostgreSQL sessions

-- Session 1, whose server process was killed by kill -9
mytest=# select * from pg_stat_activity where pid=pg_backend_pid();

server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
mytest=#
-- Session 2
mytest=# select * from pg_stat_activity where pid=pg_backend_pid();
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
mytest=?#
mytest=?# \q


myhost$ psql
psql (16.1)
Type "help" for help.
mytest=#
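
After psql reconnects, the session is served by a brand-new backend process, which can be confirmed by checking the backend PID again (the PID shown is illustrative):

-- illustrative: the reconnected session gets a new backend PID
mytest=# select pg_backend_pid();
 pg_backend_pid
----------------
          89240
(1 row)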
