2.10. Logging
Shardman stores all of your data, which makes it a critical part of your infrastructure. Logging is therefore essential, and you should understand how it works in Shardman. Because Shardman consists of several components, logs come from more than one source: the shardmand daemon, which manages the cluster configuration, and the PostgreSQL database instances.
2.10.1. PostgreSQL Logs
Shardman uses the standard PostgreSQL logging settings described in the PostgreSQL documentation. Place logging settings in the pgParameters section of sdmspec.json, as shown in the example below:
{
  "ShardSpec": {
    "pgParameters": {
      "log_line_prefix": "%m [%r][%p]",
      "log_min_messages": "INFO",
      "log_statement": "none",
      "log_destination": "stderr",
      "log_filename": "pg.log",
      "logging_collector": "on",
      "log_checkpoints": "false",
      ...
    },
    ...
  },
  ...
}
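As a quick sanity check before applying a specification, the logging-related entries can be pulled out of the pgParameters section with a few lines of Python. This is a minimal sketch: the function name logging_parameters and the sample text are illustrative, and the prefix test on "log" is only an assumption that happens to cover the parameters shown in the example above. It expects complete JSON, so it cannot read the example above as printed, which elides entries with "...".

```python
import json


def logging_parameters(spec_text):
    """Return the log-related entries of ShardSpec.pgParameters.

    The "log" prefix test is an illustrative shortcut; it matches names such
    as log_destination and logging_collector from the example above.
    """
    spec = json.loads(spec_text)
    pg_params = spec.get("ShardSpec", {}).get("pgParameters", {})
    return {name: value for name, value in pg_params.items()
            if name.startswith("log")}


# Hypothetical, trimmed-down specification used only for demonstration.
SAMPLE_SPEC = """
{
  "ShardSpec": {
    "pgParameters": {
      "log_destination": "stderr",
      "logging_collector": "on",
      "log_filename": "pg.log",
      "max_connections": "100"
    }
  }
}
"""

if __name__ == "__main__":
    for name, value in sorted(logging_parameters(SAMPLE_SPEC).items()):
        print(f"{name} = {value}")
```

In practice you would pass the contents of your own sdmspec.json instead of SAMPLE_SPEC.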
By default, logs are placed in a directory such as /var/lib/pgpro/sdm-14/data/keeper-cluster0-clover-1-shrn1-0/postgres/log. In this example, cluster0 is the current cluster, clover-1-shrn1 is the name of the current shard, and 0 is the identifier of the integrated keeper process. To change the log directory, set the log_directory parameter.
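To locate the logs of a particular instance, the default path can be assembled from the three components named above. The sketch below infers the layout from that single example path, so treat both the pattern and the data_root default as assumptions that may differ between installations and versions. The function name default_log_dir is hypothetical.

```python
from pathlib import PurePosixPath


def default_log_dir(cluster, shard, keeper_id,
                    data_root="/var/lib/pgpro/sdm-14/data"):
    """Build the default log directory for one instance.

    Assumed layout (inferred from the example path only):
    <data_root>/keeper-<cluster>-<shard>-<keeper_id>/postgres/log
    """
    instance_dir = f"keeper-{cluster}-{shard}-{keeper_id}"
    return PurePosixPath(data_root) / instance_dir / "postgres" / "log"


if __name__ == "__main__":
    print(default_log_dir("cluster0", "clover-1-shrn1", 0))
```

If log_directory is set, this default no longer applies and the configured value should be used instead.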
2.10.2. shardmand Logs
shardmand runs as a systemd unit, and its logs are written to journald. Use journalctl to examine them, for example:
$ journalctl -u shardmand@cluster0.service
You can filter logs by time using the --since and --until options, which restrict the displayed entries to those after or before the given time, respectively. Time values can be given in a variety of formats. For absolute time values, use the YYYY-MM-DD HH:MM:SS format. For instance, to see all entries since January 10th, 2023 at 5:15 PM, run:
$ journalctl -u shardmand@cluster0.service --since "2023-01-10 17:15:00"
If components of the above format are left off, some defaults will be applied. For instance, if the date is omitted, the current date will be assumed. If the time component is missing, “00:00:00” (midnight) will be substituted. The seconds field can be left off as well to default to “00”:
$ journalctl -u shardmand@cluster0.service --since "2023-01-10" --until "2023-01-11 03:00"
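The defaulting rules described above can be illustrated with a short Python sketch. It does not call journalctl or reproduce its parser. It only mimics the three rules stated in the text (missing date, missing time, missing seconds) and shows how the command line from the examples might be assembled in a script. The function names expand_timestamp and journalctl_command are hypothetical, and the unit name pattern shardmand@<cluster>.service is taken from the examples above.

```python
import shlex
from datetime import date, datetime


def expand_timestamp(text, today):
    """Fill in omitted components following the rules described above."""
    text = text.strip()
    # Date with optional time: a missing time becomes 00:00:00,
    # and missing seconds become 00.
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    # Time only: the current date is assumed.
    for fmt in ("%H:%M:%S", "%H:%M"):
        try:
            return datetime.combine(today, datetime.strptime(text, fmt).time())
        except ValueError:
            continue
    raise ValueError(f"unsupported timestamp: {text!r}")


def journalctl_command(cluster, since=None, until=None):
    """Build the argument list for the journalctl calls shown above."""
    argv = ["journalctl", "-u", f"shardmand@{cluster}.service"]
    if since is not None:
        argv += ["--since", since]
    if until is not None:
        argv += ["--until", until]
    return argv


if __name__ == "__main__":
    print(expand_timestamp("2023-01-11 03:00", date.today()))
    print(shlex.join(journalctl_command(
        "cluster0", since="2023-01-10", until="2023-01-11 03:00")))
```

Returning an argument list rather than a string keeps values that contain spaces intact when the list is passed to subprocess.run.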
To control the log verbosity for all Shardman services, set SDM_LOG_LEVEL in the shardmand configuration file.
2.10.3. Getting Information on Backend Crashes
Some crashes are caused by hardware failures or DBMS issues. To find the root cause of a crash, use crash_info. To set it up, follow these steps:
Create a directory on each cluster node that the operating system user running Shardman (usually postgres) has access to. Error reports will be written to this directory.
install -d -o postgres -g postgres -m 700 /var/lib/postgresql/crashinfo
Set the crash_info_location value.
Note
This will cause the DBMS to restart.
shardmanctl --store-endpoints http://etcdserver:2379 set -y crash_info_location=/var/lib/postgresql/crashinfo
To make sure the changes are applied, send a signal that causes a backend failure and a core dump, followed by an instance restart.
Note
Do this in your test environment only.
Connect to your DBMS and find the PID of the backend associated with the current session:
postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
          23770
Then send the SIGSEGV signal to the process with the returned PID:
kill -11 23770
This crashes the backend, and a log file with the time, backtrace, and cause of the error is written to /var/lib/postgresql/crashinfo:
# Signal
Program received signal: 11 (SIGSEGV)
Signal UTC date time: 25.10.2024 08:37:02
# Program
pid: 23770
ppid: 17506
program_invocation_name: postgres: postgres postgres 10.42.42.10(34202) idle
program_invocation_short_name: tgres 10.42.42.10(34202) idle
exe_path: /opt/pgpro/sdm-14/bin/postgres
exe: postgres
# Backtrace
1 postgres + 0x5b55c0 0x55c5ba8459b7 0x00007ffcbef19070 bt_crash_handler + 0x3f7
2 libc.so.6 + 0x4251f 0x7f01c2caa520 0x00007ffcbef19140 __sigaction + 0x50 unknown ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
3 libc.so.6 + 0x125f80 0x7f01c2d8df9a 0x00007ffcbef195b8 epoll_wait + 0x1a epoll_wait ../sysdeps/unix/sysv/linux/epoll_wait.c:30
4 postgres + 0x433870 0x55c5ba6c39bb 0x00007ffcbef195c0 WaitEventSetWait + 0x14b
5 postgres + 0x320de0 0x55c5ba5b0e74 0x00007ffcbef19650 secure_read + 0x94
6 postgres + 0x327d20 0x55c5ba5b7dae 0x00007ffcbef196a0 pq_recvbuf + 0x8e
7 postgres + 0x328980 0x55c5ba5b8995 0x00007ffcbef196c0 pq_getbyte + 0x15
8 postgres + 0x457da0 0x55c5ba6e909c 0x00007ffcbef196d0 PostgresMain + 0x12fc
9 postgres + 0x3ce210 0x55c5ba65ef86 0x00007ffcbef19a60 ServerLoop + 0xd76
10 postgres + 0x3cf240 0x55c5ba65fe18 0x00007ffcbef1a040 PostmasterMain + 0xbd8
11 postgres + 0x14ecc0 0x55c5ba3df182 0x00007ffcbef1a0c0 main + 0x4c2
12 libc.so.6 + 0x29d10 0x7f01c2c91d90 0x00007ffcbef1a0f0 __libc_init_first + 0x90 __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
13 libc.so.6 + 0x29dc0 0x7f01c2c91e40 0x00007ffcbef1a190 __libc_start_main + 0x80 call_init ../csu/libc-start.c:128 __libc_start_main_impl ../csu/libc-start.c:379
14 postgres + 0x14f200 0x55c5ba3df225 0x00007ffcbef1a1e0 _start + 0x25
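When several reports accumulate in the crash directory, it can help to extract the key fields from each one. The Python sketch below is based solely on the sample report above: the field labels, the DD.MM.YYYY date format, and the function name summarize_crash_report are assumptions drawn from that one example, not a documented file format.

```python
import re
from datetime import datetime, timezone


def summarize_crash_report(text):
    """Extract signal, time, PID and executable path from a crash report.

    Field labels are taken from the sample report; fields that cannot be
    found are returned as None.
    """
    signal = re.search(r"received signal:\s*(\d+)\s*\((\w+)\)", text)
    when = re.search(
        r"Signal UTC date time:\s*(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2})", text)
    # \b keeps "pid:" from matching inside "ppid:".
    pid = re.search(r"\bpid:\s*(\d+)", text)
    exe_path = re.search(r"\bexe_path:\s*(\S+)", text)

    time_utc = None
    if when:
        # Assumed day.month.year order, as in "25.10.2024".
        time_utc = datetime.strptime(
            when.group(1), "%d.%m.%Y %H:%M:%S").replace(tzinfo=timezone.utc)

    return {
        "signal_number": int(signal.group(1)) if signal else None,
        "signal_name": signal.group(2) if signal else None,
        "time_utc": time_utc,
        "pid": int(pid.group(1)) if pid else None,
        "exe_path": exe_path.group(1) if exe_path else None,
    }


# Excerpt of the sample report, used only for demonstration.
SAMPLE_REPORT = """\
# Signal
Program received signal: 11 (SIGSEGV)
Signal UTC date time: 25.10.2024 08:37:02
# Program
pid: 23770
ppid: 17506
exe_path: /opt/pgpro/sdm-14/bin/postgres
"""

if __name__ == "__main__":
    for key, value in summarize_crash_report(SAMPLE_REPORT).items():
        print(f"{key}: {value}")
```

Because the function searches the whole text with regular expressions instead of relying on line positions, it also tolerates reports whose line breaks have been lost.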