pgbadger
pgbadger — rapidly analyze Postgres Pro logs, producing detailed reports and graphs
Synopsis
pgbadger [connection-option...] [option...] [logfile...]
Description
pgbadger is a Postgres Pro/PostgreSQL log analyzer, which rapidly provides detailed reports based on your log files. pgbadger is provided with Postgres Pro as a standalone Perl script.
logfile can be a single log file, a list of files or a shell command that returns a list of files. To get log content from the standard input, pass “-” as logfile.
pgbadger can parse huge log files and compressed files. It can autodetect your log file format (syslog, stderr, csvlog or jsonlog) if the file is long enough. Supported compressed formats are gzip, bzip2, lz4, xz, zip and zstd. For the xz format, you must have an xz version higher than 5.05, which supports the --robot
option. For pgbadger to determine the uncompressed file size for the lz4 format, the file must be compressed with the --content-size
option.
pgbadger supports any format of log-line prefixes that can be specified through the log_line_prefix configuration setting of your postgresql.conf
configuration file, provided that at least %t
and %p
are specified.
pgbouncer log files can also be parsed.
To speed up log parsing, you can use either of these multiprocessing modes: one core per log file or multiple cores per file. These modes can be combined.
pgbadger can also parse remote log files fetched using a passwordless SSH connection. This mode can be used with compressed files and with the one-core-per-log-file multiprocessing mode (-J).
Examples of reports can be found at https://pgbadger.darold.net/#reports.
Limitations
pgbadger currently has the following limitations:
Multiprocessing is not supported for compressed log files and CSV files, as well as on Windows.
CSV format of log files cannot be parsed remotely.
csvlog logs cannot be passed from the standard input.
Setup and Configuring
Once you have pgbadger installed, complete the setup described in the sections below.
[Optional] Set up Parsing Specific Log Formats
If you plan to parse CSV log files, install the Text::CSV_XS Perl module.
[Optional] Set up Export of Statistics
If you want to export statistics as a JSON file, install the JSON::XS Perl module.
To install this optional module:
On a Debian-based system, run:
sudo apt-get install libjson-xs-perl
On an RPM system, run:
sudo yum install perl-JSON-XS
[Optional] Set up Parsing Compressed Log Files
By default, pgbadger autodetects the compressed log file format from the file extension and uses decompression utilities accordingly:
zcat for gz
bzcat for bz2
lz4cat for lz4
zstdcat for zst
unzip or xz for zip or xz
If the needed utility is outside of your PATH
directories, use the --zcat
command-line option to specify the path to the decompression utility. For example:
--zcat="/usr/local/bin/gunzip -c"
or
--zcat="/usr/local/bin/bzip2 -dc"
or
--zcat="C:\tools\unzip -p"
Note
With the default autodetection of the compressed file format, you can mix gz, bz2, lz4, xz, zip and zstd log files. Once you specify a custom value of --zcat, mixing compressed files of different formats is no longer possible.
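The extension-based choice of a decompression utility can be pictured with a small shell sketch. The function name decompressor_for is hypothetical and this is not pgbadger's actual code; it only mirrors the list of extensions and utilities given in this section, assuming unzip for zip files and xz for xz files.

```shell
# Hypothetical helper, not part of pgbadger: print the decompression
# utility that matches a log file's extension, following the list above.
decompressor_for() {
    case "$1" in
        *.gz)  echo "zcat" ;;
        *.bz2) echo "bzcat" ;;
        *.lz4) echo "lz4cat" ;;
        *.zst) echo "zstdcat" ;;
        *.zip) echo "unzip" ;;
        *.xz)  echo "xz" ;;
        *)     echo "none" ;;   # not compressed, or unknown extension
    esac
}

decompressor_for postgresql.log.1.gz   # prints: zcat
decompressor_for postgresql.log        # prints: none
```

A custom --zcat value replaces this per-file choice with one fixed command, which is why files of different compressed formats cannot be mixed in that case.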
Configure Your Postgres Pro Server
Set the values of certain configuration parameters in your postgresql.conf
:
Set up logging SQL queries.
To enable SQL query logging and have the query statistics include actual query strings, set
log_min_duration_statement = 0
On a busy server you may want to increase this value to only log queries with a longer duration.
If you just want to report the duration and number of queries and do not want details of queries, set log_min_duration_statement to -1, which disables logging statement durations, and enable log_duration.
Enabling log_min_duration_statement will add reports about slowest queries and queries that took the most time. Note that if you set log_statement to all, the setting of log_min_duration_statement will have no effect.
Warning
Avoid setting log_min_duration_statement to a non-negative value together with enabling log_duration and log_statement as this will result in wrong counter values and drastically increase the size of your log. Always prefer setting log_min_duration_statement.
Set the log line prefix string in log_line_prefix.
It must at least include a time escape sequence (%t, %m or %n) and the process-related escape sequence (%p or %c). For example, for stderr logs, the setting must be at least
log_line_prefix = '%t [%p]: '
The log line prefix could also specify the user, database name, application name and client IP address. For example,
for stderr logs:
log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
or
log_line_prefix = '%t [%p]: db=%d,user=%u,app=%a,client=%h '
and for syslog logs:
log_line_prefix = 'user=%u,db=%d,app=%a,client=%h '
or
log_line_prefix = 'db=%d,user=%u,app=%a,client=%h '
To get more information from your log files, set certain configuration parameters as follows:
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_autovacuum_min_duration = 0
log_error_verbosity = default
To benefit from these settings, do not enable log_statement as pgbadger does not parse the corresponding log format.
Set the language in which messages are displayed; messages must be in English with or without locale support:
lc_messages='C'
or
lc_messages='en_US.UTF-8'
Locales for other languages, such as
ru_RU.utf8
, are not supported.
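The requirement on log_line_prefix for stderr logs can be checked before a server restart with a short shell sketch. The function name check_prefix is hypothetical, and the check is deliberately simple: it only looks for the escape sequences named above and does not handle a literal %% in the prefix. It is meant for stderr-style prefixes; the syslog examples above do not contain these escapes.

```shell
# Hypothetical helper: report whether a log_line_prefix value contains
# a time escape (%t, %m or %n) and a process escape (%p or %c).
check_prefix() {
    prefix=$1
    case "$prefix" in
        *%t*|*%m*|*%n*) ;;
        *) echo "missing time escape (%t, %m or %n)"; return 1 ;;
    esac
    case "$prefix" in
        *%p*|*%c*) ;;
        *) echo "missing process escape (%p or %c)"; return 1 ;;
    esac
    echo "ok"
}

check_prefix '%t [%p]: user=%u,db=%d,app=%a,client=%h '   # prints: ok
```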
Usage
The following are simple examples to illustrate miscellaneous pgbadger usage details.
pgbadger /var/lib/pgpro/std-13/data/log/postgresql-2022-01-14_000000.log
pgbadger /var/lib/pgpro/std-13/data/log/postgres.log.2.gz /var/lib/pgpro/std-13/data/log/postgres.log.1.gz /var/lib/pgpro/std-13/data/log/postgresql-2022-01-14_000000.log
pgbadger /var/lib/pgpro/std-13/data/log/postgresql/postgresql-2022-01-*
pgbadger --exclude-query="^(COPY|COMMIT)" /var/lib/pgpro/std-13/data/log/postgresql-2022-01-14_000000.log
pgbadger -b "2022-01-25 10:56:11" -e "2022-01-25 10:59:11" /var/lib/pgpro/std-13/data/log/postgresql-2022-01-25-0000.log
cat /var/lib/pgpro/std-13/data/log/postgresql-2022-01-14_000000.log | pgbadger -
# Log line prefix with stderr log output
pgbadger --prefix '%t [%p]: user=%u,db=%d,client=%h' /var/lib/pgpro/std-13/data/log/postgresql-2022-08-21*
pgbadger --prefix '%m %u@%d %p %r %a : ' /var/lib/pgpro/std-13/data/log/postgresql-2022-08-21-0000.log
# Log line prefix with syslog log output
pgbadger --prefix 'user=%u,db=%d,client=%h,appname=%a' /var/lib/pgpro/std-13/data/log/postgresql-2022-08-21*
# Use 8 CPUs to parse 10GB file faster
pgbadger -j 8 /var/lib/pgpro/std-13/data/log/postgresql-2022-08-21-0000.log
# Use a cron job to report errors weekly
30 23 * * 1 /usr/bin/pgbadger -q -w /var/lib/pgpro/std-13/data/log/postgresql-2022-01*.log -o /var/www/pg_reports/pg_errors.html
Specifying Remote Log Files
Specify remote log files to parse using a URI. Supported protocols are HTTP[S] and [S]FTP. The curl
command will be used to download the file, and the file will be parsed during download. The SSH protocol is also supported and will use the ssh
command to get log files, as when the --remote-host
option is used.
Use these URI notations for the remote log file:
pgbadger http://172.12.110.1//var/lib/pgpro/std-13/data/log/postgresql-2022-01-14_000000.log
pgbadger ftp://username@172.12.110.14/postgresql-2022-01-14_000000.log
pgbadger ssh://username@172.12.110.14:2222//var/lib/pgpro/std-13/data/log/postgresql-2022-01-14_000000.log*
You can parse a local Postgres Pro log and a remote pgbouncer log file together:
pgbadger /var/lib/pgpro/std-13/data/log/postgresql-2022-01-14_000000.log ssh://username@172.12.110.14/pgbouncer.log
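Judging from the URI examples above, an absolute path on the remote host produces a double slash after the host or port: one slash separates the host part from the path, and the second one belongs to the path itself. The following sketch assembles such a URI; the function name ssh_log_uri is hypothetical and the values are taken from the examples above.

```shell
# Hypothetical helper: build an ssh:// URI in the notation shown above.
# Arguments: user, host, port, path of the log file on the remote host.
ssh_log_uri() {
    printf 'ssh://%s@%s:%s/%s\n' "$1" "$2" "$3" "$4"
}

ssh_log_uri username 172.12.110.14 2222 /var/lib/pgpro/std-13/data/log/postgresql-2022-01-14_000000.log
# prints: ssh://username@172.12.110.14:2222//var/lib/pgpro/std-13/data/log/postgresql-2022-01-14_000000.log
```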
Parallel Processing
To enable parallel processing, specify the -j N option, where N is the number of cores to use.
Parallel processing in pgbadger follows the algorithm below:
For each log file
    chunk size = int(file size / N)
    look at start/end offsets of these chunks
    fork N processes and seek to the start offset of each chunk
        each process will terminate when the parser reaches the end offset of its chunk
        each process writes stats into a binary temporary file
    wait for all child processes to terminate
All binary temporary files generated will then be read and loaded into memory to build the HTML output.
With this method, pgbadger may truncate or omit up to N queries per log file at the start/end of chunks, which is insignificant if you have millions of queries in your log file. The chance of losing the query that you are looking for is near zero, so this trade-off is usually acceptable. Most of the time, such a query is counted twice but truncated.
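The chunk arithmetic from the algorithm above can be sketched in shell as follows. This is only an illustration of chunk size = int(file size / N), not pgbadger's implementation; the function name chunk_offsets is hypothetical, and the sketch assumes that the last chunk simply extends to the end of the file.

```shell
# Hypothetical helper: print "start end" byte offsets for N chunks of a
# file of the given size, using chunk size = int(file size / N).
chunk_offsets() {
    size=$1
    n=$2
    chunk=$((size / n))
    i=0
    while [ "$i" -lt "$n" ]; do
        start=$((i * chunk))
        if [ "$i" -eq $((n - 1)) ]; then
            end=$size              # assumption: last chunk takes the remainder
        else
            end=$((start + chunk))
        fi
        echo "$start $end"
        i=$((i + 1))
    done
}

chunk_offsets 100 4
# prints:
# 0 25
# 25 50
# 50 75
# 75 100
```

Because the offsets are computed from the file size alone, a chunk boundary can fall in the middle of a log entry, which is where the truncated or omitted queries mentioned above come from.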
When you have many small log files and many CPUs, it is faster to dedicate one core to one log file at a time. To enable this behavior, specify the -J N option instead. Using this method, you can be sure not to lose any queries in the reports. With 200 log files of 10 MB each, the -J N option starts being really efficient with 8 cores.
Here is a benchmark done on a server with 8 CPUs and a single file of 9.5 GB.
Option  |  1 CPU  | 2 CPU | 4 CPU | 8 CPU
--------+---------+-------+-------+------
-j      | 1h41m18 | 50m25 | 25m39 | 15m58
-J      | 1h41m18 | 54m28 | 41m16 | 34m45
With 200 log files of 10 MB each, which is 2 GB in total, the results are slightly different:
Option  | 1 CPU | 2 CPU | 4 CPU | 8 CPU
--------+-------+-------+-------+------
-j      | 20m15 | 9m56  | 5m20  | 4m20
-J      | 20m15 | 9m49  | 5m00  | 2m40
So it is recommended to use the -j
option unless you have hundreds of small log files and can use at least 8 CPUs.
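The recommendation above can be written down as a small decision helper. The function name suggest_parallel_option is hypothetical, and the threshold of 100 files is an assumption standing in for "hundreds of small log files"; adjust it to your own measurements.

```shell
# Hypothetical helper: suggest -j or -J from the number of log files and
# the number of CPUs. The threshold of 100 files is an assumption.
suggest_parallel_option() {
    files=$1
    cpus=$2
    if [ "$files" -ge 100 ] && [ "$cpus" -ge 8 ]; then
        echo "-J $cpus"
    else
        echo "-j $cpus"
    fi
}

suggest_parallel_option 1 8     # prints: -j 8
suggest_parallel_option 200 8   # prints: -J 8
suggest_parallel_option 200 4   # prints: -j 4
```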
Important
During parallel parsing, pgbadger generates a lot of temporary files named tmp_pgbadgerXXXX.bin
in the /tmp
directory and removes them at the end.
Building Incremental Reports
The following sample cron job builds a report every week with the incremental behavior assuming that your log file and HTML report are also rotated every week:
0 4 * * 1 /usr/bin/pgbadger -q `find /var/lib/pgpro/std-13/data/log/ -mtime -7 -name "postgresql.log*"` -o /var/www/pg_reports/pg_errors-`date +\%F`.html -l /var/reports/pgbadger_incremental_file.dat
However, it is better to turn on pgbadger's automatic building of incremental reports by specifying the -I
/--incremental
option. In this mode, pgbadger builds one report per day and a cumulative report per week. The output is first built in the binary format and saved to the output directory, specified by the -O
/--outdir
option, and then daily and weekly reports are built in the HTML format with the main index file. The main index file shows a dropdown menu per week with a link to the week's report and links to daily reports for that week. For example, run pgbadger as follows with the file rotated daily:
0 4 * * * /usr/bin/pgbadger -I -q /var/lib/pgpro/std-13/data/log/postgresql/postgresql.log.1 -O /var/www/pg_reports/
You will have all daily and weekly reports. In this mode pgbadger will automatically create an incremental file in the output directory, so you do not have to use the -l
option unless you want to change the path of that file. This means that you can run pgbadger in this mode every day on a log file rotated every week and it will not count the log entries twice. To save disk space, you may want to use the -X
/--extra-files
command-line option to force pgbadger to write CSS and JavaScript files to the output directory as separate files. The resources will then be loaded using script and link tags.
In the incremental mode, you can also specify the number of weeks to keep in the report by using the -R
/--retention
option:
/usr/bin/pgbadger --retention 2 -I -q /var/lib/pgpro/std-13/data/log/postgresql/postgresql.log.1 -O /var/www/pg_reports/
If pg_dump is scheduled to run at 23:00 and 13:00 every day, you can exclude these periods from the report as follows:
pgbadger --exclude-time "2013-09-.* (23|13):.*" postgresql.log
This will help avoid having COPY
statements generated by pg_dump on top of the list of slowest queries. Alternatively, you can use --exclude-appname "pg_dump"
to solve this problem in a simpler way.
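Before relying on an --exclude-time expression, it can help to try it on a few sample timestamps. The sketch below uses grep -E as a stand-in for the regular expression matching done by pgbadger, which is an approximation that holds for this simple pattern; the function name is_excluded is hypothetical.

```shell
# Hypothetical helper: succeed if a timestamp matches the --exclude-time
# expression from the example above (grep -E is used as an approximation).
is_excluded() {
    printf '%s\n' "$1" | grep -Eq '2013-09-.* (23|13):.*'
}

for ts in '2013-09-05 23:10:00' '2013-09-05 13:59:59' '2013-09-05 12:10:00'; do
    if is_excluded "$ts"; then
        echo "$ts excluded"
    else
        echo "$ts kept"
    fi
done
# prints:
# 2013-09-05 23:10:00 excluded
# 2013-09-05 13:59:59 excluded
# 2013-09-05 12:10:00 kept
```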
Rebuilding Reports
To update all HTML reports after fixing a pgbadger report or adding a new feature to it, you can rebuild incremental reports. To rebuild all reports in the case a binary file is still available, run:
rm /path/to/reports/*.js
rm /path/to/reports/*.css
pgbadger -X -I -O /var/www/pg_reports/ --rebuild
This will also update all the resource files (JavaScript and CSS). Use the -E
/--explode
option if the reports were built with this option.
Building Monthly Reports
By default, in the incremental mode pgbadger only computes daily and weekly reports. To have monthly cumulative reports, you will need to use a separate command to specify the report to build. For example, to build a report for August 2021, run:
pgbadger -X --month-report 2021-08 /var/www/pg_reports/
This will add a link to the month name to the calendar view of incremental reports to look at the monthly report. The report for a current month can be run every day, and it is entirely rebuilt each time. The monthly report is not built by default because it could take too long. Like when rebuilding reports, if reports were built with the per-database option (-E
/--explode
), it must be used to build the monthly report:
pgbadger -E -X --month-report 2021-08 /var/www/pg_reports/
Choosing the Report File Format
The pgbadger report file format is determined by the extension of the file passed to the -o
/--outfile
option.
Use the binary format (-o
*.bin
) to create custom incremental and cumulative reports.
For example, to refresh a pgbadger report every hour from a daily log file, you can run the following commands every hour:
# Generate incremental data files in the binary format
pgbadger --last-parsed .pgbadger_last_state_file -o sunday/hourX.bin /var/lib/pgpro/std-13/data/log/postgresql-Sun.log
# Build a fresh HTML report from the generated binary file
pgbadger sunday/*.bin
For another example, assume that you generate one log file per hour. To have reports rebuilt each time the log file is rotated, run:
pgbadger -o day1/hour01.bin /var/lib/pgpro/std-13/data/log/postgresql-2022-01-23_10.log
pgbadger -o day1/hour02.bin /var/lib/pgpro/std-13/data/log/postgresql-2022-01-23_11.log
pgbadger -o day1/hour03.bin /var/lib/pgpro/std-13/data/log/postgresql-2022-01-23_12.log
...
And to refresh the HTML report, for example, each time after a new binary file is generated, just run:
pgbadger -o day1_report.html day1/*.bin
Adjust the commands to your particular needs.
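One way to fill in the hour placeholder in the binary file name is sketched below. The function name hourly_refresh_cmds and the naming of the HTML report are assumptions; the function only prints the two pgbadger commands for a given hour so that they can be reviewed before being put into a scheduled job.

```shell
# Hypothetical helper: print the two commands for one hourly refresh.
# Arguments: log file, directory for binary files, hour (0-23).
hourly_refresh_cmds() {
    log=$1
    dir=$2
    hour=${3#0}        # drop one leading zero so that 08 is not read as octal
    hour=${hour:-0}    # "0" becomes empty after stripping; restore it
    bin=$(printf '%s/hour%02d.bin' "$dir" "$hour")
    echo "pgbadger --last-parsed .pgbadger_last_state_file -o $bin $log"
    echo "pgbadger -o ${dir}_report.html $dir/*.bin"
}

hourly_refresh_cmds /var/lib/pgpro/std-13/data/log/postgresql-Sun.log sunday 08
# prints:
# pgbadger --last-parsed .pgbadger_last_state_file -o sunday/hour08.bin /var/lib/pgpro/std-13/data/log/postgresql-Sun.log
# pgbadger -o sunday_report.html sunday/*.bin
```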
Use the JSON format (-o
*.json
) to share data with other languages and to facilitate integration of pgbadger output with other monitoring tools, such as Cacti or Graphite.
Select other output formats to meet your particular needs. For example, this command will generate Tsung sessions XML file for SELECT
queries only:
pgbadger -S -o sessions.tsung --prefix '%t [%p]: user=%u,db=%d ' /var/lib/pgpro/std-13/data/log/postgresql-2022-01-14_000000.log
Options
This section describes pgbadger command-line options.
-a
minutes
--average
minutes
Specifies the number of minutes for which to build average graphs of queries and connections.
Default: 5.
-A
minutes
--histo-average
minutes
Specifies the number of minutes for which to build histogram graphs of queries.
Default: 60.
-b
datetime
--begin
datetime
Specifies the start date/time for the data to be parsed in logs.
-c
host
--dbclient
host
Only report on log entries for the specified client host.
-C
--nocomment
Remove /* ... */ comments from queries.
-d
name
--dbname
name
Only report on log entries for the specified database.
-D
--dns-resolv
Replace client IP addresses with their DNS names.
Warning
This can considerably slow down pgbadger.
-e
datetime
--end
datetime
Specifies the end date/time for the data to be parsed in logs.
-E
--explode
Build one report per database. Global information not related to any database gets added to the postgres database report.
-f
logtype
--format
logtype
Specifies the log type.
Possible values: syslog, syslog2, stderr, jsonlog, csv, pgbouncer, logplex, rds and redshift. Use when pgbadger cannot detect the log format.
-G
--nograph
Disables graphs in HTML output.
-h
--help
Show detailed information about pgbadger options and exit.
-H
path
--html-outdir
path
Specifies the path to the directory where the HTML report must be written in the incremental mode. Note that binary files remain in the directory specified by -O/--outdir.
-i
name
--ident
name
Specifies the program name used to identify Postgres Pro messages in syslog logs.
Default: postgres.
-I
--incremental
Use the incremental mode, where reports will be generated by days in a separate directory specified by the -O/--outdir option.
-j
number
--jobs
number
Specifies the number of jobs to run at the same time. When working with csvlog logs, pgbadger always runs as a single job.
Default: 1.
-J
number
--Jobs
number
Specifies the number of log files to parse in parallel. When working with csvlog logs, one log at a time is processed.
Default: 1.
-l
filename
--last-parsed
filename
Specifies the file where the last datetime and line parsed are registered to allow incremental log parsing. Useful to watch errors since the last run or to get one report per day with the log rotated weekly.
-L
filename
--logfile-list
filename
Specifies the file containing the list of log files to parse.
-m
size
--maxlength
size
Specifies the maximum length of a query in reports. Longer queries will be truncated.
Default: 100000.
-M
--no-multiline
Turns off collecting multiline statements to avoid reporting excessive information, especially on errors that generate a huge report.
-N
name
--appname
name
Only report on log entries for the specified application.
-o
filename
--outfile
filename
Specifies the filename for the output and determines the report file format. Can be used multiple times to output several formats. For the json output, ensure that the Perl module JSON::XS is installed. To dump the output to stdout, use the value of “-” as filename.
Default: out.html, out.txt, out.bin, out.json or out.tsung, for the respective output format.
-O
path
--outdir
path
Specifies the directory where the out file must be saved.
-p
string
--prefix
string
Specifies the value of your custom log_line_prefix string, as defined in your postgresql.conf. Only use if your log line prefix is different from the standard log_line_prefix strings, for example, if your prefix includes additional variables, such as client IP or application name.
-P
--no-prettify
Disables SQL query code prettifier.
-q
--quiet
Disables printing anything to stdout, even the progress bar.
-Q
--query-numbering
Add numbering of queries to the output when used together with --dump-all-queries or --normalized-only.
-r
address
--remote-host
address
Specifies the host to execute the cat command on a remote log file to parse the file locally.
-R
number
--retention
number
Specifies the number of weeks for which to keep reports in the output directory in the incremental mode. The directories for older weeks and days are automatically removed.
Default: 0 (no reports are removed).
-s
number
--sample
number
Specifies the number of query samples to store.
Default: 3.
-S
--select-only
Only report on SELECT queries.
-t
number
--top
number
Specifies the number of queries to store/display.
Default: 20.
-T
string
--title
string
Specifies the title of the HTML report page.
-u
username
--dbuser
username
Only report on log entries for the specified username.
-U
username
--exclude-user
username
Specifies the username to exclude log entries for from the report. Can be used multiple times.
-v
--verbose
Enables the verbose or debug mode.
Default: off.
-V
--version
Show pgbadger version and exit.
-w
--watch-mode
Only report errors, just like Logwatch can do.
-W
--wide-char
Encode HTML output of queries in UTF8 to avoid Perl messages “Wide character in print”.
-x
format
--extension
format
Specifies the output format. Possible values: text, html, bin, json or tsung.
Default: html.
-X
--extra-files
In the incremental mode, write CSS and JavaScript resources to the output directory as separate files.
-z
command
--zcat
command
Specifies the full command to run the zcat program. Use if zcat, bzcat or unzip is outside of your PATH directories.
-Z
+/-XX
--timezone
+/-XX
Specifies the number of hours from GMT for the timezone. Use to adjust date/time in JavaScript graphs.
--pie-limit
number
Specifies the threshold for pie charts: values lower than this number are grouped together and shown as a single sum.
--exclude-query
regexp
Specifies the regular expression such that matching queries will be excluded from the report. For example: “^(VACUUM|COMMIT)”. Can be used multiple times.
--exclude-file
filename
Specifies the path to the file that contains regular expressions to use to exclude matching queries from the report, one expression per line.
--include-query
regexp
Specifies the regular expression such that only matching queries will be included in the report. Can be used multiple times. For example: “(tbl1|tbl2)”.
--include-file
filename
Specifies the path to the file that contains regular expressions to use to include matching queries in the report, one expression per line.
--disable-error
Turns off generation of an error report.
--disable-hourly
Turns off generation of an hourly report.
--disable-type
Turns off generation of a report on queries by type, database and user.
--disable-query
Turns off generation of query reports, such as slowest or most frequent queries, queries by users, by database and so on.
--disable-session
Turns off generation of a session report.
--disable-connection
Turns off generation of a connection report.
--disable-lock
Turns off generation of a lock report.
--disable-temporary
Turns off generation of a report on temporary files.
--disable-checkpoint
Turns off generation of checkpoint/restartpoint reports.
--disable-autovacuum
Turns off generation of an autovacuum report.
--charset
name
Specifies the HTML charset to be used.
Default: utf-8.
--csv-separator
char
Specifies the CSV field separator.
Default: “,”.
--exclude-time
regexp
Specifies the regular expression such that log entries for any matching timestamp will be excluded from the report. For example: “2013-04-12 .*”. Can be used multiple times.
--include-time
regexp
Specifies the regular expression such that log entries for any matching timestamp will be included in the report. For example: “2013-04-12 .*”. Can be used multiple times.
--exclude-db
name
Specifies the name of the database to exclude related log entries from the report. For example: “outdated_db”. Can be used multiple times.
--exclude-appname
name
Specifies the name of the application to exclude related log entries from the report. For example: “pg_dump”. Can be used multiple times.
--exclude-line
regexp
Specifies the regular expression such that any matching log entry will be excluded from the report. Can be used multiple times.
--exclude-client
address
Specifies the client IP/name to exclude related log entries from the report. Can be used multiple times.
--anonymize
Obscure all literals in queries. Useful to hide confidential data.
--noreport
Prevents generation of reports in the incremental mode.
--log-duration
Associate log entries generated by log_duration = on and log_statement = all.
--enable-checksum
Add the MD5 sum under each query report.
--journalctl
command
Specifies the command to produce the information similar to what the Postgres Pro log file contains. Usually, like this: journalctl -u postgrespro-std-13.
--pid-dir
path
Specifies the path to store the PID file.
Default: /tmp.
--pid-file
filename
Specifies the name of the PID file to manage concurrent execution of pgbadger.
Default: pgbadger.pid.
--rebuild
Rebuild all HTML reports in incremental output directories that contain binary data files.
--pgbouncer-only
Only show pgbouncer-related menu in the header.
--start-monday
In the incremental mode, start calendar weeks on Monday. By default, they start on Sunday.
--iso-week-number
In the incremental mode, start calendar weeks on Monday with ISO 8601 week numbering: 01 to 53, where week 1 is the first week of a year that has at least 4 days.
--normalized-only
Only dump all normalized queries to out.txt.
--log-timezone
+/-XX
Specifies the number of hours from GMT for the timezone to adjust date/time read from the log file before parsing. The use of this option makes log search with date/time more complicated.
--prettify-json
Prettify JSON output.
--month-report
YYYY-MM
Specifies the month (YYYY-MM) to create a cumulative HTML report for. Requires incremental output directories to be set and all the necessary binary data files available.
--day-report
YYYY-MM-DD
Specifies the day (YYYY-MM-DD) to create an HTML report for. Requires incremental output directories to be set and all the necessary binary data files available.
--noexplain
Avoid processing log lines generated by auto_explain.
--command
cmd
Specifies the command to run to retrieve log entries on stdin. pgbadger will open a pipe to this command and parse log entries that it generates.
--no-week
Avoid building weekly reports in the incremental mode. Use if building weekly reports takes too long.
--explain-url
URL
Specifies the URL to override the URL of the graphical explain tool.
Default:
http://explain.depesz.com/?is_public=0&is_anon=0&plan=
--tempdir
path
Specifies the directory for temporary files.
Default: File::Spec->tmpdir() || '/tmp'.
--no-process-info
Disables changing the pgbadger process title to help identify this process. Useful for systems where changing process titles is not supported.
--dump-all-queries
Dump all queries found in the log file to a text file, replacing bind parameters in the queries at their respective placeholder positions.
--keep-comments
Retains comments in normalized queries. Useful to distinguish between same normalized queries.
--no-progressbar
Disables displaying the progress bar.
Remote Log Connection Options
pgbadger can parse a remote log file fetched using passwordless SSH connection. Use the -r
/--remote-host
option to set the IP address or name of the target host. More options to define SSH connection parameters are as follows:
--ssh-program
ssh
Specifies the path to the SSH program to use.
Default: ssh.
--ssh-port
port
Specifies the SSH port for the connection.
Default: 22.
--ssh-user
username
Specifies the username for the connection.
Default: user running pgbadger.
--ssh-identity
filename
Specifies the path to the identity file.
--ssh-timeout
seconds
Specifies the timeout in seconds in case of the SSH connection failure.
Default: 10.
--ssh-option
options
Specifies the list of options to define SSH connection parameters. The following options are always used:
-o ConnectTimeout=$ssh_timeout
-o PreferredAuthentications=hostbased,publickey
Author
Gilles Darold <gilles@darold.net>