pgbadger

pgbadger — rapidly analyze Postgres Pro logs, producing detailed reports and graphs

Synopsis

pgbadger [connection-option...] [option...] [logfile...]

Description

pgbadger is a Postgres Pro/PostgreSQL log analyzer, which rapidly provides detailed reports based on your log files. pgbadger is provided with Postgres Pro as a standalone Perl script.

logfile can be a single log file, a list of files or a shell command that returns a list of files. To get log content from the standard input, pass - as logfile.

pgbadger can parse huge log files and compressed files. It can autodetect your log file format (syslog, stderr, csvlog or jsonlog) if the file is long enough. Supported compressed formats are gzip, bzip2, lz4, xz, zip and zstd. For the xz format, you must have an xz version higher than 5.05, which supports the --robot option. For pgbadger to determine the uncompressed file size for the lz4 format, the file must be compressed with the --content-size option.

pgbadger supports any format of log-line prefixes that can be specified through the log_line_prefix configuration setting of your postgresql.conf configuration file, provided that at least %t and %p are specified.

pgbouncer log files can also be parsed.

To speed up log parsing, you can use either of two multiprocessing modes: one core per log file (-J) or multiple cores per file (-j). These modes can be combined.

pgbadger can also parse remote log files fetched using a passwordless SSH connection. This mode can be used with compressed files and supports multiprocessing with one core per file (-J).

Examples of reports can be found at https://pgbadger.darold.net/#reports.

Limitations

pgbadger currently has the following limitations:

  • Multiprocessing with multiple cores per file (-j) is not supported for compressed log files.

  • Multiprocessing is not supported for CSV log files or on Windows.

  • Log files in the CSV format cannot be parsed remotely.

  • csvlog logs cannot be passed from the standard input.

Setup and Configuration

pgbadger is provided with Postgres Pro Enterprise as a separate pre-built package pgbadger (for the detailed installation instructions, see Chapter 17). Once you have pgbadger installed, complete the setup described in the sections below.

[Optional] Set up Parsing Specific Log Formats

If you plan to parse CSV log files, install the Text::CSV_XS Perl module.

[Optional] Set up Export of Statistics

If you want to export statistics as a JSON file, install the JSON::XS Perl module. To install this optional module:

  • On a Debian-based system, run:

    sudo apt-get install libjson-xs-perl
    

  • On an RPM system, run:

    sudo yum install perl-JSON-XS
    

[Optional] Set up Parsing Compressed Log Files

By default, pgbadger autodetects the compressed log file format from the file extension and uses decompression utilities accordingly:

  • zcat for gz

  • bzcat for bz2

  • lz4cat for lz4

  • zstdcat for zst

  • unzip for zip

  • xz for xz
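The extension-to-utility mapping above can be sketched as a shell case statement. This is an illustration only: the helper name decompressor_for is hypothetical, the flags (unzip -p, xz -dc) are the standard options that write decompressed data to standard output and are an assumption about how the utilities are invoked, and the fallback to cat for uncompressed files is added for completeness.

```shell
#!/bin/sh
# Hypothetical helper mirroring extension-based autodetection.
# pgbadger's actual invocation of these utilities may differ.
decompressor_for() {
    case "$1" in
        *.gz)  echo "zcat" ;;
        *.bz2) echo "bzcat" ;;
        *.lz4) echo "lz4cat" ;;
        *.zst) echo "zstdcat" ;;
        *.zip) echo "unzip -p" ;;   # -p: extract to stdout
        *.xz)  echo "xz -dc" ;;     # -d: decompress, -c: write to stdout
        *)     echo "cat" ;;        # assumed fallback for uncompressed logs
    esac
}
```

For example, running $(decompressor_for postgres.log.2.gz) postgres.log.2.gz | head streams the first lines of a compressed log; the unquoted substitution lets "unzip -p" split into a command and its flag.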

If the needed utility is outside of your PATH directories, use the --zcat command-line option to specify the path to the decompression utility. For example:

--zcat="/usr/local/bin/gunzip -c" or --zcat="/usr/local/bin/bzip2 -dc"
--zcat="C:\tools\unzip -p"

Note

With the default autodetection of the compressed file format, you can mix gz, bz2, lz4, xz, zip and zstd log files. Once you specify a custom value of --zcat, mixing compressed files of different formats is no longer possible.

Configure Your Postgres Pro Server

Set the values of certain configuration parameters in your postgresql.conf:

  • Set up SQL query logging.

    To enable SQL query logging and have the query statistics include actual query strings, set

    log_min_duration_statement = 0
    

    On a busy server you may want to increase this value to only log queries with a longer duration.

    If you just want to report the duration and number of queries and do not want details of queries, set log_min_duration_statement to -1, which disables logging statement durations, and enable log_duration.

    Enabling log_min_duration_statement will add reports about the slowest queries and the queries that took the most time. Note that if you set log_statement to all, the setting of log_min_duration_statement will have no effect.

    Warning

    Avoid enabling log_min_duration_statement, log_duration and log_statement all at the same time, as this will result in wrong counter values and drastically increase the size of your log. Always prefer setting log_min_duration_statement.

  • Set the log line prefix string in log_line_prefix.

    It must at least include a time escape sequence (%t, %m or %n) and the process-related escape sequence (%p or %c). For example, for stderr logs, the setting must be at least

    log_line_prefix = '%t [%p]: '
    

    The log line prefix can also specify the user, database name, application name and client IP address. For example, for stderr logs:

    log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
    

    or

    log_line_prefix = '%t [%p]: db=%d,user=%u,app=%a,client=%h '
    

    and for syslog logs:

    log_line_prefix = 'user=%u,db=%d,app=%a,client=%h '
    

    or

    log_line_prefix = 'db=%d,user=%u,app=%a,client=%h '
    

  • To get more information from your log files, set certain configuration parameters as follows:

    log_checkpoints = on
    log_connections = on
    log_disconnections = on
    log_lock_waits = on
    log_temp_files = 0
    log_autovacuum_min_duration = 0
    log_error_verbosity = default
    

    To benefit from these settings, do not enable log_statement as pgbadger does not parse the corresponding log format.

  • Set the language in which messages are displayed; messages must be in English with or without locale support:

    lc_messages='C'
    

    or

    lc_messages='en_US.UTF-8'
    

    Locales for other languages, such as ru_RU.utf8, are not supported.
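The minimum log_line_prefix requirement described above can be checked with a small shell function before you restart the server. This is a minimal sketch; the function name check_prefix is hypothetical, it only looks for the required escape sequences, and it applies to stderr logs: as the syslog examples above show, a syslog prefix does not need %t or %p because the syslog message header already carries the timestamp and process ID.

```shell
#!/bin/sh
# Hypothetical helper: checks that a log_line_prefix value contains a
# time escape (%t, %m or %n) and a process escape (%p or %c).
# It does not validate any other part of the prefix.
check_prefix() {
    prefix=$1
    case "$prefix" in
        *%t*|*%m*|*%n*) ;;
        *) echo "missing time escape (%t, %m or %n)"; return 1 ;;
    esac
    case "$prefix" in
        *%p*|*%c*) ;;
        *) echo "missing process escape (%p or %c)"; return 1 ;;
    esac
    echo "ok"
}
```

For example, check_prefix '%t [%p]: user=%u,db=%d,app=%a,client=%h ' prints ok.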

Usage

The following simple examples illustrate various ways to use pgbadger.

pgbadger /var/lib/pgpro/ent-17/data/log/postgresql-2022-01-14_000000.log
pgbadger /var/lib/pgpro/ent-17/data/log/postgres.log.2.gz /var/lib/pgpro/ent-17/data/log/postgres.log.1.gz /var/lib/pgpro/ent-17/data/log/postgresql-2022-01-14_000000.log
pgbadger /var/lib/pgpro/ent-17/data/log/postgresql/postgresql-2022-01-*
pgbadger --exclude-query="^(COPY|COMMIT)" /var/lib/pgpro/ent-17/data/log/postgresql-2022-01-14_000000.log
pgbadger -b "2022-01-25 10:56:11" -e "2022-01-25 10:59:11" /var/lib/pgpro/ent-17/data/log/postgresql-2022-01-25-0000.log 
cat /var/lib/pgpro/ent-17/data/log/postgresql-2022-01-14_000000.log | pgbadger -
# Log line prefix with stderr log output
pgbadger --prefix '%t [%p]: user=%u,db=%d,client=%h' /var/lib/pgpro/ent-17/data/log/postgresql-2022-08-21*
pgbadger --prefix '%m %u@%d %p %r %a : ' /var/lib/pgpro/ent-17/data/log/postgresql-2022-08-21-0000.log
# Log line prefix with syslog log output
pgbadger --prefix 'user=%u,db=%d,client=%h,appname=%a' /var/lib/pgpro/ent-17/data/log/postgresql-2022-08-21*
# Use 8 CPUs to parse a 10 GB file faster
pgbadger -j 8 /var/lib/pgpro/ent-17/data/log/postgresql-2022-08-21-0000.log
# Use a cron job to report errors weekly
30 23 * * 1 /usr/bin/pgbadger -q -w /var/lib/pgpro/ent-17/data/log/postgresql-2022-01*.log -o /var/www/pg_reports/pg_errors.html

Specifying Remote Log Files

Specify remote log files to parse using a URI. Supported protocols are HTTP[S] and [S]FTP. The curl command will be used to download the file, and the file will be parsed during download. The SSH protocol is also supported and will use the ssh command to get log files, as when the --remote-host option is used.

Use these URI notations for the remote log file:

pgbadger http://172.12.110.1//var/lib/pgpro/ent-17/data/log/postgresql-2022-01-14_000000.log
pgbadger ftp://username@172.12.110.14/postgresql-2022-01-14_000000.log
pgbadger ssh://username@172.12.110.14:2222//var/lib/pgpro/ent-17/data/log/postgresql-2022-01-14_000000.log*
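To read the URI notation above: the part between :// and the first following slash holds the optional user, the host and the optional port, and the rest is the path. A double slash therefore yields an absolute path on the remote host, while a single slash, as in the FTP example, gives a path that is usually resolved relative to the login directory. The following sketch splits a URI this way with POSIX parameter expansion; the helper name split_log_uri is hypothetical, and this is not pgbadger's own parser.

```shell
#!/bin/sh
# Hypothetical helper: split scheme://[user@]host[:port]/path into parts.
# Prints scheme|user|host|port|path (empty fields when absent).
split_log_uri() {
    uri=$1
    scheme=${uri%%://*}
    rest=${uri#*://}
    authority=${rest%%/*}      # everything before the first slash
    path=${rest#*/}            # everything after the first slash
    case "$authority" in
        *@*) user=${authority%%@*}; hostport=${authority#*@} ;;
        *)   user=""; hostport=$authority ;;
    esac
    case "$hostport" in
        *:*) host=${hostport%%:*}; port=${hostport#*:} ;;
        *)   host=$hostport; port="" ;;
    esac
    printf '%s|%s|%s|%s|%s\n' "$scheme" "$user" "$host" "$port" "$path"
}
```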

You can parse a local Postgres Pro log and a remote pgbouncer log file together:

pgbadger /var/lib/pgpro/ent-17/data/log/postgresql-2022-01-14_000000.log ssh://username@172.12.110.14/pgbouncer.log  

Parallel Processing

To enable parallel processing, specify the -j N option, where N is the number of cores to use.

Parallel processing in pgbadger follows the algorithm below:

For each log file
  chunk size = int(file size / N)
  look at start/end offsets of these chunks
  fork N processes and seek to the start offset of each chunk
    each process will terminate when the parser reaches the end offset of its chunk
    each process writes stats into a binary temporary file
  wait for all child processes to terminate
All binary temporary files generated will then be read and loaded into
memory to build the html output.

With this method, pgbadger may truncate or omit at most N queries per log file at the start/end of chunks, which is an insignificant gap if your log file contains millions of queries. The chance of losing the query you were looking for is near zero, so this gap is acceptable. Most of the time, the query is counted twice, but truncated.
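The chunk arithmetic of the algorithm above can be sketched in shell. This minimal sketch only computes raw byte offsets; the helper name chunk_offsets is hypothetical, and extending the last chunk to the end of the file is an assumption, since the pseudocode does not say how the remainder of the division is handled. Because raw offsets do not fall on line boundaries, a query that spans a boundary can be cut, which is the gap described above.

```shell
#!/bin/sh
# Hypothetical helper: print one "start end" byte-offset pair per chunk.
# chunk size = int(file size / N); the last chunk absorbs the remainder.
chunk_offsets() {
    size=$1
    jobs=$2
    chunk=$((size / jobs))
    i=0
    while [ "$i" -lt "$jobs" ]; do
        start=$((i * chunk))
        if [ "$i" -eq $((jobs - 1)) ]; then
            end=$size
        else
            end=$((start + chunk))
        fi
        echo "$start $end"
        i=$((i + 1))
    done
}
```

For example, chunk_offsets 100 4 prints the pairs 0 25, 25 50, 50 75 and 75 100. For the 9.5 GB file used in the benchmark below, -j 8 gives each process roughly 1.2 GB.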

When you have many small log files and many CPUs, it is faster to dedicate one core to one log file at a time. To enable this behavior, specify the -J N option instead. Using this method, you can be sure not to lose any queries in the reports. With 200 log files of 10 MB each, the -J option starts being really efficient with 8 cores.

Here is a benchmark done on a server with 8 CPUs and a single file of 9.5 GB.

 Option |  1 CPU  | 2 CPU | 4 CPU | 8 CPU
--------+---------+-------+-------+------
   -j   | 1h41m18 | 50m25 | 25m39 | 15m58
   -J   | 1h41m18 | 54m28 | 41m16 | 34m45

With 200 log files of 10 MB each, which is 2 GB in total, the results are slightly different:

 Option | 1 CPU | 2 CPU | 4 CPU | 8 CPU
--------+-------+-------+-------+------
   -j   | 20m15 |  9m56 |  5m20 | 4m20
   -J   | 20m15 |  9m49 |  5m00 | 2m40

So it is recommended to use the -j option unless you have hundreds of small log files and can use at least 8 CPUs.

Important

During parallel parsing, pgbadger generates a lot of temporary files named tmp_pgbadgerXXXX.bin in the /tmp directory and removes them at the end.

Building Incremental Reports

The following sample cron job builds a report every week incrementally, assuming that your log file and HTML report are also rotated every week:

0 4 * * 1 /usr/bin/pgbadger -q `find /var/lib/pgpro/ent-17/data/log/ -mtime -7 -name "postgresql.log*"` -o /var/www/pg_reports/pg_errors-`date +\%F`.html -l /var/reports/pgbadger_incremental_file.dat

However, it is better to enable pgbadger's automatic building of incremental reports by specifying the -I/--incremental option. In this mode, pgbadger builds one report per day and a cumulative report per week. The output is first built in the binary format and saved to the output directory specified by the -O/--outdir option, and then daily and weekly reports are built in the HTML format with the main index file. The main index file shows a dropdown menu per week with a link to the week's report and links to daily reports for that week. For example, run pgbadger as follows with the log file rotated daily:

0 4 * * * /usr/bin/pgbadger -I -q /var/lib/pgpro/ent-17/data/log/postgresql/postgresql.log.1 -O /var/www/pg_reports/

You will have all daily and weekly reports. In this mode pgbadger will automatically create an incremental file in the output directory, so you do not have to use the -l option unless you want to change the path of that file. This means that you can run pgbadger in this mode every day on a log file rotated every week and it will not count the log entries twice. To save disk space, you may want to use the -X/--extra-files command-line option to force pgbadger to write CSS and JavaScript files to the output directory as separate files. The resources will then be loaded using script and link tags.

In the incremental mode, you can also specify the number of weeks to keep in the report by using the -R/--retention option:

/usr/bin/pgbadger --retention 2 -I -q /var/lib/pgpro/ent-17/data/log/postgresql/postgresql.log.1 -O /var/www/pg_reports/

If pg_dump is scheduled to run at 23:00 and 13:00 every day, you can exclude these periods from the report as follows:

pgbadger --exclude-time "2013-09-.* (23|13):.*" postgresql.log

This helps avoid having COPY statements generated by pg_dump at the top of the list of slowest queries. Alternatively, you can use --exclude-appname "pg_dump" to solve this problem in a simpler way.
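Before passing a regular expression to --exclude-time or --include-time, you can preview which timestamps it matches. This sketch uses grep -E, which behaves like Perl regular expressions for simple patterns such as this one (an assumption for more complex patterns); the helper name matches_time_regex is hypothetical, and the sample timestamps assume the YYYY-MM-DD HH:MM:SS form.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the timestamp ($2) matches the
# extended regular expression ($1), as grep -E interprets it.
matches_time_regex() {
    printf '%s\n' "$2" | grep -Eq "$1"
}
```

For example, matches_time_regex '2013-09-.* (23|13):.*' '2013-09-02 23:05:11' succeeds, while the same pattern fails for '2013-09-02 14:30:00'.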

Rebuilding Reports

To update all HTML reports after a pgbadger fix or a new report feature, you can rebuild the incremental reports. To rebuild all reports whose binary files are still available, run:

rm /path/to/reports/*.js
rm /path/to/reports/*.css
pgbadger -X -I -O /var/www/pg_reports/ --rebuild 

This will also update all the resource files (JavaScript and CSS). Use the -E/--explode option if the reports were built with this option.

Building Monthly Reports

By default, in the incremental mode pgbadger only computes daily and weekly reports. To have monthly cumulative reports, you will need to use a separate command to specify the report to build. For example, to build a report for August 2021, run:

pgbadger -X --month-report 2021-08 /var/www/pg_reports/

This adds a link on the month name in the calendar view of incremental reports, pointing to the monthly report. The report for the current month can be run every day, and it is entirely rebuilt each time. The monthly report is not built by default because it could take too long. As when rebuilding reports, if reports were built with the per-database option (-E/--explode), this option must also be used to build the monthly report:

pgbadger -E -X --month-report 2021-08 /var/www/pg_reports/ 

Choosing the Report File Format

The pgbadger report file format is determined by the extension of the file passed to the -o/--outfile option.

Use the binary format (-o *.bin) to create custom incremental and cumulative reports.

For example, to refresh a pgbadger report every hour from a daily log file, you can run the following commands every hour:

# Generate incremental data files in the binary format
pgbadger --last-parsed .pgbadger_last_state_file -o sunday/hourX.bin /var/lib/pgpro/ent-17/data/log/postgresql-Sun.log
# Build a fresh HTML report from the generated binary file
pgbadger sunday/*.bin

For another example, assume that you generate one log file per hour. To have reports rebuilt each time the log file is rotated, run:

pgbadger -o day1/hour01.bin /var/lib/pgpro/ent-17/data/log/postgresql-2022-01-23_10.log
pgbadger -o day1/hour02.bin /var/lib/pgpro/ent-17/data/log/postgresql-2022-01-23_11.log
pgbadger -o day1/hour03.bin /var/lib/pgpro/ent-17/data/log/postgresql-2022-01-23_12.log
...

And to refresh the HTML report, for example, each time after a new binary file is generated, just run:

pgbadger -o day1_report.html day1/*.bin

Adjust the commands to your particular needs.
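When there are many hourly log files, the per-hour commands above can be generated with a loop. In this minimal sketch, the helper name hourly_bin_commands is hypothetical; it numbers the binary files in the order the log files are given and only prints the commands, so you can review them before running them. It assumes that the file names contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print one pgbadger command per log file,
# writing <outdir>/hour01.bin, <outdir>/hour02.bin, ... in argument order.
hourly_bin_commands() {
    outdir=$1
    shift
    n=1
    for log in "$@"; do
        printf 'pgbadger -o %s/hour%02d.bin %s\n' "$outdir" "$n" "$log"
        n=$((n + 1))
    done
}
```

For example, after making sure the day1 directory exists, hourly_bin_commands day1 /var/lib/pgpro/ent-17/data/log/postgresql-2022-01-23_*.log | sh runs the generated commands.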

Use the JSON format (-o *.json) to share data with other languages and to facilitate integration of pgbadger output with other monitoring tools, such as Cacti or Graphite.

Select other output formats to meet your particular needs. For example, this command will generate a Tsung sessions XML file for SELECT queries only:

  pgbadger -S -o sessions.tsung --prefix '%t [%p]: user=%u,db=%d ' /var/lib/pgpro/ent-17/data/log/postgresql-2022-01-14_000000.log
  

Options

This section describes pgbadger command-line options.

-a minutes
--average minutes

Specifies the number of minutes for which to build average graphs of queries and connections.

Default: 5.

-A minutes
--histo-average minutes

Specifies the number of minutes for which to build histogram graphs of queries.

Default: 60.

-b datetime
--begin datetime

Specifies the start date/time for the data to be parsed in logs.

-c host
--dbclient host

Only report on log entries for the specified client host.

-C
--nocomment

Remove /* ... */ comments from queries.

-d name
--dbname name

Only report on log entries for the specified database.

-D
--dns-resolv

Replace client IP addresses with their DNS names.

Warning

This can considerably slow down pgbadger.

-e datetime
--end datetime

Specifies the end date/time for the data to be parsed in logs.

-E
--explode

Build a separate report for each database. Global information not related to any database is added to the postgres database report.

-f logtype
--format logtype

Specifies the log type.

Possible values: syslog, syslog2, stderr, jsonlog, csv, pgbouncer, logplex, rds and redshift. Use when pgbadger cannot detect the log format.

-G
--nograph

Disables graphs in HTML output.

-h
--help

Show detailed information about pgbadger options and exit.

-H path
--html-outdir path

Specifies the path to the directory where the HTML report must be written in the incremental mode. Note that binary files remain in the directory specified by -O/--outdir.

-i name
--ident name

Specifies the program name used to identify Postgres Pro messages in syslog logs.

Default: postgres.

-I
--incremental

Use the incremental mode, where reports are generated per day in a separate directory specified by the -O/--outdir option.

-j number
--jobs number

Specifies the number of jobs to run at the same time. When working with csvlog logs, pgbadger always runs as a single job.

Default: 1.

-J number
--Jobs number

Specifies the number of log files to parse in parallel. When working with csvlog logs, one log at a time is processed.

Default: 1.

-l filename
--last-parsed filename

Specifies the file where the last datetime and line parsed are registered to allow incremental log parsing. Useful to watch errors since the last run or to get one report per day with the log rotated weekly.

-L filename
--logfile-list filename

Specifies the file containing the list of log files to parse.
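A list file for -L/--logfile-list can be generated with find, similar to the cron example in Building Incremental Reports. This is a minimal sketch; the helper name write_logfile_list and the postgresql* name pattern are assumptions, so adjust them to the names of your log files.

```shell
#!/bin/sh
# Hypothetical helper: write the sorted list of log files in <logdir>
# modified within the last <days> days to <listfile>, one path per line.
write_logfile_list() {
    logdir=$1
    days=$2
    listfile=$3
    find "$logdir" -type f -name 'postgresql*' -mtime "-$days" | sort > "$listfile"
}
```

For example: write_logfile_list /var/lib/pgpro/ent-17/data/log 7 /tmp/pg_logfiles.txt && pgbadger -L /tmp/pg_logfiles.txt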

-m size
--maxlength size

Specifies the maximum length of a query in reports. Longer queries will be truncated.

Default: 100000.

-M
--no-multiline

Turns off collecting multiline statements to avoid reporting excessive information, especially on errors that generate a huge report.

-N name
--appname name

Only report on log entries for the specified application.

-o filename
--outfile filename

Specifies the filename for the output and determines the report file format. Can be used multiple times to output several formats. For the json output, ensure that the Perl module JSON::XS is installed. To dump the output to stdout, use the value of - as filename.

Default: out.html, out.txt, out.bin, out.json or out.tsung, depending on the output format.

-O path
--outdir path

Specifies the directory where the output file must be saved.

-p string
--prefix string

Specifies the value of your custom log_line_prefix string, as defined in your postgresql.conf. Only use if your log line prefix is different from the standard log_line_prefix strings, for example, if your prefix includes additional variables, such as client IP or application name.

-P
--no-prettify

Disables SQL query code prettifier.

-q
--quiet

Disables printing anything to stdout, even the progress bar.

-Q
--query-numbering

Add numbering of queries to the output when used together with --dump-all-queries or --normalized-only.

-r address
--remote-host address

Specifies the host on which to run the cat command for a remote log file, so that the file can be parsed locally.

-R number
--retention number

Specifies the number of weeks for which to keep reports in the output directory in the incremental mode. The directories for older weeks and days are automatically removed.

Default: 0 (no reports are removed).

-s number
--sample number

Specifies the number of query samples to store.

Default: 3.

-S
--select-only

Only report on SELECT queries.

-t number
--top number

Specifies the number of queries to store/display.

Default: 20.

-T string
--title string

Specifies the title of the HTML report page.

-u username
--dbuser username

Only report on log entries for the specified username.

-U username
--exclude-user username

Specifies a username whose log entries are excluded from the report. Can be used multiple times.

-v
--verbose

Enables the verbose or debug mode.

Default: off.

-V
--version

Show pgbadger version and exit.

-w
--watch-mode

Only report errors, just like Logwatch can do.

-W
--wide-char

Encode HTML output of queries in UTF8 to avoid Perl messages Wide character in print.

-x format
--extension format

Specifies the output format. Possible values: text, html, bin, json or tsung.

Default: html.

-X
--extra-files

In the incremental mode, write CSS and JavaScript resources to the output directory as separate files.

-z command
--zcat command

Specifies the full command to run the zcat program. Use if zcat, bzcat or unzip is outside of your PATH directories.

-Z +/-XX
--timezone +/-XX

Specifies the number of hours from GMT for the timezone. Use to adjust date/time in JavaScript graphs.

--pie-limit number

Specifies the threshold, in percent, below which pie chart data is grouped and shown as a sum.

--exclude-query regexp

Specifies the regular expression such that matching queries will be excluded from the report. For example: ^(VACUUM|COMMIT). Can be used multiple times.

--exclude-file filename

Specifies the path to the file that contains regular expressions to use to exclude matching queries from the report, one expression per line.

--include-query regexp

Specifies the regular expression such that only matching queries will be included in the report. Can be used multiple times. For example: (tbl1|tbl2).

--include-file filename

Specifies the path to the file that contains regular expressions to use to include matching queries in the report, one expression per line.
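An exclude or include file holds one regular expression per line. The following sketch writes a hypothetical exclude file and previews its effect on a few sample queries with grep -E -f, which approximates pgbadger's Perl regular expressions for simple patterns like these; the file name exclude-queries.txt is an assumption.

```shell
#!/bin/sh
# Write a hypothetical exclude file: one regular expression per line.
cat > exclude-queries.txt <<'EOF'
^(COPY|COMMIT)
^VACUUM
EOF

# Preview: print only the sample queries that no pattern matches.
printf '%s\n' \
    'COPY public.t1 FROM stdin' \
    'SELECT * FROM tbl1' \
    'VACUUM ANALYZE tbl2' |
    grep -E -v -f exclude-queries.txt
```

Only the SELECT query survives the filter. You could then pass the file to pgbadger with --exclude-file exclude-queries.txt.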

--disable-error

Turns off generation of an error report.

--disable-hourly

Turns off generation of an hourly report.

--disable-type

Turns off generation of a report on queries by type, database and user.

--disable-query

Turns off generation of query reports, such as slowest or most frequent queries, queries by users, by database and so on.

--disable-session

Turns off generation of a session report.

--disable-connection

Turns off generation of a connection report.

--disable-lock

Turns off generation of a lock report.

--disable-temporary

Turns off generation of a report on temporary files.

--disable-checkpoint

Turns off generation of checkpoint/restartpoint reports.

--disable-autovacuum

Turns off generation of an autovacuum report.

--charset name

Specifies the HTML charset to be used.

Default: utf-8.

--csv-separator char

Specifies the CSV field separator.

Default: ,.

--exclude-time regexp

Specifies the regular expression such that log entries for any matching timestamp will be excluded from the report. For example: 2013-04-12 .*. Can be used multiple times.

--include-time regexp

Specifies the regular expression such that log entries for any matching timestamp will be included in the report. For example: 2013-04-12 .*. Can be used multiple times.

--exclude-db name

Specifies the name of a database whose log entries are excluded from the report. For example: outdated_db. Can be used multiple times.

--exclude-appname name

Specifies the name of an application whose log entries are excluded from the report. For example: pg_dump. Can be used multiple times.

--exclude-line regexp

Specifies the regular expression such that any matching log entry will be excluded from the report. Can be used multiple times.

--exclude-client address

Specifies a client IP address or host name whose log entries are excluded from the report. Can be used multiple times.

--anonymize

Obscure all literals in queries. Useful to hide confidential data.

--noreport

Prevents generation of reports in the incremental mode.

--log-duration

Associate log entries generated by log_duration = on and log_statement = all.

--enable-checksum

Add the MD5 sum under each query report.

--journalctl command

Specifies a command that produces information similar to the contents of the Postgres Pro log file, typically journalctl -u postgrespro-ent-17.

--pid-dir path

Specifies the path to store the PID file.

Default: /tmp.

--pid-file filename

Specifies the name of the PID file to manage concurrent execution of pgbadger.

Default: pgbadger.pid.

--rebuild

Rebuild all HTML reports in incremental output directories that contain binary data files.

--pgbouncer-only

Only show pgbouncer-related menu in the header.

--start-monday

In the incremental mode, start calendar weeks on Monday. By default, they start on Sunday.

--iso-week-number

In the incremental mode, start calendar weeks on Monday with ISO 8601 week numbering: 01 to 53, where week 1 is the first week of a year that has at least 4 days.
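To see how the week numbering schemes differ, you can compare them with GNU date (its -d option is a GNU extension): %U counts weeks starting on Sunday, %W counts weeks starting on Monday, and %G-W%V prints the ISO 8601 year and week. This sketch illustrates the numbering rules only; the helper name week_numbers is hypothetical, and it does not show how pgbadger computes weeks internally.

```shell
#!/bin/sh
# Hypothetical helper: print the Sunday-based week (%U), the
# Monday-based week (%W) and the ISO 8601 year-week for a date.
week_numbers() {
    date -d "$1" '+%U %W %G-W%V'
}
```

For example, 1 January 2021 was a Friday, so week_numbers 2021-01-01 prints 00 00 2020-W53: the day still belongs to ISO week 53 of 2020, while 4 January 2021 starts ISO week 1 of 2021.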

--normalized-only

Only dump all normalized queries to out.txt.

--log-timezone +/-XX

Specifies the number of hours from GMT for the timezone to adjust date/time read from the log file before parsing. The use of this option makes log search with date/time more complicated.

--prettify-json

Prettify JSON output.

--month-report YYYY-MM

Specifies the month (YYYY-MM) to create a cumulative HTML report for. Requires incremental output directories to be set and all the necessary binary data files available.

--day-report YYYY-MM-DD

Specifies the day (YYYY-MM-DD) to create an HTML report for. Requires incremental output directories to be set and all the necessary binary data files available.

--noexplain

Avoid processing log lines generated by auto_explain.

--command cmd

Specifies the command to run to retrieve log entries on stdin. pgbadger will open a pipe to this command and parse log entries that it generates.

--no-week

Avoid building weekly reports in the incremental mode. Use if building weekly reports takes too long.

--explain-url URL

Specifies a URL to override the URL of the graphical explain tool.

Default: http://explain.depesz.com/?is_public=0&is_anon=0&plan=

--tempdir path

Specifies the directory for temporary files.

Default: File::Spec->tmpdir() || '/tmp'.

--no-process-info

Disables changing the pgbadger process title to help identify this process. Useful for systems where changing process titles is not supported.

--dump-all-queries

Dump all queries found in the log file to a text file, replacing bind parameters in the queries at their respective placeholder positions.

--keep-comments

Retains comments in normalized queries. Useful to distinguish between identical normalized queries.

--no-progressbar

Disables displaying the progress bar.

Remote Log Connection Options

pgbadger can parse a remote log file fetched using a passwordless SSH connection. Use the -r/--remote-host option to set the IP address or name of the target host. The following options define further SSH connection parameters:

--ssh-program ssh

Specifies the path to the SSH program to use.

Default: ssh.

--ssh-port port

Specifies the SSH port for the connection.

Default: 22.

--ssh-user username

Specifies the username for the connection.

Default: user running pgbadger.

--ssh-identity filename

Specifies the path to the identity file.

--ssh-timeout seconds

Specifies the timeout, in seconds, after which an SSH connection attempt is considered failed.

Default: 10.

--ssh-option options

Specifies the list of options to define SSH connection parameters. The following options are always used:

-o ConnectTimeout=$ssh_timeout
-o PreferredAuthentications=hostbased,publickey
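The following sketch shows how the --ssh-port, --ssh-user and --ssh-timeout values could combine with the two options above into a single ssh command line that reads a remote log. It is an illustration under stated assumptions, not the exact command pgbadger builds: the helper name remote_cat_command and its argument order are hypothetical, and the defaults mirror the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: print an ssh command line that reads a remote log
# with cat. Arguments: host logfile [port] [user] [timeout].
remote_cat_command() {
    host=$1
    logfile=$2
    port=${3:-22}                # --ssh-port default
    user=${4:-$(id -un)}         # --ssh-user default: the current user
    timeout=${5:-10}             # --ssh-timeout default
    printf 'ssh -p %s -o ConnectTimeout=%s -o PreferredAuthentications=hostbased,publickey %s@%s cat %s\n' \
        "$port" "$timeout" "$user" "$host" "$logfile"
}
```

Running a similar ssh command by hand, for example with true instead of cat, is one way to confirm that key-based login works without a password prompt before pointing pgbadger at the host.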

Author

Gilles Darold