Postgres Pro Standard : Documentation: 16: pgpro

pgpro_scout
Prev	Up	I.1. Additional PostgreSQL/Postgres Pro Client Applications	Home	Next

pgpro_scout

pgpro_scout — detect sensitive information in a database

Synopsis

pgpro_scout config [option...]

pgpro_scout inspect [option...]

Description #

pgpro_scout is a command-line utility to detect sensitive (confidential) information in a database. pgpro_scout looks for sensitive information in database column names and field values. The search requires a specialized dictionary that the utility allows you to create.

pgpro_scout does not aim to find all the sensitive information, but only performs data intelligence to determine whether such information exists and show sensitive data samples.

pgpro_scout is provided with Postgres Pro Standard as a separate pre-built package pgpro_scout (for the detailed installation instructions, see Chapter 16).

The dictionary is a text file whose lines have the key-value-comment format. Where:

key is a string with the conventional name of a search parameter.
value defines a condition that a database field must meet to be considered sensitive. Can be a regular expression or an array of string values.
comment is a text description of the search condition to be visible in search results.

pgpro_scout generates a report file with information on sample occurrences of the sensitive information found. Refer to the section called “pgpro_scout Report” for details.

pgpro_scout Usage #

To perform the database inspection for occurrences of sensitive information, you need to create the dictionary and launch the search passing a list of database objects to look through.

Creating the pgpro_scout Dictionary #

To create a base dictionary, run the config command. You can update this dictionary and extend it with new search conditions if needed. Here is an example of the dictionary:

scout:
  column_names:
    matchers:
      - key: phones
        match_values: ["phones", "phone"]
        comment: "Phone number fields"
  column_values:
    disabled: true # optional
    case_sensitive: false # optional
    matchers:
      - key: surnames
        disabled: false # optional
        case_sensitive: true # optional
        match_values: ["Holmes", "Watson"]
      - key: phones_data
        pattern: ^((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}$
        comment: "Phone numbers"
exclude:
  schemas: ['pgpro_sfile_data', 'dbms_lob', 'profile', 'information_schema', 'pg_catalog', 'pg_toast']
  column_types: ['timestamp with time zone']

Dictionary Structure #

The dictionary consists of the following sections:

scout — contains the description of search conditions.
exclude — lists database schemas and field types to be excluded from the search.

These types of search conditions can be specified:

column_names — field names.
column_values — field values.

A matchers element contains a list of search conditions. Each matchers element may contain the following parameters:

key — the search key.
match_values — the list of strings to look for in the database fields. Do not specify together with pattern.
pattern — a regular expression for search in the database fields. Do not specify together with match_values.
case_sensitive — a boolean value to define whether the search is case-sensitive if match_values is specified.
disabled — a boolean value to define whether to exclude the condition from the search.
comment — the comment to be shown in the report file.

Specifying Database Objects to Look through #

To specify the objects to look through, use the --search-path option when launching the search. --search-path takes a string that contains comma-separated substrings with database objects to restrict the search to:

To restrict the search to a database schema, specify a substring with the schema name.
To restrict the search to an individual table or tables, specify a substring like this: SCHEMA.TABLE or SCHEMA.{TABLE1, TABLE2}.
To restrict the search to an individual column or columns of a certain table, specify a substring like this: SCHEMA.TABLE.COLUMN or SCHEMA.TABLE.{COLUMN1, COLUMN2}.

Table I.1. Examples of Specifying Search Objects

`--search-path` Value	Search Range
`public`	All tables (with all columns) in the `public` schema.
`public.{products, sales}`	`products` and `sales` tables (with all columns) in the `public` schema.
`public.products.{id, email}`	`id` and `email` columns of the `products` table in the `public` schema.
`public.products`	Only the `public.products` table.
`public.products,events,holds.card.id`	The `public.products` table, all tables in the `events` schema, and the `id` column of the `holds.card` table.

Launching the Search of Sensitive Information #

To launch the search of sensitive information with the available dictionary, run the inspect command specifying --search-path as explained in the section called “Specifying Database Objects to Look through”.

Usage Limitations #

Take into account the following limitations when using pgpro_scout:

When looking for data of the json/jsonb type, pattern conditions must be used rather than match_values.
pgpro_scout does not currently support schemas, tables, and fields with case-sensitive names.
Search for values in large-volume fields can greatly increase the search time, so that this can even look like the application hanging. If you are aware of fields containing considerable volume of data, exclude these fields from the search by editing --search-path.
When looking for data in a table with a composite primary key, avoid excluding single fields contained in the primary key from search.

pgpro_scout supports tables with primary keys of the following types:

Numeric: smallint, integer, bigint, decimal, numeric, real, double, smallserial, serial, bigserial.
Monetary: money.
Character: text, bpchar, character, character(n).
Date and time: timestamp, timestampz, date, time, time with time zone, interval.
Describing network addresses: inet, cidr, macaddr, macaddr8.
Boolean.
UUID.
Range: int4range, int8range, daterange.
Array: arrays of types are supported whose element types are supported.

Use of primary keys of types different from those listed above can result in errors.

Command-Line Reference #

This section describes pgpro_scout commands. Optional parameters are enclosed in square brackets.

config #

pgpro_scout config [--out|-o string] [--override|-O] [--help|-h]

Generates a base pgpro_scout dictionary. The meaning of the options is as follows:

--h --help: Output the command reference information.
-O --override: Override the dictionary file if it exists.
-o filepath --out filepath: Path to the dictionary file.

inspect #

pgpro_scout inspect [--batch-size|-b uint] [--config-file|-f string]
[--dbname|-d string] [--disable-comment] [--hidden-sample] [--host|-H string]
[--order|-O string] [--output|-o string] [--password|-W string]
[--port|-p uint16] [--report-buffer uint16] [--rows-number|-n uint16]
[--sample-size uint16] [--search-path|-s string]
[--threads|-j int] [--username|-U string] [--help|-h]

Launches the search of sensitive information. The meaning of the options is as follows:

-b size --batch-size size #

Number of database rows to be processed by one query for each task.

Default: 1000.

-f filepath --config-file filepath #

Path to the dictionary file.

--d name --dbname name #

Database name to connect to.

--disable-comment #

Create the report file without comments.

--h string --help string #

Output the command reference information.

--hidden-sample #

Mask found samples in the report. For example: Br**te will be output instead of Bronte.

--H name --host name #

Database host name or IP address.

Default: localhost.

-O search_order --order search_order #

The order of looking through rows. Possible values:

tail — get last rows
head — get first rows
random — get random rows

Default: head.

-o filepath --output filepath #

Path to the report file.

--W string --password string #

Password to connect to the database.

--p name --port name #

Database port.

Default: 5432.

--report-buffer size #

Size of the report buffer.

Default: 64

-n number --rows-number number #

Number of rows to look through in each table.

Default: 10000.

--sample-size number #

Number of found samples to be included in the report for each table. If set to 0, samples of found rows will not be included in the report.

Default: 5.

-s filepath --search-path filepath #

List of schemas, tables, and fields to look through. See the section called “Specifying Database Objects to Look through” for details.

Default: 5.

-j filepath --threads filepath #

Number of threads used to run the utility.

Default: 14.

--U name --username name #

Username to connect to the database.

pgpro_scout Report #

A csv file is created as a result of pgpro_scout search. This file provides the following information:

Date and time of the search completion.
Report field headings.
Search results.

Report Structure #

The search results contain the following fields:

Table I.2. Report Fields

Name	Description
`location`	Full name of a table, that is <schema name>.<table_name>, with the occurrence found.
`column`	Column of the table.
`key`	Key of the search condition from the dictionary.
`comment`	Comment of the search condition from the dictionary.
`sample`	Sample occurrence found. If the search is run with `--sample-size` = 0, this field is not included in the report. To mask the contents of this field, enable `--hidden-sample` when launching the search.

The report will contain not more than --sample-size lines for each table.

Example #

The following is an example of the report file:

#report time: 2025-06-16T11:10:58+03:00
location,column,key,comment, sample
public.expected,phone,phones,Phone numbers,
events.holds,user_surname,surname,Last names, Holmes
events.holds,user_surname,surname,Last names, Watson

Prev	Up	Next
pg_probackup	Home	vacuumlo

epub pdf