pgpro_scout
pgpro_scout — detect sensitive information in a database
Synopsis
pgpro_scout config [option...]
pgpro_scout inspect [option...]
Description #
pgpro_scout is a command-line utility to detect sensitive (confidential) information in a database. pgpro_scout looks for sensitive information in database column names and field values. The search requires a specialized dictionary that the utility allows you to create.
pgpro_scout does not aim to find all the sensitive information, but only performs data intelligence to determine whether such information exists and show sensitive data samples.
pgpro_scout is provided with Postgres Pro Standard as a separate pre-built package pgpro_scout (for the detailed installation instructions, see Chapter 16).
The dictionary is a text file whose lines have the key-value-comment format. Where:
key is a string with the conventional name of a search parameter.
value defines a condition that a database field must meet to be considered sensitive. Can be a regular expression or an array of string values.
comment is a text description of the search condition to be visible in search results.
pgpro_scout generates a report file with information on sample occurrences of the sensitive information found. Refer to the section called “pgpro_scout Report” for details.
pgpro_scout Usage #
To perform the database inspection for occurrences of sensitive information, you need to create the dictionary and launch the search passing a list of database objects to look through.
Creating the pgpro_scout Dictionary #
To create a base dictionary, run the config command. You can update this dictionary and extend it with new search conditions if needed. Here is an example of the dictionary:
scout:
column_names:
matchers:
- key: phones
match_values: ["phones", "phone"]
comment: "Phone number fields"
column_values:
disabled: true # optional
case_sensitive: false # optional
matchers:
- key: surnames
disabled: false # optional
case_sensitive: true # optional
match_values: ["Holmes", "Watson"]
- key: phones_data
pattern: ^((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}$
comment: "Phone numbers"
exclude:
schemas: ['pgpro_sfile_data', 'dbms_lob', 'profile', 'information_schema', 'pg_catalog', 'pg_toast']
column_types: ['timestamp with time zone']
Dictionary Structure #
The dictionary consists of the following sections:
scout— contains the description of search conditions.exclude— lists database schemas and field types to be excluded from the search.
These types of search conditions can be specified:
column_names— field names.column_values— field values.
A matchers element contains a list of search conditions. Each matchers element may contain the following parameters:
key— the search key.match_values— the list of strings to look for in the database fields. Do not specify together withpattern.pattern— a regular expression for search in the database fields. Do not specify together withmatch_values.case_sensitive— a boolean value to define whether the search is case-sensitive ifmatch_valuesis specified.disabled— a boolean value to define whether to exclude the condition from the search.comment— the comment to be shown in the report file.
Specifying Database Objects to Look through #
To specify the objects to look through, use the --search-path option when launching the search. --search-path takes a string that contains comma-separated substrings with database objects to restrict the search to:
To restrict the search to a database schema, specify a substring with the schema name.
To restrict the search to an individual table or tables, specify a substring like this:
SCHEMA.TABLEorSCHEMA.{TABLE1, TABLE2}.To restrict the search to an individual column or columns of a certain table, specify a substring like this:
SCHEMA.TABLE.COLUMNorSCHEMA.TABLE.{COLUMN1, COLUMN2}.
Table I.1. Examples of Specifying Search Objects
--search-path Value | Search Range |
|---|---|
public | All tables (with all columns) in the public schema. |
public.{products, sales} | products and sales tables (with all columns) in the public schema. |
public.products.{id, email} | id and email columns of the products table in the public schema. |
public.products | Only the public.products table. |
public.products,events,holds.card.id | The public.products table, all tables in the events schema, and the id column of the holds.card table. |
Launching the Search of Sensitive Information #
To launch the search of sensitive information with the available dictionary, run the inspect command specifying --search path as explained in the section called “Specifying Database Objects to Look through”.
Command-Line Reference #
This section describes pgpro_scout commands. Optional parameters are enclosed in square brackets.
config #
pgpro_scout config [--out|-o string] [--override|-O] [--help|-h]
Generates a base pgpro_scout dictionary. The meaning of the options is as follows:
--h--helpOutput the command reference information.
-O--overrideOverride the dictionary file if it exists.
-ofilepath--outfilepathPath to the dictionary file.
inspect #
pgpro_scout inspect [--batch-size|-buint] [--config-file|-fstring] [--dbname|-dstring] [--disable-comment] [--hidden-sample] [--host|-Hstring] [--order|-Ostring] [--output|-ostring] [--password|-Wstring] [--port|-puint16] [--report-bufferuint16] [--rows-number|-nuint16] [--sample-sizeuint16] [--search-path|-sstring] [--threads|-jint] [--username|-Ustring] [--help|-h]
Launches the search of sensitive information. The meaning of the options is as follows:
-bsize--batch-size#sizeNumber of database rows to be processed by one query for each task.
Default:
1000.-ffilepath--config-file#filepathPath to the dictionary file.
--dname--dbname#nameDatabase name to connect to.
--disable-comment#Create the report file without comments.
--hstring--help#stringOutput the command reference information.
--hidden-sample#Mask found samples in the report. For example:
Br**tewill be output instead ofBronte.--Hname--host#nameDatabase host name or IP address.
Default:
localhost.-Osearch_order--ordersearch_order#The order of looking through rows. Possible values:
tail— get last rowshead— get first rowsrandom— get random rows
Default:
head.-ofilepath--outputfilepath#Path to the report file.
--Wstring--password#stringPassword to connect to the database.
--pname--port#nameDatabase port.
Default:
5432.--report-buffersize#Size of the report buffer.
Default:
64-nnumber--rows-numbernumber#Number of rows to look through in each table.
Default:
10000.--sample-sizenumber#Number of found samples to be included in the report for each table. If set to 0, samples of found rows will not be included in the report.
Default:
5.-sfilepath--search-pathfilepath#List of schemas, tables, and fields to look through. See the section called “Specifying Database Objects to Look through” for details.
Default:
5.-jfilepath--threadsfilepath#Number of threads used to run the utility.
Default:
14.--Uname--username#nameUsername to connect to the database.
pgpro_scout Report #
A csv file is created as a result of pgpro_scout search. This file provides the following information:
Date and time of the search completion.
Report field headings.
Search results.
Report Structure #
The search results contain the following fields:
Table I.2. Report Fields
| Name | Description |
|---|---|
location | Full name of a table, that is <schema name>.<table_name>, with the occurrence found. |
column | Column of the table. |
key | Key of the search condition from the dictionary. |
comment | Comment of the search condition from the dictionary. |
sample | Sample occurrence found. If the search is run with --sample-size = 0, this field is not included in the report. To mask the contents of this field, enable --hidden-sample when launching the search. |
The report will contain not more than --sample-size lines for each table.
Example #
The following is an example of the report file:
#report time: 2025-06-16T11:10:58+03:00 location,column,key,comment, sample public.expected,phone,phones,Phone numbers, events.holds,user_surname,surname,Last names, Holmes events.holds,user_surname,surname,Last names, Watson