pgpro_scout
pgpro_scout — detect sensitive information in a database
Synopsis
pgpro_scout config [option...]
pgpro_scout inspect [option...]
Description #
pgpro_scout is a command-line utility to detect sensitive (confidential) information in a database. pgpro_scout looks for sensitive information in database column names and field values. The search requires a specialized dictionary that the utility allows you to create.
pgpro_scout does not aim to find all the sensitive information, but only performs data intelligence to determine whether such information exists and show sensitive data samples.
pgpro_scout is provided with Postgres Pro Enterprise as a separate pre-built package pgpro_scout (for the detailed installation instructions, see Chapter 17).
The dictionary is a text file whose lines have the key-value-comment format. Where:
key is a string with the conventional name of a search parameter.
value defines a condition that a database field must meet to be considered sensitive. Can be a regular expression or an array of string values.
comment is a text description of the search condition to be visible in search results.
pgpro_scout generates a report file with information on sample occurrences of the sensitive information found. Refer to the section called “pgpro_scout Report” for details.
pgpro_scout Usage #
To perform the database inspection for occurrences of sensitive information, you need to create the dictionary and launch the search passing a list of database objects to look through.
Creating the pgpro_scout Dictionary #
To create a base dictionary, run the config command. You can update this dictionary and extend it with new search conditions if needed. Here is an example of the dictionary:
scout:
column_names:
matchers:
- key: phones
match_values: ["phones", "phone"]
comment: "Phone number fields"
column_values:
disabled: true # optional
case_sensitive: false # optional
matchers:
- key: surnames
disabled: false # optional
case_sensitive: true # optional
match_values: ["Holmes", "Watson"]
- key: phones_data
pattern: ^((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}$
comment: "Phone numbers"
exclude:
schemas: ['pgpro_sfile_data', 'dbms_lob', 'profile', 'information_schema', 'pg_catalog', 'pg_toast']
column_types: ['timestamp with time zone']
Dictionary Structure #
The dictionary consists of the following sections:
scout— contains the description of search conditions.exclude— lists database schemas and field types to be excluded from the search.
These types of search conditions can be specified:
column_names— field names.column_values— field values.
A matchers element contains a list of search conditions. Each matchers element may contain the following parameters:
key— the search key.match_values— the list of strings to look for in the database fields. Do not specify together withpattern.pattern— a regular expression for search in the database fields. Do not specify together withmatch_values.case_sensitive— a boolean value to define whether the search is case-sensitive ifmatch_valuesis specified.disabled— a boolean value to define whether to exclude the condition from the search.comment— the comment to be shown in the report file.
Specifying Database Objects to Look through #
To specify the objects to look through, use the --search-path option when launching the search. --search-path takes a string that contains comma-separated substrings with database objects to restrict the search to:
To restrict the search to a database schema, specify a substring with the schema name.
To restrict the search to an individual table or tables, specify a substring like this:
SCHEMA.TABLEorSCHEMA.{TABLE1, TABLE2}.To restrict the search to an individual column or columns of a certain table, specify a substring like this:
SCHEMA.TABLE.COLUMNorSCHEMA.TABLE.{COLUMN1, COLUMN2}.
Table I.1. Examples of Specifying Search Objects
--search-path Value | Search Range |
|---|---|
public | All tables (with all columns) in the public schema. |
public.{products, sales} | products and sales tables (with all columns) in the public schema. |
public.products.{id, email} | id and email columns of the products table in the public schema. |
public.products | Only the public.products table. |
public.products,events,holds.card.id | The public.products table, all tables in the events schema, and the id column of the holds.card table. |
Launching the Search of Sensitive Information #
To launch the search of sensitive information with the available dictionary, run the inspect command specifying --search-path as explained in the section called “Specifying Database Objects to Look through”.
Usage Limitations #
Take into account the following limitations when using pgpro_scout:
When looking for data of the
json/jsonbtype,patternconditions must be used rather thanmatch_values.pgpro_scout does not currently support schemas, tables, and fields with case-sensitive names.
Search for values in large-volume fields can greatly increase the search time, so that this can even look like the application hanging. If you are aware of fields containing considerable volume of data, exclude these fields from the search by editing
--search-path.When looking for data in a table with a composite primary key, avoid excluding single fields contained in the primary key from search.
pgpro_scout supports tables with primary keys of the following types:
Numeric:
smallint,integer,bigint,decimal,numeric,real,double,smallserial,serial,bigserial.Monetary:
money.Character:
text,bpchar,character,character(n).Date and time:
timestamp,timestampz,date,time,time with time zone,interval.Describing network addresses:
inet,cidr,macaddr,macaddr8.Boolean.
UUID.Range:
int4range,int8range,daterange.Array: arrays of types are supported whose element types are supported.
Use of primary keys of types different from those listed above can result in errors.
Command-Line Reference #
This section describes pgpro_scout commands. Optional parameters are enclosed in square brackets.
config #
pgpro_scout config [--out|-o string] [--override|-O] [--help|-h]
Generates a base pgpro_scout dictionary. The meaning of the options is as follows:
--h--helpOutput the command reference information.
-O--overrideOverride the dictionary file if it exists.
-ofilepath--outfilepathPath to the dictionary file.
inspect #
pgpro_scout inspect [--batch-size|-buint] [--config-file|-fstring] [--dbname|-dstring] [--disable-comment] [--hidden-sample] [--host|-Hstring] [--order|-Ostring] [--output|-ostring] [--password|-Wstring] [--port|-puint16] [--report-bufferuint16] [--rows-number|-nuint16] [--sample-sizeuint16] [--search-path|-sstring] [--threads|-jint] [--username|-Ustring] [--help|-h]
Launches the search of sensitive information. The meaning of the options is as follows:
-bsize--batch-size#sizeNumber of database rows to be processed by one query for each task.
Default:
1000.-ffilepath--config-file#filepathPath to the dictionary file.
--dname--dbname#nameDatabase name to connect to.
--disable-comment#Create the report file without comments.
--hstring--help#stringOutput the command reference information.
--hidden-sample#Mask found samples in the report. For example:
Br**tewill be output instead ofBronte.--Hname--host#nameDatabase host name or IP address.
Default:
localhost.-Osearch_order--ordersearch_order#The order of looking through rows. Possible values:
tail— get last rowshead— get first rowsrandom— get random rows
Default:
head.-ofilepath--outputfilepath#Path to the report file.
--Wstring--password#stringPassword to connect to the database.
--pname--port#nameDatabase port.
Default:
5432.--report-buffersize#Size of the report buffer.
Default:
64-nnumber--rows-numbernumber#Number of rows to look through in each table.
Default:
10000.--sample-sizenumber#Number of found samples to be included in the report for each table. If set to 0, samples of found rows will not be included in the report.
Default:
5.-sfilepath--search-pathfilepath#List of schemas, tables, and fields to look through. See the section called “Specifying Database Objects to Look through” for details.
Default:
5.-jfilepath--threadsfilepath#Number of threads used to run the utility.
Default:
14.--Uname--username#nameUsername to connect to the database.
pgpro_scout Report #
A csv file is created as a result of pgpro_scout search. This file provides the following information:
Date and time of the search completion.
Report field headings.
Search results.
Report Structure #
The search results contain the following fields:
Table I.2. Report Fields
| Name | Description |
|---|---|
location | Full name of a table, that is <schema name>.<table_name>, with the occurrence found. |
column | Column of the table. |
key | Key of the search condition from the dictionary. |
comment | Comment of the search condition from the dictionary. |
sample | Sample occurrence found. If the search is run with --sample-size = 0, this field is not included in the report. To mask the contents of this field, enable --hidden-sample when launching the search. |
The report will contain not more than --sample-size lines for each table.
Example #
The following is an example of the report file:
#report time: 2025-06-16T11:10:58+03:00 location,column,key,comment, sample public.expected,phone,phones,Phone numbers, events.holds,user_surname,surname,Last names, Holmes events.holds,user_surname,surname,Last names, Watson