Postgres Pro Enterprise : Documentation: 16: G.1. pgpro_anonymizer — mask or replace sensitive data : Postgres Professional

G.1. pgpro_anonymizer — mask or replace sensitive data
Prev	Up	Appendix G. Postgres Pro Modules and Extensions Shipped as Individual Packages	Home	Next

G.1. pgpro_anonymizer — mask or replace sensitive data #

G.1.1. Terms and Definitions
G.1.2. Anonymization Example
G.1.3. Installation and Setup
G.1.4. Configuration
G.1.5. Declare Masking Rules
G.1.6. Masking Functions
G.1.7. Static Masking
G.1.8. Dynamic Masking
G.1.9. Anonymous Dumps
G.1.10. Generalization
G.1.11. Performance
G.1.12. Security
G.1.13. Sampling

pgpro_anonymizer is an extension to mask or replace personally identifiable information (PII) or commercially sensitive data from a Postgres Pro database.

The project has a declarative approach of anonymization. This means you can declare the masking rules using the Data Definition Language (DDL) and specify your anonymization strategy inside the table definition itself.

Once the masking rules are defined, you can access the anonymized data in one of the following ways:

Anonymous dumps: simply export the masked data into an SQL file.
Static masking: remove the PII according to the rules.
Dynamic masking: hide PII only for the masked users.

In addition, various functions are available to implement the following masking techniques: randomization, faking, partial scrambling, shuffling, noise. You can also create a custom function.

Beyond masking, you can also use another approach called generalization, which is useful for statistics and data analytics.

G.1.1. Terms and Definitions #

The following main strategies are used:

Dynamic masking offers an altered view of the real data without modifying it. Some users may only read the masked data, others may access the authentic version.
Static masking is the definitive action of substituting the sensitive information with uncorrelated data. Once processed, the authentic data cannot be retrieved.

The data can be altered with several techniques:

Deletion simply removes data.
Static substitution consistently replaces the data with a generic value. For instance: replacing all values of a text column with the value “CONFIDENTIAL”.
Variance “shifts” dates and numeric values. For example, by applying a +/- 10% variance to a salary column, the dataset will remain meaningful.
Generalization reduces the accuracy of the data by replacing it with a range of values. Instead of saying “Bob is 28 years old”, you can say “Bob is between 20 and 30 years old”. This is useful for analytics because the data remains true.
Shuffling mixes values within the same columns. This method is open to being reversed if the shuffling algorithm can be deciphered.
Randomization replaces sensitive data with random-but-plausible values. The goal is to avoid any identification from the data record while remaining suitable for testing, data analysis and data processing.
Partial scrambling is similar to static substitution but leaves out some part of the data. For instance: a credit card number can be replaced by “40XX XXXX XXXX XX96”.
Custom rules are designed to alter data following specific needs (e.g., simultaneously randomize a zipcode and a city name while keeping them coherent).
Pseudonymization is a way to protect personal information by hiding it using additional information. Encryption and hashing are two examples of pseudonymization techniques. However, pseudonymizated data is still linked to the original data.

G.1.2. Anonymization Example #

Suppose you want to mask last names and phone numbers:

SELECT * FROM people;
 id | firstname | lastname |   phone
----+-----------+----------+------------
 T1 | Sarah     | Connor    | 0609110911

Activate the dynamic masking engine:
```
SELECT anon.start_dynamic_masking();
```

Declare a masked user:

CREATE ROLE skynet LOGIN;
SECURITY LABEL FOR anon ON ROLE skynet IS 'MASKED';

Declare the masking rules:

SECURITY LABEL FOR anon ON COLUMN people.lastname
IS 'MASKED WITH FUNCTION anon.fake_last_name()';

SECURITY LABEL FOR anon ON COLUMN people.phone
IS 'MASKED WITH FUNCTION anon.partial(phone,2,$$******$$,2)';

Connect with the masked user:

\connect - skynet
SELECT * FROM people;
 id | firstname | lastname  |   phone
----+-----------+-----------+------------
 T1 | Sarah     | Stranahan | 06******11

G.1.3. Installation and Setup #

The pgpro_anonymizer extension is provided with Postgres Pro Enterprise as a separate pre-built package pgpro-anonymizer-ent-16 (for the detailed installation instructions, see Chapter 17). To enable pgpro_anonymizer, complete the following steps:

Add the library name to the shared_preload_libraries variable in the postgresql.conf file:
```
shared_preload_libraries = 'anon'
```
Reload the database server for the changes to take effect.
Note
To verify that the library was installed correctly, you can run the following command:
```
SHOW shared_preload_libraries;
```
Create the extension using the following query:
```
CREATE EXTENSION anon CASCADE;
```
Initialize the extension:
```
SELECT anon.init();
```
The init() function imports a default dataset of random data (IBAN, names, cities, etc.). This dataset is in English and is very small (1000 values for each category).

G.1.4. Configuration #

The extension has several options that can be defined for the entire instance (in postgresql.conf or with ALTER SYSTEM).

It is also possible and often a good idea to define them at the database level like this:

ALTER DATABASE customers SET anon.algorithm = sha512;

Only a superuser can change the parameters below:

anon.algorithm #

The hashing method used by pseudonymizing functions. See pgcrypto documentation for the list of available options.

See anon.salt to learn why this parameter is a very sensitive information.

Type: text

Default: sha256

Visible: only to superusers

anon.maskschema #

The schema (i.e. namespace) where the dynamic masking views will be stored.

Type: text

Default: mask

Visible: to all users

anon.restrict_to_trusted_schemas #

When this parameter is enabled (by default), masking rules must be defined using functions located in a limited list of namespaces. By default, only the anon schema is trusted.

This improves security by preventing users from declaring their custom masking filters.

This also means that the schema must be explicit inside the masking rules. For more details, see the Write your own masks section of the Masking functions chapter.

SECURITY LABEL FOR anon ON COLUMN people.name
IS 'MASKED WITH FUNCTION lower(people.name)';

The correct way to declare it would be:

SECURITY LABEL FOR anon ON COLUMN people.name
IS 'MASKED WITH FUNCTION pg_catalog.lower(people.name)';

Type: boolean

Default: on

Visible: to all users

anon.salt #

The salt used by pseudonymizing functions. It is very important to define a custom salt for each database like this:

ALTER DATABASE foo SET anon.salt = 'This_Is_A_Very_Secret_Salt';

If masked users can read the salt, they can run a brute force attack to retrieve the original data based on the following elements:

The pseudonymized data
The hashing algorithm (see anon.algorithm)
The salt

The salt and the name of the hashing algorithm should be protected with the same level of security that the data itself. This is why the salt should be stored directly within the database with ALTER DATABASE.

Type: text

Default: (empty)

Visible: only to superusers

anon.sourceshema #

The schema (i.e. namespace) where the tables are masked by the dynamic masking engine.

Change this value before starting dynamic masking.

ALTER DATABASE foo SET anon.sourceschema TO 'my_app';

Then reconnect so that the change takes effect and start the engine.

SELECT anon.start_dynamic_masking();

Type: text

Default: public

Visible: to all users

G.1.5. Declare Masking Rules #

The main idea of this extension is to offer anonymization by design.

The data masking rules should be written by people who develop the application because they have the best knowledge of how the data model works. Therefore, masking rules must be implemented directly inside the database schema.

This allows to mask the data directly inside the Postgres Pro instance without using an external tool, and thus limiting the exposure and the risks of data leak.

The data masking rules are declared simply by using security labels:

CREATE TABLE player (id SERIAL, name TEXT, points INT);

INSERT INTO player VALUES
  ( 1, 'Kareem Abdul-Jabbar', 38387),
  ( 5, 'Michael Jordan', 32292 );

SECURITY LABEL FOR anon ON COLUMN player.name
  IS 'MASKED WITH FUNCTION anon.fake_last_name()';

SECURITY LABEL FOR anon ON COLUMN player.id
  IS 'MASKED WITH VALUE NULL';

Important

The maximum length of a masking rule is 1024 characters. If you need more, you should probably write a dedicated masking function.

Important

The masking rules are not inherited. If you have split a table into multiple partitions, you need to declare the masking rules for each partition.

G.1.5.1. Escaping String Literals #

As you may have noticed, the masking rule definitions are placed between single quotes. Therefore, if you need to use a string inside a masking rule, use dollar quoting:

SECURITY LABEL FOR anon ON COLUMN player.name
  IS 'MASKED WITH VALUE $$CONFIDENTIAL$$';

G.1.5.2. Listing Masking Rules #

To display all the masking rules declared in the current database, see anon.pg_masking_rules:

SELECT * FROM anon.pg_masking_rules;

G.1.5.3. Debugging Masking Rules #

When an error occurs due to a wrong masking rule, you can get more detailed information about the problem by setting client_min_messages to DEBUG and you will get useful details.

postgres=# SET client_min_messages=DEBUG;
postgres=# SELECT anon.anonymize_database();

DEBUG:  Anonymize table public.bar with firstname = anon.fake_first_name()
DEBUG:  Anonymize table public.foo with id = NULL
ERROR:  Cannot mask a "NOT NULL" column with a NULL value
HINT:  If privacy_by_design is enabled, add a default value to the column
CONTEXT:  PL/pgSQL function anon.anonymize_table(regclass) line 47 at RAISE
SQL function "anonymize_database" statement 1

G.1.5.4. Removing a Masking Rule #

You can simply erase a masking rule like this:

SECURITY LABEL FOR anon ON COLUMN player.name IS NULL;

To remove all rules at once, you can use:

SELECT anon.remove_masks_for_all_columns();

G.1.6. Masking Functions #

The extension provides functions to implement the following main anonymization strategies:

Depending on your data, you may need to use different strategies on different columns:

For names and other “direct identifiers”, faking is often useful
Shuffling is convenient for foreign keys
Adding noise is interesting for numeric values and dates
Partial scrambling is perfect for email addresses and phone numbers

G.1.6.1. Deletion #

First of all, the fastest and safest way to anonymize a data is to delete it.

In many cases, the best approach to hide the content of a column is to replace all the values with a single static value.

For instance, you can replace an entire column with the word CONFIDENTIAL like this:

SECURITY LABEL FOR anon
  ON COLUMN users.address
  IS 'MASKED WITH VALUE $$CONFIDENTIAL$$';

G.1.6.2. Adding Noise #

This is also called Variance. The idea is to “shift” dates and numeric values. For example, by applying a +/- 10% variance to a salary column, the dataset will remain meaningful.

anon.noise(noise_value anyelement, ratio double precision) #: If ratio is 0.33, the return value will be the original value randomly shifted with a ratio of +/- 33%.
anon.dnoise(noise_value anyelement, noise_range interval) #: If interval = “2 days”, the return value will be the original value randomly shifted by +/- 2 days.

Important

The noise() masking functions are vulnerable to a form of repeat attack, especially with Dynamic Masking. A masked user can guess an original value by requesting its masked value multiple times and then simply use the avg() function to get a close approximation. In a nutshell, these functions are best fitted for Anonymous Dumps and Static Masking. They should be avoided when using Dynamic Masking.

G.1.6.3. Randomization #

The extension provides a large choice of functions to generate purely random data:

G.1.6.3.1. Basic Random Values #

anon.random_date() #: Returns a date.
anon.random_string(n integer) #: Returns a text value containing n letters.
anon.random_zip() #: Returns a 5-digit code.
anon.random_phone(phone_prefix text) #: Returns an 8-digit phone with phone_prefix as a prefix.

G.1.6.3.2. Random Between #

To pick any value inside between two bounds:

random_date_between(d1 date, d2 date) #: Returns a date between d1 and d2.
random_int_between(i1 integer, i2 integer) #: Returns an integer between i1 and i2.
random_bigint_between(b1 bigint, b2 bigint) #: Returns a bigint between b1 and b2.

Note

With these functions, the lower and upper bounds are included. For instance, anon.random_int_between(1,3) returns either 1, 2 or 3.

For more advanced interval descriptions, see Random in Range.

G.1.6.3.3. Random in Array #

The random_in function returns an element in a given array. For example:

anon.random_in(ARRAY[1,2,3]) #: Returns an integer between 1 and 3.
anon.random_in(ARRAY['red','green','blue']) #: Returns a random text value from an array of ['red', 'green', 'blue'].

G.1.6.3.4. Random in Enum #

This function returns an element in a given variable of the ENUM type.

anon.random_in_enum(variable_of_an_enum_type) #: Returns any value.

CREATE TYPE card AS ENUM ('visa', 'mastercard', ‘amex’);
SELECT anon.random_in_enum(NULL::CARD);
 random_in_enum
----------------
 mastercard

CREATE TABLE customer (
  id INT,
  ...
  credit_card CARD
);

SECURITY LABEL FOR anon ON COLUMN customer.creditcard
IS 'MASKED WITH FUNCTION anon.random_in_enum(creditcard)'

G.1.6.3.5. Random in Range #

RANGE types are a powerfull way to describe an interval of values, where you can define inclusive or exclusive bounds. See examples.

There is a function for each subtype of range:

anon.random_in_int4range('[5,6)') #: Returns an INT of value 5.
anon.random_in_int8range('(6,7]') #: Returns a BIGINT of value 7.
anon.random_in_numrange('[0.1,0.9]') #: Returns a NUMERIC between 0.1 and 0.9.
anon.random_in_daterange('[2001-01-01, 2001-12-31)') #: Returns a date in 2001.
anon.random_in_tsrange('[2022-10-01,2022-10-31]') #: Returns a TIMESTAMP in October 2022.
anon.random_in_tstzrange('[2022-10-01,2022-10-31]') #: Returns a TIMESTAMP WITH TIMEZONE in October 2022.

Note

It is not possible to get a random value from a RANGE with an infinite bound. For example anon.random_in_int4range('[2022,)') returns NULL.

G.1.6.4. Faking #

The idea of Faking is to replace sensitive data with random-but-plausible values. The goal is to avoid any identification from the data record while remaining suitable for testing, data analysis and data processing.

The following faking functions are available:

anon.fake_address() #: Returns a complete postal address.
anon.fake_city() #: Returns an existing city.
anon.fake_country() #: Returns an existing country.
anon.fake_company() #: Returns a generic company name.
anon.fake_email() #: Returns a valid email address.
anon.fake_first_name() #: Returns a generic first name.
anon.fake_iban() #: Returns a valid IBAN.
anon.fake_last_name() #: Returns a generic last name.
anon.fake_postcode() #: Returns a valid zip code.

For text and varchar columns, you can use the classic Lorem Ipsum generator:

anon.lorem_ipsum() #: Returns five paragraphs.
anon.lorem_ipsum(2) #: Returns two paragraphs.
anon.lorem_ipsum( paragraphs := 4 ) #: Returns four paragraphs.
anon.lorem_ipsum( words := 20 ) #: Returns 20 words.
anon.lorem_ipsum( characters := anon.length(table.column) ) #: Returns the same number of characters as the original string.

G.1.6.5. Advanced Faking #

Generating fake data is a complex topic. The functions provided here are limited to basic use case. For more advanced faking methods, in particular if you are looking for localized fake data, take a look at PostgreSQL Faker, an extension based upon the well-known Faker Python library.

This extension provides an advanced faking engine with localization support.

For example:

CREATE SCHEMA faker;
CREATE EXTENSION faker SCHEMA faker;
SELECT faker.faker('de_DE');
SELECT faker.first_name_female();
 first_name_female
-------------------
 Mirja

G.1.6.6. Pseudonymization #

Pseudonymization is similar to Faking in the sense that it generates realistic values. The main difference is that the pseudonymization is deterministic: the functions will always return the same fake value based on a seed and an optional salt.

The following pseudonymization functions are available:

anon.pseudo_first_name(seed anyelement, salt text) #: Returns a generic first name.
anon.pseudo_last_name(seed anyelement, salt text) #: Returns a generic last name.
anon.pseudo_email(seed anyelement, salt text) #: Returns a valid email address.
anon.pseudo_city(seed anyelement, salt text) #: Returns an existing city.
anon.pseudo_country(seed anyelement, salt text) #: Returns an existing country.
anon.pseudo_company(seed anyelement, salt text) #: Returns a generic company name.
anon.pseudo_iban(seed anyelement, salt text) #: Returns a valid IBAN.
anon.pseudo_siret(seed anyelement, salt text) #: Returns a valid SIRET.

The second argument (salt) is optional. You can call each function only with the seed like this: anon.pseudo_city('bob'). The salt is here to increase complexity and avoid dictionary and brute force attacks. If a specific salt is not given, the value of the anon.salt GUC parameter is used instead (see Generic Hashing for more details).

The seed can be any information related to the subject. For instance, we can consistently generate the same fake email address for a given person by using her login as the seed:

SECURITY LABEL FOR anon
  ON COLUMN users.emailaddress
  IS 'MASKED WITH FUNCTION anon.pseudo_email(users.login)';

Note

You may want to produce unique values using a pseudonymization function. For instance, if you want to mask an email column that is declared as UNIQUE. In this case, you will need to initialize the extension with a fake dataset that is way bigger than the number of rows of the table. Otherwise, you may see some “collisions” happening, i.e. two different original values producing the same pseudo value.

Important

Pseudonymization is often confused with anonymization but in fact they serve two different purposes: pseudonymization is a way to protect the personal information but the pseudonymized data is still “linked” to the real data.

G.1.6.7. Generic Hashing #

In theory, hashing is not a valid anonymization technique, however in practice it is sometimes necessary to generate a determinist hash of the original data.

For instance, when a pair of primary key / foreign key is a “natural key”, it may contain actual information (like a customer number containing a birth date or something similar).

Hashing such columns allows to keep referential integrity intact even for relatively unusual source data.

anon.digest(value text, salt text, algorithm text) #

Lets you choose a salt and a hash algorithm. Supported algorithms are: md5, sha1, sha224, sha256, sha384, and sha512.

anon.hash(value text) #

Returns a text hash of the value using a secret salt that is defined by the anon.salt parameter and hash algorithm that is defined by the anon.algorithm parameter. The default value of anon.algorithm is sha256 and possible values are: md5, sha1, sha224, sha256, sha384 or sha512. The default value of anon.salt is an empty string. You can modify these values with:

ALTER DATABASE foo SET anon.salt TO 'xsfnjefnjsnfjsnf';
ALTER DATABASE foo SET anon.algorithm TO 'sha384';

Keep in mind that hashing is a form of Pseudonymization. This means that the data can be “de-anonymized” using the hashed value and the masking function. If attackers get access to these two elements, they could re-identify some persons using brute force or dictionary attacks. Therefore, the salt and the algorithm used to hash the data must be protected with the same level of security as the original dataset.

In a nutshell, we recommend that you use the anon.hash() function rather than anon.digest() because the salt will not appear clearly in the masking rule.

Furthermore, in practice the hash function will return a long string of character like this:

SELECT anon.hash('bob');
                                  hash
----------------------------------------------------------------------------------------------------------------------------------
95b6accef02c5a725a8c9abf19ab5575f99ca3d9997984181e4b3f81d96cbca4d0977d694ac490350e01d0d213639909987ef52de8e44d6258d536c55e427397

For some columns, this may be too long and you may have to cut some parts of the hash in order to fit into the column. For instance, if you have a foreign key based on a phone number and the column is a varchar(12), you can transform the data like this:

SECURITY LABEL FOR anon ON COLUMN people.phone_number
IS 'MASKED WITH FUNCTION anon.left(anon.hash(phone_number),12)';

SECURITY LABEL FOR anon ON COLUMN call_history.fk_phone_number
IS 'MASKED WITH FUNCTION anon.left(anon.hash(fk_phone_number),12)';

Of course, cutting the hash value to 12 characters will increase the risk of “collision” (2 different values having the same fake hash). In such case, it's up to you to evaluate this risk.

G.1.6.8. Partial Scrambling #

Partial scrambling leaves out some part of the data. For instance, a credit card number can be replaced by “40XX XXXX XXXX XX96”.

anon.partial(input text, prefix int, padding text, suffix int) #: Partially replaces a given text. For example, anon.partial('abcdefgh',1,'xxxx',3) returns axxxxfgh.
anon.partial_email(email text) #: Partially replaces a given email. For example, anon.partial_email('daamien@gmail.com') returns da******@gm******.com.

G.1.6.9. Conditional Masking #

In some situations, you may want to apply a masking filter only for some value or for a limited number of lines in the table.

For instance, if you want to "preserve NULL values", that is masking only the lines that contain a value, you can use the anon.ternary function, which works like a CASE WHEN x THEN y ELSE z statement:

SECURITY LABEL FOR anon ON COLUMN player.score
  IS 'MASKED WITH FUNCTION anon.ternary(score IS NULL,
                                        NULL,
                                        anon.random_int_between(0,100));

You may also want to exclude some lines within the table. Like keeping the password of some users so that they still may be able to connect to a testing deployment of your application:

SECURITY LABEL FOR anon ON COLUMN account.password
  IS 'MASKED WITH FUNCTION anon.ternary( id > 1000, NULL::TEXT, password)';

Warning

Conditional masking may create a partially deterministic "connection" between the original data and the masked data. And that connection can be used to retrieve personal information from the masked data. For instance, if NULL values are preserved for a "deceased_date" column, it will reveal which persons are still actually alive... In a nutshell: conditional masking may often produce a dataset that is not fully anonymized and therefore would still technically contain personal information.

G.1.6.10. Generalization #

Generalization is the principle of replacing the original value by a range containing this value. For instance, instead of saying “Paul is 42 years old”, you would say “Paul is between 40 and 50 years old”.

Note

The generalization functions perform data type transformation. Therefore, it is not possible to use them with the dynamic masking engine. However, they are useful to create anonymized views. See example below.

Let's imagine a table containing health information:

SELECT * FROM patient;
 id |   name   |  zipcode |   birth    |    disease
----+----------+----------+------------+---------------
  1 | Alice    |    47678 | 1979-12-29 | Heart Disease
  2 | Bob      |    47678 | 1959-03-22 | Heart Disease
  3 | Caroline |    47678 | 1988-07-22 | Heart Disease
  4 | David    |    47905 | 1997-03-04 | Flu
  5 | Eleanor  |    47909 | 1999-12-15 | Heart Disease
  6 | Frank    |    47906 | 1968-07-04 | Cancer
  7 | Geri     |    47605 | 1977-10-30 | Heart Disease
  8 | Harry    |    47673 | 1978-06-13 | Cancer
  9 | Ingrid   |    47607 | 1991-12-12 | Cancer

We can build a view upon this table to suppress some columns ( SSN and name ) and generalize the zipcode and the birth date like this:

CREATE VIEW anonymized_patient AS
SELECT
    'REDACTED' AS lastname,
    anon.generalize_int4range(zipcode,100) AS zipcode,
    anon.generalize_tsrange(birth,'decade') AS birth,
    disease
FROM patient;

The anonymized table now looks like that:

SELECT * FROM anonymized_patient;
 lastname |   zipcode     |           birth             |    disease
----------+---------------+-----------------------------+---------------
 REDACTED | [47600,47700) | ["1970-01-01","1980-01-01") | Heart Disease
 REDACTED | [47600,47700) | ["1950-01-01","1960-01-01") | Heart Disease
 REDACTED | [47600,47700) | ["1980-01-01","1990-01-01") | Heart Disease
 REDACTED | [47900,48000) | ["1990-01-01","2000-01-01") | Flu
 REDACTED | [47900,48000) | ["1990-01-01","2000-01-01") | Heart Disease
 REDACTED | [47900,48000) | ["1960-01-01","1970-01-01") | Cancer
 REDACTED | [47600,47700) | ["1970-01-01","1980-01-01") | Heart Disease
 REDACTED | [47600,47700) | ["1970-01-01","1980-01-01") | Cancer
 REDACTED | [47600,47700) | ["1990-01-01","2000-01-01") | Cancer

The generalized values are still useful for statistics because they remain true, but they are less accurate, and therefore reduce the risk of re-identification.

Postgres Pro offers several range data types, which are perfect for dates and numeric values.

For numeric values, the following functions are available:

generalize_int4range(value, step) generalize_int8range(value, step) generalize_numrange(value, step) #: where value is the data that will be generalized, and step is the size of each range.

G.1.6.11. Using `pg_catalog` Functions #

The pg_catalog schema is not trusted by default. This is a security measure designed to prevent users from using sophisticated functions in masking rules (such as pg_catalog.query_to_xml, pg_catalog.ts_stat or the system administration functions) that should not be used as masking functions.

However, the extension provides bindings to some useful and safe functions from the pg_catalog schema for your convenience:

anon.concat(TEXT,TEXT)
anon.concat(TEXT,TEXT, TEXT)
anon.date_add(TIMESTAMP WITH TIME ZONE,INTERVAL)
anon.date_part(TEXT,TIMESTAMP)
anon.date_part(TEXT,INTERVAL)
anon.date_subtract(TIMESTAMP WITH TIME ZONE, INTERVAL )
anon.date_trunc(TEXT,TIMESTAMP)
anon.date_trunc(TEXT,TIMESTAMP WITH TIME ZONE,TEXT)
anon.date_trunc(TEXT,INTERVAL)
anon.left(TEXT,INTEGER)
anon.length(TEXT)
anon.lower(TEXT)
anon.make_date(INT,INT,INT )
anon.make_time(INT,INT,DOUBLE PRECISION)
anon.md5(TEXT)
anon.random()
anon.replace(TEXT,TEXT,TEXT)
anon.regexp_replace(TEXT,TEXT,TEXT)
anon.regexp_replace(TEXT,TEXT,TEXT,TEXT)
anon.right(TEXT,INTEGER)
anon.substr(TEXT,INTEGER)
anon.substr(TEXT,INTEGER,INTEGER)
anon.upper(TEXT)

If you need more bindings, you can either:

Write your own mapping function in a trusted schema.
Set the pg_catalog schema as TRUSTED (not recommended).

G.1.6.12. Write Your Own Masks #

You can also use your own function as a mask. The function must either be destructive (like Partial Scrambling) or insert some randomness in the dataset (like Faking).

Especially for complex data types, you may have to write your own function. This will be a common use case if you have to hide certain parts of a JSON field.

For example:

CREATE TABLE company (
business_name TEXT,
info JSONB
)

The info field contains unstructured data like this:

SELECT jsonb_pretty(info) FROM company WHERE business_name = 'Soylent Green';
          jsonb_pretty
----------------------------------
{
    "employees": [
        {
            "lastName": "Doe",
            "firstName": "John"
        },
        {
            "lastName": "Smith",
            "firstName": "Anna"
        },
        {
            "lastName": "Jones",
            "firstName": "Peter"
        }
    ]
}
(1 row)

Using the JSON functions and operators, you can walk through the keys and replace the sensitive values as needed.

CREATE SCHEMA custom_masks;

-- This step requires superuser privilege
SECURITY LABEL FOR anon ON SCHEMA custom_masks IS 'TRUSTED';

CREATE FUNCTION custom_masks.remove_last_name(j JSONB)
RETURNS JSONB
VOLATILE
LANGUAGE SQL
AS $func$
SELECT
json_build_object(
  'employees' ,
  array_agg(
    jsonb_set(e ,'{lastName}', to_jsonb(anon.fake_last_name()))
  )
)::JSONB
FROM jsonb_array_elements( j->'employees') e
$func$;

Then check that the function is working correctly:

SELECT custom_masks.remove_last_name(info) FROM company;

When it is okay, you can declare this function as the mask of the info field:

SECURITY LABEL FOR anon ON COLUMN company.info
IS 'MASKED WITH FUNCTION custom_masks.remove_last_name(info)';

And try it out:

SELECT anonymize_table('company');
SELECT jsonb_pretty(info) FROM company WHERE business_name = 'Soylent Green';
          jsonb_pretty
-------------------------------------
{
    "employees": [                 +
        {                          +
            "lastName": "Prawdzik",+
            "firstName": "John"    +
        },                         +
        {                          +
            "lastName": "Baltazor",+
            "firstName": "Anna"    +
        },                         +
        {                          +
            "lastName": "Taylan",  +
            "firstName": "Peter"   +
        }                          +
    ]                              +
}
(1 row)

This is just a quick example. As you can see, manipulating a sophisticated JSON structure with SQL is possible but it can be tricky at first. There are multiple ways of walking through the keys and updating values. You will probably have to try different approaches, depending on your real JSON data and the performance you want to reach.

G.1.7. Static Masking #

Sometimes, it is useful to transform directly the original dataset. You can do that with different methods:

These methods will destroy the original data. Use with care.

G.1.7.1. Applying Masking Rules #

You can permanently apply the masking rules of a database with anon.anonymize_database().

Let's use a basic example:

CREATE TABLE customer (
  id SERIAL,
  full_name TEXT,
  birth DATE,
  employer TEXT,
  zipcode TEXT,
  fk_shop INTEGER
);

INSERT INTO customer
VALUES
(911,'Chuck Norris','1940-03-10','Texas Rangers', '75001',12),
(312,'David Hasselhoff','1952-07-17','Baywatch', '90001',423)
;

SELECT * FROM customer;

 id  |   full_name      |   birth    |    employer   | zipcode | fk_shop
-----+------------------+------------+---------------+---------+---------
 911 | Chuck Norris     | 1940-03-10 | Texas Rangers | 75001   | 12
 112 | David Hasselhoff | 1952-07-17 | Baywatch      | 90001   | 423

Declare the masking rules:

SECURITY LABEL FOR anon ON COLUMN customer.full_name
IS 'MASKED WITH FUNCTION anon.fake_first_name() || '' '' || anon.fake_last_name()';

SECURITY LABEL FOR anon ON COLUMN customer.employer
IS 'MASKED WITH FUNCTION anon.fake_company()';

SECURITY LABEL FOR anon ON COLUMN customer.zipcode
IS 'MASKED WITH FUNCTION anon.random_zip()';

Replace authentic data in the masked columns:

SELECT anon.anonymize_database();

SELECT * FROM customer;

 id  |  full_name  |   birth    |      employer       | zipcode | fk_shop
-----+-------------+------------+---------------------+---------+---------
 911 | Jesse Kosel | 1940-03-10 | Marigold Properties | 62172   |      12
 312 | Leolin Bose | 1952-07-17 | Inventure           | 20026   |     423

You can also use anonymize_table() and anonymize_column() to remove data from a subset of the database:

SELECT anon.anonymize_table('customer');
SELECT anon.anonymize_column('customer','zipcode');

Important

Static masking is a slow process. The principle of static masking is to update all lines of all tables containing at least one masked column. This basically means that the server will rewrite all the data on disk. Depending on the database size, the hardware and the instance configuration, it may be faster to export the anonymized data (see Anonymous Dumps) and reload it into the database.

G.1.7.2. Shuffling #

Shuffling mixes values within the same columns.

anon.shuffle_column(shuffle_table regclass, shuffle_column name, primary_key name) #: Rearranges all values in a given column. You need to provide the primary key of the table.

This is useful for foreign keys because referential integrity will be kept.

Important

shuffle_column() is not a masking function because it works “verticaly”: it will modify all the values of a column at once.

G.1.7.3. Adding Noise to a Column #

There are also some functions that can add noise on an entire column:

anon.add_noise_on_numeric_column(table regclass, column text, ratio float) #: If ratio = 0.33, all values of the column will be randomly shifted with a ratio of +/- 33%.
anon.add_noise_on_datetime_column(table regclass, column text, interval interval) #: If interval = 2 days, all values of the column will be randomly shifted by +/- 2 days.

Important

These noise functions are vulnerable to a form of repeat attack.

G.1.8. Dynamic Masking #

You can hide some data from a role by declaring this role as a “MASKED” one. Other roles will still access the original data.

CREATE TABLE people (id TEXT, firstname TEXT, lastname TEXT, phone TEXT);
INSERT INTO people VALUES ('T1','Sarah', 'Connor','0609110911');
SELECT * FROM people;

SELECT * FROM people;
 id | firstname | lastname |   phone
----+----------+----------+------------
 T1 | Sarah    | Connor    | 0609110911
(1 row)

Activate the dynamic masking engine:
```
SELECT anon.start_dynamic_masking();
```

Declare a masked user:

CREATE ROLE skynet LOGIN;
SECURITY LABEL FOR anon ON ROLE skynet
IS 'MASKED';

Declare the masking rules:

SECURITY LABEL FOR anon ON COLUMN people.lastname
IS 'MASKED WITH FUNCTION anon.fake_last_name()';

SECURITY LABEL FOR anon ON COLUMN people.phone
IS 'MASKED WITH FUNCTION anon.partial(phone,2,$$******$$,2)';

Connect with the masked user:

\c - skynet
SELECT * FROM people;
 id | firstname | lastname  |   phone
----+----------+-----------+------------
 T1 | Sarah    | Stranahan | 06******11
(1 row)

G.1.8.1. How to Change the Type of a Masked Column #

When dynamic masking is activated, you are not allowed to change the datatype of a column if there's a mask upon it.

To modify a masked column, you need to temporarily switch off the masking engine like this:

BEGIN;
SELECT anon.stop_dynamic_masking();
ALTER TABLE people ALTER COLUMN phone TYPE VARCHAR(255);
SELECT anon.start_dynamic_masking();
COMMIT;

G.1.8.2. How to Drop a Masked Table #

The dynamic masking engine will build masking views upon the masked tables. This means that it is not possible to drop a masked table directly. You will get an error like this:

DROP TABLE people;
psql: ERROR:  cannot drop table people because other objects depend on it
DETAIL:  view mask.company depends on table people

To effectively remove the table, it is necessary to add the CASCADE option, so that the masking view will be dropped too:

DROP TABLE people CASCADE;

G.1.8.3. How to Unmask a Role #

Simply remove the security label like this:

SECURITY LABEL FOR anon ON ROLE bob IS NULL;

To unmask all masked roles at once, you can type:

SELECT anon.remove_masks_for_all_roles();

G.1.8.4. Limitations #

G.1.8.4.1. Listing the Tables #

Due to how the dynamic masking engine works, when a masked role tries to display the tables in psql with the \dt command, then psql will not show any tables.

This is because the search_path of the masked role is rigged.

You can try adding explicit schema you want to search, for instance:

\dt *.*
\dt public.*

G.1.8.4.2. Only One Schema #

The dynamic masking system only works with one schema (by default public). When you start the masking engine with start_dynamic_masking(), you can specify the schema that will be masked as follows:

ALTER DATABASE foo SET anon.sourceschema TO 'sales';

Then open a new session to the database and type:

SELECT start_dynamic_masking();

However, static masking with anon.anonymize_column(), anon.anonymize_table(), and anon.anonymize_database() will work fine with multiple schemas.

G.1.8.4.3. Performance #

Dynamic Masking is known to be very slow with some queries, especially if you try to join two tables on a masked key using hashing or pseudonymization.

G.1.8.4.4. Graphic Tools #

When you are using a masked role with a graphic interface, such as DBeaver or pgAdmin, the data panel may produce the following error when trying to display the content of a masked table called foo:

SQL Error [42501]: ERROR: permission denied for table foo

This is because most of these tools will directly query the public.foo table instead of being “redirected” by the masking engine toward the mask.foo view.

In order the view the masked data with a graphic tool, you can either:

Open the SQL query panel and type SELECT * FROM foo

Navigate to Database > Schemas > mask > Views > foo

G.1.9. Anonymous Dumps #

G.1.9.1. pg_dump_anon.sh #

The pg_dump_anon.sh script supports most of the options of the regular pg_dump command. The environment variables (PGHOST, PGUSER, etc.) and the .pgpass files are also supported.

G.1.9.2. Example #

A user named bob can export an anonymous dump of the app database like this:

/opt/pgpro/ent-16/bin/pg_dump_anon.sh -h localhost -U bob --password --file=anonymous_dump.sql app

Important

The name of the database must be the last parameter.

For more details about the supported options, simply type ./pg_dump_anon.sh --help.

G.1.9.3. Limitations #

The user password is asked automatically. This means you must either add the --password option to define it interactively or declare it in the PGPASSWORD variable or put it inside the .pgpass file ( however on Windows,the PGPASSFILE variable must be specified explicitly)
The plain format is the only supported format. The other formats (custom, dir and tar) are not supported.

G.1.10. Generalization #

G.1.10.1. Reducing the Accuracy of Sensitive Data #

The idea of generalization is to replace data with a broader, less accurate value. For instance, instead of saying “Bob is 28 years old”, you can say “Bob is between 20 and 30 years old”. This is interesting for analytics because the data remains true while avoiding the risk of re-identification.

Generalization is a way to achieve k-anonymity.

Postgres Pro can handle generalization very easily with the range data types, a very powerful way to store and manipulate a set of values contained between a lower and an upper bound.

G.1.10.2. Example #

Here's a basic table containing medical data:

SELECT * FROM patient;
     ssn     | firstname | zipcode |   birth    |    disease
-------------+-----------+---------+------------+---------------
 253-51-6170 | Alice     |   47012 | 1989-12-29 | Heart Disease
 091-20-0543 | Bob       |   42678 | 1979-03-22 | Allergy
 565-94-1926 | Caroline  |   42678 | 1971-07-22 | Heart Disease
 510-56-7882 | Eleanor   |   47909 | 1989-12-15 | Acne
 098-24-5548 | David     |   47905 | 1997-03-04 | Flu
 118-49-5228 | Jean      |   47511 | 1993-09-14 | Flu
 263-50-7396 | Tim       |   47900 | 1981-02-25 | Heart Disease
 109-99-6362 | Bernard   |   47168 | 1992-01-03 | Asthma
 287-17-2794 | Sophie    |   42020 | 1972-07-14 | Asthma
 409-28-2014 | Arnold    |   47000 | 1999-11-20 | Diabetes
(10 rows)

We want the anonymized data to remain true because it will be used for statistics. We can build a view upon this table to remove useless columns and generalize the indirect identifiers:

CREATE MATERIALIZED VIEW generalized_patient AS
SELECT
  'REDACTED'::TEXT AS firstname,
  anon.generalize_int4range(zipcode,1000) AS zipcode,
  anon.generalize_daterange(birth,'decade') AS birth,
  disease
FROM patient;

This will give us a less accurate view of the data:

SELECT * FROM generalized_patient;
 firstname |    zipcode    |          birth          |    disease
-----------+---------------+-------------------------+---------------
 REDACTED  | [47000,48000) | [1980-01-01,1990-01-01) | Heart Disease
 REDACTED  | [42000,43000) | [1970-01-01,1980-01-01) | Allergy
 REDACTED  | [42000,43000) | [1970-01-01,1980-01-01) | Heart Disease
 REDACTED  | [47000,48000) | [1980-01-01,1990-01-01) | Acne
 REDACTED  | [47000,48000) | [1990-01-01,2000-01-01) | Flu
 REDACTED  | [47000,48000) | [1990-01-01,2000-01-01) | Flu
 REDACTED  | [47000,48000) | [1980-01-01,1990-01-01) | Heart Disease
 REDACTED  | [47000,48000) | [1990-01-01,2000-01-01) | Asthma
 REDACTED  | [42000,43000) | [1970-01-01,1980-01-01) | Asthma
 REDACTED  | [47000,48000) | [1990-01-01,2000-01-01) | Diabetes
(10 rows)

G.1.10.3. Generalization Functions #

pgpro_anonymizer provides generalization functions, one for each range type. Generally, these functions take the original value as the first parameter, and a second parameter for the length of each step.

For numeric values:

anon.generalize_int4range(value integer, step integer) #: For example, anon.generalize_int4range(42,5) returns the range [40,45).
anon.generalize_int8range(value integer, step integer) #: For example, anon.generalize_int8range(12345,1000) returns the range [12000,13000).
anon.generalize_numrange(value integer, step integer) #: For example, anon.generalize_numrange(42.32378,10) returns the range [40,50).

For time values:

anon.generalize_tsrange(value integer, step integer) #: For example, anon.generalize_tsrange('1904-11-07','year') returns ['1904-01-01','1905-01-01').
anon.generalize_tstzrange(value integer, step integer) #: For example, anon.generalize_tstzrange('1904-11-07','week') returns ['1904-11-07','1904-11-14').
anon.generalize_daterange(value integer, step integer) #: For example, anon.generalize_daterange('1904-11-07','decade') returns [1900-01-01,1910-01-01).

The possible steps are: microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century and millennium.

G.1.10.4. Limitations #

G.1.10.4.1. Singling Out and Extreme Values #

Singling Out is the possibility to isolate an individual in a dataset by using extreme value or exceptional values.

For example:

SELECT * FROM employees;

  id  |  name          | job  | salary
------+----------------+------+--------
 1578 | xkjefus3sfzd   | NULL |    1498
 2552 | cksnd2se5dfa   | NULL |    2257
 5301 | fnefckndc2xn   | NULL |   45489
 7114 | npodn5ltyp3d   | NULL |    1821

In this table, we can see that a particular employee has a very high salary, very far from the average salary. Therefore this person is probably the CEO of the company.

With generalization, this is important because the size of the range (the “step”) must be wide enough to prevent the identification of one single individual.

k-anonymity is a way to assess this risk.

G.1.10.4.2. Generalization is Not Compatible with Dynamic Masking #

By definition, with generalization the data remains true, but the column type is changed.

This means that the transformation is not transparent, and therefore it cannot be used with dynamic masking.

G.1.10.5. k-anonymity #

k-anonymity is an industry-standard term used to describe a property of an anonymized dataset. The k-anonymity principle states that within a given dataset, any anonymized individual cannot be distinguished from at least k-1 other individuals. In other words, k-anonymity might be described as a “hiding in the crowd” guarantee. A low value of k indicates there's a risk of re-identification using linkage with other data sources.

anon.k_anonymity(id regclass) #

You can evaluate the k-anonymity factor of a table as follows:

Define the columns that are indirect identifiers (also known as quasi identifiers) like this:

SECURITY LABEL FOR k_anonymity ON COLUMN patient.firstname
IS 'INDIRECT IDENTIFIER';

SECURITY LABEL FOR k_anonymity ON COLUMN patient.zipcode
IS 'INDIRECT IDENTIFIER';

SECURITY LABEL FOR k_anonymity ON COLUMN patient.birth
IS 'INDIRECT IDENTIFIER';

Once the indirect identifiers are declared:
```
SELECT anon.k_anonymity('generalized_patient')
```
The higher the value, the better.

G.1.11. Performance #

Any anonymization process has a price as it will consume CPU time, RAM space and probably a lot of disk I/O. Here's a quick overview of the question depending on the strategy you are using.

In a nutshell, the anonymization performance mainly depends on the following important factors:

The size of the database
The number of masking rules

G.1.11.1. Static Masking #

Basically what static masking does it rewrite entirely the masked tables on disk. This may be slow depending on your environment. And during this process, the tables will be locked.

Note

In this case, the cost of anonymization is “paid” by all the users but it is paid once and for all.

G.1.11.2. Dynamic Masking #

With dynamic masking, the real data is replaced on-the-fly every time a masked user sends a query to the database. This means that the masking users will have slower response time than regular (unmasked) users. This is generally ok because usually masked users are not considered as important as the regular ones.

If you apply three or four rules to a table, the response time for the masked users should be approximately 20% to 30% slower than for the normal users.

As the masking rules are applied for each queries of the masked users, the dynamic masking is appropriate when you have a limited number of masked users that connect only from time to time to the database. For instance, a data analyst connecting once a week to generate a business report.

If there are multiple masked users or if a masked user is very active, you should probably export the masked data once-a-week on a secondary instance and let these users connect to this secondary instance.

Note

In this case, the cost of anonymization is “paid” only by the masked users.

G.1.11.3. Anonymous Dumps #

If the backup process of your database takes one hour with pg_dump, then anonymizing and exporting the entire database with pg_dump_anon.sh will probably take two hours.

Note

In this case, the cost of anonymization is “paid” by the user asking for the anonymous export. Other users of the database will not be affected.

G.1.11.4. How to Speed Things Up #

G.1.11.4.1. Prefer `MASKED WITH VALUE` Whenever Possible #

It is always faster to replace the original data with a static value instead of calling a masking function.

G.1.11.4.2. Sampling #

If you need to anonymize data for testing purposes, chances are that a smaller subset of your database will be enough. In that case, you can easily speed up the anonymization by downsizing the volume of data.

See Sampling for more details.

G.1.11.4.3. Materialized Views #

Dynamic masking is not always required. In some cases, it is more efficient to build Materialized Views instead.

For instance:

CREATE MATERIALIZED VIEW masked_customer AS
SELECT
    id,
    anon.random_last_name() AS name,
    anon.random_date_between('1920-01-01'::DATE,now()) AS birth,
    fk_last_order,
    store_id
FROM customer;

G.1.12. Security #

G.1.12.1. Permissions #

Here's an overview of what users can do depending on the privilege they have:

Table G.1. Privileges

Action	Superuser	Owner	Masked Role
Create the extension	Yes
Drop the extension	Yes
Init the extension	Yes
Reset the extension	Yes
Configure the extension	Yes
Put a mask upon a role	Yes
Start dynamic masking	Yes
Stop dynamic masking	Yes
Create a table	Yes	Yes
Declare a masking rule	Yes	Yes
Insert, delete, update a row	Yes	Yes
Use static masking	Yes	Yes
Select the real data	Yes	Yes
Use regular dumps	Yes	Yes
Use anonymous dumps	Yes	Yes
Use the masking functions	Yes	Yes	Yes
Select the masked data	Yes	Yes	Yes
View the masking rules	Yes	Yes	Yes

G.1.12.2. Limit Masking Filters Only to Trusted Schemas #

By default, database owners can only write masking rules with functions that are located in the trusted schemas which are controlled by the superusers.

Out of the box, only the anon schema is declared as trusted. This means that by default the functions from the pg_catalog cannot be used in masking rules.

For more information, see Using pg_catalog functions.

G.1.12.3. Security Context of the Functions #

Most of the functions of this extension are declared with the SECURITY INVOKER tag. This means that these functions are executed with the privileges of the user that calls them. This is an important restriction.

This extension contains another few functions declared with the tag SECURITY DEFINER.

G.1.13. Sampling #

If you are writing an anonymization policy for a dataset, chances are that you do not need to anonymize the entire database. In most cases, extracting a subset of the table is sufficient. For example, if you want to export an anonymous dumps of the data for testing purpose in a CI workflow, extracting and masking only 10% of the database might be enough.

Furthermore, anonymizing a smaller portion (that is a "sample") of the dataset will be way faster.

G.1.13.1. Example #

Let's say you have a huge amount of HTTP logs stored in a table. You want to remove the IP addresses and extract only 10% of the table:

CREATE TABLE http_logs (
  id integer NOT NULL,
  date_opened DATE,
  ip_address INET,
  url TEXT
);

SECURITY LABEL FOR anon ON COLUMN http_logs.ip_address
IS 'MASKED WITH VALUE NULL';

SECURITY LABEL FOR anon ON TABLE http_logs
IS 'TABLESAMPLE BERNOULLI(10)';

Now you can either do static masking, dynamic masking or an anonymous dumps. The mask data will represent a 10% portion of the real data.

G.1.13.2. Syntax #

The syntax is exactly the same as the TABLESAMPLE clause, which can be placed at the end of a SELECT statement.

You can also define a sampling ratio at the database level, and it will be applied to all the tables that do not have their own TABLESAMPLE rule.

SECURITY LABEL FOR anon ON DATABASE app
IS 'TABLESAMPLE SYSTEM(33)';

G.1.13.3. Maintaining Referential Integrity #

Warning

The sampling method described above WILL FAIL if you have foreign keys pointing at the table you want to sample.

Extracting a subset of a database while maintaining referential integrity is tricky and it is not supported by this extension.

If you really need to keep referential integrity in an anonymized dataset, you need to do it in two steps:

Extract a sample with pg_sample.
Anonymize the extracted sample.

There may be other sampling tools for PostgreSQL but pg_sample is probably the best one.

Prev	Up	Next
Appendix G. Postgres Pro Modules and Extensions Shipped as Individual Packages	Home	G.2. pgpro_datactl — manage Postgres Pro Enterprise data files

epub pdf

G.1. pgpro_anonymizer — mask or replace sensitive data #

G.1.1. Terms and Definitions #

G.1.2. Anonymization Example #

G.1.3. Installation and Setup #

Note

G.1.4. Configuration #

G.1.5. Declare Masking Rules #

Important

Important

G.1.5.1. Escaping String Literals #

G.1.5.2. Listing Masking Rules #

G.1.5.3. Debugging Masking Rules #

G.1.5.4. Removing a Masking Rule #

G.1.6. Masking Functions #

G.1.6.1. Deletion #

G.1.6.2. Adding Noise #

Important

G.1.6.3. Randomization #

G.1.6.3.1. Basic Random Values #

G.1.6.3.2. Random Between #

Note

G.1.6.3.3. Random in Array #

G.1.6.3.4. Random in Enum #

G.1.6.3.5. Random in Range #

Note

G.1.6.4. Faking #

G.1.6.5. Advanced Faking #

G.1.6.6. Pseudonymization #

Note

Important

G.1.6.7. Generic Hashing #

G.1.6.8. Partial Scrambling #

G.1.6.9. Conditional Masking #

Warning

G.1.6.10. Generalization #

Note

G.1.6.11. Using pg_catalog Functions #

G.1.6.12. Write Your Own Masks #

G.1.7. Static Masking #

G.1.7.1. Applying Masking Rules #

Important

G.1.7.2. Shuffling #

Important

G.1.7.3. Adding Noise to a Column #

Important

G.1.8. Dynamic Masking #

G.1.8.1. How to Change the Type of a Masked Column #

G.1.8.2. How to Drop a Masked Table #

G.1.8.3. How to Unmask a Role #

G.1.8.4. Limitations #

G.1.8.4.1. Listing the Tables #

G.1.8.4.2. Only One Schema #

G.1.8.4.3. Performance #

G.1.8.4.4. Graphic Tools #

G.1.9. Anonymous Dumps #

G.1.9.1. pg_dump_anon.sh #

G.1.9.2. Example #

Important

G.1.9.3. Limitations #

G.1.10. Generalization #

G.1.10.1. Reducing the Accuracy of Sensitive Data #

G.1.10.2. Example #

G.1.10.3. Generalization Functions #

G.1.10.4. Limitations #

G.1.10.4.1. Singling Out and Extreme Values #

G.1.10.4.2. Generalization is Not Compatible with Dynamic Masking #

G.1.10.5. k-anonymity #

G.1.11. Performance #

G.1.11.1. Static Masking #

Note

G.1.11.2. Dynamic Masking #

Note

G.1.11.3. Anonymous Dumps #

Note

G.1.11.4. How to Speed Things Up #

G.1.11.4.1. Prefer MASKED WITH VALUE Whenever Possible #

G.1.11.4.2. Sampling #

G.1.11.4.3. Materialized Views #

G.1.12. Security #

G.1.12.1. Permissions #

G.1.6.11. Using `pg_catalog` Functions #

G.1.11.4.1. Prefer `MASKED WITH VALUE` Whenever Possible #