3.2. Storages #

Storages are physical locations of Parquet files and shared directories. Postgres Pro AXE supports the following types of storages:

  • Local storages: Arrays of NVMe (Non-Volatile Memory Express) disks on servers with Postgres Pro AXE installed.

  • Network storages: Network file systems (NFS).

  • S3 storages.

You can select the storage type using the comparison below.

  • Throughput:

    • Local storage: High. Determined by the number of NVMe disks in the array.

    • Network storage: Medium. Limited by the capacity of the server network interface, network load, and network storage speed.

    • S3 storage: Medium. Limited by the capacity of the server network interface, network load, and S3 storage speed.

  • Data scalability:

    • Local storage: Medium. Determined by the number and volume of NVMe disks.

    • Network storage: High.

    • S3 storage: High.

  • Data distribution between servers:

    • Local storage: Not supported.

    • Network storage: Within the network of the organization.

    • S3 storage: Global.

  • High availability provider:

    • Local storage: The Postgres Pro AXE server administrator.

    • Network storage: The NFS administrator.

    • S3 storage: The S3 provider.

  • Cost per terabyte and per storage request:

    • Local storage: Low.

    • Network storage: Medium.

    • S3 storage: Depends on the S3 provider.

The metadata of storages is stored in the pga_storage metadata table.

Any directory structure can be used for storing the OLAP data, for example:

  • In a local or network storage:

    root_path/db_name/schema_name/table_name
    
  • In an S3 storage:

      s3://bucket/db_name/schema_name/table_name
    

pgpro_axe can automatically export OLAP data to multiple Parquet files, adding a unique number to each file name. Storing OLAP data as multiple Parquet files of the same size is easier than managing a single large file.

You can also use Hive partitioning to organize the OLAP data by partition keys in a directory hierarchy:

  table_name
  ├── year=2024
  │    ├── month=1
  │    │   ├── file1.parquet
  │    │   └── file2.parquet
  │    └── month=2
  │        └── file3.parquet
  └── year=2025
      ├── month=11
      │   ├── file4.parquet
      │   └── file5.parquet
      └── month=12
          └── file6.parquet

This type of hierarchy can be useful for large historical tables when analytical queries require the data associated with a specific subset of partition keys.

Filter pushdown at the path level is also supported: when reading, paths that cannot contain the required OLAP data are skipped.
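Path-level filter pushdown over a Hive-partitioned layout can be sketched as follows. This is an illustrative model only, not the actual pgpro_axe implementation; the paths and predicate are hypothetical:

```python
# Sketch of Hive-style path-level filter pushdown (illustrative only; not the
# actual pgpro_axe implementation). Partition key values are parsed from
# "key=value" path segments and checked against the query predicate before
# any Parquet file under that path is opened.

def parse_partition_keys(path):
    """Extract e.g. {'year': '2024', 'month': '1'} from a Hive-partitioned path."""
    keys = {}
    for segment in path.split("/"):
        if "=" in segment:
            name, _, value = segment.partition("=")
            keys[name] = value
    return keys

def prune_paths(paths, predicate):
    """Keep only paths whose partition keys satisfy the predicate."""
    return [p for p in paths if predicate(parse_partition_keys(p))]

paths = [
    "table_name/year=2024/month=1/file1.parquet",
    "table_name/year=2024/month=2/file3.parquet",
    "table_name/year=2025/month=11/file4.parquet",
]

# Predicate corresponding to: WHERE year = 2024 AND month = 1
matching = prune_paths(
    paths, lambda k: k.get("year") == "2024" and k.get("month") == "1"
)
```

Only `file1.parquet` survives the pruning; the files under `year=2025` are never opened.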

Postgres Pro foreign servers are used to work with S3 storages. This allows storing connection parameters and user credentials securely within Postgres Pro without specifying them in functions. Currently, you can create only one S3 storage.

3.2.1. Pre-configuring an S3 storage #

Before creating an S3 storage, you must pre-configure it.

To pre-configure an S3 storage:

  1. Create an S3 server:

      CREATE SERVER simple_s3_secret
      TYPE 's3'
      FOREIGN DATA WRAPPER duckdb
      OPTIONS (
          endpoint 'IP_address_and_port_number',
          region 'region_name',
          use_ssl 'true' or 'false',
          url_style 'vhost' or 'path',
          url_compatibility_mode 'true' or 'false'
      );
    

    Where:

    • endpoint: The IP address and port number of the S3 server.

    • region: The region where the S3 server is located.

    • use_ssl: Specifies whether the S3 server uses HTTPS.

      Possible values:

      • true

      • false

    • url_style: The style of the server URL.

      Possible values:

      • vhost

      • path

    • url_compatibility_mode: Enables a compatibility mode that can help when the server URL contains problematic characters.

      Possible values:

      • true

      • false

    Important

    The server name must be simple_s3_secret because pgpro_metastore uses the connection parameters and credentials registered for the foreign server with this name.

    If you specify a different server name, the connection parameters and credentials will not be found when the storage is created.

  2. Create a user mapping for pgpro_metastore:

      CREATE USER MAPPING FOR metastore_admin
      SERVER simple_s3_secret
      OPTIONS (
          key_id 'access_key',
          secret 'secret_key'
      );
    

    Where:

    • key_id: The access key (login) for the S3 storage.

    • secret: The secret key for the S3 storage.

    Important

    The user for whom you create a mapping must be a member of the role specified in the duckdb.postgres_role configuration parameter.

3.2.2. Creating a Storage #

Before creating an S3 storage, pre-configure it.

To create a storage, execute the following query:

  SELECT metastore.add_storage('storage_name', 'root_directory_URI', 'temporary_directory_URI');

Where:

  • storage_name: The unique storage name.

  • root_directory_URI: The URI of the root directory of the storage.

  • temporary_directory_URI: The URI of the storage directory for temporary files.

Local storages have simple URIs, for example, file:///home/j.doe/workspace/axe/.

S3 storage URIs have a more complex structure and require more complex parsing, for example, s3://premdb/team01.csv?versionId=w_B5qT8s5MkiT09.IRHay0lW.PycsHTS. These URIs are processed using third-party libraries.
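The standard Python library can illustrate how such a URI decomposes into its parts. This is only a demonstration of the URI structure using the example above, not the parser pgpro_axe actually uses:

```python
# Illustrative decomposition of an S3 storage URI using the Python standard
# library; pgpro_axe itself relies on third-party libraries for this parsing.
from urllib.parse import urlparse, parse_qs

uri = "s3://premdb/team01.csv?versionId=w_B5qT8s5MkiT09.IRHay0lW.PycsHTS"
parts = urlparse(uri)

scheme = parts.scheme         # URI scheme: 's3'
bucket = parts.netloc         # bucket name: 'premdb'
key = parts.path.lstrip("/")  # object key: 'team01.csv'
version = parse_qs(parts.query)["versionId"][0]  # object version identifier
```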

Example 3.7. 

  SELECT metastore.add_storage('example_storage', 'file:///tmp/example_storage_dir/', 'file:///tmp/example_storage_dir/tmp_dir/');

Once the query is executed, pgpro_metastore performs the following actions:

  1. Verifies user privileges.

  2. Ensures that there is enough free space for the storage (10 GB by default).

  3. Ensures that specified directories exist.

  4. Creates a new storage in the pga_storage metadata table.
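The free-space check in step 2 can be sketched with the Python standard library. This is an illustrative sketch, not the actual pgpro_metastore implementation; the directory path is hypothetical:

```python
# Sketch of the free-space check performed when a storage is created
# (illustrative; not the actual pgpro_metastore implementation).
import shutil

REQUIRED_BYTES = 10 * 1024**3  # 10 GB, the default requirement

def has_enough_free_space(directory, required=REQUIRED_BYTES):
    """Return True if the file system holding `directory` has enough free space."""
    return shutil.disk_usage(directory).free >= required

# Example: check the current directory against a tiny threshold.
ok = has_enough_free_space(".", required=1)
```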

3.2.3. Considerations for Working with a Local Storage #

The local storage is disabled by default, preventing users from exporting or reading files from it:

Example 3.8. 

postgres=> show duckdb.disabled_filesystems;
duckdb.disabled_filesystems
-----------------------------
LocalFileSystem
(1 row)

postgres=> COPY test TO 'out.parquet';
ERROR:  (PGDuckDB/DuckdbUtilityHook_Cpp) Permission Error: File system LocalFileSystem has been disabled by configuration
postgres=> SELECT * FROM read_parquet('test.parquet');
ERROR:  (PGDuckDB/CreatePlan) Prepared query returned an error: 'Permission Error: File system LocalFileSystem has been disabled by configuration

If the local storage is enabled, members of the role specified in the duckdb.postgres_role configuration parameter have the same privileges within this storage as the postgres user, which is required for reading and writing Parquet files:

Example 3.9. 

postgres=> show duckdb.disabled_filesystems;
duckdb.disabled_filesystems
-----------------------------

(1 row)

postgres=> COPY test TO 'out.parquet';
COPY 7
postgres=> SELECT * FROM read_parquet('out.parquet');
a | b
---+---
1 | 2
...
(7 rows)

However, in this case, users can also access files in the PGDATA directory:

Example 3.10. 

postgres=> show duckdb.disabled_filesystems ;
duckdb.disabled_filesystems
-----------------------------

(1 row)

postgres=> SELECT pg_read_file('postgresql.auto.conf');
ERROR:  permission denied for function pg_read_file

postgres=> SELECT * FROM duckdb.query('SELECT content FROM read_text(''postgresql.auto.conf'')');
                    content
-------------------------------------------------------
# Do not edit this file manually!                    +
# It will be overwritten by the ALTER SYSTEM command.+

If you disable the local storage again, this access is blocked:

Example 3.11. 

postgres=> show duckdb.disabled_filesystems ;
duckdb.disabled_filesystems
-----------------------------
LocalFileSystem
(1 row)

postgres=> SELECT * FROM duckdb.query('SELECT content FROM read_text(''postgresql.auto.conf'')');
ERROR:  (PGDuckDB/CreatePlan) Prepared query returned an error: 'Permission Error: File system LocalFileSystem has been disabled by configuration

Recommendations for using pgpro_axe with a local storage:

  • Use a local storage only if you trust users to call pgpro_axe functions correctly.

  • Isolate Postgres Pro processes from all files except those in the PGDATA directory and the files to be read or written.

3.2.4. Considerations for Working with an S3 Storage #

To work with an S3 storage, specify the required connection information. For example, if you have an instance of the MinIO S3 storage deployed, follow the steps below as the postgres user:

  1. Create a server for the S3 storage:

    Example 3.12. 

    postgres=# CREATE SERVER my_secret1
    TYPE 's3'
    FOREIGN DATA WRAPPER duckdb
    OPTIONS (
      endpoint '127.0.0.1:9000',
      url_style 'path',
      use_ssl 'FALSE'
    );
    CREATE SERVER
    

  2. Create a mapping for the user that will work with the S3 storage:

    Example 3.13. 

    postgres=# CREATE USER MAPPING FOR user1 SERVER my_secret1
    OPTIONS (KEY_ID 'minioadmin', SECRET 'minioadmin');
    CREATE USER MAPPING
    

The specified user now has the privileges required for working with the S3 storage:

Example 3.14. 

postgres=> SELECT * FROM read_parquet('s3://bucket1/results.parquet');
a | b
---+---
1 | 2
2 |
(2 rows)

postgres=> COPY a TO 's3://bucket1/a.parquet';
COPY 8

If you create a mapping for the user role specified in the duckdb.postgres_role configuration parameter, all members of this role will be able to work with the S3 storage.
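For example, assuming the duckdb.postgres_role parameter names a role called analytics_users (a hypothetical role name), a single mapping grants S3 storage access to all of its members:

```sql
-- Hypothetical example: analytics_users is assumed to be the role specified
-- in duckdb.postgres_role; the credentials match Example 3.13.
CREATE USER MAPPING FOR analytics_users SERVER my_secret1
OPTIONS (key_id 'minioadmin', secret 'minioadmin');
```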

To revoke a user's access to the S3 storage, delete the corresponding mapping:

Example 3.15. 

postgres=# DROP USER MAPPING FOR user1 SERVER my_secret1;
DROP USER MAPPING

To revoke S3 access for all users, disable the S3 file system by adding S3FileSystem to the duckdb.disabled_filesystems configuration parameter:

Example 3.16. 

postgres=> show duckdb.disabled_filesystems ;
duckdb.disabled_filesystems
------------------------------
LocalFileSystem,S3FileSystem
(1 row)

postgres=> SELECT * FROM read_parquet('s3://bucket1/results.parquet');
ERROR:  (PGDuckDB/CreatePlan) Prepared query returned an error: 'Permission Error: File system S3FileSystem has been disabled by configuration
postgres=> COPY a TO 's3://bucket1/a.parquet';
ERROR:  (PGDuckDB/DuckdbUtilityHook_Cpp) Permission Error: File system S3FileSystem has been disabled by configuration

3.2.5. Deleting a Storage #

To delete a storage, execute the following query:

  SELECT metastore.remove_storage('storage_name');

Where storage_name is the name of the storage that will be deleted.

Example 3.17. 

  SELECT metastore.remove_storage('example_storage');

Once the query is executed, pgpro_metastore performs the following actions:

  1. Verifies user privileges.

  2. Creates a new snapshot in the pga_snapshot metadata table and sets the end_snapshot value for the storage and all of its associated pgpro_metastore objects in the pga_storage metadata table and other metadata tables.
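The effect of step 2 can be sketched as a snapshot-based soft delete: metadata rows are not physically removed, but marked with the snapshot at which they stop being visible. The table layout below is purely illustrative, not the real pga_storage schema:

```python
# Illustrative sketch of snapshot-based soft deletion (not the real
# pga_storage schema). Each metadata row carries the snapshot range in which
# it is visible; removing a storage sets end_snapshot instead of deleting rows.

current_snapshot = 0

def next_snapshot():
    global current_snapshot
    current_snapshot += 1
    return current_snapshot

# One metadata row per storage: visible from start_snapshot until end_snapshot.
storages = [
    {"name": "example_storage", "start_snapshot": next_snapshot(), "end_snapshot": None}
]

def remove_storage(name):
    snap = next_snapshot()  # step 2: create a new snapshot
    for row in storages:
        if row["name"] == name and row["end_snapshot"] is None:
            row["end_snapshot"] = snap  # mark the row instead of deleting it

def visible_storages(snapshot):
    """Names of storages visible at the given snapshot."""
    return [
        r["name"] for r in storages
        if r["start_snapshot"] <= snapshot
        and (r["end_snapshot"] is None or snapshot < r["end_snapshot"])
    ]

remove_storage("example_storage")
```

After the removal, the storage is still visible at the old snapshot but absent at the new one.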