29.3. Description of Parquet File Storage Parameters

You can specify the following Parquet file storage parameters in a JSON file and apply them when executing the metastore.add_files or metastore.copy_table stored procedure:

  • compression: The data compression algorithm.

    Possible values:

    • snappy

    • zstd

    • gzip

    • lz4/lz4_raw

    • brotli

    • uncompressed

  • compression_level: The data compression level.

    Possible values are from 1 to 22.

    Default value: 3.

    Optional parameter. It is ignored if any compression algorithm other than zstd is used.

  • row_group_size: The maximum number of rows in a row group. Larger values improve compression. Smaller values allow more threads to be used when reading Parquet files and make statistics-based filtering more effective.

    Minimum value: 2048.

    Default value: 122_880.

    Recommended range: 100_000 to 1_000_000.
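
To make the row_group_size trade-off concrete, the following Python sketch estimates how many row groups a given number of rows needs at different settings. It is only an illustration: the helper name min_row_groups and the 10-million-row figure are hypothetical, and the result is a lower bound, since a writer may produce more (smaller) row groups than this, for example when data is split across several files.

```python
def min_row_groups(total_rows: int, row_group_size: int) -> int:
    """Lower bound on the number of row groups needed to hold total_rows.

    Hypothetical helper for illustration; it is not part of the product.
    """
    if row_group_size < 2048:
        # 2048 is the documented minimum for row_group_size.
        raise ValueError("row_group_size must be at least 2048")
    # Integer ceiling division: every group holds at most row_group_size rows.
    return -(-total_rows // row_group_size)


# Compare the documented minimum, the default, and two recommended values
# for a hypothetical table of 10 million rows.
for size in (2048, 122_880, 500_000, 1_000_000):
    print(f"row_group_size={size}: at least {min_row_groups(10_000_000, size)} row groups")
```

Fewer, larger row groups leave fewer independent units to read in parallel and fewer per-group statistics to filter on, while many small row groups give up some compression efficiency.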

Example 29.3. 

  {
      "compression": "zstd",
      "compression_level": 9,
      "row_group_size": 500000
  }
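
Before passing such a file to a stored procedure, it can be useful to check the values against the ranges documented in this section. The Python sketch below is one possible way to do that. It rests on several assumptions: the function name check_storage_params is hypothetical, lz4/lz4_raw is interpreted as two separate accepted values, and every parameter is treated as optional. The server may validate these parameters differently. Note also that JSON number literals cannot contain underscores, so a value written as 122_880 in this section must appear as 122880 in the file.

```python
import json

# Assumption: "lz4/lz4_raw" in the parameter list denotes two accepted values.
ALLOWED_COMPRESSION = {
    "snappy", "zstd", "gzip", "lz4", "lz4_raw", "brotli", "uncompressed",
}


def _is_int(value) -> bool:
    """True for integers, excluding booleans (bool is a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def check_storage_params(params: dict) -> list[str]:
    """Return findings for values outside the documented ranges.

    Hypothetical helper; an empty list means nothing was flagged.
    """
    findings = []

    compression = params.get("compression")
    if compression is not None and compression not in ALLOWED_COMPRESSION:
        findings.append(f"unknown compression algorithm: {compression!r}")

    level = params.get("compression_level")
    if level is not None:
        if not _is_int(level) or not 1 <= level <= 22:
            findings.append("compression_level must be an integer from 1 to 22")
        elif compression != "zstd":
            # Documented behavior: the level is ignored for other algorithms.
            findings.append("compression_level is ignored unless compression is zstd")

    size = params.get("row_group_size")
    if size is not None and (not _is_int(size) or size < 2048):
        findings.append("row_group_size must be an integer of at least 2048")

    return findings


# Check the parameters from the example above.
example = json.loads(
    '{"compression": "zstd", "compression_level": 9, "row_group_size": 500000}'
)
print(check_storage_params(example))
```

Returning a list of findings rather than raising on the first problem lets a caller report everything that looks wrong in a single pass.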