Re: Columnar format export in Postgres - Mailing list pgsql-hackers

From Sutou Kouhei
Subject Re: Columnar format export in Postgres
Msg-id 20240616.063220.999225191405879719.kou@clear-code.com
In response to Re: Columnar format export in Postgres  (Sushrut Shivaswamy <sushrut.shivaswamy@gmail.com>)
List pgsql-hackers
Hi,

In <CAH5mb98Dq7ssrQq9n5yW3G1YznH=Q7VvOZ20uhG7Vxg33ZBLDg@mail.gmail.com>
  "Re: Columnar format export in Postgres" on Thu, 13 Jun 2024 22:30:24 +0530,
  Sushrut Shivaswamy <sushrut.shivaswamy@gmail.com> wrote:

>  - To facilitate efficient querying it would help to export multiple
> parquet files for the table instead of a single file.
>    Having multiple files allows queries to skip chunks if the key range in
> the chunk does not match query filter criteria.
>    Even within a chunk it would help to be able to configure the size of a
> row group.
>       - I'm not sure how these parameters will be exposed within `COPY TO`.
>         Or maybe the extension implementing the `COPY TO` handler will
> allow this configuration?

Yes, the extension implementing the COPY TO handler could
allow this configuration. But support for custom COPY TO
options is out of scope for the first version, which will
focus on only the minimal features. We can improve it later
based on use cases.

See also: https://www.postgresql.org/message-id/20240131.141122.279551156957581322.kou%40clear-code.com
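
To illustrate the chunk-skipping idea from the quoted text, here
is a minimal sketch in plain Python. It is not Parquet or
PostgreSQL code: the Chunk class, split_into_chunks() and scan()
are hypothetical names, and the per-chunk min/max key only stands
in for the statistics a real columnar file would carry. chunk_size
plays the role of the configurable file or row group size.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """Hypothetical stand-in for one exported file (or row group)."""
    name: str
    rows: List[Tuple[int, str]]  # (key, payload)

    @property
    def key_min(self) -> int:
        return min(key for key, _ in self.rows)

    @property
    def key_max(self) -> int:
        return max(key for key, _ in self.rows)


def split_into_chunks(rows, chunk_size):
    """Sort rows by key and cut them into chunks of at most chunk_size rows."""
    ordered = sorted(rows)
    return [
        Chunk(name=f"part-{i // chunk_size}", rows=ordered[i:i + chunk_size])
        for i in range(0, len(ordered), chunk_size)
    ]


def scan(chunks, lo, hi):
    """Return rows with lo <= key <= hi, plus the names of the chunks read."""
    matched, read = [], []
    for chunk in chunks:
        # The chunk's key range cannot overlap the filter, so skip it unread.
        if chunk.key_max < lo or chunk.key_min > hi:
            continue
        read.append(chunk.name)
        matched.extend(row for row in chunk.rows if lo <= row[0] <= hi)
    return matched, read


rows = [(key, f"row-{key}") for key in range(100)]
chunks = split_into_chunks(rows, chunk_size=25)
matched, read = scan(chunks, lo=30, hi=40)
print(read, len(matched))
```

With a single chunk holding all 100 rows, the same filter would
have to read everything. Smaller chunks skip more data but add
per-chunk overhead, which is why the size is worth making
configurable.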

>  - Regarding using file_fdw to read Apache Arrow and Apache Parquet file
> because file_fdw is based on COPY FROM:
>      - I'm not too clear on this. file_fdw seems to allow creating a table
> from  data on disk exported using COPY TO.

Correct.

>        But is the newly created table still using the data on disk(maybe in
> columnar format or csv) or is it just reading that data to create a row
> based table.

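The former. file_fdw reads the file on disk each time the
foreign table is queried. It does not copy the data into a
regular row-based table.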
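
As a rough analogy only (plain Python, not file_fdw or PostgreSQL
code), the difference can be sketched like this. FileBackedTable
and write_rows() are hypothetical names. The "table" keeps only a
path and parses the file again on every select(), so a change to
the file shows up in the next query.

```python
import csv
import os
import tempfile


class FileBackedTable:
    """Hypothetical stand-in for a foreign table: it stores a path, not rows."""

    def __init__(self, path):
        self.path = path

    def select(self):
        # Parse the file again on every query instead of keeping a copy.
        with open(self.path, newline="") as f:
            return [tuple(row) for row in csv.reader(f)]


def write_rows(path, rows):
    """Overwrite the file at path with the given rows as CSV."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "export.csv")

write_rows(path, [("1", "a"), ("2", "b")])
table = FileBackedTable(path)
first = table.select()

# Change the file on disk; the table object itself is untouched.
write_rows(path, [("1", "a"), ("2", "b"), ("3", "c")])
second = table.select()

print(len(first), len(second))
```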

>        I'm not aware of any capability in the postgres planner to read
> columnar files currently without using an extension like parquet_fdw.

Correct. We still need another approach, such as parquet_fdw
with the extensible COPY format feature, to optimize queries
against Apache Parquet data. file_fdw can only read Apache
Parquet data via SELECT. Sorry for the confusion.


Thanks,
-- 
kou


