Hi,
In <CAH5mb98Dq7ssrQq9n5yW3G1YznH=Q7VvOZ20uhG7Vxg33ZBLDg@mail.gmail.com>
"Re: Columnar format export in Postgres" on Thu, 13 Jun 2024 22:30:24 +0530,
Sushrut Shivaswamy <sushrut.shivaswamy@gmail.com> wrote:
> - To facilitate efficient querying, it would help to export multiple
> Parquet files for the table instead of a single file.
> Having multiple files allows queries to skip chunks if the key range in
> a chunk does not match the query's filter criteria.
> Even within a chunk it would help to be able to configure the size of a
> row group.
> - I'm not sure how these parameters will be exposed within `COPY TO`.
> Or maybe the extension implementing the `COPY TO` handler will
> allow this configuration?
Yes. But adding support for custom COPY TO options is out of
scope for the first version. We will focus only on the minimal
features in the first version. We can improve it later based on
use cases.
See also: https://www.postgresql.org/message-id/20240131.141122.279551156957581322.kou%40clear-code.com
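For illustration, if we add custom option support later, a
Parquet format extension might accept handler-specific
parameters through the existing WITH clause. This is only a
hypothetical sketch; the 'parquet' format and the
row_group_size/max_file_size options are assumptions, not an
implemented API:

    -- Hypothetical: handler-specific options passed through COPY TO.
    -- Neither the 'parquet' format nor these options exist yet.
    COPY lineitem TO '/tmp/lineitem.parquet'
      WITH (FORMAT 'parquet',
            row_group_size 100000,   -- rows per Parquet row group
            max_file_size '256MB');  -- split output into multiple files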
> - Regarding using file_fdw to read Apache Arrow and Apache Parquet files
> because file_fdw is based on COPY FROM:
> - I'm not too clear on this. file_fdw seems to allow creating a table
> from data on disk exported using COPY TO.
Correct.
> But is the newly created table still using the data on disk (maybe in
> columnar format or CSV), or is it just reading that data to create a
> row-based table?
The former.
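To make "the former" concrete, here is a sketch. The foreign
table keeps reading the file on disk; no row-based heap table
is created. Note that format 'parquet' assumes a Parquet COPY
format handler is installed; stock file_fdw only knows the
built-in formats (text, csv, binary):

    -- Sketch: every SELECT on this table reads the file on disk.
    -- format 'parquet' is an assumption (needs a format extension).
    CREATE EXTENSION file_fdw;
    CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;
    CREATE FOREIGN TABLE lineitem_parquet (
      l_orderkey bigint,
      l_quantity numeric
    ) SERVER file_server
      OPTIONS (filename '/tmp/lineitem.parquet', format 'parquet');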
> I'm not aware of any capability in the postgres planner to read
> columnar files currently without using an extension like parquet_fdw.
Correct. We still need another approach such as parquet_fdw
combined with the extensible COPY format feature to optimize
queries against Apache Parquet data. file_fdw can just read
Apache Parquet data via SELECT. Sorry for confusing you.
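For example, with the sketched foreign table above, the
following query still decodes the whole file under file_fdw,
because COPY FROM has no access to Parquet row group
statistics. A dedicated FDW such as parquet_fdw could skip row
groups whose l_orderkey range does not overlap the predicate:

    -- file_fdw: full scan, filter applied in the executor.
    -- parquet_fdw (or similar): could prune row groups via metadata.
    SELECT l_orderkey, l_quantity
      FROM lineitem_parquet
     WHERE l_orderkey BETWEEN 1000 AND 2000;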
Thanks,
--
kou