Re: Columnar format export in Postgres - Mailing list pgsql-hackers

From Sushrut Shivaswamy
Subject Re: Columnar format export in Postgres
Date
Msg-id CAH5mb98Dq7ssrQq9n5yW3G1YznH=Q7VvOZ20uhG7Vxg33ZBLDg@mail.gmail.com
In response to Re: Columnar format export in Postgres  (Sutou Kouhei <kou@clear-code.com>)
Responses Re: Columnar format export in Postgres
List pgsql-hackers
Thanks for the response.

I had considered using COPY TO to export columnar data but gave up on it since the formats weren't extensible.
It's great to see that you are making it extensible.

I'm still going through the thread of comments on your patch but I have some early thoughts about using it for columnar data export.

 - To maintain data freshness, there would need to be a way to schedule exports using `COPY TO 'parquet'` periodically.
      - pg_analytica already has the scheduling logic. Once the extensible `COPY TO` is available, it can be used to export the data instead of the current approach of reading the table in chunks.

 - To facilitate efficient querying, it would help to export multiple parquet files per table instead of a single file.
   Having multiple files allows queries to skip chunks whose key range does not match the query's filter criteria.
   Even within a chunk, it would help to be able to configure the size of a row group.
      - I'm not sure how these parameters will be exposed within `COPY TO`.
        Or maybe the extension implementing the `COPY TO` handler will allow this configuration?

 - Regarding using file_fdw to read Apache Arrow and Apache Parquet files because file_fdw is based on COPY FROM:
     - I'm not too clear on this. file_fdw seems to allow creating a table from data on disk exported using COPY TO.
       But is the newly created table still using the data on disk (maybe in columnar format or CSV), or is it just reading that data to create a row-based table?
       I'm not aware of any capability in the Postgres planner to read columnar files currently without using an extension like parquet_fdw.
        - For your use case, how do you plan to query the Arrow / Parquet data?
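
To illustrate the point above about exporting multiple files so that queries can skip chunks by key range, here is a minimal sketch in plain Python. It does not use Parquet or any Postgres API. `Chunk`, `export_chunks`, `scan`, and `rows_per_chunk` are hypothetical names made up for this example, and each in-memory chunk merely stands in for one exported file carrying min/max statistics for its key.

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = Tuple[int, str]  # (key, value); a stand-in for a table row


@dataclass
class Chunk:
    """Stand-in for one exported file: rows plus the min/max of the key."""
    rows: List[Row]
    min_key: int
    max_key: int


def export_chunks(rows: List[Row], rows_per_chunk: int) -> List[Chunk]:
    """Sort by key and split into fixed-size chunks, recording each key range."""
    ordered = sorted(rows, key=lambda r: r[0])
    chunks = []
    for i in range(0, len(ordered), rows_per_chunk):
        part = ordered[i:i + rows_per_chunk]
        chunks.append(Chunk(part, part[0][0], part[-1][0]))
    return chunks


def scan(chunks: List[Chunk], lo: int, hi: int) -> Tuple[List[Row], int]:
    """Return rows with lo <= key <= hi and how many chunks were actually read."""
    matched: List[Row] = []
    chunks_read = 0
    for c in chunks:
        if c.max_key < lo or c.min_key > hi:
            continue  # key range cannot satisfy the filter, so skip this chunk
        chunks_read += 1
        matched.extend(r for r in c.rows if lo <= r[0] <= hi)
    return matched, chunks_read


rows = [(k, f"v{k}") for k in range(100)]
chunks = export_chunks(rows, rows_per_chunk=25)
matched, chunks_read = scan(chunks, 30, 40)
print(len(chunks), chunks_read, len(matched))
```

With 100 rows split into chunks of 25, a filter on keys 30 to 40 only needs to read one of the four chunks. The same min/max idea would apply to row groups inside a single file, which is why being able to configure the row group size matters.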

       
