Re: COPY TO STDOUT Apache Arrow support - Mailing list pgsql-hackers

From Adam Lippai
Subject Re: COPY TO STDOUT Apache Arrow support
Date
Msg-id CAGrfaBXhi_mNzVQR=ZirOCm1AFFw2ntH_Uk=7CKFNus-c4ajHg@mail.gmail.com
In response to COPY TO STDOUT Apache Arrow support  (Adam Lippai <adam@rigo.sk>)
Responses Re: COPY TO STDOUT Apache Arrow support
List pgsql-hackers
Hi,

There have been two major developments on this topic:
  1. Pandas 2.0 has been released, and it can use Apache Arrow as a backend.
  2. Apache Arrow ADBC has been released, which standardizes the client API. Currently it uses the PostgreSQL wire protocol underneath.

Best regards,
Adam Lippai

On Thu, Apr 21, 2022 at 10:41 AM Adam Lippai <adam@rigo.sk> wrote:
Hi,

Would it be possible to add the Apache Arrow streaming format to the COPY backend and frontend?
The use case is fetching (or storing) tens or hundreds of millions of rows for client-side data science purposes (Pandas, Apache Arrow compute kernels, Parquet conversion, etc.). It looks like the serialization overhead of the PostgreSQL wire format can be significant.

Best regards,
Adam Lippai
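
To make the serialization-overhead point above concrete, here is a minimal sketch that contrasts a row-wise text encoding (similar in spirit to COPY's text format) with a fixed-width columnar binary layout. It is not the Arrow IPC streaming format and not PostgreSQL's actual wire protocol; the function names and the two-column (bigint, double) layout are assumptions chosen for illustration only.

```python
import struct


def encode_text_rows(rows):
    """Encode rows as tab-separated text lines (a stand-in for a text wire format)."""
    return "".join(
        "\t".join(str(v) for v in row) + "\n" for row in rows
    ).encode("utf-8")


def decode_text_rows(data):
    """Parse the text back; every value has to be re-parsed from a string."""
    rows = []
    for line in data.decode("utf-8").splitlines():
        a, b = line.split("\t")
        rows.append((int(a), float(b)))
    return rows


def encode_columnar(rows):
    """Pack each column contiguously as fixed-width little-endian values.

    Hypothetical layout: 8-byte row count, then all int64 ids, then all float64 values.
    """
    n = len(rows)
    ids = struct.pack(f"<{n}q", *(r[0] for r in rows))
    vals = struct.pack(f"<{n}d", *(r[1] for r in rows))
    return struct.pack("<q", n) + ids + vals


def decode_columnar(data):
    """Unpack whole columns in one call each; no per-value string parsing."""
    (n,) = struct.unpack_from("<q", data, 0)
    ids = struct.unpack_from(f"<{n}q", data, 8)
    vals = struct.unpack_from(f"<{n}d", data, 8 + 8 * n)
    return list(zip(ids, vals))
```

Both decoders return the same list of (int, float) tuples, so the two paths can be compared with `timeit` on a large `rows` list to get a rough feel for the per-value parsing cost; actual numbers depend on the data and on the client library.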
