On Wed, Dec 6, 2023 at 8:32 PM Daniel Verite <daniel@manitou-mail.org> wrote:
>
> Sutou Kouhei wrote:
>
> > * 2022-04: Apache Arrow [2]
> > * 2018-02: Apache Avro, Apache Parquet and Apache ORC [3]
> >
> > (FYI: I want to add support for Apache Arrow.)
> >
> > There were discussions about how to add support for more formats. [3][4]
> > In these discussions, we reached a consensus on making the COPY
> > format extensible.
>
>
> These formats all seem to be column-oriented, whereas COPY is row-oriented
> at the protocol level [1].
> With regard to the protocol, how would it work to support these formats?
>
These formats have a kind of *RowGroup* concept: a batch of rows goes into
one RowBatch, and within it the data of each column is stored together.
I think that should fit the COPY semantics. There are also some FDWs out
there for these modern formats, like [1]. If COPY could handle these
formats directly, it would be easier to interact with them (without
creating a server, user mapping, and foreign table).
[1]: https://github.com/adjust/parquet_fdw
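
To illustrate why a row-group layout can still feed a row-oriented stream,
here is a rough sketch in Python. It is only an illustration, not PostgreSQL
or Parquet code: the dict-of-lists row group and the names
rows_from_row_group, copy_text_line, and copy_lines are all hypothetical.
The only outside fact it relies on is that COPY text format uses a tab
delimiter and \N for NULL by default.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# Assumption: a row group is modeled as {column name: list of values},
# with every list the same length. Real formats are far more involved.
RowGroup = Dict[str, List[Optional[object]]]


def rows_from_row_group(group: RowGroup, columns: List[str]) -> Iterator[tuple]:
    """Pivot one column-oriented row group into row tuples."""
    column_data = [group[name] for name in columns]
    # zip() walks all columns in lockstep and yields one tuple per row.
    return zip(*column_data)


def copy_text_line(row: tuple) -> str:
    """Format a row roughly like COPY text format (tab-separated, \\N = NULL).

    Escaping of tabs, newlines, and backslashes inside values is omitted.
    """
    return "\t".join("\\N" if v is None else str(v) for v in row) + "\n"


def copy_lines(groups: Iterable[RowGroup], columns: List[str]) -> Iterator[str]:
    """Stream row groups one at a time, emitting one text line per row."""
    for group in groups:
        for row in rows_from_row_group(group, columns):
            yield copy_text_line(row)


groups = [
    {"id": [1, 2], "name": ["a", None]},
    {"id": [3], "name": ["c"]},
]
lines = list(copy_lines(groups, ["id", "name"]))
```

The point is only that each row group can be pivoted into rows on the fly,
so the sender never needs more than one row group in memory while still
producing a row-at-a-time stream.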
>
> [1] https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY
>
>
> Best regards,
> --
> Daniel Vérité
> https://postgresql.verite.pro/
> Twitter: @DanielVerite
>
>
--
Regards
Junwang Zhao