Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id CAD21AoC_dhfS97DKwTL+2nvgBOYrmN9XVYrE8w2SuDgghb-yzg@mail.gmail.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
On Wed, Jan 10, 2024 at 12:00 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <ZYTfqGppMc9e_w2k@paquier.xyz>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:00:24 +0900,
>   Michael Paquier <michael@paquier.xyz> wrote:
>
> >> 3. Export CopySend*()
> >>
> >>    * If we like minimum API, we just need to export
> >>      CopySendData() and CopySendEndOfRow(). But
> >>      CopySend{String,Char,Int32,Int16}() will be convenient
> >>      custom COPY TO handlers. (A custom COPY TO handler for
> >>      Apache Arrow doesn't need them.)
> >
> > Hmm.  Not sure on this one.  This may come down to externalize the
> > manipulation of fe_msgbuf.  Particularly, could it be possible that
> > some custom formats don't care at all about the network order?
>
> It means that all custom formats should control byte order
> by themselves instead of using CopySendInt*() that always
> use network byte order, right? It makes sense. Let's export
> only CopySendData() and CopySendEndOfRow().
>
>
> >> 1. What value should be used for "format" in
> >>    PgMsg_CopyOutResponse message?
> >>
> >>    It's 1 for binary format and 0 for text/csv format.
> >>
> >>    Should we make it customizable by custom COPY TO handler?
> >>    If so, what value should be used for this?
> >
> > Interesting point.  It looks very tempting to give more flexibility to
> > people who'd like to use their own code as we have one byte in the
> > protocol but just use 0/1.  Hence it feels natural to have a callback
> > for that.
>
> OK. Let's add a callback something like:
>
> typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
>
> > It also means that we may want to think harder about copy_is_binary in
> > libpq in the future step.  Now, having a backend implementation does
> > not need any libpq bits, either, because a client stack may just want
> > to speak the Postgres protocol directly.  Perhaps a custom COPY
> > implementation would be OK with how things are in libpq, as well,
> > tweaking its way through with just text or binary.
>
> Can we defer this discussion after we commit a basic custom
> COPY format handler mechanism?
>
> >> 2. Do we need more tries for design discussion for the first
> >>    implementation? If we need, what should we try?
> >
> > A makeNode() is used with an allocation in the current memory context
> > in the function returning the handler.  I would have assume that this
> > stuff returns a handler as a const struct like table AMs.
>
> If we use this approach, we can't use the Sawada-san's
> idea[1] that provides a convenient API to hide
> CopyFormatRoutine internal. The idea provides
> MakeCopy{To,From}FormatRoutine(). They return a new
> CopyFormatRoutine* with suitable is_from member. They can't
> use static const CopyFormatRoutine because they may be called
> multiple times in the same process.
>
> We can use the satic const struct approach by choosing one
> of the followings:
>
> 1. Use separated function for COPY {TO,FROM} format handlers
>    as I suggested.
>
> 2. Don't provide convenient API. Developers construct
>    CopyFormatRoutine by themselves. But it may be a bit
>    tricky.
>
> 3. Similar to 2. but don't use a bit tricky approach (don't
>    embed Copy{To,From}FormatRoutine nodes into
>    CopyFormatRoutine).
>
>    Use unified function for COPY {TO,FROM} format handlers
>    but CopyFormatRoutine always have both of COPY {TO,FROM}
>    format routines and these routines aren't nodes:
>
>    typedef struct CopyToFormatRoutine
>    {
>            CopyToStart_function start_fn;
>            CopyToOneRow_function onerow_fn;
>            CopyToEnd_function end_fn;
>    } CopyToFormatRoutine;
>
>    /* XXX: just copied from COPY TO routines */
>    typedef struct CopyFromFormatRoutine
>    {
>            CopyFromStart_function start_fn;
>            CopyFromOneRow_function onerow_fn;
>            CopyFromEnd_function end_fn;
>    } CopyFromFormatRoutine;
>
>    typedef struct CopyFormatRoutine
>    {
>            NodeTag              type;
>
>            CopyToFormatRoutine     to_routine;
>            CopyFromFormatRoutine           from_routine;
>    } CopyFormatRoutine;
>
>    ----
>
>    static const CopyFormatRoutine testfmt_handler = {
>        .type = T_CopyFormatRoutine,
>        .to_routine = {
>            .start_fn = testfmt_copyto_start,
>            .onerow_fn = testfmt_copyto_onerow,
>            .end_fn = testfmt_copyto_end,
>        },
>        .from_routine = {
>            .start_fn = testfmt_copyfrom_start,
>            .onerow_fn = testfmt_copyfrom_onerow,
>            .end_fn = testfmt_copyfrom_end,
>        },
>    };
>
>    PG_FUNCTION_INFO_V1(copy_testfmt_handler);
>    Datum
>    copy_testfmt_handler(PG_FUNCTION_ARGS)
>    {
>            PG_RETURN_POINTER(&testfmt_handler);
>    }
>
> 4. ... other idea?

It's a just idea but the fourth idea is to provide a convenient macro
to make it easy to construct the CopyFormatRoutine. For example,

#define COPYTO_ROUTINE(...) (Node *) &(CopyToFormatRoutine) {__VA_ARGS__}

static const CopyFormatRoutine testfmt_copyto_handler = {
    .type = T_CopyFormatRoutine,
    .is_from = true,
    .routine = COPYTO_ROUTINE (
        .start_fn = testfmt_copyto_start,
        .onerow_fn = testfmt_copyto_onerow,
        .end_fn = testfmt_copyto_end
        )
};

Datum
copy_testfmt_handler(PG_FUNCTION_ARGS)
{
    PG_RETURN_POINTER(& testfmt_copyto_handler);
}

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Andrei Lepikhov
Date:
Subject: Re: Custom explain options
Next
From: torikoshia
Date:
Subject: Re: Parent/child context relation in pg_get_backend_memory_contexts()