Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | CAD21AoC_dhfS97DKwTL+2nvgBOYrmN9XVYrE8w2SuDgghb-yzg@mail.gmail.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Sutou Kouhei <kou@clear-code.com>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
|
List | pgsql-hackers |
On Wed, Jan 10, 2024 at 12:00 PM Sutou Kouhei <kou@clear-code.com> wrote: > > Hi, > > In <ZYTfqGppMc9e_w2k@paquier.xyz> > "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:00:24 +0900, > Michael Paquier <michael@paquier.xyz> wrote: > > >> 3. Export CopySend*() > >> > >> * If we like minimum API, we just need to export > >> CopySendData() and CopySendEndOfRow(). But > >> CopySend{String,Char,Int32,Int16}() will be convenient > >> custom COPY TO handlers. (A custom COPY TO handler for > >> Apache Arrow doesn't need them.) > > > > Hmm. Not sure on this one. This may come down to externalize the > > manipulation of fe_msgbuf. Particularly, could it be possible that > > some custom formats don't care at all about the network order? > > It means that all custom formats should control byte order > by themselves instead of using CopySendInt*() that always > use network byte order, right? It makes sense. Let's export > only CopySendData() and CopySendEndOfRow(). > > > >> 1. What value should be used for "format" in > >> PgMsg_CopyOutResponse message? > >> > >> It's 1 for binary format and 0 for text/csv format. > >> > >> Should we make it customizable by custom COPY TO handler? > >> If so, what value should be used for this? > > > > Interesting point. It looks very tempting to give more flexibility to > > people who'd like to use their own code as we have one byte in the > > protocol but just use 0/1. Hence it feels natural to have a callback > > for that. > > OK. Let's add a callback something like: > > typedef int16 (*CopyToGetFormat_function) (CopyToState cstate); > > > It also means that we may want to think harder about copy_is_binary in > > libpq in the future step. Now, having a backend implementation does > > not need any libpq bits, either, because a client stack may just want > > to speak the Postgres protocol directly. Perhaps a custom COPY > > implementation would be OK with how things are in libpq, as well, > > tweaking its way through with just text or binary. > > Can we defer this discussion after we commit a basic custom > COPY format handler mechanism? > > >> 2. Do we need more tries for design discussion for the first > >> implementation? If we need, what should we try? > > > > A makeNode() is used with an allocation in the current memory context > > in the function returning the handler. I would have assume that this > > stuff returns a handler as a const struct like table AMs. > > If we use this approach, we can't use the Sawada-san's > idea[1] that provides a convenient API to hide > CopyFormatRoutine internal. The idea provides > MakeCopy{To,From}FormatRoutine(). They return a new > CopyFormatRoutine* with suitable is_from member. They can't > use static const CopyFormatRoutine because they may be called > multiple times in the same process. > > We can use the satic const struct approach by choosing one > of the followings: > > 1. Use separated function for COPY {TO,FROM} format handlers > as I suggested. > > 2. Don't provide convenient API. Developers construct > CopyFormatRoutine by themselves. But it may be a bit > tricky. > > 3. Similar to 2. but don't use a bit tricky approach (don't > embed Copy{To,From}FormatRoutine nodes into > CopyFormatRoutine). > > Use unified function for COPY {TO,FROM} format handlers > but CopyFormatRoutine always have both of COPY {TO,FROM} > format routines and these routines aren't nodes: > > typedef struct CopyToFormatRoutine > { > CopyToStart_function start_fn; > CopyToOneRow_function onerow_fn; > CopyToEnd_function end_fn; > } CopyToFormatRoutine; > > /* XXX: just copied from COPY TO routines */ > typedef struct CopyFromFormatRoutine > { > CopyFromStart_function start_fn; > CopyFromOneRow_function onerow_fn; > CopyFromEnd_function end_fn; > } CopyFromFormatRoutine; > > typedef struct CopyFormatRoutine > { > NodeTag type; > > CopyToFormatRoutine to_routine; > CopyFromFormatRoutine from_routine; > } CopyFormatRoutine; > > ---- > > static const CopyFormatRoutine testfmt_handler = { > .type = T_CopyFormatRoutine, > .to_routine = { > .start_fn = testfmt_copyto_start, > .onerow_fn = testfmt_copyto_onerow, > .end_fn = testfmt_copyto_end, > }, > .from_routine = { > .start_fn = testfmt_copyfrom_start, > .onerow_fn = testfmt_copyfrom_onerow, > .end_fn = testfmt_copyfrom_end, > }, > }; > > PG_FUNCTION_INFO_V1(copy_testfmt_handler); > Datum > copy_testfmt_handler(PG_FUNCTION_ARGS) > { > PG_RETURN_POINTER(&testfmt_handler); > } > > 4. ... other idea? It's a just idea but the fourth idea is to provide a convenient macro to make it easy to construct the CopyFormatRoutine. For example, #define COPYTO_ROUTINE(...) (Node *) &(CopyToFormatRoutine) {__VA_ARGS__} static const CopyFormatRoutine testfmt_copyto_handler = { .type = T_CopyFormatRoutine, .is_from = true, .routine = COPYTO_ROUTINE ( .start_fn = testfmt_copyto_start, .onerow_fn = testfmt_copyto_onerow, .end_fn = testfmt_copyto_end ) }; Datum copy_testfmt_handler(PG_FUNCTION_ARGS) { PG_RETURN_POINTER(& testfmt_copyto_handler); } Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: