Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | ZbijVn9_51mljMAG@paquier.xyz Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Sutou Kouhei <kou@clear-code.com>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
|
List | pgsql-hackers |
On Tue, Jan 30, 2024 at 02:45:31PM +0900, Sutou Kouhei wrote: > In <CAD21AoBmNiWwrspuedgAPgbAqsn7e7NoZYF6gNnYBf+gXEk9Mg@mail.gmail.com> > "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 30 Jan 2024 11:11:59 +0900, > Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >> --- >> + if (!format_specified) >> + /* Set the default format. */ >> + ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL); >> + >> >> I think we can pass "text" in this case instead of NULL. That way, >> ProcessCopyOptionFormatTo doesn't need to handle NULL case. > > Yes, we can do it. But it needs a DefElem allocation. Is it > acceptable? I don't think that there is a need for a DelElem at all here? While I am OK with the choice of calling CopyToInit() in the ProcessCopyOption*() routines that exist to keep the set of callbacks local to copyto.c and copyfrom.c, I think that this should not bother about setting opts_out->csv_mode or opts_out->csv_mode but just set the opts_out->{to,from}_routine callbacks. >> +static void >> +CopyToTextBasedInit(CopyToState cstate) >> +{ >> +} >> >> and >> >> +static void >> +CopyToBinaryInit(CopyToState cstate) >> +{ >> +} >> >> Do we really need separate callbacks for the same behavior? I think we >> can have a common init function say CopyToBuitinInit() that does >> nothing. Or we can make the init callback optional. Keeping empty options does not strike as a bad idea, because this forces extension developers to think about this code path rather than just ignore it. Now, all the Init() callbacks are empty for the in-core callbacks, so I think that we should just remove it entirely for now. Let's keep the core patch a maximum simple. It is always possible to build on top of it depending on what people need. It's been mentioned that JSON would want that, but this also proves that we just don't care about that for all the in-core callbacks, as well. I would choose a minimalistic design for now. >> + /* Get info about the columns we need to process. */ >> + cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * >> sizeof(Fmgr\Info)); >> + foreach(cur, cstate->attnumlist) >> + { >> + int attnum = lfirst_int(cur); >> + Oid out_func_oid; >> + bool isvarlena; >> + Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); >> + >> + getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena); >> + fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); >> + } >> >> Is preparing the out functions an extension's responsibility? I >> thought the core could prepare them based on the overall format >> specified by extensions, as long as the overall format matches the >> actual data format to send. What do you think? > > Hmm. I want to keep the preparation as an extension's > responsibility. Because it's not needed for all formats. For > example, Apache Arrow FORMAT doesn't need it. And JSON > FORMAT doesn't need it too because it use > composite_to_json(). I agree that it could be really useful for extensions to be able to force that. We already know that for the in-core formats we've cared about being able to enforce the way data is handled in input and output. > It makes sense. I couldn't find the documentation when I > wrote it but I found it now...: > https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY > > Is there recommended comment style to refer a documentation? > "See doc/src/sgml/protocol.sgml for the CopyOutResponse > message details" is OK? There are a couple of places in the C code where we refer to SGML docs when it comes to specific details, so using a method like that here to avoid a duplication with the docs sounds sensible for me. I would be really tempted to put my hands on this patch to put into shape a minimal set of changes because I'm caring quite a lot about the performance gains reported with the removal of the "if" checks in the per-row callbacks, and that's one goal of this thread quite independent on the extensibility. Sutou-san, would you be OK with that? -- Michael
Attachment
pgsql-hackers by date: