Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Sutou Kouhei |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | 20240130.144531.1257430878438173740.kou@clear-code.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
|
List | pgsql-hackers |
Hi, In <CAD21AoBmNiWwrspuedgAPgbAqsn7e7NoZYF6gNnYBf+gXEk9Mg@mail.gmail.com> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 30 Jan 2024 11:11:59 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > --- > + if (!format_specified) > + /* Set the default format. */ > + ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL); > + > > I think we can pass "text" in this case instead of NULL. That way, > ProcessCopyOptionFormatTo doesn't need to handle NULL case. Yes, we can do it. But it needs a DefElem allocation. Is it acceptable? > We need curly brackets for this "if branch" as follows: > > if (!format_specifed) > { > /* Set the default format. */ > ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL); > } Oh, sorry. I assumed that pgindent adjusts the style too. > --- > + /* Process not built-in options. */ > + foreach(option, unknown_options) > + { > + DefElem *defel = lfirst_node(DefElem, option); > + bool processed = false; > + > + if (!is_from) > + processed = > opts_out->to_routine->CopyToProcessOption(cstate, defel); > + if (!processed) > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("option \"%s\" not recognized", > + defel->defname), > + parser_errposition(pstate, > defel->location))); > + } > + list_free(unknown_options); > > I think we can check the duplicated options in the core as we discussed. Oh, sorry. I missed the part. I'll implement it. > --- > +static void > +CopyToTextBasedInit(CopyToState cstate) > +{ > +} > > and > > +static void > +CopyToBinaryInit(CopyToState cstate) > +{ > +} > > Do we really need separate callbacks for the same behavior? I think we > can have a common init function say CopyToBuitinInit() that does > nothing. Or we can make the init callback optional. > > The same is true for process-option callback. OK. I'll make them optional. > --- > List *convert_select; /* list of column names (can be NIL) */ > + const CopyToRoutine *to_routine; /* callback > routines for COPY TO */ > } CopyFormatOptions; > > I think CopyToStateData is a better place to have CopyToRoutine. > copy_data_dest_cb is also there. We can do it but ProcessCopyOptions() accepts NULL CopyToState for file_fdw. Can we create an empty CopyToStateData internally like we did for opts_out in ProcessCopyOptions()? (But it requires exporting CopyToStateData. We'll export it in a later patch but it's not yet in 0001.) > The 0002 patch replaces the options checks with > ProcessCopyOptionFormatFrom(). However, both > ProcessCopyOptionFormatTo() and ProcessCOpyOptionFormatFrom() would > set format-related options such as opts_out->csv_mode etc, which seems > not elegant. IIUC the reason why we process only the "format" option > first is to set the callback functions and call the init callback. So > I think we don't necessarily need to do both setting callbacks and > setting format-related options together. Probably we can do only the > callback stuff first and then set format-related options in the > original place we used to do? If we do it, we need to write the (strcmp(format, "csv") == 0) condition in copyto.c and copy.c. I wanted to avoid it. I think that the duplication (setting opts_out->csv_mode in copyto.c and copyfrom.c) is not a problem. But it's not a strong opinion. If (strcmp(format, "csv") == 0) duplication is better than opts_out->csv_mode = true duplication, I'll do it. BTW, if we want to make the CSV format implementation more modularized, we will remove opts_out->csv_mode, move CSV related options to CopyToCSVProcessOption() and keep CSV related options in its opaque space. For example, opts_out->force_quote exists in COPY TO opaque space but doesn't exist in COPY FROM opaque space because it's not used in COPY FROM. > +static void > +CopyToTextBasedFillCopyOutResponse(CopyToState cstate, StringInfoData *buf) > +{ > + int16 format = 0; > + int natts = list_length(cstate->attnumlist); > + int i; > + > + pq_sendbyte(buf, format); /* overall format */ > + pq_sendint16(buf, natts); > + for (i = 0; i < natts; i++) > + pq_sendint16(buf, format); /* per-column formats */ > +} > > This function and CopyToBinaryFillCopyOutResponse() fill three things: > overall format, the number of columns, and per-column formats. While > this approach is flexible, extensions will have to understand the > format of CopyOutResponse message. An alternative is to have one or > more callbacks that return these three things. Yes, we can choose the approach. I don't have a strong opinion on which approach to choose. > + /* Get info about the columns we need to process. */ > + cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * > sizeof(Fmgr\Info)); > + foreach(cur, cstate->attnumlist) > + { > + int attnum = lfirst_int(cur); > + Oid out_func_oid; > + bool isvarlena; > + Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); > + > + getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena); > + fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); > + } > > Is preparing the out functions an extension's responsibility? I > thought the core could prepare them based on the overall format > specified by extensions, as long as the overall format matches the > actual data format to send. What do you think? Hmm. I want to keep the preparation as an extension's responsibility. Because it's not needed for all formats. For example, Apache Arrow FORMAT doesn't need it. And JSON FORMAT doesn't need it too because it use composite_to_json(). > + /* > + * Called when COPY TO via the PostgreSQL protocol is > started. This must > + * fill buf as a valid CopyOutResponse message: > + * > + */ > + /*-- > + * +--------+--------+--------+--------+--------+ +--------+--------+ > + * | Format | N attributes | Attr1's format |...| AttrN's format | > + * +--------+--------+--------+--------+--------+ +--------+--------+ > + * 0: text 0: text 0: text > + * 1: binary 1: binary 1: binary > + */ > > I think this kind of diagram could be missed from being updated when > we update the CopyOutResponse format. It's better to refer to the > documentation instead. It makes sense. I couldn't find the documentation when I wrote it but I found it now...: https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY Is there recommended comment style to refer a documentation? "See doc/src/sgml/protocol.sgml for the CopyOutResponse message details" is OK? Thanks, -- kou
pgsql-hackers by date: