Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | CAD21AoBmNiWwrspuedgAPgbAqsn7e7NoZYF6gNnYBf+gXEk9Mg@mail.gmail.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Sutou Kouhei <kou@clear-code.com>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
|
List | pgsql-hackers |
On Mon, Jan 29, 2024 at 6:45 PM Sutou Kouhei <kou@clear-code.com> wrote: > > Hi, > > In <CAEG8a3Jnmbjw82OiSvRK3v9XN2zSshsB8ew1mZCQDAkKq6r9YQ@mail.gmail.com> > "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 29 Jan 2024 11:37:07 +0800, > Junwang Zhao <zhjwpku@gmail.com> wrote: > > >> > > Does it make sense to pass only non-builtin options to the custom > >> > > format callback after parsing and evaluating the builtin options? That > >> > > is, we parse and evaluate only the builtin options and populate > >> > > opts_out first, then pass each rest option to the custom format > >> > > handler callback. The callback can refer to the builtin option values. > >> > >> What I imagined is that while parsing the all specified options, we > >> evaluate builtin options and we add non-builtin options to another > >> list. Then when parsing a non-builtin option, we check if this option > >> already exists in the list. If there is, we raise the "option %s not > >> recognized" error.". Once we complete checking all options, we pass > >> each option in the list to the callback. > > I implemented this idea and the following ideas: > > 1. Add init callback for initialization > 2. Change GetFormat() to FillCopyXXXResponse() > because JSON format always use 1 column > 3. FROM only: Eliminate more cstate->opts.csv_mode branches > (This is for performance.) > > See the attached v9 patch set for details. Changes since v7: > > 0001: > > * Move CopyToProcessOption() calls to the end of > ProcessCopyOptions() for easy to option validation > * Add CopyToState::CopyToInit() and call it in > ProcessCopyOptionFormatTo() > * Change CopyToState::CopyToGetFormat() to > CopyToState::CopyToFillCopyOutResponse() and use it in > SendCopyBegin() Thank you for updating the patch! Here are comments on 0001 patch: --- + if (!format_specified) + /* Set the default format. */ + ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL); + I think we can pass "text" in this case instead of NULL. That way, ProcessCopyOptionFormatTo doesn't need to handle NULL case. We need curly brackets for this "if branch" as follows: if (!format_specifed) { /* Set the default format. */ ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL); } --- + /* Process not built-in options. */ + foreach(option, unknown_options) + { + DefElem *defel = lfirst_node(DefElem, option); + bool processed = false; + + if (!is_from) + processed = opts_out->to_routine->CopyToProcessOption(cstate, defel); + if (!processed) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("option \"%s\" not recognized", + defel->defname), + parser_errposition(pstate, defel->location))); + } + list_free(unknown_options); I think we can check the duplicated options in the core as we discussed. --- +static void +CopyToTextBasedInit(CopyToState cstate) +{ +} and +static void +CopyToBinaryInit(CopyToState cstate) +{ +} Do we really need separate callbacks for the same behavior? I think we can have a common init function say CopyToBuitinInit() that does nothing. Or we can make the init callback optional. The same is true for process-option callback. --- List *convert_select; /* list of column names (can be NIL) */ + const CopyToRoutine *to_routine; /* callback routines for COPY TO */ } CopyFormatOptions; I think CopyToStateData is a better place to have CopyToRoutine. copy_data_dest_cb is also there. --- - if (strcmp(fmt, "text") == 0) - /* default format */ ; - else if (strcmp(fmt, "csv") == 0) - opts_out->csv_mode = true; - else if (strcmp(fmt, "binary") == 0) - opts_out->binary = true; + + if (is_from) + { + char *fmt = defGetString(defel); + + if (strcmp(fmt, "text") == 0) + /* default format */ ; + else if (strcmp(fmt, "csv") == 0) + { + opts_out->csv_mode = true; + } + else if (strcmp(fmt, "binary") == 0) + { + opts_out->binary = true; + } else - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY format \"%s\" not recognized", fmt\), - parser_errposition(pstate, defel->location))); + ProcessCopyOptionFormatTo(pstate, opts_out, cstate, defel); The 0002 patch replaces the options checks with ProcessCopyOptionFormatFrom(). However, both ProcessCopyOptionFormatTo() and ProcessCOpyOptionFormatFrom() would set format-related options such as opts_out->csv_mode etc, which seems not elegant. IIUC the reason why we process only the "format" option first is to set the callback functions and call the init callback. So I think we don't necessarily need to do both setting callbacks and setting format-related options together. Probably we can do only the callback stuff first and then set format-related options in the original place we used to do? --- +static void +CopyToTextBasedFillCopyOutResponse(CopyToState cstate, StringInfoData *buf) +{ + int16 format = 0; + int natts = list_length(cstate->attnumlist); + int i; + + pq_sendbyte(buf, format); /* overall format */ + pq_sendint16(buf, natts); + for (i = 0; i < natts; i++) + pq_sendint16(buf, format); /* per-column formats */ +} This function and CopyToBinaryFillCopyOutResponse() fill three things: overall format, the number of columns, and per-column formats. While this approach is flexible, extensions will have to understand the format of CopyOutResponse message. An alternative is to have one or more callbacks that return these three things. --- + /* Get info about the columns we need to process. */ + cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(Fmgr\Info)); + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + Oid out_func_oid; + bool isvarlena; + Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); + + getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena); + fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + } Is preparing the out functions an extension's responsibility? I thought the core could prepare them based on the overall format specified by extensions, as long as the overall format matches the actual data format to send. What do you think? --- + /* + * Called when COPY TO via the PostgreSQL protocol is started. This must + * fill buf as a valid CopyOutResponse message: + * + */ + /*-- + * +--------+--------+--------+--------+--------+ +--------+--------+ + * | Format | N attributes | Attr1's format |...| AttrN's format | + * +--------+--------+--------+--------+--------+ +--------+--------+ + * 0: text 0: text 0: text + * 1: binary 1: binary 1: binary + */ I think this kind of diagram could be missed from being updated when we update the CopyOutResponse format. It's better to refer to the documentation instead. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: