Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | ZcbLIf4ezfiw6xf4@paquier.xyz Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Fri, Feb 09, 2024 at 11:27:05AM -0800, Andres Freund wrote: > On 2024-02-09 13:21:34 +0900, Michael Paquier wrote: >> +static void >> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid, >> + FmgrInfo *finfo, Oid *typioparam) >> +{ >> + Oid func_oid; >> + >> + getTypeInputInfo(atttypid, &func_oid, typioparam); >> + fmgr_info(func_oid, finfo); >> +} > > FWIW, we should really change the copy code to initialize FunctionCallInfoData > instead of re-initializing that on every call, realy makes a difference > performance wise. You mean to initialize once its memory and let the internal routines call InitFunctionCallInfoData for each attribute. Sounds like a good idea, doing that for HEAD before the main patch. More impact with more attributes. >> +/* >> + * CopyFromTextStart >> + * >> + * Start of COPY FROM for text/CSV format. >> + */ >> +static void >> +CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc) >> +{ >> + AttrNumber attr_count; >> + >> + /* >> + * If encoding conversion is needed, we need another buffer to hold the >> + * converted input data. Otherwise, we can just point input_buf to the >> + * same buffer as raw_buf. >> + */ >> + if (cstate->need_transcoding) >> + { >> + cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); >> + cstate->input_buf_index = cstate->input_buf_len = 0; >> + } >> + else >> + cstate->input_buf = cstate->raw_buf; >> + cstate->input_reached_eof = false; >> + >> + initStringInfo(&cstate->line_buf); > > Seems kinda odd that we have a supposedly extensible API that then stores all > this stuff in the non-extensible CopyFromState. That relates to the introduction of the the opaque pointer mentioned upthread to point to a per-format structure, where we'd store data specific to each format. >> + /* create workspace for CopyReadAttributes results */ >> + attr_count = list_length(cstate->attnumlist); >> + cstate->max_fields = attr_count; > > Why is this here? This seems like generic code, not text format specific. We don't care about that for binary. >> +/* >> + * CopyFromTextOneRow >> + * >> + * Copy one row to a set of `values` and `nulls` for the text and CSV >> + * formats. >> + */ > > I'm very doubtful it's a good idea to combine text and CSV here. They have > basically no shared parsing code, so what's the point in sending them through > one input routine? The code shared between text and csv involves a path called once per attribute. TBH, I am not sure how much of the NULL handling should be put outside the per-row routine as these options are embedded in the core options. So I don't have a better idea on this one than what's proposed here if we cannot dispatch the routine calls once per attribute. >> + /* read raw fields in the next line */ >> + if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) >> + return false; >> + >> + /* check for overflowing fields */ >> + if (attr_count > 0 && fldct > attr_count) >> + ereport(ERROR, >> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), >> + errmsg("extra data after last expected column"))); > > It bothers me that we look to be ending up with different error handling > across the various output formats, particularly if they're ending up in > extensions. That'll make it harder to evolve this code in the future. But different formats may have different requirements, including the number of attributes detected vs expected. That was not really nothing me. >> + if (cstate->opts.csv_mode) >> + { > > More unfortunate intermingling of multiple formats in a single > routine. Similar answer as a few paragraphs above. Sutou-san was suggesting to use an internal routine with fixed arguments instead, which would be enough at the end with some inline instructions? >> + >> + if (cstate->defaults[m]) >> + { >> + /* >> + * The caller must supply econtext and have switched into the >> + * per-tuple memory context in it. >> + */ >> + Assert(econtext != NULL); >> + Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); >> + >> + values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); >> + } > > I don't think it's good that we end up with this code in different copy > implementations. Yeah, still we don't care about that for binary. -- Michael
Attachment
pgsql-hackers by date: