Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id CAKFQuwbhSssKTJyeYo9rn20zffV3L7wdQSbEQ8zwRfC=uXLkVA@mail.gmail.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
On Wed, Mar 26, 2025 at 8:28 PM Sutou Kouhei <kou@clear-code.com> wrote:

> ---
> +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
> +        .type = T_CopyFromRoutine,
> +        .CopyFromInFunc = CopyFromInFunc,
> +        .CopyFromStart = CopyFromStart,
> +        .CopyFromOneRow = CopyFromOneRow,
> +        .CopyFromEnd = CopyFromEnd,
> +};
>


In trying to document the current API I'm strongly disliking it.  Namely, the amount of internal code an extension needs to care about/copy-paste to create a working handler.

Presently, pg_type defines text and binary I/O routines and the text/csv formats use the text I/O while binary uses binary I/O - for all attributes.  The CopyFromInFunc API allows for each attribute to somehow have its I/O format individualized.  But I don't see how that is practical or useful, and it adds burden on API users.

I suggest we remove both .CopyFromInFunc and .CopyFromStart/End and add a property to CopyFromRoutine (.ioMode?) with values of either Copy_IO_Text or Copy_IO_Binary and then just branch to either:

CopyFromTextLikeInFunc & CopyFromTextLikeStart/End
or
CopyFromBinaryInFunc & CopyFromStart/End

So, in effect, the only method an extension needs to write is converting to/from the 'serialized' form to the text/binary form (text being near unanimous).

In a similar manner, the amount of boilerplate within CopyFromOneRow seems undesirable from an API perspective.

cstate->cur_attname = NameStr(att->attname);
cstate->cur_attval = string;

if (string != NULL)
    nulls[m] = false;

if (cstate->defaults[m])
{
    /* We must have switched into the per-tuple memory context */
    Assert(econtext != NULL);
    Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);

    values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}

/*
    * If ON_ERROR is specified with IGNORE, skip rows with soft errors
    */
else if (!InputFunctionCallSafe(&in_functions[m],
                                string,
                                typioparams[m],
                                att->atttypmod,
                                (Node *) cstate->escontext,
                                &values[m]))
{
    CopyFromSkipErrorRow(cstate);
    return true;
}

cstate->cur_attname = NULL;
cstate->cur_attval = NULL;

It seems to me that CopyFromOneRow could simply produce a *string collection,
one cell per attribute, and NextCopyFrom could do all of the above on a for-loop over *string

The pg_type and pg_proc catalogs are not extensible so the API can and should be limited to
producing the, usually text, values that are ready to be passed into the text I/O routines and all
the work to find and use types and functions left in the template code.

I haven't looked at COPY TO but I am expecting much the same. The API should simply receive
the content of the type I/O output routine (binary or text as it dictates) for each output attribute, by
row, and be expected to take those values and produce a final output from them.

David J.

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Reduce "Var IS [NOT] NULL" quals during constant folding
Next
From: Tom Lane
Date:
Subject: Re: general purpose array_sort