Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id CAD21AoBjzkL2Lv7j4teaHBZvNmKctQtH6X71kN_sj6Fm-+VvJQ@mail.gmail.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
On Thu, Feb 20, 2025 at 6:48 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoAni3cKToPfdShTsc0NmaJOtbJuUb=skyz3Udj7HZY7dA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 20 Feb 2025 15:28:26 -0800,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > Looking at the 0001 patch again, I have a question: we have
> > CopyToTextLikeOneRow() for both CSV and text format:
> >
> > +/* Implementation of the per-row callback for text format */
> > +static void
> > +CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
> > +{
> > +       CopyToTextLikeOneRow(cstate, slot, false);
> > +}
> > +
> > +/* Implementation of the per-row callback for CSV format */
> > +static void
> > +CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
> > +{
> > +       CopyToTextLikeOneRow(cstate, slot, true);
> > +}
> >
> > These two functions pass different is_csv value to that function,
> > which is used as follows:
> >
> > +                       if (is_csv)
> > +                               CopyAttributeOutCSV(cstate, string,
> > +
> >  cstate->opts.force_quote_flags[attnum - 1]);
> > +                       else
> > +                               CopyAttributeOutText(cstate, string);
> >
> > However, we can know whether the format is CSV or text by checking
> > cstate->opts.csv_mode instead of passing is_csv. That way, we can
> > directly call CopyToTextLikeOneRow() but not via CopyToCSVOneRow() or
> > CopyToTextOneRow(). It would not help performance since we already
> > inline CopyToTextLikeOneRow(), but it looks simpler.
>
> This means the following, right?
>
> 1. We remove CopyToTextOneRow() and CopyToCSVOneRow()
> 2. We remove "bool is_csv" parameter from CopyToTextLikeOneRow()
>    and use cstate->opts.csv_mode in CopyToTextLikeOneRow()
>    instead of is_csv
> 3. We use CopyToTextLikeOneRow() for
>    CopyToRoutineText::CopyToOneRow and
>    CopyToRoutineCSV::CopyToOneRow
>
> If we use this approach, we can't remove the following
> branch in compile time:
>
> +                       if (is_csv)
> +                               CopyAttributeOutCSV(cstate, string,
> +                                                                       cstate->opts.force_quote_flags[attnum - 1]);
> +                       else
> +                               CopyAttributeOutText(cstate, string);
>
> We can remove the branch in compile time with the current
> approach (constant argument + inline).
>
> It may have a negative performance impact because the "if"
> is used many times with large data. (That's why we choose
> the constant argument + inline approach in this thread.)
>

Thank you for the explanation, I missed that fact. I'm fine with having is_csv.

The first two patches are refactoring patches (+ small performance
improvements). I've reviewed these patches again and attached the
updated patches. I reorganized the function order and updated comments
etc. I find that these patches are reasonably ready to push. Could you
review these versions? I'm going to push them, barring objections and
further comments.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pgbench client-side performance issue on large scripts
Next
From: Melanie Plageman
Date:
Subject: Re: Trigger more frequent autovacuums of heavy insert tables