Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | CAD21AoAni3cKToPfdShTsc0NmaJOtbJuUb=skyz3Udj7HZY7dA@mail.gmail.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
|
List | pgsql-hackers |
On Fri, Feb 7, 2025 at 5:01 AM Sutou Kouhei <kou@clear-code.com> wrote: > > Hi, > > In <CAD21AoBkDE4JwjPgcLxSEwqu3nN4VXjkYS9vpRQDwA2GwNQCsg@mail.gmail.com> > "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 4 Feb 2025 22:20:51 -0800, > Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > >> I was just looking at bit at this series of patch labelled with v31, > >> to see what is happening here. > >> > >> In 0001, we have that: > >> > >> + /* format-specific routines */ > >> + const CopyToRoutine *routine; > >> [...] > >> - CopySendEndOfRow(cstate); > >> + cstate->routine->CopyToOneRow(cstate, slot); > >> > >> Having a callback where the copy state is processed once per row is > >> neat in terms of design for the callbacks and what extensions can do, > >> and this is much better than what 2889fd23be5 has attempted (later > >> reverted in 1aa8324b81fa) because we don't do indirect function calls > >> for each attribute. Still, I have a question here: what happens for a > >> COPY TO that involves one attribute, a short field size like an int2 > >> and many rows (the more rows the more pronounced the effect, of > >> course)? Could this level of indirection still be the cause of some > >> regressions in a case like that? This is the worst case I can think > >> about, on top of my mind, and I am not seeing tests with few > >> attributes like this one, where we would try to make this callback as > >> hot as possible. This is a performance-sensitive area. > > > > FYI when Sutou-san last measured the performance[1], it showed a > > slight speed up even with fewer columns (5 columns) in both COPY TO > > and COPY FROM cases. The callback design has not changed since then. > > But it would be a good idea to run the benchmark with a table having a > > single small size column. > > > > [1] https://www.postgresql.org/message-id/20241114.161948.1677325020727842666.kou%40clear-code.com > > I measured v31 patch set with 1,6,11,16,21,26,31 int2 > columns. See the attached PDF for 0001 and 0002 result. > > 0001 - to: > > It's faster than master when the number of rows are > 1,000,000-5,000,000. > > It's almost same as master when the number of rows are > 6,000,000-10,000,000. > > There is no significant slow down when the number of columns > is 1. > > 0001 - from: > > 0001 doesn't change COPY FROM code. So the differences are > not real difference. > > 0002 - to: > > 0002 doesn't change COPY TO code. So "0001 - to" and "0002 - > to" must be the same result. But 0002 is faster than master > for all cases. It shows that the CopyToOneRow() approach > doesn't have significant slow down. > > 0002 - from: > > 0002 changes COPY FROM code. So this may have performance > impact. > > It's almost same as master when data is smaller > ((1,000,000-2,000,000 rows) or (3,000,000 rows and 1,6,11,16 > columns)). > > It's faster than master when data is larger. > > There is no significant slow down by 0002. > Thank you for sharing the benchmark results. That looks good to me. Looking at the 0001 patch again, I have a question: we have CopyToTextLikeOneRow() for both CSV and text format: +/* Implementation of the per-row callback for text format */ +static void +CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, false); +} + +/* Implementation of the per-row callback for CSV format */ +static void +CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, true); +} These two functions pass different is_csv value to that function, which is used as follows: + if (is_csv) + CopyAttributeOutCSV(cstate, string, + cstate->opts.force_quote_flags[attnum - 1]); + else + CopyAttributeOutText(cstate, string); However, we can know whether the format is CSV or text by checking cstate->opts.csv_mode instead of passing is_csv. That way, we can directly call CopyToTextLikeOneRow() but not via CopyToCSVOneRow() or CopyToTextOneRow(). It would not help performance since we already inline CopyToTextLikeOneRow(), but it looks simpler. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: