Re: COPY FROM WHEN condition - Mailing list pgsql-hackers

From David Rowley
Subject Re: COPY FROM WHEN condition
Date
Msg-id CAKJS1f_SQso+=U5_hEps=z5PJ7=4Lchs_MFt4no0etyP53qukQ@mail.gmail.com
Whole thread Raw
In response to Re: COPY FROM WHEN condition  (Andres Freund <andres@anarazel.de>)
Responses Re: COPY FROM WHEN condition
List pgsql-hackers
(Fixed all of what you mentioned)

On Wed, 3 Apr 2019 at 06:57, Andres Freund <andres@anarazel.de> wrote:
> > +/*
> > + * CopyMultiInsertInfo_Flush
> > + *           Write out all stored tuples in all buffers out to the tables.
> > + *
> > + * To save us from ending up with buffers for 1000s of partitions we remove
> > + * buffers belonging to partitions that we've seen no tuples for in this batch
>
> That seems a little naive (imagine you have like 5 partitions, and we
> constantly cycle through 2-3 of them per batch).  It's probably OK for
> this version.   I guess I'd only do that cleanup if we're actually
> flushing due to the number of partitions.

hmm good point.  It seems like being smarter there would be a good thing.

I've ended up getting rid of the hash table in favour of the List that
you mentioned and storing the buffer in ResultRelInfo. I also changed
the logic that removes buffers once we reach the limit.  Instead of
getting rid of buffers that were not used on this run, I've changed it
so it just gets rid of the buffers starting with the oldest one first,
but stops once the number of buffers is at the maximum again.  This
can mean that we end up with MAX_BUFFERED_TUPLES buffers instead of
MAX_PARTITION_BUFFERS if there is only 1 tuple per buffer.  My current
thinking is that this does not matter since only 1 slot will be
allocated per buffer.  We'll remove all of the excess buffers during
the flush and keep just MAX_PARTITION_BUFFERS of the newest buffers.

Also, after changing CopyMultiInsertBuffer to use fixed sized arrays
instead of allocating them with another palloc the performance has
improved a bit more.

Using the perl files mentioned in [1]

Master + Patched:
# copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
COPY 35651564
Time: 9106.776 ms (00:09.107)
# truncate table listp;
TRUNCATE TABLE
# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 10154.196 ms (00:10.154)


Master only:
# copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
COPY 35651564
Time: 22200.535 ms (00:22.201)
# truncate table listp;
TRUNCATE TABLE
# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 18592.107 ms (00:18.592)

> >               if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> >               {
> >                       ereport(ERROR,
> > -                             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > -                                     errmsg("cannot perform FREEZE on a partitioned table")));
> > +                                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +                                      errmsg("cannot perform FREEZE on a partitioned table")));
> >               }
>
> accidental hunk?

It was left over from a pgindent run. Now removed.

[1] https://www.postgresql.org/message-id/CAKJS1f98Fa+QRTGKwqbtz0M=Cy1EHYR8Q-W08cpA78tOy4euKQ@mail.gmail.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: New vacuum option to do only freezing
Next
From: Haribabu Kommi
Date:
Subject: Re: pg_basebackup ignores the existing data directory permissions