Re: Reduce WAL logging of INSERT SELECT - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Reduce WAL logging of INSERT SELECT
Date
Msg-id 201108112049.p7BKnS727775@momjian.us
Whole thread Raw
In response to Re: Reduce WAL logging of INSERT SELECT  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Simon Riggs wrote:
> On Sat, Aug 6, 2011 at 4:16 AM, Bruce Momjian <bruce@momjian.us> wrote:
>
> > Well, if the table is created in the same transaction (which is the only
> > case under consideration), no other sessions can write to the table so
> > you are just writing the entire table on commit, rather than to the WAL.
>
> Below a certain point, skipping WAL is slower and over an intermediate
> range there is no benefit. So small amounts of data on large servers
> goes slower.
>
> heap_fsync() requires a scan of shared buffers, which may not be cheap.
>
> There is a difficulty because you would need to calculate the cut-off
> is for a particular database, and then predict ahead of time whether
> the number of rows that will be handled by the statement is low enough
> to warrant using the optimisation. Both of which I call a hard
> problem.
>
> I think we should remove the COPY optimisation because of this and
> definitely not extend INSERT SELECT to perform it automatically.

I ran some tests and Simon was correct and I was wrong.  There is a
measurable overhead to the optimization of avoiding WAL traffic for
small tables.

I tested git head with COPY and created the table inside and outside the
COPY transaction, with the attached script.  It ran in 11 seconds
without the optimization, and 12 seconds with the CREATE TABLE inside
the COPY transaction.

With these results, I withdraw my idea of adding this optimization to
other commands.  I think COPY is usually used in bulk mode, but the
other commands are often used in smaller batches that would be worse
with this optimization.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
:

. traprm

echo 1 > $TMP/1

echo "DROP TABLE copytest;" > $TMP/0
for X in $(jot 1000)
do
    cat >> $TMP/0 <<END
-- move the next line into the transaction block to test the optimization overhead
CREATE TABLE copytest(x int);
BEGIN WORK;
COPY copytest FROM '$TMP/1';
COMMIT;
DROP TABLE copytest;
END
done
time sql test < $TMP/0 > /dev/null


pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: WIP: Fast GiST index build
Next
From: "Greg Sabino Mullane"
Date:
Subject: Re: index-only scans