Re: COPY with hints, rebirth - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: COPY with hints, rebirth
Date
Msg-id CA+U5nMLM5MVXmt-gW=kM_evyckxxFKNFSLaLLMvZ-TTs3MrkNQ@mail.gmail.com
Whole thread Raw
In response to Re: COPY with hints, rebirth  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: COPY with hints, rebirth
List pgsql-hackers
On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 24.02.2012 22:55, Simon Riggs wrote:
>>
>> A long time ago, in a galaxy far away, we discussed ways to speed up
>> data loads/COPY.
>> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php
>>
>> In particular, the idea that we could mark tuples as committed while
>> we are still loading them, to avoid negative behaviour for the first
>> reader.
>>
>> Simple patch to implement this is attached, together with test case.
>>
>>  ...
>>
>>
>> What exactly does it do? Previously, we optimised COPY when it was
>> loading data into a newly created table or a freshly truncated table.
>> This patch extends that and actually sets the tuple header flag as
>> HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
>> code. The patch also adds some tests for corner cases that would make
>> that action break MVCC - though those cases are minor and typical data
>> loads will benefit fully from this.
>
>
> This doesn't work with subtransactions:
...
> The query should return the row copied in the same subtransaction.

Thanks for pointing that out.

New patch with corrected logic and test case attached.

>> In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
>> and adding current xid to snapshots. That is an invasive change that I
>> would wish to avoid at any time and explains the long delay in
>> tackling this. The way I've implemented it, is just as a short test
>> during XidInMVCCSnapshot() so that we trap the case when the xid ==
>> xmax and so would appear to be running. This is much less invasive and
>> just as performant as Tom's original suggestion.
>
>
> TransactionIdIsCurrentTransactionId() can be fairly expensive if you have a
> lot of subtransactions open...

I've put in something to avoid that cost for the common case - just a boolean.

This seems like the best plan rather than the explicit FREEZE option.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: performance results on IBM POWER7
Next
From: Peter Eisentraut
Date:
Subject: Re: pgsql_fdw, FDW for PostgreSQL server