On 25.09.2011 19:01, Robert Haas wrote:
> On Wed, Sep 14, 2011 at 6:52 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>>> Why do you need new WAL replay routines? Can't you just use the existing
>>> XLOG_HEAP_NEWPAGE support?
>>>
>>> By and large, I think we should be avoiding special-purpose WAL entries
>>> as much as possible.
>>
>> I tried that, but most of the reduction in WAL size melts away if you do.
>> And if the page you're copying to is not empty, logging the whole page is
>> even more expensive; you'd have to fall back to retail inserts in that
>> case, which complicates the logic.
>
> Where does it go? I understand why it'd be a problem for partially
> filled pages, but it seems like it ought to be efficient for pages
> that are initially empty.
A regular heap_insert record leaves out a lot of information that can be
deduced at replay time. It omits most of the tuple header, carrying just
the null bitmap + data. Beyond that, there's only the location of the
tuple (RelFileNode + ItemPointer). At replay, xmin is taken from the WAL
record header.
For a multi-insert record, you don't even need to store the RelFileNode
and the block number for every tuple, just the offsets.
In comparison, a full-page image will include the full tuple header, and
also the line pointers. If I'm doing my math right, a full-page image
takes 25 bytes more per tuple than the special-purpose multi-insert
record.
-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com