Re: Bulk Inserts - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Bulk Inserts
Msg-id f67928030909141855y2ff8993epe4d967a769cebb56@mail.gmail.com
In response to Bulk Inserts  (Pierre Frédéric Caillaud<lists@peufeu.com>)
Responses Re: Bulk Inserts
List pgsql-hackers
2009/9/14 Pierre Frédéric Caillaud <lists@peufeu.com>

I've done a little experiment with bulk inserts.

=> heap_bulk_insert()

Behaves like heap_insert except it takes an array of tuples (HeapTuple *tups, int ntups).

- Grabs a page (same as heap_insert)

- While holding exclusive lock, inserts as many tuples as it can on the page.
       - Either the page gets full
       - Or we run out of tuples.

- Generate xlog : choice between
       - Full Xlog mode :
               - if we inserted more than 10 tuples (totally bogus heuristic), log the entire page
               - Else, log individual tuples as heap_insert does

Does that heuristic change the timings much?  If not, it seems like it would be better to keep it simple and always do the same thing, such as logging the tuples (if it is done under one WALInsertLock, which I am assuming it is).
 
       - Light log mode :
               - if page was empty, only xlog a "new empty page" record, not page contents
               - else, log fully
               - heap_sync() at the end

- Release the page
- If we still have tuples to insert, repeat.
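
The loop described above can be sketched as a small stand-alone C model. It is not PostgreSQL code: the page is reduced to a byte budget, and PAGE_FREE_BYTES, BulkInsertStats and sim_bulk_insert are hypothetical names invented for this illustration. Only the threshold of 10 tuples comes from the proposal, and only the "Full Xlog mode" choice is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: usable bytes on a freshly grabbed page (illustrative value). */
#define PAGE_FREE_BYTES 8000
/* The "more than 10 tuples" heuristic from the proposal. */
#define FULL_PAGE_THRESHOLD 10

typedef struct {
    size_t pages_used;        /* pages grabbed and filled */
    size_t full_page_records; /* pages logged as a whole-page image */
    size_t tuple_records;     /* tuples logged individually */
} BulkInsertStats;

/*
 * Hypothetical model of the proposed loop: tuple_sizes[i] is the number of
 * bytes tuple i needs on a page. Real locking and WAL writes are omitted.
 */
BulkInsertStats
sim_bulk_insert(const size_t *tuple_sizes, size_t ntups)
{
    BulkInsertStats st = {0, 0, 0};
    size_t next = 0;

    while (next < ntups) {
        size_t free_bytes = PAGE_FREE_BYTES; /* "grab a page" */
        size_t on_page = 0;

        /* Each tuple must fit on an empty page, or the loop cannot progress. */
        assert(tuple_sizes[next] <= PAGE_FREE_BYTES);

        /* "Exclusive lock held": fill until the page is full or tuples run out. */
        while (next < ntups && tuple_sizes[next] <= free_bytes) {
            free_bytes -= tuple_sizes[next];
            next++;
            on_page++;
        }

        /* Full Xlog mode: whole page above the threshold, else per-tuple records. */
        if (on_page > FULL_PAGE_THRESHOLD)
            st.full_page_records++;
        else
            st.tuple_records += on_page;

        st.pages_used++; /* "release the page", then repeat if tuples remain */
    }
    return st;
}
```

With 25 tuples of 500 bytes each, this model fills the first page with 16 tuples and logs it as one page image, then places the remaining 9 on a second page and logs them individually.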

Am I right in assuming that :

1)
- If the page was empty,
- and log archiving isn't used,
- and the table is heap_sync()'d at the end,
=> only a "new empty page" record needs to be created, then the page can be completely filled ?

Do you even need the new empty page record?  I think a zero page will be handled correctly next time it is read into shared buffers, won't it?  But I guess it is needed to avoid problems with partial page writes that would leave the page in a state that is neither all zeros nor consistent.



2)
- If the page isn't empty
- or log archiving is used,
=> logging either the inserted tuples or the entire page is OK to guarantee persistence ?

If the entire page is logged, would it have to be marked as not removable by the log compression tool?  Or can the tool recreate the needed delta?
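
For reference, the two cases asked about above can be written down as a single decision function. This only restates the assumptions posed in the questions; it does not confirm that they are safe, and PageContext, WalChoice and choose_wal_record are hypothetical names, not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool page_was_empty;   /* page held no tuples before the bulk insert */
    bool archiving;        /* log archiving is in use */
    bool synced_at_end;    /* heap_sync() is called once the load finishes */
} PageContext;

typedef enum {
    WAL_NEW_EMPTY_PAGE,      /* case 1: only a "new empty page" record */
    WAL_TUPLES_OR_FULL_PAGE  /* case 2: inserted tuples or the whole page */
} WalChoice;

/*
 * Sketch of the rule as asked in the thread. Case 1 needs all three
 * conditions; anything else falls back to full logging. A combination the
 * thread does not cover (empty page, no archiving, no final sync) is
 * treated conservatively as case 2, which is an assumption of this sketch.
 */
WalChoice
choose_wal_record(const PageContext *ctx)
{
    assert(ctx != NULL);

    if (ctx->page_was_empty && !ctx->archiving && ctx->synced_at_end)
        return WAL_NEW_EMPTY_PAGE;

    return WAL_TUPLES_OR_FULL_PAGE;
}
```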
 
Jeff
