Re: A little COPY speedup - Mailing list pgsql-patches
From | Heikki Linnakangas |
---|---|
Subject | Re: A little COPY speedup |
Date | |
Msg-id | 45E73402.2060101@enterprisedb.com |
In response to | Re: A little COPY speedup (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: A little COPY speedup |
List | pgsql-patches |
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> On every row, PageAddItem will scan all the line pointers on the target
>> page, just to see that they're all in use, and create a new line
>> pointer. That adds up, especially with narrow tuples like what I used in
>> the test.
>> Attached is a fix for that.
>
> This has been proposed before, and rejected before. IIRC the previous
> patch was quite a lot less invasive than this one (it didn't require
> making special space on heap pages). I don't recall why it wasn't
> accepted.

Ahh, found that thread:
http://archives.postgresql.org/pgsql-hackers/2005-07/msg00609.php

The main differences between that patch and mine are that
- the previous patch used an offset to the first free line pointer, and I
  used just a flag.
- the previous patch stored the offset in the page header, and I used the
  special space

I think using the special space is a cleaner approach; the field is only
meaningful in heap pages. However, now that I think of it, if we could
squeeze the flag into one of the existing fields in the page header, we
could put it there without decreasing the amount of space available for
tuples. We could use the unused pd_tli field, as you suggested later in
that thread.

At the end of the thread, Bruce added the patch to his hold-queue, but I
couldn't find a trace of it after that so I'm not clear why it was
rejected in the end. This comment (by you) seems most relevant:

> I tried making a million-row table with just two int4 columns and then
> duplicating it with CREATE TABLE AS SELECT. In this context gprof
> shows PageAddItem as taking 7% of the runtime, which your patch knocks
> down to 1.5%. This seems to be about the best possible real-world case,
> though (the wider the rows, the fewer times PageAddItem can loop), and
> so I'm still unconvinced that there's a generic gain here. Adding an
> additional word to page headers has a very definite cost --- we can
> assume about a .05% increase in net I/O demands across *every*
> application, whether they do a lot of inserts or not --- and so a
> patch that provides a noticeable improvement in only a very small set
> of circumstances is going to have to be rejected.

I believe the PageAddItem overhead has become more noticeable since then
because of other improvements to COPY. In 8.3, we're also going to
reduce the tuple length (combocids and the varvarlen thing), so we can
fit more tuples per page, again making it slightly more significant.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
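The trade-off discussed in this message can be illustrated with a toy model. What follows is only a sketch: it is not PostgreSQL code and not the attached patch. The names `ToyPage`, `toy_page_add`, `toy_page_delete`, `has_free_slot`, and `MAX_ITEMS` are hypothetical, and the real PageAddItem also deals with item sizes, alignment, and free space on the page, all of which are ignored here. The sketch only shows why a single "there may be a free line pointer" flag lets an insert skip the scan of the line pointer array when nothing has been freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ITEMS 64            /* hypothetical capacity of the toy page */

/* Toy stand-in for a heap page: only the line pointer array is modelled. */
typedef struct {
    bool   used[MAX_ITEMS];     /* used[i]: is line pointer i in use? */
    size_t nitems;              /* line pointers allocated so far */
    bool   has_free_slot;       /* hint: some pointer below nitems may be free */
    size_t slots_scanned;       /* instrumentation: pointers examined so far */
} ToyPage;

void toy_page_init(ToyPage *p)
{
    assert(p != NULL);
    *p = (ToyPage){0};
}

/*
 * Add one item and return its slot number, or -1 if the page is full.
 * With use_hint == false the line pointer array is always scanned for a
 * reusable slot; with use_hint == true the scan is skipped unless the
 * flag says a free slot may exist.
 */
int toy_page_add(ToyPage *p, bool use_hint)
{
    assert(p != NULL);
    size_t slot = p->nitems;    /* default: append a new line pointer */

    if (!use_hint || p->has_free_slot) {
        size_t i;
        for (i = 0; i < p->nitems; i++) {
            p->slots_scanned++;
            if (!p->used[i])
                break;          /* found a reusable line pointer */
        }
        if (i < p->nitems)
            slot = i;
        else
            p->has_free_slot = false;   /* nothing free: clear the hint */
    }

    if (slot >= MAX_ITEMS)
        return -1;              /* no room for another line pointer */

    p->used[slot] = true;
    if (slot == p->nitems)
        p->nitems++;
    return (int) slot;
}

/* Free a line pointer and set the hint so a later insert will look for it. */
void toy_page_delete(ToyPage *p, size_t slot)
{
    assert(p != NULL);
    if (slot < p->nitems && p->used[slot]) {
        p->used[slot] = false;
        p->has_free_slot = true;
    }
}
```

In this model, filling a fresh page with 50 items examines 0 + 1 + ... + 49 = 1225 line pointers when every insert scans, and none when the flag is consulted first. The flag stays set after a slot is reused, so the next insert scans once more and clears it; an offset to the first free line pointer, as in the earlier patch, would avoid even that scan at the cost of a wider field.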