Thread: Insert performance and multi-column index order

Insert performance and multi-column index order

From
bob_lunney@yahoo.com
Date:
I have a partitioned table with a multi-column unique index.  The table is partitioned on a timestamp with time zone
column. (I realize this has nothing to do with the unique index.)  The original unique index was in the order
(timestamptz,varchar, text, text) and most queries against it were slow.  I changed the index order to (varchar, text,
timestamptz,text) and queries now fly, but loading data (via copy from stdin) in the table is 2-4 times slower.  The
uniqueindex is required during the load.   

The original index is in the same order as the table's columns (2,3,4,5), while the changed index is in column order
(3,5,2,4). I've tested this several times and the effect is repeatable.  It does not seem the column order in the table
mattersto the insert/index performance, just the column order in the index. 

Why would changing the column order on a unique index cause data loading or index servicing to slow down?  Page splits
inthe b-tree, maybe? 

Thanks in advance for any advice.





Re: Insert performance and multi-column index order

From
Tom Lane
Date:
bob_lunney@yahoo.com writes:
> Why would changing the column order on a unique index cause data loading or index servicing to slow down?  Page
splitsin the b-tree, maybe? 

Yeah, perhaps.  Tell us about the data distributions in the columns?
Is there any ordering to the keys that're being inserted?

It's not in the least surprising that different column orders might be
better or worse suited for particular queries.  I'm mildly interested
in the question of why the bulk load speed is different, though.

            regards, tom lane

Re: Insert performance and multi-column index order

From
Greg Smith
Date:
On Fri, 26 Jun 2009, bob_lunney@yahoo.com wrote:

> The original unique index was in the order (timestamptz, varchar, text,
> text) and most queries against it were slow.  I changed the index order
> to (varchar, text, timestamptz, text) and queries now fly, but loading
> data (via copy from stdin) in the table is 2-4 times slower.

Is the input data closer to being sorted by the timestamptz field than the
varchar field?  What you might be seeing is that the working set of index
pages needed to keep building the varchar index are bigger or have more of
a random access component to them as they spill in and out of the buffer
cache.  Usually you can get a better idea what the difference is by
comparing the output from vmstat while the two are loading.  More random
read/write requests in the mix will increase the waiting for I/O
percentage while not increasing the total amount read/written per second.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Insert performance and multi-column index order

From
Bob Lunney
Date:
Greg,

Thanks for the mental prod!  Yes, the original data is more closely sorted by the timestamptz column, since they
representevents coming into the collection system in real time.  As for the distribution of data values, it goes
withoutsaying the timestamptz value is monotonically increasing, with roughly 1300 entries having the same timestamptz
value. The other three columns' values are essentially reference data, with 400 values for the varchar, 680 for the
firsttext column, and 60 for the second text column.  The distribution is fairly even, with some small spikes but
nothingsignificant. 

The "duh" moment came for me when you pointed out the implicit sort order of the data.  After resorting the data into
thenew index column order the insert performance was largely restored.  I didn't monitor the process with vmstat,
however- the end result is good enough for me.  I believe that the index maintenance of page splitting, etc., that you
describebelow was exactly the culprit, and that presorting the data solved that problem.   

I call it my "duh" moment since I've presorted data for Sybase and Oracle for exactly the same reason, but forgot to
applythe lesson to PostgreSQL. 

BTW, this is PG 8.2.1 and 8.3.7 running on SLES 10.3, although I don't think it matters.

Thanks for the help, Greg and Tom!

--- On Sat, 6/27/09, Greg Smith <gsmith@gregsmith.com> wrote:

> From: Greg Smith <gsmith@gregsmith.com>
> Subject: Re: [PERFORM] Insert performance and multi-column index order
> To: bob_lunney@yahoo.com
> Cc: pgsql-performance@postgresql.org
> Date: Saturday, June 27, 2009, 1:08 AM
> On Fri, 26 Jun 2009, bob_lunney@yahoo.com
> wrote:
>
> > The original unique index was in the order
> (timestamptz, varchar, text, text) and most queries against
> it were slow.  I changed the index order to (varchar, text,
> timestamptz, text) and queries now fly, but loading data
> (via copy from stdin) in the table is 2-4 times slower.
>
> Is the input data closer to being sorted by the timestamptz
> field than the varchar field?  What you might be seeing
> is that the working set of index pages needed to keep
> building the varchar index are bigger or have more of a
> random access component to them as they spill in and out of
> the buffer cache.  Usually you can get a better idea
> what the difference is by comparing the output from vmstat
> while the two are loading.  More random read/write
> requests in the mix will increase the waiting for I/O
> percentage while not increasing the total amount
> read/written per second.
>
> --
> * Greg Smith gsmith@gregsmith.com
> http://www.gregsmith.com Baltimore, MD