Greg,
Thanks for the mental prod! Yes, the original data is more closely sorted by the timestamptz column, since they
representevents coming into the collection system in real time. As for the distribution of data values, it goes
withoutsaying the timestamptz value is monotonically increasing, with roughly 1300 entries having the same timestamptz
value. The other three columns' values are essentially reference data, with 400 values for the varchar, 680 for the
firsttext column, and 60 for the second text column. The distribution is fairly even, with some small spikes but
nothingsignificant.
The "duh" moment came for me when you pointed out the implicit sort order of the data. After resorting the data into
thenew index column order the insert performance was largely restored. I didn't monitor the process with vmstat,
however- the end result is good enough for me. I believe that the index maintenance of page splitting, etc., that you
describebelow was exactly the culprit, and that presorting the data solved that problem.
I call it my "duh" moment since I've presorted data for Sybase and Oracle for exactly the same reason, but forgot to
applythe lesson to PostgreSQL.
BTW, this is PG 8.2.1 and 8.3.7 running on SLES 10.3, although I don't think it matters.
Thanks for the help, Greg and Tom!
--- On Sat, 6/27/09, Greg Smith <gsmith@gregsmith.com> wrote:
> From: Greg Smith <gsmith@gregsmith.com>
> Subject: Re: [PERFORM] Insert performance and multi-column index order
> To: bob_lunney@yahoo.com
> Cc: pgsql-performance@postgresql.org
> Date: Saturday, June 27, 2009, 1:08 AM
> On Fri, 26 Jun 2009, bob_lunney@yahoo.com
> wrote:
>
> > The original unique index was in the order
> (timestamptz, varchar, text, text) and most queries against
> it were slow. I changed the index order to (varchar, text,
> timestamptz, text) and queries now fly, but loading data
> (via copy from stdin) in the table is 2-4 times slower.
>
> Is the input data closer to being sorted by the timestamptz
> field than the varchar field? What you might be seeing
> is that the working set of index pages needed to keep
> building the varchar index are bigger or have more of a
> random access component to them as they spill in and out of
> the buffer cache. Usually you can get a better idea
> what the difference is by comparing the output from vmstat
> while the two are loading. More random read/write
> requests in the mix will increase the waiting for I/O
> percentage while not increasing the total amount
> read/written per second.
>
> --
> * Greg Smith gsmith@gregsmith.com
> http://www.gregsmith.com Baltimore, MD