Home > mailing lists

Re: Specifying many rows in a table - Mailing list pgsql-general

From	Steve Atkins
Subject	Re: Specifying many rows in a table
Date	January 28, 2004 23:31:27
Msg-id	20040129032405.GA11231@gp.word-to-the-wise.com Whole thread Raw
In response to	Re: Specifying many rows in a table (Steve Atkins <steve@blighty.com>)
List	pgsql-general

Tree view

On Wed, Jan 28, 2004 at 04:14:21PM -0800, Steve Atkins wrote:
> I'm trying a number of variants (in(list), in(select), exists, join) and
> benchmarking all of them across a range of numbers, randomly chosen or
> clustered with appropriate splashing of the DB and disk buffers between
> runs.
>
> Once the perl script has finished I'll post a pointer to the data.

On the off-chance anyone else is interested in the performance
variations (or is searching the archives from the year 2020)...

I have a table of 10,000,000 rows, each row having a few hundred bytes
of data. It has an integer primary key, and is clustered on that key.
Real data, not synthetic, not that it makes a difference for this
test.

I have in the application layer a list of the rows I want to retrieve
(a sorted list of unique integers) and want to retrieve all those rows
from the big table.

The test system is a single processor Athlon 1800 with a Reiser
filesystem on two striped decent scsi drives. PostgreSQL is pretty
well tuned. Between each test run PostgreSQL was shut down, a gig of
data read from the disk to flush OS caches and PostgreSQL restarted.

The big table is indexed and analysed. All temporary tables are analysed
(and times include construction, population, any indexing and analysing
 of the table, but not destruction of it, nor time to transfer data to the
 client).

The sequence of ids to be retrieved is identical in each run. In the
first batch of tests it's completely random, but sorted. In the second batch
it's a sequence (42, 43, 44, 45...) with a random starting point.

inscalar: select * from big where id in (1, 17, 43 ...., 2071)
inselect: select * from big where id in (select id from small)
exists:   select * from big where exists (select 1 from small where id=big.id)
join:     select * from big, small where big.id = small.id;

inselect-idx: As inselect, but small is indexed
exists-idx:   As exists, but small is indexed
join-idx:     As join, but small is indexed

In all but the first case, small is created as a temporary table, populated
with a sequence of inserts (inside a transaction), possibly indexed then
analysed.

                             number of random rows retrieved
                1        10       100      1000      10000    100000
inscalar       0.394    0.818    3.80    23.5
inselect       0.752    0.770    0.751    0.977     0.798    0.790
join           0.728    0.746    0.625    0.766     0.703    0.806
inselect-idx   0.655    0.642    0.748    0.822     0.777    0.820
exists-idx    50.7     49.4     49.4     50.0      49.4     51.2
join-idx       0.657    0.703    0.760    0.730     0.731    0.815

                             number of sequential rows retrieved
                1        10       100      1000      10000    100000
inscalar       0.464    0.474    0.488    0.529
inselect       0.820    0.826    0.778    0.776     0.612    0.770
join           0.754    0.855    0.831    0.829     0.899    0.742
inselect-idx   0.686    0.679    0.703    0.747     0.747    0.729
exists-idx    49.2     79.8     49.5     49.3      49.3     53.7
join-idx       0.625    0.754    0.697    0.770     0.750    0.783

Overall, IN(scalar) looks pretty good for single digit numbers. A simple
join looks nice for anything else - and indexing the temporary table
doesn't seem to buy anything.

Pretty impressive overall, though.

Cheers,
  Steve

pgsql-general by date:

From: Craig Addleman
Date: 28 January 2004, 22:57:59
Subject: Pl/tcl auto-load problem

From: "Chris Travers"
Date: 28 January 2004, 23:45:34
Subject: Re: Interest in database-based email?

Re: Specifying many rows in a table - Mailing list pgsql-general

Previous

Next