Re: Specifying many rows in a table - Mailing list pgsql-general

From Steve Atkins
Subject Re: Specifying many rows in a table
Date
Msg-id 20040129001421.GB7707@gp.word-to-the-wise.com
Whole thread Raw
In response to Re: Specifying many rows in a table  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Specifying many rows in a table  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Specifying many rows in a table  (Steve Atkins <steve@blighty.com>)
List pgsql-general
On Wed, Jan 28, 2004 at 06:13:36PM -0500, Tom Lane wrote:
> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > You could try inserting those
> > numbers into a temp table and subselecting it.
>
> That's what I'd try too; it should work reasonably well in 7.4.  Two
> thoughts:
>
> 1. Be sure to "analyze" the temp table after filling it.  The planner
> must know the number of rows in the table to have any shot at choosing a
> good plan.
>
> 2. If you know that the set of numbers you have emitted are all
> different, don't use IN at all, but a straight join to the temp table:
>
>     select table.* from table, tmp where id = tmp.id;
>
> With IN the system will have to work harder to eliminate duplicates from
> the temp table; why make it do that if you know it's not necessary?

Yup.

I'm trying a number of variants (in(list), in(select), exists, join) and
benchmarking all of them across a range of numbers, randomly chosen or
clustered with appropriate splashing of the DB and disk buffers between
runs.

Once the perl script has finished I'll post a pointer to the data.

An interesting thing, though is that for 10,000,000 rows in big and
one row in little, everything indexed and analyzed an exists query
is painfully bad...

  select * from big where exists (select 1 from little where s=big.id)

...expands to nested seqscans over big and little, rather than the
seqscan over little with a nested index scan over big I was expecting.

I'm pretty sure that in(list) will be a clear win for a small number
of rows, and a simple join will win for a larger number, but I'm
benchmarking the others out of random interest.

Cheers,
  Steve

pgsql-general by date:

Previous
From: Kris Jurka
Date:
Subject: Re: Is autocommit=true bad?
Next
From: Tom Lane
Date:
Subject: Re: Specifying many rows in a table