Re: improve 'where not exists' query.. - Mailing list pgsql-general

From Noel Whelan
Subject Re: improve 'where not exists' query..
Date
Msg-id 1eb3081c0510270918p14ec35d9oec8e6e7c344663a4@mail.gmail.com
Whole thread Raw
In response to Re: improve 'where not exists' query..  (Richard Huxton <dev@archonet.com>)
List pgsql-general
I've created an index on contacts.cempid (I'd not even checked whether one existed); and the query is certainly improved. I'm wondering..I get the impression that the hash index is ideal in this case, technically, because I only intend to query with '='; but the btree index is a bit faster. I'll need to look further into that..

Thank you for the input on this.
- Noel

On 10/27/05, Richard Huxton <dev@archonet.com> wrote:
Noel Whelan wrote:
> I executed the following:
>
> EXPLAIN ANALYZE SELECT cwit.cempid
> FROM "cwItems" cwit
> WHERE (NOT (EXISTS (SELECT con.cempid
> FROM contacts con
> WHERE ( con.cempid = cwit.cempid))));
>
> It comes back with:
>
> Seq Scan on "cwItems" cwit (cost=0.00..8929702.11 rows=5132 width=8) (actual
> time=132218.29..148623.27 rows=31 loops=1)
> Filter: (NOT (subplan))
> SubPlan
> -> Seq Scan on contacts con (cost=0.00..870.77 rows=1 width=8) (actual time=
> 11.82..11.82 rows=1 loops=12528)
> Filter: (cempid = $0)
> Total runtime: 148623.54 msec
>
> I'm not certain it's an issue with the query itself going wrong. I consider
> I'm basically telling it: for each cempid in 'cwItems', check whether or not
> it exists as a cempid in 'contacts', which could be inherently inefficient
> in itself.

Well, as you say it's never going to be possible without checking all
items. However, you've got two seq-scans there and I'd expect a seq-scan
and looping over an index (or a hash-based plan perhaps).

Now, looking at the values it's only seeing 31 rows in cwItems and 1 row
(!) in contacts, so that would explain the scans. However, in that case
I'd expect it to be much faster than it is.

Hmm - I'd take the following steps:

1. VACUUM FULL VERBOSE ANALYSE <table> on both those tables and then see
what happens. Make a note of how many removable/non-removable rows it
finds. Rerun the explain analyse and see what happens.

2. Check that you have an index on contact.cempid and that the types of
cempid match in both tables. Then issue "SET enable_seqscan=true" and
run the explain analyse again - are things faster?

Let us know what happens, oh and don't forget to cc: the list, you were
lucky I read this.

>  I just wondered if there'd be a way to improve on it or not.
>  Installation is postgres-7.3.4.

Upgrade to the latest 7.3.x version as soon as is convenient - lots of
bug fixes to be had.

--
   Richard Huxton
   Archonet Ltd

pgsql-general by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: SHA1 authentication
Next
From: "Andrus"
Date:
Subject: Re: alt+F not working after calling pg_dump