Re: Slow query with sub-select

From: - -
Subject: Re: Slow query with sub-select
Date: ,
Msg-id: BAY147-W52EBF0D7C26204DA1DF028F0480@phx.gbl
(view: Whole thread, Raw)
In response to: Re: Slow query with sub-select  (Tom Lane)
Responses: Re: Slow query with sub-select  (Rick Genter)
List: pgsql-general

Tree view

Slow query with sub-select  (- -, )
 Re: Slow query with sub-select  (David Johnston, )
  Re: Slow query with sub-select  (- -, )
   Re: Slow query with sub-select  (Tom Lane, )
    Re: Slow query with sub-select  (- -, )
     Re: Slow query with sub-select  (Rick Genter, )
   Re: Slow query with sub-select  (Michael Nolan, )

> - - <> writes:
> > The weird thing is that before I updated my server the query was about 5 times faster.
> > I've googled and I think the problem lies with the under-estimation of the query planner about the number of rows in the nested table.I will be trying the 'set enable_seqscan = false' solution to see if that'll improve.
>
> You evidently already do have that turned off. I'd suggest reverting
> that change (ie, allow seqscan) and instead increase work_mem enough
> so that the hash join can work without spilling to disk. This query
> is a perfect example of where indexes do not help, and trying to force
> them to be used makes things slower not faster.
>
> regards, tom lane


< div>
I have switched on seqscan and increased work_mem to 1GB ... but no luck so far.

The version I'm using is PostgreSQL 8.4.8 on i686-pc-linux-gnu, compiled by GCC gcc-4.5.real (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 32-bit
Before that I used an earlier minor version (8.4.x - I don't remember what x is but it was the one packaged in the version before Ubuntu Natty).


These are the relevant schemas.

< div>CREATE TABLE q (
  mid VARCHAR(10) NOT NULL
  ...
);

CREATE TABLE t (
  mid VARCHAR(10) NOT NULL PRIMARY KEY
  ...
);

I would like to count rows in q whose mid does not exist in t.

This is the query I used.

SELECT COUNT(*) FROM q
      WHERE NOT EXISTS (SELECT 1
                          FROM t
                         WHERE t.mid = q.mid);

Based on my understanding, I believe t he query will loop through each row in q (which has about 500m rows) and for each row it will check a one-to-one mapping against t (which has about 3m rows) by using an index scan on t (mid).

However, the EXPLAIN outputs for seqscan = on and seqscan = off, respectively, seem to indicate that it is not a one-to-one mapping of t.mid and q.mid.

I then switched the comparison operator in the where clause as follows:

SELECT COUNT(*) FROM q
      WHERE NOT EXISTS (SELECT 1
                          FROM t
                         WHERE q.mid = t.mid);

As there is no index on q (mid) this type of query should take a considerably longer time.  However, the EXPLAIN outputs seem to be the same.  
Here they are:

With seqscan = on
&nbs p;                                    QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Aggregate  (cost=18566199.92..18566199.93 rows=1 width=0)
   ->  Hash Anti Join  (cost=747023.15..18566199.91 rows=1 width=0)
         Hash Cond: ((q.mid)::text = (t.mid)::text)
         ->  Seq Scan on q  (cost=0.00..11451989.24 rows=565972224 width=10)
         ->  Hash  (cost=701775.29..701775.29 rows=3619829 width=10)
               ->  Seq Scan on t  (cost=0.00..701775.29 rows=3619829 width=10)
(6 rows)


With seqscan = off
                                              QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10024599995.73..10024599995.74 rows=1 width=0)
   ->  Hash Anti Join  (cost=10006780818.96..10024599995.72 rows=1 width=0)
         Hash Cond: ((q.mid)::text = (t.mid)::text)
         ->  Seq Scan on q  (cost=1000000 0000.00..10011451989.24 rows=565972224 width=10)
         ->  Hash  (cost=6735571.10..6735571.10 rows=3619829 width=10)
               ->  Index Scan using t_pkey on t  (cost=0.00..6735571.10 rows=3619829 width=10)
(6 rows)


Any help is greatly appreciated as this problem has been depressing me for two weeks.

pgsql-general by date:

From: Rick Genter
Date:
Subject: Re: Slow query with sub-select
From: AI Rumman
Date:
Subject: "FATAL: role "postgres" does not exist" after recover