Home > mailing lists

Re: Index is not used for "IN (non-correlated subquery)" - Mailing list pgsql-general

From	George
Subject	Re: Index is not used for "IN (non-correlated subquery)"
Date	December 1, 2016 17:28:08
Msg-id	CAO=sJoUJuc_pA=z7WL6m+Th5SM9CmsY8FO7nPxUH0bg6BBMO-g@mail.gmail.com Whole thread
In response to	Re: Index is not used for "IN (non-correlated subquery)" (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Index is not used for "IN (non-correlated subquery)"
List	pgsql-general

Tree view

On Thu, Dec 1, 2016 at 6:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> George <pinkisntwell@gmail.com> writes:
>> explain analyze select *
>> from wg3ppbm_transaction where partner_uuid in (
>>             select p.uuid
>>             from wg3ppbm_userpartner up
>>             join wg3ppbm_partner p on p.id = up.partner_id
>> );
>
>> "Hash Semi Join  (cost=2.07..65628.14 rows=663727 width=380) (actual
>> time=0.346..1542.730 rows=1 loops=1)"
>> "  Hash Cond: ((wg3ppbm_transaction.partner_uuid)::text = (p.uuid)::text)"
>> "  ->  Seq Scan on wg3ppbm_transaction  (cost=0.00..54757.54
>> rows=1327454 width=380) (actual time=0.004..878.568 rows=1327587
>> loops=1)"
>
> So you're still getting the 50% default estimate, which is why it doesn't
> want to use the index ...
>
>> "  ->  Hash  (cost=2.06..2.06 rows=1 width=37) (actual
>> time=0.017..0.017 rows=1 loops=1)"
>> "        Buckets: 1024  Batches: 1  Memory Usage: 5kB"
>> "        ->  Nested Loop  (cost=0.00..2.06 rows=1 width=37) (actual
>> time=0.011..0.012 rows=1 loops=1)"
>> "              Join Filter: (up.partner_id = p.id)"
>> "              Rows Removed by Join Filter: 1"
>> "              ->  Seq Scan on wg3ppbm_userpartner up
>> (cost=0.00..1.01 rows=1 width=4) (actual time=0.004..0.004 rows=1
>> loops=1)"
>> "              ->  Seq Scan on wg3ppbm_partner p  (cost=0.00..1.02
>> rows=2 width=41) (actual time=0.001..0.001 rows=2 loops=1)"
>
> ... and you still don't have any meaningful number of rows in
> wg3ppbm_userpartner or wg3ppbm_partner.  However, I don't understand how
> it knows that there's only one or two rows in those tables and yet is
> producing the stupid default estimate for the semijoin.  I spent some time
> trying to duplicate that behavior, without success.  What PG version is
> that, exactly?

"PostgreSQL 9.5.5 on i686-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-17), 32-bit"

> Have you vacuumed and/or analyzed those two tables?

Yes.

> What
> do you get for
>
> select * from pg_stats where tablename = 'wg3ppbm_userpartner';
> and likewise for wg3ppbm_partner?

It is a wide table. Do you want me to dump csv here?

In the meantime, with the help of the folks at #postgresql I was able
to wisen up the query planner by using either one of the following two
settings:

 SET enable_seqscan = false

 SET cpu_tuple_cost = 0.1

I think this should be helpful.

pgsql-general by date:

From: Adrian Klaver
Date: 01 December 2016, 17:02:01
Subject: Re: PostgreSQL ODBC driver for OSX 10.8

From: Tom Lane
Date: 01 December 2016, 17:41:28
Subject: Re: Index is not used for "IN (non-correlated subquery)"

Re: Index is not used for "IN (non-correlated subquery)" - Mailing list pgsql-general

Previous

Next