Re: Index is not used for "IN (non-correlated subquery)" - Mailing list pgsql-general

From Merlin Moncure
Subject Re: Index is not used for "IN (non-correlated subquery)"
Date
Msg-id CAHyXU0wb7Bte5SNT+f1WxeZbTMMjoBVyJvjLZZQTq4KaoG7Bxw@mail.gmail.com
Whole thread Raw
In response to Re: Index is not used for "IN (non-correlated subquery)"  (George <pinkisntwell@gmail.com>)
Responses Re: Index is not used for "IN (non-correlated subquery)"
List pgsql-general
On Wed, Nov 30, 2016 at 11:05 AM, George <pinkisntwell@gmail.com> wrote:
> On Wed, Nov 30, 2016 at 6:45 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Wed, Nov 30, 2016 at 10:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> George <pinkisntwell@gmail.com> writes:
>>>> explain select * from wg3ppbm_transaction where partner_uuid in (
>>>>             select p.uuid
>>>>             from wg3ppbm_userpartner up
>>>>             join wg3ppbm_partner p on p.id = up.partner_id
>>>> );
>>>
>>>> "Hash Semi Join  (cost=2.07..425.72 rows=2960 width=482)"
>>>> "  Hash Cond: ((wg3ppbm_transaction.partner_uuid)::text = (p.uuid)::text)"
>>>> "  ->  Seq Scan on wg3ppbm_transaction  (cost=0.00..375.19 rows=5919 width=482)"
>>>> "  ->  Hash  (cost=2.06..2.06 rows=1 width=37)"
>>>> "        ->  Nested Loop  (cost=0.00..2.06 rows=1 width=37)"
>>>> "              Join Filter: (up.partner_id = p.id)"
>>>> "              ->  Seq Scan on wg3ppbm_userpartner up
>>>> (cost=0.00..1.01 rows=1 width=4)"
>>>> "              ->  Seq Scan on wg3ppbm_partner p  (cost=0.00..1.02
>>>> rows=2 width=41)"
>>>
>>> This plan is expecting to have to return about half of the rows in
>>> wg3ppbm_transaction, a situation for which an indexscan would NOT
>>> be a better choice.  The usual rule of thumb is that you need to be
>>> retrieving at most one or two percent of a table's rows for an indexscan
>>> on it to be faster than a seqscan.
>>>
>>> I think however that the "half" may be a default estimate occasioned
>>> by the other tables being empty and therefore not having any statistics.
>>> Another rule of thumb is that the plans you get for tiny tables have
>>> little to do with what happens once there's lots of data.
>>
>> Yeah, don't make query plan assumptions against empty or nearly empty
>> tables.  As the data grows, the plans will suitably change.  Perhaps
>> OP just recently loaded a bunch of data and the tables haven't been
>> analyzed yet?
>
> I just added a significant number of rows to the table. I now have
> 1.3M rows in total but only 8K rows that contain the value I am
> seeking. I also ran ANALYZE after loading the data. The query plans
> for the two queries did not change. Also, the simple query returns in
> 45 ms while the one with the subquery needs 1.5 s, i.e. it is about
> 30x slower.
>
> So there is definitely something wrong here. This situation makes many
> row-level security use cases cumbersome since you need to have
> almost the same WHERE clause both in the row-level security policy and
> in every SELECT query in order for the index to be used.

can you give EXPLAIN ANALYZE for the 'good' query and the 'bad' query?

merlin


pgsql-general by date:

Previous
From: George
Date:
Subject: Re: Index is not used for "IN (non-correlated subquery)"
Next
From: "David G. Johnston"
Date:
Subject: Re: select function alias