Re: Query planner riddle (array-related?) - Mailing list pgsql-general

From Tom Lane
Subject Re: Query planner riddle (array-related?)
Date
Msg-id 25193.1525440728@sss.pgh.pa.us
Whole thread Raw
In response to Query planner riddle (array-related?)  (Markus <m@tfiu.de>)
Responses Re: Query planner riddle (array-related?)  (Markus <m@tfiu.de>)
List pgsql-general
Markus <m@tfiu.de> writes:
> I'm running Postgresql 9.6.7 on Debian stretch -- and I'm trying to
> understand a query plan, with any hint where to gain further insight
> welcome.

Well, you say

>   select count(*) from gaia.dr2light where parallax>50;
> gives 5400 rows in no time.

but the planner thinks there are 12991627 such rows:

>          ->  Bitmap Heap Scan on dr2light  (cost=243173.69..25288015.74 rows=12991627 width=132)
>                Recheck Cond: (parallax > '50'::double precision)
>                ->  Bitmap Index Scan on dr2light_parallax  (cost=0.00..239925.78 rows=12991627 width=0)
>                      Index Cond: (parallax > '50'::double precision)

So my first instinct would be to try to get that estimate more in
line with reality.  Maybe you need to increase the statistics target
for that column.

Also, this sort of thing is usually much easier to diagnose from
EXPLAIN ANALYZE output.  All we can see from these queries is that
the planner picked what it thought was the lowest-cost plan.  Without
actual rowcounts it's very hard to guess why the estimates were wrong.
You happened to provide one actual-rowcount number that maybe was
enough to diagnose the issue; but if the above doesn't do the trick,
we're going to need to see EXPLAIN ANALYZE to guess what else is up.

            regards, tom lane




>    ->  Hash  (cost=118285.38..118285.38 rows=551038 width=1523)
>          ->  Seq Scan on dr2epochflux  (cost=0.00..118285.38 rows=551038 width=1523)


> And here's the bad plan (for query 1):

> --------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.58..4801856.96 rows=4229 width=1647)
>    ->  Seq Scan on dr2epochflux  (cost=0.00..118285.38 rows=551038 width=1523)
>    ->  Index Scan using dr2light_pkey on dr2light  (cost=0.58..8.49 rows=1 width=132)
>          Index Cond: (source_id = dr2epochflux.source_id)
>          Filter: (parallax > '50'::double precision)

> If I enable_seqscan=0, it comes up with this for query 1:

> ---------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=1.00..4810726.18 rows=4229 width=1647)
>    ->  Index Scan using dr2epochflux_pkey on dr2epochflux  (cost=0.42..127154.60 rows=551038 width=1523)
>    ->  Index Scan using dr2light_pkey on dr2light  (cost=0.58..8.49 rows=1 width=132)
>          Index Cond: (source_id = dr2epochflux.source_id)
>          Filter: (parallax > '50'::double precision)

> -- which in reality appears to be a good deal faster than the "bad"
> plan, though still much, much slower than the "good plan".

> Both tables are ANALYZE-d, and they should be reasonably VACUUMED.

> Is there anything I can do to make it easier for the planner to see the
> light?

>            -- Markus



pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: How to find the hits on the databases and tables in Postgres
Next
From: Michael Paquier
Date:
Subject: Re: relkind='p' has no pg_stat_user_tables