Home > mailing lists

Re: Query planner riddle (array-related?) - Mailing list pgsql-general

From	Tom Lane
Subject	Re: Query planner riddle (array-related?)
Date	May 4, 2018 16:32:08
Msg-id	25193.1525440728@sss.pgh.pa.us Whole thread Raw
In response to	Query planner riddle (array-related?) (Markus <m@tfiu.de>)
Responses	Re: Query planner riddle (array-related?)
List	pgsql-general

Tree view

Markus <m@tfiu.de> writes:
> I'm running Postgresql 9.6.7 on Debian stretch -- and I'm trying to
> understand a query plan, with any hint where to gain further insight
> welcome.

Well, you say

>   select count(*) from gaia.dr2light where parallax>50;
> gives 5400 rows in no time.

but the planner thinks there are 12991627 such rows:

>          ->  Bitmap Heap Scan on dr2light  (cost=243173.69..25288015.74 rows=12991627 width=132)
>                Recheck Cond: (parallax > '50'::double precision)
>                ->  Bitmap Index Scan on dr2light_parallax  (cost=0.00..239925.78 rows=12991627 width=0)
>                      Index Cond: (parallax > '50'::double precision)

So my first instinct would be to try to get that estimate more in
line with reality.  Maybe you need to increase the statistics target
for that column.

Also, this sort of thing is usually much easier to diagnose from
EXPLAIN ANALYZE output.  All we can see from these queries is that
the planner picked what it thought was the lowest-cost plan.  Without
actual rowcounts it's very hard to guess why the estimates were wrong.
You happened to provide one actual-rowcount number that maybe was
enough to diagnose the issue; but if the above doesn't do the trick,
we're going to need to see EXPLAIN ANALYZE to guess what else is up.

            regards, tom lane




>    ->  Hash  (cost=118285.38..118285.38 rows=551038 width=1523)
>          ->  Seq Scan on dr2epochflux  (cost=0.00..118285.38 rows=551038 width=1523)


> And here's the bad plan (for query 1):

> --------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.58..4801856.96 rows=4229 width=1647)
>    ->  Seq Scan on dr2epochflux  (cost=0.00..118285.38 rows=551038 width=1523)
>    ->  Index Scan using dr2light_pkey on dr2light  (cost=0.58..8.49 rows=1 width=132)
>          Index Cond: (source_id = dr2epochflux.source_id)
>          Filter: (parallax > '50'::double precision)

> If I enable_seqscan=0, it comes up with this for query 1:

> ---------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=1.00..4810726.18 rows=4229 width=1647)
>    ->  Index Scan using dr2epochflux_pkey on dr2epochflux  (cost=0.42..127154.60 rows=551038 width=1523)
>    ->  Index Scan using dr2light_pkey on dr2light  (cost=0.58..8.49 rows=1 width=132)
>          Index Cond: (source_id = dr2epochflux.source_id)
>          Filter: (parallax > '50'::double precision)

> -- which in reality appears to be a good deal faster than the "bad"
> plan, though still much, much slower than the "good plan".

> Both tables are ANALYZE-d, and they should be reasonably VACUUMED.

> Is there anything I can do to make it easier for the planner to see the
> light?

>            -- Markus

pgsql-general by date:

From: Adrian Klaver
Date: 04 May 2018, 16:21:00
Subject: Re: How to find the hits on the databases and tables in Postgres

From: Michael Paquier
Date: 04 May 2018, 17:13:33
Subject: Re: relkind='p' has no pg_stat_user_tables

Re: Query planner riddle (array-related?) - Mailing list pgsql-general

Previous

Next