Thread: Query planner riddle (array-related?)

From: Markus
Hi,

I'm running Postgresql 9.6.7 on Debian stretch -- and I'm trying to
understand a query plan, with any hint where to gain further insight
welcome.

Situation: Two tables, both with a bigint source_id as primary key:

gaia.dr2light -- roughly 1.6 billion rows, about 30 columns, all of them
reals, double precisions, bigints.

gaia.dr2epochflux -- roughly 0.5 million rows, about 15 columns, 10 of
which are arrays, typically real[] or double precision[] with 10 .. 30
array elements each.

The source_ids in dr2epochflux are a small subset of the ones in
dr2light.  Also, dr2light has an index on a column called parallax, and 

  select count(*) from gaia.dr2light where parallax>50;

gives 5400 rows in no time.  The distribution is such that this will
actually touch 5400 disk blocks on gaia.dr2light.  Both tables are read
only (i.e., have been ingested and analyzed once and remained unchanged
ever since).

The story:

SELECT * 
FROM gaia.dr2epochflux 
JOIN gaia.dr2light
USING (source_id)
WHERE parallax>50

(query 1) is horribly slow (like, many minutes).  Forcing the planner to
do the right thing, like this:

SELECT * 
FROM gaia.dr2epochflux 
JOIN (SELECT * FROM gaia.dr2light
  WHERE parallax>50 OFFSET 0) AS q
USING (source_id)

(query 2) makes the query execute in about a second (and yields 18 rows);
the OFFSET 0 acts as an optimization fence that keeps the planner from
flattening the subquery.

Why would Postgres choose a dramatically inferior plan?  Here's what
EXPLAIN gives for query 2 (the good plan):

--------------------------------------------------------------------------------------------------------
 Hash Join  (cost=472743.04..26313612.35 rows=551038 width=1647)
   Hash Cond: (q.source_id = dr2epochflux.source_id)
   ->  Subquery Scan on q  (cost=243173.69..25417932.01 rows=12991627 width=132)
         ->  Bitmap Heap Scan on dr2light  (cost=243173.69..25288015.74 rows=12991627 width=132)
               Recheck Cond: (parallax > '50'::double precision)
               ->  Bitmap Index Scan on dr2light_parallax  (cost=0.00..239925.78 rows=12991627 width=0)
                     Index Cond: (parallax > '50'::double precision)
   ->  Hash  (cost=118285.38..118285.38 rows=551038 width=1523)
         ->  Seq Scan on dr2epochflux  (cost=0.00..118285.38 rows=551038 width=1523)


And here's the bad plan (for query 1):

--------------------------------------------------------------------------------------
 Nested Loop  (cost=0.58..4801856.96 rows=4229 width=1647)
   ->  Seq Scan on dr2epochflux  (cost=0.00..118285.38 rows=551038 width=1523)
   ->  Index Scan using dr2light_pkey on dr2light  (cost=0.58..8.49 rows=1 width=132)
         Index Cond: (source_id = dr2epochflux.source_id)
         Filter: (parallax > '50'::double precision)

If I enable_seqscan=0, it comes up with this for query 1:

---------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.00..4810726.18 rows=4229 width=1647)
   ->  Index Scan using dr2epochflux_pkey on dr2epochflux  (cost=0.42..127154.60 rows=551038 width=1523)
   ->  Index Scan using dr2light_pkey on dr2light  (cost=0.58..8.49 rows=1 width=132)
         Index Cond: (source_id = dr2epochflux.source_id)
         Filter: (parallax > '50'::double precision)

-- which in reality appears to be a good deal faster than the "bad"
plan, though still much, much slower than the "good plan".

Both tables are ANALYZE-d, and they should be reasonably VACUUMED.

Is there anything I can do to make it easier for the planner to see the
light?

           -- Markus



Re: Query planner riddle (array-related?)

From: Tom Lane
Date: Fri, May 04, 2018 09:32:08 -0400
Markus <m@tfiu.de> writes:
> I'm running Postgresql 9.6.7 on Debian stretch -- and I'm trying to
> understand a query plan, with any hint where to gain further insight
> welcome.

Well, you say

>   select count(*) from gaia.dr2light where parallax>50;
> gives 5400 rows in no time.

but the planner thinks there are 12991627 such rows:

>          ->  Bitmap Heap Scan on dr2light  (cost=243173.69..25288015.74 rows=12991627 width=132)
>                Recheck Cond: (parallax > '50'::double precision)
>                ->  Bitmap Index Scan on dr2light_parallax  (cost=0.00..239925.78 rows=12991627 width=0)
>                      Index Cond: (parallax > '50'::double precision)

So my first instinct would be to try to get that estimate more in
line with reality.  Maybe you need to increase the statistics target
for that column.

Also, this sort of thing is usually much easier to diagnose from
EXPLAIN ANALYZE output.  All we can see from these queries is that
the planner picked what it thought was the lowest-cost plan.  Without
actual rowcounts it's very hard to guess why the estimates were wrong.
You happened to provide one actual-rowcount number that maybe was
enough to diagnose the issue; but if the above doesn't do the trick,
we're going to need to see EXPLAIN ANALYZE to guess what else is up.

            regards, tom lane






Re: Query planner riddle (array-related?)

From: Markus
Hi Tom,

On Fri, May 04, 2018 at 09:32:08AM -0400, Tom Lane wrote:
> Markus <m@tfiu.de> writes:
> > I'm running Postgresql 9.6.7 on Debian stretch -- and I'm trying to
> > understand a query plan, with any hint where to gain further insight
> > welcome.
> 
> Well, you say
> 
> >   select count(*) from gaia.dr2light where parallax>50;
> > gives 5400 rows in no time.
> 
> but the planner thinks there are 12991627 such rows:

Ah... yeah, the parallax distribution is fairly sharply peaked around
0, so >50 might be severely off.

So, I've run

  alter table gaia.dr2light alter parallax set statistics 1000;
  analyze gaia.dr2light;

and lo and behold, both queries become a good deal faster (a
couple of seconds).  That's essentially enough to make me happy --
thanks!

> Also, this sort of thing is usually much easier to diagnose from
> EXPLAIN ANALYZE output.  All we can see from these queries is that
> the planner picked what it thought was the lowest-cost plan.  Without
> actual rowcounts it's very hard to guess why the estimates were wrong.
> You happened to provide one actual-rowcount number that maybe was
> enough to diagnose the issue; but if the above doesn't do the trick,
> we're going to need to see EXPLAIN ANALYZE to guess what else is up.

With this, the query plans converge to trivial variations of

 Hash Join  (cost=253856.23..4775113.84 rows=422 width=1647) (actual time=1967.095..2733.109 rows=18 loops=1)
   Hash Cond: (dr2light.source_id = dr2epochflux.source_id)
   ->  Bitmap Heap Scan on dr2light  (cost=24286.88..4385601.28 rows=1297329 width=132) (actual time=3.113..19.346 rows=5400 loops=1)
         Recheck Cond: (parallax > '50'::double precision)
         Heap Blocks: exact=5184
         ->  Bitmap Index Scan on dr2light_parallax  (cost=0.00..23962.54 rows=1297329 width=0) (actual time=1.721..1.721 rows=5400 loops=1)
               Index Cond: (parallax > '50'::double precision)
   ->  Hash  (cost=118285.38..118285.38 rows=551038 width=1523) (actual time=1885.177..1885.177 rows=550737 loops=1)
         Buckets: 65536  Batches: 16  Memory Usage: 53292kB
         ->  Seq Scan on dr2epochflux  (cost=0.00..118285.38 rows=551038 width=1523) (actual time=0.008..430.692 rows=550737 loops=1)
 Planning time: 6.504 ms
 Execution time: 2733.653 ms

(with 10% or so jitter in the actual times, obviously); this one is
for

  SELECT *
  FROM gaia.dr2epochflux
  JOIN gaia.dr2light
  USING (source_id)
  WHERE parallax>50


While that's a reasonable plan and fast enough, I'd still like to
keep the box from seqscanning dr2epochflux with its large arrays and
use that table's index on source_id.  If I read the query plan right,
part of the problem is that it still massively overestimates the
result of parallax>50 (1297329 rather than 5400).  Is there anything
I can do to improve that estimate?

But even with that suboptimal estimate, postgres, under the
assumption of something not too far from a uniform distribution on
source_id, should end up estimating the cardinality of the end result
as something like

(selectivity on dr2light)*(cardinality of dr2epochflux),

and hence roughly (1297329/1.6e9*5e5)=405 rows to be drawn from
dr2epochflux.  It would seem a lot smarter to just pull these few 1e2
rows using the source_id index on dr2epochflux than seqscanning that
table, no?
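The back-of-envelope figure can be reproduced numerically. The row counts below are the ones quoted from the plans; the uniform-distribution assumption is the one stated above, not something the planner necessarily uses:

```python
# Back-of-envelope join cardinality, using the row counts quoted above.
# Assumes source_ids are roughly uniformly distributed; the planner's
# real estimate also consults per-column statistics.
dr2light_rows = 1.6e9        # rows in gaia.dr2light
dr2epochflux_rows = 5e5      # rows in gaia.dr2epochflux (~0.5 million)
planner_matches = 1297329    # planner's estimate for parallax > 50
actual_matches = 5400        # real count for parallax > 50

# (selectivity on dr2light) * (cardinality of dr2epochflux)
est = planner_matches / dr2light_rows * dr2epochflux_rows
true_est = actual_matches / dr2light_rows * dr2epochflux_rows

print(round(est))       # 405, the figure above
print(round(true_est))  # about 2 with the true row count
```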

Btw., I've raised the statistics target on dr2light to 1000 for
source_id; so, postgres should know source_id isn't uniformly
distributed, but it should also see it's not *that* far away from it.


           -- Markus



Re: Query planner riddle (array-related?)

From: Tom Lane
Markus <m@tfiu.de> writes:
> Ah... yeah, the parallax distribution is fairly sharply peaked around
> 0, so >50 might be severely off.
> So, I've run
>   alter table gaia.dr2light alter parallax set statistics 1000;
>   analyze gaia.dr2light;
> With this, the query plans converge to trivial variations of

>  Hash Join  (cost=253856.23..4775113.84 rows=422 width=1647) (actual time=1967.095..2733.109 rows=18 loops=1)
>    Hash Cond: (dr2light.source_id = dr2epochflux.source_id)
>    ->  Bitmap Heap Scan on dr2light  (cost=24286.88..4385601.28 rows=1297329 width=132) (actual time=3.113..19.346 rows=5400 loops=1)
>    ->  Hash  (cost=118285.38..118285.38 rows=551038 width=1523) (actual time=1885.177..1885.177 rows=550737 loops=1)

> While that's a reasonable plan and fast enough, I'd still like to
> keep the box from seqscanning dr2epochflux with its large arrays and
> use that table's index on source_id.  If I read the query plan right,
> part of the problem is that it still massively overestimates the
> result of parallax>50 (1297329 rather than 5400).  Is there anything
> I can do to improve that estimate?

Raise the parallax stats target still higher, perhaps.  I think we let you
set it as high as 10000.

> But even with that suboptimal estimate, postgres, under the
> assumption of something not too far from a uniform distribution on
> source_id, should end up estimating the cardinality of the end result
> as something like
> (selectivity on dr2light)*(cardinality of dr2epochflux),
> and hence roughly (1297329/1.6e9*5e5)=405 rows to be drawn from
> dr2epochflux.  It would seem a lot smarter to just pull these few 1e2
> rows using the source_id index on dr2epochflux than seqscanning that
> table, no?

No.  Given the above estimates, it would have to do 1297329 index lookups
in dr2epochflux, which is not going to be a win compared to 1297329 probes
into an in-memory hash table.  Even with a dead-accurate estimate of 5400
dr2light rows to be joined, I don't think an inner indexscan is
necessarily a better plan than a hash.  It's the number of probes that
matters, not the number of successful probes.
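The probe-count argument can be sketched with toy numbers. The per-probe constants below are invented for illustration (PostgreSQL's cost model is far more detailed); the index-lookup and hash-build figures echo the plans quoted earlier:

```python
# Toy cost comparison for joining an estimated 1297329 outer rows.
# The constants are assumptions, not PostgreSQL's actual cost model.
outer_rows = 1_297_329

index_lookup_cost = 8.49   # per-row index scan cost, as in the plans above
hash_probe_cost = 0.01     # assumed: probing an in-memory hash is cheap
hash_build_cost = 118_285  # seq scan of dr2epochflux, as in the plans above

nested_loop_total = outer_rows * index_lookup_cost
hash_join_total = hash_build_cost + outer_rows * hash_probe_cost

# Every outer row pays for a probe whether or not it finds a join
# partner, so the one-time hash build is amortized away quickly.
print(f"nested loop: ~{nested_loop_total:,.0f}")  # on the order of 11 million
print(f"hash join:   ~{hash_join_total:,.0f}")    # on the order of 131 thousand
```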

(It's not clear to me why so few of the dr2light rows have join partners,
but the planner does seem to understand that most of them don't.)

You say you're worried about "large arrays" in dr2epochflux; but if they
are large enough to be toasted out-of-line, it's really a nonissue.  Only
the toast pointers would be read during the seqscan or stored in the hash.

            regards, tom lane