Thread: Query planner riddle (array-related?)
Hi,

I'm running PostgreSQL 9.6.7 on Debian stretch -- and I'm trying to
understand a query plan, with any hint where to gain further insight
welcome.

Situation: Two tables, both with a bigint source_id as primary key:

gaia.dr2light -- roughly 1.6 billion rows, about 30 columns, all of
them reals, double precisions, bigints.

gaia.dr2epochflux -- roughly 0.5 million rows, about 15 columns, 10 of
which are arrays, typically real[] or double precision[] with 10..30
array elements each.

The source_ids in dr2epochflux are a small subset of the ones in
dr2light.  Also, dr2light has an index on a column called parallax, and

  select count(*) from gaia.dr2light where parallax>50;

gives 5400 rows in no time.  The distribution is such that this will
actually touch 5400 disk blocks on gaia.dr2light.  Both tables are
read-only (i.e., they have been ingested and analyzed once and have
remained unchanged ever since).

The story:

  SELECT * FROM gaia.dr2epochflux
  JOIN gaia.dr2light USING (source_id)
  WHERE parallax>50

(query 1) is horribly slow (like, many minutes).  Forcing the planner
to do the right thing, like this:

  SELECT * FROM gaia.dr2epochflux
  JOIN (SELECT * FROM gaia.dr2light WHERE parallax>50 OFFSET 0) AS q
  USING (source_id)

(query 2) makes the query execute in about a second (and yields 18
rows).  Why would Postgres choose a dramatically inferior plan?
Here's what EXPLAIN gives for query 2 (the good plan):

 Hash Join  (cost=472743.04..26313612.35 rows=551038 width=1647)
   Hash Cond: (q.source_id = dr2epochflux.source_id)
   ->  Subquery Scan on q  (cost=243173.69..25417932.01 rows=12991627 width=132)
         ->  Bitmap Heap Scan on dr2light  (cost=243173.69..25288015.74 rows=12991627 width=132)
               Recheck Cond: (parallax > '50'::double precision)
               ->  Bitmap Index Scan on dr2light_parallax  (cost=0.00..239925.78 rows=12991627 width=0)
                     Index Cond: (parallax > '50'::double precision)
   ->  Hash  (cost=118285.38..118285.38 rows=551038 width=1523)
         ->  Seq Scan on dr2epochflux  (cost=0.00..118285.38 rows=551038 width=1523)

And here's the bad plan (for query 1):

 Nested Loop  (cost=0.58..4801856.96 rows=4229 width=1647)
   ->  Seq Scan on dr2epochflux  (cost=0.00..118285.38 rows=551038 width=1523)
   ->  Index Scan using dr2light_pkey on dr2light  (cost=0.58..8.49 rows=1 width=132)
         Index Cond: (source_id = dr2epochflux.source_id)
         Filter: (parallax > '50'::double precision)

If I set enable_seqscan=0, it comes up with this for query 1:

 Nested Loop  (cost=1.00..4810726.18 rows=4229 width=1647)
   ->  Index Scan using dr2epochflux_pkey on dr2epochflux  (cost=0.42..127154.60 rows=551038 width=1523)
   ->  Index Scan using dr2light_pkey on dr2light  (cost=0.58..8.49 rows=1 width=132)
         Index Cond: (source_id = dr2epochflux.source_id)
         Filter: (parallax > '50'::double precision)

-- which in reality appears to be a good deal faster than the "bad"
plan, though still much, much slower than the "good" plan.

Both tables are ANALYZE-d, and they should be reasonably VACUUM-ed.

Is there anything I can do to make it easier for the planner to see
the light?

         -- Markus
Markus <m@tfiu.de> writes:
> I'm running PostgreSQL 9.6.7 on Debian stretch -- and I'm trying to
> understand a query plan, with any hint where to gain further insight
> welcome.

Well, you say

>   select count(*) from gaia.dr2light where parallax>50;
> gives 5400 rows in no time.

but the planner thinks there are 12991627 such rows:

> ->  Bitmap Heap Scan on dr2light  (cost=243173.69..25288015.74 rows=12991627 width=132)
>       Recheck Cond: (parallax > '50'::double precision)
>       ->  Bitmap Index Scan on dr2light_parallax  (cost=0.00..239925.78 rows=12991627 width=0)
>             Index Cond: (parallax > '50'::double precision)

So my first instinct would be to try to get that estimate more in line
with reality.  Maybe you need to increase the statistics target for
that column.

Also, this sort of thing is usually much easier to diagnose from
EXPLAIN ANALYZE output.  All we can see from these queries is that the
planner picked what it thought was the lowest-cost plan.  Without
actual rowcounts it's very hard to guess why the estimates were wrong.
You happened to provide one actual-rowcount number that maybe was
enough to diagnose the issue; but if the above doesn't do the trick,
we're going to need to see EXPLAIN ANALYZE to guess what else is up.
			regards, tom lane
Hi Tom,

On Fri, May 04, 2018 at 09:32:08AM -0400, Tom Lane wrote:
> Markus <m@tfiu.de> writes:
> > I'm running PostgreSQL 9.6.7 on Debian stretch -- and I'm trying to
> > understand a query plan, with any hint where to gain further insight
> > welcome.
>
> Well, you say
>
> > select count(*) from gaia.dr2light where parallax>50;
> > gives 5400 rows in no time.
>
> but the planner thinks there are 12991627 such rows:

Ah... yeah, the parallax distribution is fairly sharply peaked around
0, so the estimate for >50 might be severely off.

So, I've run

  alter table gaia.dr2light alter parallax set statistics 1000;
  analyze gaia.dr2light;

and lo and behold, both queries become a good deal faster (a couple of
seconds).  That's essentially enough to make me happy -- thanks!

> Also, this sort of thing is usually much easier to diagnose from
> EXPLAIN ANALYZE output.  All we can see from these queries is that
> the planner picked what it thought was the lowest-cost plan.  Without
> actual rowcounts it's very hard to guess why the estimates were wrong.
> You happened to provide one actual-rowcount number that maybe was
> enough to diagnose the issue; but if the above doesn't do the trick,
> we're going to need to see EXPLAIN ANALYZE to guess what else is up.
With this, the query plans converge to trivial variations of

 Hash Join  (cost=253856.23..4775113.84 rows=422 width=1647) (actual time=1967.095..2733.109 rows=18 loops=1)
   Hash Cond: (dr2light.source_id = dr2epochflux.source_id)
   ->  Bitmap Heap Scan on dr2light  (cost=24286.88..4385601.28 rows=1297329 width=132) (actual time=3.113..19.346 rows=5400 loops=1)
         Recheck Cond: (parallax > '50'::double precision)
         Heap Blocks: exact=5184
         ->  Bitmap Index Scan on dr2light_parallax  (cost=0.00..23962.54 rows=1297329 width=0) (actual time=1.721..1.721 rows=5400 loops=1)
               Index Cond: (parallax > '50'::double precision)
   ->  Hash  (cost=118285.38..118285.38 rows=551038 width=1523) (actual time=1885.177..1885.177 rows=550737 loops=1)
         Buckets: 65536  Batches: 16  Memory Usage: 53292kB
         ->  Seq Scan on dr2epochflux  (cost=0.00..118285.38 rows=551038 width=1523) (actual time=0.008..430.692 rows=550737 loops=1)
 Planning time: 6.504 ms
 Execution time: 2733.653 ms

(with 10% or so jitter in the actual times, obviously); this one is for

  SELECT * FROM gaia.dr2epochflux
  JOIN gaia.dr2light USING (source_id)
  WHERE parallax>50

While that's a reasonable plan and fast enough, I'd still like to keep
the box from seqscanning dr2epochflux with its large arrays and use
that table's index on source_id instead.  If I read the query plan
right, part of the problem is that it still massively overestimates
the result of parallax>50 (1297329 rows rather than 5400).  Is there
anything I can do to improve that estimate?

But even with that suboptimal estimate, Postgres, under the assumption
of something not too far from a uniform distribution on source_id,
should end up estimating the cardinality of the end result as
something like

  (selectivity on dr2light) * (cardinality of dr2epochflux),

and hence roughly 1297329/1.6e9 * 5e5 = 405 rows to be drawn from
dr2epochflux.  It would seem a lot smarter to just pull these few 1e2
rows using the source_id index on dr2epochflux than seqscanning that
table, no?
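The back-of-envelope arithmetic above can be checked in a couple of lines, using the figures quoted in this thread (the planner's parallax estimate and the two table sizes):

```python
# Rough expected join cardinality under a near-uniform source_id
# distribution; all figures are the ones quoted in this thread.
est_parallax_rows = 1_297_329   # planner's estimate for parallax > 50
dr2light_rows = 1.6e9           # ~1.6 billion rows in gaia.dr2light
dr2epochflux_rows = 5e5         # ~0.5 million rows in gaia.dr2epochflux

selectivity = est_parallax_rows / dr2light_rows
expected_join_rows = selectivity * dr2epochflux_rows
print(round(expected_join_rows))  # → 405
```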
Btw., I've raised the statistics target to 1000 for source_id on
dr2light as well; so, Postgres should know source_id isn't uniformly
distributed, but it should also see that it's not *that* far away from
it.

         -- Markus
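Why a higher statistics target sharpens the parallax estimate can be sketched with a toy model: PostgreSQL keeps an equi-height histogram per column, so the finest selectivity a range condition can resolve is on the order of 1/(number of histogram buckets), with linear interpolation inside the bucket that straddles the constant. On a sharply peaked distribution, a coarse histogram therefore overstates rare tail conditions. A minimal sketch with deterministic synthetic data (this is an illustration of the idea, not PostgreSQL's actual estimator code):

```python
import math

# A deterministic, sharply peaked distribution with a thin tail
# (exponential quantiles), vaguely like the parallax column here.
N = 1_000_000
values = [-math.log(1 - (i + 0.5) / N) for i in range(N)]  # sorted ascending

def estimate_gt(values, threshold, buckets):
    """Estimate the selectivity of `col > threshold` from an
    equi-height histogram with the given number of buckets."""
    n = len(values)
    bounds = [values[b * n // buckets] for b in range(buckets)]
    bounds.append(values[-1])
    for b in range(buckets):
        lo, hi = bounds[b], bounds[b + 1]
        if lo <= threshold <= hi:
            # Linear interpolation inside the straddling bucket.
            inside = (hi - threshold) / (hi - lo) if hi > lo else 0.0
            return (inside + (buckets - b - 1)) / buckets
    return 1.0 if threshold < bounds[0] else 0.0

true_sel = sum(v > 7 for v in values) / N
print(f"true selectivity:     {true_sel:.6f}")
print(f"100-bucket estimate:  {estimate_gt(values, 7, 100):.6f}")   # ~8x too high
print(f"1000-bucket estimate: {estimate_gt(values, 7, 1000):.6f}")  # near the truth
```

With 100 buckets the threshold falls deep inside the last, very wide bucket and the interpolation badly overestimates; with 1000 buckets the tail is resolved.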
Markus <m@tfiu.de> writes:
> Ah... yeah, the parallax distribution is fairly sharply peaked around
> 0, so >50 might be severely off.
> So, I've run
>   alter table gaia.dr2light alter parallax set statistics 1000;
>   analyze gaia.dr2light;
> With this, the query plans converge to trivial variations of
>  Hash Join  (cost=253856.23..4775113.84 rows=422 width=1647) (actual time=1967.095..2733.109 rows=18 loops=1)
>    Hash Cond: (dr2light.source_id = dr2epochflux.source_id)
>    ->  Bitmap Heap Scan on dr2light  (cost=24286.88..4385601.28 rows=1297329 width=132) (actual time=3.113..19.346 rows=5400 loops=1)
>    ->  Hash  (cost=118285.38..118285.38 rows=551038 width=1523) (actual time=1885.177..1885.177 rows=550737 loops=1)

> While that's a reasonable plan and fast enough, I'd still like to
> keep the box from seqscanning dr2epochflux with its large arrays and
> use that table's index on source_id.  If I read the query plan right,
> part of the problem is that it still massively overestimates the
> result of parallax>50 (1297329 rather than 5400).  Is there anything
> I can do to improve that estimate?

Raise the parallax stats target still higher, perhaps.  I think we let
you set it as high as 10000.

> But even with that suboptimal estimate, Postgres, under the
> assumption of something not too far from a uniform distribution on
> source_id, should end up estimating the cardinality of the end result
> as something like
>   (selectivity on dr2light) * (cardinality of dr2epochflux),
> and hence roughly 1297329/1.6e9 * 5e5 = 405 rows to be drawn from
> dr2epochflux.  It would seem a lot smarter to just pull these few 1e2
> rows using the source_id index on dr2epochflux than seqscanning that
> table, no?

No.  Given the above estimates, it would have to do 1297329 index
lookups in dr2epochflux, which is not going to be a win compared to
1297329 probes into an in-memory hash table.
Even with a dead-accurate estimate of 5400 dr2light rows to be joined,
I don't think an inner indexscan is necessarily a better plan than a
hash.  It's the number of probes that matter, not the number of
successful probes.  (It's not clear to me why so few of the dr2light
rows have join partners, but the planner does seem to understand that
most of them don't.)

You say you're worried about "large arrays" in dr2epochflux; but if
they are large enough to be toasted out-of-line, it's really a
nonissue.  Only the toast pointers would be read during the seqscan or
stored in the hash.

			regards, tom lane
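The probe-count argument can be put in rough numbers. Treating the per-probe index cost (8.49) and the dr2epochflux seqscan cost (118285.38) from the plans quoted earlier as illustrative stand-ins, and using a nominal CPU cost per hash probe, which is an assumption here rather than PostgreSQL's real cost model:

```python
# Back-of-envelope comparison of the two join strategies under the
# planner's estimate of 1,297,329 qualifying dr2light rows.
# The constants are illustrative stand-ins, not the real cost model.
outer_rows = 1_297_329

# Nested loop with an inner index scan: every outer row pays an
# index descent plus (likely) a random heap fetch.
cost_per_index_probe = 8.49          # per-probe cost seen in the quoted plans
nestloop_cost = outer_rows * cost_per_index_probe

# Hash join: scan dr2epochflux once to build the hash table, then
# each outer row pays only a cheap in-memory probe.
seqscan_cost = 118_285.38            # Seq Scan cost from the quoted plans
cost_per_hash_probe = 0.01           # nominal CPU cost (assumed)
hash_cost = seqscan_cost + outer_rows * cost_per_hash_probe

print(f"nested loop: {nestloop_cost:,.0f}")  # roughly two orders of
print(f"hash join:   {hash_cost:,.0f}")      # magnitude apart
```

Under these stand-in numbers the hash join wins by roughly two orders of magnitude, which is exactly why the estimated outer row count, not the 18 actual matches, drives the choice.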