Re: plan_rows confusion with parallel queries - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: plan_rows confusion with parallel queries
Msg-id: CA+TgmoY4XN2G05Ka8qNmegL4WkRX0HtiZgun9keNKPG8+BXBrA@mail.gmail.com
In response to: Re: plan_rows confusion with parallel queries (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
On Wed, Nov 2, 2016 at 10:44 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Although - it is estimating 1M rows, but only "per worker" estimates are
> shown, and because there are 2 workers planned it says 1M/2.4 which is the
> 416k. I agree it's a bit unclear, but at least it's consistent with how we
> treat loops (i.e. that the numbers are per loop).

Right.  Which I think was a horrible decision.  I think that it would be
best to change EXPLAIN so that the row counts and costs are never divided
by nloops.  That would be a backward-incompatible change, but I think it
would be worth it.  What you typically want to know is the total effort
expended in a particular plan node, and the current system makes that
incredibly difficult to understand, especially because we then round off
the row count estimates to the nearest integer, so that you can't even
reverse the division if you want to (which you always do).

> But there's more fun with joins - consider for example this simple join:
>
>                                  QUERY PLAN
> ------------------------------------------------------------------------------
>  Gather  (cost=19515.96..43404.82 rows=96957 width=12)
>          (actual time=295.167..746.312 rows=99999 loops=1)
>    Workers Planned: 2
>    Workers Launched: 2
>    ->  Hash Join  (cost=18515.96..32709.12 rows=96957 width=12)
>                   (actual time=249.281..670.309 rows=33333 loops=3)
>          Hash Cond: (t2.a = t1.a)
>          ->  Parallel Seq Scan on t2
>                (cost=0.00..8591.67 rows=416667 width=8)
>                (actual time=0.100..184.315 rows=333333 loops=3)
>          ->  Hash  (cost=16925.00..16925.00 rows=96957 width=8)
>                (actual time=246.760..246.760 rows=99999 loops=3)
>                Buckets: 131072  Batches: 2  Memory Usage: 2976kB
>                ->  Seq Scan on t1
>                      (cost=0.00..16925.00 rows=96957 width=8)
>                      (actual time=0.065..178.385 rows=99999 loops=3)
>                      Filter: (b < 100000)
>                      Rows Removed by Filter: 900001
>  Planning time: 0.763 ms
>  Execution time: 793.653 ms
> (13 rows)
>
> Suddenly we don't show per-worker estimates for the hash join - both the
> Hash Join and the Gather have exactly the same cardinality estimate.

I'm not sure why that's happening, but I haven't made any changes to the
costing for a node like hash join.  It doesn't treat the parallel
sequential scan that is coming as its first input any differently than it
would if that were a non-parallel plan.  It's just costing the join
normally, based on an input row count that is lower than what it would be
if it were going to see every row from t2 rather than only some of them.
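To spell out the arithmetic on that plan (a worked example from the
numbers above, not something EXPLAIN prints for you):

   Parallel Seq Scan on t2:  333333 rows/loop * 3 loops = 999999  (~1M total)
   Hash Join:                 33333 rows/loop * 3 loops =  99999  (matches
                              the Gather's actual row count)

On the estimate side, the 416667 is 1M divided by the parallel divisor of
2.4 - if memory serves, costsize.c credits the leader with
1.0 - 0.3 * nworkers = 0.4 of a worker, so two planned workers give
2 + 0.4 = 2.4.  And because the per-loop figures are rounded to integers
before display, multiplying back by loops only recovers the totals
approximately - which is exactly the reversibility problem I'm complaining
about above.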
> So, different join method but same result - 2 workers, loops=3. But let's
> try with small tables (100k rows instead of 1M rows):
>
>                                 QUERY PLAN
> ----------------------------------------------------------------------------
>  Gather  (cost=0.29..36357.94 rows=100118 width=12)
>          (actual time=13.219..589.723 rows=100000 loops=1)
>    Workers Planned: 1
>    Workers Launched: 1
>    Single Copy: true
>    ->  Nested Loop  (cost=0.29..36357.94 rows=100118 width=12)
>                     (actual time=0.288..442.821 rows=100000 loops=1)
>          ->  Seq Scan on t1  (cost=0.00..1444.18 rows=100118 width=8)
>                              (actual time=0.148..49.308 rows=100000 loops=1)
>          ->  Index Scan using t2_a_idx on t2
>                (cost=0.29..0.34 rows=1 width=8)
>                (actual time=0.002..0.002 rows=1 loops=100000)
>                Index Cond: (a = t1.a)
>  Planning time: 0.483 ms
>  Execution time: 648.941 ms
> (10 rows)
>
> Suddenly, we get nworkers=1 with loops=1 (and not nworkers+1 as before).
> FWIW I've only seen this with force_parallel_mode=on, and the row counts
> are correct, so perhaps that's OK. single_copy seems a bit
> underdocumented, though.

This is entirely as expected.  Single-copy means that there's one process
running the non-parallel plan beneath the Gather, and that's it.  So the
Gather is just a pass-through node here, like a Materialize or a Sort: the
number of input rows and the number of output rows have to be the same.
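The quickest way to see that behavior (a sketch - any table will do, t1
here just reuses your example):

   SET force_parallel_mode = on;
   EXPLAIN ANALYZE SELECT * FROM t1;

With force_parallel_mode=on the planner puts a single-copy Gather on top
of an otherwise-unmodified plan: the one worker runs the entire child plan
exactly once, and the leader just forwards its tuples, so the Gather and
its child necessarily report the same row counts, with loops=1.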
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company