Re: [HACKERS] Gather Merge - Mailing list pgsql-hackers
From: Andreas Joseph Krogh
Subject: Re: [HACKERS] Gather Merge
Msg-id: VisenaEmail.2f.17bd9d65d44ba7b9.15ab7c23b58@tc7-visena
In response to: Re: [HACKERS] Gather Merge (Rushabh Lathia <rushabh.lathia@gmail.com>)
List: pgsql-hackers
On Friday, March 10, 2017 at 10:34:48, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

> On Fri, Mar 10, 2017 at 2:42 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
>> On Friday, March 10, 2017 at 10:09:22, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
>>> On Fri, Mar 10, 2017 at 2:33 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
>>>> [...] The execution plan seems (unsurprisingly) to depend on the data distribution, so is there a way I can force a Gather Merge?
>>>
>>> Not directly. Gather Merge cost depends mainly on parallel_setup_cost, parallel_tuple_cost and cpu_operator_cost. You may be able to force it by setting those costs low enough, or by disabling the other plans.
>>>
>>> What plan are you getting now? Are you not seeing the below error?
>>>
>>> ERROR: GatherMerge child's targetlist doesn't match GatherMerge
>>
>> I'm seeing the same error; it's just that, for reproducing it, I'd rather not copy my whole dataset.
>
> Can you share the schema information? I will try to reproduce on my side.
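As a rough sketch of the cost-tweaking approach described above (the zero values are illustrative extremes for a test session, not tuning advice):

```sql
-- Make parallel plans look nearly free so Gather Merge wins the cost comparison.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
-- Ensure workers are actually available for the Gather Merge node.
SET max_parallel_workers_per_gather = 4;
```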
The relevant schema is this:
drop table if EXISTS temp_email_address_owner;
drop table if EXISTS temp_email_delivery;
drop table if EXISTS temp_email_message;

create table temp_email_message (
    entity_id BIGSERIAL PRIMARY KEY
);

create table temp_email_delivery (
    entity_id BIGSERIAL PRIMARY KEY,
    message_id bigint not null references temp_email_message(entity_id),
    from_entity_id bigint,
    received_timestamp timestamp not null
);

create table temp_email_address_owner (
    entity_id BIGSERIAL PRIMARY KEY,
    message_id bigint not null references temp_email_message(entity_id),
    recipient_id bigint
);
EXPLAIN ANALYSE
SELECT em.entity_id
FROM temp_email_delivery del
JOIN temp_email_message em ON (del.message_id = em.entity_id)
WHERE del.from_entity_id = 279519
   OR em.entity_id IN (
       SELECT ea_owner.message_id
       FROM temp_email_address_owner ea_owner
       WHERE ea_owner.recipient_id = 279519
   )
ORDER BY del.received_timestamp DESC
LIMIT 101 OFFSET 0;
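For anyone trying to reproduce without the original dataset, one way to synthesize rows at roughly the scale shown in the plan below (~730k messages, ~884k deliveries; the value distributions are guesses, not the real data):

```sql
-- Illustrative synthetic data; row counts taken from the EXPLAIN output,
-- distributions are arbitrary.
INSERT INTO temp_email_message (entity_id)
SELECT g FROM generate_series(1, 730772) g;

INSERT INTO temp_email_delivery (message_id, from_entity_id, received_timestamp)
SELECT (random() * 730771 + 1)::bigint,
       (random() * 300000)::bigint,
       now() - random() * interval '365 days'
FROM generate_series(1, 884470);

INSERT INTO temp_email_address_owner (message_id, recipient_id)
SELECT (random() * 730771 + 1)::bigint,
       (random() * 300000)::bigint
FROM generate_series(1, 800000);

ANALYZE;
```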
... but I'm having a hard time reproducing it.
I've tried copying data from the relevant tables into the test tables (temp_*), adding indexes etc., but Gather Merge works just fine:
│ Limit (cost=209378.96..209391.05 rows=101 width=16) (actual time=799.380..799.432 rows=101 loops=1) │
│ -> Gather Merge (cost=209378.96..262335.79 rows=442285 width=16) (actual time=799.379..799.420 rows=101 loops=1) │
│ Workers Planned: 4 │
│ Workers Launched: 4 │
│ -> Sort (cost=208378.90..208655.33 rows=110571 width=16) (actual time=785.029..785.042 rows=81 loops=5) │
│ Sort Key: del.received_timestamp DESC │
│ Sort Method: quicksort Memory: 29kB │
│ -> Hash Join (cost=52036.86..204145.01 rows=110571 width=16) (actual time=400.812..784.907 rows=95 loops=5) │
│ Hash Cond: (del.message_id = em.entity_id) │
│ Join Filter: ((del.from_entity_id = 279519) OR (hashed SubPlan 1)) │
│ Rows Removed by Join Filter: 176799 │
│ -> Parallel Seq Scan on temp_email_delivery del (cost=0.00..142515.18 rows=221118 width=24) (actual time=0.033..211.196 rows=176894 loops=5) │
│ -> Hash (cost=39799.72..39799.72 rows=730772 width=8) (actual time=368.746..368.746 rows=730772 loops=5) │
│ Buckets: 1048576 Batches: 2 Memory Usage: 22496kB │
│ -> Seq Scan on temp_email_message em (cost=0.00..39799.72 rows=730772 width=8) (actual time=0.017..208.116 rows=730772 loops=5) │
│ SubPlan 1 │
│ -> Index Scan using temp_email_address_owner_recipient_id_idx on temp_email_address_owner ea_owner (cost=0.43..247.32 rows=68 width=8) (actual time=0.072..0.759 rows=175 loops=5) │
│ Index Cond: (recipient_id = 279519) │
│ Planning time: 2.134 ms │
│ Execution time: 830.313 ms │
Can it be that the dataset was created with a PG version from yesterday, before Gather Merge was committed, and that I then just recompiled PG and re-installed over the old installation without re-initdb'ing? I saw no catversion.h changes, so I assumed this was fine.
--
Andreas Joseph Krogh