Re: [HACKERS] Gather Merge - Mailing list pgsql-hackers

From Andreas Joseph Krogh
Subject Re: [HACKERS] Gather Merge
Date
Msg-id VisenaEmail.2f.17bd9d65d44ba7b9.15ab7c23b58@tc7-visena
Whole thread Raw
In response to Re: [HACKERS] Gather Merge  (Rushabh Lathia <rushabh.lathia@gmail.com>)
List pgsql-hackers
På fredag 10. mars 2017 kl. 10:34:48, skrev Rushabh Lathia <rushabh.lathia@gmail.com>:
 
 
On Fri, Mar 10, 2017 at 2:42 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På fredag 10. mars 2017 kl. 10:09:22, skrev Rushabh Lathia <rushabh.lathia@gmail.com>:
 
 
On Fri, Mar 10, 2017 at 2:33 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
[...]
The execution-plan seems (unsurprisingly) to depend on data-distribution, so is there a way I can force a GatherMerge?
 
Not directly. GatherMerge cost is mainly depend on parallel_setup_cost,
parallel_tuple_cost and cpu_operator_cost. May be you can force this
by setting this cost low enough. Or another way to force is by disable the
other plans.
 
What plan you are getting now? You not seeing the below error ?
 
ERROR:  GatherMerge child's targetlist doesn't match GatherMerge
 
I'm seeing the same error, it's just that for reproducing it I'd rather not copy my whole dataset.
 
Can you share me a schema information, I will try to reproduce at my side?
 
The relevant schema is this:
 
drop table if EXISTS temp_email_address_owner;
drop table if EXISTS temp_email_delivery;
drop table if EXISTS temp_email_message;

create table temp_email_message(   entity_id BIGSERIAL PRIMARY KEY
);

create table temp_email_delivery(   entity_id BIGSERIAL PRIMARY KEY,   message_id bigint not null references temp_email_message(entity_id),   from_entity_id bigint,   received_timestamp timestamp not null
);

create table temp_email_address_owner(   entity_id BIGSERIAL PRIMARY KEY,   message_id bigint not null references temp_email_message(entity_id),   recipient_id bigint
);

EXPLAIN ANALYSE SELECT
                    em.entity_id               FROM temp_email_delivery del                   JOIN temp_email_message em ON (del.message_id = em.entity_id)               WHERE del.from_entity_id = 279519
                      OR
                      em.entity_id IN (                         SELECT
                              ea_owner.message_id                         FROM temp_email_address_owner ea_owner                         WHERE ea_owner.recipient_id = 279519
                      )               ORDER BY del.received_timestamp DESC
                LIMIT 101 OFFSET 0;
 
.. But I'm having a hard time reproducing it.
I've tried to copy data from the relevant tables to the test-tables (temp_*), adding indexes etc. but Gathre Merge works just fine:
 
│ Limit  (cost=209378.96..209391.05 rows=101 width=16) (actual time=799.380..799.432 rows=101 loops=1)                                                                                                       │
│   ->  Gather Merge  (cost=209378.96..262335.79 rows=442285 width=16) (actual time=799.379..799.420 rows=101 loops=1)                                                                                       │
│         Workers Planned: 4                                                                                                                                                                                 │
│         Workers Launched: 4                                                                                                                                                                                │
│         ->  Sort  (cost=208378.90..208655.33 rows=110571 width=16) (actual time=785.029..785.042 rows=81 loops=5)                                                                                          │
│               Sort Key: del.received_timestamp DESC                                                                                                                                                        │
│               Sort Method: quicksort  Memory: 29kB                                                                                                                                                         │
│               ->  Hash Join  (cost=52036.86..204145.01 rows=110571 width=16) (actual time=400.812..784.907 rows=95 loops=5)                                                                                │
│                     Hash Cond: (del.message_id = em.entity_id)                                                                                                                                             │
│                     Join Filter: ((del.from_entity_id = 279519) OR (hashed SubPlan 1))                                                                                                                     │
│                     Rows Removed by Join Filter: 176799                                                                                                                                                    │
│                     ->  Parallel Seq Scan on temp_email_delivery del  (cost=0.00..142515.18 rows=221118 width=24) (actual time=0.033..211.196 rows=176894 loops=5)                                         │
│                     ->  Hash  (cost=39799.72..39799.72 rows=730772 width=8) (actual time=368.746..368.746 rows=730772 loops=5)                                                                             │
│                           Buckets: 1048576  Batches: 2  Memory Usage: 22496kB                                                                                                                              │
│                           ->  Seq Scan on temp_email_message em  (cost=0.00..39799.72 rows=730772 width=8) (actual time=0.017..208.116 rows=730772 loops=5)                                                │
│                     SubPlan 1                                                                                                                                                                              │
│                       ->  Index Scan using temp_email_address_owner_recipient_id_idx on temp_email_address_owner ea_owner  (cost=0.43..247.32 rows=68 width=8) (actual time=0.072..0.759 rows=175 loops=5) │
│                             Index Cond: (recipient_id = 279519)                                                                                                                                            │
│ Planning time: 2.134 ms                                                                                                                                                                                    │
│ Execution time: 830.313 ms                                                                                                                                                                                 │


 
Can it be that the data-set is created with a PG-version from yesterday, before Gather Merge was commited, then I just recompiled PG and re-installed over the old installation without re-initdb'ing? I saw no catversion.h changes so I assumed this was fine.
 
--
Andreas Joseph Krogh

pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: [HACKERS] Page Scan Mode in Hash Index
Next
From: "Tels"
Date:
Subject: Re: [HACKERS] Upgrading postmaster's log messages about bind/listenerrors