Re: simple query on why a merge join plan got selected - Mailing list pgsql-general
From | Ron |
---|---|
Subject | Re: simple query on why a merge join plan got selected |
Date | |
Msg-id | d0f8fcfd-70f8-089f-6bb3-89daedda8a5b@gmail.com Whole thread Raw |
In response to | simple query on why a merge join plan got selected (Vijaykumar Jain <vjain@opentable.com>) |
List | pgsql-general |
I'd run each query multiple times -- before and after disabling mergejoin -- to ensure that all the data is safely in RAM. On 12/15/2018 02:13 PM, Vijaykumar Jain wrote: > Hey Guys, > > I was just playing with exploring joins and plans i came across this > > create table t1(a int); > create table t2(a int); > insert into t1 select (x % 10) from generate_series(1, 100000) x; > insert into t2 select (x % 100) from generate_series(1, 100000) x; > > pgtesting=> analyze t1; > ANALYZE > > pgtesting=> analyze t2; > ANALYZE > > > this is reproducible > > the below query by default makes use of merge join (which takes way > longer to return rows as compared to when i explicitly disable merge > join it returns in half the time taken by merge join) but i am not > able to figure out why, although i have run analyze on the tables. > > > pgtesting=> explain (analyze, buffers) select * from t1 join t2 using (a); > > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------------- > > Merge Join (cost=19495.64..1039705.09 rows=97241600 width=4) (actual > time=124.153..22243.262 rows=100000000 loops=1) > > Merge Cond: (t1.a = t2.a) > > Buffers: shared hit=886, temp read=320384 written=616 > > -> Sort (cost=9747.82..9997.82 rows=100000 width=4) (actual > time=56.442..81.611 rows=100000 loops=1) > > Sort Key: t1.a > > Sort Method: external merge Disk: 1376kB > > Buffers: shared hit=443, temp read=172 written=173 > > -> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=4) > (actual time=0.030..10.003 rows=100000 loops=1) > > Buffers: shared hit=443 > > -> Sort (cost=9747.82..9997.82 rows=100000 width=4) (actual > time=67.702..9469.366 rows=100000001 loops=1) > > Sort Key: t2.a > > Sort Method: external sort Disk: 1768kB > > Buffers: shared hit=443, temp read=220222 written=443 > > -> Seq Scan on t2 (cost=0.00..1443.00 rows=100000 width=4) > (actual time=0.013..8.186 rows=100000 loops=1) > > Buffers: shared hit=443 > > Planning time: 0.402 ms > > Execution time: 26093.192 ms > > (17 rows) > > > pgtesting=> set enable_mergejoin TO FALSE; > > SET > > pgtesting=> explain (analyze, buffers) select * from t1 join t2 using (a); > > QUERY PLAN > > --------------------------------------------------------------------------------------------------------------------- > > Hash Join (cost=3084.00..1117491.00 rows=97241600 width=4) (actual > time=26.893..10229.924 rows=100000000 loops=1) > > Hash Cond: (t1.a = t2.a) > > Buffers: shared hit=889, temp read=273 written=271 > > -> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=4) > (actual time=0.028..18.123 rows=100000 loops=1) > > Buffers: shared hit=443 > > -> Hash (cost=1443.00..1443.00 rows=100000 width=4) (actual > time=26.255..26.255 rows=100000 loops=1) > > Buckets: 131072 Batches: 2 Memory Usage: 2713kB > > Buffers: shared hit=443, temp written=152 > > -> Seq Scan on t2 (cost=0.00..1443.00 rows=100000 width=4) > (actual time=0.017..9.163 rows=100000 loops=1) > > Buffers: shared hit=443 > > Planning time: 0.099 ms > > Execution time: 14095.975 ms > > (12 rows) > > > pgtesting=> show work_mem; > > work_mem > > ---------- > > 4MB > > (1 row) > > > pgtesting=> show shared_buffers; > > shared_buffers > > ---------------- > > 1GB > > (1 row) > > > pgtesting=> select version(); > > > version > > --------------------------------------------------------------------------------------------------------------------------------------------- > > PostgreSQL 10.5 (Ubuntu 10.5-1.pgdg16.04+1) on x86_64-pc-linux-gnu, > compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, > 64-bit > > (1 row) > > > May be i am missing something way obvious :) but my only concern being > high cardinality joins may not use the statistics correctly? > > Regards, > Vijay > -- Angular momentum makes the world go 'round.
pgsql-general by date: