Re: [GENERAL] How to change order sort of table in HashJoin - Mailing list pgsql-hackers

From Melvin Davidson
Subject Re: [GENERAL] How to change order sort of table in HashJoin
Date
Msg-id CANu8FizPo25iBB0h98LbmwZFV+p1VFznCJ7mkT25Crx9PzLZ7Q@mail.gmail.com
Whole thread Raw
In response to How to change order sort of table in HashJoin  (Man Trieu <man.trieu@gmail.com>)
List pgsql-hackers


On Sat, Nov 19, 2016 at 12:46 AM, Man Trieu <man.trieu@gmail.com> wrote:
Hi Experts,

As in the example below, i think the plan which hash table is created on testtbl2 (the fewer tuples) should be choosen.
Because creating of hash table should faster in testtbl2. But it did not. 

I have tried to change the ordering of table by tuning parameter even if using pg_hint_plan but not success.

Why does planner do not choose the plan which hash table is created on testtbl2 (which can take less time)?
And how to change the order?

# I also confirm planner info by rebuild postgresql but not found related usefull info about hash table  

---
postgres=# create table testtbl1(id integer, c1 text, c2 text, c3 text, primary key (c1,c2,c3));
CREATE TABLE
postgres=# create table testtbl2(id integer, c1 text, c2 text, c3 text, primary key (c1,c2,c3));
CREATE TABLE
postgres=# insert into testtbl1 select generate_series(1,1000000),random()::text,random()::text,random()::text;
INSERT 0 1000000
postgres=# insert into testtbl2 select * from testtbl1 where id%7 = 0;
INSERT 0 142857

postgres=# explain analyze select * from testtbl1 inner join testtbl2 using(c1,c2,c3);
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=38775.00..47171.72 rows=1 width=59) (actual time=1120.824..1506.236 rows=142857 loops=1)
   Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))
   ->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 width=56) (actual time=0.008..27.964 rows=142857 loops=1)
   ->  Hash  (cost=21275.00..21275.00 rows=1000000 width=55) (actual time=1120.687..1120.687 rows=1000000 loops=1)
         Buckets: 131072  Batches: 1  Memory Usage: 89713kB
         ->  Seq Scan on testtbl1  (cost=0.00..21275.00 rows=1000000 width=55) (actual time=0.035..458.522 rows=1000000 loops=1)
 Planning time: 0.922 ms
 Execution time: 1521.258 ms
(8 rows)

postgres=# set pg_hint_plan.enable_hint to on;
SET
postgres=# /*+
postgres*# HashJoin(testtbl1 testtbl2)
postgres*# Leading(testtbl1 testtbl2)
postgres*# */
postgres-# explain analyze select * from testtbl1 inner join testtbl2 using(c1,c2,c3);
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=48541.00..67352.86 rows=1 width=59) (actual time=1220.625..1799.709 rows=142857 loops=1)
   Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))
   ->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 width=56) (actual time=0.011..58.649 rows=142857 loops=1)
   ->  Hash  (cost=21275.00..21275.00 rows=1000000 width=55) (actual time=1219.295..1219.295 rows=1000000 loops=1)
         Buckets: 8192  Batches: 32  Memory Usage: 2851kB
         ->  Seq Scan on testtbl1  (cost=0.00..21275.00 rows=1000000 width=55) (actual time=0.021..397.583 rows=1000000 loops=1)
 Planning time: 3.971 ms
 Execution time: 1807.710 ms
(8 rows)

postgres=#
---


Thanks and best regard!

AFAIK, the only way to change a sort order is to use the ORDER BY clause in the SELECT.
https://www.postgresql.org/docs/9.4/static/sql-select.html

"8. If the ORDER BY clause is specified, the returned rows are sorted in the specified order. If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce."

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Mail thread references in commits
Next
From: Magnus Hagander
Date:
Subject: Re: Mail thread references in commits