Home > mailing lists

Thread: How to change order sort of table in HashJoin

How to change order sort of table in HashJoin

From

Man Trieu

Date:

19 November 2016, 08:46:52

Hi Experts,

As in the example below, i think the plan which hash table is created on testtbl2 (the fewer tuples) should be choosen.

Because creating of hash table should faster in testtbl2. But it did not.

I have tried to change the ordering of table by tuning parameter even if using pg_hint_plan but not success.

Why does planner do not choose the plan which hash table is created on testtbl2 (which can take less time)?

And how to change the order?

# I also confirm planner info by rebuild postgresql but not found related usefull info about hash table

---

postgres=# create table testtbl1(id integer, c1 text, c2 text, c3 text, primary key (c1,c2,c3));

CREATE TABLE

postgres=# create table testtbl2(id integer, c1 text, c2 text, c3 text, primary key (c1,c2,c3));

CREATE TABLE

postgres=# insert into testtbl1 select generate_series(1,1000000),random()::text,random()::text,random()::text;

INSERT 0 1000000

postgres=# insert into testtbl2 select * from testtbl1 where id%7 = 0;

INSERT 0 142857

postgres=# explain analyze select * from testtbl1 inner join testtbl2 using(c1,c2,c3);

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------

Hash Join (cost=38775.00..47171.72 rows=1 width=59) (actual time=1120.824..1506.236 rows=142857 loops=1)

Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))

-> Seq Scan on testtbl2 (cost=0.00..3039.57 rows=142857 width=56) (actual time=0.008..27.964 rows=142857 loops=1)

-> Hash (cost=21275.00..21275.00 rows=1000000 width=55) (actual time=1120.687..1120.687 rows=1000000 loops=1)

Buckets: 131072 Batches: 1 Memory Usage: 89713kB

-> Seq Scan on testtbl1 (cost=0.00..21275.00 rows=1000000 width=55) (actual time=0.035..458.522 rows=1000000 loops=1)

Planning time: 0.922 ms

Execution time: 1521.258 ms

(8 rows)

postgres=# set pg_hint_plan.enable_hint to on;

SET

postgres=# /*+

postgres*# HashJoin(testtbl1 testtbl2)

postgres*# Leading(testtbl1 testtbl2)

postgres*# */

postgres-# explain analyze select * from testtbl1 inner join testtbl2 using(c1,c2,c3);

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------

Hash Join (cost=48541.00..67352.86 rows=1 width=59) (actual time=1220.625..1799.709 rows=142857 loops=1)

Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))

-> Seq Scan on testtbl2 (cost=0.00..3039.57 rows=142857 width=56) (actual time=0.011..58.649 rows=142857 loops=1)

-> Hash (cost=21275.00..21275.00 rows=1000000 width=55) (actual time=1219.295..1219.295 rows=1000000 loops=1)

Buckets: 8192 Batches: 32 Memory Usage: 2851kB

-> Seq Scan on testtbl1 (cost=0.00..21275.00 rows=1000000 width=55) (actual time=0.021..397.583 rows=1000000 loops=1)

Planning time: 3.971 ms

Execution time: 1807.710 ms

(8 rows)

postgres=#

---

Thanks and best regard!

Re: How to change order sort of table in HashJoin

From

Melvin Davidson

Date:

19 November 2016, 17:45:35

On Sat, Nov 19, 2016 at 12:46 AM, Man Trieu <man.trieu@gmail.com> wrote:

Hi Experts,

As in the example below, i think the plan which hash table is created on testtbl2 (the fewer tuples) should be choosen.
Because creating of hash table should faster in testtbl2. But it did not.

I have tried to change the ordering of table by tuning parameter even if using pg_hint_plan but not success.

Why does planner do not choose the plan which hash table is created on testtbl2 (which can take less time)?
And how to change the order?

# I also confirm planner info by rebuild postgresql but not found related usefull info about hash table

---
postgres=# create table testtbl1(id integer, c1 text, c2 text, c3 text, primary key (c1,c2,c3));
CREATE TABLE
postgres=# create table testtbl2(id integer, c1 text, c2 text, c3 text, primary key (c1,c2,c3));
CREATE TABLE
postgres=# insert into testtbl1 select generate_series(1,1000000),random()::text,random()::text,random()::text;
INSERT 0 1000000
postgres=# insert into testtbl2 select * from testtbl1 where id%7 = 0;
INSERT 0 142857

postgres=# explain analyze select * from testtbl1 inner join testtbl2 using(c1,c2,c3);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=38775.00..47171.72 rows=1 width=59) (actual time=1120.824..1506.236 rows=142857 loops=1)
Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))
-> Seq Scan on testtbl2 (cost=0.00..3039.57 rows=142857 width=56) (actual time=0.008..27.964 rows=142857 loops=1)
-> Hash (cost=21275.00..21275.00 rows=1000000 width=55) (actual time=1120.687..1120.687 rows=1000000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 89713kB
-> Seq Scan on testtbl1 (cost=0.00..21275.00 rows=1000000 width=55) (actual time=0.035..458.522 rows=1000000 loops=1)
Planning time: 0.922 ms
Execution time: 1521.258 ms
(8 rows)

postgres=# set pg_hint_plan.enable_hint to on;
SET
postgres=# /*+
postgres*# HashJoin(testtbl1 testtbl2)
postgres*# Leading(testtbl1 testtbl2)
postgres*# */
postgres-# explain analyze select * from testtbl1 inner join testtbl2 using(c1,c2,c3);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=48541.00..67352.86 rows=1 width=59) (actual time=1220.625..1799.709 rows=142857 loops=1)
Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))
-> Seq Scan on testtbl2 (cost=0.00..3039.57 rows=142857 width=56) (actual time=0.011..58.649 rows=142857 loops=1)
-> Hash (cost=21275.00..21275.00 rows=1000000 width=55) (actual time=1219.295..1219.295 rows=1000000 loops=1)
Buckets: 8192 Batches: 32 Memory Usage: 2851kB
-> Seq Scan on testtbl1 (cost=0.00..21275.00 rows=1000000 width=55) (actual time=0.021..397.583 rows=1000000 loops=1)
Planning time: 3.971 ms
Execution time: 1807.710 ms
(8 rows)

postgres=#
---

Thanks and best regard!

AFAIK, the only way to change a sort order is to use the ORDER BY clause in the SELECT.
https://www.postgresql.org/docs/9.4/static/sql-select.html

"8. If the ORDER BY clause is specified, the returned rows are sorted in the specified order. If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce."

Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

Re: [HACKERS] How to change order sort of table in HashJoin

From

Tom Lane

Date:

19 November 2016, 19:18:43

Man Trieu <man.trieu@gmail.com> writes:
> As in the example below, i think the plan which hash table is created on
> testtbl2 (the fewer tuples) should be choosen.

The planner usually prefers to hash on the table that has a flatter
MCV histogram, since a hash table with many key collisions will be
inefficient.  You might find it illuminating to read the comments around
estimate_hash_bucketsize().

In general, given a hashtable that fits in memory and light bucket
loading, a hash join is more or less O(M) + O(N); it doesn't matter
so much whether the larger table is on the inside.  It does matter if
the table gets big enough to force batching of the join, but that's
not happening in your example (at least not the first one; it's unclear
to me why it did happen in the second one).  The key thing that will
drive the choice, then, is avoiding a skewed bucket distribution that
causes lots of comparisons for common values.

            regards, tom lane

Re: [HACKERS] How to change order sort of table in HashJoin

From

Man

Date:

20 November 2016, 13:21:46

Thanks for response, sir.

On 11/20/2016 1:18 AM, Tom Lane wrote:
> Man Trieu <man.trieu@gmail.com> writes:
>> As in the example below, i think the plan which hash table is created on
>> testtbl2 (the fewer tuples) should be choosen.
> The planner usually prefers to hash on the table that has a flatter
> MCV histogram, since a hash table with many key collisions will be
> inefficient.  You might find it illuminating to read the comments around
> estimate_hash_bucketsize().

Thanks, I will read it.

Additional information.
In 9.6 the second table (lesser tuple) was choosen (the same testdata).
There are something (cost estimation?) different  in previous versions.

--- In 9.6.1 ---
postgres=# explain analyze select * from testtbl1 inner join testtbl2
using(c1,c2,c3);
                                                          QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=6935.57..60389.58 rows=1 width=60) (actual
time=80.214..1165.762 rows=142857 loops=1)
    Hash Cond: ((testtbl1.c1 = testtbl2.c1) AND (testtbl1.c2 =
testtbl2.c2) AND (testtbl1.c3 = testtbl2.c3))
    ->  Seq Scan on testtbl1  (cost=0.00..21276.00 rows=1000000
width=56) (actual time=0.038..226.324 rows=1000000 loops=1)
    ->  Hash  (cost=3039.57..3039.57 rows=142857 width=56) (actual
time=79.632..79.632 rows=142857 loops=1)
          Buckets: 65536  Batches: 4  Memory Usage: 3658kB
          ->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857
width=56) (actual time=0.028..20.646 rows=142857 loops=1)
  Planning time: 0.252 ms
  Execution time: 1174.588 ms
(8 rows)
------

--- In 9.4.10 ---
postgres=# explain analyze select * from testtbl1 inner join testtbl2
using(c1,c2,c3);
                                                            QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=48542.00..67353.86 rows=1 width=60) (actual
time=880.580..1277.611 rows=142857 loops=1)
    Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 =
testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))
    ->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 width=56)
(actual time=0.016..24.421 rows=142857 loops=1)
    ->  Hash  (cost=21276.00..21276.00 rows=1000000 width=56) (actual
time=878.296..878.296 rows=1000000 loops=1)
          Buckets: 8192  Batches: 32  Memory Usage: 2839kB
          ->  Seq Scan on testtbl1  (cost=0.00..21276.00 rows=1000000
width=56) (actual time=0.025..258.193 rows=1000000 loops=1)
  Planning time: 2.683 ms
  Execution time: 1285.868 ms
(8 rows)
------

> In general, given a hashtable that fits in memory and light bucket
> loading, a hash join is more or less O(M) + O(N); it doesn't matter
> so much whether the larger table is on the inside.  It does matter if
> the table gets big enough to force batching of the join, but that's
> not happening in your example (at least not the first one; it's unclear
> to me why it did happen in the second one).  The key thing that will
> drive the choice, then, is avoiding a skewed bucket distribution that
> causes lots of comparisons for common values.
>
>             regards, tom lane

Thanks and best regards,

Re: [HACKERS] How to change order sort of table in HashJoin

From

Tom Lane

Date:

20 November 2016, 19:39:48

Man <man.trieu@gmail.com> writes:
> Additional information.
> In 9.6 the second table (lesser tuple) was choosen (the same testdata).
> There are something (cost estimation?) different  in previous versions.

I'd bet on different statistics in the two installations (either you
forgot to ANALYZE, or the random sample came up quite a bit different).
And I'm a little suspicious that these tests weren't all done with the
same work_mem setting.

            regards, tom lane

Re: [HACKERS] How to change order sort of table in HashJoin

From

Man

Date:

21 November 2016, 16:12:25

Thanks for reply, sir.

On 11/21/2016 1:39 AM, Tom Lane wrote:
> Man <man.trieu@gmail.com> writes:
>> Additional information.
>> In 9.6 the second table (lesser tuple) was choosen (the same testdata).
>> There are something (cost estimation?) different  in previous versions.
> I'd bet on different statistics in the two installations (either you
> forgot to ANALYZE, or the random sample came up quite a bit different).
> And I'm a little suspicious that these tests weren't all done with the
> same work_mem setting.

I dumped the two tables in pg9.4 and restored to pg9.6, sir.
I also set default_statistics_target to 1000 and ANALYZE d two tables in
both installations.
And so that were result.

Anyway i know that order can not change by tuning parameters because it
depend on storing data, thanks.

>             regards, tom lane

Thanks and best regards,