Re: Delay Memoize hashtable build until executor run - Mailing list pgsql-hackers

From David Rowley
Subject Re: Delay Memoize hashtable build until executor run
Date
Msg-id CAApHDvp+5gdgc4Znjc3AT2J_Q6py1MjPQ=kPjMH1KaL5PVnZUQ@mail.gmail.com
Whole thread Raw
In response to Delay Memoize hashtable build until executor run  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Delay Memoize hashtable build until executor run
List pgsql-hackers
On Fri, 26 Jan 2024 at 19:54, David Rowley <dgrowleyml@gmail.com> wrote:
> Currently, nodeMemoize.c builds the hashtable for the cache during
> executor startup.  This is not what is done in hash joins. I think we
> should make the two behave the same way.

I ran a few benchmarks on this, mostly for archive purposes.

-- Test 1:  Demonstrate there is a problem

drop table if exists t,r;
create table t (a int);
create table r (a int primary key);
insert into t select x%5000000 from generate_Series(1,20000000)x;
insert into r select x from generate_Series(0,4999999)x;
vacuum analyze t,r;
set work_mem='1GB';
set enable_hashjoin=0;
set enable_mergejoin=0;
set max_parallel_workers_per_gather=0;
\timing on

explain (summary on) select count(*) from t inner join r on t.a=r.a;

set enable_memoize=1;

-- I'm including Planning Time just to show that the extra time is not
spent in planning
Planning Time: 0.094 ms -> Time: 53.061 ms
Planning Time: 0.093 ms -> Time: 53.064 ms
Planning Time: 0.095 ms -> Time: 69.682 ms

set enable_memoize=0;

Planning Time: 0.113 ms -> Time: 0.438 ms
Planning Time: 0.111 ms -> Time: 0.436 ms
Planning Time: 0.113 ms -> Time: 0.445 ms

Conclusion: There's a problem

-- Patched with memoize on
Planning Time: 0.116 ms -> Time: 0.472 ms
Planning Time: 0.118 ms -> Time: 0.444 ms
Planning Time: 0.117 ms -> Time: 0.443 ms

Conclusion: The patch fixes the problem

-- Test 2:  Make sure we're not slowing things down by checking the
table exists each tuple

drop table if exists t,r;

create table t (a int);
create table r (a int primary key);

insert into t select 1 from generate_series(1,1000000);
insert into r select x from generate_series(1,1000000)x;
vacuum analyze t,r;

set enable_hashjoin=0;
set enable_mergejoin=0;
set enable_memoize=1;
set max_parallel_workers_per_gather=0;

-- only 1 cache miss so that we hammer the cache hit code as hard as we can
-- with the smallest hash table possible so lookups are very fast.
explain (analyze, timing off) select count(*) from t inner join r on t.a=r.a;

-- Master
Execution Time: 206.403 ms
Execution Time: 211.472 ms
Execution Time: 204.688 ms

-- Patched
Execution Time: 205.967 ms
Execution Time: 206.406 ms
Execution Time: 205.061 ms

Conclusion: No slowdown.

I'll push this change to master only as there don't seem to have been
any complaints.  We can reconsider that if someone complains.

David



pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: When extended query protocol ends?
Next
From: Tom Lane
Date:
Subject: Re: Make documentation builds reproducible