RE: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From k.jamison@fujitsu.com
Subject RE: [Patch] Optimize dropping of relation buffers using dlist
Date
Msg-id OSBPR01MB2341672E9A95E5EC6D2E79B5EF020@OSBPR01MB2341.jpnprd01.prod.outlook.com
In response to RE: [Patch] Optimize dropping of relation buffers using dlist  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
On Tuesday, October 13, 2020 10:09 AM, Tsunakawa-san wrote:
> Why do you do it this way?  I think the previous patch was more correct (while
> agreeing with Horiguchi-san that nTotalBlocks may be unnecessary).  What
> you want to do is "if the size of any fork could be inaccurate, do the traditional
> full buffer scan without performing any optimization for any fork," right?  But
> the above code performs optimization for forks until it finds a fork with
> an inaccurate size.
>
> (2)
> +     * Get the total number of cached blocks and to-be-invalidated
> blocks
> +     * of the relation.  The cached value returned by smgrnblocks could
> be
> +     * smaller than the actual number of existing buffers of the file.
>
> As you changed the meaning of the smgrnblocks() argument from cached to
> accurate, and you no longer calculate the total blocks, the comment should
> reflect that.
>
>
> (3)
> In smgrnblocks(), accurate is not set to false when mdnblocks() is called.
> The caller doesn't initialize the value either, so it can see garbage value.
>
>
> (4)
> +        if (nForkBlocks[i] != InvalidBlockNumber &&
> +            nBlocksToInvalidate <
> BUF_DROP_FULL_SCAN_THRESHOLD)
> +        {
> ...
> +        }
> +        else
> +            break;
> +    }
>
> In cases like this, it's better to reverse the if and else.  Thus, you can reduce
> the nest depth.

Thank you for the review!
1. I have revised the patch to address your comments/feedback. Attached is the latest set of patches.

2. Non-recovery Performance
I also included a debug version of the patch (0004) in which I removed the recovery-related checks
to measure non-recovery performance.
However, I still can't find the cause of why the non-recovery performance
does not change compared to master (1 min 15 s for the test case below).

> -       if (InRecovery && reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
> +       if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)

Here's how I measured it:
0. postgresql.conf setting
shared_buffers = 100GB
autovacuum = off
full_page_writes = off
checkpoint_timeout = 30min
max_locks_per_transaction = 100
wal_log_hints = on
wal_keep_size = 100
max_wal_size = 20GB

1. createdb test

2. Create tables: SELECT create_tables(1000);

create or replace function create_tables(numtabs int)
returns void as $$
declare query_string text;
begin
  for i in 1..numtabs loop
    query_string := 'create table tab_' || i::text || ' (a int);';
    execute query_string;
  end loop;
end;
$$ language plpgsql;

3. Insert rows into the tables (3.5 GB database): SELECT insert_tables(1000);

create or replace function insert_tables(numtabs int)
returns void as $$
declare query_string text;
begin
  for i in 1..numtabs loop
    query_string := 'insert into tab_' || i::text || ' SELECT generate_series(1, 100000);' ;
    execute query_string;
  end loop;
end;
$$ language plpgsql;

4. DELETE FROM tables: SELECT delfrom_tables(1000);

create or replace function delfrom_tables(numtabs int)
returns void as $$
declare query_string text;
begin
  for i in 1..numtabs loop
    query_string := 'delete from tab_' || i::text;
    execute query_string;
  end loop;
end;
$$ language plpgsql;

5. Measure VACUUM timing
\timing
VACUUM;

Using the debug version of the patch, I have confirmed that it enters the optimization path
when the conditions are met. Here are some logs printed from 018_wal_optimize_node_replica.log:
> make world -j4 -s && make -C src/test/recovery/ check PROVE_TESTS=t/018_wal_optimize.pl

WARNING:  current fork 0, nForkBlocks[i] 1, accurate: 1
CONTEXT:  WAL redo at 0/162B4E0 for Storage/TRUNCATE: base/13751/24577 to 0 blocks flags 7
WARNING:  Optimization Loop.
buf_id = 41. nforks = 1. current fork = 0. forkNum: 0 == tag's forkNum: 0. curBlock: 0  <  nForkBlocks[i] = 1. tag
blockNum:0  >=  firstDelBlock[i]: 0. nBlocksToInvalidate = 1 < threshold = 32. 

--
3. Recovery Performance (hot standby, failover)
OTOH, when measuring recovery performance (using the 0003 patch), the results were great.

| s_b   | master | patched | %reg   |
|-------|--------|---------|--------|
| 128MB | 3.043  | 2.977   | -2.22% |
| 1GB   | 3.417  | 3.41    | -0.21% |
| 20GB  | 20.597 | 2.409   | -755%  |
| 100GB | 66.862 | 2.409   | -2676% |

To reproduce this on a hot standby setup (after inserting rows into the tables):
1. [Standby] Pause WAL replay
   SELECT pg_wal_replay_pause();

2. [Master] Measure VACUUM timing. Then stop server.
\timing
VACUUM;
\q
pg_ctl stop -mi -w

3. [Standby] Use the attached script to promote standby and measure the performance.
# test.sh recovery


So the remaining issue I'm investigating is why the performance is bad in the non-recovery case,
while OTOH it's good when InRecovery.

Regards,
Kirk Jamison


