Re: Recovery performance of DROP DATABASE with many tablespaces - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Recovery performance of DROP DATABASE with many tablespaces
Date
Msg-id CAHGQGwGgHCieoM_Vk6J++OLf8Cy+L7FrQ_VOkAuoX=JkH=-Ztg@mail.gmail.com
Whole thread Raw
In response to RE: Recovery performance of DROP DATABASE with many tablespaces  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Responses RE: Recovery performance of DROP DATABASE with many tablespaces  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
List pgsql-hackers
On Wed, Nov 13, 2019 at 3:57 PM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:
>
> On Wed, Oct. 2, 2019 5:40 PM, Fujii Masao wrote:
> > On Tue, Jul 10, 2018 at 3:04 PM Michael Paquier <michael@paquier.xyz> wrote:
> > >
> > > On Thu, Jul 05, 2018 at 01:42:20AM +0900, Fujii Masao wrote:
> > > > TBH, I have no numbers measured by the test.
> > > > One question about your test is; how did you measure the *recovery
> > > > time* of DROP DATABASE? Since it's *recovery* performance, basically
> > > > it's not easy to measure that.
> > >
> > > It would be simple to measure the time it takes to replay this single
> > > DROP DATABASE record by putting two gettimeofday() calls or such
> > > things and then take the time difference.  There are many methods that
> > > you could use here, and I suppose that with a shared buffer setting of
> > > a couple of GBs of shared buffers you would see a measurable
> > > difference with a dozen of tablespaces or so.  You could also take a
> > > base backup after creating all the tablespaces, connect the standby
> > > and then drop the database on the primary to see the actual time it
> > > takes.  Your patch looks logically correct to me because
> > > DropDatabaseBuffers is a
> > > *bottleneck* with large shared_buffers, and it would be nice to see
> > > numbers.
> >
> > Thanks for the comment!
> >
> > I measured how long it takes to replay DROP DATABASE with 1000 tablespaces,
> > in master and patched version. shared_buffers was set to 16GB.
> >
> > [master]
> > It took 8 seconds to replay DROP DATABASE with 1000 tablespaces, as follows.
> > In this case, 16GB shared_buffers was fully scanned 1000 times.
> >
> >     2019-10-02 16:50:14 JST LOG:  redo starts at 0/2000028
> >     2019-10-02 16:50:22 JST LOG:  redo done at 0/300A298
> >
> > [patched]
> > It took less than 1 second to replay DROP DATABASE with 1000 tablespaces,
> > as follows. In this case, 16GB shared_buffers was scanned only one time.
> >
> >     2019-10-02 16:47:03 JST LOG:  redo starts at 0/2000028
> >     2019-10-02 16:47:03 JST LOG:  redo done at 0/3001588
> >
>
> Hi Fujii-san,
>
> It's been a while, so I checked the patch once again.
> It's fairly straightforward and I saw no problems nor bug in the code.

Thanks for the review!

> > [patched]
> > It took less than 1 second to replay DROP DATABASE with 1000 tablespaces,
> The results are good.
> I want to replicate the performance to confirm the results as well.
> Could you share how you measured the recovery replay?

I forgot the actual steps that I used for the measurement.
But I think they are something like

1. create database "hoge"
2. create 1,000 tablespaces
3. create 1,000 tables on the database "hoge".
    each table should be placed in different tablespace.
4. take a base backup
5. drop database "hoge"
6. shutdown the server with immediate mode
7. start an archive recovery from the backup taken at #4
8. measure how long it takes to apply DROP DATABASE record by
    checking the timestamp at REDO start and REDO end.

I think that I performed the above steps on the master and
the patched version.

> Did you actually execute a failover?

No.

Regards,

-- 
Fujii Masao



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Coding in WalSndWaitForWal
Next
From: Fujii Masao
Date:
Subject: Re: pg_waldump and PREPARE