RE: Recovery performance of DROP DATABASE with many tablespaces - Mailing list pgsql-hackers
From | k.jamison@fujitsu.com |
---|---|
Subject | RE: Recovery performance of DROP DATABASE with many tablespaces |
Date | |
Msg-id | OSBPR01MB32077D0AF3728A1F92469C6BEF4C0@OSBPR01MB3207.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Recovery performance of DROP DATABASE with many tablespaces (Fujii Masao <masao.fujii@gmail.com>) |
Responses |
Re: Recovery performance of DROP DATABASE with many tablespaces
|
List | pgsql-hackers |
On Wed, Nov 13, 2019 5:34PM (GMT+9), Fujii Masao wrote: > On Wed, Nov 13, 2019 at 3:57 PM k.jamison@fujitsu.com <k.jamison@fujitsu.com> > wrote: > > > > On Wed, Oct. 2, 2019 5:40 PM, Fujii Masao wrote: > > > On Tue, Jul 10, 2018 at 3:04 PM Michael Paquier <michael@paquier.xyz> > wrote: > > > > > > > > On Thu, Jul 05, 2018 at 01:42:20AM +0900, Fujii Masao wrote: > > > > > TBH, I have no numbers measured by the test. > > > > > One question about your test is; how did you measure the > > > > > *recovery > > > > > time* of DROP DATABASE? Since it's *recovery* performance, > > > > > basically it's not easy to measure that. > > > > > > > > It would be simple to measure the time it takes to replay this > > > > single DROP DATABASE record by putting two gettimeofday() calls or > > > > such things and then take the time difference. There are many > > > > methods that you could use here, and I suppose that with a shared > > > > buffer setting of a couple of GBs of shared buffers you would see > > > > a measurable difference with a dozen of tablespaces or so. You > > > > could also take a base backup after creating all the tablespaces, > > > > connect the standby and then drop the database on the primary to > > > > see the actual time it takes. Your patch looks logically correct > > > > to me because DropDatabaseBuffers is a > > > > *bottleneck* with large shared_buffers, and it would be nice to > > > > see numbers. > > > > > > Thanks for the comment! > > > > > > I measured how long it takes to replay DROP DATABASE with 1000 > > > tablespaces, in master and patched version. shared_buffers was set to > 16GB. > > > > > > [master] > > > It took 8 seconds to replay DROP DATABASE with 1000 tablespaces, as follows. > > > In this case, 16GB shared_buffers was fully scanned 1000 times. > > > > > > 2019-10-02 16:50:14 JST LOG: redo starts at 0/2000028 > > > 2019-10-02 16:50:22 JST LOG: redo done at 0/300A298 > > > > > > [patched] > > > It took less than 1 second to replay DROP DATABASE with 1000 > > > tablespaces, as follows. In this case, 16GB shared_buffers was scanned > only one time. > > > > > > 2019-10-02 16:47:03 JST LOG: redo starts at 0/2000028 > > > 2019-10-02 16:47:03 JST LOG: redo done at 0/3001588 > > > > > > > Hi Fujii-san, > > > > It's been a while, so I checked the patch once again. > > It's fairly straightforward and I saw no problems nor bug in the code. > > Thanks for the review! > > > > [patched] > > > It took less than 1 second to replay DROP DATABASE with 1000 > > > tablespaces, > > The results are good. > > I want to replicate the performance to confirm the results as well. > > Could you share how you measured the recovery replay? > > I forgot the actual steps that I used for the measurement. > But I think they are something like > > 1. create database "hoge" > 2. create 1,000 tablespaces > 3. create 1,000 tables on the database "hoge". > each table should be placed in different tablespace. > 4. take a base backup > 5. drop database "hoge" > 6. shutdown the server with immediate mode 7. start an archive recovery from > the backup taken at #4 8. measure how long it takes to apply DROP DATABASE > record by > checking the timestamp at REDO start and REDO end. > > I think that I performed the above steps on the master and the patched version. > > > Did you actually execute a failover? > > No. I'm sorry for the late reply, and thanks for the guide above. I replicated the same recovery test above on a standalone server and have confirmed with the logs that the patch made the recovery faster. [MASTER/UNPATCHED] ~10 seconds 2019-11-19 15:25:23.891 JST [23042] LOG: redo starts at 0/180006A0 ... 2019-11-19 15:25:34.492 JST [23042] LOG: redo done at 0/1800A478 [PATCHED] ~less than 1 sec 2019-11-19 15:31:59.415 JST [17625] LOG: redo starts at 0/40005B8 ... 2019-11-19 15:32:00.159 JST [17625] CONTEXT: WAL redo at 0/4000668 for Database/DROP: dir 1663/16384 16385/16384...//furtherdetails ommitted//... ... 2019-11-19 15:32:00.159 JST [17625] LOG: redo done at 0/4001638 I believe there are no problems, so I am marking this patch now as "Ready for Committer". Regards, Kirk Jamison
pgsql-hackers by date: