Re: PATCH: optimized DROP of multiple tables within a transaction

From: 花田 茂 (Shigeru HANADA)
Subject: Re: PATCH: optimized DROP of multiple tables within a transaction
Msg-id: C140EC80-DF0D-4963-A65E-5E49E20B2781@gmail.com
In response to: Re: PATCH: optimized DROP of multiple tables within a transaction (Tomas Vondra <tv@fuzzy.cz>)
List: pgsql-hackers
Hi Tomas,

On 2012/10/17, at 20:45, Tomas Vondra <tv@fuzzy.cz> wrote:
>
> On 17.10.2012 12:34, Shigeru HANADA wrote:
>> Performance test
>> ================
>> I tested the 1000-table case (each table is a copy of pgbench_accounts
>> with 100000 rows) on a server with 1GB shared_buffers.  Please note
>> that I tested on a MacBook Air, i.e. storage is SSD, not HDD.  Here is
>> the test procedure:
>>
>> 1) loop 1000 times
>> 1-1) create copy table of pgbench_accounts as accounts$i
>> 1-2) load 100000 rows
>> 1-3) add primary key
>> 1-4) select all rows to cache pages in shared buffer
>> 2) BEGIN
>> 3) loop 1000 times
>> 3-1) DROP TABLE accounts$i
>> 4) COMMIT
>
> I don't think the 'load rows' and 'select all rows' steps are really
> necessary.  And AFAIK sequential scans use a small circular buffer so as
> not to pollute shared buffers, so I'd guess the pages are not cached in
> shared buffers anyway.  Have you verified that, e.g. with pg_buffercache?

Oops, you're right.  I omitted 1-3 and 1-4 in the actual measurement, but IMO loading the data is still necessary to
fill shared buffers up, because the number of buffers invalidated during COMMIT is the major factor for this patch.
And yes, I verified that all shared buffers are in use once the loading has finished.
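For reference, the setup and the check would look roughly like this (a sketch rather than my exact script; it folds
steps 1-1 and 1-2 into a single CREATE TABLE ... AS, skips 1-3 and 1-4 as mentioned, and assumes contrib/pg_buffercache
is installed):

-- Setup: 1000 copies of pgbench_accounts (100000 rows each at scale
-- factor 1), covering steps 1-1 and 1-2 of the procedure above.
DO $$
BEGIN
    FOR i IN 1..1000 LOOP
        EXECUTE format(
            'CREATE TABLE accounts%s AS SELECT * FROM pgbench_accounts', i);
    END LOOP;
END;
$$;

-- Check: unused buffers show a NULL relfilenode in pg_buffercache,
-- so this should return 0 once shared_buffers is completely filled.
SELECT count(*) AS unused_buffers
  FROM pg_buffercache
 WHERE relfilenode IS NULL;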

>
>>> Our system creates a lot of "working tables" (even 100.000) and we need
>>> to perform garbage collection (dropping obsolete tables) regularly. This
>>> often took ~ 1 hour, because we're using big AWS instances with lots of
>>> RAM (which tends to be slower than RAM on bare hw). After applying this
>>> patch and dropping tables in groups of 100, the gc runs in less than 4
>>> minutes (i.e. a 15x speed-up).
>>
>> Hm, my environment seems very different from yours.  Could you show the
>> setting of shared_buffers in your environment?  I'd like to make my test
>> environment as similar as possible to yours.
>
> We're using m2.4xlarge instances (70G of RAM) with 10GB shared buffers.

Thank you, that's much larger than I expected.  I'm not sure whether my boss would allow me such a rich environment... :(


Here are the results of additional measurements on my MBA.

* stats of 1000 bare DROP TABLE statements

The 90th percentile for the patched build is just 2% slower than master, so it should be acceptable.
        |  Patched   |   Master
--------+------------+------------
Average |   1.595 ms |   1.634 ms
Median  |   1.791 ms |   1.900 ms
90%ile  |   2.517 ms |   2.477 ms
Max     |  37.526 ms |  24.553 ms

* Total time to complete 1000 DROP TABLEs and COMMIT
      | Patched |  Master
------+---------+---------
Bare  | 1595 ms | 1634 ms
In TX |  672 ms | 1459 ms
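FWIW, the two cases were driven essentially like this (again a sketch; the DO block is just a convenient way to issue
the 1000 DROPs from a single session):

-- Bare case: autocommit, so every DROP is its own transaction and
-- pays the buffer invalidation cost at its own commit.
\timing on
DROP TABLE accounts1;
DROP TABLE accounts2;
-- ... through accounts1000

-- In-TX case: all 1000 DROPs commit together, so (as I understand the
-- patch) shared_buffers is scanned once for the whole batch at COMMIT
-- instead of once per dropped table.
BEGIN;
DO $$
BEGIN
    FOR i IN 1..1000 LOOP
        EXECUTE format('DROP TABLE accounts%s', i);
    END LOOP;
END;
$$;
COMMIT;

That batching is where the gain comes from: 672 ms vs. 1459 ms is roughly a 2.2x speed-up.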

Regards,
--
Shigeru HANADA
shigeru.hanada@gmail.com
