Re: PATCH: optimized DROP of multiple tables within a transaction - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PATCH: optimized DROP of multiple tables within a transaction
Date
Msg-id 509FFE4E.1070002@fuzzy.cz
Whole thread Raw
In response to Re: PATCH: optimized DROP of multiple tables within a transaction  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: PATCH: optimized DROP of multiple tables within a transaction
List pgsql-hackers
Hi,

I've prepared a slightly updated patch, based on the previous review.
See it attached.

On 18.10.2012 04:28, 花田 茂 wrote:> Hi Tomas,
>
> On 2012/10/17, at 20:45, Tomas Vondra <tv@fuzzy.cz> wrote:
>>
>> Dne 17.10.2012 12:34, Shigeru HANADA napsal:
>>> Performance test
>>> ================
>>> I tested 1000 tables case (each is copy of pgbench_branches with
>>> 100000 rows) on 1GB shared_buffers server. Please note that I
>>> tested on MacBook air, i.e. storage is not HDD but SSD. Here is
>>> the test procedure:
>>>
>>> 1) loop 1000 times
>>> 1-1) create copy table of pgbench_accounts as accounts$i
>>> 1-2) load 100000 rows
>>> 1-3) add primary key
>>> 1-4) select all rows to cache pages in shared buffer
>>> 2) BEGIN
>>> 3) loop 1000 times
>>> 3-1) DROP TABLE accounts$i
>>> 4) COMMIT
>>
>> I don't think the 'load rows' and 'select all rows' is really
>> necessary. And AFAIK sequential scans use small circular buffer
>> not to pollute sharedbuffers, so I'd guess the pages are not cached
>> in shared buffers anyway. Have you verified that, e.g. by
>> pg_buffercache?
>
> Oops, you're right. I omitted 1-3 and 1-4 in actual measurement, but
> IMO loading data is necessary to fill the shared buffer up, because
> # of buffers which are deleted during COMMIT is major factor of this
> patch. And, yes, I verified that all shared buffers are used after
> all loading have been finished.

I don't think it's an important factor, as the original code does this:

  for (i = 0; i < NBuffers; i++)
  {
    volatile BufferDesc *bufHdr = &BufferDescriptors[i];
    ...
  }

in the DropRelFileNodeAllBuffers. That loops through all shared buffers
no matter what, so IMHO the performance in this case depends on the
total size of the shared buffers. But maybe I'm missing something important.

>>>> Our system creates a lot of "working tables" (even 100.000)
>>>> and we need to perform garbage collection (dropping obsolete
>>>> tables) regularly. Thisoften took ~ 1 hour, because we're using
>>>> big AWS instances with lots of RAM (which tends to be slower
>>>> than RAM on bare hw). After applying this patch and dropping
>>>> tables in groups of 100, the gc runs in less than 4 minutes
>>>> (i.e. a 15x speed-up).
>>>
>>> Hm, my environment seems very different from yours. Could you
>>> show the setting of shared_buffers in your environment? I'd like
>>> to make my test environment as similar as possible to yours.
>>
>> We're using m2.4xlarge instances (70G of RAM) with 10GB shared
>> buffers.
>
> Thank you, it's more huge than I expected. I'm not sure whether my
> boss allows me to use such rich environment... :(

I've done a simple benchmark on my laptop with 2GB shared buffers, it's
attached in the drop-test.py (it's a bit messy, but it works).

It does four things:

1) creates N tables
2) drops them one by one
3) creates N tables
4) drops them in batches (batch = one transaction)

To run the script, simply specify number of tables you want to work with
(say 10000), size of the batch (e.g. 100) and connection string (e.g.
'host=localhost dbname=mydb').

With those parameters, I got these numbers on the laptop:

  creating 10000 tables
    all tables created in 3.298694 seconds
  dropping 10000 tables one by one ...
    all tables dropped in 32.692478 seconds
  creating 10000 tables
    all tables created in 3.458178 seconds
  dropping 10000 tables in batches by 100...
    all tables dropped in 3.28268 seconds

So it's 33 seconds vs. 3.3 seconds, i.e. 10x speedup. On AWS we usually
get larger differences, as we use larger shared buffers and the memory
is significantly slower there.

Regarding the other comments:

> * As Robert commented, this patch adds DropRelFileNodeAllBuffersList
> by copying code from DropRelFileNodeAllBuffers. Please refactor it
> to avoid code duplication.

Yes, I've merged the two functions, i.e. I've removed the original
DropRelFileNodeAllBuffers and used the name for the new one (taking
array instead of single relnode). Then I've modified the existing calls
to to use

    DropRelFileNodeAllBuffers(&relnode, 1)

instead of the original one

    DropRelFileNodeAllBuffers(relnode)

Maybe this should be done for smgrdounlink/smgrdounlinkall too.

> * In smgrDoPendingDeletes, you free srels explicitly. Can't we leave
>  them to memory context stuff? Even it is required, at least pfree
> must be called in the case nrels == 0 too.

Hmmm, right. Not sure which choice is better, so for now I've moved the
pfree out of the 'if' block, so it's executed for 'nrels==0' too.

> * In smgrDoPendingDeletes, the buffer srels is expanded in every
> iteration. This seems a bit inefficient. How about doubling the
> capacity when used it up? This requires additional variable, but
> reduces repalloc call considerably.

Done, although I haven't seen any significant speed improvement.

> * Just an idea, but if we can remove entries for local relations from
> rnodes array before buffer loop in DropRelFileNodeAllBuffersList,
> following bsearch might be more efficient, though dropping many
> temporary tables might be rare.

My reasoning, exactly. But maybe it should be done to keep the code
clean, i.e. not letting temp tables to code paths where they are not
expected.

Tomas

Attachment

pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Next
From: Noah Misch
Date:
Subject: Re: Unresolved error 0xC0000409 on Windows Server