Re: PATCH: optimized DROP of multiple tables within a transaction - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: PATCH: optimized DROP of multiple tables within a transaction
Date:
Msg-id: 50C365CA.4070408@fuzzy.cz
In response to: Re: PATCH: optimized DROP of multiple tables within a transaction (Tomas Vondra <tv@fuzzy.cz>)
Responses:
  Re: PATCH: optimized DROP of multiple tables within a transaction
  Re: PATCH: optimized DROP of multiple tables within a transaction
List: pgsql-hackers
On 8.12.2012 15:49, Tomas Vondra wrote:
> On 8.12.2012 15:26, Andres Freund wrote:
>> On 2012-12-06 23:38:59 +0100, Tomas Vondra wrote:
>>> I've re-run the tests with the current patch on my home workstation, and
>>> the results are these (again 10k tables, dropped either one-by-one or in
>>> batches of 100).
>>>
>>> 1) unpatched
>>>
>>>    dropping one-by-one: 15.5 seconds
>>>    dropping in batches of 100: 12.3 sec
>>>
>>> 2) patched (v3.1)
>>>
>>>    dropping one-by-one: 32.8 seconds
>>>    dropping in batches of 100: 3.0 sec
>>>
>>> The problem here is that when dropping the tables one-by-one, the
>>> bsearch overhead is tremendous and significantly increases the runtime.
>>> I've done a simple check (if dropping a single table, use the original
>>> simple comparison) and I got this:
>>>
>>> 3) patched (v3.2)
>>>
>>>    dropping one-by-one: 16.0 seconds
>>>    dropping in batches of 100: 3.3 sec
>>>
>>> i.e. the best of both. But it seems like unnecessary complexity to me -
>>> if you need to drop a lot of tables you'll probably do that in a
>>> transaction anyway.
>>
>> Imo that's still a pretty bad performance difference. And your
>> single-table optimization will probably fall short as soon as the table
>> has indexes and/or a toast table...
>
> Why? I haven't checked the code, but either those objects are dropped
> one-by-one (which seems unlikely) or they're added to the pending list,
> and then the new code kicks in, making it actually faster.
>
> Yes, the original code might be faster even for 2 or 3 objects, but I
> can't think of a simple solution to fix this, given that it really
> depends on CPU/RAM speed and shared_buffers size.

I've done some tests and yes - once there are other objects, the
optimization falls short.
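For context, the single-table fast path from v3.2 can be sketched roughly like this. Everything below is a simplified illustration, not the actual patch: the struct is a stand-in for PostgreSQL's RelFileNode (the real patch works with RelFileNodeBackend), and `pending_contains` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's RelFileNode: three OIDs, no padding. */
typedef unsigned int Oid;

typedef struct RelFileNode
{
	Oid spcNode;  /* tablespace */
	Oid dbNode;   /* database */
	Oid relNode;  /* relation */
} RelFileNode;

/* memcmp-based comparator; valid because the struct has no padding. */
static int
rnode_comparator(const void *p1, const void *p2)
{
	return memcmp(p1, p2, sizeof(RelFileNode));
}

/*
 * Hypothetical lookup: is "target" among the nodes pending unlink?
 * For a single pending node, a direct comparison avoids the bsearch()
 * call overhead (the v3.2 fast path).  For larger arrays the caller is
 * assumed to have sorted "nodes" with qsort() using the same comparator,
 * so bsearch() applies.
 */
static int
pending_contains(const RelFileNode *nodes, size_t nnodes,
				 const RelFileNode *target)
{
	if (nnodes == 1)
		return rnode_comparator(target, &nodes[0]) == 0;

	return bsearch(target, nodes, nnodes, sizeof(RelFileNode),
				   rnode_comparator) != NULL;
}
```

The key invariant is that qsort() and bsearch() must use the same comparator; the single-element branch merely skips the function-pointer and binary-search machinery when there is nothing to search.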
For example, for tables with one index, it looks like this:

1) unpatched

   one by one: 28.9 s
   100 batches: 23.9 s

2) patched

   one by one: 44.1 s
   100 batches: 4.7 s

So the patched code is about 50% slower for one-by-one drops, but this
difference quickly disappears as the number of indexes / toast tables
etc. grows.

I see this as an argument AGAINST such special-case optimization. My
reasoning is this:

* The difference is significant only if you're dropping tables with a
  low number of indexes / toast tables. In reality this is not going to
  be very frequent.

* If you're dropping a single table, it really does not matter - the
  difference will be like 100 ms vs. 200 ms or something like that.

* If you're dropping a lot of tables, you should do that in a
  transaction anyway, or you should be aware that doing it in a
  transaction will be faster (we can add this info to the DROP TABLE
  docs). IMHO this is similar to doing a lot of INSERTs - doing all of
  them in a single transaction is faster.

The possibility that someone needs to drop a lot of tables in separate
transactions exists in theory, but I don't know of a realistic
real-world use case. So I'd vote to ditch that special-case
optimization.

>>> +
>>> +/*
>>> + * Used to sort relfilenode array (ordered by [relnode, dbnode,
>>> + * spcnode]), so that it's suitable for bsearch.
>>> + */
>>> +static int
>>> +rnode_comparator(const void * p1, const void * p2)
>>> +{
>>> +	RelFileNodeBackend n1 = * (RelFileNodeBackend *) p1;
>>> +	RelFileNodeBackend n2 = * (RelFileNodeBackend *) p2;
>>> +
>>> +	if (n1.node.relNode < n2.node.relNode)
>>> +		return -1;
>>> +	else if (n1.node.relNode > n2.node.relNode)
>>> +		return 1;
>>> +
>>> +	if (n1.node.dbNode < n2.node.dbNode)
>>> +		return -1;
>>> +	else if (n1.node.dbNode > n2.node.dbNode)
>>> +		return 1;
>>> +
>>> +	if (n1.node.spcNode < n2.node.spcNode)
>>> +		return -1;
>>> +	else if (n1.node.spcNode > n2.node.spcNode)
>>> +		return 1;
>>> +	else
>>> +		return 0;
>>> +}
>>
>> ISTM that this whole comparator could be replaced by memcmp(). That
>> could quite possibly lower the overhead of the bsearch() in simple
>> cases. We already rely on the fact that RelFileNode's have no padding
>> atm (c.f. buf_internal.h), so a memcmp() seems fine to me.
>
> Good point, I'll try that. The original code used a macro that simply
> compared the fields and I copied that logic, but if we can remove the
> code ...

Hmm, I've replaced the comparator with this one:

static int
rnode_comparator(const void * p1, const void * p2)
{
	return memcmp(p1, p2, sizeof(RelFileNode));
}

and while it's not significantly faster (actually it's often a bit
slower than the original comparator), it's much simpler code.

Tomas
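As a standalone illustration of the two comparator styles discussed above (again with a simplified stand-in struct, not the real RelFileNodeBackend): the field-wise version imposes the explicit (relNode, dbNode, spcNode) order, while the memcmp() version compares raw bytes in declaration order and is endianness-dependent, so it yields a different (but still valid) total order. For qsort()+bsearch() all that matters is that both use the same comparator.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's RelFileNode: three OIDs, no padding.
 * The no-padding assumption is exactly what makes memcmp() legitimate here. */
typedef unsigned int Oid;

typedef struct RelFileNode
{
	Oid spcNode;  /* tablespace */
	Oid dbNode;   /* database */
	Oid relNode;  /* relation */
} RelFileNode;

/* Field-wise comparator, ordered by (relNode, dbNode, spcNode). */
static int
rnode_comparator_fields(const void *p1, const void *p2)
{
	const RelFileNode *n1 = (const RelFileNode *) p1;
	const RelFileNode *n2 = (const RelFileNode *) p2;

	if (n1->relNode != n2->relNode)
		return (n1->relNode < n2->relNode) ? -1 : 1;
	if (n1->dbNode != n2->dbNode)
		return (n1->dbNode < n2->dbNode) ? -1 : 1;
	if (n1->spcNode != n2->spcNode)
		return (n1->spcNode < n2->spcNode) ? -1 : 1;
	return 0;
}

/* memcmp-based comparator: byte order, endianness-dependent, but a
 * consistent total order - which is all bsearch() requires. */
static int
rnode_comparator_memcmp(const void *p1, const void *p2)
{
	return memcmp(p1, p2, sizeof(RelFileNode));
}
```

Both comparators agree on equality (same bytes compare equal either way); they only disagree on how unequal nodes are ordered, which is invisible to a bsearch() lookup as long as the array was sorted with the same function.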