Re: Large number of open(2) calls with bulk INSERT into empty table - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Large number of open(2) calls with bulk INSERT into empty table
Date
Msg-id CA+TgmobOxE3NHyFYQU9GEpes8zocGz2PVNuQfxfZ8_2j0kQAzA@mail.gmail.com
In response to Re: Large number of open(2) calls with bulk INSERT into empty table  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Dec 6, 2011 at 8:12 PM, Andres Freund <andres@anarazel.de> wrote:
> On Tuesday, December 06, 2011 08:53:42 PM Robert Haas wrote:
>> On Tue, Dec 6, 2011 at 7:12 AM, Florian Weimer <fweimer@bfk.de> wrote:
>> > * Robert Haas:
>> >> I tried whacking out the call to GetPageWithFreeSpace() in
>> >> RelationGetBufferForTuple(), and also with the unpatched code, but the
>> >> run-to-run randomness was way more than any difference the change
>> >> made.  Is there a better test case?
>> >
>> > I think that if you want to exercise file system lookup performance, you
>> > need a larger directory, which presumably means a large number of
>> > tables.
>>
>> OK.  I created 100,000 dummy tables, 10,000 at a time to avoid blowing up
>> the lock manager.  I then repeated my previous tests, and I still
>> can't see any meaningful difference (on my MacBook Pro, running MacOS
>> X v10.6.8).  So at least on this OS, it doesn't seem to matter much.
>> I'm inclined to defer putting any more work into it until such time as
>> someone can demonstrate that it actually causes a problem and provides
>> a reproducible test case.  I don't deny that there's probably an
>> effect and it would be nice to improve this, but it doesn't seem worth
>> spending a lot of time on until we can find a case where the effect is
>> measurable.
> I think that, if at all, you're going to notice differences only at high
> concurrency, because you would then start to hit the price of synchronizing
> the dcache between CPU cores in the kernel.

Well, if the premise is that the table has been truncated in the same
transaction, then it's going to be tough to get high concurrency.
Maybe you could do it with multiple tables or without truncation,
but either way I think you're going to be primarily limited by I/O
bandwidth or WALInsertLock contention, not kernel dcache
synchronization.  I might be wrong, of course, but that's what I
think.  I'm not saying this isn't worth improving, just that I don't
see it as a priority for me personally to spend time on right now.  If
you or someone else wants to beat on it, or even just come up with a
test case, great!
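
For anyone who wants to build such a test case, here is a minimal sketch
of the large-directory setup described in the quoted text above: 100,000
dummy tables, created 10,000 per transaction so that no single
transaction holds too many locks. The function name, the dummy_N table
names, and the single integer column are assumptions made for
illustration; they are not the statements actually used in the test.

```python
def batched_create_statements(total=100_000, batch_size=10_000, prefix="dummy_"):
    """Yield one SQL script per batch; each script is a single transaction.

    Names and the column definition are hypothetical placeholders.
    """
    for start in range(0, total, batch_size):
        stop = min(start + batch_size, total)
        lines = ["BEGIN;"]
        # One CREATE TABLE per dummy table in this batch.
        lines += [
            f"CREATE TABLE {prefix}{i} (id integer);" for i in range(start, stop)
        ]
        lines.append("COMMIT;")
        yield "\n".join(lines)
```

Each yielded script could be written to a file and fed to psql; the
timing comparison itself would then be run separately against the
patched and unpatched builds.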

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

