Re: 9.3: load path to mitigate load penalty for checksums - Mailing list pgsql-hackers

From Robert Haas
Subject Re: 9.3: load path to mitigate load penalty for checksums
Date
Msg-id CA+TgmoYsE44JbBYP8+=kuuQ-=UF_6r7GvbPzDV_svW7jO5ppKg@mail.gmail.com
In response to Re: 9.3: load path to mitigate load penalty for checksums  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Wed, Jun 6, 2012 at 8:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2012-06-06 at 15:08 -0400, Robert Haas wrote:
>> On Mon, Jun 4, 2012 at 9:26 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> > Thoughts?
>>
>> Simon already proposed a way of doing this that doesn't require
>> explicit user action, which seems preferable to a method that does
>> require explicit user action, even though it's a little harder to
>> implement.  His idea was to store the XID of the process creating the
>> table in the pg_class row, which I think is *probably* better than
>> your idea of having a process that waits and then flips the flag.
>> There are some finicky details though - see previous thread for
>> discussion of some of the issues.
>
> My goals include:
>
> * The ability to load into existing tables with existing data
> * The ability to load concurrently
>
> My understanding was that the proposal to which you're referring can't
> do those things, which seem like major limitations. Did I miss
> something?

No, you're correct.  I misread your original email, sorry.

I'm just thinking about this a little more.  It strikes me that the
core trade-off here is between doing more post-commit work and doing
more post-abort work.  For example, in your proposal, we've got to run
a lazy vacuum before exiting bulk load mode, because if a transaction
has aborted, we've got to get rid of its tuples before letting anyone
trust hint bits again.  That is, abort cleanup gets harder.  OTOH,
commit cleanup gets easier, because all the hint bits and visibility
map bits are already set: the only thing left is to freeze.  In
general, I think that's a good trade-off.  Commits are much more
common than aborts, and so we ought to be optimizing for the commit
case.

But maybe there are other ways of doing it, besides what you've
proposed here.  Not sure exactly what, but it might be worth thinking
about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

