Re: crash-safe visibility map, take three - Mailing list pgsql-hackers

From Robert Haas
Subject Re: crash-safe visibility map, take three
Date
Msg-id AANLkTikNcbySP_HDS0ZoaEWmaA=JBRWhssstD7xTSmNc@mail.gmail.com
In response to Re: crash-safe visibility map, take three  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: crash-safe visibility map, take three
List pgsql-hackers
On Thu, Dec 2, 2010 at 2:01 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> * We don't get an exclusive lock when dirtying a page with hint bits
> - Why: we write while reading, and we want good concurrency.
> - Why': because after a bulk load, we don't have any hint bits, and the
> only way to get them set without VACUUM is to write while reading. I've
> never been entirely sure why VACUUM isn't good enough in this case,
> aside from the fact that a user might not run VACUUM (and autovacuum
> might not either, if it was only a bulk load and no updates/deletes).
>
> * We don't WAL log setting hint bits (which dirties a page)
> - Why: because after a bulk load, we don't want to write the data a 4th
> time
>
> Hypothetically, if we had a bulk loading strategy, these problems would
> go away, and we could follow the rules. Right? Is there a case other
> than bulk loading which demands that we break these rules?

I'm not really convinced that this problem is confined to bulk
loading.  Every INSERT or UPDATE results in a new tuple that may need
hint bits set and eventually to be frozen.  A bulk load is just a time
when you do lots of inserts all at once; it seems to me that a large
update would cause all the same problems, plus bloat.  The triple I/O
problem exists for small transactions as well (and isn't desirable
there either); it's just less noticeable because the second and third
writes are, like the first one, small.
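
As a rough back-of-the-envelope illustration of that point (a simplified
model, not PostgreSQL code): assume the new tuple data is written once to
WAL, once to the heap, and once more when the heap page is rewritten after
hint bits are set. The function name and the assumption that every pass
writes the data in full are hypothetical, chosen only for this sketch.

```python
def bytes_written(data_bytes: int, passes: int = 3) -> int:
    """Total bytes written for `data_bytes` of new tuple data.

    Simplifying assumption: each pass (WAL record, heap page, heap page
    again once hint bits are set) writes the data in full.
    """
    return data_bytes * passes


small_data = 8 * 1024        # roughly one page worth of inserts
bulk_data = 10 * 1024 ** 3   # a 10 GB bulk load

small_total = bytes_written(small_data)
bulk_total = bytes_written(bulk_data)

# The amplification factor is the same in both cases; only the absolute
# cost differs, which is why it is less noticeable for small transactions.
assert small_total // small_data == bulk_total // bulk_data == 3
```

In this model the ratio never depends on the transaction size, only the
absolute number of bytes does.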

> And, if we had a bulk loading path, we could probably get away with
> writing the data only twice (today, we write it 3 times including the
> hint bits) or maybe once if WAL archiving is off.

It seems to me that a COPY command executed in a transaction with no
other open snapshots, and writing to a table created or truncated
within that same transaction, should be able to write frozen tuples
from the get-go, regardless of anything else we do.
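
A minimal sketch of that condition, for illustration only. The
`CopyContext` type, its fields, and `can_write_frozen` are hypothetical
names invented for this example and do not correspond to actual
PostgreSQL structures or functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyContext:
    """Hypothetical summary of the state a COPY would need to inspect."""
    current_xid: int                     # transaction running the COPY
    table_created_xid: Optional[int]     # transaction that created the table
    table_truncated_xid: Optional[int]   # transaction that last truncated it
    other_open_snapshots: int            # other open snapshots in this transaction


def can_write_frozen(ctx: CopyContext) -> bool:
    """True if COPY could write tuples already frozen.

    Both conditions from the text must hold: the table was created or
    truncated by the current transaction, and that transaction has no
    other open snapshots.
    """
    new_in_this_xact = ctx.current_xid in (
        ctx.table_created_xid,
        ctx.table_truncated_xid,
    )
    return new_in_this_xact and ctx.other_open_snapshots == 0
```

For example, `can_write_frozen(CopyContext(100, 100, None, 0))` is true,
while the same call with one other open snapshot, or with a table created
by an earlier transaction, is false.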

> So, is there a case other than bulk loading for which we need to break
> these rules? If not, perhaps we should consider bulk loading a different
> problem, and simplify the design of all of these other features (and
> allow new storage-touching features to come about, like CRCs, without
> exponentially increasing the complexity with each one).

I don't think we're exponentially increasing complexity - I think
we're incrementally improving our algorithms.  If you want to propose
a bulk loading path, great.  Propose away!  But without something a
bit more concrete, I don't think it would be appropriate to hold off
making the visibility map crash-safe, on the off chance that our
design for so doing might complicate something else we want to do
later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

