Re: [PATCHES] Non-transactional pg_class, try 2 - Mailing list pgsql-hackers

From: Simon Riggs
Subject: Re: [PATCHES] Non-transactional pg_class, try 2
Msg-id: 1150211979.2691.656.camel@localhost.localdomain
In response to: Re: [PATCHES] Non-transactional pg_class, try 2  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [PATCHES] Non-transactional pg_class, try 2  (Alvaro Herrera <alvherre@commandprompt.com>)
           Re: [PATCHES] Non-transactional pg_class, try 2  (Alvaro Herrera <alvherre@commandprompt.com>)
List: pgsql-hackers
On Tue, 2006-06-13 at 10:02 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Elsewhere, it has been discussed that we might hold the number of blocks
> > in a relation in shared memory. Does that idea now fall down, or is it
> > complementary to this?
> 
> It's been the case for some time that the planner uses
> RelationGetNumberOfBlocks() to determine true rel size.  The only reason
> relpages is still stored at all is that it's used to approximate true
> number of tuples via
>     true_ntuples = (reltuples/relpages) * true_npages
> ie, assuming that the tuple density is still what it was at the last
> VACUUM or ANALYZE.  So you can't fool the system with a totally made-up
> relation size anyway.  (This too is moderately annoying for planner
> testing, but it seems the only way to get the planner to react when a
> table's been filled without an immediate vacuum/analyze.)
> 
> The only point of tracking rel size in shared memory would be to avoid
> the costs of lseek() kernel calls in RelationGetNumberOfBlocks.

Yes, understood. The second point would be to allow them to be set
separately, both for PostgreSQL developers testing the optimizer and for
application developers testing SQL and/or what-if scenarios.

> >> The main thing we are trying to accomplish here is to decouple
> >> transactional and nontransactional updates to a pg_class row.
> 
> > With the goal being avoiding table bloat??
> 
> No, with the goal being correctness.  If you have a freeze/unfreeze
> mechanism then unfreezing a relation is an action that must NOT be
> rolled back if your transaction (or any other one for that matter) later
> aborts.  The tuples you put into it meanwhile need to be vacuumed anyway.
> So you can't mark it unfrozen in an uncommitted pg_class entry that
> might never become committed.
> 
> > For me, freezing is last step before writing to WORM media, so there is
> > never an unfreeze step.
> 
> That is not what Alvaro is after.  Nor anyone else here. 

So what is unfreeze for again?

> I have not
> heard anyone mention WORM media for Postgres in *years*.

Oh? Archive requirements are much bigger these days than they used to be.
This would allow years of data to be kept in a seamless on-line/near-line
partitioned table set. Lots of people want that: .gov, .mil, .com

A more modern equivalent: a MAID (massive array of idle disks) archive
system for WORO (write once, read occasionally) data.

> It strikes me though that automatic UNFREEZE isn't necessarily the
> requirement.  What if VACUUM FREEZE causes the table to become
> effectively read-only, and you need an explicit UNFREEZE command to
> put it back into a read-write state?  Then UNFREEZE could be a
> transactional operation, and most of these issues go away.  

That works for me. Very much preferred.

> The case
> where this doesn't work conveniently is copying a frozen database
> (viz template0), but maybe biting the bullet and finding a way to do
> prep work in a freshly made database is the answer for that.  We've
> certainly seen plenty of other possible uses for post-CREATE processing
> in a new database.
> 
> Another reason for not doing unfreeze automatically is that as the patch
> stands, any database user can force unfreezing of any table, whether he
> has any access rights on it or not (because the LockTable will happen
> before we check access rights, I believe).  This is probably Not Good.
> Ideally I think FREEZE/UNFREEZE would be owner-permission-required.

Seems like a plan.

-- 
 Simon Riggs
 EnterpriseDB   http://www.enterprisedb.com


