Re: CLUSTER and MVCC - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: CLUSTER and MVCC
Date
Msg-id 45F159E9.3050301@enterprisedb.com
Whole thread Raw
In response to Re: CLUSTER and MVCC  (Csaba Nagy <nagy@ecircle-ag.com>)
Responses Re: CLUSTER and MVCC  (Csaba Nagy <nagy@ecircle-ag.com>)
List pgsql-hackers
Csaba Nagy wrote:
> On Fri, 2007-03-09 at 12:29, Heikki Linnakangas wrote:
>> Csaba, you mentioned recently 
>> (http://archives.postgresql.org/pgsql-hackers/2007-03/msg00027.php) that 
>> you're actually using the MVCC-violation to clean up tables during a 
>> backup. Can you tell us a bit more about that? Would you be upset if we 
>> shut that backdoor?
> 
> My use case: a queue-like table (in fact a 'task' table) which is very
> frequently inserted/updated/deleted. This table tends to be bloated in
> the presence of any long running transaction... the only transactional
> behavior we need from this table is to make sure that when we insert
> something in this table in a transaction (possibly together with other
> actions) and then commit/rollback, it commits/rolls back the insert.
> CLUSTER's violation of MVCC does not affect this, as CLUSTER will not be
> able to lock the table if another transaction inserted something in it
> (the inserting transaction will have a lock on the table). Selections on
> this table are not critical for us, it just doesn't matter which job
> processor is getting which task and in what order... (actually it does
> matter, but CLUSTER won't affect that either).

Hmm. You could use something along these lines instead:

0. LOCK TABLE queue_table
1. SELECT * INTO queue_table_new FROM queue_table
2. DROP TABLE queue_table
3. ALTER TABLE queue_table_new RENAME queue_table

After all, it's not that you care about the clustering of the table, you 
just want to remove old tuples.

As a long term solution, it would be nice if we had more fine-grained 
bookkeeping of snapshots that are in use in the system. In your case, 
there's a lot of tuples that are not visible to pg_dump because xmin is 
too new, and also not visible to any other transaction because xmax is 
too old. If we had a way to recognize situations like that, and vacuum 
those tuples, much of the problem with long-running transactions would 
go away.

> Wouldn't be possible to do it like Simon (IIRC) suggested, and add a
> parameter to enable/disable the current behavior, and use the MVCC
> behavior as default ?

I guess we could, but I don't see why should encourage using CLUSTER for 
that. A more aggressive, MVCC-breaking version of VACUUM would make more 
sense to me, but I don't like the idea of adding "break-MVCC" flags to 
any commands.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Csaba Nagy
Date:
Subject: Re: CLUSTER and MVCC
Next
From: Alvaro Herrera
Date:
Subject: Re: CLUSTER and MVCC