Visibility Groups - Mailing list pgsql-hackers

From Simon Riggs
Subject Visibility Groups
Date
Msg-id 1218105518.4549.365.camel@ebony.2ndQuadrant
Whole thread Raw
Responses Re: Visibility Groups
Re: Visibility Groups
Re: Visibility Groups
Re: Visibility Groups
Re: Visibility Groups
List pgsql-hackers
Currently, we calculate a single OldestXmin across all snapshots on the
assumption that any transaction might access any table.

I propose creating "Visibility Groups" that *explicitly* limit the
ability of a transaction to access data outside its visibility group(s).
By default, visibility_groups would be NULL, implying potential access
to all tables.

Once set, any attempt to lock an object outside of a transactions
defined visibility_groups will result in an error: ERROR attempt to lock table outside of visibility group(s): foo HINT
youneed to set a different value for visibility_groups
 
A transaction can only ever reduce or restrict its visibility_groups, it
cannot reset or add visibility groups.

This would give us the ability to explicitly prevent long running
statements from interfering with updates of critical tables, when those
tables are not accessed.

The visibility_groups list would be held on each PGPROC, so that a
snapshot will be able to calculate an xmin for each visibility group.
When checking visibility of rows using HeapTupleSatisfiesVacuum() we
would use the oldest xmin for the visibility group of the table, rather
than the single global value.

If no visibility groups are in use then everything happens just as it
does now, with only a single "if" test in GetSnapshotData() and
HeapTupleSatisfiesVacuum().

Patch would require providing info through to HeapTupleSatisfiesVacuum()
to allow it to resolve the appropriate xmin to use for visibility
checks, since it will have more than one to choose from. Various ways of
doing that.

We might like the visibility group to be set automatically but that
seems like a harder problem. I do not propose to solve that here. This
general idea has been proposed before, but we always get hung up on our
inability to automatically determine the visibility group. Let's just do
this explicitly, so we can get benefit in the cases where we know very
clearly which tables we'll access and more importantly, which we won't.

How do we specify visibility groups? Well various ways, various syntax,
so that is up for discussion and debate. This might be the place the
concept falls down, maybe where it starts having wings.

* SET visibility_groups = table, schema.*
* define visibility groups using a function: create_visibility_group('name', 'table/schema list')
* specify them via ALTER TABLE etc

This idea is connected somewhat to replication, so floating it now to
see how viable a concept people think this is. I'm floating the idea in
a fairly neutral way in the hope that it leads others to even
better/more workable proposals, possibly now, possibly over the next few
years.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Parsing of pg_hba.conf and authentication inconsistencies
Next
From: "Jochem van Dieten"
Date:
Subject: Re: Visibility Groups