Thread: [WIP] The relminxid addition, try 3

[WIP] The relminxid addition, try 3

From
Alvaro Herrera
Date:
This patch is not ready for application, but read below because I need
some input.

Here is a reworked relminxid patch.  I added XLOG operations for heap
unfreeze and database unfreeze.  The latter happens when someone
connects to a frozen database.  The former happens when a LockRelation()
is called on a frozen relation, and the lock is stronger than
AccessShare.

A database is only frozen when somebody calls VACUUM FREEZE in a
standalone backend.

A table is frozen when somebody calls VACUUM FREEZE on it, which
acquires ExclusiveLock.

I'm not too sure about the XLOG routines -- I don't understand very well
the business about attaching the changes to a buffer; I thought at first
that since all the changes go to a tuple, they all belong to the buffer,
so I assigned a single XLogRecData struct with all the info and the
buffer containing the tuple; but then on replay, I got "PANIC: invalid
xlog record length 0" So I went to read the code for that case and
noticed that it said that the test "is somewhat fishy" but none of the
callers used that case currently.  So I decided to review what others
routines are doing.  I thought that heap_insert should be doing more or
less the same that heap_unfreeze, since all the changes in the tuple go
into the buffer, right?  But for some reason it doesn't.  So I just
added a second XLogRecData, without any data but attached to the buffer,
and I removed the reference to the buffer in the first XLogRecData.

This is probably wrong, but I'd like to know why and what's the correct
way to do it :-)

The replay routines also appear to work, but I'm not too sure about
them -- backup blocks and stuff, I'm not sure what is really happening.
I appreciate any advice.


The patch is not complete because I need to fiddle with relvacuumxid so
that it follows the same behavior as relminxid.  That comes next.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment

Re: [WIP] The relminxid addition, try 3

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I'm not too sure about the XLOG routines -- I don't understand very well
> the business about attaching the changes to a buffer; I thought at first
> that since all the changes go to a tuple, they all belong to the buffer,
> so I assigned a single XLogRecData struct with all the info and the
> buffer containing the tuple; but then on replay, I got "PANIC: invalid
> xlog record length 0"

Generally you want an xlog record to consist of some fixed overhead plus
attached data.  The attached data is the part that should link to the
buffer (normally it's data that is/will be actually stored into that buffer).
The fixed overhead isn't in the buffer and doesn't link.

But why do you need your own xlogging at all?  Shouldn't these actions
be perfectly ordinary updates of the relevant catalog tuples?

            regards, tom lane

Re: [WIP] The relminxid addition, try 3

From
Alvaro Herrera
Date:
Tom Lane wrote:

> But why do you need your own xlogging at all?  Shouldn't these actions
> be perfectly ordinary updates of the relevant catalog tuples?

The XLog entry can be smaller, AFAICT (we need to store the whole new
tuple in a heap_update, right?).  If that's not a worthy goal I'll
gladly rewrite this piece of code.

Ah, there's another reason, and it's that I'm rewriting the tuple in
place, not calling heap_update.  I'm not sure if I can reuse
log_heap_update for this purpose -- I'll take a look.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [WIP] The relminxid addition, try 3

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Ah, there's another reason, and it's that I'm rewriting the tuple in
> place, not calling heap_update.

Is that really a good idea, as compared to using heap_update?

(Now, if you're combining this with the very grotty relpages/reltuples
update code, then I'm all for making that xlog properly --- we've gotten
away without it so far but it really should xlog the change.)

            regards, tom lane

Re: [WIP] The relminxid addition, try 3

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Ah, there's another reason, and it's that I'm rewriting the tuple in
> > place, not calling heap_update.
>
> Is that really a good idea, as compared to using heap_update?

Not sure -- we would leave dead tuples after the VACUUM is finished if
we used heap_update, no?  ISTM it's undesirable.

> (Now, if you're combining this with the very grotty relpages/reltuples
> update code, then I'm all for making that xlog properly --- we've gotten
> away without it so far but it really should xlog the change.)

I hadn't done that, but I'll see what I can do.  Notice however that I'm
doing this in both pg_class _and_ pg_database.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [WIP] The relminxid addition, try 3

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> (Now, if you're combining this with the very grotty relpages/reltuples
>> update code, then I'm all for making that xlog properly --- we've gotten
>> away without it so far but it really should xlog the change.)

> I hadn't done that, but I'll see what I can do.  Notice however that I'm
> doing this in both pg_class _and_ pg_database.

BTW, one thing I was looking at last week was whether we couldn't
refactor the relpages/reltuples updates to be done in a cleaner place.
Right now UpdateStats is called (for indexes) directly from the index
AM, which sucks from a modularity point of view, and what's worse it
forces two successive updates of the same tuple during CREATE INDEX.
We should reorganize things so this is done once at the outer level.
It'd require some change of the ambuild() API, but considering we're
hacking several other aspects of the AM API in this development cycle,
that doesn't bother me.

            regards, tom lane

Re: [WIP] The relminxid addition, try 3

From
Alvaro Herrera
Date:
Tom Lane wrote:

> BTW, one thing I was looking at last week was whether we couldn't
> refactor the relpages/reltuples updates to be done in a cleaner place.
> Right now UpdateStats is called (for indexes) directly from the index
> AM, which sucks from a modularity point of view, and what's worse it
> forces two successive updates of the same tuple during CREATE INDEX.
> We should reorganize things so this is done once at the outer level.
> It'd require some change of the ambuild() API, but considering we're
> hacking several other aspects of the AM API in this development cycle,
> that doesn't bother me.

I'm rather stuck in Mammoth Replicator currently :-( so I'm not sure
when I could devote some time to this task.  If you want to do it, go
ahead and I can adapt the relminxid patch to your changes.  Or you can
take the relminxid patch from where it currently is, if you feel so
inclined.

If you don't, I'll eventually come back to it, but I'm not sure when
will that be.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [WIP] The relminxid addition, try 3

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> We should reorganize things so this is done once at the outer level.
>> It'd require some change of the ambuild() API, but considering we're
>> hacking several other aspects of the AM API in this development cycle,
>> that doesn't bother me.

> I'm rather stuck in Mammoth Replicator currently :-( so I'm not sure
> when I could devote some time to this task.

OK, I'll try to get to it later this week.

            regards, tom lane

Re: [WIP] The relminxid addition, try 3

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> (Now, if you're combining this with the very grotty relpages/reltuples
>> update code, then I'm all for making that xlog properly --- we've gotten
>> away without it so far but it really should xlog the change.)

> I hadn't done that, but I'll see what I can do.  Notice however that I'm
> doing this in both pg_class _and_ pg_database.

It strikes me that the cleanest way to deal with this is to invent a
single new type of xlog record, something like HEAP_UPDATE_IN_PLACE,
which just replaces tuple contents with a new tuple of the same size.
This would serve for both stats updates and unfreezing in both pg_class
and pg_database, and might have other uses in future for
non-transactional updates.  It's a little bit more xlog space than your
unfreeze-specific operations, but I don't think we need to sweat a few
bytes for those; they're not likely to be common.

Barring objections, I'll add this when I refactor UpdateStats.  Got some
other work I need to get back to right now, though.

            regards, tom lane

Re: [WIP] The relminxid addition, try 3

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

Hi,

> Here is a reworked relminxid patch.  I added XLOG operations for heap
> unfreeze and database unfreeze.  The latter happens when someone
> connects to a frozen database.  The former happens when a LockRelation()
> is called on a frozen relation, and the lock is stronger than
> AccessShare.

I've been further reworking this patch.  It is basically as discussed:
tables are unfrozen when LockRelation() is called, databases are
unfrozen when they are connected to.  Those operations are performed
using heap_inplace_update, so we no longer need to take particular steps
to XLog them.

There are two new columns in pg_class (relminxid and relvacuumxid).
Both of them can be set to FrozenTransactionId.  The minimum of each is
used to determine database-wide datminxid and datvacuumxid.  Those can
also be to FrozenXid.  (Note that it seems to be problematic that all
databases have FrozenXids; however, this can only happen in a standalone
backend, when you VACUUM FREEZE a database and all other databases are
already frozen.  So we can truncate to CurrentTransactionId without
problem.)

However there's a big problem once again but I don't know how to fix it.
I don't know why this didn't show up before.  Consider the following
scenario:

- pg_attribute is frozen
- pg_class is frozen

CREATE TABLE foo (a int);

for some unknown reason, an inval message involving relation foo seems
to be emitted.

heap_unfreeze(pg_class)
  CommandCounterIncrement()
heap_unfreeze(pg_attribute)
  CommandCounterIncrement()
... insert the pg_attribute rows ...

During one of those CommandCounterIncrement calls, the "foo" relation
receives an invalidation message and the system tries to rebuild the
RelationDesc.  But at this point the pg_attribute rows are not there
yet, so the rebuild fails because it can't find them:

$ runpg 10relminxid client
Bienvenido a psql 8.2devel, la terminal interactiva de PostgreSQL.

Digite:  \copyright para ver los términos de distribución
       \h para ayuda de comandos SQL
       \? para ayuda de comandos psql
       \g o or termine con punto y coma para ejecutar una consulta
       \q para salir

alvherre=# vacuum freeze ;
alvherre=# \q

$ runpg 10relminxid client
Bienvenido a psql 8.2devel, la terminal interactiva de PostgreSQL.

Digite:  \copyright para ver los términos de distribución
       \h para ayuda de comandos SQL
       \? para ayuda de comandos psql
       \g o or termine con punto y coma para ejecutar una consulta
       \q para salir

alvherre=# create table hmmmm (a int);
NOTICE:  unfreezing pg_type
NOTICE:  unfreezing pg_depend
NOTICE:  unfreezing pg_shdepend
NOTICE:  unfreezing pg_attribute
ERROR:  catalog is missing 1 attribute(s) for relid 16409


Strangely this doesn't happen if I create the table right away at the
same connection that did the VACUUM FREEZE.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment

Re: [WIP] The relminxid addition, try 3

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> CREATE TABLE foo (a int);

> for some unknown reason, an inval message involving relation foo seems
> to be emitted.

> heap_unfreeze(pg_class)
>   CommandCounterIncrement()
> heap_unfreeze(pg_attribute)
>   CommandCounterIncrement()
> ... insert the pg_attribute rows ...

Where did all these CommandCounterIncrement calls come from?
I don't think it's going to work if you are throwing in CCIs
at random places; this problem with the relcache will be the
least of your worries.  Why do you think you need that anyway?

            regards, tom lane

Re: [WIP] The relminxid addition, try 3

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > CREATE TABLE foo (a int);
>
> > for some unknown reason, an inval message involving relation foo seems
> > to be emitted.
>
> > heap_unfreeze(pg_class)
> >   CommandCounterIncrement()
> > heap_unfreeze(pg_attribute)
> >   CommandCounterIncrement()
> > ... insert the pg_attribute rows ...
>
> Where did all these CommandCounterIncrement calls come from?
> I don't think it's going to work if you are throwing in CCIs
> at random places; this problem with the relcache will be the
> least of your worries.  Why do you think you need that anyway?

I added them in heap_unfreeze precisely because I want the relation to
be frozen exactly once, and this doesn't seem to happen if I don't CCI
there -- I was seeing multiple occurences of the NOTICE I put in
heap_unfreeze for the same relation for a single CREATE TABLE (multiple
unfreezes of pg_class and pg_attribute, for example).

Maybe the problem is elsewhere.  I'll investigate more.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [WIP] The relminxid addition, try 3

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> Where did all these CommandCounterIncrement calls come from?

> I added them in heap_unfreeze precisely because I want the relation to
> be frozen exactly once, and this doesn't seem to happen if I don't CCI
> there -- I was seeing multiple occurences of the NOTICE I put in
> heap_unfreeze for the same relation for a single CREATE TABLE (multiple
> unfreezes of pg_class and pg_attribute, for example).

Well, that needs rethinking.  The unfreeze has to be a non-transactional
update (if our transaction rolls back, the unfreeze still has to
"stick", because we may have put dead tuples into the rel).  Therefore,
a CCI is neither necessary nor useful.  I didn't look at your patch in
any detail ... didn't you modify it to use the non-transactional update
code I put in heapam.c recently?

It might be that we need to broadcast an sinval message for the tuple
when we update it this way ... although sinval believes updates are
transactional, so that's not going to work all that well.  Maybe we have
to legislate that catcache/syscache entries shouldn't be trusted to have
up-to-date values of these fields.

            regards, tom lane

Re: [WIP] The relminxid addition, try 3

From
Tom Lane
Date:
I wrote:
> Well, that needs rethinking.  The unfreeze has to be a non-transactional
> update (if our transaction rolls back, the unfreeze still has to
> "stick", because we may have put dead tuples into the rel).

Actually, this seems even messier than I thought.  Consider a
transaction that does something transactional to a table's schema,
thereby generating a new pg_class row (but not touching any data within
the table), and then alters the table contents, requiring an unfreeze.
An update to the apparently-current pg_class tuple is not good because
that tuple might be rolled back.  An update to the last committed
version doesn't work either.

This seems real close to the recent discussions about how to put
sequence data into a single one-row-per-sequence system catalog,
specifically about how there were some parts of the sequence catalog
data that should be transactional and some that should not be.
I'm wondering if we need a second pg_class-derived catalog that carries
just the nontransactional columns.

            regards, tom lane

Re: [WIP] The relminxid addition, try 3

From
Alvaro Herrera
Date:
Tom Lane wrote:
> I wrote:
> > Well, that needs rethinking.  The unfreeze has to be a non-transactional
> > update (if our transaction rolls back, the unfreeze still has to
> > "stick", because we may have put dead tuples into the rel).
>
> Actually, this seems even messier than I thought.  Consider a
> transaction that does something transactional to a table's schema,
> thereby generating a new pg_class row (but not touching any data within
> the table), and then alters the table contents, requiring an unfreeze.
> An update to the apparently-current pg_class tuple is not good because
> that tuple might be rolled back.  An update to the last committed
> version doesn't work either.

Well, if a transaction modifies a table in some way, even without
changing the data, should generate an unfreeze event, because it will
need to lock the table; for example AlterTable locks the affected
relation with AccessExclusiveLock.  It's important for the
non-transactional change to the pg_class tuple be the very first in the
transaction, because otherwise the change could be lost; but other than
this, I don't think there's any problem.

Not that I had actually considered this problem, to be frank; but it
seems a nice side effect of how the unfreezing works.

> This seems real close to the recent discussions about how to put
> sequence data into a single one-row-per-sequence system catalog,
> specifically about how there were some parts of the sequence catalog
> data that should be transactional and some that should not be.
> I'm wondering if we need a second pg_class-derived catalog that carries
> just the nontransactional columns.

I hope we don't need to do this because ISTM it will be a very big change.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [WIP] The relminxid addition, try 3

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Well, if a transaction modifies a table in some way, even without
> changing the data, should generate an unfreeze event, because it will
> need to lock the table; for example AlterTable locks the affected
> relation with AccessExclusiveLock.  It's important for the
> non-transactional change to the pg_class tuple be the very first in the
> transaction, because otherwise the change could be lost; but other than
> this, I don't think there's any problem.

You can't guarantee that.  Consider for instance manual updates to
pg_class:

    BEGIN;
    UPDATE pg_class SET reltriggers = 0 WHERE relname = ...
    ... alter table contents ...
    COMMIT or ROLLBACK;

I believe there are actually patterns like this in some pg_dump output.
Will you hack every UPDATE operation to test whether it's changing
pg_class and if so force an "unfreeze" operation before changing any
row?  No thanks :-(

>> I'm wondering if we need a second pg_class-derived catalog that carries
>> just the nontransactional columns.

> I hope we don't need to do this because ISTM it will be a very big change.

(Yawn...)  We've made far bigger changes than that.  The important
thing is to get it right.

            regards, tom lane

Re: [WIP] The relminxid addition, try 3

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Well, if a transaction modifies a table in some way, even without
> > changing the data, should generate an unfreeze event, because it will
> > need to lock the table; for example AlterTable locks the affected
> > relation with AccessExclusiveLock.  It's important for the
> > non-transactional change to the pg_class tuple be the very first in the
> > transaction, because otherwise the change could be lost; but other than
> > this, I don't think there's any problem.
>
> You can't guarantee that.  Consider for instance manual updates to
> pg_class:
>
>     BEGIN;
>     UPDATE pg_class SET reltriggers = 0 WHERE relname = ...
>     ... alter table contents ...
>     COMMIT or ROLLBACK;
>
> I believe there are actually patterns like this in some pg_dump output.
> Will you hack every UPDATE operation to test whether it's changing
> pg_class and if so force an "unfreeze" operation before changing any
> row?  No thanks :-(

Oh, true, I hadn't thought of direct updates to pg_class.

> >> I'm wondering if we need a second pg_class-derived catalog that carries
> >> just the nontransactional columns.
>
> > I hope we don't need to do this because ISTM it will be a very big change.
>
> (Yawn...)  We've made far bigger changes than that.  The important
> thing is to get it right.

Yeah, I know -- I've been involved in some of them.  I hereby volunteer
to do it for 8.2 because I'd really like to see this patch in.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support