Thread: switch UNLOGGED to LOGGED

switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
Hi,

I read the discussion at


http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php


From what I can understand, going from/to unlogged to/from logged in
the wal_level == minimal case is not too complicated.

Suppose I try to write a patch that allows

ALTER TABLE tablename SET LOGGED (or UNLOGGED)
(proper sql wording to be discussed...)

only in the wal_level == minimal case: would it be accepted as a
"first step"? Or rejected because it doesn't allow it in the other
cases?

From what I got in the discussion, handling the other wal_level cases
can be very complicated (example: the issues in case "we *crash*
without writing an abort record").




Leonardo


Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Fri, Apr 8, 2011 at 6:01 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
> I read the discussion at
>
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php
>
> From what I can understand, going from/to unlogged to/from logged in
> the wal_level == minimal case is not too complicated.
>
> Suppose I try to write a patch that allows
>
> ALTER TABLE tablename SET LOGGED (or UNLOGGED)
> (proper sql wording to be discussed...)
>
> only in the wal_level == minimal case: would it be accepted as a
> "first step"? Or rejected because it doesn't allow it in the other
> cases?

I'm pretty sure we wouldn't accept a patch for a feature that would
only work with wal_level=minimal, but it might be a useful starting
point for someone else to keep hacking on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> I'm pretty sure we wouldn't accept a patch for a  feature that would
> only work with wal_level=minimal, but it might be a useful  starting
> point for someone else to keep hacking on.


I understand.

Reading your post at
http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php
I thought I got the part:

"what happens if we *crash* without writing an abort record?  It
seems  like that could leave a stray file around on a standby,
because the  current code only cleans things up on the standby
at the start of  recovery"


But re-reading it, I don't understand: what's the difference in creating
a new "regular" table and crashing before emitting the abort record,
and converting an unlogged table to logged and crashing before
emitting the abort record? How do the standby servers handle a
"CREATE TABLE" followed by a ROLLBACK if the master crashes
before writing the abort record? I thought that too would "leave a
stray file around on a standby".



Leonardo



Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Sat, Apr 9, 2011 at 3:29 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>> I'm pretty sure we wouldn't accept a patch for a  feature that would
>> only work with wal_level=minimal, but it might be a useful  starting
>> point for someone else to keep hacking on.
>
> I understand.
>
> Reading your post at
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php
> I thought I got the part:
>
> "what happens if we *crash* without writing an abort record?  It
> seems  like that could leave a stray file around on a standby,
> because the  current code only cleans things up on the standby
> at the start of  recovery"
>
> But re-reading it, I don't understand: what's the difference in creating
> a new "regular" table and crashing before emitting the abort record,
> and converting an unlogged table to logged and crashing before
> emitting the abort record? How do the standby servers handle a
> "CREATE TABLE" followed by a ROLLBACK if the master crashes
> before writing the abort record? I thought that too would "leave a
> stray file around on a standby".

I've been thinking about the same thing.  And AFAICS, your analysis is
correct, though there may be some angle to it I'm not seeing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> > But re-reading  it, I don't understand: what's the difference in creating
> > a new  "regular" table and crashing before emitting the abort record,
> > and  converting an unlogged table to logged and crashing before
> > emitting the  abort record? How do the standby servers handle a
> > "CREATE TABLE"  followed by a ROLLBACK if the master crashes
> > before writing the abort  record? I thought that too would "leave a
> > stray file around on a  standby".
>
> I've been thinking about the same thing.  And AFAICS, your  analysis is
> correct, though there may be some angle to it I'm not  seeing.


Anyone else? I would like to know if what I'm trying to do is, in fact,
possible... otherwise starting with thewal_level=minimal case first
will be wasted effort in case the other cases can't be integrated
somehow...



Leonardo


Re: switch UNLOGGED to LOGGED

From
Noah Misch
Date:
On Mon, Apr 11, 2011 at 11:41:18AM +0100, Leonardo Francalanci wrote:
> > > But re-reading  it, I don't understand: what's the difference in creating
> > > a new  "regular" table and crashing before emitting the abort record,
> > > and  converting an unlogged table to logged and crashing before
> > > emitting the  abort record? How do the standby servers handle a
> > > "CREATE TABLE"  followed by a ROLLBACK if the master crashes
> > > before writing the abort  record? I thought that too would "leave a
> > > stray file around on a  standby".
> > 
> > I've been thinking about the same thing.  And AFAICS, your  analysis is
> > correct, though there may be some angle to it I'm not  seeing.
> 
> 
> Anyone else? I would like to know if what I'm trying to do is, in fact,
> possible... otherwise starting with thewal_level=minimal case first
> will be wasted effort in case the other cases can't be integrated
> somehow...

If the master crashes while a transaction that used CREATE TABLE is unfinished,
both the master and the standby will indefinitely retain identical, stray (not
referenced by pg_class) files.  The catalogs do reference the relfilenode of
each unlogged relation; currently, that relfilenode never exists on a standby
while that standby is accepting connections.  By the time the startup process
releases the AccessExclusiveLock acquired by the proposed UNLOGGED -> normal
conversion process, that relfilenode needs to be either fully copied or unlinked
all over again.  (Alternately, find some other way to make sure queries don't
read the half-copied file.)  In effect, the problem is that the relfilenode is
*not* stray, so its final state does need to be well-defined.

nm


Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Mon, Apr 11, 2011 at 10:29 AM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Apr 11, 2011 at 11:41:18AM +0100, Leonardo Francalanci wrote:
>> > > But re-reading  it, I don't understand: what's the difference in creating
>> > > a new  "regular" table and crashing before emitting the abort record,
>> > > and  converting an unlogged table to logged and crashing before
>> > > emitting the  abort record? How do the standby servers handle a
>> > > "CREATE TABLE"  followed by a ROLLBACK if the master crashes
>> > > before writing the abort  record? I thought that too would "leave a
>> > > stray file around on a  standby".
>> >
>> > I've been thinking about the same thing.  And AFAICS, your  analysis is
>> > correct, though there may be some angle to it I'm not  seeing.
>>
>> Anyone else? I would like to know if what I'm trying to do is, in fact,
>> possible... otherwise starting with thewal_level=minimal case first
>> will be wasted effort in case the other cases can't be integrated
>> somehow...
>
> If the master crashes while a transaction that used CREATE TABLE is unfinished,
> both the master and the standby will indefinitely retain identical, stray (not
> referenced by pg_class) files.  The catalogs do reference the relfilenode of
> each unlogged relation; currently, that relfilenode never exists on a standby
> while that standby is accepting connections.  By the time the startup process
> releases the AccessExclusiveLock acquired by the proposed UNLOGGED -> normal
> conversion process, that relfilenode needs to be either fully copied or unlinked
> all over again.  (Alternately, find some other way to make sure queries don't
> read the half-copied file.)  In effect, the problem is that the relfilenode is
> *not* stray, so its final state does need to be well-defined.

Oh, right.

Maybe we should just put in a rule that a server in Hot Standby mode
won't ever try to read from an unlogged table (right now we count on
the fact that there will be nothing to read).  If we crash before
copying the whole file, it won't matter, because the catalogs won't
have been updated, so we'll refuse to look at it anyway.  And we have
to reinitialize on entering normal running anyway, so we can clean it
up then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> > If the  master crashes while a transaction that used CREATE TABLE is
>unfinished,
> >  both the master and the standby will indefinitely retain identical, stray
>(not
> > referenced by pg_class) files.  The catalogs do reference the  relfilenode
of
> > each unlogged relation; currently, that relfilenode never  exists on a
>standby
> > while that standby is accepting connections.  By the  time the startup
>process
> > releases the AccessExclusiveLock acquired by  the proposed UNLOGGED ->
normal
> > conversion process, that relfilenode  needs to be either fully copied or
>unlinked
> > all over again.   (Alternately, find some other way to make sure queries
>don't
> > read the  half-copied file.)  In effect, the problem is that the relfilenode
>is
> >  *not* stray, so its final state does need to be well-defined.
>
> Oh,  right.
>
> Maybe we should just put in a rule that a server in Hot Standby  mode
> won't ever try to read from an unlogged table (right now we count  on
> the fact that there will be nothing to read).  If we crash  before
> copying the whole file, it won't matter, because the catalogs  won't
> have been updated, so we'll refuse to look at it anyway.  And we  have
> to reinitialize on entering normal running anyway, so we can clean  it
> up then.


Ok then... I'll try to code the "easy" version first (the wal_level=minimal
case)
and then we'll see....

Leonardo


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
I think I coded a very basic version of the UNLOGGED to LOGGED patch
(only wal_level=minimal case for the moment).

To remove the INIT fork, I changed somehow PendingRelDelete to have
a flag "bool onlyInitFork" so that the delete would remove only the INIT
fork at commit.

Everything "works" (note the quotes...) except in the case of not-clean
shutdown ("-m immediate" to pg_ctl stop). The reason is that the replay
code doesn't have any idea that it has to remove only the INIT fork: it will
remove ALL forks; so at the end of the redo procedure (at startup, after
a "pg_ctl -m immediate stop") the table doesn't have any forks at all.

See xact_redo_commit:

/* Make sure files supposed to be dropped are dropped */
for (i = 0; i < xlrec->nrels; i++)
{  [...]  for (fork = 0; fork <= MAX_FORKNUM; fork++)     {          if (smgrexists(srel, fork))          {
   XLogDropRelation(xlrec->xnodes[i], fork);                 smgrdounlink(srel, fork, true);          }     }
smgrclose(srel);
}
[...]


Should I change xl_xact_commit in order to have, instead of:


/* Array of RelFileNode(s) to drop at commit */
RelFileNode xnodes[1];        /* VARIABLE LENGTH ARRAY */



an array of structures such as:

{   RelFileNode   relfilenode;   bool               onlyInitFork;
}

???

Otherwise I don't know how to tell the redo commit code to delete only
the INIT fork, instead of all the relation forks...
(maybe I'm doing all wrong: I'm open to any kind of suggestion here...)


Leonardo


Re: switch UNLOGGED to LOGGED

From
Alvaro Herrera
Date:
Excerpts from Leonardo Francalanci's message of lun abr 18 09:36:13 -0300 2011:
> I think I coded a very basic version of the UNLOGGED to LOGGED patch
> (only wal_level=minimal case for the moment).
> 
> To remove the INIT fork, I changed somehow PendingRelDelete to have
> a flag "bool onlyInitFork" so that the delete would remove only the INIT
> fork at commit.
> 
> Everything "works" (note the quotes...) except in the case of not-clean
> shutdown ("-m immediate" to pg_ctl stop). The reason is that the replay
> code doesn't have any idea that it has to remove only the INIT fork: it will
> remove ALL forks; so at the end of the redo procedure (at startup, after
> a "pg_ctl -m immediate stop") the table doesn't have any forks at all.

Maybe you should change xl_act_commit to have a separate list of rels to
drop the init fork for (instead of mixing those with the list of files to
drop as a whole).

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> Maybe you should change  xl_act_commit to have a separate list of rels to
> drop the init fork for  (instead of mixing those with the list of files to
> drop as a  whole).


I tried to follow your suggestion, thank you very much.

Here's a first attempt at the patch.

I "tested" it with:


create table forei (v integer primary key);
insert into forei select * from generate_series(1,10000);
create unlogged table pun (c integer primary key, constraint
    con foreign key (c) references forei(v));
insert into pun select * from generate_series(1,10000);
alter table pun  set logged;


then shutdown the master with "immediate":

bin/pg_ctl -D data -m immediate stop
bin/pg_ctl -D data start


and "pun" still has data:

select * from pun where c=100;


Question/comments:

1) it's a very-first-stage patch; I would need to know if something is
*very* wrong before cleaning it.
2) there are some things I implemented using a logic like "let's see how it
worked 10 lines above, and I'll do the same". For example, the 2PC stuff
is totally "copied" from the other places, I have no idea if the code makes
sense at all (how can I test it?)
3) Should we have a "cascade" option? I don't know if I have to handle
inherited tables and other dependent objects
4) During the check for dependencies problems, I stop as soon as I find an
error; would it be enough?



Leonardo
Attachment

Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Fri, Apr 22, 2011 at 4:13 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>> Maybe you should change  xl_act_commit to have a separate list of rels to
>> drop the init fork for  (instead of mixing those with the list of files to
>> drop as a  whole).
>
> I tried to follow your suggestion, thank you very much.

I have to admit I don't like this approach very much.  I can't see
adding 4 bytes to every commit record for this feature.

> 3) Should we have a "cascade" option? I don't know if I have to handle
> inherited tables and other dependent objects

Look at the way ALTER TABLE [ONLY] works for other action types, and copy it.

> 4) During the check for dependencies problems, I stop as soon as I find an
> error; would it be enough?

It's a bit awkwardly phrased the way you have it.  I would suggest
something like:

ERROR:  constraints on permanent tables may reference only permanent tables
HINT:  constraint %s

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Fri, May 6, 2011 at 10:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> ERROR:  constraints on permanent tables may reference only permanent tables
> HINT:  constraint %s

Argh, hit send too soon.

HINT: constraint %s references table %s

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, May 6, 2011 at 10:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> ERROR: �constraints on permanent tables may reference only permanent tables
>> HINT: �constraint %s

> Argh, hit send too soon.

> HINT: constraint %s references table %s

<nitpick>
That looks like a DETAIL, not a HINT.  Also see message style guide
about how that should be a complete sentence.
</nitpick>
        regards, tom lane


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> On Fri, Apr 22, 2011 at 4:13 AM, Leonardo Francalanci <m_lists@yahoo.it>
wrote:
> >> Maybe  you should change  xl_act_commit to have a separate list of rels to
> >>  drop the init fork for  (instead of mixing those with the list of files
to
> >> drop as a  whole).
> >
> > I tried to follow your  suggestion, thank you very much.
>
> I have to admit I don't like this  approach very much.  I can't see
> adding 4 bytes to every commit record  for this feature.


I understand.

What if, in xl_xact_commit, instead of

RelFileNode xnodes

I use another struct for xnodes, something like:

{RelFileNode xnode;bool            onlyInitFork;
}


That would increase the commit record size only when there are
RelFileNode(s) to drop at commit. So, instead of 4 bytes in
every commit, there are "wasted" bytes when the commit record
contains deleted permanent relations (that should happen much
less). I'm open to suggestions here...


> > 3) Should we have a "cascade" option? I don't know  if I have to handle
> > inherited tables and other dependent  objects
>
> Look at the way ALTER TABLE [ONLY] works for other action types,  and copy it.


Ok


Thank you very much



Leonardo



Re: switch UNLOGGED to LOGGED

From
Alvaro Herrera
Date:
Excerpts from Robert Haas's message of vie may 06 23:25:09 -0300 2011:
> On Fri, Apr 22, 2011 at 4:13 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
> >> Maybe you should change  xl_act_commit to have a separate list of rels to
> >> drop the init fork for  (instead of mixing those with the list of files to
> >> drop as a  whole).
> >
> > I tried to follow your suggestion, thank you very much.
> 
> I have to admit I don't like this approach very much.  I can't see
> adding 4 bytes to every commit record for this feature.

Hmm, yeah.  Maybe we can add a "flags" int8 somewhere in that struct and
set a bit in it if nrels, nsubxacts, nmsgs and respective arrays are present.
That would save some int's that are already in there.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Mon, May 9, 2011 at 12:51 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of vie may 06 23:25:09 -0300 2011:
>> On Fri, Apr 22, 2011 at 4:13 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>> >> Maybe you should change  xl_act_commit to have a separate list of rels to
>> >> drop the init fork for  (instead of mixing those with the list of files to
>> >> drop as a  whole).
>> >
>> > I tried to follow your suggestion, thank you very much.
>>
>> I have to admit I don't like this approach very much.  I can't see
>> adding 4 bytes to every commit record for this feature.
>
> Hmm, yeah.  Maybe we can add a "flags" int8 somewhere in that struct and
> set a bit in it if nrels, nsubxacts, nmsgs and respective arrays are present.
> That would save some int's that are already in there.

Yes, that seems like a very appealing approach.  There is plenty of
bit-space available in xinfo, and we could reserve a bit each for
nrels, nsubxacts, and nmsgs, with set meaning that an integer count of
that item is present and clear meaning that the count is omitted from
the structure (and zero).  This will probably require a bit of tricky
code reorganization so I think it should be done separately from the
main patch.  With that done, then it's not a big deal for the main
patch to add in one more array that will normally get omitted.  And in
the process, we can save 12 bytes on every commit record in the common
case, which is quite appealing: I don't expect a huge performance
gain, but a penny saved is a penny earned.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> Yes, that seems like a very appealing approach.   There is plenty of
> bit-space available in xinfo, and we could reserve a bit  each for
> nrels, nsubxacts, and nmsgs, with set meaning that an integer count  of
> that item is present and clear meaning that the count is omitted  from
> the structure (and zero).  This will probably require a bit of  tricky
> code reorganization so I think it should be done separately from  the
> main patch.

Ok, I'll try and send a patch with this change only.
BTW  xinfo  is 32 bit long, but I think only 2 bits are used right now?
I think I can make it a 8 bits, and add another 8 bits for nrels,
nsubxacts, and nmsgs and the new thing. That should save
another 2 bytes, while leaving space for extention. Or we can make
it a 8 bits only, but only 2 bits would be left "empty" for future
extentions; I don't know if we care about it...


Leonardo



Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Tue, May 10, 2011 at 3:35 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>> Yes, that seems like a very appealing approach.   There is plenty of
>> bit-space available in xinfo, and we could reserve a bit  each for
>> nrels, nsubxacts, and nmsgs, with set meaning that an integer count  of
>> that item is present and clear meaning that the count is omitted  from
>> the structure (and zero).  This will probably require a bit of  tricky
>> code reorganization so I think it should be done separately from  the
>> main patch.
>
> Ok, I'll try and send a patch with this change only.
> BTW  xinfo  is 32 bit long, but I think only 2 bits are used right now?
> I think I can make it a 8 bits, and add another 8 bits for nrels,
> nsubxacts, and nmsgs and the new thing. That should save
> another 2 bytes, while leaving space for extention. Or we can make
> it a 8 bits only, but only 2 bits would be left "empty" for future
> extentions; I don't know if we care about it...

I don't think making xinfo shorter will save anything, because
whatever follows it is going to be a 4-byte quantity and therefore
4-byte aligned.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> I don't  think making xinfo shorter will save anything, because
> whatever follows it is  going to be a 4-byte quantity and therefore
> 4-byte aligned.


ups, didn't notice it.

I'll split    xinfo into:
uint16   xinfo;
uint16   presentFlags;


I guess it helps with the reading? I mean, instead
of having a single uint32?


Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Tue, May 10, 2011 at 8:03 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>> I don't  think making xinfo shorter will save anything, because
>> whatever follows it is  going to be a 4-byte quantity and therefore
>> 4-byte aligned.
>
>
> ups, didn't notice it.
>
> I'll split    xinfo into:
>
> uint16   xinfo;
> uint16   presentFlags;
>
>
> I guess it helps with the reading? I mean, instead
> of having a single uint32?

My feeling would be just keep it as uint32.  Breaking it up into
chunks doesn't seem useful to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> By the time the  startup process
> releases the AccessExclusiveLock acquired by the proposed
> UNLOGGED -> normal conversion process, that relfilenode
> needs to be either  fully copied or unlinked all over again.
> (Alternately, find some other  way to make sure queries don't
> read the half-copied file.)

About this issue: how are AccessExclusiveLocks released on
the standby when the master crashes?


Re: switch UNLOGGED to LOGGED

From
Noah Misch
Date:
On Wed, May 18, 2011 at 04:02:59PM +0100, Leonardo Francalanci wrote:
> > By the time the  startup process
> > releases the AccessExclusiveLock acquired by the proposed 
> > UNLOGGED -> normal conversion process, that relfilenode
> > needs to be either  fully copied or unlinked all over again. 
> > (Alternately, find some other  way to make sure queries don't
> > read the half-copied file.)  
> 
> About this issue: how are AccessExclusiveLocks released on
> the standby when the master crashes?

I assume those locks remain.  It wouldn't be safe to release them; a master
crash is just one kind of WAL receipt latency.

When you promote the standby, though, ShutdownRecoveryTransactionEnvironment()
releases the locks.


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> On Wed, May 18, 2011 at 04:02:59PM +0100, Leonardo Francalanci wrote:
> >  > By the time the  startup process
> > > releases the  AccessExclusiveLock acquired by the proposed
> > > UNLOGGED -> normal  conversion process, that relfilenode
> > > needs to be either  fully  copied or unlinked all over again.
> > > (Alternately, find some  other  way to make sure queries don't
> > > read the half-copied  file.)
> >
> > About this issue: how are AccessExclusiveLocks  released on
> > the standby when the master crashes?
>
> I assume those  locks remain.  It wouldn't be safe to release them; a master
> crash is  just one kind of WAL receipt latency.


But somehow when the master restarts the standby gets notified it
has the unlock???

> When you promote the standby,  though,
ShutdownRecoveryTransactionEnvironment()
> releases the locks.


Ok; then the problem in the UNLOGGED -> normal  conversion  is that:

1) the master and the standby acquire a lock on the table
2) the master starts sending the pages to the standby
3) the master crashes before committing

up until here no problems, as the standby still has the lock on the
table.

4) when the master restarts, it removes all the fork for rels with INIT forks;
are those "deletes" sent to the standby? And, if yes,
would those be replayed by the standby *before* releasing the lock?
If the answer is "yes", then I don't think we have a problem... but I think
that at the moment those unlogged-table-forks deletes aren't sent at all.

(When promoting the standby,  we could have
ShutdownRecoveryTransactionEnvironment() remove all the fork for rels
with INIT forks before releasing the locks)



Leonardo


Re: switch UNLOGGED to LOGGED

From
Noah Misch
Date:
On Thu, May 19, 2011 at 09:23:53AM +0100, Leonardo Francalanci wrote:
> > On Wed, May 18, 2011 at 04:02:59PM +0100, Leonardo Francalanci wrote:
> > >  > By the time the  startup process
> > > > releases the  AccessExclusiveLock acquired by the proposed 
> > > > UNLOGGED -> normal  conversion process, that relfilenode
> > > > needs to be either  fully  copied or unlinked all over again. 
> > > > (Alternately, find some  other  way to make sure queries don't
> > > > read the half-copied  file.)  
> > > 
> > > About this issue: how are AccessExclusiveLocks  released on
> > > the standby when the master crashes?
> > 
> > I assume those  locks remain.  It wouldn't be safe to release them; a master
> > crash is  just one kind of WAL receipt latency.
> 
> But somehow when the master restarts the standby gets notified it
> has the unlock??? 

I'd guess some WAL record arising from the post-crash master restart makes the
standby do so.  When a crash isn't involved, the commit or abort record is that
signal.  You could test and find out how it happens after a master crash with a
procedure like this:

1. Start a master and standby on the same machine.
2. Connect to master; CREATE TABLE t(); BEGIN; ALTER TABLE t ADD c int;
3. kill -9 -`head -n1 $master_PGDATA/postmaster.pid`
4. Connect to standby and confirm that t is still locked.
5. Attach debugger to standby startup process and set breakpoints on
StandbyReleaseLocks and StandbyReleaseLocksMany.
6. Restart master.

> > When you promote the standby,  though, 
> ShutdownRecoveryTransactionEnvironment()
> > releases the locks.
> 
> Ok; then the problem in the UNLOGGED -> normal  conversion  is that:
> 
> 1) the master and the standby acquire a lock on the table
> 2) the master starts sending the pages to the standby
> 3) the master crashes before committing
> 
> up until here no problems, as the standby still has the lock on the
> table.

Correct.

> 4) when the master restarts, it removes all the fork for rels with INIT forks; 
> are those "deletes" sent to the standby? And, if yes,
> would those be replayed by the standby *before* releasing the lock?
> If the answer is "yes", then I don't think we have a problem... but I think
> that at the moment those unlogged-table-forks deletes aren't sent at all.

I think you are correct that they are not currently WAL-logged.

> (When promoting the standby,  we could have
> ShutdownRecoveryTransactionEnvironment() remove all the fork for rels
> with INIT forks before releasing the locks)

Makes sense.

nm


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> I'd guess some WAL record arising  from the post-crash master restart makes
the
> standby do so.  When a  crash isn't involved, the commit or abort record is
>that
> signal.  You  could test and find out how it happens after a master crash with
>a
> procedure  like this:
>
> 1. Start a master and standby on the same machine.
> 2.  Connect to master; CREATE TABLE t(); BEGIN; ALTER TABLE t ADD c int;
> 3. kill  -9 -`head -n1 $master_PGDATA/postmaster.pid`
> 4. Connect to standby and  confirm that t is still locked.
> 5. Attach debugger to standby startup process  and set breakpoints on
> StandbyReleaseLocks and StandbyReleaseLocksMany.
> 6.  Restart master.


Well yes, based on the test the stack is something like:

StandbyReleaseLocksManyStandbyReleaseOldLocks
ProcArrayApplyRecoveryInfo
xlog_redo


It's not very clear to me what ProcArrayApplyRecoveryInfo does (not too
familiar with the standby part I guess) but I see it's called by xlog_redo in
the "info == XLOG_CHECKPOINT_SHUTDOWN" case and by StartupXLOG.

But I don't know if calling      ResetUnloggedRelations before
the call to ProcArrayApplyRecoveryInfo in   xlog_redo makes sense...
if it makes sense, it would solve the problem of the stray files in
the master crashing case I guess?


> > > When you promote the standby,  though,
> > ShutdownRecoveryTransactionEnvironment()
> > > releases the  locks.


If I understand the code, ResetUnloggedRelations is called before
ShutdownRecoveryTransactionEnvironment, so that part shouldn't be
an issue


Leonardo


Re: switch UNLOGGED to LOGGED

From
Noah Misch
Date:
On Thu, May 19, 2011 at 01:52:46PM +0100, Leonardo Francalanci wrote:
> > I'd guess some WAL record arising  from the post-crash master restart makes 
> the
> > standby do so.  When a  crash isn't involved, the commit or abort record is 
> >that
> > signal.  You  could test and find out how it happens after a master crash with 
> >a
> > procedure  like this:
> > 
> > 1. Start a master and standby on the same machine.
> > 2.  Connect to master; CREATE TABLE t(); BEGIN; ALTER TABLE t ADD c int;
> > 3. kill  -9 -`head -n1 $master_PGDATA/postmaster.pid`
> > 4. Connect to standby and  confirm that t is still locked.
> > 5. Attach debugger to standby startup process  and set breakpoints on
> > StandbyReleaseLocks and StandbyReleaseLocksMany.
> > 6.  Restart master.
> 
> 
> Well yes, based on the test the stack is something like:
> 
> StandbyReleaseLocksMany
> StandbyReleaseOldLocks 
> ProcArrayApplyRecoveryInfo  
> xlog_redo
> 
> It's not very clear to me what ProcArrayApplyRecoveryInfo does (not too
> familiar with the standby part I guess) but I see it's called by xlog_redo in
> the "info == XLOG_CHECKPOINT_SHUTDOWN" case and by StartupXLOG.
> 
> But I don't know if calling      ResetUnloggedRelations before 
> the call to ProcArrayApplyRecoveryInfo in   xlog_redo makes sense...
> if it makes sense, it would solve the problem of the stray files in
> the master crashing case I guess?

It would solve the problem, but it would mean resetting unlogged relations on
the standby at every shutdown checkpoint.  That's probably not a performance
problem, but it is a hack.  Offhand, I'd add a new smgr WAL record issued by
ResetUnloggedRelations() when called with UNLOGGED_RELATION_CLEANUP.  Another,
simpler, idea is to split XLOG_CHECKPOINT_SHUTDOWN into XLOG_CHECKPOINT_SHUTDOWN
and XLOG_CHECKPOINT_END_OF_RECOVERY, mirroring CreateCheckPoint()'s distinction.
(Given that I regularly lack good taste, you might want to wait for other people
to weigh in before spending too much time on that.)

> > > > When you promote the standby,  though,  
> > > ShutdownRecoveryTransactionEnvironment()
> > > > releases the  locks.
> 
> 
> If I understand the code, ResetUnloggedRelations is called before 
> ShutdownRecoveryTransactionEnvironment, so that part shouldn't be
> an issue 

Seems correct.

nm


Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Thu, May 19, 2011 at 11:13 AM, Noah Misch <noah@leadboat.com> wrote:
> It would solve the problem, but it would mean resetting unlogged relations on
> the standby at every shutdown checkpoint.  That's probably not a performance
> problem, but it is a hack.

I haven't thought about this too deeply, but I'm not sure I agree
that's a hack.  Why do you think it is?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Noah Misch
Date:
On Thu, May 19, 2011 at 11:42:03AM -0400, Robert Haas wrote:
> On Thu, May 19, 2011 at 11:13 AM, Noah Misch <noah@leadboat.com> wrote:
> > It would solve the problem, but it would mean resetting unlogged relations on
> > the standby at every shutdown checkpoint. ?That's probably not a performance
> > problem, but it is a hack.
> 
> I haven't thought about this too deeply, but I'm not sure I agree
> that's a hack.  Why do you think it is?

It would make the standby reset unlogged relations on both regular shutdowns and
crashes, while the master only does so on crashes.  This creates no functional
hazard since unlogged relation contents are never accessible during hot standby.
It seems like a hack to rely on that fact at any distance, but perhaps a comment
is enough to assuage that.

nm


Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Thu, May 19, 2011 at 3:24 PM, Noah Misch <noah@leadboat.com> wrote:
> On Thu, May 19, 2011 at 11:42:03AM -0400, Robert Haas wrote:
>> On Thu, May 19, 2011 at 11:13 AM, Noah Misch <noah@leadboat.com> wrote:
>> > It would solve the problem, but it would mean resetting unlogged relations on
>> > the standby at every shutdown checkpoint. ?That's probably not a performance
>> > problem, but it is a hack.
>>
>> I haven't thought about this too deeply, but I'm not sure I agree
>> that's a hack.  Why do you think it is?
>
> It would make the standby reset unlogged relations on both regular shutdowns and
> crashes, while the master only does so on crashes.  This creates no functional
> hazard since unlogged relation contents are never accessible during hot standby.
> It seems like a hack to rely on that fact at any distance, but perhaps a comment
> is enough to assuage that.

I think I'd be more comfortable with that route if it seems like it'll
work.  Whacking around the recovery code always makes me a little
nervous about bugs, since it's easy to fail to notice the problem
until something Bad happens.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Noah Misch
Date:
On Thu, May 19, 2011 at 03:53:12PM -0400, Robert Haas wrote:
> On Thu, May 19, 2011 at 3:24 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Thu, May 19, 2011 at 11:42:03AM -0400, Robert Haas wrote:
> >> On Thu, May 19, 2011 at 11:13 AM, Noah Misch <noah@leadboat.com> wrote:
> >> > It would solve the problem, but it would mean resetting unlogged relations on
> >> > the standby at every shutdown checkpoint. ?That's probably not a performance
> >> > problem, but it is a hack.
> >>
> >> I haven't thought about this too deeply, but I'm not sure I agree
> >> that's a hack. ?Why do you think it is?
> >
> > It would make the standby reset unlogged relations on both regular shutdowns and
> > crashes, while the master only does so on crashes. ?This creates no functional
> > hazard since unlogged relation contents are never accessible during hot standby.
> > It seems like a hack to rely on that fact at any distance, but perhaps a comment
> > is enough to assuage that.
> 
> I think I'd be more comfortable with that route if it seems like it'll
> work.  Whacking around the recovery code always makes me a little
> nervous about bugs, since it's easy to fail to notice the problem
> until something Bad happens.

No remaining objection from me, then.  Thanks for reviewing.


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
I'll try to sum up what I understood:

1) the standby keeps the lock, so no problem with
stray files coming from the unlogged->logged log
reply, as the table can't be read during the operation

2) calling ResetUnloggedRelations before
ProcArrayApplyRecoveryInfo would remove the problem
of the stray files on the standby in case of master crash
before commit/abort

3) promoting the standby shouldn't be an issue,
since ResetUnloggedRelations is already called in
ShutdownRecoveryTransactionEnvironment


Now, to move forward, some questions:

- the patch is missing the "send all table pages to the
standby" part; is there some code I can use as base?
I guess I have to generate some special log type that
is only "played" by standby servers.

- on the standby, the commit part should be played as it
is on the master (that is, removing the INIT fork).
The abort case is different though: it would mean
doing nothing on the master, while removing every forks
but the INIT fork on the standby.
Would it be ok to add to xl_xact_abort a new array of
RelFileNode(s), where for each one at abort all the forks,
except the init fork, have to be deleted by the standby
(while the master shouldn't do anything with them)?
I bet there's a cleaner solution...



Leonardo


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> - the patch is missing the "send all table pages to  the
> standby" part; is there some code I can use as base?
> I guess I have to  generate some special log type that
> is only "played" by standby  servers.


Maybe I could use log_newpage, but instead of
XLOG_HEAP_NEWPAGE I could use something like
XLOG_HEAP_COPYPAGE; and in heap_redo, in the
XLOG_HEAP_COPYPAGE case, call heap_xlog_newpage
only in case we're in standby...


Re: switch UNLOGGED to LOGGED

From
Noah Misch
Date:
On Fri, May 20, 2011 at 09:37:20AM +0100, Leonardo Francalanci wrote:
> I'll try to sum up what I understood:
> 
> 1) the standby keeps the lock, so no problem with
> stray files coming from the unlogged->logged log
> reply, as the table can't be read during the operation
> 
> 2) calling ResetUnloggedRelations before 
> ProcArrayApplyRecoveryInfo would remove the problem
> of the stray files on the standby in case of master crash
> before commit/abort
> 
> 3) promoting the standby shouldn't be an issue,
> since ResetUnloggedRelations is already called in
> ShutdownRecoveryTransactionEnvironment

All correct, as far as I can tell.

> Now, to move forward, some questions:
> 
> - the patch is missing the "send all table pages to the
> standby" part; is there some code I can use as base?

Nothing comes to mind as especially similar.

> I guess I have to generate some special log type that
> is only "played" by standby servers.

What you described in your followup mail seemed reasonable.

> - on the standby, the commit part should be played as it
> is on the master (that is, removing the INIT fork).
> The abort case is different though: it would mean
> doing nothing on the master, while removing every forks
> but the INIT fork on the standby.
> Would it be ok to add to xl_xact_abort a new array of
> RelFileNode(s), where for each one at abort all the forks,
> except the init fork, have to be deleted by the standby
> (while the master shouldn't do anything with them)?
> I bet there's a cleaner solution...

Your "use less space in xl_xact_commit patch" seems to be going in a good
direction here.  It would probably also be okay to do a ResetUnloggedRelations()
on the standby at every abort of a transaction that had started an UNLOGGED ->
LOGGED conversion.  That is, just a flag might be enough.

nm


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> From: Noah Misch <noah@leadboat.com>
> > - the  patch is missing the "send all table pages to the
> > standby" part; is  there some code I can use as base?
>
> Nothing comes to mind as especially  similar.
>
> > I guess I have to generate some special log type  that
> > is only "played" by standby servers.
>
> What you described in  your followup mail seemed reasonable.


So, it's ok to have a log item that is replayed only if

WalRcvInProgress()

is true?

Is it a correct approach? I couldn't find any other way to
find out if we are in a standby or a master...

> > - on the standby, the commit  part should be played as it
> > is on the master (that is, removing the INIT  fork).
> > The abort case is different though: it would mean
> > doing  nothing on the master, while removing every forks
> > but the INIT fork on  the standby.
> > Would it be ok to add to xl_xact_abort a new array  of
> > RelFileNode(s), where for each one at abort all the forks,
> >  except the init fork, have to be deleted by the standby
> > (while the  master shouldn't do anything with them)?
> > I bet there's a cleaner  solution...
>
> Your "use less space in xl_xact_commit patch" seems to be  going in a good
> direction here.  It would probably also be okay to do a
>ResetUnloggedRelations()
> on the standby at every abort of a transaction that  had started an UNLOGGED
->
> LOGGED conversion.  That is, just a flag  might be enough.
ok, but that would mean that a transaction that aborts a conversion
would try to reset all unlogged relations (traversing all the FS)...
I don't know if that's acceptable performance-wise.


Re: switch UNLOGGED to LOGGED

From
Noah Misch
Date:
On Fri, May 27, 2011 at 10:49:13AM +0100, Leonardo Francalanci wrote:
> > From: Noah Misch <noah@leadboat.com>
> > > - the  patch is missing the "send all table pages to the
> > > standby" part; is  there some code I can use as base?
> > 
> > Nothing comes to mind as especially  similar.
> > 
> > > I guess I have to generate some special log type  that
> > > is only "played" by standby servers.
> > 
> > What you described in  your followup mail seemed reasonable.
> 
> 
> So, it's ok to have a log item that is replayed only if 
> 
> WalRcvInProgress()
> 
> is true?

No, that checks for WAL streaming in particular.  A log-shipping standby needs
the same treatment.

> Is it a correct approach? I couldn't find any other way to
> find out if we are in a standby or a master...

InArchiveRecovery looks like the right thing, but it's currently static to
xlog.c.  Perhaps exporting that is the way to go.

> > > - on the standby, the commit  part should be played as it
> > > is on the master (that is, removing the INIT  fork).
> > > The abort case is different though: it would mean
> > > doing  nothing on the master, while removing every forks
> > > but the INIT fork on  the standby.
> > > Would it be ok to add to xl_xact_abort a new array  of
> > > RelFileNode(s), where for each one at abort all the forks,
> > >  except the init fork, have to be deleted by the standby
> > > (while the  master shouldn't do anything with them)?
> > > I bet there's a cleaner  solution...
> > 
> > Your "use less space in xl_xact_commit patch" seems to be  going in a good
> > direction here.  It would probably also be okay to do a  
> >ResetUnloggedRelations()
> > on the standby at every abort of a transaction that  had started an UNLOGGED 
> ->
> > LOGGED conversion.  That is, just a flag  might be enough.
>  
> ok, but that would mean that a transaction that aborts a conversion
> would try to reset all unlogged relations (traversing all the FS)... 
> I don't know if that's acceptable performance-wise.

I'm not sure, either, but I don't figure such operations will be at all common.

nm


Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Fri, May 27, 2011 at 6:19 AM, Noah Misch <noah@leadboat.com> wrote:
>> So, it's ok to have a log item that is replayed only if
>>
>> WalRcvInProgress()
>>
>> is true?
>
> No, that checks for WAL streaming in particular.  A log-shipping standby needs
> the same treatment.
>
>> Is it a correct approach? I couldn't find any other way to
>> find out if we are in a standby or a master...
>
> InArchiveRecovery looks like the right thing, but it's currently static to
> xlog.c.  Perhaps exporting that is the way to go.

Why is it necessary to replay the operation only on the slave?  Can we
just use XLOG_HEAP_NEWPAGE?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Noah Misch
Date:
On Sat, May 28, 2011 at 09:33:09PM -0400, Robert Haas wrote:
> On Fri, May 27, 2011 at 6:19 AM, Noah Misch <noah@leadboat.com> wrote:
> >> So, it's ok to have a log item that is replayed only if
> >>
> >> WalRcvInProgress()
> >>
> >> is true?
> >
> > No, that checks for WAL streaming in particular. ?A log-shipping standby needs
> > the same treatment.
> >
> >> Is it a correct approach? I couldn't find any other way to
> >> find out if we are in a standby or a master...
> >
> > InArchiveRecovery looks like the right thing, but it's currently static to
> > xlog.c. ?Perhaps exporting that is the way to go.
> 
> Why is it necessary to replay the operation only on the slave?  Can we
> just use XLOG_HEAP_NEWPAGE?

I don't think it is *necessary*.  If we're replaying WAL on a master, we'll also
be resetting unlogged relations after recovery; what we write or do not write to
them in the mean time has no functional impact.  Seemed like a sensible
optimization, but maybe it's premature.

nm


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> Why is it necessary to replay the operation only on the slave?   Can we
> just use XLOG_HEAP_NEWPAGE?


Uh, I don't know why but I thought I shouldn't log a page on the master,
since all the pages are already there and fsync-ed. But if it makes no harm,
I can easily use   XLOG_HEAP_NEWPAGE (of course, only in the wal_level != minimal case).


Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Sun, May 29, 2011 at 4:29 AM, Noah Misch <noah@leadboat.com> wrote:
> I don't think it is *necessary*.  If we're replaying WAL on a master, we'll also
> be resetting unlogged relations after recovery; what we write or do not write to
> them in the mean time has no functional impact.  Seemed like a sensible
> optimization, but maybe it's premature.

Some jiggering may be necessary, because right now we remove the main
forks at the start of recovery and repopulate them at the end.  It's
not immediately obvious to me that that's going to work well with
HEAP_XLOG_NEWPAGE, but then it's not immediately obvious to me that
it's going to work well with a new WAL record type, either.  I think
we need a detailed design document for how this is all going to work.
We need to not only handle the master properly but also handle the
slave properly.  Consider, for example, the case where the slave
begins to replay the transaction, reaches a restartpoint after
replaying some of the new pages, and then crashes.  If the subsequent
restart from the restartpoint blows away the main relation fork, we're
hosed.  I fear we're plunging into implementation details without
having a good overall design in mind first.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> I think
> we need a detailed design document for how  this is all going to work.
> We need to not only handle the master properly but  also handle the
> slave properly.  Consider, for example, the case where  the slave
> begins to replay the transaction, reaches a restartpoint  after
> replaying some of the new pages, and then crashes.  If the  subsequent
> restart from the restartpoint blows away the main relation fork,  we're
> hosed.  I fear we're plunging into implementation details  without
> having a good overall design in mind first.


As I said in my first post, I'm basing the patch on the post:

http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php


So I assumed the design was ok (except for the "stray file around
on a standby" case, which has been discussed earlier on this thread).

If there are things to be discussed/analyzed (I guess the restart point
thing is one of those) we can do it... but I thought that the whole
design was somehow in place



Leonardo


Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Tue, May 31, 2011 at 3:39 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>> I think
>> we need a detailed design document for how  this is all going to work.
>> We need to not only handle the master properly but  also handle the
>> slave properly.  Consider, for example, the case where  the slave
>> begins to replay the transaction, reaches a restartpoint  after
>> replaying some of the new pages, and then crashes.  If the  subsequent
>> restart from the restartpoint blows away the main relation fork,  we're
>> hosed.  I fear we're plunging into implementation details  without
>> having a good overall design in mind first.
>
> As I said in my first post, I'm basing the patch on the post:
>
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php
>
>
> So I assumed the design was ok (except for the "stray file around
> on a standby" case, which has been discussed earlier on this thread).

Well, I sort of assumed the design was OK, too, but the more we talk
about this WAL-logging stuff, the less convinced I am that I really
understand the problem.  :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: switch UNLOGGED to LOGGED

From
Leonardo Francalanci
Date:
> Well, I  sort of assumed the design was OK, too, but the more we talk
> about this  WAL-logging stuff, the less convinced I am that I really
> understand the  problem.  :-(


I see. In fact, I think nobody thought about restart points...

To sum up:

1) everything seems ok when in the wal_level = minimal
case. In this case, we fsync everything and at transaction commit
we remove the init fork; in case of a crash, we don't reply
anything (as nothing has been written to the log), and we
remove the main fork as we do now.

2) in the   wal_level != minimal case things become more
complicated: if the standby reaches a restart point
and then crashes we are in trouble: it would remove the
main fork at startup, and would reply only a portion of
the table.
I guess the same applies to the master too? I mean:
if we log   HEAP_XLOG_NEWPAGEs, reach a checkpoint,
and then crash, at server restart the main fork would be
deleted, and the pages logged on the log couldn't be
replayed. But the problem on the master can be removed
using another type of log instead of   HEAP_XLOG_NEWPAGE
(to be replayed by the standbys only).


Is this analysis correct?


Leonardo


Re: switch UNLOGGED to LOGGED

From
Robert Haas
Date:
On Tue, May 31, 2011 at 12:25 PM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>> Well, I  sort of assumed the design was OK, too, but the more we talk
>> about this  WAL-logging stuff, the less convinced I am that I really
>> understand the  problem.  :-(
>
>
> I see. In fact, I think nobody thought about restart points...
>
> To sum up:
>
> 1) everything seems ok when in the wal_level = minimal
> case. In this case, we fsync everything and at transaction commit
> we remove the init fork; in case of a crash, we don't reply
> anything (as nothing has been written to the log), and we
> remove the main fork as we do now.

Yeah, that seems like it should work.

> 2) in the   wal_level != minimal case things become more
> complicated: if the standby reaches a restart point
> and then crashes we are in trouble: it would remove the
> main fork at startup, and would reply only a portion of
> the table.
> I guess the same applies to the master too? I mean:
> if we log   HEAP_XLOG_NEWPAGEs, reach a checkpoint,
> and then crash, at server restart the main fork would be
> deleted, and the pages logged on the log couldn't be
> replayed. But the problem on the master can be removed
> using another type of log instead of   HEAP_XLOG_NEWPAGE
> (to be replayed by the standbys only).

I think that's about right, except that I feel we're missing some
trick here that's needed to make all this work out nicely.  Somehow we
need to maintain some state that an unlogged->logged conversion is in
progress; that state needs to then get cleaned up at commit or abort
time (including implicit abort).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company