Thread: making an unlogged table logged

making an unlogged table logged

From
Robert Haas
Date:
Somebody asked about this on Depesz's blog today, and I think it's
come up here before too, so I thought it might be worth my writing up
a few comments on this.  I don't think I'm going to have time to work
on this any time soon, but if someone else wants to work up a patch,
I'm game to review.  I think it'd clearly be a good feature.
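
For concreteness, the user-facing command being discussed might look
something like this (purely illustrative syntax; no such command exists
today, and "my_events" is a made-up table name):

    ALTER TABLE my_events SET LOGGED;    -- unlogged -> permanent
    ALTER TABLE my_events SET UNLOGGED;  -- permanent -> unlogged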

Generally, to do this, it would be necessary to do the following
things (plus anything I'm forgetting):

1. Take an AccessExclusiveLock on the target table.  You might think
that concurrent selects could be allowed, but I believe that's not the
case.  Read on.

2. Verify that there are no foreign keys referencing other unlogged
tables, because if that were the case then after the change we'd have
a permanent table referencing an unlogged table, which would violate
referential integrity.  (Note that unlogged referencing permanent is
OK, but permanent referencing unlogged is a no-no, so what matters
when upgrading is "outbound" foreign keys; see the query sketch after
this list.)

3. Write out all shared buffers for the target table, and drop them.
This ensures that there are no buffers floating around for the target
relation that are marked BM_UNLOGGED, which would be a bad thing.  Or
maybe it's possible to just clear the BM_UNLOGGED flag, instead of
dropping them.  This is the step that makes me think we need an access
exclusive lock - otherwise, somebody else might read in a buffer and,
seeing that the relation is unlogged (which is true, since we haven't
committed yet), mark it BM_UNLOGGED.

4. fsync() any segments of the target relation - of any fork except
the init fork - that might have dirty pages not on disk.

5. Arrange for the appropriate file deletions at commit or abort, by
updating pendingDeletes.  On commit, we want to delete the init fork
for the table and all its indexes.  On abort, we want to delete
everything else, but only for pretend; that is, the abort record
should reflect the deletions since they'll need to happen on any
standbys, but we shouldn't actually perform them on the master since
we don't want to obliterate the contents of the table for no reason.
There's a subtle problem here I'm not quite sure how to deal with:
what happens if we *crash* without writing an abort record?  It seems
like that could leave a stray file around on a standby, because the
current code only cleans things up on the standby at the start of
recovery; to make this bullet-proof, I suppose it'd need to repeat
that every time a crash happens on the master, but I don't know how to
do that.  Note also that if wal_level is minimal, then we need only
worry about the commit case; the abort case can be a no-op.

6. If wal_level != minimal, XLOG every page of every fork except the
init fork, for both the table and the associated indexes. (Note that
this step also requires an AccessExclusiveLock rather than some weaker
lock, because of the arbitrary rule that only AccessExclusiveLocks are
sent to standbys.  If we held only ShareRowExclusiveLock on the
master, for example, a Hot Standby backend might read the table while
it's only been half-copied.)

7. Update pg_class.relpersistence from 'u' to 'p', for both the table
and the associated indexes.
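
As a sketch of the check in step 2, a catalog query along these lines
could spot outbound foreign keys on the target table that reference
unlogged tables ("my_events" is a hypothetical name; any rows returned
mean the conversion has to be refused):

    SELECT con.conname,
           confrel.relname AS referenced_table
    FROM pg_constraint con
    JOIN pg_class confrel ON confrel.oid = con.confrelid
    WHERE con.contype = 'f'                        -- foreign keys only
      AND con.conrelid = 'my_events'::regclass     -- constraints on the target
      AND confrel.relpersistence = 'u';            -- referencing an unlogged table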

Going the other direction ought to be possible too, although it seems
somewhat less useful.  For that, you'd need to flip around the check
in step #2 (i.e. check for a reference FROM a permanent table),
perform step #3, skip step #4, do step #5 backwards (create and log
init forks, arranging for them to be removed on abort - this too has
an issue with crashes that don't leave abort records behind - and
arrange for the rest of the forks to be removed on commit on any
standby without doing it on the master), skip step #6, and do step #7
backwards.
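
The flipped check for the logged -> unlogged direction would look for
inbound references instead, i.e. foreign keys on permanent tables that
point at the target (again a sketch with a hypothetical table name):

    SELECT con.conname,
           conrel.relname AS referencing_table
    FROM pg_constraint con
    JOIN pg_class conrel ON conrel.oid = con.conrelid
    WHERE con.contype = 'f'                        -- foreign keys only
      AND con.confrelid = 'my_events'::regclass    -- that reference the target
      AND conrel.relpersistence = 'p';             -- from a permanent table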

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: making an unlogged table logged

From
Rob Wultsch
Date:
On Tue, Jan 4, 2011 at 7:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Somebody asked about this on Depesz's blog today, and I think it's
> come up here before too, so I thought it might be worth my writing up
> a few comments on this.  I don't think I'm going to have time to work
> on this any time soon, but if someone else wants to work up a patch,
> I'm game to review.  I think it'd clearly be a good feature.
>
> Generally, to do this, it would be necessary to do the following
> things (plus anything I'm forgetting):
>
> 1. Take an AccessExclusiveLock on the target table.  You might think
> that concurrent selects could be allowed, but I believe that's not the
> case.  Read on.
>
> 2. Verify that there are no foreign keys referencing other unlogged
> tables, because if that were the case then after the change we'd have
> a permanent table referencing an unlogged table, which would violate
> referential integrity.  (Note that unlogged referencing permanent is
> OK, but permanent referencing unlogged is a no-no, so what matters
> when upgrading is "outbound" foreign keys.)
>
> 3. Write out all shared buffers for the target table, and drop them.
> This ensures that there are no buffers floating around for the target
> relation that are marked BM_UNLOGGED, which would be a bad thing.  Or
> maybe it's possible to just clear the BM_UNLOGGED flag, instead of
> dropping them.  This is the step that makes me think we need an access
> exclusive lock - otherwise, somebody else might read in a buffer and,
> seeing that the relation is unlogged (which is true, since we haven't
> committed yet), mark it BM_UNLOGGED.
>
> 4. fsync() any segments of the target relation - of any fork except
> that init fork - that might have dirty pages not on disk.
>
> 5. Arrange for the appropriate file deletions at commit or abort, by
> updating pendingDeletes.  On commit, we want to delete the init fork
> for the table and all its indexes.  On abort, we want to delete
> everything else, but only for pretend; that is, the abort record
> should reflect the deletions since they'll need to happen on any
> standbys, but we shouldn't actually perform them on the master since
> we don't want to obliterate the contents of the table for no reason.
> There's a subtle problem here I'm not quite sure how to deal with:
> what happens if we *crash* without writing an abort record?  It seems
> like that could leave a stray file around on a standby, because the
> current code only cleans things up on the standby at the start of
> recovery; to make this bullet-proof, I suppose it'd need to repeat
> that every time a crash happens on the master, but I don't know how to
> do that.  Note also that if wal_level is minimal, then we need only
> worry about the commit case; the abort case can be a no-op.
>
> 6. If wal_level != minimal, XLOG every page of every fork except the
> init fork, for both the table and the associated indexes. (Note that
> this step also requires an AccessExclusiveLock rather than some weaker
> lock, because of the arbitrary rule that only AccessExclusiveLocks are
> sent to standbys.  If we held only ShareRowExclusiveLock on the
> master, for example, a Hot Standby backend might read the table while
> it's only been half-copied.)
>
> 7. Update pg_class.relpersistence from 'u' to 'p', for both the table
> and the associated indexes.
>
> Going the other direction ought to be possible too, although it seems
> somewhat less useful.  For that, you'd need to flip around the check
> in step #2 (i.e. check for a reference FROM a permanent table),
> perform step #3, skip step #4, do step #5 backwards (create and log
> init forks, arranging for them to be removed on abort - this too has
> an issue with crashes that don't leave abort records behind); and
> arrange for the rest of the forks to be removed on commit on any
> standby without doing it on the master), skip step #6, and do step #7
> backwards.
>

A couple thoughts:
1. Could making a table logged be done under a non-exclusive lock if the
ALTER is allowed to take a full checkpoint?
2. Unlogged to logged has a giant use case.
3. In MySQL I have had to ALTER tables to engine BLACKHOLE because
they held data that was not vital, but the server was out of IO. Going
logged -> unlogged has a significant place, I think.


--
Rob Wultsch
wultsch@gmail.com


Re: making an unlogged table logged

From
"Joshua D. Drake"
Date:
> A couple thoughts:
> 1. Could the making a table logged be a non-exclusive lock if the
> ALTER is allowed to take a full checkpoint?

If possible, that would certainly be better. If the bgwriter is doing
what it is supposed to, it would be relatively painless.

> 2. Unlogged to logged has giant use case.

Agreed.

JD

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt



Re: making an unlogged table logged

From
Robert Haas
Date:
On Tue, Jan 4, 2011 at 10:56 PM, Rob Wultsch <wultsch@gmail.com> wrote:
> 1. Could the making a table logged be a non-exclusive lock if the
> ALTER is allowed to take a full checkpoint?

No, that doesn't solve either of the two problems I described, unfortunately.

> 2. Unlogged to logged has giant use case.

Agree.

> 3. In MySQL I have had to ALTER tables to engine BLACKHOLE because
> they held data that was not vital, but the server was out of IO. Going
> logged -> unlogged has a significant placed, I think.

Interesting.  So you'd change a logged table into an unlogged table to
cut down on I/O, and take the risk of losing the data if the server
went down?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: making an unlogged table logged

From
David Fetter
Date:
On Wed, Jan 05, 2011 at 09:04:08AM -0500, Robert Haas wrote:
> On Tue, Jan 4, 2011 at 10:56 PM, Rob Wultsch <wultsch@gmail.com> wrote:
> > 1. Could the making a table logged be a non-exclusive lock if the
> > ALTER is allowed to take a full checkpoint?
> 
> No, that doesn't solve either of the two problems I described,
> unfortunately.
> 
> > 2. Unlogged to logged has giant use case.
> 
> Agree.
> 
> > 3. In MySQL I have had to ALTER tables to engine BLACKHOLE because
> > they held data that was not vital, but the server was out of IO.
> > Going logged -> unlogged has a significant placed, I think.
> 
> Interesting.  So you'd change a logged table into an unlogged table
> to cut down on I/O, and take the risk of losing the data if the
> server went down?

BLACKHOLE is a "storage engine" that's equivalent to /dev/null, so it
wasn't a risk /per se/.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: making an unlogged table logged

From
Rob Wultsch
Date:
On Wed, Jan 5, 2011 at 7:48 AM, David Fetter <david@fetter.org> wrote:
> On Wed, Jan 05, 2011 at 09:04:08AM -0500, Robert Haas wrote:
>> On Tue, Jan 4, 2011 at 10:56 PM, Rob Wultsch <wultsch@gmail.com> wrote:
>> > 1. Could the making a table logged be a non-exclusive lock if the
>> > ALTER is allowed to take a full checkpoint?
>>
>> No, that doesn't solve either of the two problems I described,
>> unfortunately.

That is too bad.

>>
>> > 2. Unlogged to logged has giant use case.
>>
>> Agree.
>>
>> > 3. In MySQL I have had to ALTER tables to engine BLACKHOLE because
>> > they held data that was not vital, but the server was out of IO.
>> > Going logged -> unlogged has a significant placed, I think.
>>
>> Interesting.  So you'd change a logged table into an unlogged table
>> to cut down on I/O, and take the risk of losing the data if the
>> server went down?
>
> BLACKHOLE is a "storage engine" that's equivalent to /dev/null, so it
> wasn't a risk /per se/.
>

Exactly. It was data I could live without, and by having the schema
attached to /dev/null the application did not error out and die. It is
a very bad option, and being able to turn off logging for a table is a
much better one.


--
Rob Wultsch
wultsch@gmail.com


Re: making an unlogged table logged

From
Josh Berkus
Date:
On 1/4/11 6:41 PM, Robert Haas wrote:
> Going the other direction ought to be possible too, although it seems
> somewhat less useful.

Actually, it's more useful; many people who *upgrade* to 9.1 will want
to convert one or two of their tables to unlogged.

Note that the conversion both ways can be worked around with the
CREATE/ALTER TABLE do-si-do, so I don't think either of these is critical
for 9.1.  Unless you feel like working on them, of course.
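
For the record, the do-si-do amounts to something like this (a sketch
only; names are made up, and note that LIKE does not carry over foreign
keys, ownership, or grants, so those have to be recreated by hand):

    BEGIN;
    LOCK TABLE my_events IN ACCESS EXCLUSIVE MODE;
    CREATE TABLE my_events_logged (LIKE my_events INCLUDING ALL);
    -- going unlogged instead, use CREATE UNLOGGED TABLE here
    INSERT INTO my_events_logged SELECT * FROM my_events;
    DROP TABLE my_events;
    ALTER TABLE my_events_logged RENAME TO my_events;
    COMMIT;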

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Re: making an unlogged table logged

From
Robert Haas
Date:
On Wed, Jan 5, 2011 at 2:02 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Note that the conversion both ways can be worked around with the
> CREATE/ALTER TABLE do-si-do, so I don't think either of these is critical
> for 9.1.  Unless you feel like working on them, of course.

As I said in my OP, I don't intend to work on them for 9.1, but will
review and possibly commit a patch written by someone else.  It'd be
nice to have, but there are other things I want more, and a ton of
other large and small patches to review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: making an unlogged table logged

From
Simon Riggs
Date:
On Tue, 2011-01-04 at 21:41 -0500, Robert Haas wrote:

> 6. If wal_level != minimal, XLOG every page of every fork except the
> init fork, for both the table and the associated indexes. (Note that
> this step also requires an AccessExclusiveLock rather than some weaker
> lock, because of the arbitrary rule that only AccessExclusiveLocks are
> sent to standbys.  If we held only ShareRowExclusiveLock on the
> master, for example, a Hot Standby backend might read the table while
> it's only been half-copied.) 

That rule is not arbitrary; there is simply no need to send other lock
types, since they would not conflict with queries. It's an optimisation.

The lock strength selected on the master doesn't need to be the same as
the lock strength on the standby. You could quite easily generate AEL
lock records to send to standby, without actually taking that lock level
on the master.

Question: what does an unlogged table look like on the standby? Is it
visible at all, does it throw an error, or does it just look empty? We
probably need some docs in the HS section to explain that. I hope the
answer isn't "look empty" since that is effectively data loss for people
spreading queries across multiple nodes.

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services



Re: making an unlogged table logged

From
Robert Haas
Date:
On Wed, Jan 5, 2011 at 2:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The lock strength selected on the master doesn't need to be the same as
> the lock strength on the standby. You could quite easily generate AEL
> lock records to send to standby, without actually taking that lock level
> on the master.

True.

> Question: what does an unlogged table look like on the standby? Are they
> visible at all, does it throw an error, or do they just look empty. We
> probably need some docs in the HS section to explain that. I hope the
> answer isn't "look empty" since that is effectively data loss for people
> spreading queries across multiple nodes.

Error.
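
For illustration, on a hot standby that currently looks roughly like
this (hypothetical table name; the exact wording may vary by version):

    SELECT count(*) FROM my_events;
    ERROR:  cannot access temporary or unlogged relations during recovery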

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: making an unlogged table logged

From
"A.M."
Date:
On Jan 5, 2011, at 2:37 PM, Robert Haas wrote:

> On Wed, Jan 5, 2011 at 2:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The lock strength selected on the master doesn't need to be the same as
>> the lock strength on the standby. You could quite easily generate AEL
>> lock records to send to standby, without actually taking that lock level
>> on the master.
>
> True.
>
>> Question: what does an unlogged table look like on the standby? Are they
>> visible at all, does it throw an error, or do they just look empty. We
>> probably need some docs in the HS section to explain that. I hope the
>> answer isn't "look empty" since that is effectively data loss for people
>> spreading queries across multiple nodes.
>
> Error.

Hm- if the unlogged tables are being used as HTTP transient state
storage, it would be handy to have that (admittedly non-essential) data
on the standby when it becomes master, even if there are no guarantees
surrounding the data beyond "it looked like this at some point". Since
the tables are not writing WAL, would it be possible to allow for
writing to unlogged tables on the standby to allow for out-of-band
syncing? Otherwise, it seems the only alternative is to push changes to
a separate database on the standby machine and then suck the data in
when it becomes master.
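
The sort of transient-state table being described might look like this
(purely illustrative schema):

    CREATE UNLOGGED TABLE http_sessions (
        session_id  text PRIMARY KEY,
        user_id     integer,
        payload     text,
        expires_at  timestamptz
    );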

Cheers,
M

Re: making an unlogged table logged

From
Josh Berkus
Date:
> Hm- if the unlogged tables are being used as HTTP transient state
> storage, it would be handy to have that (admittedly non-essential)
> data on the standby when it becomes master, even if there are no
> guarantees surrounding the data beyond "it looked like this at some
> point". Since the tables are not writing WAL, would it be possible to
> allow for writing to unlogged tables on the standby to allow for
> out-of-band syncing? Otherwise, it seems the only alternative is to
> push changes to a separate database on the standby machine and then
> suck the data in when it becomes master.

It would be useful (this would allow per-standby buffer tables, for that
matter), but it would also be tremendously difficult.  Seems worthy of a
TODO, though.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Re: making an unlogged table logged

From
Dimitri Fontaine
Date:
Josh Berkus <josh@agliodbs.com> writes:
> It would be useful (this would allow per-standby buffer tables, for that
> matter), but it would also be tremendously difficult.  Seems worthy of a
> TODO, though.

Don't we have anything covering xid-less tables in the TODO already? The
read-only table "compression" idea of removing a large part of the tuple
headers seems to come up often enough…

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: making an unlogged table logged

From
Robert Haas
Date:
On Wed, Jan 5, 2011 at 3:58 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Hm- if the unlogged tables are being used as HTTP transient state
>> storage, it would be handy to have that (admittedly non-essential)
>> data on the standby when it becomes master, even if there are no
>> guarantees surrounding the data beyond "it looked like this at some
>> point". Since the tables are not writing WAL, would it be possible
>> to allow for writing to unlogged tables on the standby to allow for
>> out-of-band syncing? Otherwise, it seems the only alternative is to
>> push changes to a separate database on the standby machine and then
>> suck the data in when it becomes master.
>
> It would be useful (this would allow per-standby buffer tables, for that
> matter), but it would also be tremendously difficult.  Seems worthy of a
> TODO, though.

I think that's probably a dead end - just to take one example, if you
don't sync often enough, the standby might have transaction ID
wraparound problems.  Autovacuum on the master will prevent that for
permanent tables, but not for an only-occasionally-updated copy of an
unlogged table.  If what you want is a possibly-slightly-stale copy of
a fast table on the master, there are ways to engineer that using
technology we already have, such as asynchronous replication + Hot
Standby + synchronous_commit=off, or by using Slony.  Admittedly, none
of these solutions are quite perfect, but it would probably be much
easier to improve those solutions or develop something completely new
rather than try to somehow go through the mammoth contortions that
would be needed to make it work with unlogged tables.
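
The synchronous_commit knob mentioned above is a per-session or
per-transaction setting, so the durability trade-off can be scoped to
just the hot writes, e.g. (illustrative; table and column are made up):

    BEGIN;
    SET LOCAL synchronous_commit = off;  -- this transaction's commit may be lost on crash
    INSERT INTO my_events (payload) VALUES ('not critical');
    COMMIT;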

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: making an unlogged table logged

From
Josh Berkus
Date:
On 1/5/11 3:14 PM, Robert Haas wrote:
> I think that's probably a dead end - just to take one example, if you
> don't sync often enough, the standby might have transaction ID
> wraparound problems.  Autovacuum on the master will prevent that for
> permanent tables, but not for an only-occasionally-updated copy of an
> unlogged table.

I think you're missing Agent M's idea: if you could write to unlogged
tables on the standby, then you could use application code to
periodically synch them.

Mind you, I personally don't find that idea that useful -- unlogged
tables are supposed to be for highly volatile data, after all.  No doubt
M was thinking that in a failover situation, it would be better to have
stale data than none at all.

However, if an unlogged table created on the master could be available
for writing and initially empty on the standbys, it would give each
standby available temporary/buffer tables they could use. That would be
*really* useful.

Also, one of the obvious uses for unlogged tables is materialized
views.  If unlogged tables don't get replicated, and can't be created
on the standby, then it severely limits their utility for this purpose.
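
A hand-rolled materialized view over an unlogged table might look like
this today (illustrative sketch; "sales" and its columns are made up):

    CREATE UNLOGGED TABLE daily_sales_mv AS
        SELECT date_trunc('day', sold_at) AS day, sum(amount) AS total
        FROM sales
        GROUP BY 1;

    -- refresh by rebuilding the contents
    BEGIN;
    TRUNCATE daily_sales_mv;
    INSERT INTO daily_sales_mv
        SELECT date_trunc('day', sold_at), sum(amount)
        FROM sales
        GROUP BY 1;
    COMMIT;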

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Re: making an unlogged table logged

From
Robert Haas
Date:
On Wed, Jan 5, 2011 at 6:25 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 1/5/11 3:14 PM, Robert Haas wrote:
>> I think that's probably a dead end - just to take one example, if you
>> don't sync often enough, the standby might have transaction ID
>> wraparound problems.  Autovacuum on the master will prevent that for
>> permanent tables, but not for an only-occasionally-updated copy of an
>> unlogged table.
>
> I think you're missing Agent M's idea: if you could write to unlogged
> tables on the standby, then you could use application code to
> periodically synch them.
>
> Mind you, I personally don't find that idea that useful -- unlogged
> tables are supposed to be for highly volatile data, after all.  No doubt
> M was thinking that in a failover situation, it would be better to have
> stale data than none at all.
>
> However, if an unlogged table created on the master could be available
> for writing and initially empty on the standbys, it would give each
> standby available temporary/buffer tables they could use. That would be
> *really* useful.

OIC, sorry.  Well, that could possibly be done, but it'd be tricky.
The obvious problem is that the backend doing the writing would need
an XID, and it'd probably have to ask the master to assign it one...
which is possibly doable, but certainly not ideal (you can't write on
the slave if the master is down, unless you promote it).  Then there's
a whole bunch of follow-on problems, like now the standby needs to run
autovacuum - but only on the unlogged tables, and without being able
to update or rely on pg_database.datfrozenxid.

I think we have to face up to the fact that WAL shipping is an
extremely limiting way to do replication.  It has its perks, certainly
- principally, that it minimizes the amount of extra work that must be
done on the master, which is an extremely valuable consideration for
many applications.  However, it's also got some pretty major
disadvantages, and one of the big ones is that it's not well-suited to
partial replication.  If it were possible to replicate individual
tables, we wouldn't be having this conversation.  You'd just replicate
some tables from the master to the standby and then create a few extra
ones on the standby (perhaps permanent, perhaps unlogged, perhaps
temporary) and call it good.

I think we ought to seriously consider having both physical and
logical replication in core.  Physical replication, which we have
today, is great for what it does, but trying to make it do things that
it's not good at is going to be an awful lot of work, and require an
awful lot of hacks, to make it cover everything that people really
want to be able to do.  Adding logical replication would be a lot of
work but we'd get a lot of collateral benefits.  Imagine that PG had
the ability to spit out INSERT/UPDATE/DELETE statements for designated
tables, as they were modified.  That would provide a framework for
partial replication, replication from PG into other databases, even
multi-master replication if you add some kind of conflict resolution.
Even though this would require essentially building a whole new
system, it's starting to seem to me that it would be simpler than
trying to remove the limitations of our existing system incrementally.
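
That is essentially what the trigger-based systems (Slony, londiste)
already do in user space; a bare-bones sketch of the idea, with made-up
names, might be:

    CREATE TABLE replication_queue (
        id        bigserial PRIMARY KEY,
        tablename text NOT NULL,
        op        text NOT NULL,            -- 'INSERT' / 'UPDATE' / 'DELETE'
        row_data  text,
        queued_at timestamptz NOT NULL DEFAULT now()
    );

    CREATE FUNCTION capture_change() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'DELETE' THEN
            INSERT INTO replication_queue (tablename, op, row_data)
                VALUES (TG_TABLE_NAME, TG_OP, OLD::text);
            RETURN OLD;
        ELSE
            INSERT INTO replication_queue (tablename, op, row_data)
                VALUES (TG_TABLE_NAME, TG_OP, NEW::text);
            RETURN NEW;
        END IF;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER my_events_capture
        AFTER INSERT OR UPDATE OR DELETE ON my_events
        FOR EACH ROW EXECUTE PROCEDURE capture_change();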

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: making an unlogged table logged

From
Dimitri Fontaine
Date:
Robert Haas <robertmhaas@gmail.com> writes:
>   Adding logical replication would be a lot of
> work but we'd get a lot of collateral benefits.  Imagine that PG had

There have been extensive discussions at last pgcon about that (mainly in
the "hallway track", but also in the devroom we had) to work on getting
the common bits between the slony and londiste queuing solutions into
core, to support what you're saying.

In fact we already have the txid datatype and its functions; those came
from slony through PGQ into core.
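
For example (both have been available in core for a while now):

    SELECT txid_current();           -- current transaction's 64-bit txid
    SELECT txid_current_snapshot();  -- xmin:xmax:xip_list text form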

Jan Wieck started a discussion back then to offer the basics we need in
core as far as queuing goes, here:
 http://archives.postgresql.org/pgsql-hackers/2010-05/msg01209.php

What happened to that effort?
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: making an unlogged table logged

From
Josh Berkus
Date:
> Jan Wieck started a discussion back then to offer the basics we need in
> core as far as queuing goes, here:
> 
>   http://archives.postgresql.org/pgsql-hackers/2010-05/msg01209.php
> 
> What happened to that effort?

Stalled due to lack of manpower, currently.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Re: making an unlogged table logged

From
Robert Haas
Date:
On Tue, Jan 4, 2011 at 9:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Generally, to do this, it would be necessary to do the following
> things (plus anything I'm forgetting):

It occurs to me that almost exactly this same procedure could be used
to make a *temporary* table into a permanent table.  You'd have to
also change the schema, and there'd be some other adjustments, but
overall it'd be pretty similar.
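
For now the closest user-level equivalent is simply copying the data out
of the temporary table, which also handles the schema change, since temp
tables live in a per-backend pg_temp schema (illustrative; names are
made up):

    CREATE TEMP TABLE scratch (id int, val text);
    -- ... populate scratch ...
    CREATE TABLE public.kept_results AS SELECT * FROM scratch;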

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company