Thread: unlogged tables

unlogged tables

From
Andy Colson
Date:
I have played around a little more, and think I found a problem.

If given enough time, an unlogged table makes it to disk, and a restart wont clear the data.  If I insert a bunch of
stuff,commit, and quickly restart PG, it table is cleared.  If I let it sit for a while, it stays.
 

Based on that, I have a pgbench_accounts table (unlogged) that after a restart has data in it.


andy=# select aid, bid, abalance from pgbench_accounts where abalance = 3305;   aid   | bid | abalance
---------+-----+---------- 3790226 |  38 |     3305  274130 |   3 |     3305 2169892 |  22 |     3305  705321 |   8 |
 3305 4463145 |  45 |     3305
 

I dropped the index, and added a new one, then restart PG.  Now it seems the index is empty/unusable.

andy=# select aid, bid, abalance from pgbench_accounts where aid = 3790226; aid | bid | abalance
-----+-----+----------
(0 rows)


andy=# select pg_indexes_size('pgbench_accounts'); pg_indexes_size
-----------------           16384




Lets recreate it:

andy=# drop index bob;
DROP INDEX
Time: 13.829 ms
andy=# create index bob on pgbench_accounts(aid, bid);
CREATE INDEX
Time: 17215.859 ms
andy=# select aid, bid, abalance from pgbench_accounts where aid = 3790226;   aid   | bid | abalance
---------+-----+---------- 3790226 |  38 |     3305
(1 row)

Time: 0.712 ms

andy=# select pg_indexes_size('pgbench_accounts'); pg_indexes_size
-----------------       179716096




I also did kill -9 on all the postgres* processes, while they were busy inserting records, to try to corrupt the
database. But could not seem to.  Setting fsync off also did not give me errors, but I assume because I was using
unloggedtables, and they were all getting cleared anyway, I never saw them.
 

With fsync off and normal tables, I got bad looking things in my logs and vacuum:

LOG:  unexpected pageaddr 1/AB1D6000 in log file 1, segment 187, offset 1925120
WARNING:  relation "access" page 28184 is uninitialized --- fixing
etc...


AND last, I tried to update my git repo and see if the patches still work. They do not.

There was much discussion on the syntax:

create unlogged table vs create temp xxx table vs something else.

There was much discussion on how persistent the tables should be.  And some on backups.

At this point, though, I find myself at an end, not sure what else to do until the dust settles.

Oh, also, I wanted to add:

There is \h help:  +1
but I can find no way of determining the "tempness"/"unloggedness" of a table via \d*
The only way I found was to "pg_dump -s"


I will attempt to link this to the website, and mark it as returned to author.

-Andy


Re: unlogged tables

From
Robert Haas
Date:
On Tue, Nov 30, 2010 at 10:36 PM, Andy Colson <andy@squeakycode.net> wrote:
> Based on that, I have a pgbench_accounts table (unlogged) that after a
> restart has data in it.
>
>
> andy=# select aid, bid, abalance from pgbench_accounts where abalance =
> 3305;
>   aid   | bid | abalance
> ---------+-----+----------
>  3790226 |  38 |     3305
>  274130 |   3 |     3305
>  2169892 |  22 |     3305
>  705321 |   8 |     3305
>  4463145 |  45 |     3305
>
> I dropped the index, and added a new one, then restart PG.  Now it seems the
> index is empty/unusable.
>
> andy=# select aid, bid, abalance from pgbench_accounts where aid = 3790226;
>  aid | bid | abalance
> -----+-----+----------
> (0 rows)
>
>
> andy=# select pg_indexes_size('pgbench_accounts');
>  pg_indexes_size
> -----------------
>           16384
>
> Lets recreate it:
>
> andy=# drop index bob;
> DROP INDEX
> Time: 13.829 ms
> andy=# create index bob on pgbench_accounts(aid, bid);
> CREATE INDEX
> Time: 17215.859 ms
> andy=# select aid, bid, abalance from pgbench_accounts where aid = 3790226;
>   aid   | bid | abalance
> ---------+-----+----------
>  3790226 |  38 |     3305
> (1 row)
>
> Time: 0.712 ms
>
> andy=# select pg_indexes_size('pgbench_accounts');
>  pg_indexes_size
> -----------------
>       179716096

This appears as though you've somehow gotten a normal table connected
to an unlogged index.  That certainly sounds like a bug, but there's
not enough details here to figure out what series of steps I should
perform to recreate the problem.

> AND last, I tried to update my git repo and see if the patches still work.
> They do not.

Updated patches attached.

> Oh, also, I wanted to add:
>
> There is \h help:  +1
> but I can find no way of determining the "tempness"/"unloggedness" of a
> table via \d*

It's clearly displayed in the \d output.

 Unlogged Table "public.test"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null
Indexes:
    "test_pkey" PRIMARY KEY, btree (a)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: unlogged tables

From
Andy Colson
Date:
On 11/30/2010 10:27 PM, Robert Haas wrote:
>
> This appears as though you've somehow gotten a normal table connected
> to an unlogged index.  That certainly sounds like a bug, but there's
> not enough details here to figure out what series of steps I should
> perform to recreate the problem.
>
>> There is \h help:  +1
>> but I can find no way of determining the "tempness"/"unloggedness" of a
>> table via \d*
>
> It's clearly displayed in the \d output.
>
>   Unlogged Table "public.test"
>   Column |  Type   | Modifiers
> --------+---------+-----------
>   a      | integer | not null
> Indexes:
>      "test_pkey" PRIMARY KEY, btree (a)
>
Jeez... Were it a snake it'd a bit me!

Ok.  I blew away my database and programs, re-gitted, re-patched (they work), re-compiled (ok), and re-ran initdb.

I have these non-standard settings:
shared_buffers = 512MB
work_mem = 5MB
checkpoint_segments = 7


1st) I can recreate some warning messages from vacuum:
WARNING:  relation "ulone" page 0 is uninitialized --- fixing
WARNING:  relation "pg_toast_16433" page 0 is uninitialized --- fixing

you create an unlogged table, fill it, restart pg (and it clears the table), then fill it again, and vacuum complains.
Hereis a log:
 

andy=# drop table ulone;
DROP TABLE
Time: 40.532 ms
andy=# create unlogged table ulone(id serial, a integer, b integer, c text);
NOTICE:  CREATE TABLE will create implicit sequence "ulone_id_seq" for serial column "ulone.id"
CREATE TABLE
Time: 151.968 ms
andy=# insert into ulone(a, b, c) select x, 1, 'bbbbbbbbbbb' from generate_series(1, 10000000) x;
INSERT 0 10000000
Time: 80401.505 ms
andy=# \q

$ vacuumdb -az
vacuumdb: vacuuming database "andy"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

$ sudo /etc/rc.d/postgresql stop
Stopping PostgreSQL: No directory, logging in with HOME=/

$ sudo /etc/rc.d/postgresql start
Starting PostgreSQL:

$ psql
Timing is on.
psql (9.1devel)
Type "help" for help.

andy=# select count(*) from ulone; count
-------     0
(1 row)

Time: 1.164 ms
andy=# insert into ulone(a, b, c) select x, 1, 'bbbbbbbbbbb' from generate_series(1, 10000000) x;
INSERT 0 10000000
Time: 75312.753 ms
andy=# \q

$ vacuumdb -az
vacuumdb: vacuuming database "andy"
WARNING:  relation "ulone" page 0 is uninitialized --- fixing
WARNING:  relation "pg_toast_16478" page 0 is uninitialized --- fixing
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"



2nd)  I can get the data to stick around after restart.  Though not reliably.  In general:

create and fill a table, vacuum it (not sure if its important, I do it because thats what I'd done in my pgbench
testingwhere I noticed the data stuck around), wait an hour (I usually left it for 12-24 hours, but recreated it with
aslittle as a half hour), then restart pg.   Sometimes the data is there... sometimes not.
 

I also filled my table with more data than memory would hold so it would spill to disk, again, because it recreates my
pgbenchsetup.
 

I'm still working on finding the exact steps, but I wanted to get you #1 above.

-Andy






Re: unlogged tables

From
Andy Colson
Date:
>
>
> 2nd) I can get the data to stick around after restart. Though not reliably. In general:
>
> create and fill a table, vacuum it (not sure if its important, I do it because thats what I'd done in my pgbench
testingwhere I noticed the data stuck around), wait an hour (I usually left it for 12-24 hours, but recreated it with
aslittle as a half hour), then restart pg. Sometimes the data is there... sometimes not.
 
>
> I also filled my table with more data than memory would hold so it would spill to disk, again, because it recreates
mypgbench setup.
 
>
> I'm still working on finding the exact steps, but I wanted to get you #1 above.
>
> -Andy
>
>
>
>
>

Ok, forget the time thing.  Has nothing to do with it.  (Which everyone already assumed I imagine).

Its truncate.

Create unloged table, fill it, truncate it, fill it again, restart pg, and the data will still be there.

-Andy


Re: unlogged tables

From
Alvaro Herrera
Date:
Excerpts from Andy Colson's message of vie dic 03 00:37:17 -0300 2010:

> Ok, forget the time thing.  Has nothing to do with it.  (Which everyone already assumed I imagine).
> 
> Its truncate.
> 
> Create unloged table, fill it, truncate it, fill it again, restart pg, and the data will still be there.

Hmm, presumably the table rewrite thing in truncate is not preserving
the unlogged state (perhaps it's the swap-relfilenode business).  Does
CLUSTER have a similar effect?  What about VACUUM FULL?  If so you know
where the bug is.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: unlogged tables

From
Robert Haas
Date:
On Thu, Dec 2, 2010 at 10:53 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Andy Colson's message of vie dic 03 00:37:17 -0300 2010:
>
>> Ok, forget the time thing.  Has nothing to do with it.  (Which everyone already assumed I imagine).
>>
>> Its truncate.
>>
>> Create unloged table, fill it, truncate it, fill it again, restart pg, and the data will still be there.
>
> Hmm, presumably the table rewrite thing in truncate is not preserving
> the unlogged state (perhaps it's the swap-relfilenode business).

Oh ho.  Right.  Yeah, that case is not handled.  Woopsie.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Robert Haas
Date:
On Tue, Nov 30, 2010 at 10:36 PM, Andy Colson <andy@squeakycode.net> wrote:
> [ review ]

Currently, if you create an unlogged table, restart PG, and vacuum the
table, you'll get this:

rhaas=# vacuum unlogged;
WARNING:  relation "unlogged" page 0 is uninitialized --- fixing
VACUUM

The reason this happens is because the init fork of an unlogged heap
consists of a single empty page, rather than a totally empty file.  I
needed to WAL-log the creation of the init fork, and there's currently
no way to WAL-log the creation of an empty file other than the main
relation fork.  I figured a file with one empty page would be just as
good as a totally empty file, and that way I could piggyback on
XLOG_HEAP_NEWPAGE, which will automatically create the relation fork
if it's not already there.  However, as the above warning message
demonstrates, this was a bit too clever.

One possible fix is to change the XLOG_SMGR_CREATE record to carry a
fork number.  Does that seem reasonable, or would anyone like to
recommend another approach?

I'm also going to go through and change all instances of the word
"unlogged" to "volatile", per previous discussion.  If this seems like
a bad idea to anyone, please object now rather than afterwards.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm also going to go through and change all instances of the word
> "unlogged" to "volatile", per previous discussion.  If this seems like
> a bad idea to anyone, please object now rather than afterwards.

Hm... I thought there had been discussion of a couple of different
flavors of table volatility.  Is it really a good idea to commandeer
the word "volatile" for this particular one?
        regards, tom lane


Re: unlogged tables

From
Robert Haas
Date:
On Tue, Dec 7, 2010 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I'm also going to go through and change all instances of the word
>> "unlogged" to "volatile", per previous discussion.  If this seems like
>> a bad idea to anyone, please object now rather than afterwards.
>
> Hm... I thought there had been discussion of a couple of different
> flavors of table volatility.  Is it really a good idea to commandeer
> the word "volatile" for this particular one?

So far I've come up with the following possible behaviors we could
theoretically implement:

1. Any crash or shutdown truncates the table.
2. Any crash truncates the table, but a clean shutdown does not.
3. A crash truncates the table only if it's been written since the
last checkpoint; a clean shutdown does not truncate it.

The main argument for doing #1 rather than #2 is that we'd rather not
have to include unlogged table data in checkpoints.  Andres Freund
made the argument that we could avoid that anyway, though, by just
doing an fsync() on every unlogged table file in the cluster at
shutdown time.  If that's acceptable, then ISTM there's no benefit to
implementing #1 and we should just go with #2.  If it's not
acceptable, then we have to think about whether and how to have both
of those behaviors.

#3 seems like a lot of work relative to #1 and #2 for a pretty
marginal increase in durability.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Dec 7, 2010 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hm... I thought there had been discussion of a couple of different
>> flavors of table volatility. �Is it really a good idea to commandeer
>> the word "volatile" for this particular one?

> So far I've come up with the following possible behaviors we could
> theoretically implement:

> 1. Any crash or shutdown truncates the table.
> 2. Any crash truncates the table, but a clean shutdown does not.
> 3. A crash truncates the table only if it's been written since the
> last checkpoint; a clean shutdown does not truncate it.

> The main argument for doing #1 rather than #2 is that we'd rather not
> have to include unlogged table data in checkpoints.  Andres Freund
> made the argument that we could avoid that anyway, though, by just
> doing an fsync() on every unlogged table file in the cluster at
> shutdown time.  If that's acceptable, then ISTM there's no benefit to
> implementing #1 and we should just go with #2.  If it's not
> acceptable, then we have to think about whether and how to have both
> of those behaviors.

> #3 seems like a lot of work relative to #1 and #2 for a pretty
> marginal increase in durability.

OK.  I agree that #3 adds a lot of complexity for not much of anything.
If you've got data that's static enough that #3 adds a useful amount
of safety, then you might as well be keeping it in a regular table.

I think a more relevant question is how complicated it'll be to issue
those fsyncs --- do you have a concrete implementation in mind?
        regards, tom lane


Re: unlogged tables

From
Robert Haas
Date:
On Tue, Dec 7, 2010 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Dec 7, 2010 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Hm... I thought there had been discussion of a couple of different
>>> flavors of table volatility.  Is it really a good idea to commandeer
>>> the word "volatile" for this particular one?
>
>> So far I've come up with the following possible behaviors we could
>> theoretically implement:
>
>> 1. Any crash or shutdown truncates the table.
>> 2. Any crash truncates the table, but a clean shutdown does not.
>> 3. A crash truncates the table only if it's been written since the
>> last checkpoint; a clean shutdown does not truncate it.
>
>> The main argument for doing #1 rather than #2 is that we'd rather not
>> have to include unlogged table data in checkpoints.  Andres Freund
>> made the argument that we could avoid that anyway, though, by just
>> doing an fsync() on every unlogged table file in the cluster at
>> shutdown time.  If that's acceptable, then ISTM there's no benefit to
>> implementing #1 and we should just go with #2.  If it's not
>> acceptable, then we have to think about whether and how to have both
>> of those behaviors.
>
>> #3 seems like a lot of work relative to #1 and #2 for a pretty
>> marginal increase in durability.
>
> OK.  I agree that #3 adds a lot of complexity for not much of anything.
> If you've got data that's static enough that #3 adds a useful amount
> of safety, then you might as well be keeping it in a regular table.
>
> I think a more relevant question is how complicated it'll be to issue
> those fsyncs --- do you have a concrete implementation in mind?

It can reuse most of the infrastructure we use for re-initializing
everything after a crash or unclean shutdown.  We just iterate over
every tablepace/dbspace directory and look for files with _init forks.If we find any then we open the main fork files
andfsync() each one. 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Dec 7, 2010 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think a more relevant question is how complicated it'll be to issue
>> those fsyncs --- do you have a concrete implementation in mind?

> It can reuse most of the infrastructure we use for re-initializing
> everything after a crash or unclean shutdown.  We just iterate over
> every tablepace/dbspace directory and look for files with _init forks.
>  If we find any then we open the main fork files and fsync() each one.

I assume you meant "all the other fork files", but OK.  Still, couldn't
that be rather expensive in a large DB?
        regards, tom lane


Re: unlogged tables

From
Cédric Villemain
Date:
2010/12/7 Robert Haas <robertmhaas@gmail.com>:
> On Tue, Dec 7, 2010 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Tue, Dec 7, 2010 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Hm... I thought there had been discussion of a couple of different
>>>> flavors of table volatility.  Is it really a good idea to commandeer
>>>> the word "volatile" for this particular one?
>>
>>> So far I've come up with the following possible behaviors we could
>>> theoretically implement:
>>
>>> 1. Any crash or shutdown truncates the table.
>>> 2. Any crash truncates the table, but a clean shutdown does not.
>>> 3. A crash truncates the table only if it's been written since the
>>> last checkpoint; a clean shutdown does not truncate it.
>>
>>> The main argument for doing #1 rather than #2 is that we'd rather not
>>> have to include unlogged table data in checkpoints.  Andres Freund
>>> made the argument that we could avoid that anyway, though, by just
>>> doing an fsync() on every unlogged table file in the cluster at
>>> shutdown time.  If that's acceptable, then ISTM there's no benefit to
>>> implementing #1 and we should just go with #2.  If it's not
>>> acceptable, then we have to think about whether and how to have both
>>> of those behaviors.
>>
>>> #3 seems like a lot of work relative to #1 and #2 for a pretty
>>> marginal increase in durability.
>>
>> OK.  I agree that #3 adds a lot of complexity for not much of anything.
>> If you've got data that's static enough that #3 adds a useful amount
>> of safety, then you might as well be keeping it in a regular table.
>>
>> I think a more relevant question is how complicated it'll be to issue
>> those fsyncs --- do you have a concrete implementation in mind?
>
> It can reuse most of the infrastructure we use for re-initializing
> everything after a crash or unclean shutdown.  We just iterate over
> every tablepace/dbspace directory and look for files with _init forks.
>  If we find any then we open the main fork files and fsync() each one.

It might make sense to document this behavior : a 'simple' restart
might be way longer than before. I would probably issue a sync(1)
before restarting the server in such situation. (if the
unlogged-volatile tables are large)

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


Re: unlogged tables

From
Robert Haas
Date:
On Tue, Dec 7, 2010 at 5:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Dec 7, 2010 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think a more relevant question is how complicated it'll be to issue
>>> those fsyncs --- do you have a concrete implementation in mind?
>
>> It can reuse most of the infrastructure we use for re-initializing
>> everything after a crash or unclean shutdown.  We just iterate over
>> every tablepace/dbspace directory and look for files with _init forks.
>>  If we find any then we open the main fork files and fsync() each one.
>
> I assume you meant "all the other fork files", but OK.

Oh, good point.

> Still, couldn't
> that be rather expensive in a large DB?

Well, that's why I asked whether it would be acceptable to take that
approach.  I'm guessing the overhead isn't too horrible.  If you
didn't want to take this approach but did want to survive a clean
shutdown, you would need to fsync everything written since the last
checkpoint.  The amount of additional stuff that needs to be written
here is just whatever you failed to write out during previous
checkpoints, which is probably not a ton.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Marti Raudsepp
Date:
A very useful feature for unlogged tables would be the ability to
switch them back to normal tables -- this way you could do bulk
loading into an unlogged table and then turn it into a regular table
using just fsync(), bypassing all the WAL-logging overhead. It seems
this could even be implemented in pg_restore itself.

Which brings me to:

On Tue, Dec 7, 2010 at 20:44, Robert Haas <robertmhaas@gmail.com> wrote:
> 2. Any crash truncates the table, but a clean shutdown does not.

Seems that syncing on a clean shutdown could use the same
infrastructure as the above functionality.

Have you thought about switching unlogged tables back to logged? Are
there any significant obstacles?

Regards,
Marti


Re: unlogged tables

From
Simon Riggs
Date:
On Tue, 2010-12-07 at 13:17 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I'm also going to go through and change all instances of the word
> > "unlogged" to "volatile", per previous discussion.  If this seems like
> > a bad idea to anyone, please object now rather than afterwards.
> 
> Hm... I thought there had been discussion of a couple of different
> flavors of table volatility.  Is it really a good idea to commandeer
> the word "volatile" for this particular one?

Note that DB2 uses the table modifier VOLATILE to indicate a table that
has a widely fluctuating table size, for example a queue table. It's
used as a declarative optimizer hint. So the term has many possible
meanings.

Prefer UNLOGGED or similar descriptive term.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



Re: unlogged tables

From
Robert Haas
Date:
On Wed, Dec 8, 2010 at 9:52 AM, Marti Raudsepp <marti@juffo.org> wrote:
> Have you thought about switching unlogged tables back to logged? Are
> there any significant obstacles?

I think it can be done, and I think it's useful, but I didn't want to
tackle it for version one, because it's not trivial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Robert Haas
Date:
On Wed, Dec 8, 2010 at 10:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, 2010-12-07 at 13:17 -0500, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>> > I'm also going to go through and change all instances of the word
>> > "unlogged" to "volatile", per previous discussion.  If this seems like
>> > a bad idea to anyone, please object now rather than afterwards.
>>
>> Hm... I thought there had been discussion of a couple of different
>> flavors of table volatility.  Is it really a good idea to commandeer
>> the word "volatile" for this particular one?
>
> Note that DB2 uses the table modifier VOLATILE to indicate a table that
> has a widely fluctuating table size, for example a queue table. It's
> used as a declarative optimizer hint. So the term has many possible
> meanings.
>
> Prefer UNLOGGED or similar descriptive term.

Hrm.  The previous consensus seemed to be in favor of trying to
describe the behavior (your contents might disappear) rather than the
implementation (we don't WAL-log those contents).  However, the fact
that DB2 uses that word to mean something entirely different is
certainly a bit awkward, so maybe we should reconsider.  Or maybe not.I'm not sure.  Anyone else want to weigh in here?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> Simon Riggs <simon@2ndquadrant.com> wrote:
>> Note that DB2 uses the table modifier VOLATILE to indicate a
>> table that has a widely fluctuating table size, for example a
>> queue table.
> the fact that DB2 uses that word to mean something entirely
> different is certainly a bit awkward
It would be especially awkward should someone port their DB2
database to PostgreSQL without noticing the semantic difference, and
then find their data missing.
> so maybe we should reconsider.
+1 for choosing terminology without known conflicts with other
significant products.
-Kevin


Re: unlogged tables

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Note that DB2 uses the table modifier VOLATILE to indicate a
>>> table that has a widely fluctuating table size, for example a
>>> queue table.
>> the fact that DB2 uses that word to mean something entirely
>> different is certainly a bit awkward
> It would be especially awkward should someone port their DB2
> database to PostgreSQL without noticing the semantic difference, and
> then find their data missing.

Not to mention that DB2 syntax tends to appear in the standard a few
years later.

>> so maybe we should reconsider.
> +1 for choosing terminology without known conflicts with other
> significant products.

Yeah.  Given this info I'm strongly inclined to stick with UNLOGGED.
        regards, tom lane


Re: unlogged tables

From
Chris Browne
Date:
tgl@sss.pgh.pa.us (Tom Lane) writes:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Robert Haas <robertmhaas@gmail.com> wrote:
>>> Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> Note that DB2 uses the table modifier VOLATILE to indicate a
>>>> table that has a widely fluctuating table size, for example a
>>>> queue table.
>  
>>> the fact that DB2 uses that word to mean something entirely
>>> different is certainly a bit awkward
>  
>> It would be especially awkward should someone port their DB2
>> database to PostgreSQL without noticing the semantic difference, and
>> then find their data missing.
>
> Not to mention that DB2 syntax tends to appear in the standard a few
> years later.

And the term "volatile" has well-understood connotations that are
analagous to those in DB2 in the C language and various descendants.
<http://en.wikipedia.org/wiki/Volatile_variable>

I'm not sure "UNLOGGED" is perfect... If "TEMPORARY" weren't already
taken, it would be pretty good.

Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.

FLASH would be an amusing choice.  "PostgreSQL 9.1, now with support for
FLASH!"
-- 
output = ("cbbrowne" "@" "acm.org")
http://linuxdatabases.info/info/internet.html
I've told you for the fifty-thousandth time, stop exaggerating. 


Re: unlogged tables

From
Robert Haas
Date:
On Wed, Dec 8, 2010 at 1:37 PM, Chris Browne <cbbrowne@acm.org> wrote:
> tgl@sss.pgh.pa.us (Tom Lane) writes:
>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>> Robert Haas <robertmhaas@gmail.com> wrote:
>>>> Simon Riggs <simon@2ndquadrant.com> wrote:
>>>>> Note that DB2 uses the table modifier VOLATILE to indicate a
>>>>> table that has a widely fluctuating table size, for example a
>>>>> queue table.
>>
>>>> the fact that DB2 uses that word to mean something entirely
>>>> different is certainly a bit awkward
>>
>>> It would be especially awkward should someone port their DB2
>>> database to PostgreSQL without noticing the semantic difference, and
>>> then find their data missing.
>>
>> Not to mention that DB2 syntax tends to appear in the standard a few
>> years later.
>
> And the term "volatile" has well-understood connotations that are
> analagous to those in DB2 in the C language and various descendants.
> <http://en.wikipedia.org/wiki/Volatile_variable>
>
> I'm not sure "UNLOGGED" is perfect... If "TEMPORARY" weren't already
> taken, it would be pretty good.
>
> Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
>
> FLASH would be an amusing choice.  "PostgreSQL 9.1, now with support for
> FLASH!"

The value of VOLATILE, I felt, was that it's sort of like a volatile
variable in C: it might suddenly change under you.  I think that
TRANSIENT and EPHEMERAL and TENUOUS all imply that the table itself is
either temporary or, in the last case, not very dense, which isn't
really what we want to convey.  I did consider EPHEMERAL myself, but
the more I think about it, the more wrong it sounds.  Even the table's
contents are not really short-lived - they may easily last for months
or years.  You just shouldn't rely on it.  I cracked up this morning
imagining calling this CREATE UNRELIABLE TABLE, but I'm starting to
think UNLOGGED is as well as we're going to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Kineticode Billing
Date:
On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:

> Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.

EVANESCENT.

David



Re: unlogged tables

From
Cédric Villemain
Date:
2010/12/8 Kineticode Billing <david@kineticode.com>:
> On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
>
>> Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
>
> EVANESCENT.

UNSAFE ?

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


Re: unlogged tables

From
"David E. Wheeler"
Date:
On Dec 10, 2010, at 4:34 PM, Cédric Villemain wrote:

>>> Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
>>
>> EVANESCENT.
>
> UNSAFE ?

LOLZ.

David



Re: unlogged tables

From
Robert Haas
Date:
On Wed, Dec 8, 2010 at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah.  Given this info I'm strongly inclined to stick with UNLOGGED.

OK.  Here's an updated patch set with various fixes:

- Per musings from Tom, I've revisited and revised the logic that
cross-checks relpersistence when you try to create foreign keys.  The
old logic was buggy and wrong.
- I fixed the bug Andy Colson found, whereby any operation that
rewrote an unlogged table would cause its indexes to lose their _init
forks, leading to bizarre behavior.
- I fixed another infelicity Andy Colson noted, whereby vacuuming an
unlogged heap after restart would warn about a zeroed page, by
extending XLOG_SMGR_CREATE with a fork number.
- I added support for hash indexes on unlogged tables (gin and gist
are still not yet supported).

I think the first patch (relpersistence-v4.patch) is ready to commit,
and the third patch to allow synchronous commits to become
asynchronous when it doesn't matter (relax-sync-commit-v1.patch)
doesn't seem to be changing much either, although I would appreciate
it if someone with more expertise than I have with our write-ahead
logging system would give it a quick once-over.

The main patch (unlogged-tables-v4.patch) needs more thought.  Right
now, unlogged buffers are checkpointed, which I want to get rid of.
Andres Freund suggested we could get by with this and still survive a
clean shutdown if we fsync() every unlogged relation in the cluster
before shutting down, but I'm concerned about the case where one of
the fsync() calls fails.  That's presumably already a problem with
checkpoints generally, and I haven't traced through the logic to see
exactly what happens, but I guess this would need similar treatment.
In a non-shutdown checkpoint, the checkpoint can just fail.  In a
shutdown checkpoint, we presumably can't just refuse to exit, but it
shouldn't look like a clean shutdown...

As I was working on the hash index support, it occurred to me that at
some point in the future, we might want to allow an unlogged index on
a permanent table.  With the current patch, an index is unlogged if
and only if the corresponding table is unlogged, and both the table
and the index are reset to empty on restart.  But we could have a
slightly different flavor of index that, instead of being reset to
empty, just gets marked invalid, perhaps by truncating the file to
zero-length (and adding some code to treat that as something other
than a hard error).  Perhaps you could even arrange for autovacuum to
kick off an automatic rebuild, though that might need to be
configurable since some people might not want an index rebuild kicking
off immediately after a crash/failover.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: unlogged tables

From
Jeff Janes
Date:
On Wed, Dec 8, 2010 at 6:52 AM, Marti Raudsepp <marti@juffo.org> wrote:
> A very useful feature for unlogged tables would be the ability to
> switch them back to normal tables -- this way you could do bulk
> loading into an unlogged table and then turn it into a regular table
> using just fsync(), bypassing all the WAL-logging overhead.

If archive_mode is off, then you can often find a way to bypass
WAL-logging during bulk loading anyway.

If archive_mode is on, then I don't see how this can work without
massive changes.

One possibility would be to create a mechanism to inject entire large
files into the archive log stream. (Such a facility might be useful
for other purposes too).  So the transaction that changes the mode
from unlogged to logged would have to take an exclusive lock on the
unlogged table and make sure shared buffers for it are written out,
then it would just copy the backing file(s) for that table into the
archive steam with a special header that tells the recovery process
"Set these aside, I'll explain later". Once that is done, it would
just have to ensure the WAL segment it is currently on will come after
the injected files in the archive stream, and write a WAL record
explaining where those bulk files it sent early are supposed to go.

I don't know, sound like a lot of work and lot of pitfalls.

Cheers,

Jeff


Re: unlogged tables

From
Jeff Janes
Date:
On Fri, Dec 10, 2010 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> As I was working on the hash index support, it occurred to me that at
> some point in the future, we might want to allow an unlogged index on
> a permanent table.

That is the feature I would be most excited about.

> With the current patch, an index is unlogged if
> and only if the corresponding table is unlogged, and both the table
> and the index are reset to empty on restart.  But we could have a
> slightly different flavor of index that, instead of being reset to
> empty, just gets marked invalid, perhaps by truncating the file to
> zero-length (and adding some code to treat that as something other
> than a hard error).  Perhaps you could even arrange for autovacuum to
> kick off an automatic rebuild,

Or just have rebuilding the index as part of crash recovery.  I
wouldn't use the feature anyway on indexes that would take more than a
few seconds to rebuild, And wouldn't want to advertise the database as
being available when it is essentially crippled from missing indexes.
I'd rather bite the bullet up front.

I would think of it is as declaring that, instead of making the index
recoverable via WAL logging and replay, instead make it recoverable by
rebuilding.  So in that way it is quit unlike unlogged tables, in that
we are not risking any data, just giving the database a hint about
what the most expeditious way to maintain the index might be.  Well,
more of an order than a hint, I guess.

Cheers,

Jeff


Re: unlogged tables

From
Robert Haas
Date:
On Sat, Dec 11, 2010 at 2:53 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Wed, Dec 8, 2010 at 6:52 AM, Marti Raudsepp <marti@juffo.org> wrote:
>> A very useful feature for unlogged tables would be the ability to
>> switch them back to normal tables -- this way you could do bulk
>> loading into an unlogged table and then turn it into a regular table
>> using just fsync(), bypassing all the WAL-logging overhead.
>
> If archive_mode is off, then you can often find a way to bypass
> WAL-logging during bulk loading anyway.
>
> If archive_mode is on, then I don't see how this can work without
> massive changes.

Well,  you'd need to work your way through the heap and all of its
indices and XLOG every page.  And you've got to do that in a way
that's transaction-safe, and I don't have a design in mind for that
off the top of my head.  But I think "massive changes" is probably an
overstatement.  We can already handle ALTER TABLE operations that
involve a full relation rewrite, and that already does the
full-table-XLOG thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Robert Haas
Date:
On Sat, Dec 11, 2010 at 3:18 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Dec 10, 2010 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> As I was working on the hash index support, it occurred to me that at
>> some point in the future, we might want to allow an unlogged index on
>> a permanent table.
>
> That is the feature I would be most excited about.
>
>> With the current patch, an index is unlogged if
>> and only if the corresponding table is unlogged, and both the table
>> and the index are reset to empty on restart.  But we could have a
>> slightly different flavor of index that, instead of being reset to
>> empty, just gets marked invalid, perhaps by truncating the file to
>> zero-length (and adding some code to treat that as something other
>> than a hard error).  Perhaps you could even arrange for autovacuum to
>> kick off an automatic rebuild,
>
> Or just have rebuilding the index as part of crash recovery.  I
> wouldn't use the feature anyway on indexes that would take more than a
> few seconds to rebuild, And wouldn't want to advertise the database as
> being available when it is essentially crippled from missing indexes.
> I'd rather bite the bullet up front.

I don't think you can rebuild the indexes during crash recovery; I
believe you need to be bound to the database that contains the index,
and, as we've been over before, binding to a database is irrevocable,
so the startup process can't bind to the database, rebuild the index,
and then unbind.  It would need to signal the postmaster to fire up
other backends to do this work, and at that point I think you may as
well piggyback on autovacuum rather than designing a similar mechanism
from scratch.

Also, while YOU might use such a feature only for indexes that can be
rebuilt in a few seconds, I strongly suspect that other people might
use it in other ways.  In particular, it seems that it would be
possibly sensible to use a feature like this for an index that's only
used for reporting queries.  If the database crashes, we'll still have
our primary key so we can continue operating, but we'll need to
reindex before running the nightly reports.

> I would think of it is as declaring that, instead of making the index
> recoverable via WAL logging and replay, instead make it recoverable by
> rebuilding.  So in that way it is quit unlike unlogged tables, in that
> we are not risking any data, just giving the database a hint about
> what the most expeditious way to maintain the index might be.  Well,
> more of an order than a hint, I guess.

I think it's six of one, half a dozen of the other.  An index by its
nature only contains data that is duplicated in a table, so by
definition loss of an index isn't risking any data.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Jeff Janes
Date:
On Fri, Dec 10, 2010 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> I think the first patch (relpersistence-v4.patch) is ready to commit,
> and the third patch to allow synchronous commits to become
> asynchronous when it doesn't matter (relax-sync-commit-v1.patch)
> doesn't seem to be changing much either, although I would appreciate
> it if someone with more expertise than I have with our write-ahead
> logging system would give it a quick once-over.

I don't understand what the point of the relax-sync-commit patch is.

If XactLastRecEnd.xrecoff == 0, then calling
XLogFlush(XactLastRecEnd) is pretty much a null operation anyway
because it will short-circuit at the early statement:

if (XLByteLE(record, LogwrtResult.Flush)) return

Or at least it had better return at that point, or we might have a
serious problem.  If XactLastRecEnd.xrecoff == 0 then the only way to
keep going is if XactLastRecEnd.xlogid is ahead of
LogwrtResult.Flush.xlogid.

I guess that could happen legitimately if the logs have recently
rolled over the 4GB boundary, and XactLastRecEnd is aware of this
while LogwrtResult is not yet aware of it.  I don't know if that is a
possible state of affairs.  If it is, then the result would be that on
very rare occasion your patch removes a spurious, but not harmful
other than performance,  fsync.

If somehow XactLastRecEnd gets a falsely advanced value of xlogid,
then calling XLogFlush with it would cause a PANIC "xlog write request
%X/%X is past end of log %X/%X".  So unless people have been seeing
this, that must not be able to happen.  And looking at the only places
XactLastRecEnd.xlogid get set, I don't see how it could happen.


So maybe in your patch:

if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0)

should be

if (wrote_xlog && (XactSyncCommit || forceSyncCommit || nrels > 0) )

It seems like on general principles we should not be passing to
XLogFlush a structure which is by definition invalid.




But even if XLogFlush is going to return immediately, that doesn't
negate the harm caused by commit_delay doing its thing needlessly.
Perhaps that was the original motivation for your patch.

Cheers,

Jeff


Re: unlogged tables

From
Robert Haas
Date:
On Sat, Dec 11, 2010 at 9:21 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Dec 10, 2010 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I think the first patch (relpersistence-v4.patch) is ready to commit,
>> and the third patch to allow synchronous commits to become
>> asynchronous when it doesn't matter (relax-sync-commit-v1.patch)
>> doesn't seem to be changing much either, although I would appreciate
>> it if someone with more expertise than I have with our write-ahead
>> logging system would give it a quick once-over.
>
> I don't understand what the point of the relax-sync-commit patch is.

Suppose we begin a transaction, write a bunch of data to a temporary
table, and commit.  Suppose further that synchronous_commit = off.  At
transaction commit time, we haven't written any XLOG records yet, but
we do have an XID assigned because of the writes to the temporary
tables.  So we'll issue a commit record.  Without this patch, since
synchronous_commit = off, we'll force that commit record to disk
before acknowledging the commit to the user.  However, that's not
really necessary because if we crash after acknowledging the commit to
the user, the temporary tables will disappear anyway, and our XID
doesn't exist on disk any place else - thus, whether the commit makes
it to disk before the crash or not will be immaterial on restart.

If you have a bunch of transactions that make write to temporary (or
unlogged) tables but not to any permanent tables, this makes it muuuch
faster.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Dimitri Fontaine
Date:
Cédric Villemain <cedric.villemain.debian@gmail.com> writes:
> 2010/12/8 Kineticode Billing <david@kineticode.com>:
>> On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
>>
>>> Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.

I kind of like TRANSIENT, but that's only because it's a property I've
been working with in some other systems
 http://www.erlang.org/doc/design_principles/sup_princ.html
   Restart = permanent | transient | temporary
   Restart defines when a terminated child process should be restarted.
   A permanent child process is always restarted.   A temporary child process is never restarted.   A transient child
processis restarted only if it terminates abnormally, i.e. with another exit reason than normal.     
>> EVANESCENT.
>
> UNSAFE ?

What about NOT PERSISTENT ? Then we would have two flavours of them,
that's NOT PERSISTENT ON RESTART TRUNCATE or ON RESTART FLUSH, I guess?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: unlogged tables

From
Rob Wultsch
Date:
On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
> 2010/12/8 Kineticode Billing <david@kineticode.com>:
>> On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
>>
>>> Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
>>
>> EVANESCENT.
>
> UNSAFE ?
>
<troll>
MyISAM
</troll>



--
Rob Wultsch
wultsch@gmail.com


Re: unlogged tables

From
Robert Haas
Date:
On Sun, Dec 12, 2010 at 9:31 PM, Rob Wultsch <wultsch@gmail.com> wrote:
> On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain
> <cedric.villemain.debian@gmail.com> wrote:
>> 2010/12/8 Kineticode Billing <david@kineticode.com>:
>>> On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
>>>
>>>> Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
>>>
>>> EVANESCENT.
>>
>> UNSAFE ?
>>
> <troll>
> MyISAM
> </troll>

Heh.  But that would be corrupt-on-crash, not truncate-on-crash, no?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Rob Wultsch
Date:
On Sun, Dec 12, 2010 at 7:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Dec 12, 2010 at 9:31 PM, Rob Wultsch <wultsch@gmail.com> wrote:
>> On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain
>> <cedric.villemain.debian@gmail.com> wrote:
>>> 2010/12/8 Kineticode Billing <david@kineticode.com>:
>>>> On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
>>>>
>>>>> Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
>>>>
>>>> EVANESCENT.
>>>
>>> UNSAFE ?
>>>
>> <troll>
>> MyISAM
>> </troll>
>
> Heh.  But that would be corrupt-on-crash, not truncate-on-crash, no?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
<troll>
Yep. Truncate-on-shutdown MySQL options are the MEMORY and PBXT (using
the memory resident option).
</troll>

I like TRANSIENT but wonder if MEMORY might be more easily understood by users.
--
Rob Wultsch
wultsch@gmail.com


Re: unlogged tables

From
Robert Haas
Date:
On Fri, Dec 10, 2010 at 11:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think the first patch (relpersistence-v4.patch) is ready to commit,

So I've now committed it.

> and the third patch to allow synchronous commits to become
> asynchronous when it doesn't matter (relax-sync-commit-v1.patch)

Jeff Janes reviewed this, which was good, but he missed a key bit on
which I've now set him straight.  So an updated review of this would
be much appreciated.

> doesn't seem to be changing much either, although I would appreciate
> it if someone with more expertise than I have with our write-ahead
> logging system would give it a quick once-over.
>
> The main patch (unlogged-tables-v4.patch) needs more thought.  Right
> now, unlogged buffers are checkpointed, which I want to get rid of.
> Andres Freund suggested we could get by with this and still survive a
> clean shutdown if we fsync() every unlogged relation in the cluster
> before shutting down, but I'm concerned about the case where one of
> the fsync() calls fails.  That's presumably already a problem with
> checkpoints generally, and I haven't traced through the logic to see
> exactly what happens, but I guess this would need similar treatment.
> In a non-shutdown checkpoint, the checkpoint can just fail.  In a
> shutdown checkpoint, we presumably can't just refuse to exit, but it
> shouldn't look like a clean shutdown...

Any input on this point?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Robert Haas
Date:
Here's an attempt to summarize the remaining issues with this patch
that I know about.  I may have forgotten something, so please mention
it if you notice something missing.

1. pg_dump needs an option to control whether unlogged tables are
dumped.  --no-unlogged-tables seems like the obvious choice, assuming
we want the default to be to dump them, which seems like the safest
option.

2. storage.sgml likely needs to be updated.  We have a section on the
free space map and one on the visibility map, so I suppose the logical
thing to do is add a similar section on the initialization fork.

3. It's unnecessary to include unlogged relation buffers in
non-shutdown checkpoints.  I've recently realized that this is true
independently of whether or not we want unlogged tables to survive a
clean shutdown.  Whether or not we can survive a clean shutdown is a
function of whether we register dirty segments when buffers are
written, which is independent of whether we choose to write such
buffers as part of a checkpoint.  And indeed, unless we're about to
shut down, there's no reason to do so, because the whole point of
checkpointing is to advance the redo pointer, and that's irrelevant
for unlogged tables.

4. It's arguably unnecessary to register dirty segments for unlogged
relations.  Given #3, this now seems a little less important.  If the
unlogged relation is hot and fits in shared_buffers, then omitting it
from the checkpoint process means we'll never write out those dirty
buffers, so the fact that they'd cause fsyncs if we did write them
doesn't matter.  However, it's still not totally irrelevant, because a
relation that fits in the OS buffer cache but not in shared buffers
will probably generate fsyncs at every checkpoint.  (And on the third
hand, the OS may decide to write the dirty data anyway, especially if
it's a largish percentage of RAM.)  There are a couple of possible
ways of dealing with this:

4A. The solution Andres proposed - Iterate through all unlogged
relations at shutdown time and fsyncing them all.  Possibly
complicated to handle fsync failures.
4B. Another idea I just thought of - register dirty segments as
normal, but teach the background writer to accumulate them in a
separate queue that is only flushed at shutdown, or when it reaches
some maximum size, rather than at every checkpoint.
4C. Decree that this is an area for future enhancement and forget
about it for now.  I am leaning toward this option.

5. Make it work with GIST indexes.  Per discussion on the other
thread, the current proposal seems to be: (a) add a BM_FLUSH_XLOG bit;
when clear, don't flush XLOG; this then allows pages to have fake
LSNs; (b) add an XLogRecPtr structure in shared memory, protected by a
spinlock; (c) use the structure described in (b) to generate fake LSNs
every time an operation is performed on an unlogged GIST index.  I am
not clear on how we make this work across shutdowns - it seems you'd
need to save this structure somewhere during a clean shutdown (where?)
and restore it on startup, unless we go back to truncating even on a
clean shutdown.

6. Make it work with GIN indexes.  I haven't looked at what's involved here yet.

Advice, comments, feedback appreciated...  I'd like to put this one to bed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
"Kevin Grittner"
Date:
Robert Haas  wrote:
> If there's any third-party code out there that is checking
> rd_istemp, it likely also needs to be revised to check whether
> WAL-logging is needed, not whether the relation is temp. The way
> I've coded it, such code will fail to compile, and can be very
> easily fixed by substituting a call to RelationNeedsWAL() or
> RelationUsesLocalBuffers() or RelationUsesTempNamespace(),
> depending on which property the caller actually cares about.
Hmm...  This broke the SSI patch, which was using rd_istemp to omit
conflict checking where it was set to true.  The property I care
about is whether tuples in one backend can be read by an transaction
in a different backend, which I assumed would not be true for
temporary tables.  Which of the above would be appropriate for that
use?
-Kevin


Re: unlogged tables

From
Robert Haas
Date:
On Sat, Dec 18, 2010 at 12:27 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas  wrote:
>
>> If there's any third-party code out there that is checking
>> rd_istemp, it likely also needs to be revised to check whether
>> WAL-logging is needed, not whether the relation is temp. The way
>> I've coded it, such code will fail to compile, and can be very
>> easily fixed by substituting a call to RelationNeedsWAL() or
>> RelationUsesLocalBuffers() or RelationUsesTempNamespace(),
>> depending on which property the caller actually cares about.
>
> Hmm...  This broke the SSI patch, which was using rd_istemp to omit
> conflict checking where it was set to true.  The property I care
> about is whether tuples in one backend can be read by an transaction
> in a different backend, which I assumed would not be true for
> temporary tables.  Which of the above would be appropriate for that
> use?

RelationUsesLocalBuffers().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Alvaro Herrera
Date:
Excerpts from Robert Haas's message of sáb dic 18 02:21:41 -0300 2010:
> Here's an attempt to summarize the remaining issues with this patch
> that I know about.  I may have forgotten something, so please mention
> it if you notice something missing.
> 
> 1. pg_dump needs an option to control whether unlogged tables are
> dumped.  --no-unlogged-tables seems like the obvious choice, assuming
> we want the default to be to dump them, which seems like the safest
> option.

If there are valid use cases for some unlogged tables being dumped and
some others not, would it make sense to be able to specify a pattern of
tables to be dumped or skipped?

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: unlogged tables

From
Robert Haas
Date:
On Mon, Dec 20, 2010 at 9:05 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of sáb dic 18 02:21:41 -0300 2010:
>> Here's an attempt to summarize the remaining issues with this patch
>> that I know about.  I may have forgotten something, so please mention
>> it if you notice something missing.
>>
>> 1. pg_dump needs an option to control whether unlogged tables are
>> dumped.  --no-unlogged-tables seems like the obvious choice, assuming
>> we want the default to be to dump them, which seems like the safest
>> option.
>
> If there are valid use cases for some unlogged tables being dumped and
> some others not, would it make sense to be able to specify a pattern of
> tables to be dumped or skipped?

Well, if you want to dump a subset of the tables in your database, you
can already do that.  I think that adding a pattern to
--no-unlogged-tables (or whatever we end up calling it) would be an
unnecessary frammish.  There's no particular reason to think that
unlogged tables are going to be so widely used or that concerns about
which ones are going to be so widespread that we should do something
here when we don't even have much simpler things like --function,
which IMHO would extremely useful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unlogged tables

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Robert Haas's message of sáb dic 18 02:21:41 -0300 2010:
>> 1. pg_dump needs an option to control whether unlogged tables are
>> dumped.  --no-unlogged-tables seems like the obvious choice, assuming
>> we want the default to be to dump them, which seems like the safest
>> option.

> If there are valid use cases for some unlogged tables being dumped and
> some others not, would it make sense to be able to specify a pattern of
> tables to be dumped or skipped?

Presumably you could still do that with the regular --tables name
pattern switch.  I don't see a reason for unlogged tables to respond to
a different name pattern.
        regards, tom lane