Re: Survey on backing up unlogged tables: help us with PostgreSQL development! - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
Date
Msg-id BBB676BF-6CEA-44E5-B635-32D4EA41426B@solfertje.student.utwente.nl
Whole thread Raw
In response to Survey on backing up unlogged tables: help us with PostgreSQL development!  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Survey on backing up unlogged tables: help us with PostgreSQL development!  (Karsten Hilbert <Karsten.Hilbert@gmx.net>)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!  (Jayadevan M <Jayadevan.Maymala@ibsplc.com>)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!  (Josh Berkus <josh@agliodbs.com>)
List pgsql-general
On 16 Nov 2010, at 23:46, Josh Berkus wrote:

> Folks,
>
> Please help us resolve a discussion on -hackers.
>
> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
> tables which are "unlogged", meaning that they are not added to the
> transaction log, and will be truncated (emptied) on database restart.
> Such tables are intended for highly volatile, but not very valuable,
> data, such as session statues, application logs, etc.
>
> The question is, how would you, as a DBA, expect pg_dump backups to
> treat unlogged tables? Backing them up by default has the potential to
> both cause performance drag on the unlogged table and make your backups
> take longer unless you remember to omit them. Not backing them up by
> default has the drawback that if you forget --include-unlogged switch,
> and shut the database down, any unlogged data is gone. How would you
> *expect* unlogged tables to behave?


From the discussion so far it appears to me that "unlogged" should probably be split into various gradations of
unlogged.There appear to be a number of popular use-cases for such tables, with different requirements, namely: 

1. Session tables
These tables contain data about a user session in some application. It is temporary data at best, it's no problem to
loseit at all. Dumping it makes no sense. 

2. Staging tables
These tables contain data that's being processed and prepared to be entered into other tables in the database. If the
processfails it can usually be restarted, so losing the data is no problem. Here as well, dumping makes little sense. 

3. Logging tables
This is data from application logs. It's not usually mission critical, so losing it isn't a very big deal, but it is
usefuldata of itself. It should in most cases survive a normal backend restart, but if it doesn't survive a backend
crashthat's acceptable. This data should in most cases be included in dumps (or dumped separately?). 

4. Materialized views
Stored results of a query that's likely to have a big footprint on system resources. Losing the data after a backend
crashis acceptable, but it should survive a normal system restart. Since the data can be generated from the contents of
thedatabase, it's not necessary to include it in dumps (but maybe it is convenient in some cases?) 

I think this is the gist of it.
Which leads me to think that people want three knobs to play with: should the table be logged or not? Can it be
truncatedat normal server restart or not? Should it be included in dumps or not? And possibly, should it be fsynced or
not?

Of course, without WAL logs, PITR and WAL-based replication are out of the question for these tables. Also, since the
datacan be lost, they can't be referenced by foreign keys. 

Does that sum it up adequately?


There's one thing that I didn't see mentioned and that I'm not sure fits into the picture here, namely read-only tables
(materializedviews would qualify for those in most cases). 
These tables are written every once in a while by a system user, but it doesn't change in between at all. There's not
muchpoint in giving every user their own session, and it should be possible to assume all index entries point to a live
record(which has consequences for COUNT(), for example). 

Changing that has quite some implications though, I wager...

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4ce4daf610425035851824!



pgsql-general by date:

Previous
From: Allan Kamau
Date:
Subject: Re: ERROR row is too big size 9336, exceeds size 8160 when populating record with tsquery and tsvector fields
Next
From: Karsten Hilbert
Date:
Subject: Re: Survey on backing up unlogged tables: help us with PostgreSQL development!