Re: Unlogged tables, persistent kind - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Unlogged tables, persistent kind
Date
Msg-id BANLkTi=58zymki97XJvNXFG9=UgDkDNo1g@mail.gmail.com
In response to Re: Unlogged tables, persistent kind  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Tue, May 3, 2011 at 8:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The current assessment is that UNLOGGED tables are useful only for
> running a data cache. If the database crashes, then the table is
> truncated and you must refill the cache. If that is the case, then it
> must surely be better to have a cache that is already 99% full, than
> one which starts at empty. There is no damage or loss because parts of
> the cache were missing.

That's not necessarily the case, of course. I've written caches before
where a set of records would be present for each object being cached.
I could deal with a whole set not being present, but if just some of
the records were missing then I would serve incorrect results. Since
they were created in a single transaction I should be able to rely on
the whole set being present or missing consistently.

That doesn't mean there aren't cases where that assessment holds, of
course. In an ideal world the database would guarantee that you
couldn't use the table in this way and then fail to get the
consistency you expect -- something like raising an error if you try
to modify two rows in an unlogged table within a single transaction.
I'm not sure how to do that reasonably, though.
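
For concreteness, this is roughly the shape of cache I mean (table,
column, and key names here are invented for illustration):

    CREATE UNLOGGED TABLE obj_cache (
        obj_id   integer NOT NULL,
        part_no  integer NOT NULL,
        payload  text,
        PRIMARY KEY (obj_id, part_no)
    );

    -- all the parts for one object are rebuilt in a single transaction
    BEGIN;
    DELETE FROM obj_cache WHERE obj_id = 42;
    INSERT INTO obj_cache VALUES
        (42, 1, 'first part'),
        (42, 2, 'second part'),
        (42, 3, 'third part');
    COMMIT;

A reader that finds rows (42,1) and (42,3) but not (42,2) serves wrong
answers, which is exactly what partial block loss after a crash would
allow.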


> For that use case, total loss is catastrophic, not just mildly
> irritating. If you are a Telco, losing a few minutes billing data
> costs much less than having every server have better hardware so it
> can cope with high WAL traffic as well. They don't want to lose the
> data, but it's a cost-based trade-off.

This analysis is dead on. That's precisely how businesses evaluate
this question.


> Consistency is not an issue, you
> are just missing some data. That is normal anyway, since sensor data
> generators (mobile devices etc) frequently fail, are offline, turned
> off etc, so there isn't even a definition of what complete data is
> supposed to look like. The missing data looks exactly like lots of
> people turned their phones off for a few minutes.

I don't think that's true, however. Consider if I have a rollup table
that contains aggregated sums of that data. If some of the raw records
are lost then my aggregates no longer add up to the "correct" values.
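
To make that concrete (all names invented), imagine the detail table
is unlogged but the rollup lives in an ordinary table and both are
maintained in the same transaction:

    CREATE UNLOGGED TABLE events (
        phone_id  integer NOT NULL,
        bytes     bigint  NOT NULL
    );

    CREATE TABLE event_totals (
        phone_id     integer PRIMARY KEY,
        total_bytes  bigint NOT NULL
    );

    BEGIN;
    INSERT INTO events VALUES (7, 1048576);
    UPDATE event_totals
       SET total_bytes = total_bytes + 1048576
     WHERE phone_id = 7;
    COMMIT;

If a crash silently drops some blocks of events, then
SELECT sum(bytes) FROM events no longer matches event_totals, and
there's no way to tell which rows went missing.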

Or consider if you are counting sensor data like total data
transferred and session count -- and then reporting the average data
transferred per session. If you accidentally lose a bunch of sessions,
your data-per-session report will be reporting false information.
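
Made-up numbers to illustrate: suppose the aggregate recorded 10 GB
over 1,000 sessions before the crash.

    true average:                 10 GB / 1000 sessions = 10.0 MB/session
    after 200 session rows vanish
    but the byte total survives:  10 GB /  800 sessions = 12.5 MB/session

Nothing in the second report tells you the figure is wrong.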

This is true even if you only lose recently committed rows. But losing
whole blocks means you risk losing random old data, which makes it
hard to work around by, say, purging recently aggregated data.

> So my suggestion makes UNLOGGED tables more useful for the use case
> they were designed to address - cached data (AIUI), plus they allow
> another use case that doesn't seem to be well understood, low value
> data in massive data volumes.

There are other approaches as well. Foreign data wrappers mean you
could do things like store the low-value data in raw text files or in
other systems like memcached or Hadoop. I'm not saying there's no
reason to do something in Postgres, but if you're being bitten by
Postgres's block-oriented random-access storage, the problem may be
too fundamental to solve without addressing the underlying storage
strategy.
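
For example, with something like the file_fdw module in contrib you
could leave the low-value data in flat files and still query it
(server name, file path, and columns below are made up):

    CREATE EXTENSION file_fdw;

    CREATE SERVER flatfiles FOREIGN DATA WRAPPER file_fdw;

    CREATE FOREIGN TABLE sensor_log (
        phone_id  integer,
        seen_at   timestamptz,
        bytes     bigint
    ) SERVER flatfiles
      OPTIONS (filename '/var/log/sensors/today.csv', format 'csv');

That avoids WAL traffic for that data entirely, at the cost of
Postgres not managing it at all.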


-- 
greg

