Thread: Unlogged tables, persistent kind

Unlogged tables, persistent kind

From
Simon Riggs
Date:
Unlogged tables are a good new feature.

I noticed Bruce had mentioned they were the equivalent of NoSQL, which
I don't really accept. I guess it depends upon whether you mean NoSQL
for caches (e.g. memcached) or NoSQL for useful datastores (e.g.
Mongo). It seems worth discussing now before we get too far into the
marketing hype around Beta.

If you don't log changes to tables you have two choices if we crash
1) truncate the table and any indexes
2) rebuild any indexes damaged by the crash

Currently, we do (1). That certainly has its place but most data
stores don't do this if they crash, since it would lead to data loss.
Not just a couple of rows either - serious, major data loss if you put
the wrong kind of data in it. We even delete data that has been safely
on disk for weeks, months, which IMHO some people could easily get
confused about.

In the future, I would like to work on (2), which preserves as much
data as possible, while recognising indexes may be damaged. I don't
really have any name for this, since the current naming seems to
assume there is only one kind of unlogged table.

My implementation path for that would be to add a crash_number onto
pg_control and pg_index. Any index marked as "unlogged, persistent"
would only be usable if its crash number is the same as the current
system crash number.

REINDEX would update the index crash number to current value. That
also allows us to imagine a "repair index" command in the future as
well.
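
Very roughly, the check might look something like this (a sketch only;
the crash_number and indcrashnum fields, and everything else below, are
hypothetical and don't exist today):

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Sketch only: pg_control would carry a crash counter that is bumped
     * each time crash recovery runs, and pg_index would record the counter
     * value an index was last built (or REINDEXed) under.
     */
    typedef struct ControlDataSketch
    {
        uint32_t    crash_number;   /* hypothetical pg_control field */
    } ControlDataSketch;

    typedef struct IndexDataSketch
    {
        uint32_t    indcrashnum;    /* hypothetical pg_index field */
    } IndexDataSketch;

    /*
     * An "unlogged, persistent" index is trusted only if it has been built
     * or REINDEXed since the most recent crash; otherwise it is not used.
     */
    static bool
    unlogged_index_is_usable(const ControlDataSketch *control,
                             const IndexDataSketch *index)
    {
        return index->indcrashnum == control->crash_number;
    }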

Heap blocks would be zeroed if they were found to be damaged, following a crash.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Unlogged tables, persistent kind

From
Greg Stark
Date:
On Sun, Apr 24, 2011 at 6:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> My implementation path for that would be to add a crash_number onto
> pg_control and pg_index. Any index marked as "unlogged, persistent"
> would only be usable if its crash number is the same as the current
> system crash number.
>
> REINDEX would update the index crash number to current value. That
> also allows us to imagine a "repair index" command in the future as
> well.

This seems useful for non-crash-safe indexes in general.

> Heap blocks would be zeroed if they were found to be damaged, following a crash.
>

How do you propose to detect that? Until we solve the whole checksum
story I don't think we have a reliable way to detect bad pages. And in
some cases where we do detect them we would detect them by crashing.

-- 
greg


Re: Unlogged tables, persistent kind

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> If you don't log changes to tables you have two choices if we crash
> 1) truncate the table and any indexes
> 2) rebuild any indexes damaged by the crash

No, you have only one choice, and that's (1), because there's no
guarantee that what's in the table file is meaningful.

> Heap blocks would be zeroed if they were found to be damaged, following a crash.

This is sheerest fantasy.  And even if you could implement it, what sort
of feature would you be offering?  "Your data is preserved except when
it isn't"?  People who want that can go use mysql.
        regards, tom lane


Re: Unlogged tables, persistent kind

From
Simon Riggs
Date:
On Sun, Apr 24, 2011 at 10:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> If you don't log changes to tables you have two choices if we crash
>> 1) truncate the table and any indexes
>> 2) rebuild any indexes damaged by the crash
>
> No, you have only one choice, and that's (1), because there's no
> guarantee that what's in the table file is meaningful.

>> Heap blocks would be zeroed if they were found to be damaged, following a crash.
>
> This is sheerest fantasy.  And even if you could implement it, what sort
> of feature would you be offering?  "Your data is preserved except when
> it isn't"?  People who want that can go use mysql.

AFAIUI, a great many people do.

I am proposing a non-default mode, requiring explicit activation by the
user, which preserves as much data as possible. I am fully aware that
what is proposed is not an optimisation, but a downgrading of normal
resilience, accepting some data loss in the event of a crash in exchange
for performance.

Yes, many other systems support this and people are becoming persuaded
that such risk/reward choices make sense for them.
I see no reason not to provide an option to do this, so people can
make informed choices.

For large sets of low value data, it makes sense. Deleting all data,
just simply because some of it might be damaged, is not the only
option. IMHO deleting all the data is a surprising option that will
cause many people to curse us. I don't see preserving some of the data
as being worse.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Unlogged tables, persistent kind

From
Christopher Browne
Date:
On Sun, Apr 24, 2011 at 6:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> For large sets of low value data, it makes sense. Deleting all data,
> just simply because some of it might be damaged, is not the only
> option. IMHO deleting all the data is a surprising option that will
> cause many people to curse us. I don't see preserving some of the data
> as being worse.

For the "cache table" case, it is *certainly* reasonable to delete
everything upon discovering the risk that some might be damaged.

I have seen cases "in production" where the fix to 'oh, looks like
something's corrupted' is indeed "truncate the cache," where that has
become a Standard Operating Procedure.

Sure, to have Postgres do this is a bit heavy-handed, but it's not as
if this isn't going to be heavily documented with warnings like:
"Contains live plague bacteria.
Beware the Rabid Hippopotami.  May cause bleeding at the eyes.  If
your database crashes, this table WILL get truncated at startup time."

If the docs are short on warnings, that should probably get
rectified.  (I'm allowed to be volunteered to do so :-).)

I'd actually find it unsurprising for such tables to get truncated
even on a clean restart; I'd favor that being an option, as long as
making it thus were generally unproblematic.  If the application using
such a table can't cope with that logic, better to induce an
understanding of that sooner rather than later ;-).

It seems like a losing battle to try terribly hard to keep the data
around when, by marking it unlogged, the data definition specifically
warned that this was risky.

I'd not go so far as to suggest having autovac TRUNCATE such tables at
random intervals, but that's a pathology that's not completely
incompatible with the DDL declaration :-)
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: Unlogged tables, persistent kind

From
Simon Riggs
Date:
On Sun, Apr 24, 2011 at 7:41 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Sun, Apr 24, 2011 at 6:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> My implementation path for that would be to add a crash_number onto
>> pg_control and pg_index. Any index marked as "unlogged, persistent"
>> would only be usable if its crash number is the same as the current
>> system crash number.
>>
>> REINDEX would update the index crash number to current value. That
>> also allows us to imagine a "repair index" command in the future as
>> well.
>
> This seems useful for non-crash-safe indexes in general.
>
>> Heap blocks would be zeroed if they were found to be damaged, following a crash.
>>
>
> How do you propose to detect that? Until we solve the whole checksum
> story I don't think we have a reliable way to detect bad pages. And in
> some cases where we do detect them we would detect them by crashing.


That should be changed.


--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Unlogged tables, persistent kind

From
Greg Stark
Date:
On Sun, Apr 24, 2011 at 11:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> IMHO deleting all the data is a surprising option that will
> cause many people to curse us. I don't see preserving some of the data
> as being worse.

What possible damage do you want to recover from?

Without WAL logging after a software crash it's possible for update
chains to be broken, for multiple copies of the same tuple to be
visible, for some tuples to disappear but not others, etc.

And without checksums after a hardware crash it'll be possible for
pages to be torn resulting in tuple pointers that land in the middle
of nowhere or tuples that start off fine but are half overwritten with
unrelated garbage.

-- 
greg


Re: Unlogged tables, persistent kind

From
Robert Haas
Date:
On Apr 24, 2011, at 1:22 PM, Simon Riggs <simon@2ndQuadrant.com> wrote:
> Unlogged tables are a good new feature.

Thanks.

> I noticed Bruce had mentioned they were the equivalent of NoSQL, which
> I don't really accept.

Me neither. I thought that was poorly said.

> Heap blocks would be zeroed if they were found to be damaged, following a crash.

The problem is not so much the blocks that are damaged (e.g. half-written,
torn page) but the ones that were never written at all. For example, read
page A, read page B, update tuple on page A putting new version on page B,
write one but not both of A and B out to the O/S, crash.  Everything on
disk is a valid page, but they are not coherent taken as a whole. It's
normally XLOG replay that fixes this type of situation...

I thought about this problem a bit and I think you could perhaps deal with
it by having some sort of partially logged table, where we would XLOG just
enough to know which blocks or relations had been modified and only nuke
enough data to be certain of being safe. But it isn't clear that there is
much use case for this, especially because I think it would give up nearly
all the performance benefit.

I do think it might be useful to have an unlogged index on a logged table,
somehow frobnicated so that on a crash the index is known invalid and not
used until a REINDEX is performed.
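
Very roughly, the frobnication could be as simple as this (a sketch only;
indisvalid is a real pg_index column, but the indunlogged flag and
everything else here is hypothetical):

    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Sketch of a catalog entry for an unlogged index on a logged table.
     * Only indisvalid exists in pg_index today; indunlogged is purely
     * illustrative.
     */
    typedef struct IndexEntrySketch
    {
        bool    indunlogged;    /* hypothetical: index itself skips WAL */
        bool    indisvalid;     /* planner may use this index */
    } IndexEntrySketch;

    /*
     * What end-of-crash-recovery could do: knock out every unlogged index
     * so the planner ignores it until REINDEX rebuilds it and marks it
     * valid again.
     */
    static void
    invalidate_unlogged_indexes(IndexEntrySketch *indexes, size_t nindexes)
    {
        for (size_t i = 0; i < nindexes; i++)
        {
            if (indexes[i].indunlogged)
                indexes[i].indisvalid = false;
        }
    }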

...Robert

Re: Unlogged tables, persistent kind

From
Leonardo Francalanci
Date:
The only data we can't rebuild is the heap. So what about an option for
UNlogged indexes on a LOGged table? It would always preserve data, and it
would 'only' cost a rebuilding of the indexes in case of an unclean
shutdown. I think it would give a boost in performance for all those cases
where the IO (especially random IO) is caused by the indexes, and it
doesn't look too complicated (but maybe I'm missing something).

I proposed the unlogged to logged patch (BTW has anyone given a look at
it?) because we partition data based on a timestamp, and we can risk
losing the last N minutes of data, but after N minutes we want to know
data will always be there, so we would like to set a partition table to
'logged'.

Leonardo


Re: Unlogged tables, persistent kind

From
Simon Riggs
Date:
On Mon, Apr 25, 2011 at 8:36 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
> The only data we can't rebuild is the heap. So what about an option for
> UNlogged indexes on a LOGged table? It would always preserve data, and it
> would 'only' cost a rebuilding of the indexes in case of an unclean
> shutdown. I think it would give a boost in performance for all those cases
> where the IO (especially random IO) is caused by the indexes, and it
> doesn't look too complicated (but maybe I'm missing something).
>
> I proposed the unlogged to logged patch (BTW has anyone given a look at
> it?) because we partition data based on a timestamp, and we can risk
> losing the last N minutes of data, but after N minutes we want to know
> data will always be there, so we would like to set a partition table to
> 'logged'.

I agree that unlogged indexes on a logged heap are better for
resilience and are likely to be the best first step.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Unlogged tables, persistent kind

From
Simon Riggs
Date:
On Mon, Apr 25, 2011 at 8:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Apr 24, 2011, at 1:22 PM, Simon Riggs <simon@2ndQuadrant.com> wrote:
>> Unlogged tables are a good new feature.
>
> Thanks.
>
>> I noticed Bruce had mentioned they were the equivalent of NoSQL, which
>> I don't really accept.
>
> Me neither. I thought that was poorly said.
>
>> Heap blocks would be zeroed if they were found to be damaged, following a crash.
>
> The problem is not so much the blocks that are damaged (e.g. half-written,
> torn page) but the ones that were never written at all. For example, read
> page A, read page B, update tuple on page A putting new version on page B,
> write one but not both of A and B out to the O/S, crash.  Everything on
> disk is a valid page, but they are not coherent taken as a whole. It's
> normally XLOG replay that fixes this type of situation...

Not really sure it matters what the cause of data loss is, does it?
The zeroing of the blocks definitely causes data loss but the
intention is to bring the table back to a consistent physical state,
not to in any way repair the data loss.

Repeating my words above, this proposed option trades potential minor
data loss for performance.

The amount of data loss on a big table will be <1% of the data loss
caused by truncating the whole table.

This is important on big tables where reloading from a backup might
take a long time.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Unlogged tables, persistent kind

From
Robert Haas
Date:
On Mon, Apr 25, 2011 at 5:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, Apr 25, 2011 at 8:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Apr 24, 2011, at 1:22 PM, Simon Riggs <simon@2ndQuadrant.com> wrote:
>>> Unlogged tables are a good new feature.
>>
>> Thanks.
>>
>>> I noticed Bruce had mentioned they were the equivalent of NoSQL, which
>>> I don't really accept.
>>
>> Me neither. I thought that was poorly said.
>>
>>> Heap blocks would be zeroed if they were found to be damaged, following a crash.
>>
>> The problem is not so much the blocks that are damaged (e.g. half-written,
>> torn page) but the ones that were never written at all. For example, read
>> page A, read page B, update tuple on page A putting new version on page B,
>> write one but not both of A and B out to the O/S, crash.  Everything on
>> disk is a valid page, but they are not coherent taken as a whole. It's
>> normally XLOG replay that fixes this type of situation...
>
> Not really sure it matters what the cause of data loss is, does it?
> The zeroing of the blocks definitely causes data loss but the
> intention is to bring the table back to a consistent physical state,
> not to in any way repair the data loss.

Right, but the trick is how you identify which blocks you need to
zero.  You used the word "damaged", which to me implied that the block
had been modified in some way but ended up with other than the
expected contents, so that something like a CRC check might detect the
problem.  My point (as perhaps you already understand) is that you
could easily have a situation where every block in the table passes a
hypothetical block-level CRC check, but the table as a whole is still
damaged because update chains aren't coherent.  So you need some kind
of mechanism for identifying which portions of the table you need to
zero to get back to a guaranteed-coherent state.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Unlogged tables, persistent kind

From
Robert Haas
Date:
On Mon, Apr 25, 2011 at 3:36 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
> The only data we can't rebuild is the heap. So what about an option for
> UNlogged indexes on a LOGged table? It would always preserve data, and it
> would 'only' cost a rebuilding of the indexes in case of an unclean
> shutdown. I think it would give a boost in performance for all those cases
> where the IO (especially random IO) is caused by the indexes, and it
> doesn't look too complicated (but maybe I'm missing something).

+1.

> I proposed the unlogged to logged patch (BTW has anyone given a look at
> it?) because we partition data based on a timestamp, and we can risk
> losing the last N minutes of data, but after N minutes we want to know
> data will always be there, so we would like to set a partition table to
> 'logged'.

That approach is something I had also given some thought to, and I'm
glad to hear that people are thinking about doing it in the real
world.  I'm planning to look at your patch, but I haven't gotten to it
yet, because I'm giving priority to anything that must be done to get
9.1beta1 out the door.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Unlogged tables, persistent kind

From
Leonardo Francalanci
Date:
> The amount of data loss on a big
> table will be <1% of the data loss
> caused by truncating the whole table.

If that 1% is random (not time/transaction related), usually you'd rather
have an empty table. In other words: is a table that is not consistent
with anything else in the db useful?


Re: Unlogged tables, persistent kind

From
Jesper Krogh
Date:
On 2011-04-25 20:00, Leonardo Francalanci wrote:
>> The amount of data loss on a big table will be <1% of the data
>> loss caused by truncating the whole table.
>
> If that 1% is random (not time/transaction related), usually you'd
> rather have an empty table. In other words: is a table that is not
> consistent with anything else in the db useful?
>
Depends on the application, if it serves for pure caching then it is
fully acceptable and way better than dropping everything.

-- 
Jesper

Re: Unlogged tables, persistent kind

From
"Kevin Grittner"
Date:
Jesper Krogh <jesper@krogh.cc> wrote:
> On 2011-04-25 20:00, Leonardo Francalanci wrote:
>>> The amount of data loss on a big table will be <1% of the data
>>> loss caused by truncating the whole table.
>>
>>  If that 1% is random (not time/transaction related), usually
>>  you'd rather have an empty table. In other words: is a table
>>  that is not consistent with anything else in the db useful?
>>
> Depends on the application, if it serves for pure caching then it
> is fully acceptable and way better than dropping everything.

I buy this *if* we can be sure we're not keeping information which
is duplicated or mangled, and if we can avoid crashing the server to
a panic because of broken pointers or other infelicities.  I'm not
sure that can't be done, but I don't think I've heard an explanation
of how that could be accomplished, particularly without overhead
which would wipe out the performance benefit of unlogged tables. 
(And without a performance benefit, what's the point?)

-Kevin


Re: Unlogged tables, persistent kind

From
Christopher Browne
Date:
On Mon, Apr 25, 2011 at 2:03 PM, Jesper Krogh <jesper@krogh.cc> wrote:
> On 2011-04-25 20:00, Leonardo Francalanci wrote:
>>> The amount of data loss on a big table will be <1% of the data
>>> loss caused by truncating the whole table.
>>
>> If that 1% is random (not time/transaction related), usually you'd
>> rather have an empty table. In other words: is a table that is not
>> consistent with anything else in the db useful?
>>
> Depends on the application, if it serves for pure caching then it is fully
> acceptable and way
> better than dropping everything.

Whoah...

When cacheing, the application already needs to be able to cope with
the case where there's nothing in the cache.

This means that if the cache gets truncated, it's reasonable to expect
that the application won't get deranged - it already needs to cope
with the case where data's not there and needs to get constructed.

In contrast, if *wrong* data is in the cache, that could very well
lead to wrong behavior on the part of the application.

And there may not be any mechanism aside from cache truncation that
will rectify that.

It seems to me that it's a lot riskier to try to preserve contents of
such tables than it is to truncate them.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: Unlogged tables, persistent kind

From
Simon Riggs
Date:
On Mon, Apr 25, 2011 at 7:00 PM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>> The amount of data loss on a big
>> table will be <1% of the data loss
>> caused by truncating the whole table.
>
> If that 1% is random (not time/transaction related), usually you'd rather have an empty table.

Why do you think it would be random?


> In other words: is a table that is not consistent with anything else in the db useful?

That's too big a leap. Why would it suddenly be inconsistent with the
rest of the database?


Not good arguments.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Unlogged tables, persistent kind

From
Simon Riggs
Date:
On Mon, Apr 25, 2011 at 1:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 25, 2011 at 5:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Mon, Apr 25, 2011 at 8:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Apr 24, 2011, at 1:22 PM, Simon Riggs <simon@2ndQuadrant.com> wrote:
>>>> Unlogged tables are a good new feature.
>>>
>>> Thanks.
>>>
>>>> I noticed Bruce had mentioned they were the equivalent of NoSQL, which
>>>> I don't really accept.
>>>
>>> Me neither. I thought that was poorly said.
>>>
>>>> Heap blocks would be zeroed if they were found to be damaged, following a crash.
>>>
>>> The problem is not so much the blocks that are damaged (e.g. half-written,
>>> torn page) but the ones that were never written at all. For example, read
>>> page A, read page B, update tuple on page A putting new version on page B,
>>> write one but not both of A and B out to the O/S, crash.  Everything on
>>> disk is a valid page, but they are not coherent taken as a whole. It's
>>> normally XLOG replay that fixes this type of situation...
>>
>> Not really sure it matters what the cause of data loss is, does it?
>> The zeroing of the blocks definitely causes data loss but the
>> intention is to bring the table back to a consistent physical state,
>> not to in any way repair the data loss.
>
> Right, but the trick is how you identify which blocks you need to
> zero.  You used the word "damaged", which to me implied that the block
> had been modified in some way but ended up with other than the
> expected contents, so that something like a CRC check might detect the
> problem.  My point (as perhaps you already understand) is that you
> could easily have a situation where every block in the table passes a
> hypothetical block-level CRC check, but the table as a whole is still
> damaged because update chains aren't coherent.  So you need some kind
> of mechanism for identifying which portions of the table you need to
> zero to get back to a guaranteed-coherent state.

That sounds like progress.

The current mechanism is "truncate complete table". There are clearly
other mechanisms that would not remove all data.

Probably the common case would be for insert-only data.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Unlogged tables, persistent kind

From
Robert Haas
Date:
On Mon, Apr 25, 2011 at 2:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Right, but the trick is how you identify which blocks you need to
>> zero.  You used the word "damaged", which to me implied that the block
>> had been modified in some way but ended up with other than the
>> expected contents, so that something like a CRC check might detect the
>> problem.  My point (as perhaps you already understand) is that you
>> could easily have a situation where every block in the table passes a
>> hypothetical block-level CRC check, but the table as a whole is still
>> damaged because update chains aren't coherent.  So you need some kind
>> of mechanism for identifying which portions of the table you need to
>> zero to get back to a guaranteed-coherent state.
>
> That sounds like progress.
>
> The current mechanism is "truncate complete table". There are clearly
> other mechanisms that would not remove all data.

No doubt.  Consider a block B.  If the system crashes when block B is
dirty either in the OS cache or shared_buffers, then you must zero B,
or truncate it away.  If it was clean in both places, however, it's
good data and you can keep it.

So you can imagine, for example, a scheme where the relation is
divided into 8MB chunks, and we WAL-log the first
operation after each checkpoint that touches a chunk.  Replay zeroes
the chunk, and we also invalidate all the indexes (the user must
REINDEX to get them working again).  I think that would be safe, and
certainly the WAL-logging overhead would be far less than WAL-logging
every change, since we'd need to emit only ~16 bytes of WAL for every
8MB written, rather than ~8MB of WAL for every 8MB written.  It
wouldn't allow some of the optimizations that the current unlogged
tables can get away with only because they WAL-log exactly nothing -
and selectively zeroing chunks of a large table might slow down
startup quite a bit - but it might still be useful to someone.
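
Very roughly, the bookkeeping might look something like this (a sketch
only; the chunk map, the record layout, and all names below are
illustrative, not real backend code):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /*
     * Sketch of chunk-level "first touch" logging.  With 8kB pages an 8MB
     * chunk is 1024 blocks, so one tiny record per chunk per checkpoint
     * cycle replaces per-change WAL: on the order of 16 bytes of WAL per
     * 8MB written instead of ~8MB.
     */
    #define BLOCKS_PER_CHUNK    1024    /* 8MB / 8kB pages */
    #define MAX_CHUNKS          4096    /* a 32GB relation, in this toy */

    static bool chunk_touched_since_checkpoint[MAX_CHUNKS];

    /*
     * Called before dirtying a block of a partially logged table: emit the
     * small chunk record only on the first touch after a checkpoint.
     * Replay would zero the whole chunk and mark the indexes invalid.
     */
    static void
    maybe_log_chunk_touch(uint32_t blocknum,
                          void (*emit_chunk_record)(uint32_t chunkno))
    {
        uint32_t    chunkno = blocknum / BLOCKS_PER_CHUNK;

        if (!chunk_touched_since_checkpoint[chunkno])
        {
            emit_chunk_record(chunkno);     /* the ~16-byte WAL record */
            chunk_touched_since_checkpoint[chunkno] = true;
        }
    }

    /*
     * At each checkpoint the map is cleared: data the checkpoint flushed
     * is safe, so the next touch of a chunk must be logged again.
     */
    static void
    reset_chunk_map_at_checkpoint(void)
    {
        memset(chunk_touched_since_checkpoint, 0,
               sizeof(chunk_touched_since_checkpoint));
    }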

However, I think that the "logged table, unlogged index" idea is
probably the most promising thing to think about doing first.  It's
easy to imagine all sorts of uses for that sort of thing even in cases
where people can't afford to have any data get zeroed, and it would
provide a convenient building block for something like the above if we
eventually wanted to go that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Unlogged tables, persistent kind

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> However, I think that the "logged table, unlogged index" idea is
> probably the most promising thing to think about doing first.

+1 for that --- it's clean, has a clear use-case, and would allow us
to manage the current mess around hash indexes more cleanly.
That is, hash indexes would always be treated as unlogged.

(Or of course we could fix the lack of WAL logging for hash indexes,
but I notice a lack of people stepping up to do that.)
        regards, tom lane


Re: Unlogged tables, persistent kind

From
Leonardo Francalanci
Date:
> > If that 1% is random (not time/transaction related), usually you'd
> > rather have an empty table.
>
> Why do you think it would be random?

"Heap blocks would be zeroed if they were found to be damaged, following a
crash."

If you erase full blocks, you have no idea what data you erased; it could
be something changed 1 hour ago, 1 month ago, 1 year ago. This is very
different from, say, synchronous_commit=off: in that case, "the most
recent transactions may be lost if the database should crash". In your
case, "some (who knows which???) data is lost". So, to me that sounds like
random loss. I don't think that that is different from a corrupted table.
You're not deleting rows recently changed; you're deleting everything that
is "physically close" to it.

> > In other words: is a table that is not consistent with anything else
> > in the db useful?
>
> That's too big a leap. Why would it suddenly be inconsistent with  the
> rest of the database?


If you delete some data, and you have no idea what data you lost, I don't
think you have a consistent db. Unless, of course, your table has no
relation with any other table in the db.

Of course, all these thoughts are based on the assumption that I know what
happens when a block is erased; but my knowledge of postgresql internals
is not so good, so I might be *very* wrong.


Re: Unlogged tables, persistent kind

From
Cédric Villemain
Date:
2011/4/25 Christopher Browne <cbbrowne@gmail.com>:
> On Mon, Apr 25, 2011 at 2:03 PM, Jesper Krogh <jesper@krogh.cc> wrote:
>> On 2011-04-25 20:00, Leonardo Francalanci wrote:
>>>> The amount of data loss on a big table will be <1% of the data
>>>> loss caused by truncating the whole table.
>>>
>>> If that 1% is random (not time/transaction related), usually you'd
>>> rather have an empty table. In other words: is a table that is not
>>> consistent with anything else in the db useful?
>>>
>> Depends on the application, if it serves for pure caching then it is fully
>> acceptable and way
>> better than dropping everything.
>
> Whoah...
>
> When cacheing, the application already needs to be able to cope with
> the case where there's nothing in the cache.
>
> This means that if the cache gets truncated, it's reasonable to expect
> that the application won't get deranged - it already needs to cope
> with the case where data's not there and needs to get constructed.

That is true, but the application's performance already has to cope with
a server crash/restart. Anything you can add to make the restart
smoother (for the application) is good.

>
> In contrast, if *wrong* data is in the cache, that could very well
> lead to wrong behavior on the part of the application.
>
> And there may not be any mechanism aside from cache truncation that
> will rectify that.
>
> It seems to me that it's a lot riskier to try to preserve contents of
> such tables than it is to truncate them.
> --
> When confronted by a difficult problem, solve it by reducing it to the
> question, "How would the Lone Ranger handle this?"
>

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


Re: Unlogged tables, persistent kind

From
Simon Riggs
Date:
On Tue, Apr 26, 2011 at 8:49 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>> > If that 1% is random (not time/transaction related), usually you'd
>> > rather have an empty table.
>>
>> Why do you think it would be random?
>
> "Heap blocks would be zeroed if they were found to be damaged, following a
> crash."
>
> If you erase full blocks, you have no idea what data you erased; it could
> be something changed 1 hour ago, 1 month ago, 1 year ago. This is very
> different from, say, synchronous_commit=off: in that case, "the most
> recent transactions may be lost if the database should crash". In your
> case, "some (who knows which???) data is lost". So, to me that sounds like
> random loss. I don't think that that is different from a corrupted table.
> You're not deleting rows recently changed; you're deleting everything that
> is "physically close" to it.
>
>> > In other words: is a table that is not consistent with anything else
>> > in the db useful?
>>
>> That's too big a leap. Why would it suddenly be inconsistent with  the
>> rest of the database?
>
>
> If you delete some data, and you have no idea what data you lost, I don't
> think you have a consistent db. Unless, of course, your table has no
> relation with any other table in the db.
>
> Of course, all these thoughts are based on the assumption that I know what
> happens when a block is erased; but my knowledge of postgresql internals
> is not so good, so I might be *very* wrong.

You're assuming that there are referential links *from* other tables
to the table with damage. In which case you would be correct. But of
course, if you needed that data for integrity you would never do that,
so the problem is a nonexistent use case. The suggested mode is for
Fact data, not reference tables.

The current assessment is that UNLOGGED tables are useful only for
running a data cache. If the database crashes, then the table is
truncated and you must refill the cache. If that is the case, then it
must surely be better to have a cache that is already 99% full, than
one which starts at empty. There is no damage or loss because parts of
the cache were missing.

Unlogged Tables are currently so volatile they are unusable for any
other purpose. I want to see a table that is useful for low value
data, such as sensor data.
If you had 10 TB of sensor data and the database crashes, then you
want to lose a few blocks, not the whole lot. Low value => rare, minor
loss is acceptable, but it doesn't mean total data loss is acceptable.
For that use case, total loss is catastrophic, not just mildly
irritating. If you are a Telco, losing a few minutes' billing data
costs much less than having every server have better hardware so it
can cope with high WAL traffic as well. They don't want to lose the
data, but it's a cost-based trade off. Consistency is not an issue, you
are just missing some data. That is normal anyway, since sensor data
generators (mobile devices etc) frequently fail, are offline, turned
off etc, so there isn't even a definition of what complete data is
supposed to look like. The missing data looks exactly like lots of
people turned their phones off for a few minutes.

So my suggestion makes UNLOGGED tables more useful for the use case
they were designed to address - cached data (AIUI), plus they allow
another use case that doesn't seem to be well understood, low value
data in massive data volumes.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Unlogged tables, persistent kind

From
Greg Stark
Date:
On Tue, May 3, 2011 at 8:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The current assessment is that UNLOGGED tables are useful only for
> running a data cache. If the database crashes, then the table is
> truncated and you must refill the cache. If that is the case, then it
> must surely be better to have a cache that is already 99% full, than
> one which starts at empty. There is no damage or loss because parts of
> the cache were missing.

That's not necessarily the case of course. I've written caches before
where a set of records would be present for each object being cached.
I could deal with a whole set not being present but if just some of
the records were missing then I would serve incorrect results. Since
they were created in a single transacion I should be able to rely on
the whole set being present or missing consistently.

That doesn't mean there aren't cases where that's true of course. In
an ideal world the database would guarantee that you couldn't use the
table in this way and fail to get the consistency you expect.
Something like getting an error if you try to modify two rows in an
unlogged table during a single transaction. I'm not sure how to do
that reasonably though.


> For that use case, total loss is catastrophic, not just mildly
> irritating. If you are a Telco, losing a few minutes' billing data
> costs much less than having every server have better hardware so it
> can cope with high WAL traffic as well. They don't want to lose the
> data, but it's a cost-based trade off.

This analysis is dead on. That's precisely how businesses evaluate
this question.


> Consistency is not an issue, you
> are just missing some data. That is normal anyway, since sensor data
> generators (mobile devices etc) frequently fail, are offline, turned
> off etc, so there isn't even a definition of what complete data is
> supposed to look like. The missing data looks exactly like lots of
> people turned their phones off for a few minutes.

I don't think that's true however. Consider if I have a rollup table
that contains aggregated sums of that data. If you lose some of the
records then my aggregates don't add up to the "correct" values any
more.

Or consider if you are counting sensor data like total data
transferred and session count -- and then reporting the average data
transferred per session. And you accidentally lose a bunch of
sessions. Now your data/session report will be reporting false
information.

This is true even if you only lose recently committed rows. But losing
whole blocks means you risk losing random old data which makes it hard
to work around by, say, purging recently aggregated data.

> So my suggestion makes UNLOGGED tables more useful for the use case
> they were designed to address - cached data (AIUI), plus they allow
> another use case that doesn't seem to be well understood, low value
> data in massive data volumes.

There are other approaches as well. Foreign data wrappers mean you could
do things like store the low value data in raw text files or other
systems like memcached or Hadoop or whatever. I'm not saying there's
no reason to do something in Postgres but if you're being bitten by
Postgres's block-oriented random access storage it may be a problem
too fundamental to solve without addressing the underlying storage
strategy?


-- 
greg


Re: Unlogged tables, persistent kind

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> You're assuming that there are referential links *from* other tables
> to the table with damage. In which case you would be correct. But of
> course, if you needed that data for integrity you would never do that,
> so the problem is a nonexistent use case. The suggested mode is for
> Fact data, not reference tables.

So I suppose your notion of "fact data" includes no fields that are wide
enough to need toasting?  Because as soon as you have any out-of-line
values, there's an equivalent of foreign keys behind the scenes, where
the user can't see it (until he gets "missing chunk number" or some such
error).

> The current assessment is that UNLOGGED tables are useful only for
> running a data cache. If the database crashes, then the table is
> truncated and you must refill the cache. If that is the case, then it
> must surely be better to have a cache that is already 99% full, than
> one which starts at empty. There is no damage or loss because parts of
> the cache were missing.

A cache that starts at 99% full of untrustworthy data is NOT better.

> Unlogged Tables are currently so volatile they are unusable for any
> other purpose. I want to see a table that is useful for low value
> data, such as sensor data.

Basically, you're being hopelessly optimistic both about the extent to
which a crash is likely to render data inconsistent, and our ability to
detect that inconsistency.  It doesn't matter whether the data is "low
value", the difficulty of cleaning up remains the same.  I don't want to
deal with trying to detect that, and I definitely don't want to dump the
problems onto users.
        regards, tom lane


Re: Unlogged tables, persistent kind

From
Christopher Browne
Date:
On Tue, May 3, 2011 at 4:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> You're assuming that there are referential links *from* other tables
>> to the table with damage. In which case you would be correct. But of
>> course, if you needed that data for integrity you would never do that,
>> so the problem is a nonexistent use case. The suggested mode is for
>> Fact data, not reference tables.
>
> So I suppose your notion of "fact data" includes no fields that are wide
> enough to need toasting?  Because as soon as you have any out-of-line
> values, there's an equivalent of foreign keys behind the scenes, where
> the user can't see it (until he gets "missing chunk number" or some such
> error).
>
>> The current assessment is that UNLOGGED tables are useful only for
>> running a data cache. If the database crashes, then the table is
>> truncated and you must refill the cache. If that is the case, then it
>> must surely be better to have a cache that is already 99% full, than
>> one which starts at empty. There is no damage or loss because parts of
>> the cache were missing.
>
> A cache that starts at 99% full of untrustworthy data is NOT better.

That's a bit pessimistic.

The case that bugs me is that a cache that's 99% trustworthy, but
where I have no idea
a) that 1% of it is crud, and
b) which 1% of it is crud
is still a pretty unacceptable scenario.

I head back to our policy for handling caches: If in doubt, TRUNCATE.

That policy would be nicely consistent with the way 9.1 deals with
unlogged tables.

>> Unlogged Tables are currently so volatile they are unusable for any
>> other purpose. I want to see a table that is useful for low value
>> data, such as sensor data.
>
> Basically, you're being hopelessly optimistic both about the extent to
> which a crash is likely to render data inconsistent, and our ability to
> detect that inconsistency.  It doesn't matter whether the data is "low
> value", the difficulty of cleaning up remains the same.  I don't want to
> deal with trying to detect that, and I definitely don't want to dump the
> problems onto users.

+1, on both grounds.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"