Re: Survey on backing up unlogged tables: help us with PostgreSQL development! - Mailing list pgsql-general

From Ivan Voras
Subject Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
Msg-id ic1n6f$hc8$1@dough.gmane.org
In response to Re: Re: Survey on backing up unlogged tables: help us with PostgreSQL development!  ("A.M." <agentm@themactionfaction.com>)
List pgsql-general
On 11/17/10 17:43, A.M. wrote:
>
> On Nov 17, 2010, at 11:32 AM, Ivan Voras wrote:
>
>> On 11/17/10 02:55, Josh Berkus wrote:
>>>
>>>> If you do wish to have the data tossed out for no good reason every so
>>>> often, then there ought to be a separate attribute to control that.  I'm
>>>> really having trouble seeing how such behavior would be desirable enough
>>>> to ever have the server do it for you, on its terms rather than yours.
>>>
>>> I don't quite follow you.  The purpose of unlogged tables is for data
>>> which is disposable in the event of downtime; the classic example is a
>>> user_session_status table.  In the event of a restart, all user
>>> sessions are going to be invalid anyway.
>>
>> Depends on what you mean by "session".
>>
>> Typical web application session data, e.g. for PHP applications, which are
>> deployed in *huge* numbers, resides directly on file systems and is not
>> guarded by anything (not even fsyncs). On an operating system crash (and I
>> do mean when the whole machine and the OS go down), the most that can
>> happen is that some of those session files get garbled or go missing; all
>> the others work perfectly fine when the server is brought back up, and the
>> users can continue to work within their sessions. *That* is useful session
>> behaviour, and it is also useful for logs.
>>
>> The definition of unlogged tables which are deliberately being emptied for
>> no good reason does not seem very useful to me. I'd rather support an
>> (optional) mode (if it can be implemented) in which PostgreSQL scans through
>> these unlogged tables on startup and discards any pages whose checksums
>> don't match, but accepts all others as "good enough". Even better: maybe
>> not all pages need to be scanned, only the last few, if there is a chance
>> for any kind of mechanism which can act as checkpoints for data validity.
>
> This is not really a fair feature comparison. With the file-based sessions,
> the webserver will continue to deal with potentially corrupted sessions,
> which is worse than dealing with no sessions.

I guess it depends on the specific use case, but in the common case (i.e.
non-mission-critical massive deployments) I'd say it's definitely *not*
worse than no sessions. "Dealing with potential corruption" in this case
usually means the web application will attempt to deserialize the
session data and fail if it's corrupted, leading to a new session being
created.
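
A minimal sketch of that fallback, assuming the session is stored as JSON
text (PHP uses its own serialization format, so this is only an
illustration). The function name load_session and the shape of the fresh
session are made up for the example:

```python
import json
import uuid


def load_session(raw):
    """Return (session, was_reset).

    Tries to deserialize the stored session; if the payload is garbled,
    truncated, or missing, a fresh empty session is handed out instead.
    The JSON format and the dict layout are assumptions for illustration.
    """
    try:
        session = json.loads(raw)
        if not isinstance(session, dict):
            raise ValueError("session payload is not a mapping")
        return session, False
    except (ValueError, TypeError):
        # Corrupted or absent data: the user simply starts a new session.
        return {"id": uuid.uuid4().hex, "data": {}}, True
```

An intact payload comes back unchanged, while a truncated one results in
a new session, so the worst case for the user is having to log in again.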

> I also plan on using this feature for materialized views to replace memcached.

Just how large a performance gain is expected from this thing? :) I
don't see a mention that fsync will be disabled on unlogged tables
(though it makes sense so it probably will be).

Having materialized views this way will mean that something - either an
application or an external script triggered by database startup - will
have to calculate and create this materialized view, which will probably
involve massive table scanning all around - I suspect that performance
gains from unlogged tables could be hidden by this scanning.

Anyway, I'm not arguing against them, I'm arguing for making them more
powerful.
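
As a toy illustration of the checksum-scan idea quoted above: keep every
page whose stored checksum still matches, drop the rest. This is not
PostgreSQL's page layout or checksum algorithm; the 4-byte CRC32 prefix
and the names make_page and salvage_pages are assumptions made up for the
sketch.

```python
import zlib


def make_page(payload: bytes) -> bytes:
    """Build a toy 'page': a 4-byte CRC32 of the payload, then the payload."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def salvage_pages(pages):
    """Return the payloads of all pages whose checksum still matches.

    Pages that are too short or whose stored CRC32 disagrees with the
    payload are discarded; everything else is accepted as "good enough".
    """
    good = []
    for page in pages:
        if len(page) < 4:
            continue  # too short to even hold a checksum
        stored = int.from_bytes(page[:4], "big")
        if zlib.crc32(page[4:]) == stored:
            good.append(page[4:])
    return good
```

With three pages of which the middle one has a flipped payload byte, only
the first and third payloads survive the scan.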
