Re: unlogged tables - Mailing list pgsql-hackers

From Robert Haas
Subject Re: unlogged tables
Date
Msg-id AANLkTi=tdk9AN0GmYBp7wURB-nNMzuQCq_C8p0_Q6TpZ@mail.gmail.com
Whole thread Raw
In response to Re: unlogged tables  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: unlogged tables
List pgsql-hackers
On Wed, Nov 17, 2010 at 2:31 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> The customer is always right, and I think we are hearing loud and clear
>>> what the customers want.  Please let's not go out of our way to create
>>> a feature that isn't what they want.
>>
>> I would be fine with only having a safe shutdown with unlogged tables
>> and skip the checkpoint I/O all other times.
>
> Yeah, I was just thinking something like that would be good, and should
> overcome Robert's objection to the whole idea.

Could we slow down here a bit and talk through the ideas here in a
logical fashion?

The customer is always right, but the informed customer makes better
decisions than the uninformed customer.  This idea, as proposed, does
not work.  If you only include dirty buffers at the final checkpoint
before shutting down, you have no guarantee that any buffers that you
either didn't write or didn't fsync previously are actually on disk.
Therefore, you have no guarantee that the table data is not corrupted.So you really have to decide between including
theunlogged-table 
buffers in EVERY checkpoint and not ever including them at all.  Which
one is right depends on your use case.

For example, consider the poster who said that, when this feature is
available, they plan to try ripping out their memcached instance and
replacing it with PostgreSQL running unlogged tables.  Suppose this
poster (or someone else in a similar situation) has a 64 GB and is
currently running a 60 GB memcached instance on it, which is not an
unrealistic scenario for memcached.  Suppose further that he dirties
25% of that data each hour.  memcached is currently doing no writes to
disk.  When he switches to PostgreSQL and sets checkpoints_segments to
a gazillion and checkpoint_timeout to the maximum, he's going to start
writing 15 GB of data to disk every hour - data which he clearly
doesn't care about losing, or preserving across restarts, because he's
currently storing it in memcached.  In fact, with memcached, he'll not
only lose data at shutdown - he'll lose data on a regular basis when
everything is running normally.  We can try to convince ourselves that
someone in this situation will not care about needing to get 15GB of
disposable data per hour from memory to disk in order to have a
feature that he doesn't need, but I think it's going to be pretty hard
to make that credible.

Now, second use case.  Consider someone who is currently running
PostgreSQL in a non-durable configuration, with fsync=off,
full_page_writes=off, and synchronous_commit=off.  This person - who
is based on someone I spoke with at PG West - is doing a large amount
of data processing using PostGIS.  Their typical workflow is to load a
bunch of data, run a simulation, and then throw away the entire
database.  They don't want to pay the cost of durability because if
they crash in mid-simulation they will simply rerun it.  Being fast is
more important.  Whether or not this person will be happy with the
proposed behavior is a bit harder to say.  If it kills performance,
they will definitely hate it.  But if the performance penalty is only
modest, they may enjoy the convenience of being able to shut down the
database and start it up again later without losing data.

Third use case.  Someone on pgsql-general mentioned that they want to
write logs to PG, and can abide losing them if a crash happens, but
not on a clean shutdown and restart.  This person clearly shuts down
their production database a lot more often than I do, but that is OK.
By explicit stipulation, they want the survive-a-clean-shutdown
behavior.  I have no problem supporting that use case, providing they
are willing to take the associated performance penalty at checkpoint
time, which we don't know because we haven't asked, but I'm fine with
assuming it's useful even though I probably wouldn't use it much
myself.

> I also agree with Tom's sentiment above.
>
> To answer another point I see Tom made on the -general list: while
> individual backends may crash from time to time, crashes of the whole
> Postgres server are very rare in my experience in production environments.
> It's really pretty robust, unless you're doing crazy stuff. So that makes it
> all the more important that we can restart a server cleanly (say, to change
> a config setting) without losing the unlogged tables. If we don't allow that
> we'll make a laughing stock of ourselves. Honestly.

Let's please not assume that there is only one reasonable option here,
or that I have not thought about some of these issues.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: unlogged tables
Next
From: Tom Lane
Date:
Subject: Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running