Re: unlogged tables - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: unlogged tables |
Date | |
Msg-id | AANLkTi=tdk9AN0GmYBp7wURB-nNMzuQCq_C8p0_Q6TpZ@mail.gmail.com Whole thread Raw |
In response to | Re: unlogged tables (Andrew Dunstan <andrew@dunslane.net>) |
Responses |
Re: unlogged tables
|
List | pgsql-hackers |
On Wed, Nov 17, 2010 at 2:31 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >>> The customer is always right, and I think we are hearing loud and clear >>> what the customers want. Please let's not go out of our way to create >>> a feature that isn't what they want. >> >> I would be fine with only having a safe shutdown with unlogged tables >> and skip the checkpoint I/O all other times. > > Yeah, I was just thinking something like that would be good, and should > overcome Robert's objection to the whole idea. Could we slow down here a bit and talk through the ideas here in a logical fashion? The customer is always right, but the informed customer makes better decisions than the uninformed customer. This idea, as proposed, does not work. If you only include dirty buffers at the final checkpoint before shutting down, you have no guarantee that any buffers that you either didn't write or didn't fsync previously are actually on disk. Therefore, you have no guarantee that the table data is not corrupted.So you really have to decide between including theunlogged-table buffers in EVERY checkpoint and not ever including them at all. Which one is right depends on your use case. For example, consider the poster who said that, when this feature is available, they plan to try ripping out their memcached instance and replacing it with PostgreSQL running unlogged tables. Suppose this poster (or someone else in a similar situation) has a 64 GB and is currently running a 60 GB memcached instance on it, which is not an unrealistic scenario for memcached. Suppose further that he dirties 25% of that data each hour. memcached is currently doing no writes to disk. When he switches to PostgreSQL and sets checkpoints_segments to a gazillion and checkpoint_timeout to the maximum, he's going to start writing 15 GB of data to disk every hour - data which he clearly doesn't care about losing, or preserving across restarts, because he's currently storing it in memcached. In fact, with memcached, he'll not only lose data at shutdown - he'll lose data on a regular basis when everything is running normally. We can try to convince ourselves that someone in this situation will not care about needing to get 15GB of disposable data per hour from memory to disk in order to have a feature that he doesn't need, but I think it's going to be pretty hard to make that credible. Now, second use case. Consider someone who is currently running PostgreSQL in a non-durable configuration, with fsync=off, full_page_writes=off, and synchronous_commit=off. This person - who is based on someone I spoke with at PG West - is doing a large amount of data processing using PostGIS. Their typical workflow is to load a bunch of data, run a simulation, and then throw away the entire database. They don't want to pay the cost of durability because if they crash in mid-simulation they will simply rerun it. Being fast is more important. Whether or not this person will be happy with the proposed behavior is a bit harder to say. If it kills performance, they will definitely hate it. But if the performance penalty is only modest, they may enjoy the convenience of being able to shut down the database and start it up again later without losing data. Third use case. Someone on pgsql-general mentioned that they want to write logs to PG, and can abide losing them if a crash happens, but not on a clean shutdown and restart. This person clearly shuts down their production database a lot more often than I do, but that is OK. By explicit stipulation, they want the survive-a-clean-shutdown behavior. I have no problem supporting that use case, providing they are willing to take the associated performance penalty at checkpoint time, which we don't know because we haven't asked, but I'm fine with assuming it's useful even though I probably wouldn't use it much myself. > I also agree with Tom's sentiment above. > > To answer another point I see Tom made on the -general list: while > individual backends may crash from time to time, crashes of the whole > Postgres server are very rare in my experience in production environments. > It's really pretty robust, unless you're doing crazy stuff. So that makes it > all the more important that we can restart a server cleanly (say, to change > a config setting) without losing the unlogged tables. If we don't allow that > we'll make a laughing stock of ourselves. Honestly. Let's please not assume that there is only one reasonable option here, or that I have not thought about some of these issues. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: