Re: PITR Functional Design v2 for 7.5 - Mailing list pgsql-hackers
From | Josh Berkus |
---|---|
Subject | Re: PITR Functional Design v2 for 7.5 |
Date | |
Msg-id | 200403090946.33699.josh@agliodbs.com Whole thread Raw |
In response to | PITR Functional Design v2 for 7.5 ("Simon Riggs" <simon@2ndquadrant.com>) |
Responses |
Re: PITR Functional Design v2 for 7.5
Re: PITR Functional Design v2 for 7.5 Re: PITR Functional Design v2 for 7.5 Re: PITR Functional Design v2 for 7.5 Re: PITR Functional Design v2 for 7.5 |
List | pgsql-hackers |
Simon, First off, let me compliment you on such a thourough proposal. I'm feeling very enthusiastic about 7.5 PITR based on the amount of thought you've given the project. Based on throuroughness, I wanted to make a few comments and suggestions. All of the below are in the category of "extras it would be nice to have but are not essential to implementing PITR." Possibly they are all features to consider for 7.6/8.0/whatever; the next version. These comments are based on my personal experience as a professional contract DBA for PostgreSQL, MS SQL Server and SQL Anywhere. > PITR features are designed to extend the existing Crash Recovery > features so > that a recovery can take place in situations where a crash recovery > would > not have been possible. These situations are: In my personal experience, the *primary* use of PITR is recovery from User Error. For example, with one SQL Server 7.0 installation for a law firm, I've made use of PITR 4 times over the last 4 years: once was because and HDD failed, the other three were all becuase of IT dept. staff running unconstrained UPDATE queries against the back end. For recovery with minimal loss of data, there are existing solutions, such as replication servers, in addition to PITR; for recovery from User Error, only PITR will suffice. > There are a wide range of Backup and Recovery (BAR) products on the > market, both open source and commercially licensed programs that provide > facilities to perform full physical backups and individual file > archives. The best way to foster wide adoption of PostgreSQL is to allow > it to work in conjunction with any of these products. To this end, a Very perceptive of you. Good idea! > wal_archive_policy and enable/disable archiving accordingly. This > parameter can only be changed at server start. (This is required because > the initial step of archiving each xlog is performed by the backend; if > this were changeable after boot, then it might be possible for an > individual backend to override the wal_archive_policy and choose not to > archive - which would then effect the whole system and all users, not > just the user making that choice). It is considered less desirable to Let me voice a real-world exception to this policy. Imagine that you are running an OLAP or decision-support database that analyzes data coming from an external source. Once a day you load 250MB of data via COPY and then does transformations on that data. While doing the load, you do *not* want the archiver running, as it would quickly fill up the WAL partition and backlog the archive tape. Under the proposed PITR spec, the only way to handle this would be to: 1) Full back up 2) Shut down PG 3) Restart PG without archiving 4) Load the data 5) Shut down PG again 6) Restart PG with archiving 7) Full back-up again. DBAs would like it much more if starting/stopping the archiver was possible via a superuser (not regular user) GUC. This would allow a much faster cycle: 1) Full back up 2) Stop archiving 3) Load the data 4) Restart archiving 5) Full back-up Related to the above, what I don't see in your paper or the proposed API is a way to coordinate full back-ups and WAL archiving. Obviously, the PITR Archive is only useful in reference to an existing full backup, so it is important to be able to associate a set of PITR archives with a particular full backup, or with some kind of "backup checkpoint". I'm sure that you have a solution for this, I just didn't see it explained in your proposal, or didn't understand it. FWIW, I find the MSSQL PITR system awkward in the extreme and harrowing in its unreliability. So it's not a good model to copy .... > There is no requirement for the archiver to halt when PostgreSQL shuts > down, though may choose to do so or not, e.g. it may be desirable to > have one archiver operate for multiple postmasters simultaneously. The I see that you've chosen the "One archiver, many databases/clusters" architecture. I can also see how this strategy will be easier than the "many archivers" strategy. Be prepared that, based on the needs of DBAs, you will get the following requests: A) Will it be possible to have the archiver process run on a seperate machine from PostgreSQL and access it over the network, via NFS or some other means? B) Will it be possible to define multiple output streams, so that database X and be archived to device Y and database N to device M? > The out of space condition could therefore occur in two ways: > 1. there is a single delay during which xlog filesystem fills > 2. there could be a systematic delay which builds slowly until the xlog > filesystem fills Given how PITR, and Tablespaces, both substantially increase the risk of running out of space on the xlog partition(s), it would be very nice to be able to arrange a WARNING whenever any PostgreSQL disk resource drops below a pre-defined percentage of availability. This could be done through a simple asynchronous process; heck, I think even I could write it in Perl. Maybe I'll try. The idea would be that all of the dirs defined in Tablespaces and PGData would be checked every X seconds for available space, and when it drops below Y% as reported by the filesystem, a WARNING is issued (X and Y% would be configurable at start time). Given the fluctuation of the xlog size and the size of the data files, this is not 100% reliabile but as a DBA it would be useful to be warned that xlog is more than, say, 80% full at least some of the time. It would let me know that I need to look at re-partitioning sooner rather than later. This asynchronous daemon would be optional, like the statistics daemon, allowing DBAs with large disks and small DBs not to run it. > It is possible that an administrator may wish to choose to keep > PostgreSQL up and to begin dropping log files rather than eventually > crash. If that choice was made AND a full physical backup was not yet > available, then there is a window of risk during which if a catastrophic Frankly, the only reason I can see for keeping the DB up after out-of-space is to allow a full backup to be made. In fact, I would favor a solution that immediately ran a full backup and then shut down the db whenever out-of-space happened. If continued operation with dropping logs proves hard to implement, I'd say skip it. Also, if you allow halting the archiver process on running databases, it would allow the DBA to gracefully extricate themselves from circumstances where rlog or xlog is out of space but the rest of the DB system is not. -- -Josh BerkusAglio Database SolutionsSan Francisco
pgsql-hackers by date: