Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and |
Date | |
Msg-id | 1135333123.2964.589.camel@localhost.localdomain Whole thread Raw |
In response to | Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and (Stephen Frost <sfrost@snowman.net>) |
List | pgsql-hackers |
On Thu, 2005-12-22 at 17:36 -0500, Stephen Frost wrote: > * Simon Riggs (simon@2ndquadrant.com) wrote: > > On Thu, 2005-12-22 at 21:18 +0100, Martijn van Oosterhout wrote: > > > Considering "WAL bypass" is code for "breaks PITR" > > > > No it isn't. All of the WAL bypass logic does *not* operate when PITR is > > active. The WAL bypass logic is aimed at Data Warehouses, which > > typically never operate in PITR mode for performance reasons, however > > the choice is yours. OK, thanks for saying all of that; you probably speak for many in raising these concerns. I'll answer each bit as we come to it. Suffice to say, your concerns are good and so are the answers: > Eh? PITR mode is bad for performance? Maybe I missed something but I > wouldn't have thought PITR would degrade regular performance all that > badly. PITR mode is *not* bad for performance. On a very heavily loaded write-intensive test system, the general PITR overhead on regular performance was around 1% - so almost negligible. We have been discussing a number of optimizations to specific commands that would allow them to avoid writing WAL and thus speed up their performance. If archive_command is set then WAL will always be written; if it is not set then these commands will (or could) go faster: - CREATE TABLE AS SELECT (in 8.1) - COPY LOCK (patch submitted) - COPY in same transaction as CREATE TABLE (patch submitted) - INSERT SELECT in same transaction as CREATE TABLE (this discussion) (There are a number of other conditions also, such as there must be no indexes on a table. All of which now documented with the patch) > So long as it doesn't take 15 minutes or some such to move the > WAL to somewhere else (and I'm not sure that'd even slow things down..). > For a Data Warehouse, have you got a better way of doing backups such > that you don't lose at minimum most of a day's work? Yes. Don't just use the backup facilities on their own. Think about how the architecture of your systems will work and see if there is a better way when you look at very large systems. > I'm not exactly a > big fan do doing a pg_dump every night either given that the database is > 360GB. Much nicer to take a weekly dump of the database and then do > PITR for a week or two before taking another dump of the db. e.g. Keep your reference data (low volume) in an Operational Data Store (ODS) database, protected by archiving. Keep your main fact data (high volume) in the Data Warehouse, but save the data in slices as you load it, so that a recovery is simply a reload of the database: no PITR or pg_dump required, so high performance data transformation and load work is possible. This is a commonly used architectural design pattern. > I like the idea of making COPY go faster, but please don't break my > backup system while you're at it. On a personal note, I would only add that I spent a long time working on PITR and I would never design anything that would intentionally break it (nor would patches be accepted that did that). That probably gives me the confidence to approach designs that might look like I'm doing that, but without actually straying over the edge. > I'm honestly kind of nervous about > what you mean by checking it PITR is active- how is that done, exactly? > Check if you have a script set to rotate the logs elsewhere? Or is it > checking if you're in the taking-a-full-database-backup stage? Or what? Internally, we use XLogArchivingActive(). Externally this will be set when the admin sets archive_command to a particular value. My original preference was for a parameter called archive_mode= ON | OFF which would allow us to more easily discuss this, but this does not currently exist. > What's the performance decrease when using PITR, and what's it from? Is > it just that COPY isn't as fast? Honestly, I could live with COPY being > not as fast as it could be if my backups work. :) These commands will not be optimized for speed when archive_command is set: - CREATE TABLE AS SELECT (in 8.1) - COPY LOCK (patch submitted) > Sorry for sounding concerned but, well, backups are very important and > so is performance and I'm afraid either I've not read all the > documentation about the issues being discussed here or there isn't > enough out there to make sense of it all yet. :) If you choose PITR, then you are safe. If you do not, the crash recovery of the database is not endangered by these optimizations. Hope that covers all of your concerns? I'm just writing a course that explains many of these techniques, available in the New Year. Best Regards, Simon Riggs
pgsql-hackers by date: