Thread: RE: [GENERAL] Transaction logging
What would it take to have transaction logging added to Postgres? I am a C/C++ programmer and will consider contributing to the Postgres development effort. I really like everything I see and read about Postgres. As a result, I am strongly considering Postgres as the database engine for my membership database application. My customer is willing to invest in a commercial database, but most of the commercial databases I have briefly looked at fall a little short in one way or another. I have several concerns/needs that I am willing to implement and/or support:

- Outer join support in views
- Transaction logging
- Some form of mirroring, shadowing, or replication
- The current locking mechanism is of some concern. I need to make sure that one user can read a record and then a second can read and update that same record.
- If the first user attempts to update that record, what happens?

I know some of these requests are currently being worked on; it would be helpful to get some idea of when these items are expected to be released.

Thanks, Michael

-----Original Message-----
From: sdupille@i-france.com [SMTP:sdupille@i-france.com]
Sent: Tuesday, February 23, 1999 6:08 AM
To: pgsql-general@postgreSQL.org
Subject: Re: [GENERAL] Transaction logging

Hi!

Peter T Mount <peter@retep.org.uk> writes:

> > Has anyone implemented transaction logging in Postgres? Any suggestions on
> > how to easily implement transaction logging? Storing the log in a text
> > file seems best, but I am not sure how to open and write to a text file
> > from a trigger. It would also be nice to post this transaction log to a
> > backup server.

> Just a quick thought, but how about using syslog? That can be used to post
> queries to a remote server, and it can be told to store the "postgres"
> stuff in a separate file on that server.

> Just an idea...

Why not, but I think it's a bad idea. Syslog is used to log events coming from the system.
It stores every connection to the system, and any event that can affect the system (such as a power shutdown). Transaction logging is a different kind of log: it must record every transaction made against the database, and when records are deleted or data is changed, it must save the old values. So it generates a lot of traffic and is closely tied to the database system. Syslog is not robust enough to deal with that much data, and routing the transaction log through an external process would generate too much traffic (the cost in time would be too high). The logging facility should, I think, be provided by the database itself.

One more thing: the logging facility is used (mainly) to recover the database after a crash. This kind of log _must_ be easy to use in a crash situation. Syslog is fine when you want to know what happened, but not for dealing with the problem. Don't forget that syslog adds metadata to the events sent to it (such as the sender and the date of the message). During recovery by a transaction-logging mechanism, that metadata is noise, which makes recovery (a little bit) harder.

I don't think we could get a proper logging facility using triggers. I think it would be better to hack the Postgres backend and intercept the SQL primitives (like INSERT or UPDATE). It would be a little harder to implement, but faster and totally transparent to the user.

Regards.

--
  ___
 {~._.~}   Stephane - DUST - Dupille
  ( Y )    You were dust and you shall turn into dust
 ()~*~()   email : sdupille@i-france.com
 (_)-(_)
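Stephane's requirement — the database itself keeps a log that records old values for every change, so the log alone is enough to rebuild state after a crash — can be sketched in miniature. Everything below (the `TxnLog` class, `apply`, `recover`) is an illustrative toy model, not anything Postgres actually implements:

```python
# Minimal sketch of a database-managed transaction log: every change
# records the old value before applying the new one. The old values
# would allow undoing aborted work; here we demonstrate redo recovery,
# rebuilding the table purely from the log.

class TxnLog:
    def __init__(self):
        self.records = []  # each record: (op, key, old_value, new_value)

    def log(self, op, key, old, new):
        self.records.append((op, key, old, new))

def apply(db, log, op, key, new=None):
    """Perform a change on the in-memory table, logging the old value first."""
    old = db.get(key)
    log.log(op, key, old, new)
    if op == "delete":
        del db[key]
    else:                       # insert or update
        db[key] = new

def recover(log):
    """Rebuild the table from scratch by replaying the log (redo recovery)."""
    db = {}
    for op, key, old, new in log.records:
        if op == "delete":
            db.pop(key, None)
        else:
            db[key] = new
    return db

log = TxnLog()
db = {}
apply(db, log, "insert", "m1", {"name": "Davis"})
apply(db, log, "update", "m1", {"name": "Michael Davis"})
assert recover(log) == db       # replaying the log reproduces current state
```

Note that, exactly as Stephane says, no syslog-style metadata is mixed into the records: each entry carries only what replay needs.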
> What would it take to have transaction logging added to Postgres. I am a
> c/c++ programmer and will consider contributing to the Postgres development
> effort. I really like everything I see and read about Postgres. As a
> result, I am strongly considering Postgres as the database engine for my
> Membership database application. My customer is willing to invest in a
> commercial database, but most of the commercial databases I have briefly
> looked at fall a little short in one way or another. I have several
> concerns/needs that I am willing to implement and/or support:
>
> - Outer join support in views

In the works. Perhaps for 6.5, but probably not.

> - Transaction logging
>
> - Some form of mirroring, shadowing, or replication
>
> - The current locking mechanism is of some concern. I need to make
>   sure that one user can read a record and then a second can read and
>   update that same record.

MVCC locking in 6.5 will do what you need.

> - If the first user attempts to update that record, what happens?

Hard to explain briefly: the update will wait, or update a copy, while reads use an older copy of the row.

--
Bruce Momjian                   | http://www.op.net/~candle
maillist@candle.pha.pa.us       | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup.    | Drexel Hill, Pennsylvania 19026
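The MVCC behaviour Bruce describes — a reader keeps seeing an older version of the row while a writer creates a new one, and nobody blocks on a plain read — can be sketched with a toy version chain. This is a conceptual model only; the names (`Row`, `snapshot_read`) and the simplified xid/commit bookkeeping are not the actual 6.5 implementation:

```python
# Toy model of MVCC row versioning: each row keeps a chain of versions
# tagged with the transaction id (xid) that created them. A reader's
# snapshot sees only versions created by committed xids at or before
# its snapshot point.

class Row:
    def __init__(self, value, xid):
        self.versions = [(xid, value)]   # (creating xid, value), oldest first

    def update(self, value, xid):
        self.versions.append((xid, value))

def snapshot_read(row, snapshot_xid, committed):
    """Return the newest version from a committed xid <= the snapshot."""
    for xid, value in reversed(row.versions):
        if xid <= snapshot_xid and xid in committed:
            return value
    return None

committed = {1}
row = Row("balance=100", xid=1)

# A reader takes its snapshot at xid 2; transaction 2 then updates the
# row. The reader is not blocked and still sees the old version.
row.update("balance=50", xid=2)
assert snapshot_read(row, snapshot_xid=2, committed=committed) == "balance=100"

committed.add(2)                 # the writer commits
assert snapshot_read(row, snapshot_xid=3, committed=committed) == "balance=50"
```

This also answers Michael's scenario: the second user's update proceeds against a new version, and "what happens" to the first user depends on whether they only read (they keep their snapshot) or also try to write (they must wait).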
Michael Davis wrote:

> What would it take to have transaction logging added to Postgres. I am a
> c/c++ programmer and will consider contributing to the Postgres development
> effort. I really like everything I see and read about Postgres. As a

I spent some time on transaction logging, since it's a feature I'm missing too. There are mainly two different transaction log mechanisms in use:

1. Log the queries sent to the backend.

2. Log images of inserted/updated rows and the row IDs of deleted ones.

Query-level logging writes less information if queries usually affect a large number of rows. Unfortunately, the extensibility of Postgres works against this approach. There could be any number of user-written functions whose results aren't reproducible during recovery. And core features of Postgres itself introduce the same problem. Consider a sequence that is used to create default values for multiple tables, so that one ID is unique across them. Now two backends insert (with INSERT ... SELECT) concurrently into different tables using the same sequence. It's a classic race condition: which backend gets which sequence numbers depends on context switching and page faults. You cannot foresee it and you cannot reproduce it, unless you hook into the sequence generator and log its results too. Later, when recovering, another hook into the sequence generator must reproduce the logged results on a per-backend/transaction/command basis, and the same must be done for every function that returns unreproducible results (anything dealing with time, PIDs, etc.).

As said, this must also cover user functions. So at the least there must be a general log API that provides such functionality for user-written functions.

The image logging approach also has problems. First, the only thing given to the heap access methods to outdate a tuple on update/delete is the current tuple ID (information that tells which tuple in which block is meant).
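Jan's hook idea — record every nondeterministic result (sequence values, time(), PIDs) during normal operation, then have the same call sites consume the recorded values in order during recovery — can be sketched as follows. The `ReplayLog`/`logged_call` names and structure are invented here for illustration; they are not a Postgres API:

```python
# Sketch of a log-and-replay hook for nondeterministic functions.
# In normal mode every call's result is appended to a log; in recovery
# mode the same call sites consume the logged values in order, so a
# replayed query sees exactly the sequence numbers it saw the first time.

import itertools

class ReplayLog:
    def __init__(self):
        self.entries = []      # recorded results, in call order
        self.cursor = 0
        self.recovering = False

    def logged_call(self, fn):
        if self.recovering:
            value = self.entries[self.cursor]   # replay the recorded result
            self.cursor += 1
        else:
            value = fn()                        # real call; record result
            self.entries.append(value)
        return value

seq = itertools.count(100)          # stand-in for a shared sequence
log = ReplayLog()

first_run = [log.logged_call(lambda: next(seq)) for _ in range(3)]

log.recovering = True               # crash recovery: replay, don't re-call
replayed = [log.logged_call(lambda: next(seq)) for _ in range(3)]
assert replayed == first_run == [100, 101, 102]
```

During replay the underlying sequence is never advanced, which is exactly why the race between two backends stops mattering: each backend replays the numbers it actually drew.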
So you need to save the database files in binary format, because during the existing dump/restore procedure this could change, and the logged CTIDs would hit the wrong tuples.

Second, you must remember in the log which transaction ID this information came from, and later whether that transaction committed or not, so that recovery can set the commit/abort information in pg_log too. pg_log is a shared system file, and transaction IDs are unique only within one server. Using this information for online replication of a single database to another Postgres installation will not work.

Third, there are still some system catalogs shared across all databases (pg_database, pg_group, pg_log!!!, pg_shadow and pg_variable). Because of that, it would be impossible (or at least very, very tricky) to restore/recover (maybe point in time) one single database. If you destroy one database and restore it from the binary backup, these shared catalogs cannot be restored along with it, so they're out of sync with the backup time. How should the recovery then hit the right things (which probably must not be there at all)?

All this is really a mess. I think the architecture of Postgres will only allow something at the query level, with some general API for things that must reproduce the same result during recovery. Take time(), for example. Inside the backend, time() should never be called directly. Instead, another function is to be called that, during normal operation, logs which time got returned by this particular call and, if the backend is in recovery mode, returns the value from the log.

And again, all this means trouble. Usually, most queries sent to the database don't change any data, because they are SELECTs.
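The image-logging bookkeeping Jan describes — each log record carries its transaction ID, and recovery may only apply records whose transaction ultimately committed (mirroring the commit/abort bits that must be restored into pg_log) — might look like this in miniature. The record layout and names are hypothetical:

```python
# Sketch of row-image log replay: each record carries the writing xid,
# the tuple ID (ctid), and the row image (None marks a deletion).
# Recovery applies a record only if its transaction is marked committed;
# records from aborted or unknown xids are skipped.

def replay_images(records, commit_status, table=None):
    """records: list of (xid, ctid, image-or-None); image None = delete."""
    table = {} if table is None else table
    for xid, ctid, image in records:
        if commit_status.get(xid) != "committed":
            continue                      # aborted/unknown xid: skip record
        if image is None:
            table.pop(ctid, None)         # logged deletion
        else:
            table[ctid] = image           # logged insert/update image
    return table

records = [
    (10, (0, 1), "row A v1"),
    (10, (0, 2), "row B"),
    (11, (0, 2), "row B'"),      # xid 11 aborted: update must not survive
    (12, (0, 2), None),          # xid 12 committed: row B is deleted
]
status = {10: "committed", 11: "aborted", 12: "committed"}
restored = replay_images(records, status)
assert restored == {(0, 1): "row A v1"}
```

The sketch also makes Jan's caveat visible: the ctids and xids are only meaningful against the same binary files and the same server's pg_log, so this kind of log cannot drive replication to a second installation.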
So the decision whether a query must be logged can only be made when the query is done (by having some global variable where the heap access methods set a flag that something got written). But then you have to log function calls like time() even if the query will not modify a single row in the database, because the query is a

    SELECT 'now'::datetime - updtime FROM ...

Doing this on a table with thousands of rows will definitely waste a lot of logging space and slow the whole thing down with unnecessary logging.

Maybe it's a reasonable compromise if, at each query start, the actual time and other such information is remembered by the backend, all time() calls return this remembered value instead of the real one (which wouldn't be bad anyway, IMHO), and this information is logged only if the query turns out to need logging.

Finally, I think I must have missed some more problems, but aren't these enough already to frustrate you :-?

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
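Jan's compromise — snapshot time() once per query, have every time() call return that snapshot, and write the query plus its snapshot to the log only if the heap access methods raised a "wrote something" flag — can be sketched like this. All names (`QueryLogger`, `mark_write`) are illustrative, not Postgres internals:

```python
# Sketch of deferred query logging: the backend remembers one timestamp
# per query, time() calls return that snapshot, and the query is written
# to the log only if a write flag was raised during execution.

import time

class QueryLogger:
    def __init__(self):
        self.log = []               # (query, snapshot_time) for replay
        self.snapshot = None
        self.dirty = False          # set by heap access methods on write

    def begin(self):
        self.snapshot = time.time() # remembered once, at query start
        self.dirty = False

    def now(self):
        return self.snapshot        # every time() call sees the same value

    def mark_write(self):
        self.dirty = True           # a tuple was inserted/updated/deleted

    def end(self, query):
        if self.dirty:              # queries that wrote nothing are skipped
            self.log.append((query, self.snapshot))

ql = QueryLogger()

ql.begin()                          # pure SELECT: many now() calls, no write
_ = [ql.now() for _ in range(1000)]
ql.end("SELECT 'now'::datetime - updtime FROM member")
assert ql.log == []                 # nothing logged, no wasted space

ql.begin()                          # an UPDATE raises the dirty flag
ql.mark_write()
ql.end("UPDATE member SET updtime = 'now'")
assert len(ql.log) == 1             # one compact record: query + snapshot
```

The thousand now() calls in the SELECT cost nothing in log space, which is exactly the saving Jan is after; the caveat remains that a SELECT calling a function that writes via SPI would (correctly) set the flag and be logged.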
Added to TODO:

* Transaction log, so re-do log can be on a separate disk, by logging SQL queries or before/after row images

> Michael Davis wrote:
> >
> > What would it take to have transaction logging added to Postgres. I am a
> > c/c++ programmer and will consider contributing to the Postgres development
> > effort. I really like everything I see and read about Postgres. As a
>
> I spent some time on transaction logging since it's a feature
> I'm missing too. There are mainly two different transaction
> log mechanisms out.
> [...]

--
Bruce Momjian                   | http://www.op.net/~candle
maillist@candle.pha.pa.us       | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup.    | Drexel Hill, Pennsylvania 19026