Re: [HACKERS] redolog - for discussion - Mailing list pgsql-hackers
From | jwieck@debis.com (Jan Wieck) |
---|---|
Subject | Re: [HACKERS] redolog - for discussion |
Date | |
Msg-id | m0zqNDo-000EBTC@orion.SAPserv.Hamburg.dsh.de |
In response to | Re: [HACKERS] redolog - for discussion (Vadim Mikheev <vadim@krs.ru>) |
List | pgsql-hackers |
Vadim wrote:
>
> Jan Wieck wrote:
> >
> > RECOVER DATABASE {ALL | UNTIL 'datetime' | RESET};
> >
> ...
> >
> > For the others, the backend starts the recovery program
> > which reads the redolog files, establishes database
> > connections as required and reruns all the commands in
>                                ^^^^^^^^^^^^^^^^^^^^^^^^^^
> > them. If a required logfile isn't found, it tells the
>   ^^^^^
>
> I foresee problems with using _commands_ logging for
> recovery/replication -:((
>
> Let's consider two concurrent updates in READ COMMITTED mode:
>
> update test set x = 2 where y = 1;
>
> and
>
> update test set x = 3 where y = 1;
>
> The result of both committed transactions will be x = 2
> if the 1st transaction updated the row _after_ the 2nd transaction,
> and x = 3 if the 2nd transaction gets the row after the 1st one.
> The order of updates is not defined by the order in which commands
> began, and so the order in which commands should be rerun
> will be unknown...

Yep, the order in which commands began is absolutely not of interest. Locking can delay the execution of one command until another one, started later, has finished and released the lock. It's a classic race condition.

Thus, my plan was to log the queries just before the call to CommitTransactionCommand() in tcop, so the log reflects the order in which they commit rather than the order in which they started. This has the advantage that queries which bail out with errors never get into the log and need not be rerun. And I can set a static flag to false before starting the command, which the buffer manager sets to true when a buffer is written (marked dirty), so filtering out queries that do no updates at all is easy.

Unfortunately, query-level logging gets hit by the current implementation of sequence numbers. If a query that gets aborted somewhere in the middle (maybe by a trigger) called nextval() for rows processed earlier, the sequence isn't advanced at recovery time, because the query is suppressed entirely.
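The commit-time logging plan above, and the way an aborted nextval() caller breaks it, can be illustrated with a toy model. This is a hypothetical Python sketch, not PostgreSQL backend code: tables are dicts, a statement is a function, `run()` stands in for logging just before CommitTransactionCommand(), and `Database.dirty` stands in for the buffer-manager flag. All names here are invented for illustration.

```python
# Toy model of query-level redo logging (hypothetical; not PostgreSQL code).
# Tables are dicts, a "statement" is a function, and the log keeps
# statements in the order they commit.


class Sequence:
    """Non-transactional counter: an abort does not give numbers back."""

    def __init__(self):
        self.last_value = 0

    def nextval(self):
        self.last_value += 1
        return self.last_value


class Database:
    def __init__(self):
        self.test = {1: 0}   # table "test", keyed by y, value is x
        self.items = {}      # table "items", keyed by a sequence-generated id
        self.seq = Sequence()
        self.dirty = False   # stands in for the buffer manager's "marked dirty" flag

    def write(self, table, key, value):
        getattr(self, table)[key] = value
        self.dirty = True


def run(db, log, stmt):
    """Run stmt and log it at commit time, but only if it succeeded and wrote."""
    saved = (dict(db.test), dict(db.items))
    db.dirty = False                 # cleared before the command starts
    try:
        stmt(db)
    except RuntimeError:
        db.test, db.items = saved    # table changes are rolled back...
        return False                 # ...the sequence is not, and nothing is logged
    if db.dirty:
        log.append(stmt)             # read-only statements are filtered out here
    return True


def replay(log):
    """Rerun logged statements, in log order, against a fresh database."""
    db = Database()
    for stmt in log:
        stmt(db)
    return db


def update_a(db):                    # update test set x = 2 where y = 1;
    db.write("test", 1, 2)


def update_b(db):                    # update test set x = 3 where y = 1;
    db.write("test", 1, 3)


def select_all(db):                  # read-only: never marks anything dirty
    return dict(db.test)


def insert_then_abort(db):           # a trigger aborts it after the first row
    db.write("items", db.seq.nextval(), "first")
    raise RuntimeError("aborted by trigger")


def insert_ok(db):
    db.write("items", db.seq.nextval(), "second")


live, log = Database(), []
# update_a started first but waited on a lock held by update_b, so update_b commits first.
for stmt in (update_b, update_a, select_all, insert_then_abort, insert_ok):
    run(live, log, stmt)

redo = replay(log)
print(live.test, redo.test)                # {1: 2} {1: 2}
print(replay([update_a, update_b]).test)   # {1: 3}  (start order gives the wrong x)
print(live.items, redo.items)              # {2: 'second'} {1: 'second'}
```

Replaying in commit order reproduces x = 2, while rerunning in start order would not. The aborted insert, however, consumed sequence value 1 during normal operation and never reached the log, so the replayed row gets id 1 instead of 2.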
And sequences aren't locked, so when concurrently running queries get numbers from the same sequence, the results aren't reproducible. If some application selects a value produced by a sequence and later uses it in another query, how could the redolog know that this value has changed? It's a Const in the logged query, and that corrupts the whole thing.

All that is painful, and I don't yet see any solution other than hooking into nextval(): log the numbers generated during normal operation and hand back the same numbers in redo mode.

The whole thing gets more and more complicated :-(

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #