Thread: Need help understanding WAL and checkpoints
Hi there=0A=0AI'm trying to get my head around WAL and checkpoints and need= to ask a couple of questions before I get a headache.=0A=0AFirstly, I see = the terms "WAL log", "WAL file" and "transaction log" all over the place - = are these the same thing (i.e. files in the pg_xlog directory)?=0A=0AI'm a = bit confused by this paragraph in the docs:=0A=0A"Checkpoints are points in= the sequence of transactions at which it is guaranteed that the heap and i= ndex data files have been updated with all information written before the c= heckpoint. At checkpoint time, all dirty data pages are flushed to disk and= a special checkpoint record is written to the log file. (The changes were = previously flushed to the WAL files.)"=0A=0A( see: http://www.postgresql.or= g/docs/current/static/wal-configuration.html )=0A=0A"a special checkpoint r= ecord is written to the log file."=0A=A0=A0=A0 -> which log file is meant h= ere?=0A=0A"The changes were previously flushed to the WAL files."=0A=A0=A0 = -> does "previously" here mean "at a previous point in time" or "in previou= s PostgreSQL versions"?=0A=0A=A0=A0 -> at what point are changes flushed to= WAL files?=0A=0ASo say I perform an operation like :=0A=0A=A0 UPDATE foo S= ET bar=3D'baz'=0A=0Aare the following assumptions correct?=0A=0A- The first= time this changed data hits the disk, it is as an entry in the WAL log=0A= =0A- At some point a checkpoint occurs, and the changed data is written to = the actual data file from system memory (the "dirty data pages"?)=0A=0A- th= e only time the actual data files will be updated from the WAL log (i.e. no= t from system memory) will be after a crash, when the logs are replayed fro= m the last checkpoint?=0A=0Athanks for any help on this!=0A=0A- drew
drew_hunt wrote: > I'm trying to get my head around WAL and checkpoints and need to ask a co= uple of questions before I > get a headache. >=20 > Firstly, I see the terms "WAL log", "WAL file" and "transaction log" all = over the place - are these > the same thing (i.e. files in the pg_xlog directory)? Usually they mean the same thing. For exact definitions, read http://www.postgresql.org/docs/current/static/continuous-archiving.html#BAC= KUP-ARCHIVING-WAL "WAL" consists of a stream of "WAL records" and is physically represented as "WAL segment files" (in pg_xlog). > I'm a bit confused by this paragraph in the docs: >=20 > "Checkpoints are points in the sequence of transactions at which it is gu= aranteed that the heap and > index data files have been updated with all information written before th= e checkpoint. At checkpoint > time, all dirty data pages are flushed to disk and a special checkpoint r= ecord is written to the log > file. (The changes were previously flushed to the WAL files.)" >=20 > ( see: http://www.postgresql.org/docs/current/static/wal-configuration.ht= ml ) >=20 > "a special checkpoint record is written to the log file." > =A0=A0=A0 -> which log file is meant here? The WAL. > "The changes were previously flushed to the WAL files." > =A0=A0 -> does "previously" here mean "at a previous point in time" or "i= n previous PostgreSQL versions"? The former. > =A0=A0 -> at what point are changes flushed to WAL files? The change must be on disk in a WAL segment before the transaction can commit. > So say I perform an operation like : >=20 > =A0 UPDATE foo SET bar=3D'baz' >=20 > are the following assumptions correct? >=20 > - The first time this changed data hits the disk, it is as an entry in th= e WAL log >=20 > - At some point a checkpoint occurs, and the changed data is written to t= he actual data file from > system memory (the "dirty data pages"?) >=20 > - the only time the actual data files will be updated from the WAL log (i= .e. not from system memory) > will be after a crash, when the logs are replayed from the last checkpoin= t? All three are correct. Yours, Laurenz Albe
Hi Albe=0A=0A--- On Wed, 2013/2/6, Albe Laurenz <laurenz.albe@wien.gv.at> w= rote:=0A=0A> drew_hunt wrote:=0A> > I'm trying to get my head around WAL an= d checkpoints and need to ask a couple of questions before I=0A> > get a he= adache.=0A> > =0A> > Firstly, I see the terms "WAL log", "WAL file" and "tr= ansaction log" all over the place - are these=0A> > the same thing (i.e. fi= les in the pg_xlog directory)?=0A> =0A> Usually they mean the same thing.= =0A> For exact definitions, read=0A> http://www.postgresql.org/docs/current= /static/continuous-archiving.html#BACKUP-ARCHIVING-WAL=0A> "WAL" consists o= f a stream of "WAL records" and is physically=0A> represented as "WAL segme= nt files" (in pg_xlog).=0A> =0A> > I'm a bit confused by this paragraph in = the docs:=0A> > =0A> > "Checkpoints are points in the sequence of transacti= ons at which it is guaranteed that the heap and=0A> > index data files have= been updated with all information written before the checkpoint. At checkp= oint=0A> > time, all dirty data pages are flushed to disk and a special che= ckpoint record is written to the log=0A> > file. (The changes were previous= ly flushed to the WAL files.)"=0A> > =0A> > ( see: http://www.postgresql.or= g/docs/current/static/wal-configuration.html )=0A> > =0A> > "a special chec= kpoint record is written to the log file."=0A> > =A0=A0=A0 -> which log fil= e is meant here?=0A> =0A> The WAL.=0A> =0A> > "The changes were previously = flushed to the WAL files."=0A> > =A0=A0 -> does "previously" here mean "at = a previous point in time" or "in previous PostgreSQL versions"?=0A> =0A> Th= e former.=0A> =0A> > =A0=A0 -> at what point are changes flushed to WAL fil= es?=0A> =0A> The change must be on disk in a WAL segment before the=0A> tra= nsaction can commit.=0A> =0A> > So say I perform an operation like :=0A> > = =0A> > =A0 UPDATE foo SET bar=3D'baz'=0A> > =0A> > are the following assump= tions correct?=0A> > =0A> > - The first time this changed data hits the dis= k, it is as an entry in the WAL log=0A> > =0A> > - At some point a checkpoi= nt occurs, and the changed data is written to the actual data file from=0A>= > system memory (the "dirty data pages"?)=0A> > =0A> > - the only time the= actual data files will be updated from the WAL log (i.e. not from system m= emory)=0A> > will be after a crash, when the logs are replayed from the las= t checkpoint?=0A> =0A> All three are correct.=0A=0AMany thanks for the answ= ers, its cleared things up for me :)=0A=0A- drew