DB logging (was: Problem with the numbers I reported yesterday) - Mailing list pgsql-hackers

From jwieck@debis.com (Jan Wieck)
Subject DB logging (was: Problem with the numbers I reported yesterday)
Msg-id m0y3i6Y-000BFRC@orion.SAPserv.Hamburg.dsh.de
In response to Re: [HACKERS] Problem with the numbers I reported yesterday  ("Kent S. Gordon" <kgor@inetspace.com>)
Responses Re: DB logging (was: Problem with the numbers I reported yesterday)  ("Kent S. Gordon" <kgor@inetspace.com>)
List pgsql-hackers
Kent wrote:
>
> I do not think that pg_log is used like a normal 'log' device in other
> databases.  My quick look at the code looks like pg_log only has a
> list of transactions and not the actual data blocks.  Notice that
> TransRecover is commented out in backend/access/transam/transam.c.
>
> Most database logs have the before images and after images of
> any page that has been modified in a transaction, followed by a
> commit/abort record.  This means only this file has to be
> synced.  The rest of the database can float (generally
> checkpoints are done every so often to reduce recovery time).
> The method of recovering from a crash is to replay the log from
> the last checkpoint until the end of the log, applying the
> before/after images (as needed, based on whether the
> transaction committed) to the actual database relations.
>
> I would appreciate anyone correcting any mistakes in my understanding
> of how postgres works.
>
>     > Ocie Mitchell
>
> Kent S. Gordon
> Architect
> iNetSpace Co.
> voice: (972)851-3494 fax:(972)702-0384 e-mail:kgor@inetspace.com
>
>

    Totally right, PostgreSQL doesn't have a log mechanism that
    collects all the information to recover a corrupted database
    from a backup.
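    The replay scheme Kent describes can be sketched as a toy C
    model (all names are invented and a "page image" is just an
    int here; this is nothing like PostgreSQL's actual on-disk
    format):

```c
#include <stdbool.h>

#define NPAGES 4

typedef struct {
    int xid;     /* transaction that made the change */
    int page;    /* which page was modified */
    int before;  /* page image before the change */
    int after;   /* page image after the change */
} LogRec;

/*
 * Crash recovery: scan the log forward, applying after-images of
 * committed transactions (redo), then scan it backward restoring
 * before-images of uncommitted ones (undo).
 */
static void
replay(int pages[NPAGES], const LogRec *log, int nrec,
       const bool *committed)
{
    for (int i = 0; i < nrec; i++)
        if (committed[log[i].xid])
            pages[log[i].page] = log[i].after;

    for (int i = nrec - 1; i >= 0; i--)
        if (!committed[log[i].xid])
            pages[log[i].page] = log[i].before;
}
```

    The point of the scheme is that after a crash only the synced
    log decides what the pages must contain; the database files
    themselves may hold any mix of old and new images.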

    I hacked around on that a little bit.

    When doing complete after-image logging (that is, logging all
    the tuples stored on insert/update and the tuple ids of
    deletes, plus the information about which transaction ids
    commit), the regression tests produce more log data than the
    size of the final regression database.  The performance
    increase when syncing only the log and control files (2
    control files on different devices and the logfile on a
    device separate from the database files) and running the
    backends with -F is about 15-20% for the regression test.
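    A back-of-the-envelope model shows why this kind of log
    outgrows the database (the record kinds follow the list
    above, but the layout and all sizes are invented for
    illustration):

```c
#include <stddef.h>

/* Hypothetical record kinds for an after-image log: whole tuples
 * on insert/update, just a tuple id on delete, plus a commit
 * record per transaction.  Sizes are made-up round numbers. */
typedef enum { REC_INSERT, REC_UPDATE, REC_DELETE, REC_COMMIT } RecKind;

enum { TUPLE_SIZE = 100, TID_SIZE = 6, COMMIT_SIZE = 8 };

/* bytes one record adds to the log */
static size_t
rec_size(RecKind kind)
{
    switch (kind)
    {
        case REC_INSERT:
        case REC_UPDATE: return TUPLE_SIZE;  /* full after-image */
        case REC_DELETE: return TID_SIZE;
        case REC_COMMIT: return COMMIT_SIZE;
    }
    return 0;
}
```

    A row inserted once and updated ten times logs eleven full
    tuple images while the table ends up holding only the live
    one, so an update-heavy run like the regression test can
    easily log more bytes than the final database contains.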

    I thought this was far too much logging data, so I didn't
    spend much time trying to implement recovery.  But as far as
    I got, I can tell that the updates to system catalogs and
    keeping the indices up to date will be really tricky.

    Another possible log mechanism I'll try sometime after the
    v6.3 release is to log the queries and the data from copy
    commands, along with information about Oid and Tid
    allocations.
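    The reason such a query log has to record allocations might
    be sketched like this (invented names and structures, only to
    show the idea): re-executing the SQL text yields the same
    database only if non-deterministic values such as assigned
    Oids are restored before each replayed query.

```c
/* Toy model of query logging: each entry carries the SQL text
 * plus the Oid counter value at execution time, so replay can
 * restore the counter and hand out identical Oids. */
typedef struct {
    const char *query;
    unsigned    next_oid;   /* counter value when the query ran */
} QueryLogRec;

static unsigned next_oid = 1000;

static unsigned
alloc_oid(void)
{
    return next_oid++;
}

/* replay: restore allocation state, then re-execute the query */
static unsigned
replay_query(const QueryLogRec *rec)
{
    next_oid = rec->next_oid;   /* make Oid allocation deterministic */
    /* ... re-run rec->query here ... */
    return alloc_oid();         /* oid the replayed row receives */
}
```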


Until later, Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
