Thread: RE: [HACKERS] RE: [GENERAL] Transaction logging

RE: [HACKERS] RE: [GENERAL] Transaction logging

From
Michael Davis
Date:
I like it.

    -----Original Message-----
    From:    Jackson, DeJuan [SMTP:djackson@cpsgroup.com]
    Sent:    Friday, March 05, 1999 12:02 PM
    To:    jwieck@debis.com; Michael Davis
    Cc:    pgsql-general@postgreSQL.org; pgsql-hackers@postgreSQL.org
    Subject:    RE: [HACKERS] RE: [GENERAL] Transaction logging

    Wouldn't it be possible to have the commit take care of the logging.
I
    don't know enough about the backend, but all those functions have to
get
    evaluated at some point before the commit in order to be written to
the
    db.  You could actually have the commit log the needed sql-commands
    without any function interference.  Also doesn't commit know if any
rows
    have been updated by any of the transaction's statements.  This
    could/would require filtering rules and triggers from the output, or
    turning them off on the restore.
    As far as the shared-tables/catalogues are concerned.  We could have
    each full-database dump take care of it's entries into the shared
    tables.  Having the transaction logs be incremental from the point
of
    the dump allows the catalogues to be self updating (assuming we are
    logging vacuums).

    All this depends on getting a database commit to output it's
information
    easily.  It also ignores MVCC's effect on transactions, but I think
it's
    a workable model.

    You can tell me I'm blowing smoke now.
        -DEJ

    > -----Original Message-----
    > From: jwieck@debis.com [mailto:jwieck@debis.com]
    > Sent: Friday, March 05, 1999 12:24 PM
    > To: michael.davis@prevuenet.com
    > Cc: pgsql-general@postgreSQL.org; pgsql-hackers@postgreSQL.org
    > Subject: Re: [HACKERS] RE: [GENERAL] Transaction logging
    >
    >
    > Michael Davis wrote:
    >
    > >
    > > What would it take to have transaction logging added to
    > Postgres.  I am a
    > > c/c++ programmer and will consider contributing to the
    > Postgres development
    > > effort.  I really like everything I see and read about
    > Postgres.  As a
    >
    >     I spent some time on transaction logging since it's a feature
    >     I'm missing too. There are mainly two  different  transaction
    >     log mechanisms out.
    >
    >     1.  Log queries sent to the backend.
    >
    >     2.  Log  images  of  inserted/updated  rows  and  row ID's of
    >         deleted ones.
    >
    >     The query  level  logging  will  write  less  information  if
    >     queries usually affect a large number of rows.  Unfortunately
    >     the extensibility of Postgres work's against  this  approach.
    >     There  could  be  any  number of user written functions who's
    >     results  aren't  reproduceable  during  recovery.  And   core
    >     features of Postgres itself would introduce the same problem.
    >     Have a sequence which is used to create  default  values  for
    >     multiple  tables,  so  that one ID is unique across them. Now
    >     two backends insert (with  INSERT  ...  SELECT)  concurrently
    >     into  different  tables  using  the  same  sequence.   It's a
    >     classic race condition and it depends  on  context  switching
    >     and  page  faults  which  backend  will  get  which  sequence
    >     numbers. You cannot foresee and you cannot reproduce,  except
    >     you  hook into the sequence generator and log this too. Later
    >     when recovering, another hook  into  the  sequence  generator
    >     must    reproduce    the    logged   results   on   the   per
    >     backend/transaction/command base, and the same must  be  done
    >     for   each  function  that  usually  returns  unreproduceable
    >     results (anything dealing with time, pid's, etc.).
    >
    >     As said, this must also cover user  functions.  So  at  least
    >     there  must  be  a  general  log  API  that  provides  such a
    >     functionality for user written functions.
    >
    >     The image logging approach also has problems. First, the only
    >     thing  given to the heap access methods to outdate a tuple on
    >     update/delete is the current tuple ID (information that tells
    >     which  tuple  in  which block is meant).  So you need to save
    >     the database files  in  binary  format,  because  during  the
    >     actually  existing  dump/restore  this  could  change and the
    >     logged CTID's would hit the wrong tuples.
    >
    >     Second, you must remember in the  log  which  transaction  ID
    >     these  informations  came  from  and later if the transaction
    >     committed or not, so the recovery can set  this  commit/abort
    >     information in pg_log too. pg_log is a shared system file and
    >     the transaction ID's are unique only for one  server.   Using
    >     this  information for online replication of a single database
    >     to another Postgres installation will not work.
    >
    >     Third, there are still some shared system catalogs across all
    >     databases  (pg_database,  pg_group,  pg_log!!!, pg_shadow and
    >     pg_variable). Due to that it would be impossible (or at least
    >     very,  very  tricky) to restore/recover (maybe point in time)
    >     one single database. If you destroy one database and  restore
    >     it  from  the  binary backup, these shared catalogs cannot be
    >     restored too, so they're out of sync with  the  backup  time.
    >     How  should  the  recovery  now  hit  the right things (which
    >     probably must not be there at all)?.
    >
    >     All this is really  a  mess.  I  think  the  architecture  of
    >     Postgres  will  only allow something on query level with some
    >     general API for things that must reproduce  the  same  result
    >     during  recovery.  For  example  time().  Inside the backend,
    >     time() should  never  be  called  directly.  Instead  another
    >     function  is  to be called that log's during normal operation
    >     which time get's returned by this  particular  function  call
    >     and  if  the  backend  is in recovery mode, returns the value
    >     from the log.
    >
    >     And again, this all means trouble. Usually, most queries sent
    >     to  the  database  don't  change  any  data  because they are
    >     SELECT's. It would dramatically blow up the log amount if you
    >     log ALL queries instead of only those that modify things. But
    >     when the query begins, you don't know this, because a  SELECT
    >     might call a function that uses SPI to UPDATE something else.
    >     So the decision if the query must be logged or not  can  only
    >     be  made  when  the  query  is  done  (by  having some global
    >     variable where the  heap  access  methods  set  a  flag  that
    >     something  got written).  Now you have to log function call's
    >     like time() even if the query will not modify any single  row
    >     in the database because the query is a
    >
    >     SELECT 'now'::datetime - updtime FROM ...
    >
    >     Doing  this on a table with thousands of rows will definitely
    >     waste much logging space and  slowdown  the  whole  thing  by
    >     unnecessary logging.
    >
    >     Maybe  it's  a  compromise  if at each query start the actual
    >     time and other such information is remembered by the backend,
    >     all  time() calls return this remembered value instead of the
    >     real one (wouldn't be bad anyway IMHO), and this  information
    >     is logged only if the query is to be logged.
    >
    >     Finally  I  think  I must have missed some more problems, but
    >     aren't these enough already to frustrate you :-?
    >
    >
    > Jan
    >
    > --
    >
    > #=============================================================
    > =========#
    > # It's easier to get forgiveness for being wrong than for
    > being right. #
    > # Let's break this rule - forgive me.
    >          #
    > #======================================== jwieck@debis.com
    > (Jan Wieck) #
    >
    >
    >