Thread: RE: [GENERAL] Transaction logging

RE: [GENERAL] Transaction logging

From: Michael Davis
What would it take to have transaction logging added to Postgres?  I am a
C/C++ programmer and will consider contributing to the Postgres development
effort.  I really like everything I see and read about Postgres.  As a
result, I am strongly considering Postgres as the database engine for my
Membership database application.  My customer is willing to invest in a
commercial database, but most of the commercial databases I have briefly
looked at fall short in one way or another.  I have several concerns/needs
that I am willing to implement and/or support:

    - Outer join support in views

    - Transaction logging

    - Some form of mirroring, shadowing, or replication

    - The current locking mechanism is of some concern.  I need to make
sure that one user can read a record and then a second can read and update
that same record.

    - If the first user attempts to update that record, what happens?

I know some of these requests are already being worked on; it would be
helpful to get some idea of when these items are expected to be released.

Thanks, Michael

    -----Original Message-----
    From:    sdupille@i-france.com [SMTP:sdupille@i-france.com]
    Sent:    Tuesday, February 23, 1999 6:08 AM
    To:    pgsql-general@postgreSQL.org
    Subject:    Re: [GENERAL] Transaction logging


        Hi !

    Peter T Mount <peter@retep.org.uk> writes:
    > > Has anyone implemented transaction logging in Postgres?  Any
    > > suggestions on how to easily implement it?  Storing the log in a
    > > text file seems best, but I am not sure how to open and write to
    > > a text file from a trigger.  It would also be nice to post this
    > > transaction log against a backup server.

    > Just a quick thought, but how about using syslog? That can be used
    > to post queries to a remote server, and it can be told to store the
    > "postgres" stuff to a separate file on that server.

    > Just an idea...

        Why not, but I think it's a bad idea. Syslog is used to log
    events coming from the system: it stores every connection to the
    system, and any event which can affect it (such as a power
    shutdown).

        Transaction logging is a different kind of log: it must
    record every transaction made against the database, and in the
    case of deleted records, or changes to data, it must also save the
    old values. So it generates a lot of traffic, and is closely
    dependent on the database system.

        Syslog is not robust enough to deal with that much data, and
    using an external process for transaction logging would generate
    too much overhead (the cost in time would be too high). The
    logging facility should, I think, be provided by the database
    itself.

        One more thing: the logging facility is used (mainly) to
    recover the database after a crash. This kind of log _must_ be
    easy to use in a crash situation. Syslog is fine when you want to
    know what happened, but not for actually repairing the problem.
    Don't forget that syslog adds some data to the events we send it
    (such as the sender and the date of the message). For a recovery
    driven by the transaction log, that data is noise, which makes the
    recovery (a little bit) harder.

        I don't think we could build a logging facility with
    triggers. I think it would be better to hack the Postgres backend
    and intercept the SQL primitives (like INSERT or UPDATE). It
    would be a little harder to implement, but faster and totally
    transparent to the user.
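    The idea of intercepting the write primitives rather than using
    triggers can be sketched in miniature (Python purely for
    illustration; `LoggedTable` and its method names are invented, not
    Postgres code): every insert/update/delete appends a log record,
    including old values, before touching the data, and callers never
    see the logging. Recovery then replays the log alone.

    ```python
    class LoggedTable:
        """Toy sketch: wrap the write primitives so every change is
        appended to a log before it is applied -- transparent to callers."""

        def __init__(self, log):
            self.rows = {}      # row_id -> row data
            self.log = log      # append-only list standing in for a log file

        def insert(self, row_id, row):
            self.log.append({"op": "insert", "id": row_id, "new": row})
            self.rows[row_id] = row

        def update(self, row_id, row):
            # save the old value too, so changes can be undone
            self.log.append({"op": "update", "id": row_id,
                             "old": self.rows[row_id], "new": row})
            self.rows[row_id] = row

        def delete(self, row_id):
            self.log.append({"op": "delete", "id": row_id,
                             "old": self.rows.pop(row_id)})

    def replay(log):
        """Rebuild table state from the log alone (crash recovery)."""
        rows = {}
        for rec in log:
            if rec["op"] in ("insert", "update"):
                rows[rec["id"]] = rec["new"]
            else:
                del rows[rec["id"]]
        return rows
    ```

    After any sequence of writes, `replay(log)` reproduces the table's
    current state from nothing but the log.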

        regards.

    --
      ___
    {~._.~}                    Stephane - DUST - Dupille
     ( Y )            You were dust and you shall turn into dust
    ()~*~()                  email : sdupille@i-france.com
    (_)-(_)

Re: [GENERAL] Transaction logging

From: Bruce Momjian
> What would it take to have transaction logging added to Postgres.  I am a
> c/c++ programmer and will consider contributing to the Postgres development
> effort.  I really like everything I see and read about Postgres.  As a
> result, I am strongly considering Postgres as the database engine for my
> Membership database application.  My customer is willing to invest in a
> commercial database, but most of the commercial databases I have briefly
> looked at fall a little short in one way or another.  I have several
> concerns/needs that I am willing to implement and/or support:
>
>     - Outer join support in views

In the works.  Perhaps for 6.5, probably not.

>
>     - Transaction logging
>
>     - Some form of mirroring, shadowing, or replication
>
>     - The current locking mechanism is of some concern.  I need to make
> sure that one user can read a record and then a second can read and update
> that same record.

MVCC locking in 6.5.  Will do what you need.
>
>     - If the first user attempts to update that record, what happens?

Hard to explain.  The second writer will either wait or update a copy,
while reads use an older copy of the row.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [HACKERS] RE: [GENERAL] Transaction logging

From: jwieck@debis.com (Jan Wieck)
Michael Davis wrote:

>
> What would it take to have transaction logging added to Postgres.  I am a
> c/c++ programmer and will consider contributing to the Postgres development
> effort.  I really like everything I see and read about Postgres.  As a

    I spent some time on transaction logging since it's a feature
    I'm missing too. There are mainly two different transaction
    log mechanisms in use:

    1.  Log queries sent to the backend.

    2.  Log images of inserted/updated rows and the row IDs of
        deleted ones.

    Query-level logging will write less information if queries
    usually affect a large number of rows. Unfortunately the
    extensibility of Postgres works against this approach. There
    could be any number of user-written functions whose results
    aren't reproducible during recovery. And core features of
    Postgres itself introduce the same problem. Take a sequence
    which is used to create default values for multiple tables,
    so that one ID is unique across them. Now two backends insert
    (with INSERT ... SELECT) concurrently into different tables
    using the same sequence. It's a classic race condition, and
    it depends on context switching and page faults which backend
    will get which sequence numbers. You cannot foresee it and you
    cannot reproduce it, unless you hook into the sequence
    generator and log that too. Later, when recovering, another
    hook into the sequence generator must reproduce the logged
    results on a per-backend/transaction/command basis, and the
    same must be done for each function that returns otherwise
    unreproducible results (anything dealing with time, PIDs,
    etc.).

    As said, this must also cover user functions. So at the very
    least there must be a general log API that provides such
    functionality for user-written functions.
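    The log-and-replay hook for the sequence generator can be
    illustrated with a toy sketch (Python for illustration only;
    `ReplayableSequence` and its interface are invented): during normal
    operation every value handed out is logged together with the
    backend that received it, and during recovery the same calls
    return the logged values per backend, regardless of how the
    backends happen to be scheduled this time.

    ```python
    import itertools

    class ReplayableSequence:
        """Sketch of a sequence generator with a recovery hook."""

        def __init__(self, log=None):
            self.counter = itertools.count(1)
            self.recovering = log is not None
            self.log = log if log is not None else []
            self.replay_pos = {}    # backend_id -> index into that backend's entries

        def nextval(self, backend_id):
            if self.recovering:
                # return the values this backend got originally, in order
                i = self.replay_pos.get(backend_id, 0)
                entries = [v for b, v in self.log if b == backend_id]
                self.replay_pos[backend_id] = i + 1
                return entries[i]
            value = next(self.counter)
            self.log.append((backend_id, value))  # record who got which number
            return value
    ```

    The point of the per-backend bookkeeping is that recovery may
    interleave the backends differently than the original run did, yet
    each backend must still see exactly the numbers it saw before.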

    The image-logging approach also has problems. First, the only
    thing given to the heap access methods to outdate a tuple on
    update/delete is the current tuple ID (information that tells
    which tuple in which block is meant). So you need to save the
    database files in binary format, because with the existing
    dump/restore this could change, and the logged CTIDs would hit
    the wrong tuples.

    Second, you must remember in the log which transaction ID
    this information came from, and later whether the transaction
    committed or not, so recovery can set the commit/abort
    information in pg_log too. pg_log is a shared system file,
    and transaction IDs are unique only within one server. Using
    this information for online replication of a single database
    to another Postgres installation will not work.

    Third, there are still some shared system catalogs across all
    databases (pg_database, pg_group, pg_log!!!, pg_shadow and
    pg_variable). Due to that it would be impossible (or at least
    very, very tricky) to restore/recover (maybe point in time)
    one single database. If you destroy one database and restore
    it from the binary backup, these shared catalogs cannot be
    restored with it, so they're out of sync with the backup time.
    How should the recovery then hit the right things (which
    probably must not be there at all)?

    All this is really a mess. I think the architecture of
    Postgres will only allow something at the query level, with
    some general API for things that must reproduce the same
    result during recovery. Take time(), for example. Inside the
    backend, time() should never be called directly. Instead,
    another function is to be called that during normal operation
    logs which time was returned by each particular call, and
    that in recovery mode returns the value from the log.
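    That wrapper rule can be sketched in a few lines (Python for
    illustration; `TimeSource` and `current_time` are invented names):
    in normal mode each call's result is appended to the log, and in
    recovery mode calls consume the logged values instead of asking
    the real clock.

    ```python
    import time

    class TimeSource:
        """Sketch of the 'never call time() directly' rule: log results
        during normal operation, replay them during recovery."""

        def __init__(self, log=None):
            self.recovery = log is not None
            self.log = log if log is not None else []
            self.pos = 0

        def current_time(self):
            if self.recovery:
                value = self.log[self.pos]   # replay the logged result
                self.pos += 1
                return value
            value = time.time()
            self.log.append(value)           # remember what we returned
            return value
    ```

    A backend re-running a query in recovery mode then sees exactly
    the timestamps the original run saw, call by call.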

    And again, all this means trouble. Usually, most queries sent
    to the database don't change any data, because they are
    SELECTs. It would dramatically blow up the log volume if you
    logged ALL queries instead of only those that modify things.
    But when the query begins you don't know this, because a
    SELECT might call a function that uses SPI to UPDATE something
    else. So the decision whether the query must be logged can
    only be made when the query is done (by having some global
    variable where the heap access methods set a flag that
    something got written). Now you have to log function calls
    like time() even if the query will not modify a single row
    in the database, because the query is a

    SELECT 'now'::datetime - updtime FROM ...

    Doing this on a table with thousands of rows will definitely
    waste a lot of logging space and slow the whole thing down
    with unnecessary logging.

    Maybe it's a compromise if, at each query start, the current
    time and other such information is remembered by the backend,
    all time() calls return this remembered value instead of the
    real one (which wouldn't be bad anyway, IMHO), and this
    information is logged only if the query is to be logged.
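    That compromise amounts to deferring the logging decision: buffer
    the query text and its start-of-query values, then flush the
    buffer only if the write flag got set. A minimal sketch, assuming
    invented names (`QueryLogger`, the `execute` callback standing in
    for the executor, and its returned written-flag):

    ```python
    class QueryLogger:
        """Sketch of deferred query logging: remember the query's start
        time, but write it to the log only if the query modified data."""

        def __init__(self, clock):
            self.clock = clock
            self.log = []

        def run(self, query, execute):
            pending = {"query": query, "time": self.clock()}  # buffered, not yet logged
            wrote = execute(pending["time"])  # heap access methods would set this flag
            if wrote:
                self.log.append(pending)      # log only queries that changed data
            return wrote
    ```

    A pure SELECT leaves the log untouched, while an UPDATE gets its
    query text and remembered timestamp written out together.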

    Finally  I  think  I must have missed some more problems, but
    aren't these enough already to frustrate you :-?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] RE: [GENERAL] Transaction logging

From: Bruce Momjian
Added to TODO:

    * Transaction log, so re-do log can be on a separate disk by
      logging SQL queries, or before/after row images



> Michael Davis wrote:
>
> > What would it take to have transaction logging added to Postgres.  [...]
>
>     I spent some time on transaction logging since it's a feature
>     I'm missing too. There are mainly two  different  transaction
>     log mechanisms out.
>
> [... full message quoted above ...]
>
> Jan


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026