Thread: View causes error in pg_tables

View causes error in pg_tables

From
Michael Davis
Date:
05 March 1999, 23:56:59
The following select statement produces the provided error below:

select * from pg_tables;
ERROR:  cache lookup of attribute 0 in relation 6187373 failed

This happens after creating view Membership_Addresses (provided below).  I
also included the function used in Membership_Addresss.  Selecting from the
view Membership_Addresses works great (Select * from Membership_Addresses;).
What is causing the error in the pg_tables view?  If I drop view
Membership_Addresses the select * from pg_tables works great.

CREATE FUNCTION FormatAddress(ADDRESSES, int2) RETURNS varchar AS '
    DECLARE
        adr ALIAS FOR $1;
        includecountry ALIAS FOR $2;
BEGIN
        if (adr.address is null) then
            str1 := '''';
        else
            str1 := adr.address || ''\n'';
        end if;
        ... lots of other stuff ...
        return str1;
    END; ' LANGUAGE 'plpgsql';

CREATE VIEW Membership_Addresses as
    SELECT addressid, memberid, address, city, state, zipcode, country,
phone, altphone, fax,
        FormatAddress(addresses, 0) AS MemberAddress
    FROM addresses
    WHERE active = -1;



    -----Original Message-----
    From:    jwieck@debis.com [SMTP:jwieck@debis.com]
    Sent:    Friday, March 05, 1999 11:24 AM
    To:    Michael Davis
    Cc:    pgsql-general@postgreSQL.org; pgsql-hackers@postgreSQL.org
    Subject:    Re: [HACKERS] RE: [GENERAL] Transaction logging

    Michael Davis wrote:

    >
    > What would it take to have transaction logging added to Postgres.
I am a
    > c/c++ programmer and will consider contributing to the Postgres
development
    > effort.  I really like everything I see and read about Postgres.
As a

        I spent some time on transaction logging since it's a feature
        I'm missing too. There are mainly two  different  transaction
        log mechanisms out.

        1.  Log queries sent to the backend.

        2.  Log  images  of  inserted/updated  rows  and  row ID's of
            deleted ones.

        The query  level  logging  will  write  less  information  if
        queries usually affect a large number of rows.  Unfortunately
        the extensibility of Postgres work's against  this  approach.
        There  could  be  any  number of user written functions who's
        results  aren't  reproduceable  during  recovery.  And   core
        features of Postgres itself would introduce the same problem.
        Have a sequence which is used to create  default  values  for
        multiple  tables,  so  that one ID is unique across them. Now
        two backends insert (with  INSERT  ...  SELECT)  concurrently
        into  different  tables  using  the  same  sequence.   It's a
        classic race condition and it depends  on  context  switching
        and  page  faults  which  backend  will  get  which  sequence
        numbers. You cannot foresee and you cannot reproduce,  except
        you  hook into the sequence generator and log this too. Later
        when recovering, another hook  into  the  sequence  generator
        must    reproduce    the    logged   results   on   the   per
        backend/transaction/command base, and the same must  be  done
        for   each  function  that  usually  returns  unreproduceable
        results (anything dealing with time, pid's, etc.).

        As said, this must also cover user  functions.  So  at  least
        there  must  be  a  general  log  API  that  provides  such a
        functionality for user written functions.

        The image logging approach also has problems. First, the only
        thing  given to the heap access methods to outdate a tuple on
        update/delete is the current tuple ID (information that tells
        which  tuple  in  which block is meant).  So you need to save
        the database files  in  binary  format,  because  during  the
        actually  existing  dump/restore  this  could  change and the
        logged CTID's would hit the wrong tuples.

        Second, you must remember in the  log  which  transaction  ID
        these  informations  came  from  and later if the transaction
        committed or not, so the recovery can set  this  commit/abort
        information in pg_log too. pg_log is a shared system file and
        the transaction ID's are unique only for one  server.   Using
        this  information for online replication of a single database
        to another Postgres installation will not work.

        Third, there are still some shared system catalogs across all
        databases  (pg_database,  pg_group,  pg_log!!!, pg_shadow and
        pg_variable). Due to that it would be impossible (or at least
        very,  very  tricky) to restore/recover (maybe point in time)
        one single database. If you destroy one database and  restore
        it  from  the  binary backup, these shared catalogs cannot be
        restored too, so they're out of sync with  the  backup  time.
        How  should  the  recovery  now  hit  the right things (which
        probably must not be there at all)?.

        All this is really  a  mess.  I  think  the  architecture  of
        Postgres  will  only allow something on query level with some
        general API for things that must reproduce  the  same  result
        during  recovery.  For  example  time().  Inside the backend,
        time() should  never  be  called  directly.  Instead  another
        function  is  to be called that log's during normal operation
        which time get's returned by this  particular  function  call
        and  if  the  backend  is in recovery mode, returns the value
        from the log.

        And again, this all means trouble. Usually, most queries sent
        to  the  database  don't  change  any  data  because they are
        SELECT's. It would dramatically blow up the log amount if you
        log ALL queries instead of only those that modify things. But
        when the query begins, you don't know this, because a  SELECT
        might call a function that uses SPI to UPDATE something else.
        So the decision if the query must be logged or not  can  only
        be  made  when  the  query  is  done  (by  having some global
        variable where the  heap  access  methods  set  a  flag  that
        something  got written).  Now you have to log function call's
        like time() even if the query will not modify any single  row
        in the database because the query is a

        SELECT 'now'::datetime - updtime FROM ...

        Doing  this on a table with thousands of rows will definitely
        waste much logging space and  slowdown  the  whole  thing  by
        unnecessary logging.

        Maybe  it's  a  compromise  if at each query start the actual
        time and other such information is remembered by the backend,
        all  time() calls return this remembered value instead of the
        real one (wouldn't be bad anyway IMHO), and this  information
        is logged only if the query is to be logged.

        Finally  I  think  I must have missed some more problems, but
        aren't these enough already to frustrate you :-?


    Jan

    --


#======================================================================#
    # It's easier to get forgiveness for being wrong than for being
right. #
    # Let's break this rule - forgive me.
#
    #======================================== jwieck@debis.com (Jan
Wieck) #