Thread: Backend crashes - what's going on here???

From
jwieck@debis.com (Jan Wieck)
Date:
Hey,

    the current snapshot dumps core on the 4th time doing

    REVOKE ALL ON pg_user FROM public;

    It also crashes in other situations, but this is the simplest
    to reproduce. The segmentation fault happens in
    nocachegetattr() due to a destroyed tuple descriptor
    (natts = 0!!! and the other fields don't look good either) for
    syscache 21 (USENAME). But the destruction must happen
    somewhere else.

    With the 02/13 snapshot I don't have any problems with it,
    but I cannot find the error with diff.

    BTW: Doing last checks on view permissions - sending a  patch
    soon.


Until later, Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] Backend crashes - what's going on here???

From
Bruce Momjian
Date:
Yep, I saw this too when testing my password acl null patch.  Couldn't
reproduce it, so I thought it was a fluke.

--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] Backend crashes - what's going on here???

From
jwieck@debis.com (Jan Wieck)
Date:
Wow - gdb is a nice tool

    I have a clue now what causes the crash. It happens when
    pg_user is looked up in the syscache. It must have to do with
    the fact that during initialization, SetUserId() in miscinit.c
    fetches the user tuple using SearchSysCacheTuple(). Because of
    this, SysCache entry 21 gets initialized, but later, at
    transaction start, the cache reset frees the memory for the
    cc_tupdesc in the cache. So I assume that when SetUserId() is
    called, the syscache is not ready for use yet.

    I don't have a solution right now. Is someone more familiar
    with the handling of the syscache during startup? Is
    SetUserId() just called a little too early, or is the syscache
    unusable during InitPostgres altogether?

    But the fact that CatalogCacheInitializeCache() is called
    only for pg_user during startup makes me feel sure that
    looking up the user with SearchSysCacheTuple() is wrong at
    this point. I think it should be done without using the
    syscache.

    Back on Monday - maybe with a solution.


Jan

Re: [HACKERS] Backend crashes - what's going on here???

From
jwieck@debis.com (Jan Wieck)
Date:
Uhhh - much uglier than I first thought :-(

    The crash is due to the cache invalidations on updates to
    pg_class (and it can also happen on updates to pg_attribute
    and others).

    When a tuple in pg_class or one of the others is modified, its
    cache invalidation causes a RelationFlushRelation() for the
    affected relation. Revoking from pg_user, for example, means
    that RelationFlushRelation() is called for pg_user, and this
    frees the tuple descriptor. But the same tuple descriptor is
    also used in the SysCache, and the SysCache isn't flushed or
    freed!
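
The aliasing described above can be sketched in isolation. This is a toy model, not backend code: every type and function name in it (TupleDescModel, RelCacheEntryModel, SysCacheModel, model_flush_relation, and so on) is hypothetical, and the attribute count 4 is an arbitrary placeholder. The model poisons the descriptor instead of calling free(), so it stays well-defined while still showing the natts = 0 symptom.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tuple descriptor. */
typedef struct TupleDescModel {
    int  natts;   /* number of attributes */
    bool live;    /* false once the owner has released it */
} TupleDescModel;

/* Hypothetical relcache entry: owns the descriptor. */
typedef struct RelCacheEntryModel {
    TupleDescModel *tupdesc;
} RelCacheEntryModel;

/* Hypothetical syscache: borrows the same descriptor pointer. */
typedef struct SysCacheModel {
    TupleDescModel *tupdesc;
} SysCacheModel;

/*
 * Stand-in for the flush: releases the descriptor owned by the relcache
 * entry. It poisons the descriptor rather than free()ing it, so reading
 * it afterwards in this model is still defined behavior.
 */
void model_flush_relation(RelCacheEntryModel *rel)
{
    assert(rel != NULL);
    if (rel->tupdesc != NULL) {
        rel->tupdesc->natts = 0;
        rel->tupdesc->live = false;
        rel->tupdesc = NULL;
    }
}

/* True if the cache still points at a descriptor that was released. */
bool model_syscache_is_dangling(const SysCacheModel *cache)
{
    return cache->tupdesc != NULL && !cache->tupdesc->live;
}

/* Returns true if the model reproduces the stale-descriptor state. */
bool model_reproduce(void)
{
    TupleDescModel desc = { 4, true };          /* 4 is arbitrary */
    RelCacheEntryModel rel = { &desc };
    SysCacheModel usename_cache = { &desc };    /* same pointer, not a copy */

    model_flush_relation(&rel);                 /* nobody tells the syscache */

    return model_syscache_is_dangling(&usename_cache)
        && usename_cache.tupdesc->natts == 0;
}
```

In this model, model_reproduce() returns true: the flush only knows about the owning entry, so the borrowed pointer in the cache is left referring to released storage.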

    There are more possible errors of this kind. A simple

    UPDATE pg_class SET relname = relname;

    lets the backend crash on the very next command. And

    REVOKE ALL ON pg_class FROM public;

    crashes immediately because the cache invalidation needs the
    just-invalidated heap tuple for pg_class in pg_class. Sounds
    a bit hairy.

    I think this is also the reason for the backend crashes I had
    when defining rewrite rules on relations that already exist
    (I expect others have already noticed these too).

    I still don't have a solution, but this must get fixed
    before releasing 6.3. I think it could help to walk through
    the SysCache in RelationFlushRelation(), check whether the
    flushed relation is in the SysCache, and reset that cache if
    it is (except for the revoke on pg_class).
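
The idea of walking the SysCache on a flush can be sketched as follows. Again this is only a toy model under stated assumptions: the types, the function names, the relation ids 1 and 2, and the attribute and entry counts are all hypothetical, and nothing here is the real catalog cache code. It also does not address the REVOKE on pg_class case, where the invalidation itself needs the invalidated tuple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tuple descriptor. */
typedef struct TupleDescModel {
    int  natts;
    bool live;
} TupleDescModel;

/* Hypothetical syscache built over one relation. */
typedef struct SysCacheModel {
    unsigned int    relid;     /* relation the cache is built over */
    TupleDescModel *tupdesc;   /* borrowed; NULL means "not initialized" */
    int             nentries;  /* number of cached tuples */
} SysCacheModel;

/* Drop everything the cache holds so it is rebuilt on next use. */
void model_reset_cache(SysCacheModel *cache)
{
    cache->tupdesc = NULL;
    cache->nentries = 0;
}

/* Walk all caches and reset those built over relid; returns how many. */
int model_reset_caches_for_relation(SysCacheModel *caches, size_t n,
                                    unsigned int relid)
{
    int reset = 0;
    for (size_t i = 0; i < n; i++) {
        if (caches[i].relid == relid) {
            model_reset_cache(&caches[i]);
            reset++;
        }
    }
    return reset;
}

/*
 * Stand-in for the flush: reset dependent caches first, then release
 * the descriptor (poisoned here instead of free()d).
 */
int model_flush_relation(unsigned int relid, TupleDescModel *desc,
                         SysCacheModel *caches, size_t n)
{
    assert(desc != NULL);
    int reset = model_reset_caches_for_relation(caches, n, relid);
    desc->natts = 0;
    desc->live = false;
    return reset;
}

/* Returns the number of caches reset, or -1 if a check fails. */
int model_demo(void)
{
    TupleDescModel user_desc  = { 4, true };   /* arbitrary values */
    TupleDescModel class_desc = { 4, true };
    SysCacheModel caches[] = {
        { 1, &user_desc,  2 },   /* relid 1 stands for pg_user  */
        { 2, &class_desc, 5 },   /* relid 2 stands for pg_class */
        { 1, &user_desc,  1 },
    };
    size_t n = sizeof caches / sizeof caches[0];

    int reset = model_flush_relation(1, &user_desc, caches, n);

    /* No cache may still reference the released descriptor. */
    for (size_t i = 0; i < n; i++)
        if (caches[i].tupdesc == &user_desc)
            return -1;
    /* The unrelated cache must be untouched. */
    if (caches[1].tupdesc != &class_desc || caches[1].nentries != 5)
        return -1;
    return reset;
}
```

In this model, model_demo() returns 2: both caches built over the flushed relation are reset before the descriptor goes away, and the cache over the other relation keeps its entries.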

    Append this to TODO!


Jan
