Thread: pg_group_name_index corrupt?

pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
This worry anyone? :)

v7.0, compiled today based off of CVS ...



Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 

---------- Forwarded message ----------

pg_banners=> vacuum;
NOTICE:  Index pg_group_sysid_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME AS HEAP' (1).       Recreate the
index.
NOTICE:  Index pg_group_name_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME AS HEAP' (1).       Recreate the
index.
VACUUM
pg_banners=> \d pg_group
Table    = pg_group
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| groname                          | name                             |    32 |
| grosysid                         | int4                             |     4 |
| grolist                          | int4[]                           |   var |
+----------------------------------+----------------------------------+-------+
Indices:  pg_group_name_index         pg_group_sysid_index

pg_banners=> \d pg_group_name_index
Table    = pg_group_name_index
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| groname                          | name                             |    32 |
+----------------------------------+----------------------------------+-------+
pg_banners=> drop index pg_group_name_index;
ERROR:  class "pg_group_name_index" is a system catalog index
pg_banners=>




Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
> This worry anyone? :)
> NOTICE:  Index pg_group_sysid_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME AS HEAP' (1).
>         Recreate the index.
> NOTICE:  Index pg_group_name_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME AS HEAP' (1).
>         Recreate the index.

Not if you had other transactions running in parallel with the
vacuum --- if the vacuum was the only thing running then I'd want
to know what you were doing before that...
        regards, tom lane


Re: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
On Thu, 4 May 2000, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > This worry anyone? :)
> > NOTICE:  Index pg_group_sysid_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME AS HEAP' (1).
> >         Recreate the index.
> > NOTICE:  Index pg_group_name_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME AS HEAP' (1).
> >         Recreate the index.
> 
> Not if you had other transactions running in parallel with the
> vacuum --- if the vacuum was the only thing running then I'd want
> to know what you were doing before that...

I can't guarantee whether i was or not :(  right now, I'm assuming that
'other transactions' would include any database on that server, not just
the database that I was vacuuming at the time, as even if I go in and do a
vacuum on 'template1', that error pops up ...

It says to 'recreate the index', but if I try to, it tells me its a system
table (of course) ... is there a way of fixing this without having to do
a dump/reload?


Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
>>>> This worry anyone? :)
>>>> NOTICE:  Index pg_group_sysid_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME AS HEAP' (1).
>>>> Recreate the index.
>>>> NOTICE:  Index pg_group_name_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME AS HEAP' (1).
>>>> Recreate the index.

It occurred to me that this would be the likely symptom if a CREATE
GROUP or ALTER GROUP command had neglected to update the indexes on
pg_group.  However, I can't spot any such problem in the code nor
cause it to happen by hand.  Anyone else have an idea?

BTW Marc, do you happen to know what sequence of user and/or group
create/alter/drops might have been executed on your setup?
        regards, tom lane


Re: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
On Thu, 4 May 2000, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> >>>> This worry anyone? :)
> >>>> NOTICE:  Index pg_group_sysid_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME AS HEAP' (1).
> >>>> Recreate the index.
> >>>> NOTICE:  Index pg_group_name_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME AS HEAP' (1).
> >>>> Recreate the index.
> 
> It occurred to me that this would be the likely symptom if a CREATE
> GROUP or ALTER GROUP command had neglected to update the indexes on
> pg_group.  However, I can't spot any such problem in the code nor
> cause it to happen by hand.  Anyone else have an idea?
> 
> BTW Marc, do you happen to know what sequence of user and/or group
> create/alter/drops might have been executed on your setup?

the only group that is in pg_group is:

template1=> select * from pg_group;
groname      |grosysid|grolist       
-------------+--------+--------------
trends_acctng|       0|{70,1007,1008}
(1 row)

and it was added ~2 weeks ago ... 

i tried deleting and re-adding it this morning, now it says (1 of 2) vs (0
of 1) ...

I had a server lock up the other day running an RC2 code base, which a
'truss -p' on postmaster produced nothing ... I did a kill on the server,
upgraded to the latest code and restarted it, after which this problem
occur'd ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
> the only group that is in pg_group is:

> template1=> select * from pg_group;
> groname      |grosysid|grolist       
> -------------+--------+--------------
> trends_acctng|       0|{70,1007,1008}
> (1 row)

> and it was added ~2 weeks ago ... 

> i tried deleting and re-adding it this morning, now it says (1 of 2) vs (0
> of 1) ...

Hmm, that's what I was going to suggest doing.  I think there must be a
dead tuple in pg_group that VACUUM doesn't want to clean out for some
reason --- maybe it thinks there is still an open transaction somewhere
that could see the tuple as still live?  Odd.

Did you try doing a vacuum of pg_group in between deleting and re-adding
the one valid group?

> I had a server lock up the other day running an RC2 code base, which a
> 'truss -p' on postmaster produced nothing ... I did a kill on the server,
> upgraded to the latest code and restarted it, after which this problem
> occur'd ...

Wish you'd taken more careful notes when that happened.  What do you
mean by lock up, exactly?  Do you know what it was doing just
beforehand?
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
I wrote:
> It occurred to me that this would be the likely symptom if a CREATE
> GROUP or ALTER GROUP command had neglected to update the indexes on
> pg_group.  However, I can't spot any such problem in the code nor
> cause it to happen by hand.  Anyone else have an idea?

After further looking, I notice that users.c is one of the few places
that will drop AccessExclusiveLock at heap_close time rather than
holding it till xact commit.  I wonder whether this is a bug...
it could allow another backend to get in and start vacuuming the file
before our updates have committed.  I am not sure that vacuum would do
the wrong thing in that case, but maybe so.  Comments anyone (Vadim?)

I also notice that there definitely is a glaring bug there:
write_password_file() leaks one kernel file descriptor each time it runs
(note the creat() call).  Alter enough pg_shadow entries in one session
and your backend stops working.  I think this is a "must fix" problem
--- any objections?
        regards, tom lane


Re: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
On Thu, 4 May 2000, Tom Lane wrote:

> I wrote:
> > It occurred to me that this would be the likely symptom if a CREATE
> > GROUP or ALTER GROUP command had neglected to update the indexes on
> > pg_group.  However, I can't spot any such problem in the code nor
> > cause it to happen by hand.  Anyone else have an idea?
> 
> After further looking, I notice that users.c is one of the few places
> that will drop AccessExclusiveLock at heap_close time rather than
> holding it till xact commit.  I wonder whether this is a bug...
> it could allow another backend to get in and start vacuuming the file
> before our updates have committed.  I am not sure that vacuum would do
> the wrong thing in that case, but maybe so.  Comments anyone (Vadim?)
> 
> I also notice that there definitely is a glaring bug there:
> write_password_file() leaks one kernel file descriptor each time it runs
> (note the creat() call).  Alter enough pg_shadow entries in one session
> and your backend stops working.  I think this is a "must fix" problem
> --- any objections?

None from me ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
On Thu, 4 May 2000, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > the only group that is in pg_group is:
> 
> > template1=> select * from pg_group;
> > groname      |grosysid|grolist       
> > -------------+--------+--------------
> > trends_acctng|       0|{70,1007,1008}
> > (1 row)
> 
> > and it was added ~2 weeks ago ... 
> 
> > i tried deleting and re-adding it this morning, now it says (1 of 2) vs (0
> > of 1) ...
> 
> Hmm, that's what I was going to suggest doing.  I think there must be a
> dead tuple in pg_group that VACUUM doesn't want to clean out for some
> reason --- maybe it thinks there is still an open transaction somewhere
> that could see the tuple as still live?  Odd.
> 
> Did you try doing a vacuum of pg_group in between deleting and re-adding
> the one valid group?

Hadn't thought of that ... but ... and you aren't going to like this
... if I delete/vacuum/insert/vacuum ... INDEX TUPLES increases by 1, HEAP
increases by one ... I'm up to 3->4 now, and would go 4->5 if I were to do
it again ...

> > I had a server lock up the other day running an RC2 code base, which a
> > 'truss -p' on postmaster produced nothing ... I did a kill on the server,
> > upgraded to the latest code and restarted it, after which this problem
> > occur'd ...
> 
> Wish you'd taken more careful notes when that happened.  What do you
> mean by lock up, exactly?  Do you know what it was doing just
> beforehand?

Don't know ... one of hte problems I'm having with my FreeBSD machine
right now is that, for some reason, setproctitle() isn't working, so all
my backends look the same 'postmaster' and its start up options :(  

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
> Hadn't thought of that ... but ... and you aren't going to like this
> ... if I delete/vacuum/insert/vacuum ... INDEX TUPLES increases by 1, HEAP
> increases by one ... I'm up to 3->4 now, and would go 4->5 if I were to do
> it again ...

That definitely sounds like VACUUM thinks there's an old open transaction
somewhere and so it doesn't want to clean out the dead tuples.

I believe we have a mechanism for deciding that an old transaction must
have aborted (it involves looking to see if any active backend claims to
be running that transaction).  But I wonder whether that mechanism is
being used when VACUUM decides whether it can clean out a dead tuple or
not.  Vadim?

> Don't know ... one of hte problems I'm having with my FreeBSD machine
> right now is that, for some reason, setproctitle() isn't working, so all
> my backends look the same 'postmaster' and its start up options :(  

Now you know how the other half lives ;-).
        regards, tom lane


RE: pg_group_name_index corrupt?

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@hub.org
> [mailto:pgsql-hackers-owner@hub.org]On Behalf Of The Hermit Hacker
>
> On Thu, 4 May 2000, Tom Lane wrote:
>
> > The Hermit Hacker <scrappy@hub.org> writes:
> > > This worry anyone? :)
> > > NOTICE:  Index pg_group_sysid_index: NUMBER OF INDEX' TUPLES
> (0) IS NOT THE SAME AS HEAP' (1).
> > >         Recreate the index.
> > > NOTICE:  Index pg_group_name_index: NUMBER OF INDEX' TUPLES
> (0) IS NOT THE SAME AS HEAP' (1).
> > >         Recreate the index.
> >
> > Not if you had other transactions running in parallel with the
> > vacuum --- if the vacuum was the only thing running then I'd want
> > to know what you were doing before that...
>
> I can't guarantee whether i was or not :(  right now, I'm assuming that
> 'other transactions' would include any database on that server, not just
> the database that I was vacuuming at the time, as even if I go in and do a
> vacuum on 'template1', that error pops up ...
>

Why does pg_group exist under $PGDATA though the indexes exist
under each $PGDATA/base/db_name ?
Could it be consistent on all databases ?

> It says to 'recreate the index', but if I try to, it tells me its a system
> table (of course) ... is there a way of fixing this without having to do
> a dump/reload?
>

Run "reindex table pg_group force;" under standalone postmaster
with options -O and -P. You must shutdown postmaster first.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> Why does pg_group exist under $PGDATA though the indexes exist
> under each $PGDATA/base/db_name ?
> Could it be consistent on all databases ?

Oh my, I think you've got it!  The indexes must be SharedSystemRelations!!

Backend 1 in database regression:

regression=# vacuum analyze pg_group;
VACUUM

Backend 2 in database other is also happy:

other=# vacuum analyze pg_group;
VACUUM

Now create a group with backend 1:

regression=# create group g;
CREATE GROUP

Backend 1 sees the index entries:

regression=# vacuum analyze pg_group;
VACUUM

But backend 2 doesn't:

other=# vacuum analyze pg_group;
NOTICE:  Index pg_group_sysid_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME AS HEAP' (1).       Recreate the
index.
NOTICE:  Index pg_group_name_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME
AS HEAP' (1).       Recreate the index.
VACUUM

pg_shadow would have the same problem if it had indices, which I thought
it did but they seem to have been disabled.

Can you say "one more initdb"?  I knew you could...
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> Why does pg_group exist under $PGDATA though the indexes exist
>> under each $PGDATA/base/db_name ?
>> Could it be consistent on all databases ?

> Oh my, I think you've got it!  The indexes must be SharedSystemRelations!!

Yup, Hiroshi has spotted the problem.  Turning the indexes on pg_group
into shared relations fixes the cross-database misbehavior shown in my
prior message, and I'll bet this bug explains Marc's report too.

We never had any indexes on pg_group (or any shared relation) before,
which is why we hadn't seen this kind of failure before.  (Another
limitation of the regression tests exposed --- they don't test
cross-database behaviors.)

So, now what?  This is a simple fix, but it will require initdb (or at
least pg_upgrade), which I'd really rather not do at this point in the
release cycle.  But I'm not sure we have any choice.  As it stands,
pg_group is broken.

If we are going to have to force a new initdb here, we probably ought
to reconsider a couple of recent past decisions that were put off on
grounds of "we don't want another initdb before 7.0".  I'm thinking of
the remaining ODBC support functions and the new LIKE estimator in
particular.  Do we want to revisit those decisions, or leave well enough
alone?
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
> The Hermit Hacker <scrappy@hub.org> writes:
>> Hadn't thought of that ... but ... and you aren't going to like this
>> ... if I delete/vacuum/insert/vacuum ... INDEX TUPLES increases by 1, HEAP
>> increases by one ... I'm up to 3->4 now, and would go 4->5 if I were to do
>> it again ...

> That definitely sounds like VACUUM thinks there's an old open transaction
> somewhere and so it doesn't want to clean out the dead tuples.

After looking some more at this, I'm not convinced that the problem
Hiroshi found explains everything Marc is seeing.  Having multiple
copies of the pg_group indexes would certainly explain missing index
tuples in any one copy.  But what about the behavior above?  It still
seems that VACUUM is not cleaning out dead tuples from the table itself.

Marc, is it possible that you have any backends that are in the middle
of a transaction and have just been sitting that way for hours or days?
If so, that'd explain VACUUM's refusal to remove tuples.  Another
symptom you should look for is notices likeNOTICE:  RegisterSharedInvalid: SI buffer overflowNOTICE:
InvalidateSharedInvalid:cache state reset
 
in the postmaster log --- there'll probably be quite a few if there's
a hung backend somewhere.
        regards, tom lane


Re: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
On Thu, 4 May 2000, Tom Lane wrote:

> But backend 2 doesn't:
> 
> other=# vacuum analyze pg_group;
> NOTICE:  Index pg_group_sysid_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME AS HEAP' (1).
>         Recreate the index.
> NOTICE:  Index pg_group_name_index: NUMBER OF INDEX' TUPLES (0) IS NOT THE SAME
> AS HEAP' (1).
>         Recreate the index.
> VACUUM
> 
> pg_shadow would have the same problem if it had indices, which I thought
> it did but they seem to have been disabled.
> 
> Can you say "one more initdb"?  I knew you could...

Would a pg_upgrade fix this, or do I have to do a whole dump/reload?  

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
On Thu, 4 May 2000, Tom Lane wrote:

> > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> Why does pg_group exist under $PGDATA though the indexes exist
> >> under each $PGDATA/base/db_name ?
> >> Could it be consistent on all databases ?
> 
> > Oh my, I think you've got it!  The indexes must be SharedSystemRelations!!
> 
> Yup, Hiroshi has spotted the problem.  Turning the indexes on pg_group
> into shared relations fixes the cross-database misbehavior shown in my
> prior message, and I'll bet this bug explains Marc's report too.
> 
> We never had any indexes on pg_group (or any shared relation) before,
> which is why we hadn't seen this kind of failure before.  (Another
> limitation of the regression tests exposed --- they don't test
> cross-database behaviors.)
> 
> So, now what?  This is a simple fix, but it will require initdb (or at
> least pg_upgrade), which I'd really rather not do at this point in the
> release cycle.  But I'm not sure we have any choice.  As it stands,
> pg_group is broken.
> 
> If we are going to have to force a new initdb here, we probably ought
> to reconsider a couple of recent past decisions that were put off on
> grounds of "we don't want another initdb before 7.0".  I'm thinking of
> the remaining ODBC support functions and the new LIKE estimator in
> particular.  Do we want to revisit those decisions, or leave well enough
> alone?

Leave well enough alone ... this fixed, IMHO, a *very* important potential
bug, whereas the other two can be worked around.  AT this *really* late
stage in the cycle, fixing one bug at least only opens us up to the
possibility of one bug ... doing the ODBC/LIKE stuff aren't mission
critical, and really only affect a relatively small group of ppl in
comparison ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



RE: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
On Fri, 5 May 2000, Hiroshi Inoue wrote:

> > It says to 'recreate the index', but if I try to, it tells me its a system
> > table (of course) ... is there a way of fixing this without having to do
> > a dump/reload?
> >
> 
> Run "reindex table pg_group force;" under standalone postmaster
> with options -O and -P. You must shutdown postmaster first.

Okay, just looked at the postgres man page, and found the -O option, but
no -P ... this is in the users-lounge/7.0/docs directory ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



RE: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
Coolness, that worked like a charm ... so I take it I have to do this for
each and every database on the system? :)


On Fri, 5 May 2000, Hiroshi Inoue wrote:

> > -----Original Message-----
> > From: pgsql-hackers-owner@hub.org
> > [mailto:pgsql-hackers-owner@hub.org]On Behalf Of The Hermit Hacker
> >
> > On Thu, 4 May 2000, Tom Lane wrote:
> >
> > > The Hermit Hacker <scrappy@hub.org> writes:
> > > > This worry anyone? :)
> > > > NOTICE:  Index pg_group_sysid_index: NUMBER OF INDEX' TUPLES
> > (0) IS NOT THE SAME AS HEAP' (1).
> > > >         Recreate the index.
> > > > NOTICE:  Index pg_group_name_index: NUMBER OF INDEX' TUPLES
> > (0) IS NOT THE SAME AS HEAP' (1).
> > > >         Recreate the index.
> > >
> > > Not if you had other transactions running in parallel with the
> > > vacuum --- if the vacuum was the only thing running then I'd want
> > > to know what you were doing before that...
> >
> > I can't guarantee whether i was or not :(  right now, I'm assuming that
> > 'other transactions' would include any database on that server, not just
> > the database that I was vacuuming at the time, as even if I go in and do a
> > vacuum on 'template1', that error pops up ...
> >
> 
> Why does pg_group exist under $PGDATA though the indexes exist
> under each $PGDATA/base/db_name ?
> Could it be consistent on all databases ?
> 
> > It says to 'recreate the index', but if I try to, it tells me its a system
> > table (of course) ... is there a way of fixing this without having to do
> > a dump/reload?
> >
> 
> Run "reindex table pg_group force;" under standalone postmaster
> with options -O and -P. You must shutdown postmaster first.
> 
> Regards.
> 
> Hiroshi Inoue
> Inoue@tpf.co.jp
> 

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
>> Can you say "one more initdb"?  I knew you could...

> Would a pg_upgrade fix this, or do I have to do a whole dump/reload?  

pg_upgrade *should* work, but if I were you I'd make a backup dump
first, just in case.  I don't trust pg_upgrade very much...
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
>> If we are going to have to force a new initdb here, we probably ought
>> to reconsider a couple of recent past decisions that were put off on
>> grounds of "we don't want another initdb before 7.0".  I'm thinking of
>> the remaining ODBC support functions and the new LIKE estimator in
>> particular.  Do we want to revisit those decisions, or leave well enough
>> alone?

> Leave well enough alone ... this fixed, IMHO, a *very* important potential
> bug, whereas the other two can be worked around.  AT this *really* late
> stage in the cycle, fixing one bug at least only opens us up to the
> possibility of one bug ... doing the ODBC/LIKE stuff aren't mission
> critical, and really only affect a relatively small group of ppl in
> comparison ...

That's a fair objection for the LIKE estimator, which after all hasn't
gotten much testing.  I'll leave well enough alone there.  But those
missing ODBC functions are just another dozen SQL-function entries for
pg_proc; hard to see how they can break anything else, even if (worst
case) they're wrong themselves ...
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
> Coolness, that worked like a charm ... so I take it I have to do this for
> each and every database on the system? :)

Yup :-(.  But actually there's not much point since you're looking at
a pg_upgrade or initdb for the real fix, anyway.
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
>> Would a pg_upgrade fix this, or do I have to do a whole dump/reload?  

> pg_upgrade *should* work, but if I were you I'd make a backup dump
> first, just in case.  I don't trust pg_upgrade very much...

Oh, btw: pg_upgrade will *not* work to save and reload your group
definitions, because neither it nor pg_dumpall do anything at all with
pg_group!  For that matter, a full dumpall/reload won't preserve
groups either!

I griped about that a week or so ago, but no one seems to have picked up
on it.  Do you want to consider that a "must fix" problem as well?
I think it's a pretty minor fix, but considering how late we are in the
cycle...
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
>> Run "reindex table pg_group force;" under standalone postmaster
>> with options -O and -P. You must shutdown postmaster first.

> Okay, just looked at the postgres man page, and found the -O option, but
> no -P ... this is in the users-lounge/7.0/docs directory ...

Hiroshi did commit a change to doc/src/sgml/ref/postgres-ref.sgml for
that just a few days ago, but I guess it didn't make it into the derived
files that Thomas was preparing off-line.  I believe there were several
other small doco fixes while Thomas was on vacation --- I dunno if he
plans to try to get those into the 7.0 derived docs or not.
        regards, tom lane


Re: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
On Thu, 4 May 2000, Tom Lane wrote:

> >> Would a pg_upgrade fix this, or do I have to do a whole dump/reload?  
> 
> > pg_upgrade *should* work, but if I were you I'd make a backup dump
> > first, just in case.  I don't trust pg_upgrade very much...
> 
> Oh, btw: pg_upgrade will *not* work to save and reload your group
> definitions, because neither it nor pg_dumpall do anything at all with
> pg_group!  For that matter, a full dumpall/reload won't preserve
> groups either!
> 
> I griped about that a week or so ago, but no one seems to have picked up
> on it.  Do you want to consider that a "must fix" problem as well?
> I think it's a pretty minor fix, but considering how late we are in the
> cycle...

considering where the problem is, I think that if it can be safely done,
please do it ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
>> I griped about that a week or so ago, but no one seems to have picked up
>> on it.  Do you want to consider that a "must fix" problem as well?
>> I think it's a pretty minor fix, but considering how late we are in the
>> cycle...

> considering where the problem is, I think that if it can be safely done,
> please do it ...

Done and done.  I also realized that pg_upgrade had another nasty bug
in it: the VACUUMs were not necessarily executed as superuser, but as
whichever user happened to own the item dumped last by pg_dump in each
database.  That would result in VACUUM skipping over tables it thought
it didn't have permission to vacuum --- like, say, all the system
tables.  Perhaps this explains the failures that some people have
reported.

Another day, another bug swatted ...
        regards, tom lane

PS: when you announce RC5, don't forget to mention the required initdb
;-)


Re: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
On Thu, 4 May 2000, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> >> I griped about that a week or so ago, but no one seems to have picked up
> >> on it.  Do you want to consider that a "must fix" problem as well?
> >> I think it's a pretty minor fix, but considering how late we are in the
> >> cycle...
> 
> > considering where the problem is, I think that if it can be safely done,
> > please do it ...
> 
> Done and done.  I also realized that pg_upgrade had another nasty bug
> in it: the VACUUMs were not necessarily executed as superuser, but as
> whichever user happened to own the item dumped last by pg_dump in each
> database.  That would result in VACUUM skipping over tables it thought
> it didn't have permission to vacuum --- like, say, all the system
> tables.  Perhaps this explains the failures that some people have
> reported.
> 
> Another day, another bug swatted ...

Okay, before I package this up and annouce it ... I just posted something
that I *think* is probably related, but just want to make sure ... if I do
a pg_dump of a database, i'm getting an ERROR about 'Inhrel' Attribute not
found ... not sure why, or if it matters ...




Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
> a pg_dump of a database, i'm getting an ERROR about 'Inhrel' Attribute not
> found ... not sure why, or if it matters ...

That's the sign of a 6.5 pg_dump applied to a 7.0 database, or was it
the other way round?  Anyway, check your PATH ...
        regards, tom lane


Re: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
On Thu, 4 May 2000, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > a pg_dump of a database, i'm getting an ERROR about 'Inhrel' Attribute not
> > found ... not sure why, or if it matters ...
> 
> That's the sign of a 6.5 pg_dump applied to a 7.0 database, or was it
> the other way round?  Anyway, check your PATH ...

Okay, it would be v6.5. pg_dump applied to v7.0 database ... I thought it
was me, but I suspect Jeff was on the old v6.5 server running pg_dump at
the same time without me thinking to ask ...

Am upgrading my server now, and will do RC5 tonight also ...

Great work, once more :)

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
okay, something that I think needs to be clarified ... RC5 requires an
initdb, so you have to do a pg_dumpall first, then initdb, then reload ...

your recent fixes ... should we be running pg_dumpall from RC5 on our RC4
databases, or does it not matter?  I'm using the RC5 one right now, and
all appears correct, but I figured I'd ask ...

On Thu, 4 May 2000, Tom Lane wrote:

> >> Would a pg_upgrade fix this, or do I have to do a whole dump/reload?  
> 
> > pg_upgrade *should* work, but if I were you I'd make a backup dump
> > first, just in case.  I don't trust pg_upgrade very much...
> 
> Oh, btw: pg_upgrade will *not* work to save and reload your group
> definitions, because neither it nor pg_dumpall do anything at all with
> pg_group!  For that matter, a full dumpall/reload won't preserve
> groups either!
> 
> I griped about that a week or so ago, but no one seems to have picked up
> on it.  Do you want to consider that a "must fix" problem as well?
> I think it's a pretty minor fix, but considering how late we are in the
> cycle...
> 
>             regards, tom lane
> 

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
> okay, something that I think needs to be clarified ... RC5 requires an
> initdb, so you have to do a pg_dumpall first, then initdb, then reload ...

> your recent fixes ... should we be running pg_dumpall from RC5 on our RC4
> databases, or does it not matter?  I'm using the RC5 one right now, and
> all appears correct, but I figured I'd ask ...

pg_upgrade should work, or at least it's worth trying --- see the
message I just posted.  If you have anything in pg_group then the
best procedure is to use the RC5 pg_dumpall, since RC4 and before's
pg_dumpall neglects to dump pg_group.  In any case, RC4 and before's
pg_upgrade is now known to be broken, so be sure you use RC5's script
at that point.

Or just use dump/initdb/reload, but it'd be nice to get some pounding
on pg_upgrade and find out if it's trustworthy now.

I'd definitely recommend a full pg_dumpall before experimenting with
pg_upgrade, just in case things go worng ;-)
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Bruce Momjian
Date:
> I wrote:
> > It occurred to me that this would be the likely symptom if a CREATE
> > GROUP or ALTER GROUP command had neglected to update the indexes on
> > pg_group.  However, I can't spot any such problem in the code nor
> > cause it to happen by hand.  Anyone else have an idea?
> 
> After further looking, I notice that users.c is one of the few places
> that will drop AccessExclusiveLock at heap_close time rather than
> holding it till xact commit.  I wonder whether this is a bug...
> it could allow another backend to get in and start vacuuming the file
> before our updates have committed.  I am not sure that vacuum would do
> the wrong thing in that case, but maybe so.  Comments anyone (Vadim?)
> 
> I also notice that there definitely is a glaring bug there:
> write_password_file() leaks one kernel file descriptor each time it runs
> (note the creat() call).  Alter enough pg_shadow entries in one session
> and your backend stops working.  I think this is a "must fix" problem
> --- any objections?

Please fix both, I think.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
okay, I did the RC5 pg_dumpall and am doing a reload ... call me chicken
:)


On Fri, 5 May 2000, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > okay, something that I think needs to be clarified ... RC5 requires an
> > initdb, so you have to do a pg_dumpall first, then initdb, then reload ...
> 
> > your recent fixes ... should we be running pg_dumpall from RC5 on our RC4
> > databases, or does it not matter?  I'm using the RC5 one right now, and
> > all appears correct, but I figured I'd ask ...
> 
> pg_upgrade should work, or at least it's worth trying --- see the
> message I just posted.  If you have anything in pg_group then the
> best procedure is to use the RC5 pg_dumpall, since RC4 and before's
> pg_dumpall neglects to dump pg_group.  In any case, RC4 and before's
> pg_upgrade is now known to be broken, so be sure you use RC5's script
> at that point.
> 
> Or just use dump/initdb/reload, but it'd be nice to get some pounding
> on pg_upgrade and find out if it's trustworthy now.
> 
> I'd definitely recommend a full pg_dumpall before experimenting with
> pg_upgrade, just in case things go worng ;-)
> 
>             regards, tom lane
> 

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: pg_group_name_index corrupt?

From
Bruce Momjian
Date:
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > Why does pg_group exist under $PGDATA though the indexes exist
> > under each $PGDATA/base/db_name ?
> > Could it be consistent on all databases ?
> 
> Oh my, I think you've got it!  The indexes must be SharedSystemRelations!!
> 
...
> 
> pg_shadow would have the same problem if it had indices, which I thought
> it did but they seem to have been disabled.
> 
> Can you say "one more initdb"?  I knew you could...

You know, I am reading through this message, and thinking, gee, how is
he going to get out of this without initdb?  Well I have my answer.



--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> After further looking, I notice that users.c is one of the few places
>> that will drop AccessExclusiveLock at heap_close time rather than
>> holding it till xact commit.  I wonder whether this is a bug...
>> it could allow another backend to get in and start vacuuming the file
>> before our updates have committed.  I am not sure that vacuum would do
>> the wrong thing in that case, but maybe so.  Comments anyone (Vadim?)
>> 
>> I also notice that there definitely is a glaring bug there:
>> write_password_file() leaks one kernel file descriptor each time it runs
>> (note the creat() call).  Alter enough pg_shadow entries in one session
>> and your backend stops working.  I think this is a "must fix" problem
>> --- any objections?

> Please fix both, I think.

I fixed the file descriptor leak, which was pretty simple and easily
checked.  After looking at VACUUM I think it will behave reasonably
with a not-yet-committed tuple, so I left that issue alone for now.

My current thought on releasing locks at heap_close time is that it's
probably best to release the lock if and only if your use of the table
was strictly read-only.  If you modified the table then your lock
should be held till xact commit.  I'm not sure about that rule however.
Any comments?
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Bruce Momjian
Date:
> > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> Why does pg_group exist under $PGDATA though the indexes exist
> >> under each $PGDATA/base/db_name ?
> >> Could it be consistent on all databases ?
> 
> > Oh my, I think you've got it!  The indexes must be SharedSystemRelations!!
> 
> Yup, Hiroshi has spotted the problem.  Turning the indexes on pg_group
> into shared relations fixes the cross-database misbehavior shown in my
> prior message, and I'll bet this bug explains Marc's report too.
> 
> We never had any indexes on pg_group (or any shared relation) before,
> which is why we hadn't seen this kind of failure before.  (Another
> limitation of the regression tests exposed --- they don't test
> cross-database behaviors.)

Oh, so it was my indexes on system relations, and because pg_group is a
shared one, there was a problem.

> So, now what?  This is a simple fix, but it will require initdb (or at
> least pg_upgrade), which I'd really rather not do at this point in the
> release cycle.  But I'm not sure we have any choice.  As it stands,
> pg_group is broken.
> 
> If we are going to have to force a new initdb here, we probably ought
> to reconsider a couple of recent past decisions that were put off on
> grounds of "we don't want another initdb before 7.0".  I'm thinking of
> the remaining ODBC support functions and the new LIKE estimator in
> particular.  Do we want to revisit those decisions, or leave well enough
> alone?

Well, pg_group is not used very much.  Can we disable the cache from
using the index?  I think that would get us out of the problem.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Well, pg_group is not used very much.  Can we disable the cache from
> using the index?  I think that would get us out of the problem.

But they're unique indexes ... can you guarantee that a unique index
that's only getting told about some of the updates to its table is
not going to kick out any inappropriate errors?  It might be OK but
it doesn't give me a warm feeling.

I think we were kinda stuck on this one.  The correct fix is clear
and I don't think we'd have been doing anyone any favors by trying
to invent a workaround.  I'm just glad we found it now and not a
few days *after* 7.0 release...
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Bruce Momjian
Date:
> That's a fair objection for the LIKE estimator, which after all hasn't
> gotten much testing.  I'll leave well enough alone there.  But those
> missing ODBC functions are just another dozen SQL-function entries for
> pg_proc; hard to see how they can break anything else, even if (worst
> case) they're wrong themselves ...

Agreed on the ODBC.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_group_name_index corrupt?

From
Bruce Momjian
Date:
> The Hermit Hacker <scrappy@hub.org> writes:
> >> Can you say "one more initdb"?  I knew you could...
> 
> > Would a pg_upgrade fix this, or do I have to do a whole dump/reload?  
> 
> pg_upgrade *should* work, but if I were you I'd make a backup dump
> first, just in case.  I don't trust pg_upgrade very much...

Aw, badmouthing pg_upgrade again.  :-)

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_group_name_index corrupt?

From
Bruce Momjian
Date:
> >> Would a pg_upgrade fix this, or do I have to do a whole dump/reload?  
> 
> > pg_upgrade *should* work, but if I were you I'd make a backup dump
> > first, just in case.  I don't trust pg_upgrade very much...
> 
> Oh, btw: pg_upgrade will *not* work to save and reload your group
> definitions, because neither it nor pg_dumpall do anything at all with
> pg_group!  For that matter, a full dumpall/reload won't preserve
> groups either!
> 
> I griped about that a week or so ago, but no one seems to have picked up
> on it.  Do you want to consider that a "must fix" problem as well?
> I think it's a pretty minor fix, but considering how late we are in the
> cycle...

Please give me something to add to the TODO.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> I griped about that a week or so ago, but no one seems to have picked up
>> on it.  Do you want to consider that a "must fix" problem as well?
>> I think it's a pretty minor fix, but considering how late we are in the
>> cycle...

> Please give me something to add to the TODO.

Try "DONE" ;-)
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Thomas Lockhart
Date:
> Hiroshi did commit a change to doc/src/sgml/ref/postgres-ref.sgml for
> that just a few days ago, but I guess it didn't make it into the derived
> files that Thomas was preparing off-line.  I believe there were several
> other small doco fixes while Thomas was on vacation --- I dunno if he
> plans to try to get those into the 7.0 derived docs or not.

I'll move a new copy of the html into users-lounge before release, and
will try poking at the hardcopy versions too (but I consider the
hardcopy fixes noncritical since we are so far past the "freeze"
point).

Depends on the actual release schedule as to whether they will get in
to the hardcopy; the pg_group stuff doesn't open a large can of worms,
but we might want a few days for it to settle out??

btw, I'm all for fixing pg_group (it is really nonoptional imho) and
the ODBC stuff is noncritical since loading the contrib/odbc/odbc.sql
is sufficient to get the functions or operators defined since the
support code is already in the backend.
                       - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: pg_group_name_index corrupt?

From
Bruce Momjian
Date:
> The Hermit Hacker <scrappy@hub.org> writes:
> >> I griped about that a week or so ago, but no one seems to have picked up
> >> on it.  Do you want to consider that a "must fix" problem as well?
> >> I think it's a pretty minor fix, but considering how late we are in the
> >> cycle...
> 
> > considering where the problem is, I think that if it can be safely done,
> > please do it ...
> 
> Done and done.  I also realized that pg_upgrade had another nasty bug
> in it: the VACUUMs were not necessarily executed as superuser, but as
> whichever user happened to own the item dumped last by pg_dump in each
> database.  That would result in VACUUM skipping over tables it thought
> it didn't have permission to vacuum --- like, say, all the system
> tables.  Perhaps this explains the failures that some people have
> reported.
> 
> Another day, another bug swatted ...

Yes, good eye.  Thanks.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_group_name_index corrupt?

From
The Hermit Hacker
Date:
On Fri, 5 May 2000, Thomas Lockhart wrote:

> > Hiroshi did commit a change to doc/src/sgml/ref/postgres-ref.sgml for
> > that just a few days ago, but I guess it didn't make it into the derived
> > files that Thomas was preparing off-line.  I believe there were several
> > other small doco fixes while Thomas was on vacation --- I dunno if he
> > plans to try to get those into the 7.0 derived docs or not.
> 
> I'll move a new copy of the html into users-lounge before release, and
> will try poking at the hardcopy versions too (but I consider the
> hardcopy fixes noncritical since we are so far past the "freeze"
> point).
> 
> Depends on the actual release schedule as to whether they will get in
> to the hardcopy; the pg_group stuff doesn't open a large can of worms,
> but we might want a few days for it to settle out??

I have no probs with extending the release a couple of days ... Wednesday
sound good for everyone, as things stand now?

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: pg_group_name_index corrupt?

From
Bruce Momjian
Date:
> 
> okay, something that I think needs to be clarified ... RC5 requires an
> initdb, so you have to do a pg_dumpall first, then initdb, then reload ...
> 
> your recent fixes ... should we be running pg_dumpall from RC5 on our RC4
> databases, or does it not matter?  I'm using the RC5 one right now, and
> all appears correct, but I figured I'd ask ...

Doesn't matter, except that RC5 dumps groups properly. 


--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Done and done.  I also realized that pg_upgrade had another nasty bug
>> in it: the VACUUMs were not necessarily executed as superuser, but as
>> whichever user happened to own the item dumped last by pg_dump in each
>> database.  That would result in VACUUM skipping over tables it thought
>> it didn't have permission to vacuum --- like, say, all the system
>> tables.  Perhaps this explains the failures that some people have
>> reported.
>> 
>> Another day, another bug swatted ...

> Yes, good eye.  Thanks.

BTW, I believe that this was actually a bug I created --- in prior
releases VACUUM would happily vacuum anything in sight, whether you
had privileges to lock the target tables or not.  So pg_upgrade didn't
have to worry before.  But it does now...
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> btw, I'm all for fixing pg_group (it is really nonoptional imho) and
> the ODBC stuff is noncritical since loading the contrib/odbc/odbc.sql
> is sufficient to get the functions or operators defined since the
> support code is already in the backend.

OK, I thought you might want to argue more strongly for adding the
ODBC stuff while we had the chance ... but if you are happy with the
status quo then I'm happy.
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Thomas Lockhart
Date:
> OK, I thought you might want to argue more strongly for adding the
> ODBC stuff while we had the chance ... but if you are happy with the
> status quo then I'm happy.

No, I've already been embarrassed once recently; don't want to touch
anything in the last few days. Especially since I've been *so* busy at
work and with the docs that it will be hard to give it enough
attention (even though in principle it is pretty trivial).
                      - Tom

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: pg_group_name_index corrupt?

From
Tatsuo Ishii
Date:
> I have no probs with extending the release a couple of days ... Wednesday
> sound good for everyone, as things stand now?

Sigh... we are going to make a free commemoration CD-ROM (yes, we are
going to have a general meeting of local PostgreSQL user's group in
Japan. It's almost one year since we established it. Visit
http://www.jp.postgresql.org/ if you can read Japanese) with the brand
new PostgreSQL 7.0 source in it of course, and the deadline to make a
master CD-R is scheduled to 5/8...
--
Tatsuo Ishii


Re: pg_group_name_index corrupt?

From
Bruce Momjian
Date:
> > Hiroshi did commit a change to doc/src/sgml/ref/postgres-ref.sgml for
> > that just a few days ago, but I guess it didn't make it into the derived
> > files that Thomas was preparing off-line.  I believe there were several
> > other small doco fixes while Thomas was on vacation --- I dunno if he
> > plans to try to get those into the 7.0 derived docs or not.
> 
> I'll move a new copy of the html into users-lounge before release, and
> will try poking at the hardcopy versions too (but I consider the
> hardcopy fixes noncritical since we are so far past the "freeze"
> point).

Thomas, do you want PDF's rather than postscript files?  Let me know.  I
can easily download them and generate PDF's.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
Peter Eisentraut <e99re41@DoCS.UU.SE> writes:
>> After further looking, I notice that users.c is one of the few places
>> that will drop AccessExclusiveLock at heap_close time rather than
>> holding it till xact commit.  I wonder whether this is a bug...

> Well it was you who did it while introducing that second argument in the
> first place. I think the safest thing to do is definitely to hold any lock
> until transaction end. I'm not sure, shouldn't the transaction isolation
> level apply here as well? In the end, is there ever a good reason for
> releasing a lock in heap_close?

I provided that hook because of stuff like the LISTEN/NOTIFY support,
which grabs and releases locks on pg_listener --- as it's presently
designed, not releasing the lock at end of statement could hang the
entire system until you commit your transaction.  (Probably
LISTEN/NOTIFY should be re-examined to see if it couldn't use a less
drastic lock than AccessExclusiveLock.  That code hasn't been gone over
since before MVCC.)  Also, I still think that in general it's OK to
release a read lock on a system table early.  Write locks, maybe not.

>> I also notice that there definitely is a glaring bug there:
>> write_password_file() leaks one kernel file descriptor each time it runs
>> (note the creat() call).

> Wow, this has been there for over two years.

Yeah, a long time :-(.  It's not the first resource leak we've found
in the password-related code, too.  I wonder if there are more...
        regards, tom lane


RE: pg_group_name_index corrupt?

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: lockhart@sd.tpf.co.jp [mailto:lockhart@sd.tpf.co.jp]On
>
> > Hiroshi did commit a change to doc/src/sgml/ref/postgres-ref.sgml for
> > that just a few days ago, but I guess it didn't make it into the derived
> > files that Thomas was preparing off-line.  I believe there were several
> > other small doco fixes while Thomas was on vacation --- I dunno if he
> > plans to try to get those into the 7.0 derived docs or not.
>
> I'll move a new copy of the html into users-lounge before release, and
> will try poking at the hardcopy versions too (but I consider the
> hardcopy fixes noncritical since we are so far past the "freeze"
> point).
>

I'm sorry to inconveniece you.
In addition I have to mention that I gave up to confirm sgml format myself.
I apologize again that I have inconvenienced you.

Thanks.

Hiroshi inoue
Inoue@tpf.co.jp



RE: pg_group_name_index corrupt?

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> The Hermit Hacker <scrappy@hub.org> writes:
> >> I griped about that a week or so ago, but no one seems to have
> picked up
> >> on it.  Do you want to consider that a "must fix" problem as well?
> >> I think it's a pretty minor fix, but considering how late we are in the
> >> cycle...
>
> > considering where the problem is, I think that if it can be safely done,
> > please do it ...
>
> Done and done.  I also realized that pg_upgrade had another nasty bug
> in it: the VACUUMs were not necessarily executed as superuser, but as
> whichever user happened to own the item dumped last by pg_dump in each
> database.  That would result in VACUUM skipping over tables it thought
> it didn't have permission to vacuum --- like, say, all the system
> tables.  Perhaps this explains the failures that some people have
> reported.
>
> Another day, another bug swatted ...
>

If I remember correctly,pg_upgrade doesn't shutdown the postmaster
after(or before) moving OLD data to the target dir though it tells us
the message "You must stop/start the postmaster ...".
How about calling pg_ctl from pg_upgrade to stop the postmaster ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: pg_group_name_index corrupt?

From
Thomas Lockhart
Date:
> I'm sorry to inconveniece you.
> In addition I have to mention that I gave up to confirm sgml format myself.
> I apologize again that I have inconvenienced you.

No, it is better to have the right info *somewhere* than to wait to
put it in later. Even if I miss something now, it will be correct next
time, which is the best thing.

Regards.
                     - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> If I remember correctly,pg_upgrade doesn't shutdown the postmaster
> after(or before) moving OLD data to the target dir though it tells us
> the message "You must stop/start the postmaster ...".
> How about calling pg_ctl from pg_upgrade to stop the postmaster ?

What I would actually like to see happen is that pg_upgrade uses a
standalone backend, with no postmaster running at all for the entire
procedure.  Having a live postmaster connected to the system just
opens the door to getting screwed up by some other user connecting to
the database.  But that's a bigger change than I dare try to make right
now...
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Bruce Momjian
Date:
> If I remember correctly,pg_upgrade doesn't shutdown the postmaster
> after(or before) moving OLD data to the target dir though it tells us
> the message "You must stop/start the postmaster ...".
> How about calling pg_ctl from pg_upgrade to stop the postmaster ?

Added to TODO list.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_group_name_index corrupt?

From
Bruce Momjian
Date:
> If I remember correctly,pg_upgrade doesn't shutdown the postmaster
> after(or before) moving OLD data to the target dir though it tells us
> the message "You must stop/start the postmaster ...".
> How about calling pg_ctl from pg_upgrade to stop the postmaster ?

Great idea.  We never had an automated way to do that before.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> How about calling pg_ctl from pg_upgrade to stop the postmaster ?

> Great idea.  We never had an automated way to do that before.

I'd just as soon *not* have pg_upgrade assuming that it knows how
I like my postmaster started.  If pg_ctl will work to stop a postmaster
whether or not it was started with pg_ctl, then it'd probably be a good
idea to use pg_ctl to ensure that there is no postmaster running in the
target database.  I'd suggest not trying to restart the postmaster
automatically afterwards, though.  Too much site dependency in that.
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> How about calling pg_ctl from pg_upgrade to stop the postmaster ?
> 
> > Great idea.  We never had an automated way to do that before.
> 
> I'd just as soon *not* have pg_upgrade assuming that it knows how
> I like my postmaster started.  If pg_ctl will work to stop a postmaster
> whether or not it was started with pg_ctl, then it'd probably be a good
> idea to use pg_ctl to ensure that there is no postmaster running in the
> target database.  I'd suggest not trying to restart the postmaster
> automatically afterwards, though.  Too much site dependency in that.

But doesn't it find the params from the last startup?

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> I'd suggest not trying to restart the postmaster
>> automatically afterwards, though.  Too much site dependency in that.

> But doesn't it find the params from the last startup?

What last startup?  This may be a virgin database we're talking about
(probably would be, if I get my way about not using a postmaster at
all during pg_upgrade).  More to the point, people who are using
system-boot-time scripts to start postgres may expect their postmasters
to be started in a different environment than what pg_ctl would produce.
(Just because pg_ctl is available is not a good reason to assume that
people are using it, particularly not existing dbadmins who will have
developed their own procedures.)

The environment issue is potentially a pretty nasty gotcha; you'll
recall the problem reports we've heard in the past that turned out to
trace to different settings of LOCALE or whathaveyou between postmasters
started by hand and postmasters started by scripts.  Also, IIRC, pg_ctl
doesn't currently support sending the postmaster log anywhere but
/dev/null, which will annoy at least some people ;-).  One might also
guess that some sites run their postmasters with higher or lower process
priority than normal, or several other things that pg_ctl knows nothing
about.

So I think it's not really a good idea to wire use of pg_ctl into other
tools just yet.  Maybe after pg_ctl has been around for a few releases...

As I said, I see no harm in using pg_ctl to *stop* a postmaster, if it
can do that.  I just don't want to have it used automatically to
*start* one.
        regards, tom lane


Re: pg_group_name_index corrupt?

From
Tatsuo Ishii
Date:
> What last startup?  This may be a virgin database we're talking about
> (probably would be, if I get my way about not using a postmaster at
> all during pg_upgrade).  More to the point, people who are using
> system-boot-time scripts to start postgres may expect their postmasters
> to be started in a different environment than what pg_ctl would produce.
> (Just because pg_ctl is available is not a good reason to assume that
> people are using it, particularly not existing dbadmins who will have
> developed their own procedures.)
> 
> The environment issue is potentially a pretty nasty gotcha; you'll
> recall the problem reports we've heard in the past that turned out to
> trace to different settings of LOCALE or whathaveyou between postmasters
> started by hand and postmasters started by scripts.  Also, IIRC, pg_ctl
> doesn't currently support sending the postmaster log anywhere but
> /dev/null, which will annoy at least some people ;-).  One might also
> guess that some sites run their postmasters with higher or lower process
> priority than normal, or several other things that pg_ctl knows nothing
> about.

Tom is right except that pg_ctl sends postmaster log to
/dev/null. pg_ctl actulally does nothing with it. So if postmaster
sends log to stdout, then it is sent to stdout. If postmaster is
invoked with -S option, logs are sent to /dev/null. That all.

BTW, currently the best way to get logs from postmaster woul be
compiling it with USE_SYSLOG (why is this not enabled by configure?),
and sending log to syslog, IMHO. Sending logs to a plain file is just
too dumb, like infinite log file size, to rotate log we have to stop
postmaster...

> So I think it's not really a good idea to wire use of pg_ctl into other
> tools just yet.  Maybe after pg_ctl has been around for a few releases...

Maybe the way we should go is that pg_ctl sends signal to postmaster
and let it restart by itself. This would solve part of the problems
Tom mentioned, at least environment variables ones.

(BTW is it really possible for postmaster to remember its process
priority?)

> As I said, I see no harm in using pg_ctl to *stop* a postmaster, if it
> can do that.

Definitely it can, if the postmaster is 7.0 or later.
--
Tatsuo Ishii


Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> (BTW is it really possible for postmaster to remember its process
> priority?)

No, but a site might have a startup script that nice's the postmaster
to a desired priority.

> BTW, currently the best way to get logs from postmaster woul be
> compiling it with USE_SYSLOG (why is this not enabled by configure?),

Good question.  Before we standardize on that, however, some testing
might be in order.  I haven't tried stuffing multimegabyte querytree
dumps into syslog ... will it work?  On how many platforms?  The syslog
applications I've seen never write more than a hundred or so characters
per log entry, so I'm a tad nervous about assuming that we will get
reasonable behavior for large messages...
        regards, tom lane


RE: pg_group_name_index corrupt?

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> 
> > If I remember correctly,pg_upgrade doesn't shutdown the postmaster
> > after(or before) moving OLD data to the target dir though it tells us
> > the message "You must stop/start the postmaster ...".
> > How about calling pg_ctl from pg_upgrade to stop the postmaster ?
> 
> Added to TODO list.
>

Hmm,what I meant is to simply insert a command call 'pg_ctl -w stop'
before moving old data to the target directory in pg_upgrade.
It seems dangerous to leave the postmaster alive that doesn't
know the changes on empty tables created by pg_upgrade.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: pg_group_name_index corrupt?

From
Peter Eisentraut
Date:
Tom Lane writes:

> > BTW, currently the best way to get logs from postmaster woul be
> > compiling it with USE_SYSLOG (why is this not enabled by configure?),

An --enable-syslog switch will be in the next release unless someone
protests.

> Good question.  Before we standardize on that, however, some testing
> might be in order.  I haven't tried stuffing multimegabyte querytree
> dumps into syslog ... will it work?  On how many platforms?  The
> syslog applications I've seen never write more than a hundred or so
> characters per log entry, so I'm a tad nervous about assuming that we
> will get reasonable behavior for large messages...

Indeed. My syslog becomes a little erratic with messages longer that half
a screen or so. Nothing critical but it splits up the message in separate
chunks together with `Cannot glue message parts together' entries. It also
sends out an empty log message to everyone's terminal. I recall that while
doing some work for getting exim acquainted with syslog we have had the
same problems with long messages. Also, I bet this syslog implementation
(claims to be derived from "stock BSD") is very widespread.

It's debatable whether these sort of things are appropriate for syslog,
though. Who cares about the parse tree the next day?

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: pg_group_name_index corrupt?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> An --enable-syslog switch will be in the next release unless someone
> protests.

Seems reasonable.

>> Good question.  Before we standardize on that, however, some testing
>> might be in order.  I haven't tried stuffing multimegabyte querytree
>> dumps into syslog ... will it work?  On how many platforms?

> It's debatable whether these sort of things are appropriate for syslog,
> though. Who cares about the parse tree the next day?

Yeah.  We really need multiple log destinations to do this right.
Tim Holloway had some ideas about that last year, which I thought were
overkill at the time, but I'm beginning to see his point...
        regards, tom lane