Thread: Update on my 6.4.2 progress

Update on my 6.4.2 progress

From: Wayne Piekarski
Hi,

Just thought I'd drop an email again - I was the one having never-ending
trouble with 6.4.2 jamming up with backends stuck waiting, and other kinds
of problems.

Although I'm still using 6.4.2, I hope this will still be helpful to
the developers in case it affects things in 6.5.

We installed Tom Lane's shared memory patches, which I emailed about
earlier, and they helped a bit, but unfortunately we still get backends
stuck waiting, even today.

The interesting thing is, we went and put in another 128 MB of RAM (from
256 to 384 now) and recompiled the kernel with more semaphores and shared
memory, and the improvement was incredible! Before, we would get semget
failures every so often when we had about 50 backends going, causing the
whole thing to fall over, but now we get
"fmgr_info: function 111257088: cache lookup failed"
after 64 backends (which is what we compiled Postgres for), which I
assume isn't so fatal, and the whole system keeps running.

For three days after our little upgrade, the whole thing ran smoothly;
then we ran into the problem of the stuck waiting backends again. We
thought the problem was gone, but it was still there. What would happen is
that a backend would get stuck, cause others to get stuck, and the postgres
processes would just build up until they hit 64; then we'd have to kill
them off and everything would be OK again. At least now the number of
problems has decreased slightly.

One interesting message we got during this problem was:
NOTICE:  LockRelease: locktable lookup failed, no lock 

It seems as though the backends are waiting for a lock that got deleted
accidentally, although I have no idea how the code works, so I can't offer
any advice on where.

Lately, though, the problems are happening more frequently, and every
so often we still get the BTP_CHAIN problems with tables (which I sent
another email about fixing), so I need to sort this out.


One thing I was disappointed with: after adding the extra 128 MB of RAM,
I was hoping it would be used for disk caching, but when performing
repeated select queries on tables, where I did something like:

select sum(some_value) from some_table;

The result took the same amount of time to run each time and was not
cached at all (the table was about 100 MB), and when doing the query, our
RAID controller would just light up, which I wanted to avoid. After seeing
this, I read posts on the hackers list where people were talking about
fsync'ing pg_log to note down whether things had been committed or not.

The table I was testing was totally read-only, with no modifications being
made. However, another table gets almost continuous changes 24 hours a
day, more than one per second. Would this be causing the machine to
continuously flush pg_log to disk, and cause my read-only tables to still
not be cached?

I guess my next question is, can I comment out the fsync call? <grin> With
the disks performing more efficient updates, the whole thing would run
faster and run less risk of crashing. Currently, the performance can be
quite bad at times when the machine is doing lots of disk activity,
because even the simplest read-only queries block, since they aren't
cached.

Would moving pg_log to a second disk make a difference? Are there other
important files like pg_log which should go onto separate disks as well? I
have no problem with using multiple disks, but it was only recently that I
discovered this fsyncing of pg_log. Is pg_log more speed- and
fsync-critical than the actual data itself? I have two RAID controllers, a
slow one and a fast one, and I want to move pg_log to one of them, but I'm
not sure which.
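
The move itself looks straightforward - a minimal sketch, assuming a stock
data directory layout (paths illustrative; stop the postmaster first):

    # relocate pg_log onto the fast RAID and symlink it back into place
    mv /usr/local/pgsql/data/pg_log /fastraid/pg_log
    ln -s /fastraid/pg_log /usr/local/pgsql/data/pg_log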


So, in summary, I've learned that if you are having trouble, put in more
memory (even if you have some free) and increase your kernel's internal
limits for semaphores and shared memory to really large values, even
when Postgres isn't complaining. It makes a difference for some reason,
and everything was a lot happier.
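
For reference, the limits in question are the usual SysV IPC ones. How you
raise them is OS-specific - we had to recompile our kernel - but on Linux
kernels that expose them through /proc, it looks something like this
(values illustrative, not a recommendation):

    # max shared memory segment size, in bytes
    echo 33554432 > /proc/sys/kernel/shmmax
    # semaphore limits, in the order SEMMSL SEMMNS SEMOPM SEMMNI
    cat /proc/sys/kernel/sem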

BTP_CHAIN and the backends-waiting problem are still occurring, although I
cannot build a test case for either of them; they are very much problems
which occur sporadically and at random times.


thanks again,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au


Re: [HACKERS] Update on my 6.4.2 progress

From: Hannu Krosing
Wayne Piekarski wrote:
> 
> I guess my next question is, can I comment out the fsync call?

If you are confident in your OS and hardware, you can
pass the -F flag to the backend and no fsyncs are done.

(add -o '-F' to the postmaster startup line)
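
For example, a full startup line might look like this (path and other
switches illustrative):

    postmaster -i -B 256 -D /usr/local/pgsql/data -o '-F' >server.log 2>&1 &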

I think it is in some FAQ too.

--------------
Hannu


Re: [HACKERS] Update on my 6.4.2 progress

From: Bruce Momjian
> The interesting thing is, we went and put in another 128 MB of RAM (from
> 256 to 384 now) and recompiled the kernel with more semaphores and shared
> memory, and the improvement was incredible! Before, we would get semget
> failures every so often when we had about 50 backends going, causing the
> whole thing to fall over, but now we get
> "fmgr_info: function 111257088: cache lookup failed"
> after 64 backends (which is what we compiled Postgres for), which I
> assume isn't so fatal, and the whole system keeps running.

The 6.4.2 code would not allocate all shared memory/semaphores at
startup, and would only fail once you got up to a large number of
backends.  6.5 fixes this by allocating it all on startup.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Update on my 6.4.2 progress

From: Tom Lane
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> The 6.4.2 code would not allocate all shared memory/semaphores at
> startup, and would only fail once you got up to a large number of
> backends.  6.5 fixes this by allocating it all on startup.

Also, I don't think 6.4.* actually tested for an attempt to start one
too many backends; it'd just do it and eventually you'd get a failure
downstream somewhere.  (A failure *will* happen, because there are
fixed-size arrays containing per-backend entries, but I think the code
failed to notice ...)

There is now code in the postmaster that prevents starting that fatal
65th (or whatever) backend.  If you want to keep running 6.4.2 you
should consider adopting CountChildren() and the code that calls it
from 6.5's src/backend/postmaster/postmaster.c.
        regards, tom lane


Re: [HACKERS] Update on my 6.4.2 progress

From: Wayne Piekarski
> Wayne Piekarski wrote:
> > 
> > I guess my next question is, can I comment out the fsync call?
> 
> If you are confident in your OS and hardware, you can 
> pass the -F flag to the backend and no fsyncs are done.
> 
> (add -o '-F' to the postmaster startup line)
> 
> I think it is in some FAQ too.

I already have the -o -F switch in the startup file (which I believe is
working), but I'm under the impression from what I read that there are two
fsyncs - one you can switch off, and one which is fixed into the code
and possibly can't be removed?

Regards,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au


Re: [HACKERS] Update on my 6.4.2 progress

From: Tom Lane
Wayne Piekarski <wayne@senet.com.au> writes:
> I already have the -o -F switch in the startup file (which I believe is
> working), but I'm under the impression from what I read that there are two
> fsyncs - one you can switch off, and one which is fixed into the code
> and possibly can't be removed?

No.  I've looked.

Actually there is an un-disablable fsync() on the error file in elog.c,
but it's not invoked under ordinary scenarios as far as I can tell,
and it shouldn't be a performance bottleneck anyway.  *All* the ordinary
uses of fsync go through pg_fsync.
        regards, tom lane


Re: [HACKERS] Update on my 6.4.2 progress

From: Wayne Piekarski
> > The interesting thing is, we went and put in another 128 MB of RAM (from
> > 256 to 384 now) and recompiled the kernel with more semaphores and shared
> > memory, and the improvement was incredible! Before, we would get semget
> > failures every so often when we had about 50 backends going, causing the
> > whole thing to fall over, but now we get
> > "fmgr_info: function 111257088: cache lookup failed"
> > after 64 backends (which is what we compiled Postgres for), which I
> > assume isn't so fatal, and the whole system keeps running.
> 
> The 6.4.2 code would not allocate all shared memory/semaphores at
> startup, and would only fail once you got up to a large number of
> backends.  6.5 fixes this by allocating it all on startup.

OK, that's cool... One question, though: is the 'cache lookup failed'
message really bad, or is it a cryptic way of saying the connection was
refused but everything else is fine? I have no problem with the fact that
the connection failed, but does it cause corruption, or cause Postgres to
fall over later on? I.e., if you get a semget failure, shortly afterwards
the whole thing will die, possibly causing data corruption or something.
Would these kinds of errors cause BTP_CHAIN errors, or is that totally
unrelated?

As another general question: if I randomly kill Postgres backends in the
middle of transactions, is there a possibility of corruption, or is it
safe due to the way transactions are committed, etc.? I've always been
very nervous when it comes to killing backends, as I was worried something
might go wrong, leaving something out of sync.

thanks,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au


Re: [HACKERS] Update on my 6.4.2 progress

From: Wayne Piekarski
> Wayne Piekarski <wayne@senet.com.au> writes:
> > I already have the -o -F switch in the startup file (which I believe is
> > working), but I'm under the impression from what I read that there are two
> > fsyncs - one you can switch off, and one which is fixed into the code
> > and possibly can't be removed?
> 
> No.  I've looked.
> 
> Actually there is an un-disablable fsync() on the error file in elog.c,
> but it's not invoked under ordinary scenarios as far as I can tell,
> and it shouldn't be a performance bottleneck anyway.  *All* the ordinary
> uses of fsync go through pg_fsync.

I had a dig through the source code yesterday and saw the same thing:
every call is controlled by -F. However, I did mess up when I wrote my
previous email, because I don't have -F enabled right now, so I am
running with fsync() turned on, which makes sense and explains what is
happening with the cache.

After reading the mailing list I was under the impression that this
fsyncing was different from the one controlled by -F.

I am going to take it for a test tonight with -F enabled to observe
how much better the performance is. Hopefully it will cache better as a
result; I guess I'll have to run it like this from now on.

thanks,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au


Re: [HACKERS] Update on my 6.4.2 progress

From: Wayne Piekarski
> > > I guess my next question is, can I comment out the fsync call?
> > 
> > If you are confident in your OS and hardware, you can 
> > pass the -F flag to the backend and no fsyncs are done.
> > 
> > (add -o '-F' to the postmaster startup line)
> > 
> > I think it is in some FAQ too.
> 
> I already have the -o -F switch in the startup file (which I believe is
> working), but I'm under the impression from what I read that there are two
> fsyncs - one you can switch off, and one which is fixed into the code
> and possibly can't be removed?

Eeeep! When I wrote the above, I was mistaken. My config file did not have
-o -F, which was why the fsyncs were occurring. Sorry for messing you
around here...

What I was concerned about was the lack of caching and the thrashing, but
I guess I can solve that with no fsync.

thanks,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au


Re: [HACKERS] Update on my 6.4.2 progress

From: Tom Lane
Wayne Piekarski <wayne@senet.com.au> writes:
>>>> whole thing to fall over, but now we get
>>>> "fmgr_info: function 111257088: cache lookup failed"
>>>> after 64 backends (which is what we compiled Postgres for), which I
>>>> assume isn't so fatal, and the whole system keeps running.

> ... One question, though: is the 'cache lookup failed' message
> really bad, or is it a cryptic way of saying the connection was refused
> but everything else is fine?

I'd put it in the "really bad" category, mainly because I don't see the
cause-and-effect chain.  It is *not* anything to do with connection
validation, that's for sure.  My guess is that the additional backend
has connected and is trying to make queries, and that queries are now
failing for some resource-exhaustion kind of reason.  But I don't know
why that would tend to show up as an fmgr_info failure before anything
else.  Do you use user-defined functions especially heavily in this
database?  For that matter, does the OID reported by fmgr_info actually
correspond to any row of pg_proc?
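
A quick way to check from the shell (database name illustrative):

    psql yourdb -c "SELECT oid, proname FROM pg_proc WHERE oid = 111257088"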

> As another general question: if I randomly kill Postgres backends in the
> middle of transactions, is there a possibility of corruption, or is it
> safe due to the way transactions are committed, etc.?

I'd regard it as very risky --- if that backend is in the middle of
modifying shared memory, you could leave shared memory data structures
and/or disk blocks in inconsistent states.  You could probably get away
with it for a backend that was blocked waiting for a lock.
        regards, tom lane


Re: [HACKERS] Update on my 6.4.2 progress

From: Wayne Piekarski
Hi,

> Wayne Piekarski <wayne@senet.com.au> writes:
> >>>> whole thing to fall over, but now we get
> >>>> "fmgr_info: function 111257088: cache lookup failed"
> >>>> after 64 backends (which is what we compiled Postgres for), which I
> >>>> assume isn't so fatal, and the whole system keeps running.
>
> failing for some resource-exhaustion kind of reason.  But I don't know
> why that would tend to show up as an fmgr_info failure before anything
> else.  Do you use user-defined functions especially heavily in this
> database?  For that matter, does the OID reported by fmgr_info actually
> correspond to any row of pg_proc?

I had a look, and there is no entry in pg_proc with any OID like the one
mentioned above. One very interesting thing is that we do use a ton of
user-defined functions (in C, PL/pgSQL, and SQL), as you asked, and that we
also had this problem a while back:

At midnight, we have a process called the vacuum manager, which drops the
indices on a table, vacuums it, and then recreates the indices. During
this time, we suspend the processes which could possibly do work, so they
sit there waiting for a lock file on disk to disappear, and then they
resume their work when the vacuum manager is finished.
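
In psql terms, the nightly sequence boils down to something like this
(table and index names illustrative):

    psql mydb -c "DROP INDEX some_table_idx"
    psql mydb -c "VACUUM ANALYZE some_table"
    psql mydb -c "CREATE INDEX some_table_idx ON some_table (some_field)"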

The interesting part is, when this one process would resume, it would die
inside a PL/pgSQL function. It would crash the backend with a message like
"ExecOpenR: relation == NULL, heap_open failed". I put in some extra code
to find the OID value, but the OID didn't exist in pg_proc. I think
somewhere internally Postgres had stored the OID of an index, and then
barfed when it tried to use that index later on.

To avoid backends crashing, we reconnected when the lock file was removed,
and this fixed the problem. However, I don't know why it happened at all;
it was really bizarre. The stranger part was that the query that died
would always be in a PL/pgSQL function - why is that? My next question is:
are user-defined functions bad in general? Could they cause locking
problems, crashes, etc., which might explain some of the massive problems
I'm having? [Still got problems with BTP_CHAIN and backends waiting - 6.4.2]

> > As another general question: if I randomly kill Postgres backends in the
> > middle of transactions, is there a possibility of corruption, or is it
> > safe due to the way transactions are committed, etc.?
> 
> I'd regard it as very risky --- if that backend is in the middle of
> modifying shared memory, you could leave shared memory data structures
> and/or disk blocks in inconsistent states.  You could probably get away
> with it for a backend that was blocked waiting for a lock.

Well, technically when a backend crashes, it kills all the other backends
as well, so this should avoid the shared memory corruption problems, right?

****

Also, I'm still having trouble with this BTP_CHAIN stuff ... I think I've
worked out how to reproduce it, but not well enough to write a script for
it.

Basically, if I have lots of writers and readers doing small pieces of
work, and then someone comes along with a huge read or write (i.e., a join
against a big table that takes ages), then all of a sudden queries that
try to do an update hit the BTP_CHAIN problem.

Apart from reloading the table, is there an easier way to fix up the
BTP_CHAIN problem? It takes ages to reload a 100 MB table :(
Vacuum fails with BlowawayRelationBuffers = -2 (as per my previous email).

This BTP_CHAIN stuff is really bad: I can't make it work reliably, and
while a table is dead it causes a million problems for the people who need
to use the DBMS.

****


thanks,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au