Thread: Possible bug with shared memory buffers

Possible bug with shared memory buffers

From
Mark Rae
Date:
Hi,

I have been playing with with a number of large databases in postgres
(several GB in size), over the last 6 months or so, and have
occasionally noticed behaviour which might indicate a possible
problem with the shared memory.

I initially noticed it when using v7.1.2, but have recently been
trying the 7.2b versions which seem to have the same/similar problem.
(My setup is a dual Athlon1800+ with 1GB of memory, 512M of which
is dedicated to postgres shm buffers)


Case 1: This was with v7.1.2/3 on FreeBSD
I would create a database and load some data (2-3Gb) into
it using Perl::DBI. I would then make some changes to the loading
programs, 'DROP' the database, recreate it and reload the data to
measure the effects of the changes.

Normally the backend process would 'swap in' all 512M of shared
memory when loading, but occasionally after dropping the previous
database, the new backend would only seem to be able to use a
small amount of the shared memory. Restarting postgres would
clear the problem.


Case 2: This is with 7.2b4 on Linux
I created one database as before and everything works as expected,
however I then created a second database without dropping the original,
and it also seemed to be working as expected.
However as it grew in size, it slowed down significantly (20-30 times slower),
as though it could not get enough memory for the indexes (The largest of
which can be 0.5G-1G in size).
Although this time the backend process appeared to have the whole 512M
'swapped in'. Again restarting postgres cleared the problem.


Unfortunately the problem does not appear to be reliably repeatable,
so I am just guessing that the problem might be related to the SHM buffers
being 'leaked' or becoming permanently assigned to a specific
database, even when they are no longer needed.


Has anyone else seen similar behaviour, or is there any way to
check how postgres has allocated the buffers to help pinpoint
the problem?


    -Mark

--
Mark Rae                                       Tel: +44(0)20 7074 4648
Inpharmatica                                   Fax: +44(0)20 7074 4700
m.rae@inpharmatica.co.uk                http://www.inpharmatica.co.uk/

Re: Possible bug with shared memory buffers

From
"P.J. \"Josh\" Rovero"
Date:
I believe I am seeing a similar problem in 7.2b4 under Linux.
Large and complex queries work fine after the postmaster is
first started up.  The same queries fail days/weeks later.

Typically psql unexpectedly disconnects while running the
queries.  Restarting psql will not clear the problem, but
stopping/restarting the postmaster will.

Mark Rae wrote:


>
> Case 2: This is with 7.2b4 on Linux
> I created one database as before and everything works as expected,
> however I then created a second database without dropping the original,
> and it also seemed to be working as expected.
> However as it grew in size, it slowed down significantly (20-30 times slower),
> as though it could not get enough memory for the indexes (The largest of
> which can be 0.5G-1G in size).
> Although this time the backend process appeared to have the whole 512M
> 'swapped in'. Again restarting postgres cleared the problem.
>
>
> Unfortunately the problem does not appear to be reliably repeatable,
> so I am just guessing that the problem might be related to the SHM buffers
> being 'leaked' or becoming permanently assigned to a specific
> database, even when they are no longer needed.


--
P. J. "Josh" Rovero                                 Sonalysts, Inc.
Email: rovero@sonalysts.com    www.sonalysts.com    215 Parkway North
Work: (860)326-3671 or 442-4355                     Waterford CT 06385
***********************************************************************


Re: Possible bug with shared memory buffers

From
Tom Lane
Date:
Mark Rae <m.rae@inpharmatica.co.uk> writes:
> Normally the backend process would 'swap in' all 512M of shared
> memory when loading, but occasionally after dropping the previous
> database, the new backend would only seem to be able to use a
> small amount of the shared memory.

This is so vague that I'm not even sure what you're complaining about.
What do you mean by "only seem to be able to use a small amount of the
shared memory"?

            regards, tom lane

Re: Possible bug with shared memory buffers

From
Tom Lane
Date:
"P.J. \"Josh\" Rovero" <rovero@sonalysts.com> writes:
> I believe I am seeing a similar problem in 7.2b4 under Linux.
> Large and complex queries work fine after the postmaster is
> first started up.  The same queries fail days/weeks later.

> Typically psql unexpectedly disconnects while running the
> queries.

That would suggest a core dump.  Have you looked at the postmaster log
and/or tried to debug the corefiles to follow up on these failures?

If you're going to run beta software then you should expect to expend
some effort helping to identify and fix its bugs ...

            regards, tom lane

Re: Possible bug with shared memory buffers

From
Mark Rae
Date:
Tom Lane wrote:
>
> Mark Rae <m.rae@inpharmatica.co.uk> writes:
> > Normally the backend process would 'swap in' all 512M of shared
> > memory when loading, but occasionally after dropping the previous
> > database, the new backend would only seem to be able to use a
> > small amount of the shared memory.
>
> This is so vague that I'm not even sure what you're complaining about.
> What do you mean by "only seem to be able to use a small amount of the
> shared memory"?
>

I was referring to the memory usage as reported by top,ps etc.

Normally as the data is loaded, the RSS,SIZE,SHARE values for the
backend would grow towards ~512M.
However when the problem occurs the memory usage would not grow as normal
but get 'stuck' at some lower value (about 10M or so).


In 'Case 2' which I described, the process size was reporting
the use of all 512M, but it still slowed down in the same way.

i.e. the speed of loading was as expected, until the database
grew above a certain size, at which point the performance dropped
drastically. And at that point the whole database (including
indexes) certainly couldn't have been more than about 100M in size.


Also for information, the machine is dedicated machine, with nothing
else running except the loading process which is about 5M. So it
couldn't have been anything else hogging the memory.

    -Mark

--
Mark Rae                                       Tel: +44(0)20 7074 4648
Inpharmatica                                   Fax: +44(0)20 7074 4700
m.rae@inpharmatica.co.uk                http://www.inpharmatica.co.uk/

Re: Possible bug with shared memory buffers

From
Bruce Momjian
Date:
Mark Rae wrote:
> Tom Lane wrote:
> >
> > Mark Rae <m.rae@inpharmatica.co.uk> writes:
> > > Normally the backend process would 'swap in' all 512M of shared
> > > memory when loading, but occasionally after dropping the previous
> > > database, the new backend would only seem to be able to use a
> > > small amount of the shared memory.
> >
> > This is so vague that I'm not even sure what you're complaining about.
> > What do you mean by "only seem to be able to use a small amount of the
> > shared memory"?
> >
>
> I was referring to the memory usage as reported by top,ps etc.
>
> Normally as the data is loaded, the RSS,SIZE,SHARE values for the
> backend would grow towards ~512M.
> However when the problem occurs the memory usage would not grow as normal
> but get 'stuck' at some lower value (about 10M or so).

OK, let me jump in here.  First, I am unsure how the various OS's
determine memory usage when using shared memory.  Do they report the
512mb shared buffers for each backend, none of them, one of them, or
spread the 512mb evenly among all attached backends?

Second, the idea of shared memory being paged out is not attractive to
me.  On FreeBSD, I believe sysctl will allow you to control if the
shared memory is paged out.  Not sure about Linux.  I recommend turning
off shared memory paging and see what happens.  I can imagine
performance taking a major hit if any shared memory is not in RAM.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Possible bug with shared memory buffers

From
Mark Rae
Date:
Bruce Momjian wrote:
> > > Mark Rae <m.rae@inpharmatica.co.uk> writes:
> > > > Normally the backend process would 'swap in' all 512M of shared
> > > > memory when loading, but occasionally after dropping the previous
>
> OK, let me jump in here.  First, I am unsure how the various OS's
> determine memory usage when using shared memory.  Do they report the
> 512mb shared buffers for each backend, none of them, one of them, or
> spread the 512mb evenly among all attached backends?
>
> Second, the idea of shared memory being paged out is not attractive to
> me.  On FreeBSD, I believe sysctl will allow you to control if the
> shared memory is paged out.  Not sure about Linux.  I recommend turning
> off shared memory paging and see what happens.  I can imagine
> performance taking a major hit if any shared memory is not in RAM.

Firstly, I should apologise for causing a bit of confusion, I meant to
say 'mapped' not 'swapped'.
The actual shared memory area was resident in memory, It was just that
the backend process stats were reporting less than this was mapped into
the address space of the process, implying that it was not being used
like it normally would be.

Having reread what I wrote, I seem to have obscured the facts
with my speculation as to the cause of the problem. So I shall
have another go.

The only things I know for certain are that loading processes which
I have run many times before, have very occasionally slowed down by a
factor of 20 or more as the database size has grown, but were behaving
as expected when small. This happened 3 or 4 times with 7.1.2 a few
months ago, and most recently last week with 7.2b4
Restarting the loading, which means a new backend process, does not
fix the problem, however restarting the database does.

With v7.1.2 on FreeBSD I also noticed that the backend was not using
all of the shared memory that it potentially had available to it, as
it normally does. With 7.2b4 on Linux I did not see this effect with
the memory, but the behaviour was otherwise the same.

I realise that this is not much to go on, and unfortunately quite
rare and unpredictable so I can't really narrow it down any further
at the moment.
But such a drastic slowdown is quite worrying, so what I was really
hoping for is some advice on the best way of figuring out what is
going on when it happens again.

    -Mark

--
Mark Rae                                       Tel: +44(0)20 7074 4648
Inpharmatica                                   Fax: +44(0)20 7074 4700
m.rae@inpharmatica.co.uk                http://www.inpharmatica.co.uk/