Thread: Possible bug with shared memory buffers
Hi,

I have been playing with a number of large databases in postgres (several
GB in size) over the last 6 months or so, and have occasionally noticed
behaviour which might indicate a problem with the shared memory. I
initially noticed it when using v7.1.2, but have recently been trying the
7.2b versions, which seem to have the same or a similar problem.

(My setup is a dual Athlon 1800+ with 1GB of memory, 512MB of which is
dedicated to postgres shm buffers.)

Case 1: This was with v7.1.2/3 on FreeBSD

I would create a database and load some data (2-3GB) into it using
Perl::DBI. I would then make some changes to the loading programs, DROP
the database, recreate it, and reload the data to measure the effects of
the changes. Normally the backend process would 'swap in' all 512M of
shared memory when loading, but occasionally, after dropping the previous
database, the new backend would only seem to be able to use a small
amount of the shared memory. Restarting postgres would clear the problem.

Case 2: This is with 7.2b4 on Linux

I created one database as before and everything worked as expected. I
then created a second database without dropping the original, and it also
seemed to be working as expected. However, as it grew in size, it slowed
down significantly (20-30 times slower), as though it could not get
enough memory for the indexes (the largest of which can be 0.5-1GB in
size). This time, though, the backend process appeared to have the whole
512M 'swapped in'. Again, restarting postgres cleared the problem.

Unfortunately the problem does not appear to be reliably repeatable, so I
am just guessing that it might be related to the SHM buffers being
'leaked' or becoming permanently assigned to a specific database, even
when they are no longer needed.

Has anyone else seen similar behaviour, or is there any way to check how
postgres has allocated the buffers, to help pinpoint the problem?

-Mark

--
Mark Rae                         Tel: +44(0)20 7074 4648
Inpharmatica                     Fax: +44(0)20 7074 4700
m.rae@inpharmatica.co.uk         http://www.inpharmatica.co.uk/
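On the question of checking how the buffers are allocated: since 7.x keeps
its shared buffers in a SysV shared memory segment, the segment itself can
be inspected directly with ipcs rather than inferred from process sizes in
top/ps. A rough sketch for Linux, parsing the segment size and attach count
out of "ipcs -m"; the script is illustrative only:

#!/usr/bin/env python
# Rough sketch (Linux, SysV shm): list the shared memory segments on the
# machine, so the segment created by the postmaster and the number of
# processes attached to it can be checked directly.
import subprocess

out = subprocess.run(["ipcs", "-m"], capture_output=True, text=True).stdout
for line in out.splitlines():
    fields = line.split()
    # Data rows in "ipcs -m" output: key shmid owner perms bytes nattch status
    if len(fields) >= 6 and fields[0].startswith("0x"):
        size_mb = int(fields[4]) / 1048576.0
        print("shmid %s: %.1f MB, %s process(es) attached"
              % (fields[1], size_mb, fields[5]))

On FreeBSD the ipcs column layout differs slightly, so the parsing would
need adjusting, but the same information (segment size and number of
attached processes) is available there too.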
I believe I am seeing a similar problem in 7.2b4 under Linux. Large and
complex queries work fine after the postmaster is first started up. The
same queries fail days/weeks later. Typically psql unexpectedly
disconnects while running the queries. Restarting psql will not clear the
problem, but stopping/restarting the postmaster will.

Mark Rae wrote:
>
> Case 2: This is with 7.2b4 on Linux
> I created one database as before and everything works as expected,
> however I then created a second database without dropping the original,
> and it also seemed to be working as expected.
> However as it grew in size, it slowed down significantly (20-30 times
> slower), as though it could not get enough memory for the indexes (The
> largest of which can be 0.5G-1G in size).
> Although this time the backend process appeared to have the whole 512M
> 'swapped in'. Again restarting postgres cleared the problem.
>
> Unfortunately the problem does not appear to be reliably repeatable,
> so I am just guessing that the problem might be related to the SHM buffers
> being 'leaked' or becoming permanently assigned to a specific
> database, even when they are no longer needed.

--
P. J. "Josh" Rovero                       Sonalysts, Inc.
Email: rovero@sonalysts.com               www.sonalysts.com
215 Parkway North                         Work: (860) 326-3671 or 442-4355
Waterford, CT 06385
Mark Rae <m.rae@inpharmatica.co.uk> writes:
> Normally the backend process would 'swap in' all 512M of shared
> memory when loading, but occasionally after dropping the previous
> database, the new backend would only seem to be able to use a
> small amount of the shared memory.

This is so vague that I'm not even sure what you're complaining about.
What do you mean by "only seem to be able to use a small amount of the
shared memory"?

			regards, tom lane

"P.J. \"Josh\" Rovero" <rovero@sonalysts.com> writes:
> I believe I am seeing a similar problem in 7.2b4 under Linux.
> Large and complex queries work fine after the postmaster is
> first started up. The same queries fail days/weeks later.
> Typically psql unexpectedly disconnects while running the
> queries.

That would suggest a core dump. Have you looked at the postmaster log
and/or tried to debug the corefiles to follow up on these failures?

If you're going to run beta software then you should expect to expend
some effort helping to identify and fix its bugs ...

			regards, tom lane
Tom Lane wrote:
>
> Mark Rae <m.rae@inpharmatica.co.uk> writes:
> > Normally the backend process would 'swap in' all 512M of shared
> > memory when loading, but occasionally after dropping the previous
> > database, the new backend would only seem to be able to use a
> > small amount of the shared memory.
>
> This is so vague that I'm not even sure what you're complaining about.
> What do you mean by "only seem to be able to use a small amount of the
> shared memory"?

I was referring to the memory usage as reported by top, ps, etc.

Normally, as the data is loaded, the RSS, SIZE, and SHARE values for the
backend grow towards ~512M. However, when the problem occurs, the memory
usage does not grow as normal but gets 'stuck' at some lower value (about
10M or so).

In 'Case 2', which I described, the process size was reporting the use of
all 512M, but it still slowed down in the same way. That is, the speed of
loading was as expected until the database grew above a certain size, at
which point the performance dropped drastically. And at that point the
whole database (including indexes) certainly couldn't have been more than
about 100M in size.

Also, for information, this is a dedicated machine with nothing else
running except the loading process, which is about 5M, so it couldn't
have been anything else hogging the memory.

-Mark

--
Mark Rae                         Tel: +44(0)20 7074 4648
Inpharmatica                     Fax: +44(0)20 7074 4700
m.rae@inpharmatica.co.uk         http://www.inpharmatica.co.uk/
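To put numbers on the "stuck at about 10M" observation the next time it
happens, it may be worth logging the backend's resident and shared memory
over time during a load, rather than watching top by eye. A small sketch
for Linux, reading /proc/<pid>/statm; the one-minute interval and the
output format are arbitrary choices:

#!/usr/bin/env python
# Sketch: log a backend's resident and shared-resident memory once a minute
# during a load, so a mapping that gets "stuck" shows up as a flat line.
# The PID is taken from ps/top; interval and format are arbitrary.
import resource
import sys
import time

pid = int(sys.argv[1])
page = resource.getpagesize()

while True:
    try:
        with open("/proc/%d/statm" % pid) as f:
            parts = f.read().split()
    except IOError:
        break  # backend has exited
    rss_mb = int(parts[1]) * page / 1048576.0
    shared_mb = int(parts[2]) * page / 1048576.0
    print("%s  rss=%.1fMB  shared=%.1fMB"
          % (time.strftime("%H:%M:%S"), rss_mb, shared_mb))
    sys.stdout.flush()
    time.sleep(60)

If the mapping really does get stuck, the shared figure should flatten out
well below 512MB while the load is still running, which would at least
document the symptom precisely.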
Mark Rae wrote:
> Tom Lane wrote:
> >
> > Mark Rae <m.rae@inpharmatica.co.uk> writes:
> > > Normally the backend process would 'swap in' all 512M of shared
> > > memory when loading, but occasionally after dropping the previous
> > > database, the new backend would only seem to be able to use a
> > > small amount of the shared memory.
> >
> > This is so vague that I'm not even sure what you're complaining about.
> > What do you mean by "only seem to be able to use a small amount of the
> > shared memory"?
>
> I was referring to the memory usage as reported by top, ps, etc.
>
> Normally, as the data is loaded, the RSS, SIZE, and SHARE values for the
> backend grow towards ~512M. However, when the problem occurs, the memory
> usage does not grow as normal but gets 'stuck' at some lower value (about
> 10M or so).

OK, let me jump in here. First, I am unsure how the various OS's determine
memory usage when using shared memory. Do they report the 512MB of shared
buffers for each backend, none of them, one of them, or spread the 512MB
evenly among all attached backends?

Second, the idea of shared memory being paged out is not attractive to me.
On FreeBSD, I believe sysctl will allow you to control whether the shared
memory is paged out. Not sure about Linux. I recommend turning off shared
memory paging and seeing what happens. I can imagine performance taking a
major hit if any shared memory is not in RAM.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
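For what it's worth, the FreeBSD knob being referred to here is, if memory
serves, kern.ipc.shm_use_phys: when set to 1, SysV shared memory is backed
by wired physical pages and cannot be paged out (it has to be set before
the segment is created, i.e. before the postmaster starts). A trivial
check along those lines, with the sysctl name being the main assumption:

#!/usr/bin/env python
# Sketch (FreeBSD): check whether SysV shared memory is backed by wired
# physical pages. Assumes the relevant knob is kern.ipc.shm_use_phys;
# verify the name against 'sysctl -a' on the machine in question.
import subprocess

value = subprocess.run(["sysctl", "-n", "kern.ipc.shm_use_phys"],
                       capture_output=True, text=True).stdout.strip()
if value == "1":
    print("SysV shared memory is wired (not pageable)")
else:
    print("SysV shared memory is pageable; setting kern.ipc.shm_use_phys=1")
    print("(as root, before the postmaster creates its segment) would wire it")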
Bruce Momjian wrote:
>
> > Mark Rae <m.rae@inpharmatica.co.uk> writes:
> > > Normally the backend process would 'swap in' all 512M of shared
> > > memory when loading, but occasionally after dropping the previous
>
> OK, let me jump in here. First, I am unsure how the various OS's
> determine memory usage when using shared memory. Do they report the
> 512MB of shared buffers for each backend, none of them, one of them, or
> spread the 512MB evenly among all attached backends?
>
> Second, the idea of shared memory being paged out is not attractive to
> me. On FreeBSD, I believe sysctl will allow you to control whether the
> shared memory is paged out. Not sure about Linux. I recommend turning
> off shared memory paging and seeing what happens. I can imagine
> performance taking a major hit if any shared memory is not in RAM.

Firstly, I should apologise for causing a bit of confusion: I meant to say
'mapped', not 'swapped'. The actual shared memory area was resident in
memory; it was just that the backend process stats were reporting that
less than this was mapped into the address space of the process, implying
that it was not being used as it normally would be.

Having reread what I wrote, I seem to have obscured the facts with my
speculation as to the cause of the problem, so I shall have another go.
The only things I know for certain are that loading processes which I have
run many times before have, very occasionally, slowed down by a factor of
20 or more as the database size has grown, while behaving as expected when
the database was small. This happened 3 or 4 times with 7.1.2 a few months
ago, and most recently last week with 7.2b4.

Restarting the loading, which means a new backend process, does not fix
the problem; restarting the database does.

With v7.1.2 on FreeBSD I also noticed that the backend was not using all
of the shared memory it potentially had available to it, as it normally
does. With 7.2b4 on Linux I did not see this effect with the memory, but
the behaviour was otherwise the same.

I realise that this is not much to go on, and unfortunately it is quite
rare and unpredictable, so I can't really narrow it down any further at
the moment. But such a drastic slowdown is quite worrying, so what I was
really hoping for is some advice on the best way of figuring out what is
going on when it happens again.

-Mark

--
Mark Rae                         Tel: +44(0)20 7074 4648
Inpharmatica                     Fax: +44(0)20 7074 4700
m.rae@inpharmatica.co.uk         http://www.inpharmatica.co.uk/
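Given how rarely the problem shows up, one low-effort way to have
something concrete to look at after the next slowdown is to leave a small
monitor running that appends periodic snapshots of the OS view of memory
to a log file. Purely a sketch; the commands, interval, and log file name
are all arbitrary:

#!/usr/bin/env python
# Sketch: append periodic snapshots of shared memory and process state to a
# log file, so the state around the next slowdown can be examined afterwards.
# The log file name, interval, and choice of commands are all arbitrary.
import subprocess
import time

COMMANDS = [
    ["ipcs", "-m"],                     # SysV shared memory segments
    ["ps", "-eo", "pid,rss,vsz,args"],  # per-process memory usage
    ["vmstat"],                         # paging/swap activity
]

with open("pg_mem_monitor.log", "a") as log:
    while True:
        log.write("=== %s ===\n" % time.strftime("%Y-%m-%d %H:%M:%S"))
        for cmd in COMMANDS:
            log.write("$ %s\n" % " ".join(cmd))
            log.write(subprocess.run(cmd, capture_output=True,
                                     text=True).stdout)
        log.flush()
        time.sleep(300)

Comparing the snapshots from just before and just after the slowdown
starts (segment attach counts, backend RSS, swap activity) should at least
narrow down whether the problem is in the shared buffers themselves or
somewhere else.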