Thread: PostgreSQL 8.0.6 crash

PostgreSQL 8.0.6 crash

From
"Mark Woodward"
Date:
PostgreSQL promptly uses all available memory for the query and
subsequently crashes.

I'm sure it can be corrected with a setting, but should it crash?

freedb=# create table ucode as select distinct ucode from cdtitles group
by ucode having count(ucode)>1 ;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

(Table layout)
freedb=# \d cdtitles
                      Table "public.cdtitles"
  Column   |       Type        |             Modifiers
-----------+-------------------+-----------------------------------
 cdid      | integer           | default nextval('seq_cdid'::text)
 cdcode    | character varying |
 ucode     | character varying |
 title     | character varying |
 artist    | character varying |
 genre     | character varying |
 dgenre    | character varying |
 year      | integer           |
 processed | character varying |
 submit    | character varying |
 revision  | integer           |
 disclen   | integer           |
 tracks    | integer           |
 extkey    | integer           |
 offsets   | integer[]         |
Indexes:
    "cdtitles_ucode" btree (ucode) CLUSTER

(Size of table)
freedb=# select count(*) from cdtitles ;
  count
---------
 1927912
(1 row)

(Sample row)
freedb=# select * from cdtitles where cdid = 100001 limit 1;
 cdid      | 100001
 cdcode    | 320e9e14
 ucode     | 0ea0-14-3399-0006ad6a
 title     | Var Slutar Vägarna (CD1)
 artist    | Streaplers
 genre     | rock
 dgenre    |
 year      | 0
 processed | cddbd v1.5PL3 Copyright (c) Steve Scherf et al.
 submit    | dBpowerAMP V4.0
 revision  | 0
 disclen   |
 tracks    | 19
 extkey    |
 offsets   | {21,150,13920,31180,43664,59907,73163,86629,98447,113684,130205,141682,154920,166783,179028,192689,205161,222164,234649,249692,264340,3744,0}
(1 row)

Re: PostgreSQL 8.0.6 crash

From
"Mark Woodward"
Date:
More info: the machine has 512M RAM and 512M swap.
work_mem is set to: work_mem = 1024

This shouldn't have crashed, should it?

> PostgreSQL promptly uses all available memory for the query and
> subsequently crashes.
>
> I'm sure it can be corrected with a setting, but should it crash?
>
> freedb=# create table ucode as select distinct ucode from cdtitles group
> by ucode having count(ucode)>1 ;
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> (Table layout)
> freedb=# \d cdtitles
>                       Table "public.cdtitles"
>   Column   |       Type        |             Modifiers
> -----------+-------------------+-----------------------------------
>  cdid      | integer           | default nextval('seq_cdid'::text)
>  cdcode    | character varying |
>  ucode     | character varying |
>  title     | character varying |
>  artist    | character varying |
>  genre     | character varying |
>  dgenre    | character varying |
>  year      | integer           |
>  processed | character varying |
>  submit    | character varying |
>  revision  | integer           |
>  disclen   | integer           |
>  tracks    | integer           |
>  extkey    | integer           |
>  offsets   | integer[]         |
> Indexes:
>     "cdtitles_ucode" btree (ucode) CLUSTER
>
> (Size of table)
> freedb=# select count(*) from cdtitles ;
>   count
> ---------
>  1927912
> (1 row)
>
> (Sample row)
> freedb=# select * from cdtitles where cdid = 100001 limit 1;
>  cdid      | 100001
>  cdcode    | 320e9e14
>  ucode     | 0ea0-14-3399-0006ad6a
>  title     | Var Slutar Vägarna (CD1)
>  artist    | Streaplers
>  genre     | rock
>  dgenre    |
>  year      | 0
>  processed | cddbd v1.5PL3 Copyright (c) Steve Scherf et al.
>  submit    | dBpowerAMP V4.0
>  revision  | 0
>  disclen   |
>  tracks    | 19
>  extkey    |
>  offsets   | {21,150,13920,31180,43664,59907,73163,86629,98447,113684,130205,141682,154920,166783,179028,192689,205161,222164,234649,249692,264340,3744,0}
> (1 row)



Re: PostgreSQL 8.0.6 crash

From
Tom Lane
Date:
"Mark Woodward" <pgsql@mohawksoft.com> writes:
> PostgreSQL promptly uses all available memory for the query and
> subsequently crashes.

I'll bet a nickel this is on a Linux machine with OOM kill enabled.
What does the postmaster log show --- or look in the kernel log to
see if it mentions anything about an out-of-memory kill.

> freedb=# create table ucode as select distinct ucode from cdtitles group
> by ucode having count(ucode)>1 ;
> server closed the connection unexpectedly

What does EXPLAIN show as the plan for that?  If it's a hash aggregate,
try with "enable_hashagg" turned off.  How many distinct ucode values
are there in the table?
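
Roughly, those checks boil down to something like this; a sketch against
the cdtitles table shown above, where the SET is session-local so it only
affects the current connection:

    EXPLAIN
    SELECT DISTINCT ucode FROM cdtitles
     GROUP BY ucode HAVING count(ucode) > 1;

    SET enable_hashagg = off;   -- force the sort-based grouping path

    EXPLAIN
    SELECT DISTINCT ucode FROM cdtitles
     GROUP BY ucode HAVING count(ucode) > 1;

    SELECT count(DISTINCT ucode) FROM cdtitles;   -- actual number of groups
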
        regards, tom lane


Re: PostgreSQL 8.0.6 crash

From
"Mark Woodward"
Date:
> "Mark Woodward" <pgsql@mohawksoft.com> writes:
>> PostgreSQL promptly uses all available memory for the query and
>> subsequently crashes.
>
> I'll bet a nickel this is on a Linux machine with OOM kill enabled.
> What does the postmaster log show --- or look in the kernel log to
> see if it mentions anything about an out-of-memory kill.

That's a no-brainer. Maybe I should have rephrased the condition; "crash"
may be the wrong word, it was definitely killed by the out-of-memory
killer. Sorry.

>
>> freedb=# create table ucode as select distinct ucode from cdtitles group
>> by ucode having count(ucode)>1 ;
>> server closed the connection unexpectedly
>
> What does EXPLAIN show as the plan for that?  If it's a hash aggregate,
> try with "enable_hashagg" turned off.  How many distinct ucode values
> are there in the table?

There are over 1.7M distinct rows, and about 200K non-distinct ones that I
want to somehow remove.
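
For reference, the DISTINCT in that statement is redundant (GROUP BY ucode
already yields one row per ucode). A sketch of the cleanup, assuming cdid
is unique per row so it can act as a tie-breaker:

    -- list the duplicated codes
    SELECT ucode FROM cdtitles GROUP BY ucode HAVING count(*) > 1;

    -- keep the lowest cdid for each ucode and delete the other copies
    DELETE FROM cdtitles
     WHERE cdid NOT IN (SELECT min(cdid) FROM cdtitles GROUP BY ucode);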

It does have hash aggregate:
freedb=# explain select distinct ucode from cdtitles group by ucode having
count(ucode)>1 ;
                                      QUERY PLAN
-------------------------------------------------------------------------------------
 Unique  (cost=106536.32..106537.32 rows=200 width=32)
   ->  Sort  (cost=106536.32..106536.82 rows=200 width=32)
         Sort Key: ucode
         ->  HashAggregate  (cost=106527.68..106528.68 rows=200 width=32)
               Filter: (count(ucode) > 1)
               ->  Seq Scan on cdtitles  (cost=0.00..96888.12 rows=1927912 width=32)
(6 rows)


Well, shouldn't hash aggregate respect work memory limits?


Re: PostgreSQL 8.0.6 crash

From
Tom Lane
Date:
"Mark Woodward" <pgsql@mohawksoft.com> writes:
>          ->  HashAggregate  (cost=106527.68..106528.68 rows=200 width=32)
>                Filter: (count(ucode) > 1)
>                ->  Seq Scan on cdtitles  (cost=0.00..96888.12 rows=1927912
> width=32)

> Well, shouldn't hash aggregate respect work memory limits?

If the planner thought there were 1.7M distinct values, it wouldn't have
used hash aggregate ... but it only thinks there are 200, which IIRC is
the default assumption.  Have you ANALYZEd this table lately?
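
A quick way to check, sketched against the table in question
(pg_stats.n_distinct is what feeds this estimate; if the table has never
been analyzed there is no pg_stats row at all, and the planner falls back
to its default guess):

    ANALYZE cdtitles;

    SELECT n_distinct
      FROM pg_stats
     WHERE tablename = 'cdtitles' AND attname = 'ucode';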

Meanwhile, I'd strongly recommend turning off OOM kill.  That's got to
be the single worst design decision in the entire Linux kernel.
        regards, tom lane


Re: PostgreSQL 8.0.6 crash

From
"Mark Woodward"
Date:
> "Mark Woodward" <pgsql@mohawksoft.com> writes:
>>          ->  HashAggregate  (cost=106527.68..106528.68 rows=200
>> width=32)
>>                Filter: (count(ucode) > 1)
>>                ->  Seq Scan on cdtitles  (cost=0.00..96888.12
>> rows=1927912
>> width=32)
>
>> Well, shouldn't hash aggregate respect work memory limits?
>
> If the planner thought there were 1.7M distinct values, it wouldn't have
> used hash aggregate ... but it only thinks there are 200, which IIRC is
> the default assumption.  Have you ANALYZEd this table lately?

I thought that I had, but I did CLUSTER at some point. Or maybe I didn't;
I'm not sure. I have been working on a file reader/parser/importer
program.  I created and dropped the DB so many times it is hard to keep
track. Still, I would say that this is extremely bad behavior for not
having stats, wouldn't you think?

>
> Meanwhile, I'd strongly recommend turning off OOM kill.  That's got to
> be the single worst design decision in the entire Linux kernel.

How is this any different from FreeBSD having a default 512M process
size limit? On FreeBSD, the process would have been killed earlier.



Re: PostgreSQL 8.0.6 crash

From
Tom Lane
Date:
"Mark Woodward" <pgsql@mohawksoft.com> writes:
> Still, I would say that this is extremely bad behavior for not having
> stats, wouldn't you think?

Think of it as a kernel bug.

>> Meanwhile, I'd strongly recommend turning off OOM kill.  That's got to
>> be the single worst design decision in the entire Linux kernel.

> How is this any different than the FreeBSD having a default 512M process
> size limit? On FreeBSD, the process would have been killed earlier.

No, the process would have been politely told it was out of memory, and
would have told you the same.  If the kernel's way of notifying a
process that it's out of memory is SIGKILL, there is not a damn thing
that we can do to operate robustly.
        regards, tom lane


Re: PostgreSQL 8.0.6 crash

From
"Mark Woodward"
Date:
> "Mark Woodward" <pgsql@mohawksoft.com> writes:
>> Still, I would say that this is extremely bad behavior for not having
>> stats, wouldn't you think?
>
> Think of it as a kernel bug.

While I respect your viewpoint that the Linux kernel should not kill an
offending process if the system runs out of memory, I sort of disagree, in
that OOM kill is a disaster preventer. It should be viewed as a last-ditch
"him or me" choice the kernel needs to make, and it should not get into
that position in the first place.

Regardless, it is troubling that failing to have current stats can cause
the system, with a large data set, to exceed working memory limits.

I think it is still a bug. While it may manifest itself as a pg crash on
Linux because of a feature with which you take issue, the fact remains
that PG is exceeding its working memory limit.

Should failing to run ANALYZE cause this behavior?
If so, how does this get clearly documented?
If not, can it be prevented?

>
>>> Meanwhile, I'd strongly recommend turning off OOM kill.  That's got to
>>> be the single worst design decision in the entire Linux kernel.
>
>> How is this any different than the FreeBSD having a default 512M process
>> size limit? On FreeBSD, the process would have been killed earlier.
>
> No, the process would have been politely told it was out of memory, and
> would have told you the same.  If the kernel's way of notifying a
> process that it's out of memory is SIGKILL, there is not a damn thing
> that we can do to operate robustly.

Let's not waste time on a Linux discussion. Linux and FreeBSD each have
their strengths, and a debate on the merits of either would be long and
contentious. Both systems are fine, just with some subtle differences in
design goals.




Re: PostgreSQL 8.0.6 crash

From
Andrew Dunstan
Date:
Tom Lane wrote:

>"Mark Woodward" <pgsql@mohawksoft.com> writes:
>  
>
>>Still, I would say that is is extremly bad behavior for not having
>>stats, wouldn't you think?
>>    
>>
>
>Think of it as a kernel bug.
>
>  
>
>>>Meanwhile, I'd strongly recommend turning off OOM kill.  That's got to
>>>be the single worst design decision in the entire Linux kernel.
>>>      
>>>
>
>  
>
>>How is this any different than the FreeBSD having a default 512M process
>>size limit? On FreeBSD, the process would have been killed earlier.
>>    
>>
>
>No, the process would have been politely told it was out of memory, and
>would have told you the same.  If the kernel's way of notifying a
>process that it's out of memory is SIGKILL, there is not a damn thing
>that we can do to operate robustly.
>  
>

And we have docco on it: 
http://www.postgresql.org/docs/current/static/kernel-resources.html#AEN18105 
which I assume is still current. Back in October I mentioned the OOM 
killer to Andrew Morton - his reaction was a very pained look and a curt 
admonition: "turn it off".

cheers

andrew



Re: PostgreSQL 8.0.6 crash

From
Tom Lane
Date:
"Mark Woodward" <pgsql@mohawksoft.com> writes:
> I think it is still a bug. While it may manifest itself as a pg crash on
> Linux because of a feature with which you take issue, the fact remains
> that PG is exceeding its working memory limit.

The problem is that *we have no way to know what that limit is* ---
short of exceeding it and being summarily killed.  (BTW, the kernel
doesn't know what the limit is either.)  There is simply not any way
to operate robustly under the OOM-kill regime.

While I'll certainly acknowledge that it'd be nice if hashagg had
spill-to-disk capability, that wouldn't alter the fundamental fact that
if you want reliable behavior you MUST turn off OOM kill.  There is not
anything we can do at the database level to work around that kernel-level
misdesign.
        regards, tom lane


Re: PostgreSQL 8.0.6 crash

From
"Mark Woodward"
Date:
> "Mark Woodward" <pgsql@mohawksoft.com> writes:
>> I think it is still a bug. While it may manifest itself as a pg crash on
>> Linux because of a feature with which you take issue, the fact remains
>> that PG is exceeding its working memory limit.
>
> The problem is that *we have no way to know what that limit is* ---
> short of exceeding it and being summarily killed.  (BTW, the kernel
> doesn't know what the limit is either.)  There is simply not any way
> to operate robustly under the OOM-kill regime.

No, you misunderstand what I said: I mean the "working memory" as defined
in postgresql.conf. I don't care about the OS debate.

>
> While I'll certainly acknowledge that it'd be nice if hashagg had
> spill-to-disk capability, that wouldn't alter the fundamental fact that
> if you want reliable behavior you MUST turn off OOM kill.  There is not
> anything we can do at the database level to work around that kernel-level
> misdesign.

Again, regardless of the OS used, hashagg will exceed "working memory" as
defined in postgresql.conf.

At issue is whether a lack of ANALYZE justifies this behavior. If so, it
should be documented.



Re: PostgreSQL 8.0.6 crash

From
Tom Lane
Date:
"Mark Woodward" <pgsql@mohawksoft.com> writes:
> Again, regardless of OS used, hashagg will exceed "working memory" as
> defined in postgresql.conf.

So?  If you've got OOM kill enabled, it can zap a process whether it's
strictly adhered to work_mem or not.  The OOM killer is entirely capable
of choosing a victim process whose memory footprint hasn't changed
materially since it started (eg, the postmaster).
        regards, tom lane


Re: PostgreSQL 8.0.6 crash

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> "Mark Woodward" <pgsql@mohawksoft.com> writes:
> > Again, regardless of OS used, hashagg will exceed "working memory" as
> > defined in postgresql.conf.
>
> So?  If you've got OOM kill enabled, it can zap a process whether it's
> strictly adhered to work_mem or not.  The OOM killer is entirely capable
> of choosing a victim process whose memory footprint hasn't changed
> materially since it started (eg, the postmaster).

Unless I've missed something here, disabling the OOM killer doesn't
really solve the problem.  What solves the problem is running ANALYZE,
but it's still certainly the case that it would make some sense for the
Postmaster, upon realizing that it's gone well beyond its work_mem
boundary, to ideally switch plans to one which isn't going to exceed its
work_mem limit.  Less ideally, it could give up and issue an error to
the user instead of running the box out of memory.

I appreciate that this is probably not very easy to implement, but I do
believe the current situation could be improved in this regard.
Thanks,
    Stephen

Re: PostgreSQL 8.0.6 crash

From
"Mark Woodward"
Date:
> "Mark Woodward" <pgsql@mohawksoft.com> writes:
>> Again, regardless of OS used, hashagg will exceed "working memory" as
>> defined in postgresql.conf.
>
> So?  If you've got OOM kill enabled, it can zap a process whether it's
> strictly adhered to work_mem or not.  The OOM killer is entirely capable
> of choosing a victim process whose memory footprint hasn't changed
> materially since it started (eg, the postmaster).

Sorry, I must strongly disagree here. The postgresql.conf "working mem" is
a VERY IMPORTANT setting; it is intended to limit the consumption of
memory by the postgresql process. Oftentimes PostgreSQL will work
alongside other application servers on the same system; in fact, it may
be a sub-part of application servers on the same system. (This is, in
fact, how it is used on one of my site servers.)

Clearly, if the server will use 1000 times this number (set to 1024K, but
it exceeds 1G), this is broken, and it may cause other systems to fail or
perform very poorly.

If it is not something that can be fixed, it should be clearly documented.


Re: PostgreSQL 8.0.6 crash

From
"Jim C. Nasby"
Date:
On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote:
> > "Mark Woodward" <pgsql@mohawksoft.com> writes:
> >> Again, regardless of OS used, hashagg will exceed "working memory" as
> >> defined in postgresql.conf.
> >
> > So?  If you've got OOM kill enabled, it can zap a process whether it's
> > strictly adhered to work_mem or not.  The OOM killer is entirely capable
> > of choosing a victim process whose memory footprint hasn't changed
> > materially since it started (eg, the postmaster).
> 
> Sorry, I must strongly disagree here. The postgresql.conf "working mem" is
> a VERY IMPORTANT setting, it is intended to limit the consumption of
> memory by the postgresql process. Often times PostgreSQL will work along

Actually, no, it's not designed for that at all.

> side other application servers on the same system, infact, may be a
> sub-part of application servers on the same system. (This is, in fact, how
> it is used on one of my site servers.)
> 
> Clearly, if the server will use 1000 times this number (Set for 1024K, but
> exceeds 1G) this is broken, and it may cause other systems to fail or
> perform very poorly.
> 
> If it is not something that can be fixed, it should be clearly documented.

work_mem (integer)

    Specifies the amount of memory to be used by internal sort
    operations and hash tables before switching to temporary disk files.
    The value is specified in kilobytes, and defaults to 1024 kilobytes
    (1 MB). Note that for a complex query, several sort or hash
    operations might be running in parallel; each one will be allowed to
    use as much memory as this value specifies before it starts to put
    data into temporary files. Also, several running sessions could be
    doing such operations concurrently. So the total memory used could
    be many times the value of work_mem; it is necessary to keep this
    fact in mind when choosing the value. Sort operations are used for
    ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash
    joins, hash-based aggregation, and hash-based processing of IN
    subqueries.

So it says right there that it's very easy to exceed work_mem by a very
large amount. Granted, this is a very painful problem to deal with and
will hopefully be changed at some point, but it's pretty clear as to how
this works.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: PostgreSQL 8.0.6 crash

From
"Jim C. Nasby"
Date:
On Thu, Feb 09, 2006 at 11:42:57AM -0500, Mark Woodward wrote:
> > "Mark Woodward" <pgsql@mohawksoft.com> writes:
> >> Still, I would say that this is extremely bad behavior for not having
> >> stats, wouldn't you think?
> >
> > Think of it as a kernel bug.
> 
> While I respect your viewpoint that the Linux kernel should not kill an
> offending process if the system runs out of memory, I sort of disagree in
> that OOM is a disaster preventor. It should be viewed as a last ditch "him
> or me" choice the kernel needs to make and it should not get into that
> position in the first place.

I've had processes run away on a FreeBSD box before, to the extent of
running entirely out of swap and memory. Instead of random processes
just dying for no apparent reason, I instead started getting a bunch of
out-of-memory errors. No disaster, I just fixed the problem and life
went on.

Well, OK, the box did become rather unresponsive when my fix for the
problem meant that all of a sudden there were about 950 perl processes
trying to run at the same time. I wish I'd captured top showing 900+
runnable processes. But after a few minutes the processes started
completing and exiting, and everything was soon back to normal. I rather
doubt Linux would survive that...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: PostgreSQL 8.0.6 crash

From
Stephen Frost
Date:
* Jim C. Nasby (jnasby@pervasive.com) wrote:
> On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote:
> > If it is not something that can be fixed, it should be clearly documented.
>
> work_mem (integer)
>
>     Specifies the amount of memory to be used by internal sort
>     operations and hash tables before switching to temporary disk files.
>     The value is specified in kilobytes, and defaults to 1024 kilobytes
>     (1 MB). Note that for a complex query, several sort or hash
>     operations might be running in parallel; each one will be allowed to
>     use as much memory as this value specifies before it starts to put
>     data into temporary files. Also, several running sessions could be
>     doing such operations concurrently. So the total memory used could
>     be many times the value of work_mem; it is necessary to keep this
>     fact in mind when choosing the value. Sort operations are used for
>     ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash
>     joins, hash-based aggregation, and hash-based processing of IN
>     subqueries.
>
> So it says right there that it's very easy to exceed work_mem by a very
> large amount. Granted, this is a very painful problem to deal with and
> will hopefully be changed at some point, but it's pretty clear as to how
> this works.

It also says that when it goes over, it'll spill to disk.  Additionally,
we're talking about one hash here, not multiple ones.  It seems at least
misleading because, if I understand correctly, Postgres isn't actually
actively checking to see if the amount of memory used by an in-progress
hash creation has gone over the limit; rather, it guesses at how much
memory will be used during the planning stage to decide if a hash plan
is possible or not.  That guess can certainly be wrong, but there's
nothing in place to handle the situation where the guess is wrong...
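
To make that concrete with the example from this thread (a sketch, not
actual output): the hash plan is chosen purely from the estimated number
of groups, so refreshing the statistics changes the plan rather than
adding any run-time enforcement.

    EXPLAIN SELECT DISTINCT ucode FROM cdtitles
             GROUP BY ucode HAVING count(ucode) > 1;
    -- without stats: HashAggregate with the rows=200 default guess

    ANALYZE cdtitles;

    EXPLAIN SELECT DISTINCT ucode FROM cdtitles
             GROUP BY ucode HAVING count(ucode) > 1;
    -- with stats: roughly 1.7M estimated groups, which no longer fits in
    -- work_mem, so the planner should switch to a sort-based GroupAggregate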
Thanks,
    Stephen

Re: PostgreSQL 8.0.6 crash

From
"Mark Woodward"
Date:
> On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote:
>> > "Mark Woodward" <pgsql@mohawksoft.com> writes:
>> >> Again, regardless of OS used, hashagg will exceed "working memory" as
>> >> defined in postgresql.conf.
>> >
>> > So?  If you've got OOM kill enabled, it can zap a process whether it's
>> > strictly adhered to work_mem or not.  The OOM killer is entirely
>> capable
>> > of choosing a victim process whose memory footprint hasn't changed
>> > materially since it started (eg, the postmaster).
>>
>> Sorry, I must strongly disagree here. The postgresql.conf "working mem"
>> is
>> a VERY IMPORTANT setting, it is intended to limit the consumption of
>> memory by the postgresql process. Often times PostgreSQL will work along
>
> Actually, no, it's not designed for that at all.

I guess that's a matter of opinion.

>
>> side other application servers on the same system, infact, may be a
>> sub-part of application servers on the same system. (This is, in fact,
>> how
>> it is used on one of my site servers.)
>>
>> Clearly, if the server will use 1000 times this number (Set for 1024K,
>> but
>> exceeds 1G) this is broken, and it may cause other systems to fail or
>> perform very poorly.
>>
>> If it is not something that can be fixed, it should be clearly
>> documented.
>
> work_mem (integer)
>
>     Specifies the amount of memory to be used by internal sort
>     operations and hash tables before switching to temporary disk files.
>     The value is specified in kilobytes, and defaults to 1024 kilobytes
>     (1 MB). Note that for a complex query, several sort or hash
>     operations might be running in parallel; each one will be allowed to
>     use as much memory as this value specifies before it starts to put
>     data into temporary files. Also, several running sessions could be
>     doing such operations concurrently. So the total memory used could
>     be many times the value of work_mem; it is necessary to keep this
>     fact in mind when choosing the value. Sort operations are used for
>     ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash
>     joins, hash-based aggregation, and hash-based processing of IN
>     subqueries.
>
> So it says right there that it's very easy to exceed work_mem by a very
> large amount. Granted, this is a very painful problem to deal with and
> will hopefully be changed at some point, but it's pretty clear as to how
> this works.

Well, if you read that paragraph carefully, I'll admit that I was a little
too literal in my statement, applying it to the "process" and not specific
functions within the process, but the documentation says:

"each one will be allowed to use as much memory as this value specifies
before it starts to put data into temporary files."

According to the documentation, the behavior of hashagg is broken. It did
not use up to this amount and then start to use temporary files; it used
1000 times this limit and was killed by the OS.

I think it should be documented, as the behavior is unpredictable.



Re: PostgreSQL 8.0.6 crash

From
Greg Stark
Date:
Stephen Frost <sfrost@snowman.net> writes:

> Unless I've missed something here, disabling the OOM killer doesn't
> really solve the problem here.  

Well in a way it does. Postgres would get an out-of-memory error from malloc
which it would handle properly and the world would be happy.

Except not quite, since I think an out of memory error still means that
backend exits instead of just that query failing. That means if you have an
application running such as apache then all subsequent transactions on that
connection fail too, instead of just the transaction that misbehaved.

And as the other poster mentioned, having Postgres use up every available byte
of memory isn't really very friendly to anything else running on the box.

It doesn't seem like a bad idea to have a max_memory parameter such that,
if a backend ever exceeded it, the current transaction would immediately
be aborted. That would let an application continue operating normally
after getting an error.

-- 
greg



Re: PostgreSQL 8.0.6 crash

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Except not quite, since I think an out of memory error still means that
> backend exits instead of just that query failing.

Not at all!  PG will recover from this perfectly well ... if it's given
the opportunity, rather than being SIGKILLed.

> It doesn't seem like a bad idea to have a max_memory parameter that if a
> backend ever exceeded it would immediately abort the current
> transaction.

See ulimit (or local equivalent).
        regards, tom lane


Re: PostgreSQL 8.0.6 crash

From
Alvaro Herrera
Date:
Greg Stark wrote:

> Well in a way it does. Postgres would get an out-of-memory error from malloc
> which it would handle properly and the world would be happy.
> 
> Except not quite, since I think an out of memory error still means that
> backend exits instead of just that query failing.

Not at all -- the transaction is aborted, but the backend can continue
working perfectly fine.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: PostgreSQL 8.0.6 crash

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > It doesn't seem like a bad idea to have a max_memory parameter that if a
> > backend ever exceeded it would immediately abort the current
> > transaction.
>
> See ulimit (or local equivalent).

As much as setting ulimit in shell scripts is fun, I have to admit that
I really don't see it happening very much.  Having Postgres set a ulimit
for itself may not be a bad idea and would perhaps provide a "least
surprise" for new users.  Perhaps shared_buffers + 10*work_mem +
maintenance_work_mem + max_stack_depth?  Then errors from running out of
memory could provide a 'HINT: Memory consumption went well over allowed
work_mem, perhaps you need to run ANALYZE or raise work_mem?'.
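
(For scale, and assuming the stock 8.0 defaults of shared_buffers = 1000
8kB pages, work_mem = 1MB, maintenance_work_mem = 16MB and
max_stack_depth = 2MB, that rule of thumb works out to roughly
8 + 10*1 + 16 + 2 = 36MB, far below the gigabyte-plus blowup reported
here.)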
Just some thoughts,
    Stephen

Re: PostgreSQL 8.0.6 crash

From
Greg Stark
Date:
Stephen Frost <sfrost@snowman.net> writes:

> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Greg Stark <gsstark@mit.edu> writes:
> > > It doesn't seem like a bad idea to have a max_memory parameter that if a
> > > backend ever exceeded it would immediately abort the current
> > > transaction.
> > 
> > See ulimit (or local equivalent).
> 
> As much as setting ulimit in shell scripts is fun, I have to admit that
> I really don't see it happening very much.  

For one thing it requires admin access to the startup scripts to arrange this.
And it's always cluster-wide.

Having a GUC parameter would mean it could be set per-session. Even if the GUC
parameter were just implemented by calling setrlimit it might be useful.

-- 
greg



Re: PostgreSQL 8.0.6 crash

From
"Mark Woodward"
Date:
> Stephen Frost <sfrost@snowman.net> writes:
>
>> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> > Greg Stark <gsstark@mit.edu> writes:
>> > > It doesn't seem like a bad idea to have a max_memory parameter that
>> if a
>> > > backend ever exceeded it would immediately abort the current
>> > > transaction.
>> >
>> > See ulimit (or local equivalent).
>>
>> As much as setting ulimit in shell scripts is fun, I have to admit that
>> I really don't see it happening very much.
>
> For one thing it requires admin access to the startup scripts to arrange
> this.
> And it's always cluster-wide.
>
> Having a GUC parameter would mean it could be set per-session. Even if the
> GUC
> parameter were just implemented by calling setrlimit it might be useful.
>

I don't think it needs a new GUC parameter; just having hashagg respect
work_mem would fix the problem.



Re: PostgreSQL 8.0.6 crash

From
Martijn van Oosterhout
Date:
On Thu, Feb 09, 2006 at 02:35:34PM -0500, Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > Except not quite, since I think an out of memory error still means that
> > backend exits instead of just that query failing.
>
> Not at all!  PG will recover from this perfectly well ... if it's given
> the opportunity, rather than being SIGKILLed.

FWIW, the problem is mainly from the situation where some process
accesses a piece of memory that has been swapped out, but there is no
memory available to swap the page in. Or it writes to a page marked
copy-on-write. What do you do? There's no way to return -ENOMEM from
a normal memory access, and PostgreSQL wouldn't handle that anyway.

When people talk about disabling the OOM killer, it doesn't stop the
SIGKILL behaviour, it just causes the kernel to return -ENOMEM for
malloc() much much earlier... (ie when you still actually have memory
available).

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: PostgreSQL 8.0.6 crash

From
Rick Gigger
Date:
On Feb 9, 2006, at 11:22 AM, Stephen Frost wrote:

> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> "Mark Woodward" <pgsql@mohawksoft.com> writes:
>>> Again, regardless of OS used, hashagg will exceed "working  
>>> memory" as
>>> defined in postgresql.conf.
>>
>> So?  If you've got OOM kill enabled, it can zap a process whether  
>> it's
>> strictly adhered to work_mem or not.  The OOM killer is entirely  
>> capable
>> of choosing a victim process whose memory footprint hasn't changed
>> materially since it started (eg, the postmaster).
>
> Unless I've missed something here, disabling the OOM killer doesn't
> really solve the problem here.  What solves the problem is running
> ANALYZE but it's still certainly the case that it would make some  
> sense
> for the Postmaster, upon realizing that it's gone well beyond its
> work_mem boundary, to ideally switch plans to one which isn't going to
> exceed its work_mem limit.  Less ideally, it could give up and  
> issue an
> error to the user instead of running the box out of memory.

So is the work_mem parameter that is set not strictly enforced?  Is
it more like a suggestion as to what it SHOULD use and not a hard
limit on how much memory each process is ALLOWED to use?

If his work_mem is set to 1 MB and that process is using 500 MB for
tasks that are supposed to stay in work_mem, then doesn't that mean
that the limit is not really a hard limit but rather a suggestion?

Rick


Re: PostgreSQL 8.0.6 crash

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> When people talk about disabling the OOM killer, it doesn't stop the
> SIGKILL behaviour,

Yes it does, because the situation will never arise.

> it just causes the kernel to return -ENOMEM for
> malloc() much much earlier... (ie when you still actually have memory
> available).

Given the current price of disk, there is no sane reason not to have
enough swap space configured to make this not-a-problem.  The OOM kill
mechanism was a reasonable solution for running systems that were not
expected to be too reliable anyway on small hardware, but if you're
trying to run a 24/7 server you're simply incompetent if you don't
disable it.
        regards, tom lane


Re: PostgreSQL 8.0.6 crash

From
"Mark Woodward"
Date:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>> When people talk about disabling the OOM killer, it doesn't stop the
>> SIGKILL behaviour,
>
> Yes it does, because the situation will never arise.
>
>> it just causes the kernel to return -ENOMEM for
>> malloc() much much earlier... (ie when you still actually have memory
>> available).
>
> Given the current price of disk, there is no sane reason not to have
> enough swap space configured to make this not-a-problem.  The OOM kill
> mechanism was a reasonable solution for running systems that were not
> expected to be too reliable anyway on small hardware, but if you're
> trying to run a 24/7 server you're simply incompetent if you don't
> disable it.

And people say I have STRONG opinions. Don't hold back, Tom; let us know
what you really think.




Re: PostgreSQL 8.0.6 crash

From
Ernst Herzberg
Date:
On Friday 10 February 2006 00:53, Mark Woodward wrote:
> > Martijn van Oosterhout <kleptog@svana.org> writes:
> >> When people talk about disabling the OOM killer, it doesn't stop the
> >> SIGKILL behaviour,
> >
> > Yes it does, because the situation will never arise.
> >
> >> it just causes the kernel to return -ENOMEM for
> >> malloc() much much earlier... (ie when you still actually have memory
> >> available).
> >
> > Given the current price of disk, there is no sane reason not to have
> > enough swap space configured to make this not-a-problem.  The OOM kill
> > mechanism was a reasonable solution for running systems that were not
> > expected to be too reliable anyway on small hardware, but if you're
> > trying to run a 24/7 server you're simply incompetent if you don't
> > disable it.
>
> And people say I have STRONG opinions. Don't hold back Tom, let us know
> what you really think.

Read 
http://linux-mm.org/OverCommitAccounting
or 
file://usr/src/linux/Documentation/vm/overcommit-accounting

It is a good idea to have enough swap space. If not, set 
vm.overcommit_memory=2


Re: PostgreSQL 8.0.6 crash

From
Rick Gigger
Date:
On Feb 9, 2006, at 12:49 PM, Mark Woodward wrote:

>> On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote:
>>>> "Mark Woodward" <pgsql@mohawksoft.com> writes:
>>>>> Again, regardless of OS used, hashagg will exceed "working  
>>>>> memory" as
>>>>> defined in postgresql.conf.
>>>>
>>>> So?  If you've got OOM kill enabled, it can zap a process  
>>>> whether it's
>>>> strictly adhered to work_mem or not.  The OOM killer is entirely
>>> capable
>>>> of choosing a victim process whose memory footprint hasn't changed
>>>> materially since it started (eg, the postmaster).
>>>
>>> Sorry, I must strongly disagree here. The postgresql.conf  
>>> "working mem"
>>> is
>>> a VERY IMPORTANT setting, it is intended to limit the consumption of
>>> memory by the postgresql process. Often times PostgreSQL will  
>>> work along
>>
>> Actually, no, it's not designed for that at all.
>
> I guess that's a matter of opinion.
>
>>
>>> side other application servers on the same system, infact, may be a
>>> sub-part of application servers on the same system. (This is, in  
>>> fact,
>>> how
>>> it is used on one of my site servers.)
>>>
>>> Clearly, if the server will use 1000 times this number (Set for  
>>> 1024K,
>>> but
>>> exceeds 1G) this is broken, and it may cause other systems to  
>>> fail or
>>> perform very poorly.
>>>
>>> If it is not something that can be fixed, it should be clearly
>>> documented.
>>
>> work_mem (integer)
>>
>>     Specifies the amount of memory to be used by internal sort
>>     operations and hash tables before switching to temporary disk  
>> files.
>>     The value is specified in kilobytes, and defaults to 1024  
>> kilobytes
>>     (1 MB). Note that for a complex query, several sort or hash
>>     operations might be running in parallel; each one will be  
>> allowed to
>>     use as much memory as this value specifies before it starts to  
>> put
>>     data into temporary files. Also, several running sessions  
>> could be
>>     doing such operations concurrently. So the total memory used  
>> could
>>     be many times the value of work_mem; it is necessary to keep this
>>     fact in mind when choosing the value. Sort operations are used  
>> for
>>     ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash
>>     joins, hash-based aggregation, and hash-based processing of IN
>>     subqueries.
>>
>> So it says right there that it's very easy to exceed work_mem by a  
>> very
>> large amount. Granted, this is a very painful problem to deal with  
>> and
>> will hopefully be changed at some point, but it's pretty clear as  
>> to how
>> this works.
>
> Well, if you read that paragraph carefully, I'll admit that I was a  
> little
> too literal in my statement apliying it to the "process" and not  
> specific
> functions within the process, but in the documentation:
>
> "each one will be allowed to use as much memory as this value  
> specifies
> before it starts to put data into temporary files."
>
> According to the documentation the behavior of hashagg is broken.  
> It did
> not use up to this amount and then start to use temporary files, it  
> used
> 1000 times this limit and was killed by the OS.
>
> I think it should be documented as the behavior is unpredictable.

It seems to me that the solution for THIS INCIDENT is to run an
analyze.  That should fix the problem at hand.  I have nothing to say
about the OOM issue except that hopefully the analyze will prevent
him from running out of memory at all.

However, if hashagg truly does not obey the limit that is supposed to
be imposed by work_mem, then it really ought to be documented.  Is
there a misunderstanding here and it really does obey it?  Or is
hashagg an exception but the other work_mem-associated operations
work fine?  Or is it possible for them all to go out of bounds?

Even if you've got 100 terabytes of swap space, though, it seems like
if your system is very heavy on reads then you would really want that
single backend to start using up your disk space and leave your
memory alone, so that most of your data can stay cached and largely
unaffected by the problem of one backend.

If your bottleneck is writing to the disk then it doesn't really seem
to matter.  You just need to make sure that a huge out-of-control
hashagg never occurs.  If your disks get saturated with writes
because of the hashagg of one backend, then all other processes that
need to write a lot of info to disk are going to come to a grinding
halt, queries are not going to complete quickly, they will build up,
and you will have a huge mess on your hands that will essentially
prevent postgres from being able to do its job even if it doesn't
actually die.  In this situation disk bandwidth is a scarce
commodity, and whether you let the OS handle it all with virtual
memory or you let postgres swap everything out to disk for that one
operation, you are still using disk to make up for a lack of RAM.  At
some point you've either got to stock up on enough RAM to run your
queries properly or alter how your queries run to use less RAM.
Having a process go out of control in resource usage is going to
cause big problems one way or another.


Re: PostgreSQL 8.0.6 crash

From
Tom Lane
Date:
Rick Gigger <rick@alpinenetworking.com> writes:
> However if hashagg truly does not obey the limit that is supposed to  
> be imposed by work_mem then it really ought to be documented.  Is  
> there a misunderstanding here and it really does obey it?  Or is  
> hashagg an exception but the other work_mem associated operations  
> work fine?  Or is it possible for them all to go out of bounds?

hashagg is the exception.  It should be fixed, not documented, but no
one's got round to that.

One point to consider is that if the planner's estimate is as far off
as exhibited in the OP's example, a hashagg that does spill to disk
is likely to take so long that he'll be back here complaining that
the query never terminates ;-).  In most practical situations, I think
exceeding work_mem is really the best solution, as long as it's not
by more than 10x or 100x.  It's when the estimate is off by many
orders of magnitude that you've got a problem.  Running out of memory
is not necessarily the worst response ... as long as the system doesn't
kill the process in response to that.
        regards, tom lane


Re: PostgreSQL 8.0.6 crash

From
"Mark Woodward"
Date:
> Rick Gigger <rick@alpinenetworking.com> writes:
>> However if hashagg truly does not obey the limit that is supposed to
>> be imposed by work_mem then it really ought to be documented.  Is
>> there a misunderstanding here and it really does obey it?  Or is
>> hashagg an exception but the other work_mem associated operations
>> work fine?  Or is it possible for them all to go out of bounds?
>
> hashagg is the exception.  It should be fixed, not documented, but no
> one's got round to that.

Well, it is clearly a pathological condition. Fixed? Sure, but someone
should document it so that others don't stumble across it.

>
> One point to consider is that if the planner's estimate is as far off
> as exhibited in the OP's example, a hashagg that does spill to disk
> is likely to take so long that he'll be back here complaining that
> the query never terminates ;-).

That's not fair, now is it? This isn't about the OP (me), it is about
PostgreSQL behaving badly.

> In most practical situations, I think
> exceeding work_mem is really the best solution, as long as it's not
> by more than 10x or 100x.  It's when the estimate is off by many
> orders of magnitude that you've got a problem.  Running out of memory
> is not necessarily the worst response ... as long as the system doesn't
> kill the process in response to that.

I don't agree with you here. Many PostgreSQL installations use PostgreSQL
as part of a larger whole. Adjusting "work_mem" should give the admin some
control over the memory footprint of the system. It is documented as the
limit a specific function path will use before spilling to disk.

I set up a lot of systems and I write a lot of software that uses
PostgreSQL. Periodically I run across features/problems/limitations of
PostgreSQL and post them.

This was/is an example of where the behavior of PostgreSQL is clearly
unacceptable. OK, yes, this problem goes away with an ANALYZE, but it
isn't clear how anyone could have known this, and unexpected behavior is
bad in any product.

In your statement, "he'll be back here complaining that the query never
terminates," that's not true. A long query typically gets examined with
explain (or in Oracle, explain plan) and evaluated from there. When the
process exhibits runaway memory use, that's a problem.


Re: PostgreSQL 8.0.6 crash

From
"Jim C. Nasby"
Date:
On Thu, Feb 09, 2006 at 03:13:22PM -0500, Greg Stark wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> 
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > > Greg Stark <gsstark@mit.edu> writes:
> > > > It doesn't seem like a bad idea to have a max_memory parameter that if a
> > > > backend ever exceeded it would immediately abort the current
> > > > transaction.
> > > 
> > > See ulimit (or local equivalent).
> > 
> > As much as setting ulimit in shell scripts is fun, I have to admit that
> > I really don't see it happening very much.  
> 
> For one thing it requires admin access to the startup scripts to arrange this.
> And it's always cluster-wide.
> 
> Having a GUC parameter would mean it could be set per-session. Even if the GUC
> parameter were just implemented by calling setrlimit it might be useful.

Trying to tune work_mem is extremely difficult in PostgreSQL, because
you are constantly running the risk of sending the server into a
swap storm. Having a settable per-backend memory limit would allow a lot
more flexibility in setting work_mem, because you could then ensure that
you wouldn't push the server into serious swapping.

Even better would be a means to set a cluster-wide memory limit, but of
course that's substantially more work.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: PostgreSQL 8.0.6 crash

From
"Jim C. Nasby"
Date:
On Thu, Feb 09, 2006 at 05:04:38PM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > When people talk about disabling the OOM killer, it doesn't stop the
> > SIGKILL behaviour,
> 
> Yes it does, because the situation will never arise.
> 
> > it just causes the kernel to return -ENOMEM for
> > malloc() much much earlier... (ie when you still actually have memory
> > available).
> 
> Given the current price of disk, there is no sane reason not to have
> enough swap space configured to make this not-a-problem.  The OOM kill
> mechanism was a reasonable solution for running systems that were not
> expected to be too reliable anyway on small hardware, but if you're
> trying to run a 24/7 server you're simply incompetent if you don't
> disable it.

BTW, I was shocked when I found out that FreeBSD actually has an OOM
killer itself. Yet I've never heard of anyone having problems with it.
Granted, the FreeBSD OOM killer could be better designed to pick the right
process to kill, but I'd bet that the real reason you never hear about
it is because FreeBSD admins are clued enough to a) set up a reasonable
amount of swap and b) do a better job of monitoring memory usage so that
you don't start swapping in the first place.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: PostgreSQL 8.0.6 crash

From
"Jim C. Nasby"
Date:
On Fri, Feb 10, 2006 at 09:57:12AM -0500, Mark Woodward wrote:
> > In most practical situations, I think
> > exceeding work_mem is really the best solution, as long as it's not
> > by more than 10x or 100x.  It's when the estimate is off by many
> > orders of magnitude that you've got a problem.  Running out of memory
> > is not necessarily the worst response ... as long as the system doesn't
> > kill the process in response to that.
> 
> I don't agree with you here. Many PostgreSQL installations use PostgreSQL
> as part of a larger whole. Adjusting "work_mem" should give the admin some
> control over the memory footprint of the system. It is documented as the
> limit a specific function path will use before spilling to disk.

And even when PostgreSQL has the server all to itself, having a hashagg
spill to disk is *way* better than pushing the machine into a swap
storm. At least if you spill the hashagg you only have one backend
running at a snail's pace; a swap storm means next to nothing gets done.

> This was/is an example of where the behavior of PostgreSQL is clearly
> unacceptable. OK, yes, this problem goes away with an ANALYZE, but it
> isn't clear how anyone could have known this, and unexpected behavior is
> bad in any product.

Care to submit a documentation patch before releases are bundled (I
think on Sunday?) At least then people would be aware that work_mem is
just a suggestion to hash_aggs. I'd do a patch myself but I doubt I'll
have time before the release. :(
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: PostgreSQL 8.0.6 crash

From
Martijn van Oosterhout
Date:
On Fri, Feb 10, 2006 at 10:01:18AM -0600, Jim C. Nasby wrote:
> BTW, I was shocked when I found out that FreeBSD actually has an OOM
> killer itself. Yet I've never heard of anyone having problems with it.
> Granted, the FreeBSD OOM could be better designed to pick the right
> process to kill, but I'd bet that the real reason you never hear about
> it is because FreeBSD admins are clued enough to a) setup a reasonable
> amount of swap and b) do a better job of monitoring memory usage so that
> you don't start swapping in the first place.

Hmm, I do wonder what FreeBSD's overcommit policy is. For example, on my
computer right now the total allocated VM is approximately 3 times the
actual memory in the computer, and about twice if you include swap. Under
a strict no-overcommit policy my computer wouldn't complete the boot
sequence, whereas currently it runs without using any swap.

Disabling overcommit has a serious cost in that most of your VM will
never be used. Are people really suggesting that I can't run a few
daemons, X and a web-browser on FreeBSD without allocating 3 times my
physical memory in swap?

However, my real question is: while trying to find info about FreeBSD's
overcommit policy, I just get lots of people complaining about FreeBSD
killing random processes. Does anyone know a site that describes how it
works? I understand Linux's overcommit policy just fine.

Disclaimer: The Linux OOM killer has never killed the wrong process for
me, so I don't have any bad experiences with overcommit.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: PostgreSQL 8.0.6 crash

From
"Mark Woodward"
Date:
> On Fri, Feb 10, 2006 at 09:57:12AM -0500, Mark Woodward wrote:
>> > In most practical situations, I think
>> > exceeding work_mem is really the best solution, as long as it's not
>> > by more than 10x or 100x.  It's when the estimate is off by many
>> > orders of magnitude that you've got a problem.  Running out of memory
>> > is not necessarily the worst response ... as long as the system
>> doesn't
>> > kill the process in response to that.
>>
>> I don't agree with you here. Many PostgreSQL installations use
>> PostgreSQL
>> as part of a larger whole. Adjusting "work_mem" should give the admin
>> some
>> control over the memory footprint of the system. It is documented as the
>> limit a specific function path will use before spilling to disk.
>
> And even when PostgreSQL has the server all to itself, having a hashagg
> spill to disk is *way* better than pushing the machine into a swap
> storm. At least if you spill the hashagg you only have one backend
> running at a snail's pace; a swap storm means next to nothing gets done.
>
>> This was/is an example of where the behavior of PostgreSQL is clearly
>> unacceptable. OK, yes, this problem goes away with an ANALYZE, but it
>> isn't clear how anyone could have known this, and unexpected behavior is
>> bad in any product.
>
> Care to submit a documentation patch before releases are bundled (I
> think on Sunday?) At least then people would be aware that work_mem is
> just a suggestion to hash_aggs. I'd do a patch myself but I doubt I'll
> have time before the release. :(

I would be glad to. What's the process?



Re: PostgreSQL 8.0.6 crash

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Disclaimer: The Linux OOM killer has never killed the wrong process for
> me, so I don't have any bad experiences with overcommit.

You haven't tried real hard.  What I've seen recently when I do something
that makes a PG backend go overboard is that the kernel zaps both the
misbehaving backend and the bgwriter process.  No idea what it's got
against the bgwriter, but the behavior's been pretty consistent under
recent Fedora 4 kernels ...

(This is on a development machine, not a server, so I'm not particularly
worried by leaving the default overcommit policy in place.  I wouldn't
do that on a server --- but it's not worth my time to change it on a
devel machine.)
        regards, tom lane


Re: PostgreSQL 8.0.6 crash

From
Greg Stark
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:

> Disabling overcommit has a serious cost in that most of your VM will
> never be used. Are people really suggesting that I can't run a few
> daemons, X and a web-browser on FreeBSD without allocating 3 times my
> physical memory in swap?

There's a possibility you're misreading your process info there. The X server
maps large areas of graphics memory in its address space which shows up as
virtual memory used in ps on some OSes. I'm not sure if BSD is included there.
That's not memory that can ever be swapped out, and it doesn't take up
any non-video memory.

That said, it's true that some applications allocate much more memory than
needed. There's a bit of a feedback loop here. Because application writers
know that OSes overcommit they don't worry about avoiding unnecessary
allocations. Sun's original Java memory management system just started off
with allocating 20MB before it ran a single bytecode for example. 

That's why merely allocating tons of swap doesn't necessarily protect you.
It's still possible for a process (or several processes if you allocate more
swap than you have address space) to mmap gigabytes of memory without touching
it and then start touching those pages. Hopefully the OOM killer targets the
offender but there's no real way for it to guarantee it.

-- 
greg



Re: PostgreSQL 8.0.6 crash

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> That's why merely allocating tons of swap doesn't necessarily protect you.
> It's still possible for a process (or several processes if you allocate more
> swap than you have address space) to mmap gigabytes of memory without touching
> it and then start touching those pages.

So?  If the swap exists to back that memory, there's no problem.  It
might be slow, but it will not fail.
        regards, tom lane


Re: PostgreSQL 8.0.6 crash

From
"Jim C. Nasby"
Date:
On Fri, Feb 10, 2006 at 12:16:04PM -0500, Mark Woodward wrote:
> > And even when PostgreSQL has the server all to itself, having a hashagg
> > spill to disk is *way* better than pushing the machine into a swap
> > storm. At least if you spill the hashagg you only have one backend
> > running at a snail's pace; a swap storm means next to nothing gets done.
> >
> >> This was/is an example of where the behavior of PostgreSQL is clearly
> >> unacceptable. OK, yes, this problem goes away with an ANALYZE, but it
> >> isn't clear how anyone could have known this, and unexpected behavior is
> >> bad in any product.
> >
> > Care to submit a documentation patch before releases are bundled (I
> > think on Sunday?) At least then people would be aware that work_mem is
> > just a suggestion to hash_aggs. I'd do a patch myself but I doubt I'll
> > have time before the release. :(
> 
> I would be glad too. What's the process?

Well, find the appropriate file in doc/src/sgml, make a copy, edit the
file, generate a diff with diff -u, and email that diff/patch to
pgsql-patches.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: PostgreSQL 8.0.6 crash

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > That's why merely allocating tons of swap doesn't necessarily protect you.
> > It's still possible for a process (or several processes if you allocate more
> > swap than you have address space) to mmap gigabytes of memory without touching
> > it and then start touching those pages.
> 
> So?  If the swap exists to back that memory, there's no problem.  It
> might be slow, but it will not fail.

Sure, but there's no way to know how much swap you need. No matter how much
swap you allocate these processes can allocate more pages of untouched RAM and
then blow up.

Of course realistically allocating 4G of swap is enough to deal with something
like Postgres where you're not being maliciously attacked. One process on ia32
can't accidentally allocate more than 4G of ram.

I was just trying to clarify the situation since someone made some comment
about it having to do with memory being swapped out and then finding nowhere
to swap in when needed. That's not exactly what's happening.

-- 
greg



Re: PostgreSQL 8.0.6 crash

From
Bruce Momjian
Date:
Greg Stark wrote:
> 
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > Greg Stark <gsstark@mit.edu> writes:
> > > That's why merely allocating tons of swap doesn't necessarily protect you.
> > > It's still possible for a process (or several processes if you allocate more
> > > swap than you have address space) to mmap gigabytes of memory without touching
> > > it and then start touching those pages.
> > 
> > So?  If the swap exists to back that memory, there's no problem.  It
> > might be slow, but it will not fail.
> 
> Sure, but there's no way to know how much swap you need. No matter how much
> swap you allocate these processes can allocate more pages of untouched RAM and
> then blow up.
> 
> Of course realistically allocating 4G of swap is enough to deal with something
> like Postgres where you're not being maliciously attacked. One process on ia32
> can't accidentally allocate more than 4G of ram.
> 
> I was just trying to clarify the situation since someone made some comment
> about it having to do with memory being swapped out and then finding nowhere
> to swap in when needed. That's not exactly what's happening.

I guess the fundamental issue is whether Linux requires all mmap()'ed
file contents to be in memory, or whether it pushes data to disk and
unmaps it as it runs low on memory.  I don't know the answer to that.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: PostgreSQL 8.0.6 crash

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> I was just trying to clarify the situation since someone made some comment
> about it having to do with memory being swapped out and then finding nowhere
> to swap in when needed. That's not exactly what's happening.

No.  I believe the case that is actually hard for the kernel to predict
comes from copy-on-write: when a process forks, you could potentially
need twice its current memory image, but in reality you probably won't
ever need that much since many of the shared pages won't ever be written
by either process.  However, a non-overcommitting kernel must assume
that worst case, and hence fail the fork() if it doesn't have enough
swap space to cover both processes.  If it does not, then the crunch
comes when one process does touch a shared page.  If there is no
available swap space at that time, kill -9 is the only recourse, because
there is no way in the Unix API to fail a write to valid memory.

The reason for having a lot more swap space than you really need is just
to cover the potential demand from copy-on-write of pages that are
currently shared.  But given the price of disk these days, it's pretty
cheap insurance.
        regards, tom lane