Re: Searching for Duplicates and Hosed the System - Mailing list pgsql-general

From Tom Lane
Subject Re: Searching for Duplicates and Hosed the System
Date
Msg-id 19142.1187543991@sss.pgh.pa.us
Whole thread Raw
In response to Searching for Duplicates and Hosed the System  (Bill Thoen <bthoen@gisnet.com>)
Responses Re: Searching for Duplicates and Hosed the System  (Bill Thoen <bthoen@gisnet.com>)
List pgsql-general
Bill Thoen <bthoen@gisnet.com> writes:
> I knew this would take some time, but what I didn't expect was that about
> an hour into the select, my mouse and keyboard locked up and also I
> couldn't log in from another computer via SSH. This is a Linux machine
> running Fedora Core 6 and PostgresQL is 8.1.4. There's about 50GB free on
> the disc too.

> I finally had to shut the power off and reboot to regain control of my
> computer (that wasn't good idea, either, but eventually I got everything
> working again.)

I've seen Fedora go nuts like that when it ran out of memory.  Once it
starts to swap heavily, performance goes into the tank; and once the
kernel realizes it's in memory trouble, it starts to kill processes
more or less at random.  That might explain why ssh stopped working.

One thing to do to make it more robust is to disable memory overcommit.
I suspect also that configuring it with lots of swap space is
counterproductive, because that just encourages the kernel to allow lots
of swapping.  I haven't actually experimented with that part though.

As for why PG ran the system out of memory, I suspect that the planner
drastically underestimated the number of groups to be created by your
GROUP BY, and thought it could get away with a hash aggregation.  We
don't currently have any provision for spilling hash aggregation to
disk, so if there's a very large number of groups the table just gets
very big :-(.  The planner is not supposed to choose hash agg if the
estimated table size exceeds work_mem ... but if it had out-of-date
statistics to work with it might have gotten the wrong answer.  Have
you ANALYZEd this table recently?  What does EXPLAIN show as the
estimated number of result rows?

            regards, tom lane

pgsql-general by date:

Previous
From: "Webb Sprague"
Date:
Subject: Re: Postgresql performance in production environment
Next
From: "Harpreet Dhaliwal"
Date:
Subject: Re: Transactional DDL