Re: Searching for Duplicates and Hosed the System - Mailing list pgsql-general

From Bill Thoen
Subject Re: Searching for Duplicates and Hosed the System
Date
Msg-id 20070819194018.GA16687@www.gisnet.com
Whole thread Raw
In response to Re: Searching for Duplicates and Hosed the System  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Searching for Duplicates and Hosed the System  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom, here's the "explain" results: Does this help explain what went wrong?
(And yes, I think there will be a *lot* of groups.)

explain select count(*) as count,fips_st_cd, fips_cnty_cd, farm_nbr,
tract_nbr, field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr from
compliance_2006 group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr order by 1 desc;

                             QUERY PLAN
--------------------------------------------------------
 Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
   Sort Key: count(*)
   ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832
width=160)
         ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
               Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
               ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18
rows=18048318 width=160)
(6 rows)



On Sun, Aug 19, 2007 at 01:19:51PM -0400, Tom Lane wrote:
> Bill Thoen <bthoen@gisnet.com> writes:
> > I knew this would take some time, but what I didn't expect was that about
> > an hour into the select, my mouse and keyboard locked up and also I
> > couldn't log in from another computer via SSH. This is a Linux machine
> > running Fedora Core 6 and PostgresQL is 8.1.4. There's about 50GB free on
> > the disc too.
>
> > I finally had to shut the power off and reboot to regain control of my
> > computer (that wasn't good idea, either, but eventually I got everything
> > working again.)
>
> I've seen Fedora go nuts like that when it ran out of memory.  Once it
> starts to swap heavily, performance goes into the tank; and once the
> kernel realizes it's in memory trouble, it starts to kill processes
> more or less at random.  That might explain why ssh stopped working.
>
> One thing to do to make it more robust is to disable memory overcommit.
> I suspect also that configuring it with lots of swap space is
> counterproductive, because that just encourages the kernel to allow lots
> of swapping.  I haven't actually experimented with that part though.
>
> As for why PG ran the system out of memory, I suspect that the planner
> drastically underestimated the number of groups to be created by your
> GROUP BY, and thought it could get away with a hash aggregation.  We
> don't currently have any provision for spilling hash aggregation to
> disk, so if there's a very large number of groups the table just gets
> very big :-(.  The planner is not supposed to choose hash agg if the
> estimated table size exceeds work_mem ... but if it had out-of-date
> statistics to work with it might have gotten the wrong answer.  Have
> you ANALYZEd this table recently?  What does EXPLAIN show as the
> estimated number of result rows?
>
>             regards, tom lane
>

pgsql-general by date:

Previous
From: Tino Wildenhain
Date:
Subject: Re: Transactional DDL
Next
From: Michael Glaesemann
Date:
Subject: Re: entry log