Re: Searching for Duplicates and Hosed the System - Mailing list pgsql-general

From Tom Lane
Subject Re: Searching for Duplicates and Hosed the System
Date
Msg-id 21792.1187672695@sss.pgh.pa.us
In response to Re: Searching for Duplicates and Hosed the System  (Bill Thoen <bthoen@gisnet.com>)
Responses Re: Searching for Duplicates and Hosed the System  (Decibel! <decibel@decibel.org>)
List pgsql-general
Bill Thoen <bthoen@gisnet.com> writes:
> I ran VACUUM ANALYZE just before I launched this and there were no other
> postgres jobs running. I'm the only user as well. I also ran EXPLAIN
> prior to the run and got this:

> Nested Loop  (cost=11.71..28800.34 rows=7219 width=584)
>    ->  Seq Scan on ers_regions e  (cost=0.00..71.90 rows=16 width=28)
>          Filter: (region = 1)
>    ->  Bitmap Heap Scan on compliance_2006 a  (cost=11.71..1788.76 rows=451 width=584)
>          Recheck Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
>          ->  Bitmap Index Scan on key_tract  (cost=0.00..11.71 rows=451 width=0)
>                Index Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
> (7 rows)

Do those estimated row counts look sane to you --- in particular the
estimate of 7219 rows out of the whole query?

AFAICS a plan of this shape simply cannot consume a huge amount of
memory on the server, no matter how badly off the rowcount estimates
are.  However, if it sends enough rows to the client, the *client* side
could be having a problem absorbing the data.  libpq is designed to
try to absorb the full result rowset --- the API it provides is not
amenable to partial result sets.
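
Roughly, the client side looks like this (connection string and query
text below are placeholders, not your actual ones):

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    /* placeholder DSN */
    PGconn *conn = PQconnectdb("dbname=mydb");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* PQexec does not return until the *entire* result set has been
     * read and buffered in client memory inside this one PGresult,
     * so a huge rowcount can blow out the client even when the
     * server-side plan is cheap. */
    PGresult *res = PQexec(conn, "SELECT * FROM some_big_table");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("rows held in client memory: %d\n", PQntuples(res));
    else
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));

    PQclear(res);
    PQfinish(conn);
    return 0;
}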

> I looked in the log and saw this:
> LOG:  transaction ID wrap limit is 1073746500, limited by database "postgres"
> LOG:  transaction ID wrap limit is 1073746500, limited by database "postgres"
> LOG:  could not send data to client: Broken pipe

Those first two messages are unrelated --- they are just routine
autovacuum output.  The third one says that a client process crashed.
So now I'm thinking that the memory-overrun problem was on the client
side.

If you need to deal with very large result sets, the standard advice
is to use a cursor so you can pull a few hundred or thousand rows
at a time via FETCH.
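
Something along these lines (cursor name, batch size, and query are
only illustrative):

#include <stdio.h>
#include <libpq-fe.h>

static void fetch_in_batches(PGconn *conn)
{
    PGresult *res;

    /* cursors must live inside a transaction (unless declared WITH HOLD) */
    res = PQexec(conn, "BEGIN");
    PQclear(res);
    res = PQexec(conn, "DECLARE big_cur CURSOR FOR SELECT * FROM some_big_table");
    PQclear(res);

    for (;;)
    {
        /* pull only 1000 rows at a time into client memory */
        res = PQexec(conn, "FETCH 1000 FROM big_cur");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }
        int n = PQntuples(res);
        /* ... process the n rows in this batch ... */
        PQclear(res);
        if (n == 0)                 /* cursor exhausted */
            break;
    }

    /* COMMIT also closes the cursor */
    res = PQexec(conn, "COMMIT");
    PQclear(res);
}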

            regards, tom lane
