Disk space is consumed by UPDATE query - Mailing list pgsql-bugs

From Patrick Clery
Subject Disk space is consumed by UPDATE query
Date
Msg-id 200409281603.45267.patrick@phpforhire.com
Responses Re: Disk space is consumed by UPDATE query
List pgsql-bugs
Your name               : Patrick Clery
Your email address      : patrick@phpforhire.com


System Configuration
---------------------
  Architecture (example: Intel Pentium)         : Intel
  Operating System (example: Linux 2.4.18)      :  FreeBSD 4.10-stable
  PostgreSQL version (example: PostgreSQL-8.0):   PostgreSQL-8.0
  Compiler used (example:  gcc 2.95.2)          : gcc 2.95.4


Please enter a FULL description of your problem:
------------------------------------------------

This query appears to enter an infinite loop and fill up my /usr partition
(PGDATA=/usr/local/pgsql/data) at a rapid rate:

UPDATE people_locations
SET postalcode = foo.pcode, city_id = foo.cid
FROM (
    SELECT
        p.postalcode AS pcode,
        p.city_id AS cid,
        c.state_id AS sid
    FROM postalcodes p
    JOIN cities c USING (city_id)
) foo
WHERE foo.sid = state_id AND old_postalcode = foo.pcode

psql:/usr/local/www/beano/datingsite/sql/import_people.sql:363: ERROR:  could
not write to hash-join temporary file: No space left on device
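
For reference, the same update can be written without the derived table "foo"; the join conditions are unchanged, so I'd expect an identical plan (just a sketch, not a fix):

```sql
-- Equivalent form of the failing statement with the subquery inlined;
-- same join conditions, so it should plan and fail the same way.
UPDATE people_locations
SET postalcode = p.postalcode, city_id = p.city_id
FROM postalcodes p
JOIN cities c USING (city_id)
WHERE c.state_id = people_locations.state_id
  AND people_locations.old_postalcode = p.postalcode;
```

One thing that stands out: nothing restricts the join to a single postalcodes row per people_locations row, so if a (state_id, postalcode) pair appears in more than one city, each target row matches several source rows.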


From when the query is first run (somehow the free space goes up initially):
$ while : ; do df -h /usr/; sleep 3; done
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.8G   7.8G    43%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.2G   8.3G    39%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.1G   8.4G    38%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.2G   8.4G    38%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.2G   8.3G    39%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.3G   8.3G    39%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.3G   8.3G    39%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.4G   8.2G    40%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.4G   8.2G    40%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.5G   8.1G    40%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.5G   8.1G    41%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.6G   8.0G    41%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.6G   8.0G    41%    /usr
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1g    15G   5.7G   7.9G    42%    /usr
... and on and on until it reaches zero.
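
The growth can also be watched in the temp-file area itself. Assuming the PGDATA reported above (the exact path may differ on another install), hash-join spill files should show up under each database's pgsql_tmp directory:

```shell
# One-shot size check of PostgreSQL's spill directories; the path is
# assumed from PGDATA=/usr/local/pgsql/data above and may vary per install.
du -sh /usr/local/pgsql/data/base/*/pgsql_tmp 2>/dev/null || true
```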


Here's the query plan:

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=18770.77..185690.90 rows=20626 width=140)
   Hash Cond: ((("outer".postalcode)::text = ("inner".old_postalcode)::text) AND ("outer".city_id = "inner".city_id))
   ->  Seq Scan on postalcodes p  (cost=0.00..14742.12 rows=825012 width=18)
   ->  Hash  (cost=9955.64..9955.64 rows=366625 width=126)
         ->  Merge Join  (cost=69.83..9955.64 rows=366625 width=126)
               Merge Cond: ("outer".state_id = "inner".state_id)
               ->  Index Scan using cities_state_id on cities c  (cost=0.00..4203.13 rows=73325 width=8)
               ->  Sort  (cost=69.83..72.33 rows=1000 width=122)
                     Sort Key: people_locations.state_id
                     ->  Seq Scan on people_locations  (cost=0.00..20.00 rows=1000 width=122)
(10 rows)


Here's the inner query by itself:
datingsite=> EXPLAIN ANALYZE     SELECT
datingsite->         p.postalcode AS pcode,
datingsite->         p.city_id AS cid,
datingsite->         c.state_id AS sid
datingsite->     FROM postalcodes p
datingsite->     JOIN cities c USING (city_id);
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2091.56..47451.98 rows=825012 width=22) (actual time=1132.994..16764.241 rows=825012 loops=1)
   Hash Cond: ("outer".city_id = "inner".city_id)
   ->  Seq Scan on postalcodes p  (cost=0.00..14742.12 rows=825012 width=18) (actual time=0.077..4657.842 rows=825012 loops=1)
   ->  Hash  (cost=1585.25..1585.25 rows=73325 width=8) (actual time=1131.010..1131.010 rows=0 loops=1)
         ->  Seq Scan on cities c  (cost=0.00..1585.25 rows=73325 width=8) (actual time=0.031..738.582 rows=73325 loops=1)
 Total runtime: 20475.610 ms
(6 rows)



Both tables are rather large:


datingsite=> select count(*) from people_locations ;
 count
--------
 131266
(1 row)

Time: 2566.282 ms
datingsite=> select count(*) from postalcodes;
 count
--------
 825012
(1 row)

Time: 4246.360 ms
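
Comparing these counts with the first plan above: the planner estimates only 1000 rows for people_locations (the default for a table with no statistics) against an actual 131266, so the tables may simply never have been analyzed. If so, refreshing the statistics might at least yield a saner plan (a guess on my part, not a confirmed fix):

```sql
-- Refresh planner statistics; the rows=1000 default estimate for
-- people_locations suggests stats were never gathered.
ANALYZE people_locations;
ANALYZE postalcodes;
ANALYZE cities;
```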



Here is the schema:



datingsite=> \d postalcodes;
           Table "public.postalcodes"
   Column   |         Type          | Modifiers
------------+-----------------------+-----------
 postalcode | character varying(10) | not null
 city_id    | integer               | not null
Indexes:
    "postalcodes_pkey" PRIMARY KEY, btree (postalcode, city_id)
Foreign-key constraints:
    "postalcodes_city_id_fkey" FOREIGN KEY (city_id) REFERENCES cities(city_id) ON DELETE CASCADE

datingsite=> \d people_locations;
                           Table "public.people_locations"
     Column      |          Type          |                 Modifiers
-----------------+------------------------+-------------------------------------------
 person_id       | integer                | not null
 city_id         | integer                | not null default 0
 postalcode      | character varying(30)  | not null default 'N/A'::character varying
 postalcode_city | integer                | not null default 0
 country_iso     | integer                | not null default 0
 state_id        | integer                | not null default 0
 areacode        | integer                | not null default 0
 old_postalcode  | character varying(10)  | not null default ''::character varying
 old_cityname    | character varying(128) | not null default ''::character varying
Indexes:
    "people_locations_pkey" PRIMARY KEY, btree (person_id)
    "people_loc_postalcode" btree (postalcode)
Foreign-key constraints:
    "people_locations_person_id_fkey" FOREIGN KEY (person_id) REFERENCES people(person_id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
    "people_locations_city_id_fkey" FOREIGN KEY (city_id) REFERENCES cities(city_id) ON UPDATE CASCADE ON DELETE RESTRICT
    "people_locations_country_iso_fkey" FOREIGN KEY (country_iso) REFERENCES countries(country_iso)
    "people_locations_state_id_fkey" FOREIGN KEY (state_id) REFERENCES states(state_id)


Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------
I can repeat the problem each time I run that query. If there is any further
debugging info needed, I'm willing to provide it.




If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------
