Thread: BUG #2428: ERROR: out of memory, running INSERT SELECT statement

BUG #2428: ERROR: out of memory, running INSERT SELECT statement

From: "Casey Duncan"
The following bug has been logged online:

Bug reference:      2428
Logged by:          Casey Duncan
Email address:      casey@pandora.com
PostgreSQL version: 8.1.3
Operating system:   Debian Linux (2.6.13.1-20050914 #2 SMP) 2xOpteron 8GB RAM
Description:        ERROR:  out of memory, running INSERT SELECT statement
Details:

I filed this a few days back, but I've since come up with some more
detail. I started with a clean 8.1.3 installation (no databases),
imported a production snapshot, and ran part of a large upgrade script
on it.

The server has this config:

shared_buffers = 20000
max_prepared_transactions = 200
work_mem = 8192                 # 8 MB
maintenance_work_mem = 131072   # 128 MB
max_fsm_pages = 50000
wal_buffers = 64
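
In 8.1 these settings are unitless numbers: shared_buffers is counted
in 8 kB buffers, and work_mem/maintenance_work_mem are in kB, so
(assuming the default 8 kB block size) the values above work out to
roughly:

SHOW shared_buffers;          -- 20000 buffers * 8 kB = ~156 MB
SHOW work_mem;                -- 8192 kB  = 8 MB
SHOW maintenance_work_mem;    -- 131072 kB = 128 MB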

The part of the db being upgraded has this schema to start with:

-- Begin Schema
SET client_encoding = 'UTF8';
SET default_with_oids = false;

CREATE TABLE ll (
    ll_id integer DEFAULT nextval(('ll_id_seq'::text)::regclass) NOT NULL,
    username text,
    "password" text,
    expiration_date timestamp without time zone,
    state text NOT NULL,
    billing_frequency text,
    alert_code text,
    auto_renew boolean DEFAULT true NOT NULL,
    email_opt_in boolean DEFAULT false NOT NULL,
    date_created timestamp without time zone DEFAULT now(),
    web_name text,
    birth_year integer,
    gender text,
    zipcode text
);

CREATE SEQUENCE ll_id_seq
    INCREMENT BY 1
    NO MAXVALUE
    NO MINVALUE
    CACHE 1;

CREATE TABLE ll_to_ss (
    ll_id integer NOT NULL,
    ss_id integer NOT NULL,
    time_added timestamp without time zone DEFAULT now()
);

CREATE TABLE ss (
    ss_id integer DEFAULT nextval(('ss_id_seq'::text)::regclass) NOT NULL,
    name character varying(64) NOT NULL,
    creator_id integer NOT NULL,
    ll_count integer DEFAULT 0 NOT NULL,
    initial_mm_id character varying(20),
    CONSTRAINT ss_name CHECK (((name)::text <> ''::text))
);

CREATE SEQUENCE ss_id_seq
    INCREMENT BY 1
    NO MAXVALUE
    NO MINVALUE
    CACHE 1;

ALTER TABLE ONLY ll
    ADD CONSTRAINT ll_pkey PRIMARY KEY (ll_id);
ALTER TABLE ONLY ll_to_ss
    ADD CONSTRAINT ll_to_ss_pkey PRIMARY KEY (ll_id, ss_id);
ALTER TABLE ll_to_ss CLUSTER ON ll_to_ss_pkey;
ALTER TABLE ONLY ll
    ADD CONSTRAINT ll_username_key UNIQUE (username);
ALTER TABLE ONLY ll
    ADD CONSTRAINT ll_web_name_key UNIQUE (web_name);
ALTER TABLE ONLY ss
    ADD CONSTRAINT ss_pkey PRIMARY KEY (ss_id);
CREATE INDEX ll_expiration_date_idx ON ll USING btree (expiration_date);
CREATE INDEX ll_to_ss_ss_id_idx ON ll_to_ss USING btree (ss_id);
CREATE INDEX ss_creator_id_initial_mm_id_idx ON ss USING btree (creator_id, initial_mm_id);
CREATE INDEX ss_ll_count_idx ON ss USING btree (ll_count);
ALTER TABLE ONLY ss
    ADD CONSTRAINT "$1" FOREIGN KEY (creator_id) REFERENCES ll(ll_id);
ALTER TABLE ONLY ll_to_ss
    ADD CONSTRAINT "$1" FOREIGN KEY (ll_id) REFERENCES ll(ll_id);
ALTER TABLE ONLY ll_to_ss
    ADD CONSTRAINT "$2" FOREIGN KEY (ss_id) REFERENCES ss(ss_id) ON DELETE
RESTRICT;
-- End of Schema

Here is the upgrade script that causes the memory error:

BEGIN; --Upgrade script
ALTER TABLE ss RENAME COLUMN creator_id TO ll_id;

DROP INDEX ss_creator_id_initial_mm_id_idx;
CREATE INDEX ss_ll_id_initial_mm_id ON
    ss (ll_id, initial_mm_id);

DROP INDEX ss_ll_count_idx;

ALTER TABLE ss ALTER COLUMN ss_id DROP DEFAULT;
ALTER TABLE ss ALTER COLUMN ll_id DROP NOT NULL;
ALTER TABLE ss DROP COLUMN ll_count;
ALTER TABLE ss ADD COLUMN shared_ss_id BIGINT;
ALTER TABLE ss ADD COLUMN time_added TIMESTAMP;
ALTER TABLE ss ADD COLUMN shared_creator_id BIGINT;
ALTER TABLE ss ADD CONSTRAINT ss_shared_chk CHECK
    ((shared_ss_id != ss_id) AND (shared_creator_id != ll_id));

-- Update ss table in place for "original" sss
UPDATE ss SET time_added = lts.time_added
FROM ll_to_ss AS lts
WHERE ss.ll_id = lts.ll_id;

-- Add content to ss table for shared sss
INSERT INTO ss
    (ss_id, name, ll_id, shared_ss_id, time_added,
    shared_creator_id)
SELECT nextval('ss_id_seq'), s.name, lts.ll_id, s.ss_id,
    lts.time_added, s.ll_id
FROM ss AS s, ll_to_ss AS lts
WHERE lts.ll_id != s.ll_id;

DROP TABLE ll_to_ss CASCADE;

CREATE FUNCTION write_error_trigf() RETURNS trigger AS '
    BEGIN
        RAISE EXCEPTION ''Writes not allowed to this table on this node'';
    END;
' LANGUAGE plpgsql;
END; --Upgrade script
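
One subtlety in the script above: for the pre-existing "original" rows,
shared_ss_id and shared_creator_id remain NULL, so the ss_shared_chk
expression evaluates to NULL rather than false and the rows pass (a
CHECK constraint only rejects rows whose expression is demonstrably
false). A quick way to see this, using arbitrary illustrative values:

SELECT (NULL::bigint != 42) AND (NULL::bigint != 7);  -- NULL, so the CHECK is satisfied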

In the database being upgraded, the "ll" table has 8740364 rows, the "ss"
table has 18953787 rows, and the "ll_to_ss" table has 19233345 rows. The
script runs fine on an empty database; running it on the populated
database results in the following error (from the server log):

TopMemoryContext: 61976 total in 6 blocks; 5936 free (11 chunks); 56040 used
Type information cache: 8192 total in 1 blocks; 1864 free (0 chunks); 6328 used
Operator class cache: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
TopTransactionContext: -1268785152 total in 372 blocks; 12672 free (372 chunks); -1268797824 used
MessageContext: 24576 total in 2 blocks; 1152 free (4 chunks); 23424 used
smgr relation table: 8192 total in 1 blocks; 2872 free (0 chunks); 5320 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 896 free (0 chunks); 128 used
ExecutorState: 8912976 total in 12 blocks; 8361368 free (109253 chunks); 551608 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8128 free (0 chunks); 64 used
Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816 used
CacheMemoryContext: 516096 total in 6 blocks; 81656 free (0 chunks); 434440 used
ll_to_ss_ss_id_idx: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
ll_to_ss_pkey: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
ss_pkey: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_constraint_contypid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_constraint_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_constraint_conname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_attrdef_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
ss_ll_id_initial_mm_id: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_shdepend_depender_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_description_o_c_o_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_depend_depender_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_depend_reference_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_trigger_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_trigger_tgconstrrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_trigger_tgconstrname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_index_indrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_constraint_conrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_type_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_proc_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_operator_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_opclass_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_namespace_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_name_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_authid_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_authid_rolname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_database_datname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_conversion_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_conversion_default_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_class_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_cast_source_target_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
MdSmgr: 8192 total in 1 blocks; 6568 free (0 chunks); 1624 used
LockTable (locallock hash): 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
Timezones: 47592 total in 2 blocks; 5968 free (0 chunks); 41624 used
ErrorContext: 8192 total in 1 blocks; 8176 free (4 chunks); 16 used
2006-05-08 18:35:28.494 PDT [d:radio_source u:slony s:445fc180.17e3 15] ERROR:  out of memory
2006-05-08 18:35:28.494 PDT [d:radio_source u:slony s:445fc180.17e3 16] DETAIL:  Failed on request of size 32.
2006-05-08 18:35:28.494 PDT [d:radio_source u:slony s:445fc180.17e3 17] STATEMENT:  INSERT INTO ss
        (ss_id, name, ll_id, shared_ss_id, time_added,
        shared_creator_id)
    SELECT nextval('ss_id_seq'), s.name, lts.ll_id, s.ss_id,
        lts.time_added, s.ll_id
    FROM ss AS s, ll_to_ss AS lts
    WHERE lts.ll_id != s.ll_id;
"Casey Duncan" <casey@pandora.com> writes:
> CREATE FUNCTION write_error_trigf() RETURNS trigger AS '
>     BEGIN
>         RAISE EXCEPTION ''Writes not allowed to this table on this node'';
>     END;
> ' LANGUAGE plpgsql;
> END; --Upgrade script

You did not show how this function is being used, but I'm wondering if
it is an AFTER trigger on inserts into ss?  If so, the reason for the
out-of-memory failure might be accumulation of pending trigger event
records.

Without wishing to defend our lack of ability to spill trigger events
to disk, you probably wouldn't be happy with the performance if it did
work :-(.  Consider making the trigger BEFORE instead of AFTER, so that
there's not a need to remember a ton of pending trigger firings.  I don't
see any reason why this trigger needs to be AFTER.
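
For illustration, a BEFORE trigger built on the function above might
look something like this (a sketch only; the actual trigger definition
isn't shown in this thread, and the trigger name here is made up):

CREATE TRIGGER ss_block_writes
    BEFORE INSERT OR UPDATE OR DELETE ON ss
    FOR EACH ROW EXECUTE PROCEDURE write_error_trigf();

A BEFORE trigger fires as each row is processed, so nothing piles up
in the pending-event list the way AFTER trigger firings do.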

            regards, tom lane

Re: BUG #2428: ERROR: out of memory, running INSERT SELECT statement

From: Casey Duncan
On May 9, 2006, at 9:09 PM, Tom Lane wrote:

> "Casey Duncan" <casey@pandora.com> writes:
>> CREATE FUNCTION write_error_trigf() RETURNS trigger AS '
>>     BEGIN
>>         RAISE EXCEPTION ''Writes not allowed to this table on this node'';
>>     END;
>> ' LANGUAGE plpgsql;
>> END; --Upgrade script
>
> You did not show how this function is being used, but I'm wondering if
> it is an AFTER trigger on inserts into ss?  If so, the reason for the
> out-of-memory failure might be accumulation of pending trigger event
> records.
>
> Without wishing to defend our lack of ability to spill trigger events
> to disk, you probably wouldn't be happy with the performance if it did
> work :-(.  Consider making the trigger BEFORE instead of AFTER, so that
> there's not a need to remember a ton of pending trigger firings.  I don't
> see any reason why this trigger needs to be AFTER.

That's good to know, but I think it's a red herring in this case. The
upgrade script creates this function, but it doesn't set up the
triggers themselves; another external script does that after the
upgrade. Basically we're creating a multi-database cluster where all
the dbs have the same schema, but you aren't supposed to write to
certain tables on certain nodes.

The script actually fails before the function is even defined anyhow,
on this statement:

INSERT INTO ss
     (ss_id, name, ll_id, shared_ss_id, time_added,
     shared_creator_id)
SELECT nextval('ss_id_seq'), s.name, lts.ll_id, s.ss_id,
     lts.time_added, s.ll_id
FROM ss AS s, ll_to_ss AS lts
WHERE lts.ll_id != s.ll_id;

There are no triggers of any kind on ss. Basically this statement is
a step toward eliminating the ll_to_ss table (which was used for a
many-to-many between ll and ss) and creating a one-to-many between ll
and ss. This merges some portion of ll_to_ss with ss.

-Casey

Re: BUG #2428: ERROR: out of memory, running INSERT SELECT

From: Simon Riggs
On Wed, 2006-05-10 at 09:34 -0700, Casey Duncan wrote:

> The script actually fails before the function is even defined anyhow,
> on this statement:
>
> INSERT INTO ss
>      (ss_id, name, ll_id, shared_ss_id, time_added,
>      shared_creator_id)
> SELECT nextval('ss_id_seq'), s.name, lts.ll_id, s.ss_id,
>      lts.time_added, s.ll_id
> FROM ss AS s, ll_to_ss AS lts
> WHERE lts.ll_id != s.ll_id;

As your database is defined, this SQL statement will return
approximately 4 trillion rows, by my calculation. As you say, it returns
no rows at all when the database is empty.

If it hadn't failed on OOM it would have failed on disk space, assuming
you didn't have a requirement for a 100 TB table. So fixing this problem
at the server end isn't something that is likely to happen soon/ever.

From here, your SQL looks like it has an error-of-intention.

[This is exactly the type of statement that the statement_cost_limit
patch would have rejected early with an appropriate message.]

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

Re: BUG #2428: ERROR: out of memory, running INSERT SELECT statement

From: Casey Duncan
On May 11, 2006, at 4:42 AM, Simon Riggs wrote:

> On Wed, 2006-05-10 at 09:34 -0700, Casey Duncan wrote:
>
>> The script actually fails before the function is even defined anyhow,
>> on this statement:
>>
>> INSERT INTO ss
>>      (ss_id, name, ll_id, shared_ss_id, time_added,
>>      shared_creator_id)
>> SELECT nextval('ss_id_seq'), s.name, lts.ll_id, s.ss_id,
>>      lts.time_added, s.ll_id
>> FROM ss AS s, ll_to_ss AS lts
>> WHERE lts.ll_id != s.ll_id;
>
> As your database is defined, this SQL statement will return
> approximately 4 trillion rows, by my calculation. As you say, it returns
> no rows at all when the database is empty.

*slaps forehead* I totally missed the "!=" in the where clause, Doh!
Thanks for hitting me with a clue-stick.
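
Presumably the statement wanted a join condition on ss_id, with the
"!=" kept only to skip each ss's own creator link; something like the
following (an untested sketch based on my description of the intent):

INSERT INTO ss
     (ss_id, name, ll_id, shared_ss_id, time_added,
     shared_creator_id)
SELECT nextval('ss_id_seq'), s.name, lts.ll_id, s.ss_id,
     lts.time_added, s.ll_id
FROM ss AS s
     JOIN ll_to_ss AS lts ON lts.ss_id = s.ss_id  -- the missing condition
WHERE lts.ll_id != s.ll_id;  -- skip the creator's own link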

> If it hadn't failed on OOM it would have failed on disk space, assuming
> you didn't have a requirement for a 100 TB table. So fixing this problem
> at the server end isn't something that is likely to happen soon/ever.
>
> From here, your SQL looks like it has an error-of-intention.
>
> [This is exactly the type of statement that the statement_cost_limit
> patch would have rejected early with an appropriate message.]

That would be nice; as it is, it currently fails in a very nasty way.
"ERROR: Query too stupid" might be better for this one ;^)

Thanks much.

-Casey

Re: BUG #2428: ERROR: out of memory, running INSERT SELECT statement

From: Tom Lane

Casey Duncan <casey@pandora.com> writes:
> On May 11, 2006, at 4:42 AM, Simon Riggs wrote:
>> As your database is defined, this SQL statement will return
>> approximately 4 trillion rows, by my calculation. As you say, it
>> returns no rows at all when the database is empty.

> *slaps forehead* I totally missed the "!=" in the where clause, Doh!
> Thanks for hitting me with a clue-stick.

I'm still wondering why you got "out of memory", though.  I'd have
expected that to grind for a really long time, gradually filling your
disk, until you got an out-of-disk-space kind of error, if you didn't
notice and stop it first.  There aren't (supposed to be) any long-term
memory leaks in query processing, other than the known issue of
pending trigger events, which you say you haven't got on this table.

            regards, tom lane

Re: BUG #2428: ERROR: out of memory, running INSERT SELECT

From: Simon Riggs
On Thu, 2006-05-11 at 20:14 -0400, Tom Lane wrote:
> Casey Duncan <casey@pandora.com> writes:
> > On May 11, 2006, at 4:42 AM, Simon Riggs wrote:
> >> As your database is defined, this SQL statement will return
> >> approximately 4 trillion rows, by my calculation. As you say, it
> >> returns no rows at all when the database is empty.
>
> > *slaps forehead* I totally missed the "!=" in the where clause, Doh!
> > Thanks for hitting me with a clue-stick.
>
> I'm still wondering why you got "out of memory", though.  I'd have
> expected that to grind for a really long time, gradually filling your
> disk, until you got an out-of-disk-space kind of error, if you didn't
> notice and stop it first.  There aren't (supposed to be) any long-term
> memory leaks in query processing, other than the known issue of
> pending trigger events, which you say you haven't got on this table.

Seems broken either way, OOM or OOD. We need a way to stop runaway
queries from happening in the first place.
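
A blunt guard that does exist today is statement_timeout, though it
only kills a runaway statement after it has burned its allotted time
rather than rejecting it up front on estimated cost:

SET statement_timeout = 3600000;  -- in milliseconds in 8.1; give up after an hour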

--
  Simon Riggs
  EnterpriseDB          http://www.enterprisedb.com

Re: BUG #2428: ERROR: out of memory, running INSERT SELECT

From: "Jim C. Nasby"
On Fri, May 12, 2006 at 08:04:20AM +0100, Simon Riggs wrote:
> On Thu, 2006-05-11 at 20:14 -0400, Tom Lane wrote:
> > Casey Duncan <casey@pandora.com> writes:
> > > On May 11, 2006, at 4:42 AM, Simon Riggs wrote:
> > >> As your database is defined, this SQL statement will return
> > >> approximately 4 trillion rows, by my calculation. As you say, it
> > >> returns no rows at all when the database is empty.
> >
> > > *slaps forehead* I totally missed the "!=" in the where clause, Doh!
> > > Thanks for hitting me with a clue-stick.
> >
> > I'm still wondering why you got "out of memory", though.  I'd have
> > expected that to grind for a really long time, gradually filling your
> > disk, until you got an out-of-disk-space kind of error, if you didn't
> > notice and stop it first.  There aren't (supposed to be) any long-term
> > memory leaks in query processing, other than the known issue of
> > pending trigger events, which you say you haven't got on this table.
>
> Seems broken either way, OOM or OOD. We need a way to stop runaway
> queries from happening in the first place.

Well, the question still remains: had they been trying this with a 100TB
table, would it have actually worked, or is there some kind of overflow?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461