Thread: pg_restore failing with "ERROR: out of memory"

pg_restore failing with "ERROR: out of memory"

From: Aaron Brown
Date:
I’m attempting to do something that should be a trivially simple task.  I want to do a data only dump from my production data in the public schema and restore it on another machine.  Both machines are running 8.2.5; they are virtually identical and have 16GB of RAM.

I created an archive with the following command:
pg_dump -v -Fc -a -n public -f 20080318-data-archive

The archive is roughly 2GB.  An uncompressed SQL dump is ~9GB.

...and am trying to restore it with:
pg_restore -av -d m16test 20080318-data-archive

I have a copy of the public schema loaded prior to doing the pg_restore.  

Several small tables import fine, but when it gets to answerselectinstance, a table with > 107 million rows, it chokes:

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3709; 0 1170915411 TABLE DATA answerselectinstance bzzagent
pg_restore: [archiver (db)] COPY failed: ERROR:  out of memory
DETAIL:  Failed on request of size 32.
CONTEXT:  COPY answerselectinstance, line 61944061: "180097     63      0"

I have also tried running a straight data-only SQL file through psql and get a similar error.

The answerselectinstance table is defined as follows:
      Table "public.answerselectinstance"
     Column     |  Type   |     Modifiers      
----------------+---------+--------------------
 memberid       | integer | not null
 answerid       | integer | not null
 taskinstanceid | integer | not null default 0
Indexes:
    "answerselectinstance_pkey" PRIMARY KEY, btree (memberid, answerid, taskinstanceid)
    "asi_answerid_idx" btree (answerid)
    "asi_memberid_idx" btree (memberid)
    "asi_taskinstanceid_idx" btree (taskinstanceid)
Triggers:
    "RI_ConstraintTrigger_1170917112" AFTER INSERT OR UPDATE ON answerselectinstance FROM answer NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_check_ins"('<unnamed>', 'answerselectinstance', 'answer', 'UNSPECIFIED', 'answerid', 'id')
    _bzz_cluster_logtrigger_301 AFTER INSERT OR DELETE OR UPDATE ON answerselectinstance FOR EACH ROW EXECUTE PROCEDURE _bzz_cluster.logtrigger('_bzz_cluster', '301', 'kkk')

As you can see, this table is part of a Slony cluster.  I created the schema using slony1_extract_schema.sh and am importing the data into a standalone server.

Our DBA solved this problem a few months ago, but he recently left the company and did not document the procedure.  I’m no PostgreSQL expert...

Let me know what other information would be helpful.

Thanks,
Aaron

Re: pg_restore failing with "ERROR: out of memory"

From: Tom Lane
Date: 3/19/08 3:06 PM
Aaron Brown <abrown@bzzagent.com> writes:
> I'm attempting to do something that should be a trivially simple task.  I
> want to do a data only dump from my production data in the public schema and
> restore it on another machine.

Does it really need to be data-only?  A regular schema+data dump usually
restores a lot faster.

Your immediate problem is probably that it's running out of memory for
pending foreign-key triggers.  Even if it didn't run out of memory, the
ensuing one-tuple-at-a-time checks would take forever.  You'd be better
off dropping the FK constraint, loading the data, and re-creating the
constraint.
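
A minimal sketch of that approach, assuming the foreign key exists as a named constraint (the listing above only shows the underlying RI trigger, so the name used here, answerselectinstance_answerid_fkey, is a guess that should be checked with \d answerselectinstance first):

-- hypothetical constraint name; check \d answerselectinstance for the actual one
ALTER TABLE answerselectinstance
    DROP CONSTRAINT answerselectinstance_answerid_fkey;

-- ... bulk-load the table here (COPY / pg_restore -a) ...

-- re-create the FK afterwards, so the check runs once over the whole table
ALTER TABLE answerselectinstance
    ADD CONSTRAINT answerselectinstance_answerid_fkey
    FOREIGN KEY (answerid) REFERENCES answer (id);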

There's further discussion of bulk-loading tricks in the manual:
http://www.postgresql.org/docs/8.2/static/populate.html

            regards, tom lane

Re: pg_restore failing with "ERROR: out of memory"

From: Aaron Brown
Date:
Yes, it kind of needs to be data-only, since I am pulling from a Slony-replicated database.  My experience has been that if you don’t load the schema first with one from slony1_extract_schema.sh, you end up with all the Slony triggers and crap in the dump.  If there is a better way of doing this, I’m definitely all ears.

Aaron

Re: pg_restore failing with "ERROR: out of memory"

From: Tom Lane
Date: 3/19/08 3:17 PM
Aaron Brown <abrown@bzzagent.com> writes:
> Yes, it kind of needs to be data-only, since I am pulling from a
> Slony-replicated database.  My experience has been that if you don't load
> the schema first with one from slony1_extract_schema.sh, you end up with
> all the Slony triggers and crap in the dump.  If there is a better way of
> doing this, I'm definitely all ears.

Well, you'd need to take that up on the Slony lists, but I'd say that
Slony definitely ought to provide a better export mechanism than forcing
you to use a standard data-only dump.  At minimum they should teach that
script to emit two separate files: one to load before the data and one
after.  Building the indexes and FK constraints on already-loaded data
is a heck of a lot faster.
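
In outline, something like the following (the two file names here are hypothetical, since slony1_extract_schema.sh, as far as I know, produces a single schema file, so the split would have to be made by hand or in the script itself):

# 1. load tables only, with no indexes or FK constraints yet (hypothetical pre-data file)
psql -d m16test -f schema_pre.sql

# 2. bulk-load the data from the custom-format archive
pg_restore -a -d m16test 20080318-data-archive

# 3. build indexes and add FK constraints over the already-loaded data (hypothetical post-data file)
psql -d m16test -f schema_post.sql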

            regards, tom lane

Re: pg_restore failing with "ERROR: out of memory"

From: Aaron Brown
Date:
I agree that they “should”.  Unfortunately, nothing about Slony is simple, in my experience.


-------------------------------------------------------
Aaron Brown, Systems Engineer
BzzAgent, Inc. | www.bzzagent.com
abrown@bzzagent.com | 617.451.2280
-------------------------------------------------------