Partial backup of linked tables - Mailing list pgsql-performance

From Cisko
Subject Partial backup of linked tables
Date
Msg-id 29C5C371-FCE7-4271-AB9A-1C2A8CDDDFDE@gmail.com
Whole thread Raw
List pgsql-performance
Hi, i'm new to this ML, i'll try to explain my issue:

I've two tables defined as is (postgresql 8.1):

CREATE TABLE table1
(
    _id                  serial,
    num1                 int4 not null,
    num2                 int4 not null,

    primary key(_id)
);

CREATE INDEX table1IDX1 ON table1(num1);

CREATE TABLE table2
  (
    _id         serial,
    _table1_id  int  not null,
    num3        int4 not null,
    num4        int4 not null,

    primary key(_id),

    foreign key(_table1_id) references table1(_id) on delete CASCADE
  );

CREATE INDEX table2IDX1 ON table2(_table1_id);


I need to select only a subset of table1/table2 records and backup
them (to disk).

I proceed as following:

1. Create equivalent tables with _tmp name with indexes and cascade;

CREATE TABLE table1_tmp
(
    _id                  serial,
    num1                 int4 not null,
    num2                 int4 not null,

    primary key(_id)
);

CREATE INDEX table1_tmpIDX1 ON table1_tmp(num1);

CREATE TABLE table2_tmp
  (
    _id         serial,
    _table1_id  int  not null,
    num3        int4 not null,
    num4        int4 not null,

    primary key(_id),

    foreign key(_table1_id) references table1_tmp(_id) on delete CASCADE
  );

CREATE INDEX table2_tmpIDX1 ON table2_tmp(_table1_id);


2. Select and insert into table1_tmp a subset of table1 based on a
query (num1 < 10)

INSERT INTO table1_tmp SELECT * from table1 WHERE num1 < 10;


3. Populate other tables with a foreign key;

INSERT INTO table2_tmp SELECT table2.* from table2, table1_tmp WHERE
table2._table1_id = table1_tmp._id;


4. Copy each table into a file (i don't have an 8.2, so that i can't
execute pg_dump with several -t options)

COPY table1_tmp TO "/tmp/table1_tmp.data";
COPY table2_tmp TO "/tmp/table2_tmp.data";


This is only an example, i've more complex tables, but schema is
equivalent to previous.

My question is: There'are some optimization/tips that i can do for
achieve better performance?
When i have several rows (10^6 or greater) returned by query into
table1, that starts to hogs time and CPU.

Doing an EXPLAIN, all queries on join are performed using indexes.

Thanks in advance,
Cisko

pgsql-performance by date:

Previous
From: Matthew Wakeling
Date:
Subject: Re: Tsearch2 Initial Search Speed
Next
From: Antonio Perez
Date:
Subject: WAL DUDAS