Re: creating a subset DB efficiently ? - Mailing list pgsql-general
From | Ron Johnson |
---|---|
Subject | Re: creating a subset DB efficiently ? |
Date | |
Msg-id | CANzqJaCR3rhrUCimF9uF+j1hAPZUEO5wZFos_PjweZM91jQoHA@mail.gmail.com Whole thread Raw |
In response to | creating a subset DB efficiently ? (David Gauthier <dfgpostgres@gmail.com>) |
List | pgsql-general |
On Fri, Mar 8, 2024 at 11:22 AM David Gauthier <dfgpostgres@gmail.com> wrote:
Here's the situation....- The DB contains data for several projects.- The tables of the DB contain data for all projects (data is not partitioned on project name or anything like that)- The "project" identifier (table column) exists in a few "parent" tables with many child... grandchild,... tables under them connected with foreign keys defined with "on delete cascade". So if a record in one of the parent table records is deleted, all of its underlying, dependent records get deleted too.- New projects come in, and old ones need to be removed and "archived" in DBs of their own. So there's a DB called "active_projects" and there's a DB called "project_a_archive" (identical metadata).- The idea is to copy the data for project "a" that's in "active_projects" to the "project_a_arhchive" DB AND delete the project a data out of "active_projects".- Leave "project_a_archive" up and running if someone needs to attach to that and get some old/archived data.The brute-force method I've been using is...1) pg_dump "active_projects" to a (huge) file then populate "project_a_archive" using that (I don't have the privs to create database, IT creates an empty one for me, so this is how I do it).2) go into the "project_a_archive" DB and run... "delete from par_tbl_1 where project <> 'a' ", "delete from par_tbl_2 where project <> 'a' ", etc... leaving only project "a" data in the DB.3) go into the "active_projects" DB and "delete from par_tbl_1 where project = 'a' ", etc... removing project "a" from the "active_projects DB.Ya, not very elegant, it takes a long time and it takes a lot of resources. So I'm looking for ideas on how to do this better.
I wrote this function to list all of the FK "children" that reference a "parent" table. It might help you determine all of the children, grandchildren, etc.
CREATE OR REPLACE FUNCTION dba.get_fk_referenced_by_child(_p_table_name TEXT)
RETURNS TABLE (downstream_table TEXT, downstream_column NAME, downstream_index NAME)
LANGUAGE plpgsql
AS
$$
DECLARE
_v_name TEXT;
_v_downstream_index_name NAME;
_c_downstream CURSOR FOR
select cla2.relnamespace::regnamespace::text||'.'||co.conrelid::regclass::text AS downstream_table
, co.conrelid AS downstream_oid
, att.attname AS downstream_column
from pg_constraint co
inner join pg_class cla1
on co.confrelid = cla1.oid
inner join pg_class cla2
on co.conrelid = cla2.oid
inner join pg_attribute att
on co.conrelid = att.attrelid
where co.contype = 'f'
and cla1.relnamespace::regnamespace::text||'.'||cla1.relname = _v_name
and att.attnum = co.conkey[1]
order by 1;
BEGIN
_v_name := TRIM(BOTH FROM _p_table_name);
PERFORM relname
FROM pg_class
WHERE relnamespace::regnamespace::text||'.'||relname = _v_name
AND relkind = 'r';
IF NOT FOUND THEN
downstream_table := _v_name;
downstream_column := 'does not exist';
downstream_index := NULL;
RETURN NEXT;
RETURN;
END IF;
FOR i IN _c_downstream
LOOP
downstream_table := i.downstream_table;
downstream_column := i.downstream_column;
SELECT indexrelid::regclass
into downstream_index
FROM pg_index
INNER JOIN pg_attribute ON indexrelid = attrelid
WHERE indrelid = i.downstream_oid
AND attname = i.downstream_column;
IF downstream_index IS NULL THEN
downstream_index := 'ALERT!!! MISSING INDEX';
END IF;
RETURN NEXT;
END LOOP;
END;
$$;
RETURNS TABLE (downstream_table TEXT, downstream_column NAME, downstream_index NAME)
LANGUAGE plpgsql
AS
$$
DECLARE
_v_name TEXT;
_v_downstream_index_name NAME;
_c_downstream CURSOR FOR
select cla2.relnamespace::regnamespace::text||'.'||co.conrelid::regclass::text AS downstream_table
, co.conrelid AS downstream_oid
, att.attname AS downstream_column
from pg_constraint co
inner join pg_class cla1
on co.confrelid = cla1.oid
inner join pg_class cla2
on co.conrelid = cla2.oid
inner join pg_attribute att
on co.conrelid = att.attrelid
where co.contype = 'f'
and cla1.relnamespace::regnamespace::text||'.'||cla1.relname = _v_name
and att.attnum = co.conkey[1]
order by 1;
BEGIN
_v_name := TRIM(BOTH FROM _p_table_name);
PERFORM relname
FROM pg_class
WHERE relnamespace::regnamespace::text||'.'||relname = _v_name
AND relkind = 'r';
IF NOT FOUND THEN
downstream_table := _v_name;
downstream_column := 'does not exist';
downstream_index := NULL;
RETURN NEXT;
RETURN;
END IF;
FOR i IN _c_downstream
LOOP
downstream_table := i.downstream_table;
downstream_column := i.downstream_column;
SELECT indexrelid::regclass
into downstream_index
FROM pg_index
INNER JOIN pg_attribute ON indexrelid = attrelid
WHERE indrelid = i.downstream_oid
AND attname = i.downstream_column;
IF downstream_index IS NULL THEN
downstream_index := 'ALERT!!! MISSING INDEX';
END IF;
RETURN NEXT;
END LOOP;
END;
$$;
Related question...
The "delete from par_tbl_a where project <> 'a' " is taking forever. I fear it's because it's trying to journal everything in case I want to rollback. But this is just in the archive DB and I don't mind taking the risk if I can speed this up outside of a transaction. How can I run a delete command like this without the rollback recovery overhead ?
pgsql-general by date: