Issues with patitionning and triggers - Mailing list pgsql-general
From | Samuel Gilbert |
---|---|
Subject | Issues with patitionning and triggers |
Date | |
Msg-id | 1398300.p7Rp9drW4Y@yamium Whole thread Raw |
Responses |
Re: Issues with patitionning and triggers
Re: Issues with patitionning and triggers |
List | pgsql-general |
I have data warehousing DB 2 fairly big tables : one contains about 200 million rows and the other one contains about 4 billion rows. Some queries are now taking way too long to run (> 13 hours). I need to get these queries to run in an hour or so. The slowdown was gradual, but I eventually hit a wall, when the planner stopped using indexes. All of this was done on PostgreSQL 9.2.0 64-bit compiled from the official source. Significant changes in postgresql.conf : shared_buffers = 8GB work_mem = 8GB maintenance_work_mem = 8GB max_stack_depth = 2MB Here is the information about the big tables : SELECT nspname || '.' || relname AS relation, pg_stat_get_live_tuples(C.oid) AS live_tuples, pg_size_pretty(pg_relation_size(C.oid, 'main')) AS relation_size, pg_size_pretty(pg_indexes_size(C.oid)) AS indexes_size, pg_size_pretty(pg_total_relation_size(C.oid)) AS total_size FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND C.relkind <> 'i' AND nspname !~ '^pg_toast' AND relname IN ('obs', 'forecast'); relation | live_tuples | relation_size | indexes_size | total_size -----------------+-------------+---------------+--------------+------------ public.obs | 193235914 | 18 GB | 15 GB | 33 GB public.forecast | 3914247064 | 425 GB | 148 GB | 573 GB From what I read, the planner probably stopped using indexes since they are so big compared to the system RAM (64 GB). I therefore tried partitioning the obs table. The forecast table is the biggest issue, but I wanted to be able to try this out in a single day before tackling the real monster. Here is the structure of the parent table : Table "public.observation" Column | Type | Modifiers ------------------+-----------------------------+----------- station | integer | not null method | integer | not null startdate | timestamp without time zone | not null duration | interval | not null value | real | not null dataset | integer | not null modificationdate | timestamp without time zone | not null Check constraints: "observation_check1" CHECK ((startdate + duration) < now()) "observation_duration_check1" CHECK (duration > '00:00:00'::interval) Foreign-key constraints: "observation_dataset_fkey1" FOREIGN KEY (dataset) REFERENCES dataset(id) MATCH FULL "observation_method_fkey1" FOREIGN KEY (method) REFERENCES method(id) MATCH FULL "observation_station_fkey1" FOREIGN KEY (station) REFERENCES station(id) MATCH FULL Triggers: trigger_insert_00 BEFORE INSERT ON observation FOR EACH ROW EXECUTE PROCEDURE observation_insert_trigger() INSERT ... RETURNING does not work with partitioned tables, since the trigger function on the parent that dispatches new rows to the children tables, must return NULL. If the trigger function on the parent ends with "RETURN NEW", INSERT ... RETURNING works, but new rows are duplicated; they are inserted both in the parent and child tables. Is there a way to make INSERT ... RETURNING work without duplicating the rows? The other issue I'm encountering is that I also have very simple BEFORE UPDATE and BEFORE INSERT triggers that set the modification date on every single row : CREATE FUNCTION set_modificationDate() RETURNS TRIGGER AS $$ BEGIN NEW.modificationDate := now(); RETURN NEW; END; $$ LANGUAGE 'plpgsql'; The modification date must be updated if any row is modified in any way. I first tried to define the triggers on the parent table. This worked, but I realized that if a queries targets explicitly a child table, it could modify a row without the date being updated. I therefore dropped the triggers on the parent table and defined them for every child. To my great surprise, the insert below failed with a message saying that NULLs are not allowed in the modificationdate column. INSERT INTO observation (dataset, station, method, startdate, duration, value) VALUES (9, 2, 128, '2014-01-01 00:00:00', '24 hours', 42); Why isn't the BEFORE INSERT trigger on the child table being executed? Cheers, Samuel Gilbert
pgsql-general by date: