Thread: Issues with patitionning and triggers
I have data warehousing DB 2 fairly big tables : one contains about 200 million rows and the other one contains about 4 billion rows. Some queries are now taking way too long to run (> 13 hours). I need to get these queries to run in an hour or so. The slowdown was gradual, but I eventually hit a wall, when the planner stopped using indexes. All of this was done on PostgreSQL 9.2.0 64-bit compiled from the official source. Significant changes in postgresql.conf : shared_buffers = 8GB work_mem = 8GB maintenance_work_mem = 8GB max_stack_depth = 2MB Here is the information about the big tables : SELECT nspname || '.' || relname AS relation, pg_stat_get_live_tuples(C.oid) AS live_tuples, pg_size_pretty(pg_relation_size(C.oid, 'main')) AS relation_size, pg_size_pretty(pg_indexes_size(C.oid)) AS indexes_size, pg_size_pretty(pg_total_relation_size(C.oid)) AS total_size FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND C.relkind <> 'i' AND nspname !~ '^pg_toast' AND relname IN ('obs', 'forecast'); relation | live_tuples | relation_size | indexes_size | total_size -----------------+-------------+---------------+--------------+------------ public.obs | 193235914 | 18 GB | 15 GB | 33 GB public.forecast | 3914247064 | 425 GB | 148 GB | 573 GB From what I read, the planner probably stopped using indexes since they are so big compared to the system RAM (64 GB). I therefore tried partitioning the obs table. The forecast table is the biggest issue, but I wanted to be able to try this out in a single day before tackling the real monster. Here is the structure of the parent table : Table "public.observation" Column | Type | Modifiers ------------------+-----------------------------+----------- station | integer | not null method | integer | not null startdate | timestamp without time zone | not null duration | interval | not null value | real | not null dataset | integer | not null modificationdate | timestamp without time zone | not null Check constraints: "observation_check1" CHECK ((startdate + duration) < now()) "observation_duration_check1" CHECK (duration > '00:00:00'::interval) Foreign-key constraints: "observation_dataset_fkey1" FOREIGN KEY (dataset) REFERENCES dataset(id) MATCH FULL "observation_method_fkey1" FOREIGN KEY (method) REFERENCES method(id) MATCH FULL "observation_station_fkey1" FOREIGN KEY (station) REFERENCES station(id) MATCH FULL Triggers: trigger_insert_00 BEFORE INSERT ON observation FOR EACH ROW EXECUTE PROCEDURE observation_insert_trigger() INSERT ... RETURNING does not work with partitioned tables, since the trigger function on the parent that dispatches new rows to the children tables, must return NULL. If the trigger function on the parent ends with "RETURN NEW", INSERT ... RETURNING works, but new rows are duplicated; they are inserted both in the parent and child tables. Is there a way to make INSERT ... RETURNING work without duplicating the rows? The other issue I'm encountering is that I also have very simple BEFORE UPDATE and BEFORE INSERT triggers that set the modification date on every single row : CREATE FUNCTION set_modificationDate() RETURNS TRIGGER AS $$ BEGIN NEW.modificationDate := now(); RETURN NEW; END; $$ LANGUAGE 'plpgsql'; The modification date must be updated if any row is modified in any way. I first tried to define the triggers on the parent table. This worked, but I realized that if a queries targets explicitly a child table, it could modify a row without the date being updated. I therefore dropped the triggers on the parent table and defined them for every child. To my great surprise, the insert below failed with a message saying that NULLs are not allowed in the modificationdate column. INSERT INTO observation (dataset, station, method, startdate, duration, value) VALUES (9, 2, 128, '2014-01-01 00:00:00', '24 hours', 42); Why isn't the BEFORE INSERT trigger on the child table being executed? Cheers, Samuel Gilbert
On 02/18/2014 02:10 PM, Samuel Gilbert wrote: > I have data warehousing DB 2 fairly big tables : one contains about 200 > million rows and the other one contains about 4 billion rows. Some queries > are now taking way too long to run (> 13 hours). I need to get these queries > to run in an hour or so. The slowdown was gradual, but I eventually hit a > wall, when the planner stopped using indexes. > > The other issue I'm encountering is that I also have very simple BEFORE UPDATE > and BEFORE INSERT triggers that set the modification date on every single row > : > > CREATE FUNCTION set_modificationDate() RETURNS TRIGGER AS $$ > BEGIN > NEW.modificationDate := now(); > RETURN NEW; > END; > $$ LANGUAGE 'plpgsql'; > > The modification date must be updated if any row is modified in any way. I > first tried to define the triggers on the parent table. This worked, but I > realized that if a queries targets explicitly a child table, it could modify a > row without the date being updated. I therefore dropped the triggers on the > parent table and defined them for every child. To my great surprise, the > insert below failed with a message saying that NULLs are not allowed in the > modificationdate column. > > INSERT INTO observation > (dataset, station, method, startdate, duration, value) > VALUES (9, 2, 128, '2014-01-01 00:00:00', '24 hours', 42); > > Why isn't the BEFORE INSERT trigger on the child table being executed? Constraints are checked before triggers are run. > > Cheers, > Samuel Gilbert > > -- Adrian Klaver adrian.klaver@aklaver.com
On 2014-02-18 14:25:59 Adrian Klaver wrote: > On 02/18/2014 02:10 PM, Samuel Gilbert wrote: > > I have data warehousing DB 2 fairly big tables : one contains about 200 > > million rows and the other one contains about 4 billion rows. Some > > queries > > are now taking way too long to run (> 13 hours). I need to get these > > queries to run in an hour or so. The slowdown was gradual, but I > > eventually hit a wall, when the planner stopped using indexes. > > > > > > The other issue I'm encountering is that I also have very simple BEFORE > > UPDATE and BEFORE INSERT triggers that set the modification date on every > > single row > > > > > > CREATE FUNCTION set_modificationDate() RETURNS TRIGGER AS $$ > > BEGIN > > > > NEW.modificationDate := now(); > > RETURN NEW; > > > > END; > > $$ LANGUAGE 'plpgsql'; > > > > The modification date must be updated if any row is modified in any way. > > I > > first tried to define the triggers on the parent table. This worked, but > > I > > realized that if a queries targets explicitly a child table, it could > > modify a row without the date being updated. I therefore dropped the > > triggers on the parent table and defined them for every child. To my > > great surprise, the insert below failed with a message saying that NULLs > > are not allowed in the modificationdate column. > > > > INSERT INTO observation > > (dataset, station, method, startdate, duration, value) > > VALUES (9, 2, 128, '2014-01-01 00:00:00', '24 hours', 42); > > > > Why isn't the BEFORE INSERT trigger on the child table being executed? > > Constraints are checked before triggers are run. > > > Cheers, > > Samuel Gilbert I don't think that is the case since I currently have the BEFORE INSERT trigger working on the non-partitioned version of the table. The modificationdate field has a NOT NULL constraint. Even if I explicitly provide a NULL for the modificationdate column, a date gets written in the table. This leads me to believe that the BEFORE INSERT trigger is really executed before the constraint is checked. What I don't understand is why the trigger doesn't appear to be executed when it's defined on a child table. I'll add a RAISE NOTICE to the trigger function to makes sure it's not getting called.
Samuel Gilbert <samuel.gilbert@ec.gc.ca> writes: > All of this was done on PostgreSQL 9.2.0 64-bit compiled from the official > source. Significant changes in postgresql.conf : Why in the world are you using 9.2.0? You're missing a year and a half worth of bug fixes, some of them quite serious. > INSERT ... RETURNING does not work with partitioned tables, since the trigger > function on the parent that dispatches new rows to the children tables, must > return NULL. If the trigger function on the parent ends with "RETURN NEW", > INSERT ... RETURNING works, but new rows are duplicated; they are inserted > both in the parent and child tables. > Is there a way to make INSERT ... RETURNING work without duplicating the rows? Fraid not --- it only shows what got inserted into the parent table, which is nothing if you're using this technique. > The modification date must be updated if any row is modified in any way. I > first tried to define the triggers on the parent table. This worked, but I > realized that if a queries targets explicitly a child table, it could modify a > row without the date being updated. I therefore dropped the triggers on the > parent table and defined them for every child. To my great surprise, the > insert below failed with a message saying that NULLs are not allowed in the > modificationdate column. You'd have to provide a self-contained example for anyone to help you with that. The most obvious explanation is that you forgot to attach the trigger to the specific child table ... regards, tom lane
On 02/18/2014 02:42 PM, Samuel Gilbert wrote: > On 2014-02-18 14:25:59 Adrian Klaver wrote: >> On 02/18/2014 02:10 PM, Samuel Gilbert wrote: >>> I have data warehousing DB 2 fairly big tables : one contains about 200 >>> million rows and the other one contains about 4 billion rows. Some >>> queries >>> are now taking way too long to run (> 13 hours). I need to get these >>> queries to run in an hour or so. The slowdown was gradual, but I >>> eventually hit a wall, when the planner stopped using indexes. >>> >>> >>> The other issue I'm encountering is that I also have very simple BEFORE >>> UPDATE and BEFORE INSERT triggers that set the modification date on every >>> single row >>> >>> >>> CREATE FUNCTION set_modificationDate() RETURNS TRIGGER AS $$ >>> BEGIN >>> >>> NEW.modificationDate := now(); >>> RETURN NEW; >>> >>> END; >>> $$ LANGUAGE 'plpgsql'; >>> >>> The modification date must be updated if any row is modified in any way. >>> I >>> first tried to define the triggers on the parent table. This worked, but >>> I >>> realized that if a queries targets explicitly a child table, it could >>> modify a row without the date being updated. I therefore dropped the >>> triggers on the parent table and defined them for every child. To my >>> great surprise, the insert below failed with a message saying that NULLs >>> are not allowed in the modificationdate column. >>> >>> INSERT INTO observation >>> (dataset, station, method, startdate, duration, value) >>> VALUES (9, 2, 128, '2014-01-01 00:00:00', '24 hours', 42); >>> >>> Why isn't the BEFORE INSERT trigger on the child table being executed? >> >> Constraints are checked before triggers are run. >> >>> Cheers, >>> Samuel Gilbert > > I don't think that is the case since I currently have the BEFORE INSERT > trigger working on the non-partitioned version of the table. Sorry for steering you wrong. I could have sworn I saw the behavior I mentioned, previously, when trying to do what you have done. > > The modificationdate field has a NOT NULL constraint. Even if I explicitly > provide a NULL for the modificationdate column, a date gets written in the > table. This leads me to believe that the BEFORE INSERT trigger is really > executed before the constraint is checked. > > What I don't understand is why the trigger doesn't appear to be executed when > it's defined on a child table. I'll add a RAISE NOTICE to the trigger > function to makes sure it's not getting called. Still not sure what is going on, but I do have a question based on this statement from your original post: "The modification date must be updated if any row is modified in any way." If that is the case shouldn't the trigger also cover UPDATE? > > -- Adrian Klaver adrian.klaver@aklaver.com
> "The modification date must be updated if any row is modified in any way." > > If that is the case shouldn't the trigger also cover UPDATE? You completely right about that! I actually have both configured, but I focused only on the INSERT to try keep the length of my post as short as possible. As Tom Lane pointed out, it's hard to get help without a complete self- contained example. I will work on writing that up tomorrow. Cheers! On 2014-02-18 15:02:41 Adrian Klaver wrote: > On 02/18/2014 02:42 PM, Samuel Gilbert wrote: > > On 2014-02-18 14:25:59 Adrian Klaver wrote: > >> On 02/18/2014 02:10 PM, Samuel Gilbert wrote: > >>> I have data warehousing DB 2 fairly big tables : one contains about 200 > >>> million rows and the other one contains about 4 billion rows. Some > >>> queries > >>> are now taking way too long to run (> 13 hours). I need to get these > >>> queries to run in an hour or so. The slowdown was gradual, but I > >>> eventually hit a wall, when the planner stopped using indexes. > >>> > >>> > >>> The other issue I'm encountering is that I also have very simple BEFORE > >>> UPDATE and BEFORE INSERT triggers that set the modification date on > >>> every > >>> single row > >>> > >>> > >>> CREATE FUNCTION set_modificationDate() RETURNS TRIGGER AS $$ > >>> BEGIN > >>> > >>> NEW.modificationDate := now(); > >>> RETURN NEW; > >>> > >>> END; > >>> $$ LANGUAGE 'plpgsql'; > >>> > >>> The modification date must be updated if any row is modified in any way. > >>> I > >>> first tried to define the triggers on the parent table. This worked, > >>> but > >>> I > >>> realized that if a queries targets explicitly a child table, it could > >>> modify a row without the date being updated. I therefore dropped the > >>> triggers on the parent table and defined them for every child. To my > >>> great surprise, the insert below failed with a message saying that NULLs > >>> are not allowed in the modificationdate column. > >>> > >>> INSERT INTO observation > >>> (dataset, station, method, startdate, duration, value) > >>> VALUES (9, 2, 128, '2014-01-01 00:00:00', '24 hours', 42); > >>> > >>> Why isn't the BEFORE INSERT trigger on the child table being executed? > >> > >> Constraints are checked before triggers are run. > >> > >>> Cheers, > >>> Samuel Gilbert > > > > I don't think that is the case since I currently have the BEFORE INSERT > > trigger working on the non-partitioned version of the table. > > Sorry for steering you wrong. I could have sworn I saw the behavior I > mentioned, previously, when trying to do what you have done. > > > The modificationdate field has a NOT NULL constraint. Even if I > > explicitly > > provide a NULL for the modificationdate column, a date gets written in the > > table. This leads me to believe that the BEFORE INSERT trigger is really > > executed before the constraint is checked. > > > > What I don't understand is why the trigger doesn't appear to be executed > > when it's defined on a child table. I'll add a RAISE NOTICE to the > > trigger function to makes sure it's not getting called. > > Still not sure what is going on, but I do have a question based on this > statement from your original post: > > "The modification date must be updated if any row is modified in any way." > > If that is the case shouldn't the trigger also cover UPDATE?
On 2014-02-18 17:59:35 Tom Lane wrote: > Samuel Gilbert <samuel.gilbert@ec.gc.ca> writes: > > All of this was done on PostgreSQL 9.2.0 64-bit compiled from the official > > > source. Significant changes in postgresql.conf : > Why in the world are you using 9.2.0? You're missing a year and a half > worth of bug fixes, some of them quite serious. Yes. I know and I bear the pain and shame not running at least the latest revision of the 9.2 branch. Unfortunately, it's hard to get my manager to view the update of software that "just works" as something to prioritize. The good news is that your reply is a good argument to do so! :) Cheers! > > > INSERT ... RETURNING does not work with partitioned tables, since the > > trigger function on the parent that dispatches new rows to the children > > tables, must return NULL. If the trigger function on the parent ends > > with "RETURN NEW", INSERT ... RETURNING works, but new rows are > > duplicated; they are inserted both in the parent and child tables. > > > > Is there a way to make INSERT ... RETURNING work without duplicating the > > rows? > Fraid not --- it only shows what got inserted into the parent table, which > is nothing if you're using this technique. > > > The modification date must be updated if any row is modified in any way. > > I > > first tried to define the triggers on the parent table. This worked, but > > I > > realized that if a queries targets explicitly a child table, it could > > modify a row without the date being updated. I therefore dropped the > > triggers on the parent table and defined them for every child. To my > > great surprise, the insert below failed with a message saying that NULLs > > are not allowed in the modificationdate column. > > You'd have to provide a self-contained example for anyone to help you with > that. The most obvious explanation is that you forgot to attach the > trigger to the specific child table ... > > regards, tom lane
Hello everyone, Here is a complete example demonstrating the issue I am encountering : CREATE TABLE parent ( split INTEGER NOT NULL, happiness INTEGER NOT NULL, modificationDate TIMESTAMP NOT NULL ); CREATE TABLE child_split1 ( CHECK (split = 1) ) INHERITS (parent); CREATE TABLE child_split2 ( CHECK (split = 2) ) INHERITS (parent); CREATE OR REPLACE FUNCTION parent_dispatcher_trigger() RETURNS TRIGGER AS $$ BEGIN CASE NEW.split WHEN 1 THEN INSERT INTO child_split1 VALUES (NEW.*); WHEN 2 THEN INSERT INTO child_split2 VALUES (NEW.*); ELSE RAISE EXCEPTION 'Partition for % does not exist!',NEW.split; END CASE; RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER trigger_insert_00 BEFORE INSERT ON parent FOR EACH ROW EXECUTE PROCEDURE parent_dispatcher_trigger(); CREATE OR REPLACE FUNCTION set_modificationDate_debug() RETURNS TRIGGER AS $$ BEGIN NEW.modificationDate := now(); RAISE NOTICE 'NEW row is now : (%, %, %)', NEW.split, NEW.happiness, NEW.modificationDate; RETURN NEW; END; $$ LANGUAGE 'plpgsql'; CREATE TRIGGER trigger_insert_00 BEFORE INSERT ON child_split1 FOR EACH ROW EXECUTE PROCEDURE set_modificationdate_debug(); CREATE TRIGGER trigger_insert_00 BEFORE INSERT ON child_split2 FOR EACH ROW EXECUTE PROCEDURE set_modificationdate_debug(); INSERT INTO parent (split, happiness) VALUES (1, 42); -- NOTICE: NEW row is now : (1, 42, 2014-02-19 16:31:07.384151) -- CONTEXT: SQL statement "INSERT INTO child_split1 VALUES (NEW.*)" -- PL/pgSQL function parent_dispatcher_trigger() line 4 at SQL statement -- ERROR: null value in column "modificationdate" violates not-null constraint -- DETAIL: Failing row contains (1, 42, null). INSERT INTO child_split1 (split, happiness) VALUES (1, 42); -- NOTICE: NEW row is now : (1, 42, 2014-02-19 16:37:27.134194) -- INSERT 0 1 So, we clearly see that trigger_insert_00 is called in both cases. I don't understand why the query fails on the parent, but works when the INSERT targets the child table directly. Regards, Samuel Gilbert On 2014-02-18 23:16:31 Samuel Gilbert wrote: > On 2014-02-18 17:59:35 Tom Lane wrote: > > Samuel Gilbert <samuel.gilbert@ec.gc.ca> writes: > > > All of this was done on PostgreSQL 9.2.0 64-bit compiled from the > > > official > > > > > source. Significant changes in postgresql.conf : > > Why in the world are you using 9.2.0? You're missing a year and a half > > worth of bug fixes, some of them quite serious. > > Yes. I know and I bear the pain and shame not running at least the latest > revision of the 9.2 branch. Unfortunately, it's hard to get my manager to > view the update of software that "just works" as something to prioritize. > > The good news is that your reply is a good argument to do so! :) > > Cheers! > > > > INSERT ... RETURNING does not work with partitioned tables, since the > > > trigger function on the parent that dispatches new rows to the children > > > tables, must return NULL. If the trigger function on the parent ends > > > with "RETURN NEW", INSERT ... RETURNING works, but new rows are > > > duplicated; they are inserted both in the parent and child tables. > > > > > > Is there a way to make INSERT ... RETURNING work without duplicating the > > > rows? > > > > Fraid not --- it only shows what got inserted into the parent table, which > > is nothing if you're using this technique. > > > > > The modification date must be updated if any row is modified in any way. > > > I > > > first tried to define the triggers on the parent table. This worked, > > > but > > > I > > > realized that if a queries targets explicitly a child table, it could > > > modify a row without the date being updated. I therefore dropped the > > > triggers on the parent table and defined them for every child. To my > > > great surprise, the insert below failed with a message saying that NULLs > > > are not allowed in the modificationdate column. > > > > You'd have to provide a self-contained example for anyone to help you with > > that. The most obvious explanation is that you forgot to attach the > > trigger to the specific child table ... > > > > regards, tom lane
Samuel Gilbert <samuel.gilbert@ec.gc.ca> writes: > Here is a complete example demonstrating the issue I am encountering : What seems to be happening is: 1. The parent table has a NOT NULL constraint on modificationDate. 2. parent_dispatcher_trigger does RETURN NEW, so the tuple will be inserted into the parent table as well as the selected child (presumably this is not what you wanted ...) 3. There is no trigger that will replace the modificationDate for an insertion into the parent. So the parent insertion fails, after all the triggers have run. regards, tom lane