Issues with patitionning and triggers - Mailing list pgsql-general

From Samuel Gilbert
Subject Issues with patitionning and triggers
Date
Msg-id 1398300.p7Rp9drW4Y@yamium
Whole thread Raw
Responses Re: Issues with patitionning and triggers
Re: Issues with patitionning and triggers
List pgsql-general
I have data warehousing DB 2 fairly big tables : one contains about 200
million rows and the other one contains about 4 billion rows.  Some queries
are now taking way too long to run (> 13 hours).  I need to get these queries
to run in an hour or so.  The slowdown was gradual, but I eventually hit a
wall, when the planner stopped using indexes.

All of this was done on PostgreSQL 9.2.0 64-bit compiled from the official
source.  Significant changes in postgresql.conf :

shared_buffers = 8GB
work_mem = 8GB
maintenance_work_mem = 8GB
max_stack_depth = 2MB

Here is the information about the big tables :

SELECT
   nspname || '.' || relname AS relation,
   pg_stat_get_live_tuples(C.oid) AS live_tuples,
   pg_size_pretty(pg_relation_size(C.oid, 'main')) AS relation_size,
   pg_size_pretty(pg_indexes_size(C.oid)) AS indexes_size,
   pg_size_pretty(pg_total_relation_size(C.oid)) AS total_size
FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE
   nspname NOT IN ('pg_catalog', 'information_schema') AND
   C.relkind <> 'i' AND
   nspname !~ '^pg_toast' AND
   relname IN ('obs', 'forecast');

    relation     | live_tuples | relation_size | indexes_size | total_size
-----------------+-------------+---------------+--------------+------------
 public.obs      |   193235914 | 18 GB         | 15 GB        | 33 GB
 public.forecast |  3914247064 | 425 GB        | 148 GB       | 573 GB

From what I read, the planner probably stopped using indexes since they are so
big compared to the system RAM (64 GB).  I therefore tried partitioning the
obs table.  The forecast table is the biggest issue, but I wanted to be able
to try this out in a single day before tackling the real monster.

Here is the structure of the parent table :

                 Table "public.observation"
      Column      |            Type             | Modifiers
------------------+-----------------------------+-----------
 station          | integer                     | not null
 method           | integer                     | not null
 startdate        | timestamp without time zone | not null
 duration         | interval                    | not null
 value            | real                        | not null
 dataset          | integer                     | not null
 modificationdate | timestamp without time zone | not null
Check constraints:
    "observation_check1" CHECK ((startdate + duration) < now())
    "observation_duration_check1" CHECK (duration > '00:00:00'::interval)
Foreign-key constraints:
    "observation_dataset_fkey1" FOREIGN KEY (dataset) REFERENCES dataset(id)
MATCH FULL
    "observation_method_fkey1" FOREIGN KEY (method) REFERENCES method(id)
MATCH FULL
    "observation_station_fkey1" FOREIGN KEY (station) REFERENCES station(id)
MATCH FULL
Triggers:
    trigger_insert_00 BEFORE INSERT ON observation FOR EACH ROW EXECUTE
PROCEDURE observation_insert_trigger()

INSERT ... RETURNING does not work with partitioned tables, since the trigger
function on the parent that dispatches new rows to the children tables, must
return NULL.  If the trigger function on the parent ends with "RETURN NEW",
INSERT ... RETURNING works, but new rows are duplicated; they are inserted
both in the parent and child tables.

Is there a way to make INSERT ... RETURNING work without duplicating the rows?

The other issue I'm encountering is that I also have very simple BEFORE UPDATE
and BEFORE INSERT triggers that set the modification date on every single row
:

CREATE FUNCTION set_modificationDate() RETURNS TRIGGER AS $$
BEGIN
   NEW.modificationDate := now();
   RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';

The modification date must be updated if any row is modified in any way.  I
first tried to define the triggers on the parent table.  This worked, but I
realized that if a queries targets explicitly a child table, it could modify a
row without the date being updated.  I therefore dropped the triggers on the
parent table and defined them for every child.  To my great surprise, the
insert below failed with a message saying that NULLs are not allowed in the
modificationdate column.

INSERT INTO observation
(dataset, station, method, startdate, duration, value)
VALUES (9, 2, 128, '2014-01-01 00:00:00', '24 hours', 42);

Why isn't the BEFORE INSERT trigger on the child table being executed?

Cheers,
Samuel Gilbert


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: syslog facilites and postgres ?
Next
From: Adrian Klaver
Date:
Subject: Re: Issues with patitionning and triggers