Time-based table design / index selection - Mailing list pgsql-general

From Tony Shelver
Subject Time-based table design / index selection
Date
Msg-id CAG0dhZArSC-jHW2RdWtuf4t+7=MXf2pUkUFqw+UEnZG8yz8hrg@mail.gmail.com
Whole thread Raw
List pgsql-general
I come form a DB background using Oracle (which is quite outdated) and more recently SQL Server, 2012 and 2016, so would like some ideas from the group.

I have 2 tables which represent a time-based relationship between various object types, most of which also have a geographical component.
There are other tables involved obviously, but these two represent the core.

Objects can be customer accounts (each of which has it's own reporting substructure), customer branches, vehicles, telematic / GPS tracking devices, drivers, managers, geolocations such as geofences (area boundaries, exclusionary and inclusionary), geopoints / locations, and much more.
Objects are stored in the to_obj_locn_attrib table.

These objects can be organized hierarchically (vehicles and drivers within branches, branches within customer account, or at a global level, or both (geographical locations and areas).
These relationships / relationship types (drivers authorized to drive a specific vehicle, drivers who have actually driven a vehicle during a specific period, for example) are stored in the to_reln table.

The objects mutate over time, and  we always need to be able to report on the data at a point in time.

The two main tables in this implementations are to_obj_locn_attrib and to_reln.
There are other tables containing object data (such as to_object, which is the core id and immutable portion of the object. : 

CREATE TABLE teleon.to_obj_locn_attrib
(
    obj_id smallint NOT NULL,
    active_dt timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    active_status character(1) COLLATE pg_catalog."default" NOT NULL DEFAULT 'A'::bpchar,
    user_upd character varying(40) COLLATE pg_catalog."default" NOT NULL,
    obj_name character varying(150) COLLATE pg_catalog."default",
    geo_type character(1) COLLATE pg_catalog."default",
    geo_locn geometry(Polygon,4326),
    geo_area character varying COLLATE pg_catalog."default",
    time_zone character varying(15) COLLATE pg_catalog."default",
    obj_locn_attribs jsonb,
    CONSTRAINT to_obj_locn_attr_pk PRIMARY KEY (obj_id, active_dt)
        USING INDEX TABLESPACE "TeleTS1",
    CONSTRAINT to_object_obj_locn_attr_fk FOREIGN KEY (obj_id)
        REFERENCES teleon.to_object (obj_id) MATCH FULL
        ON UPDATE CASCADE
        ON DELETE RESTRICT,
    CONSTRAINT geo_type_check CHECK (geo_type = ANY (ARRAY['P'::bpchar, 'O'::bpchar, 'L'::bpchar, 'R'::bpchar, 'C'::bpchar])),
    CONSTRAINT to_obj_locn_active_dt CHECK (active_status = ANY (ARRAY['A'::bpchar, 'I'::bpchar, 'D'::bpchar]))
)
.
There is a unique btree index on obj_id and active_dt (composite primary key).
Any changes to the information contained in this table results in a new record with the same obj_id and a new active_dt being inserted into the table.  
The to_obj_locn_attrib table is relatively static.

CREATE TABLE teleon.to_reln
(
    reln_id bigint NOT NULL DEFAULT nextval('teleon.to_reln_reln_id_seq'::regclass),
    reln_typ_id character varying(6) COLLATE pg_catalog."default" NOT NULL,
    active_dt timestamp without time zone NOT NULL,
    active_status character(1) COLLATE pg_catalog."default" NOT NULL,
    user_upd character varying(40) COLLATE pg_catalog."default" NOT NULL,
    obj_id_owner_of   integer NOT NULL,
    obj_id_owned_by integer NOT NULL,
    reln_attribs jsonb,
    CONSTRAINT to_reln_pk PRIMARY KEY (reln_id)
        USING INDEX TABLESPACE "TeleTS1",
    CONSTRAINT to_object_owned_by FOREIGN KEY (obj_id_owned_by)
        REFERENCES teleon.to_object (obj_id) MATCH FULL
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT to_object_owner_of FOREIGN KEY (obj_id_owner_of)
        REFERENCES teleon.to_object (obj_id) MATCH FULL
        ON UPDATE CASCADE
        ON DELETE RESTRICT,
    CONSTRAINT to_reln_type_reln_fk FOREIGN KEY (reln_typ_id)
        REFERENCES teleon.to_reln_type (reln_typ_id) MATCH FULL
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

Indexes: Apart from the Id column of reln_id, there is a btree index on reln_typ_id, obj_id_owner_of, and active_dt. and another on  reln_typ_id, , obj_id_owned_by and active_dt to cater to the most common queries.

This identifies the specific relationship of a specific relationship type between two objects at a specific point in time.

A typical query would be something like:
SELECT p.obj_id_owner_of , active_status, ... 
FROM teleon.to_reln p
WHERE p.reln_typ_id = 'DPTAST' AND
p.obj_id_owned_by = l_id AND
p.active_dt = (
SELECT MAX(c.active_dt)
FROM teleon.to_reln c
WHERE c.reln_typ_id = p.reln_typ_id AND
c.obj_id_owned_by = p.obj_id_owned_by );

A single object could be involved in many types of relationships with many other objects, 

Considering that this query would form the basis of a recursive query when extracting hierarchical structures, I would like this to be as efficient as possible.

Note that I have implemented the type of structure above in a few quite large projects with millions of objects and relationships quite successfully, using SQL Server.

One consideration is to implement a date range (actually tsrange) and then 2 GIST or BRIN indexes covering the above attributes of reln_typ_id, obj_id_owner_of, and active_dt_range. and another on  reln_typ_id, , obj_id_owned_by and active_dt_range

When changing the ot_reln id, I would update the active_dt_range (tsrange datatype to close the range at the time of change, and then implement a new copy of the record with active_dt_range starting at the time of the change, where applicable.
I would use the && operator to get the current records based on the range.

Another alternative could be to implement  the active range using separate date fields and a btree index to cover the reln_typ_id, obj_id_owner_of, active_dt_from and active_dt_to. and another on  reln_typ_id, , obj_id_owned_by, active_dt_from and active_dt_to.


I'd appreciate any input the group can come up with.


Tony

pgsql-general by date:

Previous
From: Jerry Sievers
Date:
Subject: Re: Oracle vs PG
Next
From: Steve Clark
Date:
Subject: no libpq.pc for Centos 6 or Centos 7