Thread: Index/trigger implementation for accessing latest records

Index/trigger implementation for accessing latest records

From

Alastair McKinley

Date:

02 May 2018, 12:51:49

Hi,

I have a table that stores a location identifier per person which will be appended to many times.

However, for many queries in this system we only need to know the most recent location per person, which is limited to about 1000 records.

Is the following trigger/index strategy a reasonable and safe approach to fast access to the latest location records per person?

A boolean column (latest_record default true) to identify the latest record per person
A before insert trigger that updates all other records for that person to latest_record = false
A partial index on the latest_record column where latest_record is true

Aside from performance, is it safe to update other records in the table from the insert trigger in this way?

Minimal example is shown below:

create table location_records
(
id bigserial,
person_id bigint,
location_id bigint,
latest_record boolean not null default true
);

create function latest_record_update() returns trigger as
$$
BEGIN
update location_records set latest_record = false where person_id = new.person_id and latest_record is true and id != new.id;
return new;
END;
$$ language plpgsql;

create trigger latest_record_trigger before insert on location_records
for each row execute procedure latest_record_update();

create index latest_record_index on location_records(latest_record) where latest_record is true;

insert into location_records(person_id,location_id) values (1,1);
insert into location_records(person_id,location_id) values (1,2);
insert into location_records(person_id,location_id) values (1,3);

insert into location_records(person_id,location_id) values (2,3);
insert into location_records(person_id,location_id) values (2,4);

select * from location_records;

Best regards,

Alastair

Re: Index/trigger implementation for accessing latest records

From

Gavin Flower

Date:

02 May 2018, 13:36:12

Hi Alastair,

See embedded comments.

On 02/05/18 21:51, Alastair McKinley wrote:
>
> Hi,
>
>
> I have a table that stores a location identifier per person which will 
> be appended to many times.
>
> However, for many queries in this system we only need to know the most 
> recent location per person, which is limited to about 1000 records.
>
>
> Is the following trigger/index strategy a reasonable and safe approach 
> to fast access to the latest location records per person?
>
>
>  1. A boolean column (latest_record default true) to identify the
>     latest record per person
>  2. A before insert trigger that updates all other records for that
>     person to latest_record = false
>  3. A partial index on the latest_record column where latest_record is
>     true
>
Suggest simplest and fastest is to use timestamptz, a timestamp with 
time zone (copes with changes of daylight saving and different 
timezones. The you have no need of triggers.

Then all you need to do, is search for the person-id with the maximum 
value of the timestampz!

[...]


Cheers,
Gavin

Re: Index/trigger implementation for accessing latest records

From

Alvaro Herrera

Date:

02 May 2018, 23:32:36

Alastair McKinley wrote:

> create function latest_record_update() returns trigger as
> $$
> BEGIN
>         update location_records set latest_record = false where person_id = new.person_id and latest_record is true
andid != new.id;

>         return new;
> END;
> $$ language plpgsql;
> 
> create trigger latest_record_trigger before insert on location_records
> for each row execute procedure latest_record_update();

For maximum safety, you should use BEFORE trigger only to modify the row
being inserted/updated (or to abort the operation); any change to other
rows should occur in an AFTER trigger.  One risk associated with failing
to do that is that some other BEFORE trigger further modifies the new
row, making your trigger-invoked UPDATE wrong.  Also, be sure to think
about possible infinite recursion.

Another angle to keep in mind is what happens with insertions of
historical records, i.e. those that are not latest (today you think "ah,
but that never happens" and three months from now this is requested as a
feature.  Time to rethink the whole development ...)  You'd clobber the
latest_record flag without a replacement for it, which is probably
undesirable.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Index/trigger implementation for accessing latest records

From

Tim Cross

Date:

03 May 2018, 01:14:55

Alastair McKinley <a.mckinley@analyticsengines.com> writes:

> Hi,
>
>
> I have a table that stores a location identifier per person which will be appended to many times.
>
> However, for many queries in this system we only need to know the most recent location per person, which is limited
toabout 1000 records.
 
>
>
> Is the following trigger/index strategy a reasonable and safe approach to fast access to the latest location records
perperson?
 
>
>
>   1.  A boolean column (latest_record default true) to identify the latest record per person
>   2.  A before insert trigger that updates all other records for that person to latest_record = false
>   3.  A partial index on the latest_record column where latest_record is true
>
>
> Aside from performance, is it safe to update other records in the table from the insert trigger in this way?
>
>
> Minimal example is shown below:
>
>
> create table location_records
>
> (
>
>         id bigserial,
>
>         person_id bigint,
>
>         location_id bigint,
>
>         latest_record boolean not null default true
>
> );
>
>
> create function latest_record_update() returns trigger as
>
> $$
>
> BEGIN
>
>         update location_records set latest_record = false where person_id = new.person_id and latest_record is true
andid != new.id;
 
>
>         return new;
>
> END;
>
> $$ language plpgsql;
>
>
> create trigger latest_record_trigger before insert on location_records
>
> for each row execute procedure latest_record_update();
>
>
> create index latest_record_index on location_records(latest_record) where latest_record is true;
>
>
> insert into location_records(person_id,location_id) values (1,1);
>
> insert into location_records(person_id,location_id) values (1,2);
>
> insert into location_records(person_id,location_id) values (1,3);
>
>
> insert into location_records(person_id,location_id) values (2,3);
>
> insert into location_records(person_id,location_id) values (2,4);
>
>
> select * from location_records;
>

My personal bias will come out here ....

I don't think using a trigger is a good solution here. Although very
powerful, the problem with triggers is that they are a 'hidden' side
effect which is easily overlooked and often adds an additional
maintenance burden which could be avoided using alternative approaches.

Consider a few months down the road and your on holidays. One of your
colleagues is asked to add a new feature which involves inserting
records into this table. During testing, they observe an odd result - a
field changing which according to the SQL they wrote should not. The
simple new feature now takes twice as long to develop as your colleague
works out there is a trigger on the table. Worse yet, they don't notice
and put there changes into production and then issue start getting
raised about communications going to the wrong location for customers
etc.

Triggers often become a lot more complicated than they will initially
appear. In your example, what happens for updates as opposed to inserts?
What happens if the 'new' location is actually the same as a previously
recorded location etc. 

In your case, I would try to make what your doing more explicit and
avoid the trigger. There are a number of ways to do this such as

- A function to insert the record. The function could check to see if
  that customer has any previous records and if so, set the boolean flag
  to false for all existing records and true for the new one. You might
  even want to break it up into two functions so that you have one which
  just sets the flag based on a unique key parameter - this would
  provide a way of resetting the current location without having to do
  an insert. 

- Use a timestamp instead of a boolean and change your logic to select
  the current location by selecting the record with the latest
  timestamp.

- Keep the two actions as separate SQL - one to insert a record and one
  to set the current location. This has the advantage of making actions
  clear and easier to maintain and can be useful in domains where people
  move between locations (for example, I've done this for a University
  where the data represented the students current address, which would
  change in and out of semester periods, but often cycle between two
  addresses, their college and their parental home). The downside of
  this approach is that applications which insert this information must
  remember to execute both SQL statements. If you have multiple
  interfaces, this might become a maintenance burden (one of the
  advantages of using a DB function). 


Tim


--
Tim Cross