Thread: Index/trigger implementation for accessing latest records
Hi,
I have a table that stores a location identifier per person which will be appended to many times.
However, for many queries in this system we only need to know the most recent location per person, which is limited to about 1000 records.
Is the following trigger/index strategy a reasonable and safe approach to fast access to the latest location records per person?
- A boolean column (latest_record default true) to identify the latest record per person
- A before insert trigger that updates all other records for that person to latest_record = false
- A partial index on the latest_record column where latest_record is true
Aside from performance, is it safe to update other records in the table from the insert trigger in this way?
Minimal example is shown below:
create table location_records(id bigserial,person_id bigint,location_id bigint,latest_record boolean not null default true);create function latest_record_update() returns trigger as$$BEGINupdate location_records set latest_record = false where person_id = new.person_id and latest_record is true and id != new.id;return new;END;$$ language plpgsql;create trigger latest_record_trigger before insert on location_recordsfor each row execute procedure latest_record_update();create index latest_record_index on location_records(latest_record) where latest_record is true;insert into location_records(person_id,location_id) values (1,1);insert into location_records(person_id,location_id) values (1,2);insert into location_records(person_id,location_id) values (1,3);insert into location_records(person_id,location_id) values (2,3);insert into location_records(person_id,location_id) values (2,4);select * from location_records;
Alastair
Hi Alastair, See embedded comments. On 02/05/18 21:51, Alastair McKinley wrote: > > Hi, > > > I have a table that stores a location identifier per person which will > be appended to many times. > > However, for many queries in this system we only need to know the most > recent location per person, which is limited to about 1000 records. > > > Is the following trigger/index strategy a reasonable and safe approach > to fast access to the latest location records per person? > > > 1. A boolean column (latest_record default true) to identify the > latest record per person > 2. A before insert trigger that updates all other records for that > person to latest_record = false > 3. A partial index on the latest_record column where latest_record is > true > Suggest simplest and fastest is to use timestamptz, a timestamp with time zone (copes with changes of daylight saving and different timezones. The you have no need of triggers. Then all you need to do, is search for the person-id with the maximum value of the timestampz! [...] Cheers, Gavin
Alastair McKinley wrote: > create function latest_record_update() returns trigger as > $$ > BEGIN > update location_records set latest_record = false where person_id = new.person_id and latest_record is true andid != new.id; > return new; > END; > $$ language plpgsql; > > create trigger latest_record_trigger before insert on location_records > for each row execute procedure latest_record_update(); For maximum safety, you should use BEFORE trigger only to modify the row being inserted/updated (or to abort the operation); any change to other rows should occur in an AFTER trigger. One risk associated with failing to do that is that some other BEFORE trigger further modifies the new row, making your trigger-invoked UPDATE wrong. Also, be sure to think about possible infinite recursion. Another angle to keep in mind is what happens with insertions of historical records, i.e. those that are not latest (today you think "ah, but that never happens" and three months from now this is requested as a feature. Time to rethink the whole development ...) You'd clobber the latest_record flag without a replacement for it, which is probably undesirable. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alastair McKinley <a.mckinley@analyticsengines.com> writes: > Hi, > > > I have a table that stores a location identifier per person which will be appended to many times. > > However, for many queries in this system we only need to know the most recent location per person, which is limited toabout 1000 records. > > > Is the following trigger/index strategy a reasonable and safe approach to fast access to the latest location records perperson? > > > 1. A boolean column (latest_record default true) to identify the latest record per person > 2. A before insert trigger that updates all other records for that person to latest_record = false > 3. A partial index on the latest_record column where latest_record is true > > > Aside from performance, is it safe to update other records in the table from the insert trigger in this way? > > > Minimal example is shown below: > > > create table location_records > > ( > > id bigserial, > > person_id bigint, > > location_id bigint, > > latest_record boolean not null default true > > ); > > > create function latest_record_update() returns trigger as > > $$ > > BEGIN > > update location_records set latest_record = false where person_id = new.person_id and latest_record is true andid != new.id; > > return new; > > END; > > $$ language plpgsql; > > > create trigger latest_record_trigger before insert on location_records > > for each row execute procedure latest_record_update(); > > > create index latest_record_index on location_records(latest_record) where latest_record is true; > > > insert into location_records(person_id,location_id) values (1,1); > > insert into location_records(person_id,location_id) values (1,2); > > insert into location_records(person_id,location_id) values (1,3); > > > insert into location_records(person_id,location_id) values (2,3); > > insert into location_records(person_id,location_id) values (2,4); > > > select * from location_records; > My personal bias will come out here .... I don't think using a trigger is a good solution here. Although very powerful, the problem with triggers is that they are a 'hidden' side effect which is easily overlooked and often adds an additional maintenance burden which could be avoided using alternative approaches. Consider a few months down the road and your on holidays. One of your colleagues is asked to add a new feature which involves inserting records into this table. During testing, they observe an odd result - a field changing which according to the SQL they wrote should not. The simple new feature now takes twice as long to develop as your colleague works out there is a trigger on the table. Worse yet, they don't notice and put there changes into production and then issue start getting raised about communications going to the wrong location for customers etc. Triggers often become a lot more complicated than they will initially appear. In your example, what happens for updates as opposed to inserts? What happens if the 'new' location is actually the same as a previously recorded location etc. In your case, I would try to make what your doing more explicit and avoid the trigger. There are a number of ways to do this such as - A function to insert the record. The function could check to see if that customer has any previous records and if so, set the boolean flag to false for all existing records and true for the new one. You might even want to break it up into two functions so that you have one which just sets the flag based on a unique key parameter - this would provide a way of resetting the current location without having to do an insert. - Use a timestamp instead of a boolean and change your logic to select the current location by selecting the record with the latest timestamp. - Keep the two actions as separate SQL - one to insert a record and one to set the current location. This has the advantage of making actions clear and easier to maintain and can be useful in domains where people move between locations (for example, I've done this for a University where the data represented the students current address, which would change in and out of semester periods, but often cycle between two addresses, their college and their parental home). The downside of this approach is that applications which insert this information must remember to execute both SQL statements. If you have multiple interfaces, this might become a maintenance burden (one of the advantages of using a DB function). Tim -- Tim Cross