Thread: [SQL] Most recent row
This question has been asked a few times, and Google returns a few different answers, but I am interested people's opinions and suggestions for the *best* wat to retrieve the most recent row from a table. My case is: create table people ( p_id serial primary key,...... ); create table assessments ( p_id int4 not null references people(p_id), as_timestamp timestamp not null, ...... ); select p.*, (most recent) a.* from people p, assessments a .. ;
On Fri, May 05, 2017 at 09:25:04AM +0100, Gary Stainburn wrote: > This question has been asked a few times, and Google returns a few different > answers, but I am interested people's opinions and suggestions for the *best* > wat to retrieve the most recent row from a table. > > My case is: > > create table people ( > p_id serial primary key, > ...... > ); > > create table assessments ( > p_id int4 not null references people(p_id), > as_timestamp timestamp not null, > ...... > ); > > select p.*, (most recent) a.* > from people p, assessments a > .. > ; You will need to provide a definition for *exactly* what "most recent" means in this context. Karsten -- GPG key ID E4071346 @ eu.pool.sks-keyservers.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On 2017-05-05 10:25, Gary Stainburn wrote: > This question has been asked a few times, and Google returns a few > different > answers, but I am interested people's opinions and suggestions for the > *best* > wat to retrieve the most recent row from a table. > > My case is: > > create table people ( > p_id serial primary key, > ...... > ); > > create table assessments ( > p_id int4 not null references people(p_id), > as_timestamp timestamp not null, > ...... > ); > > select p.*, (most recent) a.* > from people p, assessments a > .. > ; It depends on what you mean by 'most recent'. Do you mean "the row I just inserted" or "the row with a date closest to now()" or perhaps some other definition?
On Friday 05 May 2017 09:32:21 Karsten Hilbert wrote: > On Fri, May 05, 2017 at 09:25:04AM +0100, Gary Stainburn wrote: > > This question has been asked a few times, and Google returns a few > > different answers, but I am interested people's opinions and suggestions > > for the *best* wat to retrieve the most recent row from a table. > > > > My case is: > > > > create table people ( > > p_id serial primary key, > > ...... > > ); > > > > create table assessments ( > > p_id int4 not null references people(p_id), > > as_timestamp timestamp not null, > > ...... > > ); > > > > select p.*, (most recent) a.* > > from people p, assessments a > > .. > > ; > > You will need to provide a definition for *exactly* what > "most recent" means in this context. > > Karsten Appologies all. I though that was obvious, but it is only obvious for me. What I mean by most recent is the assessment record with the highest (most recent) timestamp. Specfically join a people row with the assessment row for that people. Each assessment will assign scores for the person being assessed. The scores from their most recent assessment are their current scores and what I want to appear in the view. In the live project it will actually be a left outer join in case the person has not yet been assessed.
On Fri, May 05, 2017 at 09:25:04AM +0100, Gary Stainburn wrote: > This question has been asked a few times, and Google returns a few different > answers, but I am interested people's opinions and suggestions for the *best* > wat to retrieve the most recent row from a table. > > My case is: > > create table people ( > p_id serial primary key, > ...... > ); > > create table assessments ( > p_id int4 not null references people(p_id), > as_timestamp timestamp not null, > ...... > ); > > select p.*, (most recent) a.* > from people p, assessments a > .. > ; How many rows are in people? How many in assessments? Do you really want data on all people? Or just some? Best regards, depesz
On Friday 05 May 2017 10:35:05 hubert depesz lubaczewski wrote: > On Fri, May 05, 2017 at 09:25:04AM +0100, Gary Stainburn wrote: > > This question has been asked a few times, and Google returns a few > > different answers, but I am interested people's opinions and suggestions > > for the *best* wat to retrieve the most recent row from a table. > > > > My case is: > > > > create table people ( > > p_id serial primary key, > > ...... > > ); > > > > create table assessments ( > > p_id int4 not null references people(p_id), > > as_timestamp timestamp not null, > > ...... > > ); > > > > select p.*, (most recent) a.* > > from people p, assessments a > > .. > > ; > > How many rows are in people? How many in assessments? Do you really want > data on all people? Or just some? > > Best regards, > > depesz This will be open ended so both datasets will just grow over time. There are 720 people records currently, and there should be 6-monthly assessments. TBH, I was expecting the dataset to be bigger. I was looking for a balanced solution, combining performance and SQL 'purity'. For example, the quickest method is probably to store the most recent assessment timestamp in the people row, but then that would be classed as redundent data as it is derivable from a related table. While this is a simple example, it is a real one as it is one that I need to implement now. However, I'm also looking for a techniquie that I can apply to more complex but basically similar situations.
On Fri, May 05, 2017 at 10:44:54AM +0100, Gary Stainburn wrote: > > How many rows are in people? How many in assessments? Do you really want > > data on all people? Or just some? > I was looking for a balanced solution, combining performance and SQL 'purity'. Solution depends on answers to questions I asked. Will you always get data for all people? Or just some? How many assessments per person will there usually be? Best regards, depesz
On Friday 05 May 2017 11:00:37 hubert depesz lubaczewski wrote: > On Fri, May 05, 2017 at 10:44:54AM +0100, Gary Stainburn wrote: > > > How many rows are in people? How many in assessments? Do you really > > > want data on all people? Or just some? > > > > I was looking for a balanced solution, combining performance and SQL > > 'purity'. > > Solution depends on answers to questions I asked. Will you always get > data for all people? Or just some? How many assessments per person will > there usually be? > > Best regards, > > depesz Oddly, part of my reply was missing. As I said, there are currently 720 rows in the people table. The will be a new assessment record for each person roughly every six months. The dataset will be open ended so will slowly but indefinitely grow. The part that was missing was that there will be new people records added who initially will not have an assessment. They will still need to be included in the report.
On Fri, May 05, 2017 at 11:54:54AM +0100, Gary Stainburn wrote: > On Friday 05 May 2017 11:00:37 hubert depesz lubaczewski wrote: > > On Fri, May 05, 2017 at 10:44:54AM +0100, Gary Stainburn wrote: > > > > How many rows are in people? How many in assessments? Do you really > > > > want data on all people? Or just some? > > > > > > I was looking for a balanced solution, combining performance and SQL > > > 'purity'. > > > > Solution depends on answers to questions I asked. Will you always get > > data for all people? Or just some? How many assessments per person will > > there usually be? > > > > Best regards, > > > > depesz > > Oddly, part of my reply was missing. > > As I said, there are currently 720 rows in the people table. > The will be a new assessment record for each person roughly every six months. > The dataset will be open ended so will slowly but indefinitely grow. > > The part that was missing was that there will be new people records added who > initially will not have an assessment. They will still need to be included in > the report. There was this part, but it still doesn't contain information about: - how many assessments per person will there usually be? - will you always get data for all people? - if not (all people) - how many (percent or count)? Best regards, depesz
This question has been asked a few times, and Google returns a few different
answers, but I am interested people's opinions and suggestions for the *best*
wat to retrieve the most recent row from a table.
My case is:
create table people (
p_id serial primary key,
......
);
create table assessments (
p_id int4 not null references people(p_id),
as_timestamp timestamp not null,
......
);
select p.*, (most recent) a.*
from people p, assessments a
..
;
I would start with something using DISTINCT ON and avoid redundant data. If performance starts to suck I would then probably add a field to people where you can record the most recent assessment id and which you would change via a trigger on assessments.
(not tested)
SELECT DISTINCT ON (p) p, a
FROM people p
LEFT JOIN assessments a USING (p_id)
ORDER BY p, a.as_timestamp DESC;
David J.
On 05/05/2017 08:14 AM, David G. Johnston wrote: > On Fri, May 5, 2017 at 1:25 AM, Gary Stainburn > <gary.stainburn@ringways.co.uk <mailto:gary.stainburn@ringways.co.uk>>wrote: > > This question has been asked a few times, and Google returns a few > different > answers, but I am interested people's opinions and suggestions for > the *best* > wat to retrieve the most recent row from a table. > > My case is: > > create table people ( > p_id serial primary key, > ...... > ); > > create table assessments ( > p_id int4 not null references people(p_id), > as_timestamp timestamp not null, > ...... > ); > > select p.*, (most recent) a.* > from people p, assessments a > .. > ; > > > I would start with something using DISTINCT ON and avoid redundant > data. If performance starts to suck I would then probably add a field > to people where you can record the most recent assessment id and which > you would change via a trigger on assessments. > > (not tested) > > SELECT DISTINCT ON (p) p, a > FROM people p > LEFT JOIN assessments a USING (p_id) > ORDER BY p, a.as_timestamp DESC; > > David J. > My take: create table people(p_id SERIAL PRIMARY KEY, name_first VARCHAR, name_last VARCHAR); create table assessments(p_id INTEGER NOT NULL REFERENCES people(p_id), as_timestamp TIMESTAMP NOT NULL) insert into people(name_first, name_last) values ('Daffy', 'Duck'), ('Mickey', 'Mouse'), ('Rocky', 'Squirrel'); insert into assessments (p_id, as_timestamp) values (1, '09/12/2016'), (3, '10/01/2016'), (2, '11/14/2016'), (1, '12/27/2016'), (2,'01/03/2017'),(3, '02/23/2017'), (1, '03/05/2017'); SELECT * FROM people JOIN ( SELECT p_id, max(as_timestamp) AS last_assessment FROM assessments GROUP BY p_id) AS max_ts ON people.p_id = max_ts.p_id JOIN assessments AS a ON a.p_id= max_ts.p_id AND a.as_timestamp = max_ts.last_assessment ORDER BY people.p_id; p_id | name_first | name_last | p_id | max ------+------------+-----------+------+--------------------- 1 | Daffy | Duck | 1 | 2017-03-05 00:00:00 2 | Mickey | Mouse | 2 | 2017-01-03 00:00:00 3 | Rocky | Squirrel | 3 | 2017-02-23 00:00:00 (3 rows) -- Adrian Klaver adrian.klaver@aklaver.com
with mytab as (
select p.pid,a.as_timestamp
from people p, assessments a
where a.p_id = a.p_id)
select p_id,as_timestamp from mytab z
where not exist( select 1 from mytab x
where x.p_id = z.p_id
and z.as_timestamp > x.as_timestamp);
or you could use analytics
On Fri, May 5, 2017 at 9:02 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 05/05/2017 08:14 AM, David G. Johnston wrote:On Fri, May 5, 2017 at 1:25 AM, Gary Stainburn
<gary.stainburn@ringways.co.uk <mailto:gary.stainburn@ringways.co.uk>>wrote:
This question has been asked a few times, and Google returns a few
different
answers, but I am interested people's opinions and suggestions for
the *best*
wat to retrieve the most recent row from a table.
My case is:
create table people (
p_id serial primary key,
......
);
create table assessments (
p_id int4 not null references people(p_id),
as_timestamp timestamp not null,
......
);
select p.*, (most recent) a.*
from people p, assessments a
..
;
I would start with something using DISTINCT ON and avoid redundant
data. If performance starts to suck I would then probably add a field
to people where you can record the most recent assessment id and which
you would change via a trigger on assessments.
(not tested)
SELECT DISTINCT ON (p) p, a
FROM people p
LEFT JOIN assessments a USING (p_id)
ORDER BY p, a.as_timestamp DESC;
David J.
My take:
create table people(p_id SERIAL PRIMARY KEY, name_first VARCHAR, name_last VARCHAR);
create table assessments(p_id INTEGER NOT NULL REFERENCES people(p_id), as_timestamp TIMESTAMP NOT NULL)
insert into people(name_first, name_last) values ('Daffy', 'Duck'), ('Mickey', 'Mouse'), ('Rocky', 'Squirrel');
insert into assessments (p_id, as_timestamp) values (1, '09/12/2016'), (3, '10/01/2016'), (2, '11/14/2016'), (1, '12/27/2016'), (2,'01/03/2017'),(3, '02/23/2017'), (1, '03/05/2017');
SELECT
*
FROM
people
JOIN (
SELECT
p_id,
max(as_timestamp) AS last_assessment
FROM
assessments
GROUP BY
p_id) AS max_ts ON people.p_id = max_ts.p_id
JOIN assessments AS a ON a.p_id = max_ts.p_id
AND a.as_timestamp = max_ts.last_assessment
ORDER BY
people.p_id;
p_id | name_first | name_last | p_id | max
------+------------+-----------+------+---------------------
1 | Daffy | Duck | 1 | 2017-03-05 00:00:00
2 | Mickey | Mouse | 2 | 2017-01-03 00:00:00
3 | Rocky | Squirrel | 3 | 2017-02-23 00:00:00
(3 rows)
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
select * from mike_people;
10,'Mike'
20,'Jane'
30,'Seiobhan'
select * from mike_assessments;
300,10,'2017-05-05 14:11:24.885633','C'
400,10,'2017-05-05 14:12:22.650532','B'
500,10,'2017-05-05 14:13:07.722378','A'
600,20,'2017-05-05 14:13:26.115105','B'
select * from
( select p_id,p_name from mike_people ) e1
left join lateral
( select max(as_timestamp) from mike_assessments
where p_id = e1.p_id
group by p_id) e2
on true;
10,'Mike','2017-05-05 14:13:07.722378'
20,'Jane','2017-05-05 14:13:26.115105'
30,'Seiobhan','<NULL>'
On Fri, May 5, 2017 at 1:47 PM, Michael Moore <michaeljmoore@gmail.com> wrote:
with mytab as (select p.pid,a.as_timestampfrom people p, assessments awhere a.p_id = a.p_id)select p_id,as_timestamp from mytab zwhere not exist( select 1 from mytab xwhere x.p_id = z.p_idand z.as_timestamp > x.as_timestamp);or you could use analyticsOn Fri, May 5, 2017 at 9:02 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:On 05/05/2017 08:14 AM, David G. Johnston wrote:On Fri, May 5, 2017 at 1:25 AM, Gary Stainburn
<gary.stainburn@ringways.co.uk <mailto:gary.stainburn@ringways.co.uk>>wrote:
This question has been asked a few times, and Google returns a few
different
answers, but I am interested people's opinions and suggestions for
the *best*
wat to retrieve the most recent row from a table.
My case is:
create table people (
p_id serial primary key,
......
);
create table assessments (
p_id int4 not null references people(p_id),
as_timestamp timestamp not null,
......
);
select p.*, (most recent) a.*
from people p, assessments a
..
;
I would start with something using DISTINCT ON and avoid redundant
data. If performance starts to suck I would then probably add a field
to people where you can record the most recent assessment id and which
you would change via a trigger on assessments.
(not tested)
SELECT DISTINCT ON (p) p, a
FROM people p
LEFT JOIN assessments a USING (p_id)
ORDER BY p, a.as_timestamp DESC;
David J.
My take:
create table people(p_id SERIAL PRIMARY KEY, name_first VARCHAR, name_last VARCHAR);
create table assessments(p_id INTEGER NOT NULL REFERENCES people(p_id), as_timestamp TIMESTAMP NOT NULL)
insert into people(name_first, name_last) values ('Daffy', 'Duck'), ('Mickey', 'Mouse'), ('Rocky', 'Squirrel');
insert into assessments (p_id, as_timestamp) values (1, '09/12/2016'), (3, '10/01/2016'), (2, '11/14/2016'), (1, '12/27/2016'), (2,'01/03/2017'),(3, '02/23/2017'), (1, '03/05/2017');
SELECT
*
FROM
people
JOIN (
SELECT
p_id,
max(as_timestamp) AS last_assessment
FROM
assessments
GROUP BY
p_id) AS max_ts ON people.p_id = max_ts.p_id
JOIN assessments AS a ON a.p_id = max_ts.p_id
AND a.as_timestamp = max_ts.last_assessment
ORDER BY
people.p_id;
p_id | name_first | name_last | p_id | max
------+------------+-----------+------+---------------------
1 | Daffy | Duck | 1 | 2017-03-05 00:00:00
2 | Mickey | Mouse | 2 | 2017-01-03 00:00:00
3 | Rocky | Squirrel | 3 | 2017-02-23 00:00:00
(3 rows)
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
with mytab as (
select p.p_id,a.as_timestamp ,p_name ,grade
from mike_people p, mike_assessments a
where a.p_id = p.p_id)
select p_id, as_timestamp, p_name, grade from mytab z
where not exists( select 1 from mytab x where x.p_id = z.p_id
and z.as_timestamp < x.as_timestamp);
tested
On Fri, May 5, 2017 at 2:25 PM, Michael Moore <michaeljmoore@gmail.com> wrote:
select * from mike_people;10,'Mike'20,'Jane'30,'Seiobhan'select * from mike_assessments;300,10,'2017-05-05 14:11:24.885633','C'400,10,'2017-05-05 14:12:22.650532','B'500,10,'2017-05-05 14:13:07.722378','A'600,20,'2017-05-05 14:13:26.115105','B'select * from( select p_id,p_name from mike_people ) e1left join lateral( select max(as_timestamp) from mike_assessmentswhere p_id = e1.p_idgroup by p_id) e2on true;10,'Mike','2017-05-05 14:13:07.722378'20,'Jane','2017-05-05 14:13:26.115105'30,'Seiobhan','<NULL>'On Fri, May 5, 2017 at 1:47 PM, Michael Moore <michaeljmoore@gmail.com> wrote:with mytab as (select p.pid,a.as_timestampfrom people p, assessments awhere a.p_id = a.p_id)select p_id,as_timestamp from mytab zwhere not exist( select 1 from mytab xwhere x.p_id = z.p_idand z.as_timestamp > x.as_timestamp);or you could use analyticsOn Fri, May 5, 2017 at 9:02 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:On 05/05/2017 08:14 AM, David G. Johnston wrote:On Fri, May 5, 2017 at 1:25 AM, Gary Stainburn
<gary.stainburn@ringways.co.uk <mailto:gary.stainburn@ringways.co.uk>>wrote:
This question has been asked a few times, and Google returns a few
different
answers, but I am interested people's opinions and suggestions for
the *best*
wat to retrieve the most recent row from a table.
My case is:
create table people (
p_id serial primary key,
......
);
create table assessments (
p_id int4 not null references people(p_id),
as_timestamp timestamp not null,
......
);
select p.*, (most recent) a.*
from people p, assessments a
..
;
I would start with something using DISTINCT ON and avoid redundant
data. If performance starts to suck I would then probably add a field
to people where you can record the most recent assessment id and which
you would change via a trigger on assessments.
(not tested)
SELECT DISTINCT ON (p) p, a
FROM people p
LEFT JOIN assessments a USING (p_id)
ORDER BY p, a.as_timestamp DESC;
David J.
My take:
create table people(p_id SERIAL PRIMARY KEY, name_first VARCHAR, name_last VARCHAR);
create table assessments(p_id INTEGER NOT NULL REFERENCES people(p_id), as_timestamp TIMESTAMP NOT NULL)
insert into people(name_first, name_last) values ('Daffy', 'Duck'), ('Mickey', 'Mouse'), ('Rocky', 'Squirrel');
insert into assessments (p_id, as_timestamp) values (1, '09/12/2016'), (3, '10/01/2016'), (2, '11/14/2016'), (1, '12/27/2016'), (2,'01/03/2017'),(3, '02/23/2017'), (1, '03/05/2017');
SELECT
*
FROM
people
JOIN (
SELECT
p_id,
max(as_timestamp) AS last_assessment
FROM
assessments
GROUP BY
p_id) AS max_ts ON people.p_id = max_ts.p_id
JOIN assessments AS a ON a.p_id = max_ts.p_id
AND a.as_timestamp = max_ts.last_assessment
ORDER BY
people.p_id;
p_id | name_first | name_last | p_id | max
------+------------+-----------+------+---------------------
1 | Daffy | Duck | 1 | 2017-03-05 00:00:00
2 | Mickey | Mouse | 2 | 2017-01-03 00:00:00
3 | Rocky | Squirrel | 3 | 2017-02-23 00:00:00
(3 rows)
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
On 05/05/17 20:54, Gary Stainburn wrote: > On Friday 05 May 2017 11:00:37 hubert depesz lubaczewski wrote: >> On Fri, May 05, 2017 at 10:44:54AM +0100, Gary Stainburn wrote: >>>> How many rows are in people? How many in assessments? Do you really >>>> want data on all people? Or just some? >>> >>> I was looking for a balanced solution, combining performance and SQL >>> 'purity'. >> >> Solution depends on answers to questions I asked. Will you always get >> data for all people? Or just some? How many assessments per person will >> there usually be? >> >> Best regards, >> >> depesz > > Oddly, part of my reply was missing. > > As I said, there are currently 720 rows in the people table. > The will be a new assessment record for each person roughly every six months. > The dataset will be open ended so will slowly but indefinitely grow. > > The part that was missing was that there will be new people records added who > initially will not have an assessment. They will still need to be included in > the report. > > > Depending upon what you might want to do with assessments that are not "most-recent", you could use a trigger function to move "out-of-date" assessments into a separate table and only keep the most recent as the "current" assessment in the assessment table. Makes the joins a little simpler and less time-consuming, again, depending upon what you want to do with non-most-recent rows. I have a real-world application that has marks awarded to assessments per student (>10k students, >1k assessments) and I keep two tables: 'mark' for "current" (or most recent) marks and 'old_mark' as a kind of audit trail of marks that have been modified, by whom, when and for what reason. I use two trigger functions: 'insert_mark_func' and 'update_mark_func' which are triggered on inserts and updates on the mark table respectively. Deletes are not allowed, but we can update an existing mark so that its value is "NULL", which gets treated as if it were a non-mark. Don't know if this is best-practice, and certainly not "pure SQL", but it is a balanced solution and has been working well for us for over 15 years. cheers, Bob Edwards.
On Friday 05 May 2017 12:09:39 hubert depesz lubaczewski wrote: > There was this part, but it still doesn't contain information about: To clarify the current situation > - how many assessments per person will there usually be? There will be roughly one assessment per person every 6 months, so initially the count will be low, but will increase over time. As we are still to implement this, the assessment count is zero, although many of the assessments have already been done on paper. Once up to date, everyone should have at least once assessment. > - will you always get data for all people? When a new person is added he will not have an assessment. > - if not (all people) - how many (percent or count)? Once each person has been added, an assessment wil be scheduled for as quickly as possible. As we currently have 720 people, adding 7 more people is only 1% so I should say the percentage will be between 0% and 1% (Once up to date)
David G. Johnston schrieb am 05.05.2017 um 17:14: > I would start with something using DISTINCT ON and avoid redundant > data. If performance starts to suck I would then probably add a field > to people where you can record the most recent assessment id and > which you would change via a trigger on assessments. > > (not tested) > > SELECT DISTINCT ON (p) p, a > FROM people p > LEFT JOIN assessments a USING (p_id) > ORDER BY p, a.as_timestamp DESC; > > David J. > I would probably put the evaluation of the "most recent assessment" into a derived table: select * from people p join ( select distinct on (p_id) * from assessments order by p_id, as_timestamp desc ) a ona.p_id = p.id; In my experience joining with the result of the distinct on () is quicker then applying the distinct on () on the resultof the join.