Thread: Subquery to select max(date) value
The query is to return the latest next_contact date for each person. Using the max() aggregate function and modeling the example of lo_temp on page 13 of the rel. 10 manual I wrote this statement: select (P.person_id, P.lname, P.fname, P.direct_phone, O.org_name, A.next_contact) from People as P, Organizations as O, Activities as A where P.org_id = O.org_id and P.person_id = A.person_id and A.next_contact = select (max(A.next_contact) from A) group by A.next_contact, P.person_id; The syntax error returned by psql is: psql:next_contact_date.sql:7: ERROR: syntax error at or near "select" LINE 4: A.next_contact = select (max(A.next_contact) from A) ^ and I fail to see what I've done incorrectly. Do I need to insert DISTINCT ON in the main or sub query? If so, what is the correct syntax to extract all desired columns from each selected row? If this is covered in the manual please point me to the proper section; if not, please educate me on the appropriate syntax to produce the desired output. TIA, Rich
On Tue, Feb 12, 2019 at 3:24 PM Rich Shepard <rshepard@appl-ecosys.com> wrote: > psql:next_contact_date.sql:7: ERROR: syntax error at or near "select" > LINE 4: A.next_contact = select (max(A.next_contact) from A) > ^ > and I fail to see what I've done incorrectly. You put the open parenthesis after the word select instead of before. A.next_contact = (SELECT max(A.next_contact) FROM A) David J.
On Tue, 12 Feb 2019, David G. Johnston wrote: > You put the open parenthesis after the word select instead of before. > A.next_contact = (SELECT max(A.next_contact) FROM A) David. Color me suitably embarrassed. Thank you, Rich
On Tue, 12 Feb 2019, Jeff Ross wrote: > Try (select (max(A.next_contact) from A) Thanks, Jeff. The syntax accepted by psql is A.next_contact = (select (max(A.next_contact)) from Activities as A) but the date is not included in the output. The revised statement is now: select (P.person_id, P.lname, P.fname, P.direct_phone, O.org_name, A.next_contact) from People as P, Organizations as O, Activities as A where P.org_id = O.org_id and P.person_id = A.person_id and A.next_contact = (select (max(A.next_contact)) from Activities as A) group by A.next_contact, O.org_id, P.person_id; This produces the first 5 colums in the outer select but no next_contact date. When I move A.next_contact to the head of the select list each row begins with 'infinity'. I'm still missing the requisite knowledge. Rich
On Tue, 12 Feb 2019, Rich Shepard wrote: > A.next_contact = (select (max(A.next_contact)) from Activities as A) Errata: The parentheses around the max aggregate are not necessary. A.next_contact now displays at the end of each returned row as 'infinity'. Rich
Hey Rich,
On Tue, Feb 12, 2019 at 4:24 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
The query is to return the latest next_contact date for each person. Using
the max() aggregate function and modeling the example of lo_temp on page 13
of the rel. 10 manual I wrote this statement:
I use DISTINCT ON and ORDER BY to get the single latest value from a table with multiple date entries:
HTH,
-m
On 2/12/19 2:48 PM, Rich Shepard wrote: > On Tue, 12 Feb 2019, Rich Shepard wrote: > >> A.next_contact = (select (max(A.next_contact)) from Activities as A) > > Errata: > > The parentheses around the max aggregate are not necessary. > > A.next_contact now displays at the end of each returned row as 'infinity'. 'infinity' is the max date, so this is what you want? > > Rich > > -- Adrian Klaver adrian.klaver@aklaver.com
On Tue, Feb 12, 2019 at 2:48 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Tue, 12 Feb 2019, Rich Shepard wrote:
> A.next_contact = (select (max(A.next_contact)) from Activities as A)
Errata:
The parentheses around the max aggregate are not necessary.
A.next_contact now displays at the end of each returned row as 'infinity'.
Your subquery isn't doing anything to match on person_id, so it's going to match all the records with the highest next_contact in activities.
I think you want something more like:
A.next_contact = (select (max(A.next_contact)) from Activities as A2 WHERE A2.person_id=A.person_id)
Or, for that matter, since next_contact is all that you're drawing from activities, you can also just put it in the select:
select (P.person_id, P.lname, P.fname, P.direct_phone, O.org_name,
(select max(A.next_contact) from Activities as A WHERE p.person_id=A.person_id)
FROM ...
Cheers,
Ken
AGENCY Software
A Free Software data system
By and for non-profits
(253) 245-3801
learn more about AGENCY or
follow the discussion.
On Tue, 12 Feb 2019, Adrian Klaver wrote: > 'infinity' is the max date, so this is what you want? Adrian, Nope. When I went to make a cup of coffee I realized that I need the other date constraints (such as IS NOT NULL), too. I'm re-wording the statement to put everything in the correct order. Will probably try DISTINCT ON, too, if that makes it simpler or faster. Thanks, Rich
On Tue, 12 Feb 2019, Ken Tanzer wrote: > select (P.person_id, P.lname, P.fname, P.direct_phone, O.org_name, > (select max(A.next_contact) from Activities as A WHERE > p.person_id=A.person_id) > FROM ... Ken, Yes, cheers indeed. A bit of thinking and re-organizing resulted in a working statement that's close to what I want: select (P.person_id, P.lname, P.fname, P.direct_phone, O.org_name, (select max(A.next_contact))) from People as P, Organizations as O, Activities as A where P.org_id = O.org_id and P.person_id = A.person_id and /*A.next_contact = (select max(A.next_contact) from Activities as A) and */ A.next_contact <= 'today' and A.next_contact > '2018-12-31' and A.next_contact is not null group by A.next_contact, O.org_id, P.person_id; The two issues I now focus on resolving are the multiple rows per person rather than only the most recent and the date displayed at the end of each output row. DISTINCT ON will eliminate the first issue. Thanks, Rich
Ken,
Yes, cheers indeed. A bit of thinking and re-organizing resulted in a
working statement that's close to what I want:
select (P.person_id, P.lname, P.fname, P.direct_phone, O.org_name,
(select max(A.next_contact)))
from People as P, Organizations as O, Activities as A
where P.org_id = O.org_id and P.person_id = A.person_id and
/*A.next_contact = (select max(A.next_contact) from Activities as A) and */
A.next_contact <= 'today' and A.next_contact > '2018-12-31' and
A.next_contact is not null
group by A.next_contact, O.org_id, P.person_id;
The two issues I now focus on resolving are the multiple rows per person
rather than only the most recent and the date displayed at the end of each
output row. DISTINCT ON will eliminate the first issue.
If that's getting you what you want, then great and more power to you. It looks like you'll only get people who have a next_contact in your target window there. You might also consider something like this...
select
p.person_id,
p.lname,
p.fname,
p.direct_phone,
o.org_name,
a.next_contact
from
people as p
LEFT JOIN organizations o USING (person_id)
LEFT JOIN (
SELECT
DISTINCT ON (person_id)
person_id,
next_contact
FROM activities a
-- WHERE ???
ORDER BY person_id,next_contact DESC
) a USING (person_id)
;
Cheers,
Ken
AGENCY Software
A Free Software data system
By and for non-profits
(253) 245-3801
learn more about AGENCY or
follow the discussion.
On Tue, 12 Feb 2019, Ken Tanzer wrote: > If that's getting you what you want, then great and more power to you. It > looks like you'll only get people who have a next_contact in your target > window there. You might also consider something like this... <clip> Ken, I'll work with your example. This looks most promising. What I want is a report of folks I need to contact. Some have next_contact dates in the past when I did not act so I need to do so as long as there is a next_contact date (no NULLs) and the prospect is active. When I have a fully working statement I'll post it to the list so others can read the accepted working solution. Thanks again, Rich
>>>>> "Rich" == Rich Shepard <rshepard@appl-ecosys.com> writes: Rich> Will probably try DISTINCT ON, too, if that makes it simpler or Rich> faster. You want LATERAL. -- Andrew (irc:RhodiumToad)
Am Dienstag, 12. Februar 2019, 23:23:49 CET schrieb Rich Shepard: > The query is to return the latest next_contact date for each person. Using > the max() aggregate function and modeling the example of lo_temp on page 13 > of the rel. 10 manual I wrote this statement: You don't even need a subselect to do so: select P.person_id, P.lname, P.fname, P.direct_phone, O.org_name, max(A.next_contact) as next_contact from People as P join Organizations as O on P.org_id = O.org_id join Activities as A on P.person_id = A.person_id group by P.person_id, P.lname, P.fname, P.direct_phone, O.org_name; -- Kind regards Jan
On Wed, 13 Feb 2019, Andrew Gierth wrote: > Rich> Will probably try DISTINCT ON, too, if that makes it simpler or > Rich> faster. > > You want LATERAL. Andrew, That's new to me so I'll read about it. Thanks, Rich
On Tue, 12 Feb 2019, Ken Tanzer wrote: > If that's getting you what you want, then great and more power to you. It > looks like you'll only get people who have a next_contact in your target > window there. You might also consider something like this... > > select > p.person_id, > p.lname, > p.fname, > p.direct_phone, > o.org_name, > a.next_contact > from > people as p > LEFT JOIN organizations o USING (person_id) > LEFT JOIN ( > SELECT > DISTINCT ON (person_id) > person_id, > next_contact > FROM activities a > -- WHERE ??? > ORDER BY person_id,next_contact DESC > ) a USING (person_id) > ; I modified this to restrict the time and ignore nulls by replacing the question marks: WHERE a.next_contact <= 'today' and a.next_contact > '2018-12-31' and a.next_contact is not null For a reason I've not yet found, the last condition is not observed; i.e., those rows with null next_contact dates appear in the results. Position in the sequence makes no difference. What might cause this? Regards, Rich
On 2/13/19 6:28 AM, Rich Shepard wrote: > On Tue, 12 Feb 2019, Ken Tanzer wrote: > >> If that's getting you what you want, then great and more power to >> you. It >> looks like you'll only get people who have a next_contact in your target >> window there. You might also consider something like this... >> >> select >> p.person_id, >> p.lname, >> p.fname, >> p.direct_phone, >> o.org_name, >> a.next_contact >> from >> people as p >> LEFT JOIN organizations o USING (person_id) >> LEFT JOIN ( >> SELECT >> DISTINCT ON (person_id) >> person_id, >> next_contact >> FROM activities a >> -- WHERE ??? >> ORDER BY person_id,next_contact DESC >> ) a USING (person_id) >> ; > > I modified this to restrict the time and ignore nulls by replacing the > question marks: > > WHERE a.next_contact <= 'today' and a.next_contact > '2018-12-31' and > a.next_contact is not null > > For a reason I've not yet found, the last condition is not observed; i.e., > those rows with null next_contact dates appear in the results. Position in > the sequence makes no difference. What might cause this? The LEFT JOIN. There are rows in people for which there no records coming from the sub-select on activities, so the row is 'padded' with NULL values for the missing data. > > Regards, > > Rich > > > > -- Adrian Klaver adrian.klaver@aklaver.com
On Wed, 13 Feb 2019, Adrian Klaver wrote: > The LEFT JOIN. There are rows in people for which there no records coming > from the sub-select on activities, so the row is 'padded' with NULL values > for the missing data. Adrian, I assume it's the inner left join. I'll trace what's happening at each step and learn where to specify no nulls. Thanks, Rich
On 2/13/19 7:37 AM, Rich Shepard wrote: > On Wed, 13 Feb 2019, Adrian Klaver wrote: > >> The LEFT JOIN. There are rows in people for which there no records coming >> from the sub-select on activities, so the row is 'padded' with NULL >> values >> for the missing data. > > Adrian, > > I assume it's the inner left join. I'll trace what's happening at each step AFAIK there is no inner left join: https://www.postgresql.org/docs/10/sql-select.html#SQL-FROM " join_type One of [ INNER ] JOIN LEFT [ OUTER ] JOIN RIGHT [ OUTER ] JOIN FULL [ OUTER ] JOIN CROSS JOIN " > and learn where to specify no nulls. You can't it is the nature of the join: "LEFT OUTER JOIN returns all rows in the qualified Cartesian product (i.e., all combined rows that pass its join condition), plus one copy of each row in the left-hand table for which there was no right-hand row that passed the join condition. This left-hand row is extended to the full width of the joined table by inserting null values for the right-hand columns. Note that only the JOIN clause's own condition is considered while deciding which rows have matches. Outer conditions are applied afterwards." What you are seeing are records for which there is a person in the people table that either: 1) Do not have a record under that person_id in the activities table or 2) Do not have a record that meets the date filtering you are doing on next_contact. > > Thanks, > > Rich > > -- Adrian Klaver adrian.klaver@aklaver.com
On 2/13/19 7:37 AM, Rich Shepard wrote: > On Wed, 13 Feb 2019, Adrian Klaver wrote: > >> The LEFT JOIN. There are rows in people for which there no records coming >> from the sub-select on activities, so the row is 'padded' with NULL >> values >> for the missing data. > > Adrian, > > I assume it's the inner left join. I'll trace what's happening at each step > and learn where to specify no nulls. Should have been clearer in my previous post, you can get rid of the nulls by filtering out the entire row. > > Thanks, > > Rich > > -- Adrian Klaver adrian.klaver@aklaver.com
On Wed, 13 Feb 2019, Adrian Klaver wrote: > AFAIK there is no inner left join: > https://www.postgresql.org/docs/10/sql-select.html#SQL-FROM Sigh. There are two LEFT JOINS in the statement. I referred to the SECOND one as INNER. I correct my message to refer to the second of the two left joins. Rich
On Wed, 13 Feb 2019, Adrian Klaver wrote: > Should have been clearer in my previous post, you can get rid of the nulls > by filtering out the entire row. Adrian, Thank you. I'm rebuilding the statement from the inside out (which helps me learn more SQL in the process). For example, select * from activities where next_contact is not null and next_contact <= 'today' and next_contact > '2018-12-31'; works as written. This suggests that it's the placement within the two joins that needs correcting. Now I'll add the two joined tables one at a time and learn how to structure the whole statement. Best regards, Rich
On Wed, Feb 13, 2019, at 9:06 AM, Rich Shepard wrote: > On Wed, 13 Feb 2019, Adrian Klaver wrote: > > > Should have been clearer in my previous post, you can get rid of the nulls > > by filtering out the entire row. > > Adrian, > > Thank you. I'm rebuilding the statement from the inside out (which helps me > learn more SQL in the process). For example, > > select * from activities where next_contact is not null and > next_contact <= 'today' and next_contact > '2018-12-31'; > > works as written. This suggests that it's the placement within the two joins > that needs correcting. Now I'll add the two joined tables one at a time and > learn how to structure the whole statement. It will work if you use it to filter after the joins are done. > > Best regards, > > Rich > > >
On Wed, 13 Feb 2019, Adrian Klaver wrote: > It will work if you use it to filter after the joins are done. I'll work on finding the proper syntax to do this. Need to do more reading and trial-and-error testing. Regards, Rich
On 2/13/19 7:37 AM, Rich Shepard wrote: > On Wed, 13 Feb 2019, Adrian Klaver wrote: > >> The LEFT JOIN. There are rows in people for which there no records coming >> from the sub-select on activities, so the row is 'padded' with NULL >> values >> for the missing data. > > Adrian, > > I assume it's the inner left join. I'll trace what's happening at each step > and learn where to specify no nulls. create table people(person_id integer, desc_fld varchar); create table activities(person_id integer, next_contact date); insert into people values (1, 'contacted'), (2, 'never contacted'), (3, 'out of range'), (4, 'contacted'); insert into activities values (1, '01/31/19'), (3, '11/01/18'), (4, '02/02/19'); No activities record for person_id =2, activities data set to NULL: SELECT * FROM people AS p LEFT JOIN activities AS a ON p.person_id = a.person_id; person_id | desc_fld | person_id | next_contact -----------+-----------------+-----------+-------------- 1 | contacted | 1 | 2019-01-31 2 | never contacted | NULL | NULL 3 | out of range | 3 | 2018-11-01 4 | contacted | 4 | 2019-02-02 Close to your last posted query. person_id 2 and 3 have NULL values for activities data as there is no record for 2 and 3 is out of the date range.: select p.person_id, p.desc_fld, a.next_contact from people as p LEFT JOIN ( SELECT DISTINCT ON (person_id) person_id, next_contact FROM activities a WHERE a.next_contact <= 'today' and a.next_contact > '2018-12-31' and a.next_contact is not null ) a USING (person_id) ; person_id | desc_fld | next_contact -----------+-----------------+-------------- 1 | contacted | 2019-01-31 2 | never contacted | NULL 3 | out of range | NULL 4 | contacted | 2019-02-02 (4 rows) What I think you want: select p.person_id, p.desc_fld, a.next_contact from people as p LEFT JOIN ( SELECT DISTINCT ON (person_id) person_id, next_contact FROM activities a WHERE a.next_contact <= 'today' and a.next_contact > '2018-12-31' and a.next_contact is not null ) a USING (person_id) WHERE next_contact is not null; person_id | desc_fld | next_contact -----------+-----------+-------------- 1 | contacted | 2019-01-31 4 | contacted | 2019-02-02 > > Thanks, > > Rich > > -- Adrian Klaver adrian.klaver@aklaver.com
>>>>> "Adrian" == Adrian Klaver <adrian.klaver@aklaver.com> writes: Adrian> Close to your last posted query. person_id 2 and 3 have NULL Adrian> values for activities data as there is no record for 2 and 3 is Adrian> out of the date range.: Adrian> select Adrian> p.person_id, Adrian> p.desc_fld, Adrian> a.next_contact Adrian> from Adrian> people as p Adrian> LEFT JOIN ( Adrian> SELECT Adrian> DISTINCT ON (person_id) [...] Adrian> ) a USING (person_id) Adrian> ; DISTINCT ON with no matching ORDER BY at the _same_ query level is non-deterministic. Also DISTINCT ON isn't efficient. Consider instead something along the lines of: select p.*, a.* -- for illustration from people p join lateral (select * from activities a1 where a1.person_id = p.person_id and a1.next_contact > '2018-12-31' and a1.next_contact <= 'today' order by a1.next_contact desc limit 1) a on true; (make sure to have an index on activities(person_id,next_contact)) -- Andrew (irc:RhodiumToad)
On Wed, 13 Feb 2019, Andrew Gierth wrote: > Adrian> Close to your last posted query. person_id 2 and 3 have NULL > Adrian> values for activities data as there is no record for 2 and 3 is > Adrian> out of the date range.: > DISTINCT ON with no matching ORDER BY at the _same_ query level is > non-deterministic. > > Also DISTINCT ON isn't efficient. Consider instead something along the > lines of: Andrew/Adrian, I again read about DISTINCT and DISTINCT ON and fully understand them. I've also again read about JOINs; I understand them in terms of sets and _think_ that in this query the people table is the LEFT (many) while the organizations and activities tables are the RIGHT (one) in the many-to-one relationships. That is, for each person_id there is only one org_id and only one next_contact that meets the three constraints. I'm now working on understanding how the syntax in the examples you two, Ken, and others have provided expresses the many-to-one relationships of organization and activities to people. I have the syntax that returns the next_date meeting the WHERE constraints to each person_id and am now focusing on adding the additional people and organization columns to the results. Might not be until tomorrow or Friday but I'll let you and the list subscribes know when I have understood all your suggestions and get the results I want from the query. Thanks again, Rich
On 2/13/19 2:24 PM, Rich Shepard wrote: > On Wed, 13 Feb 2019, Andrew Gierth wrote: > >> Adrian> Close to your last posted query. person_id 2 and 3 have NULL >> Adrian> values for activities data as there is no record for 2 and 3 is >> Adrian> out of the date range.: > >> DISTINCT ON with no matching ORDER BY at the _same_ query level is >> non-deterministic. >> >> Also DISTINCT ON isn't efficient. Consider instead something along the >> lines of: > > Andrew/Adrian, > > I again read about DISTINCT and DISTINCT ON and fully understand them. I've > also again read about JOINs; I understand them in terms of sets and _think_ > that in this query the people table is the LEFT (many) while the > organizations and activities tables are the RIGHT (one) in the many-to-one > relationships. That is, for each person_id there is only one org_id and > only > one next_contact that meets the three constraints. Given a sufficiently large date range that may not be true as you may have contacted a given person multiple times during that range and generated multiple activities records. > > I'm now working on understanding how the syntax in the examples you two, > Ken, and others have provided expresses the many-to-one relationships of > organization and activities to people. I have the syntax that returns the > next_date meeting the WHERE constraints to each person_id and am now > focusing on adding the additional people and organization columns to the > results. Might not be until tomorrow or Friday but I'll let you and the > list > subscribes know when I have understood all your suggestions and get the > results I want from the query. > > Thanks again, > > Rich > > > -- Adrian Klaver adrian.klaver@aklaver.com
On Wed, 13 Feb 2019, Adrian Klaver wrote: > Given a sufficiently large date range that may not be true as you may have > contacted a given person multiple times during that range and generated > multiple activities records. Adrian, This is true as it has occurred. I want only the most recent activity row associated with that person_id. (NB: while I'm really comfortable with DDL statements my DML experience is seriously lacking and that's what I need to improve now.) I've just read a couple of blog posts on the LATERAL join added in 9.3 and understand it in theory. Properly applying it to my application is now my focus (and I need to re-read Andrew's example very closely.) Best regards, Rich
I have not really followed this thread but would not a query along the lines of
select * from activity where person_id = n and timestamp = (select max(timestamp) from activity where person_id = n);
give the required answer ie, always return the latest result for the specified person_id??
Brent Wood
Programme leader: Environmental Information Delivery
NIWA
DDI: +64 (4) 3860529
select * from activity where person_id = n and timestamp = (select max(timestamp) from activity where person_id = n);
give the required answer ie, always return the latest result for the specified person_id??
Brent Wood
Programme leader: Environmental Information Delivery
NIWA
DDI: +64 (4) 3860529
|
|
________________________________________
From: Rich Shepard <rshepard@appl-ecosys.com>
Sent: Thursday, February 14, 2019 12:13
To: pgsql-generallists.postgresql.org
Subject: Re: Subquery to select max(date) value
On Wed, 13 Feb 2019, Adrian Klaver wrote:
> Given a sufficiently large date range that may not be true as you may have
> contacted a given person multiple times during that range and generated
> multiple activities records.
Adrian,
This is true as it has occurred. I want only the most recent activity row
associated with that person_id. (NB: while I'm really comfortable with DDL
statements my DML experience is seriously lacking and that's what I need to
improve now.)
I've just read a couple of blog posts on the LATERAL join added in 9.3 and
understand it in theory. Properly applying it to my application is now my
focus (and I need to re-read Andrew's example very closely.)
Best regards,
Rich
From: Rich Shepard <rshepard@appl-ecosys.com>
Sent: Thursday, February 14, 2019 12:13
To: pgsql-generallists.postgresql.org
Subject: Re: Subquery to select max(date) value
On Wed, 13 Feb 2019, Adrian Klaver wrote:
> Given a sufficiently large date range that may not be true as you may have
> contacted a given person multiple times during that range and generated
> multiple activities records.
Adrian,
This is true as it has occurred. I want only the most recent activity row
associated with that person_id. (NB: while I'm really comfortable with DDL
statements my DML experience is seriously lacking and that's what I need to
improve now.)
I've just read a couple of blog posts on the LATERAL join added in 9.3 and
understand it in theory. Properly applying it to my application is now my
focus (and I need to re-read Andrew's example very closely.)
Best regards,
Rich
Attachment
On Wed, 13 Feb 2019, Brent Wood wrote: > I have not really followed this thread but would not a query along the lines of > select * from activity where person_id = n and timestamp = (select > max(timestamp) from activity where person_id = n); > give the required answer ie, always return the latest result for the specified person_id?? Brent, I don't know. What does work is this statement: SELECT DISTINCT ON (person_id) person_id, next_contact FROM activities AS a WHERE a.next_contact is not null and a.next_contact <= 'today' and a.next_contact > '2018-12-31' ORDER BY person_id,next_contact; which returns these results: person_id | next_contact -----------+-------------- 1 | 2019-01-14 4 | 2019-01-14 22 | 2019-01-14 36 | 2019-01-03 37 | 2019-01-14 38 | 2019-01-21 40 | 2019-02-11 41 | 2019-02-11 42 | 2019-02-11 43 | 2019-02-11 44 | 2019-02-11 45 | 2019-02-11 46 | 2019-02-11 (13 rows) Now I'm learning how to join the people and organization table using LATERAL join(s) so the results include names and phone numbers. Thanks for the suggestion, Rich
On Wed, 13 Feb 2019, Andrew Gierth wrote: > You want LATERAL. Andrew, et al,: I found a couple of web pages describing the lateral join yet have not correctly applied them. The manual's page did not help me get the correct syntax, either. Think I'm close, however: select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, a.next_contact from people as p, organizations as o lateral (select a.next_contact from activities as a where a.next_contact is not null and a.next_contact <= 'today' and a.next_contact > '2018-12-31' order by person_id,next_contact); When run I get this syntax error: $ psql -f get_next_contact_dates.sql -d bustrac psql:get_next_contact_dates.sql:10: ERROR: syntax error at or near "lateral" LINE 3: lateral What am I still missing? Regards, Rich
>>>>> "Rich" == Rich Shepard <rshepard@appl-ecosys.com> writes: Rich> I found a couple of web pages describing the lateral join yet Rich> have not correctly applied them. The manual's page did not help Rich> me get the correct syntax, either. Think I'm close, however: Rich> select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, a.next_contact Rich> from people as p, organizations as o Rich> lateral Rich> (select a.next_contact LATERAL (SELECT ...) is syntactically like (SELECT ...) in that it comes _after_ a "," in the from-clause or after a [LEFT] JOIN keyword. Don't think of LATERAL as being a type of join, think of it as qualifying the (SELECT ...) that follows. Rich> from activities as a Rich> where a.next_contact is not null and a.next_contact <= 'today' and Rich> a.next_contact > '2018-12-31' You'd want a condition here that references the "people" table; the whole point of LATERAL is that it opens up the scope of column references in the subquery to include those tables which are to its left in the from-clause. Rich> order by person_id,next_contact); and I'm guessing you want that ordered by next_contact alone, possibly with LIMIT 1 to get just the nearest following next_contact time. -- Andrew (irc:RhodiumToad)
On Fri, 15 Feb 2019, Andrew Gierth wrote: > LATERAL (SELECT ...) is syntactically like (SELECT ...) in that it comes > _after_ a "," in the from-clause or after a [LEFT] JOIN keyword. Don't > think of LATERAL as being a type of join, think of it as qualifying the > (SELECT ...) that follows. Andrew, Thank you. Now I understand the difference. > Rich> from activities as a > Rich> where a.next_contact is not null and a.next_contact <= 'today' and > Rich> a.next_contact > '2018-12-31' > > You'd want a condition here that references the "people" table; the whole > point of LATERAL is that it opens up the scope of column references in the > subquery to include those tables which are to its left in the from-clause. And here I got it backwards, thinking the subquery could reference the columns in the people table from the initial select. > Rich> order by person_id,next_contact); > > and I'm guessing you want that ordered by next_contact alone, possibly > with LIMIT 1 to get just the nearest following next_contact time. That's true. With 'discrete on' only the most recent next_contact date is returned. More work over the weekend on this now I have a better understanding of lateral. Thanks again, Rich
On Fri, 15 Feb 2019, Andrew Gierth wrote: > LATERAL (SELECT ...) is syntactically like (SELECT ...) in that it > comes _after_ a "," in the from-clause or after a [LEFT] JOIN keyword. Andrew, Yes, the missing ',' made a big difference. > You'd want a condition here that references the "people" table; Got it. > and I'm guessing you want that ordered by next_contact alone, possibly > with LIMIT 1 to get just the nearest following next_contact time. Using LIMIT 1 produces only the first returned row. This statement (using max() for next_contact) produces no error message, but also no results so I killed the process after 30 seconds. Without a syntax error for guidance I don't know how to proceed. I've not before run 'explain' on a query. Would that be appropriate here? select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name from people as p, organizations as o, lateral (select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, max(a.next_contact) from people as p, organizations as o, activities as a where a.next_contact > '2018-12-31' and a.next_contact <= 'today' and a.next_contact is not null group by p.person_id, o.org_name, a.next_contact order by p.person_id, o.org_name, a.next_contact) sq; Regards, Rich
>>>>> "Rich" == Rich Shepard <rshepard@appl-ecosys.com> writes: Rich> Using LIMIT 1 produces only the first returned row. This Rich> statement (using max() for next_contact) produces no error Rich> message, but also no results so I killed the process after 30 Rich> seconds. Without a syntax error for guidance I don't know how to Rich> proceed. I've not before run 'explain' on a query. Would that be Rich> appropriate here? Yes. Rich> select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name Rich> from people as p, organizations as o, Rich> lateral Rich> (select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, Rich> max(a.next_contact) Rich> from people as p, organizations as o, activities as a Rich> where a.next_contact > '2018-12-31' and Rich> a.next_contact <= 'today' and Rich> a.next_contact is not null Rich> group by p.person_id, o.org_name, a.next_contact Rich> order by p.person_id, o.org_name, a.next_contact) sq; The problem here is that you have no join conditions at all, so the result set of this query is massive. And you've duplicated many tables inside the subquery which is not necessary or appropriate. select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, sq.* from people as p join organizations as o on p.organization_id=o.id -- OR WHATEVER cross join lateral (select a.next_contact from activities as a where a.person_id=p.person_id --VERY IMPORTANT and a.next_contact > '2018-12-31' and a.next_contact <= 'today' and a.next_contact is not null order by a.next_contact DESC limit 1) sq; Ordering by DESC with a limit 1 is used to get the max next_contact value rather than the smallest; this is similar to max(), but makes it trivial to also access the other columns of the _same_ activities row which is being selected. -- Andrew (irc:RhodiumToad)
On Fri, 15 Feb 2019, Andrew Gierth wrote: > Rich> I've not before run 'explain' on a query. Would that be > Rich> appropriate here? > > Yes. Andrew, I'll learn how to use it. > The problem here is that you have no join conditions at all, so the > result set of this query is massive. And you've duplicated many tables > inside the subquery which is not necessary or appropriate. Got it now. > select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, sq.* > from people as p > join organizations as o on p.organization_id=o.id -- OR WHATEVER > cross join > lateral (select a.next_contact > from activities as a > where a.person_id=p.person_id --VERY IMPORTANT > and a.next_contact > '2018-12-31' > and a.next_contact <= 'today' > and a.next_contact is not null > order by a.next_contact DESC > limit 1) sq; > > Ordering by DESC with a limit 1 is used to get the max next_contact > value rather than the smallest; this is similar to max(), but makes it > trivial to also access the other columns of the _same_ activities row > which is being selected. This puts everything in perspective and is a very valuable lesson for me as this application has many queries of this type. You've put together all that I've read in the manual, on this mail list thread, and on web pages. I really appreciate your patient guidance. Best regards, Rich
On Fri, 15 Feb 2019, Andrew Gierth wrote: > select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, sq.* > from people as p > join organizations as o on p.organization_id=o.id -- OR WHATEVER > cross join > lateral (select a.next_contact > from activities as a > where a.person_id=p.person_id --VERY IMPORTANT > and a.next_contact > '2018-12-31' > and a.next_contact <= 'today' > and a.next_contact is not null > order by a.next_contact DESC > limit 1) sq; After working with this query I modified it slightly to return only the next_contact date: select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.* from people as p join organizations as o on p.org_id = o.org_id cross join lateral (select a.next_contact from activities as a where a.person_id = p.person_id and p.active='True' and a.next_contact is not null order by a.next_contact DESC limit 1) sq; It works wellm, but the row order is not that of a.next_contact. In fact, there seems to be no order in the returned set. The next_contact column is in the lateral sub-query. Does this make a difference? I've no idea how to modify the query so that returned rows are in decreasing next_contact order. Best regards, Rich
On Thu, Mar 28, 2019 at 3:59 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
After working with this query I modified it slightly to return only the
next_contact date:
select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.*
from people as p
join organizations as o on p.org_id = o.org_id
cross join
lateral
(select a.next_contact
from activities as a
where a.person_id = p.person_id and
p.active='True' and
a.next_contact is not null
order by a.next_contact DESC
limit 1) sq;
It works wellm, but the row order is not that of a.next_contact. In fact,
there seems to be no order in the returned set. The next_contact column is
in the lateral sub-query. Does this make a difference? I've no idea how to
modify the query so that returned rows are in decreasing next_contact order.
Best regards,
Rich
You need the ORDER BY in the outer join. (And I don't think the one inside the lateral join is doing you any good). Try:
...
a.next_contact is not null
limit 1) sq
order by sq.next_contact DESC;limit 1) sq
Cheers,
Ken
Ken
AGENCY Software
A Free Software data system
By and for non-profits
(253) 245-3801
learn more about AGENCY or
follow the discussion.
On Thu, Mar 28, 2019 at 3:59 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.*
from people as p
join organizations as o on p.org_id = o.org_id
cross join
lateral
(select a.next_contact
from activities as a
where a.person_id = p.person_id and
p.active='True' and
a.next_contact is not null
order by a.next_contact DESC
limit 1) sq;
It works wellm, but the row order is not that of a.next_contact. In fact,
there seems to be no order in the returned set.
The next_contact column is
in the lateral sub-query. Does this make a difference?
Yes, if you join the result on an ordered subquery to anything you no longer have a guaranteed order for the combined relation.
select ...
from ...
join ...
cross join lateral ...
-- now add an order by for the top-level query
order by
David J.
On Thu, Mar 28, 2019 at 4:05 PM Ken Tanzer <ken.tanzer@gmail.com> wrote:
(And I don't think the one inside the lateral join is doing you any good). Try:...a.next_contact is not nullorder by sq.next_contact DESC;
limit 1) sq
The fact that the subquery has a LIMT 1 clause leads me to assume the ORDER BY there is quite necessary.
David J.
On Thu, Mar 28, 2019 at 4:07 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thu, Mar 28, 2019 at 4:05 PM Ken Tanzer <ken.tanzer@gmail.com> wrote:(And I don't think the one inside the lateral join is doing you any good). Try:...a.next_contact is not nullorder by sq.next_contact DESC;
limit 1) sqThe fact that the subquery has a LIMT 1 clause leads me to assume the ORDER BY there is quite necessary.David J.
Ken
--
AGENCY Software
A Free Software data system
By and for non-profits
(253) 245-3801
learn more about AGENCY or
follow the discussion.
On Thu, 28 Mar 2019, Ken Tanzer wrote: > You need the ORDER BY in the outer join. Ken, I thought so. But, ... > (And I don't think the one inside the lateral join is doing you any good). > Try: > ... > a.next_contact is not null > limit 1) sq > order by sq.next_contact DESC; This re-orders the returned set, but still not in chronological order. Thanks, Rich
On Thu, 28 Mar 2019, David G. Johnston wrote: > Yes, if you join the result on an ordered subquery to anything you no > longer have a guaranteed order for the combined relation. David, This makes sense to me. > select ... > from ... > join ... > cross join lateral ... > -- now add an order by for the top-level query > order by Tried this and did not do it correctly. Should there be two 'order by', one in the sub-query, the other in the top-level query? This does not return the desired order: select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.* from people as p join organizations as o on p.org_id = o.org_id cross join lateral (select a.next_contact from activities as a where a.person_id = p.person_id and p.active='True' and a.next_contact is not null order by a.next_contact DESC limit 1) sq order by sq.next_contact DESC; Obviously, I'm still missing the implementation of your response. Best regards, Rich
On Thu, Mar 28, 2019 at 4:14 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Thu, 28 Mar 2019, Ken Tanzer wrote:
> You need the ORDER BY in the outer join.
Ken,
I thought so. But, ...
> (And I don't think the one inside the lateral join is doing you any good).
> Try:
> ...
> a.next_contact is not null
> limit 1) sq
> order by sq.next_contact DESC;
This re-orders the returned set, but still not in chronological order.
Really? Is your next_contact field a date field and not a text field? What order does it come out in?
(And I assume you saw David J's correction to the misinformation I offered.)
Cheers,
Ken
AGENCY Software
A Free Software data system
By and for non-profits
(253) 245-3801
learn more about AGENCY or
follow the discussion.
On Thu, Mar 28, 2019 at 4:21 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
This does not return the
desired order:
It returns something at least. If you put the output you get into a spreadsheet are you able to manually sort it the way you desire?
David J.
>>>>> "Rich" == Rich Shepard <rshepard@appl-ecosys.com> writes: Rich> Tried this and did not do it correctly. Should there be two Rich> 'order by', one in the sub-query, the other in the top-level Rich> query? Yes. Rich> This does not return the desired order: Rich> select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.* Rich> from people as p Rich> join organizations as o on p.org_id = o.org_id Rich> cross join Rich> lateral Rich> (select a.next_contact Rich> from activities as a Rich> where a.person_id = p.person_id and Rich> p.active='True' and Rich> a.next_contact is not null Rich> order by a.next_contact DESC Rich> limit 1) sq Rich> order by sq.next_contact DESC; That query seems correct assuming you want the result in descending order of next_contact. How did the actual result differ from your expectation? -- Andrew (irc:RhodiumToad)
On Thu, 28 Mar 2019, David G. Johnston wrote: > It returns something at least. If you put the output you get into a > spreadsheet are you able to manually sort it the way you desire? David, Probably, but that's not practical for the application. Thanks, Rich
On Thu, 28 Mar 2019, Ken Tanzer wrote: > Really? Is your next_contact field a date field and not a text field? What > order does it come out in? > > (And I assume you saw David J's correction to the misinformation I offered.) Ken, Mea culpa. the next_contact column is a date, I did apply David's correction, and I had a couple of mistaken dates (I had entered 2018 rather than 2019; the middleware code will check for this error and have the user correct entries other than the current year). Regards, Rich
On Fri, 29 Mar 2019, Andrew Gierth wrote: > That query seems correct assuming you want the result in descending order > of next_contact. How did the actual result differ from your expectation? Andrew, User error: I had a couple of date typos (2018 rather than 2019) and was thinking of generating the display in ascending, rather than decending, order. Thinking again about what I need in terms of query results I made a couple of changes. Now the query selects the most recent next_contact date >= a week ago today and into the future: select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.* from people as p join organizations as o on p.org_id = o.org_id cross join lateral (select a.next_contact from activities as a where a.person_id = p.person_id and p.active='True' and a.next_contact >= current_date - interval '7' day and a.next_contact is not null order by a.next_contact ASC limit 1) sq order by sq.next_contact ASC; The results are what I need and allow me to not miss a date as long as I run the query at least once a week. My thanks to you, David J., and Ken for the valuable lessons. Best regards, Rich