Thread: Improve a query...
Looking for the best way to formulate a query to select the most "recent" entry for an organization in a table like: CREATE TABLE reports ( -- Report Id used to link up related 1:M rather than multi-key rpt_id SERIAL NOT NULL PRIMARY KEY, -- A Unique ID for the organization org_id char(10) NOT NULL CHECK(CHARACTER_LENGTH(op_id) = 10), -- The reporting period period integer NOT NULL -- Various and Sundry ... . . . UNIQUE (org_id,period) ); If max(period) for an organization yields the most recent reporting, I want to get whatever is the most recent report for each organization. This query works, but seems expensive... SELECT a.rpt_id, a.org_id, a.period, ... FROM reports As a INNER JOIN (SELECT b.org_id, max(b.period) As period FROM reports b group by b.org_id) As c ON a.org_id = c.org_id and a.period = c.period; EXPLAIN looks thusly: NOTICE: QUERY PLAN: Merge Join (cost=147.98..164.48 rows=10 width=48) -> Sort (cost=69.83..69.83 rows=1000 width=32) -> Seq Scan on reports a (cost=0.00..20.00 rows=1000 width=32) -> Sort (cost=78.15..78.15 rows=100 width=16) -> Subquery Scan c (cost=69.83..74.83 rows=100 width=16) -> Aggregate (cost=69.83..74.83 rows=100 width=16) -> Group (cost=69.83..72.33 rows=1000 width=16) -> Sort (cost=69.83..69.83 rows=1000 width=16) -> Seq Scan on reports b (cost=0.00..20.00 rows=1000 width=16) The data is very hierarchical so I didn't want to carry around alot of key fields in related "many-sided" tables which may also have related "many-sided" tables. Any ideas on how to minimize the multiple scans on the table? The numbers for explain probably aren't telling much since there's not much real data in the table at this time... -- Eric G. Miller <egm2@jps.net>
"Eric G. Miller" <egm2@jps.net> writes: > Looking for the best way to formulate a query to select > the most "recent" entry for an organization in a table > like: Take a look at the SELECT reference page's example for SELECT DISTINCT ON: : For example, : : SELECT DISTINCT ON (location) location, time, report : FROM weatherReports : ORDER BY location, time DESC; : : retrieves the most recent weather report for each location. A tad nonstandard, but bloody effective, particularly if you have an appropriate index in place... regards, tom lane
On Wed, May 02, 2001 at 01:17:44AM -0400, Tom Lane wrote: > "Eric G. Miller" <egm2@jps.net> writes: > > Looking for the best way to formulate a query to select > > the most "recent" entry for an organization in a table > > like: > > Take a look at the SELECT reference page's example for > SELECT DISTINCT ON: > > : For example, > : > : SELECT DISTINCT ON (location) location, time, report > : FROM weatherReports > : ORDER BY location, time DESC; > : > : retrieves the most recent weather report for each location. > > A tad nonstandard, but bloody effective, particularly if you > have an appropriate index in place... Tricky! After adding the unique 2 column index and using that construct I get: NOTICE: QUERY PLAN: Unique (cost=1.20..1.22 rows=1 width=32) -> Sort (cost=1.20..1.20 rows=8 width=32) -> Seq Scan on reports (cost=0.00..1.08 rows=8 width=32) EXPLAIN -- Eric G. Miller <egm2@jps.net>
Try: SELECT DISTINCT ON (org_id) rpt_id, org_id, period,... FROM reports ORDER BY period DESC; > -----Original Message----- > From: Eric G. Miller [SMTP:egm2@jps.net] > Sent: Tuesday, May 01, 2001 4:04 AM > To: PostgreSQL General > Subject: [GENERAL] Improve a query... > > Looking for the best way to formulate a query to select > the most "recent" entry for an organization in a table > like: > > CREATE TABLE reports ( > > -- Report Id used to link up related 1:M rather than multi-key > rpt_id SERIAL NOT NULL PRIMARY KEY, > > -- A Unique ID for the organization > org_id char(10) NOT NULL CHECK(CHARACTER_LENGTH(op_id) = 10), > > -- The reporting period > period integer NOT NULL > > -- Various and Sundry ... > . > . > . > > UNIQUE (org_id,period) > ); > > If max(period) for an organization yields the most recent reporting, > I want to get whatever is the most recent report for each organization. > > This query works, but seems expensive... > > SELECT a.rpt_id, a.org_id, a.period, ... > FROM reports As a > INNER JOIN > (SELECT b.org_id, max(b.period) As period > FROM reports b group by b.org_id) As c > ON a.org_id = c.org_id and a.period = c.period; > > EXPLAIN looks thusly: > > NOTICE: QUERY PLAN: > > Merge Join (cost=147.98..164.48 rows=10 width=48) > -> Sort (cost=69.83..69.83 rows=1000 width=32) > -> Seq Scan on reports a (cost=0.00..20.00 rows=1000 width=32) > -> Sort (cost=78.15..78.15 rows=100 width=16) > -> Subquery Scan c (cost=69.83..74.83 rows=100 width=16) > -> Aggregate (cost=69.83..74.83 rows=100 width=16) > -> Group (cost=69.83..72.33 rows=1000 width=16) > -> Sort (cost=69.83..69.83 rows=1000 width=16) > -> Seq Scan on reports b > (cost=0.00..20.00 > rows=1000 width=16) > > > The data is very hierarchical so I didn't want to carry around alot of > key fields in related "many-sided" tables which may also have related > "many-sided" tables. Any ideas on how to minimize the multiple > scans on the table? The numbers for explain probably aren't telling > much since there's not much real data in the table at this time... > > -- > Eric G. Miller <egm2@jps.net> > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
On Wed, May 02, 2001 at 07:30:13PM +0200, Peter Eisentraut wrote: > Eric G. Miller writes: > > > If max(period) for an organization yields the most recent reporting, > > I want to get whatever is the most recent report for each organization. > > SELECT org_id, max(period) FROM reports GROUP BY org_id; Naw, doesn't work. I need the other columns, so the GROUP BY is out (except in a subselect). The DISTINCT ON () syntax, though non-standard, seems to do the job. -- Eric G. Miller <egm2@jps.net>
Eric G. Miller writes: > If max(period) for an organization yields the most recent reporting, > I want to get whatever is the most recent report for each organization. SELECT org_id, max(period) FROM reports GROUP BY org_id; -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter