RE: Improve a query... - Mailing list pgsql-general

From Jeff Eckermann
Subject RE: Improve a query...
Date
Msg-id 08CD1781F85AD4118E0800A0C9B8580B094A5F@NEZU
Whole thread Raw
In response to Improve a query...  ("Eric G. Miller" <egm2@jps.net>)
List pgsql-general
Try:
SELECT DISTINCT ON (org_id) rpt_id, org_id, period,...
FROM reports
ORDER BY period DESC;

> -----Original Message-----
> From:    Eric G. Miller [SMTP:egm2@jps.net]
> Sent:    Tuesday, May 01, 2001 4:04 AM
> To:    PostgreSQL General
> Subject:    [GENERAL] Improve a query...
>
> Looking for the best way to formulate a query to select
> the most "recent" entry for an organization in a table
> like:
>
> CREATE TABLE reports (
>
>   -- Report Id used to link up related 1:M rather than multi-key
>   rpt_id  SERIAL NOT NULL PRIMARY KEY,
>
>   -- A Unique ID for the organization
>   org_id   char(10) NOT NULL CHECK(CHARACTER_LENGTH(op_id) = 10),
>
>   -- The reporting period
>   period   integer NOT NULL
>
>   -- Various and Sundry ...
>   .
>   .
>   .
>
>   UNIQUE (org_id,period)
> );
>
> If max(period) for an organization yields the most recent reporting,
> I want to get whatever is the most recent report for each organization.
>
> This query works, but seems expensive...
>
> SELECT a.rpt_id, a.org_id, a.period, ...
> FROM reports As a
> INNER JOIN
>   (SELECT b.org_id, max(b.period) As period
>    FROM reports b group by b.org_id) As c
> ON a.org_id = c.org_id and a.period = c.period;
>
> EXPLAIN looks thusly:
>
> NOTICE:  QUERY PLAN:
>
> Merge Join  (cost=147.98..164.48 rows=10 width=48)
>   ->  Sort  (cost=69.83..69.83 rows=1000 width=32)
>         ->  Seq Scan on reports a  (cost=0.00..20.00 rows=1000 width=32)
>   ->  Sort  (cost=78.15..78.15 rows=100 width=16)
>         ->  Subquery Scan c  (cost=69.83..74.83 rows=100 width=16)
>               ->  Aggregate  (cost=69.83..74.83 rows=100 width=16)
>                     ->  Group  (cost=69.83..72.33 rows=1000 width=16)
>                           ->  Sort  (cost=69.83..69.83 rows=1000 width=16)
>                                 ->  Seq Scan on reports b
> (cost=0.00..20.00
>                             rows=1000 width=16)
>
>
> The data is very hierarchical so I didn't want to carry around alot of
> key fields in related "many-sided" tables which may also have related
> "many-sided" tables.  Any ideas on how to minimize the multiple
> scans on the table?  The numbers for explain probably aren't telling
> much since there's not much real data in the table at this time...
>
> --
> Eric G. Miller <egm2@jps.net>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

pgsql-general by date:

Previous
From: Andy Koch
Date:
Subject: Re: Cannot build PL/Perl ...
Next
From: Joel Burton
Date:
Subject: Re: best way to implement producer/consumer in Perl