Thread: Grabbing Newest Records From Duplicates

Grabbing Newest Records From Duplicates

From
"Travis Whitton"
Date:
Given the following test table, I want to grab only the newest record and disregard any older duplicates based on name.
Isthis the most efficient way to do it? Will the indicies even make a difference? The table below demonstrates a simple
proofof concept. My final table will have millions of records; however, the test I'm doing now does not seem to use the
indicies(probably because there are too few records?). <br /><br />optin=# \d test<br />     Table "public.test"<br
/> Column|  Type   | Modifiers<br />--------+---------+-----------<br /> name   | text    |<br /> time   | date    |<br
/> id    | integer |<br />Indexes:<br />    "idx_name" btree (name) <br />    "idx_time" btree ("time")<br /><br
/>optin=#explain select * from test t1 where not exists (select 1 from test t2 where <a
href="http://t2.name">t2.name</a>= <a href="http://t1.name">t1.name</a> and t2.time > t1.time);<br /><br
/>Thanks,<br/>Travis<br /><br /> 

Re: Grabbing Newest Records From Duplicates

From
Tom Lane
Date:
"Travis Whitton" <tinymountain@gmail.com> writes:
> Given the following test table, I want to grab only the newest record and
> disregard any older duplicates based on name. Is this the most efficient way
> to do it?

No, it's gonna be pretty awful.  The best way I know of involves
DISTINCT ON (see the "weather reports" example in the SELECT reference
page).  Unfortunately that's a Postgres-only construct.  If you want
something portable then you'll need something messy with subqueries...
        regards, tom lane


Re: Grabbing Newest Records From Duplicates

From
"Travis Whitton"
Date:
One last question, using the "weather report" example, is it going to improve performance for the DISTINCT ON query to add an index to the location and time columns?

Thanks a lot,
Travis

On 3/16/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Travis Whitton" <tinymountain@gmail.com> writes:
> Given the following test table, I want to grab only the newest record and
> disregard any older duplicates based on name. Is this the most efficient way
> to do it?

No, it's gonna be pretty awful.  The best way I know of involves
DISTINCT ON (see the "weather reports" example in the SELECT reference
page).  Unfortunately that's a Postgres-only construct.  If you want
something portable then you'll need something messy with subqueries...

                        regards, tom lane

Re: Grabbing Newest Records From Duplicates

From
Tom Lane
Date:
"Travis Whitton" <tinymountain@gmail.com> writes:
> One last question, using the "weather report" example, is it going to
> improve performance for the DISTINCT ON query to add an index to the
> location and time columns?

A two-column index matching the ORDER BY that you use might help,
or then again might not --- for full-table scans it's often faster
to sort than to try to use an index.
        regards, tom lane