Re: Retrieve most recent 1 record from joined table - Mailing list pgsql-sql
From | agharta |
---|---|
Subject | Re: Retrieve most recent 1 record from joined table |
Date | |
Msg-id | 53FD9F21.7040908@gmail.com Whole thread Raw |
In response to | Re: Retrieve most recent 1 record from joined table (Vik Fearing <vik.fearing@dalibo.com>) |
List | pgsql-sql |
On 08/23/2014 02:15 PM, Vik Fearing wrote: > On 08/22/2014 10:05 AM, agharta wrote: >> Hi all, >> This is my first question, don't hate me please if it is the wrong place. > This is the right place. See below for my answer. > > >> Mi question is relative seimple, but goes me crazy. >> >> First, create some example tables: >> >> ---- >> create table table1( >> t1_ID SERIAL primary key, >> t1_value text >> ); >> >> create table table2( >> t2_ID SERIAL primary key, >> t2_value text, >> t1_id integer >> ); >> >> >> create table table3( >> t3_ID SERIAL primary key, >> t3_value text, >> t3_date timestamp, >> t2_id integer >> ); >> >> ALTER TABLE table2 ADD FOREIGN KEY (t1_id) REFERENCES table1 (t1_ID) ON >> DELETE CASCADE; >> ALTER TABLE table3 ADD FOREIGN KEY (t2_id) REFERENCES table2 (t2_ID) ON >> DELETE CASCADE; >> >> CREATE INDEX IDX_TABLE1_T1_value ON table1 (t1_value); >> CREATE INDEX IDX_TABLE2_T2_value ON table2 (t2_value); >> CREATE INDEX IDX_TABLE3_T3_value ON table3 (t3_value); >> CREATE INDEX IDX_TABLE3_T3_DATE ON table3 (t3_date); >> ----- >> >> As you can see, table3 is referenced to table2 and table2 is referenced >> to table1 >> >> Next, insert some data on tables: >> >> --one millon records into table1 >> insert into table1(t1_value) select md5(random()::text) from (select >> generate_series(1,1000000) as a) as c; >> >> --100.000 records on table2 where t1_id is a random number between 1 and >> 10.001. This guarantees many records pointing to same table1 t1_id >> insert into table2(t2_value, t1_id) select md5(random()::text), >> trunc((random()+1)*10000)::integer from (select >> generate_series(1,100000) as a) as c; >> >> --again, 1.022.401 records on table3 where t2_id is a random number >> between 1 and 10.001. This guarantee many records pointing to same >> table2 t2_id. >> -- random numbers (1.022.401) are generated by generated_series function >> with timestamp (1 minute) interval >> insert into table3(t3_value, t2_id, t3_date) select md5(random()::text), >> trunc((random()+1)*10000)::integer, c.date_val from (select >> generate_series(timestamp '2013-01-01',timestamp '2014-12-12', interval >> '1 minute') as date_val) as c; >> >> >> So, now we should have table3 with many rows per table2(t2_id) and many >> rows in table2 with same t1_id. >> >> >> Now, the question: >> >> >> Joining the tables, how to get ONLY most recent record per >> table3(t3_date)?? >> >> Query example: >> >> select * from table1 as t1 >> inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') ) >> inner join table3 t3 on (t2.t2_id = t3.t2_id and t3.t3_date <= timestamp >> '2014-08-20') >> order by t3.t2_id, t3.t3_date desc >> >> produces to me this dataset (extract) >> >> >> t1_id t1_value t2_id t2_value t1_id >> t3_id t3_value t3_date >> t2_id >> 17098 74127bc80ca759678892c957b6a34fc7 10020 >> 9182c1f48cd008e31d781abc45723a10 17098 857683 >> 0fb6cb380522dd83b6ac4beba2c6e98f 2014-08-19 14:42:00.0 10020 >> 17098 74127bc80ca759678892c957b6a34fc7 10020 >> 9182c1f48cd008e31d781abc45723a10 17098 857088 >> 9c3481bfc0bdea51e62b338a1777cde6 2014-08-19 04:47:00.0 10020 >> 17098 74127bc80ca759678892c957b6a34fc7 10020 >> 9182c1f48cd008e31d781abc45723a10 17098 853208 >> 253dc2083e70dd9e276867f43889973f 2014-08-16 12:07:00.0 10020 >> 17098 74127bc80ca759678892c957b6a34fc7 10020 >> 9182c1f48cd008e31d781abc45723a10 17098 851237 >> 970acf901c4232c178b5dbeda4e44ac8 2014-08-15 03:16:00.0 10020 >> 17098 74127bc80ca759678892c957b6a34fc7 10020 >> 9182c1f48cd008e31d781abc45723a10 17098 847436 >> cb8d52255eaa850f9d7f203092a2ce13 2014-08-12 11:55:00.0 10020 >> 17098 74127bc80ca759678892c957b6a34fc7 10020 >> 9182c1f48cd008e31d781abc45723a10 17098 819339 >> d9e36ad622b1db499b3f623cdd46a811 2014-07-23 23:38:00.0 10020 >> 17098 74127bc80ca759678892c957b6a34fc7 10020 >> 9182c1f48cd008e31d781abc45723a10 17098 818022 >> b9d13239f522e3a22f20d36ea6dab8ad 2014-07-23 01:41:00.0 10020 >> 17098 74127bc80ca759678892c957b6a34fc7 10020 >> 9182c1f48cd008e31d781abc45723a10 17098 803046 >> c36164f3077894a1986c4922dfb632ec 2014-07-12 16:05:00.0 10020 >> 17098 74127bc80ca759678892c957b6a34fc7 10020 >> 9182c1f48cd008e31d781abc45723a10 17098 788129 >> 32c966feab2212a29f86bebbaa6dfec9 2014-07-02 07:28:00.0 10020 >> >> >> As you can see, there are many t3_id per single t2_id. >> >> I need the same dataset, but i want only the most recent (one) record >> per table3, and, of course, directly in the join rule (in production i >> have a long and complex query). >> >> My personal solution (very slow, i can't use it): >> >> select * from table1 as t1 >> inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') ) >> inner join table3 t3 on ( >> t3.t3_id = (select t3_id from table3 where t2_id = t2.t2_id and t3_date >> <= timestamp '2014-08-20' order by t3_date desc fetch first 1 rows only) >> and t3.t2_id = t2.t2_id >> ) >> order by t3.t2_id, t3.t3_date desc >> >> >> it gives me the right result, but performances are poor........ >> >> Same bad performance result with a function that performs the table3 query. >> >> Suggestions? >> >> Thanks to anyone who can answer to me!!!! > You didn't say what version of PostgreSQL you're using. The following > solution works with 9.3. > > select * > from table1 as t1 > join table2 as t2 on t2.t1_id = t1.t1_id and t2.t2_value like '%ab%' > join lateral (select * > from table3 > where t2_id = t2.t2_id > and t3_date <= '2014-08-20' > order by t3_date desc > limit 1) as t3 on true > order by t3.t2_id, t3.t3_date desc; > > You will need the following indexes for it to give good performance: > > create index on t2 using gist (t2_value gist_trgm_ops); > create index on t3 using btree (t2_id, t3_date desc); > > That first index will require the pg_trgm extension that you can get with: > > create extension pg_trgm; > > > If you are not using 9.3, I can give you a (much) more complicated > version that will perform well, but ideally you'll want to use the above. Hi Vik, Sorry for late, i was testing your solution and.....what that performance!!! I'm using 9.3, i've don't say it, my mistake. Using join lateral with a lot of analize command to see where put the right index, performance becomes amazing!! Thanks again for the priceless suggestion!!! .....join lateral.... what a beautiful keywords :-) Cheers, Agharta