Thread: How can I speed up with query?
I would like to know if there is a more efficient way to perform a query. I have a table (see below) with an ID, a date, and a value. The date specifies when the entry was added. I want to query the table to determine what the value was on a specific day. I have the following query, which works quite well but it is very slow (the real table I'm using has 200000 records)... I'm sure the slowness is due to the subselect... does anybody know any way to make this query faster? Thanks in advance, Scott My current query: select a.* from sample a where (id,date) in (select a.id,max(date) from sample where date<='<<the date>>' and id=a.id) order by id; So with the data below, filling in '2003-02-01' for <<the date>> gives: id | date | value ----+------------+------- 1 | 2003-02-01 | 12 2 | 2003-02-01 | 9 3 | 2003-02-01 | 2 4 | 2003-02-01 | 11 And filling in '2003-02-04' for <<the date>> gives: id | date | value ----+------------+------- 1 | 2003-02-04 | 21 2 | 2003-02-01 | 9 3 | 2003-02-01 | 2 4 | 2003-02-03 | 12 Here is the table layout and the sample data I'm using: Table "sample" Column | Type | Modifiers --------+---------+----------- id | integer | not null date | date | not null value | integer | not null Primary key: sample_pkey id | date | value ----+------------+------- 1 | 2003-02-01 | 12 1 | 2003-02-02 | 16 1 | 2003-02-04 | 21 2 | 2003-02-01 | 9 3 | 2003-02-01 | 2 4 | 2003-02-01 | 11 4 | 2003-02-03 | 12 (7 rows)
How bout select a.* from sample a where exists (select id from sample where date <= '<<thedate>>' and id = a.id order by date desc limit 1) order by id Is that what your after? Seems like you could do this w/o the subquery.. but Im not sure exactly what you want. Thanks Chad ----- Original Message ----- From: "Scott Morrison" <smorrison@navtechinc.com> To: <pgsql-novice@postgresql.org> Sent: Wednesday, February 05, 2003 11:59 AM Subject: [NOVICE] How can I speed up with query? > I would like to know if there is a more efficient way to perform a query. > > I have a table (see below) with an ID, a date, and a value. The date > specifies when the entry was added. I want to query the table to determine > what the value was on a specific day. > > I have the following query, which works quite well but it is very slow (the > real table I'm using has 200000 records)... I'm sure the slowness is due to > the subselect... does anybody know any way to make this query faster? > > Thanks in advance, > Scott > > My current query: > select a.* from sample a where (id,date) in (select a.id,max(date) from > sample where date<='<<the date>>' and id=a.id) order by id; > > So with the data below, filling in '2003-02-01' for <<the date>> gives: > id | date | value > ----+------------+------- > 1 | 2003-02-01 | 12 > 2 | 2003-02-01 | 9 > 3 | 2003-02-01 | 2 > 4 | 2003-02-01 | 11 > > And filling in '2003-02-04' for <<the date>> gives: > id | date | value > ----+------------+------- > 1 | 2003-02-04 | 21 > 2 | 2003-02-01 | 9 > 3 | 2003-02-01 | 2 > 4 | 2003-02-03 | 12 > > Here is the table layout and the sample data I'm using: > > Table "sample" > Column | Type | Modifiers > --------+---------+----------- > id | integer | not null > date | date | not null > value | integer | not null > Primary key: sample_pkey > > id | date | value > ----+------------+------- > 1 | 2003-02-01 | 12 > 1 | 2003-02-02 | 16 > 1 | 2003-02-04 | 21 > 2 | 2003-02-01 | 9 > 3 | 2003-02-01 | 2 > 4 | 2003-02-01 | 11 > 4 | 2003-02-03 | 12 > (7 rows) > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > I have a table (see below) with an ID, a date, and a value. The date > specifies when the entry was added. I want to query the table to determine > what the value was on a specific day. > ... > select a.* from sample a where (id,date) in (select a.id,max(date) from > sample where date<='<<the date>>' and id=a.id) order by id; Actually, your query as written wants to find the latest value added before or on a certain date. A subtle difference, but it makes a difference. Some quick notes before I jump into my shot at this: * Avoid the max() function - it is not fully optimized yet. Instead use SELECT foo FROM table ORDER BY foo DESC LIMIT 1 (and use ASC to replace the min() function) * Try not to use keywords such as "date" for your column names. * Always run VACUUM ANALYZE and create an index: in this case, on the "date" column * If you can't match on a single column (as in the id,date from your original query, use the oid) (The explain analyze below is for a 200,000 row table with 20 distinct ids and a time period of abot a month.) VACUUM ANALYZE sample; CREATE INDEX sample_date on sample(date); EXPLAIN ANALYZE SELECT id, date, value FROM sample a WHERE date <='2003-01-01' AND oid = (SELECT oid FROM sample WHERE id = a.id AND DATE <='2003-01-01' ORDER BY date DESC LIMIT 1) ORDER BY id; QUERY PLAN - -------------------------------------------------------------------------------------------------- Sort (cost=1712.13..1712.14) (actual time=5292.35..5292.38 rows=20 loops=1) Sort Key: id -> Index Scan using sample_date on sample a (cost=0.00..1712.12) (actual time=5258.10..5292.22 rows=20 loops=1) Index Cond: (date <= '2003-01-01'::date) Filter: (oid = (subplan)) SubPlan -> Limit (cost=0.00..81.53) (actual time=0.71..0.78 rows=1 loops=6532) -> Index Scan Backward using sample_date on sample (cost=0.00..81.53) (actual time=0.70..0.78 rows=2 loops=6532) Index Cond: (date <= '2003-01-01'::date) Filter: (id = $0) Total runtime: 5292.50 msec - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200302051538 -----BEGIN PGP SIGNATURE----- Comment: http://www.turnstep.com/pgp.html iD8DBQE+QXnMvJuQZxSWSsgRAhmDAKDwMmf0GvhnVFKeiDPnVolx3wOO1gCgsasJ 3t3LCOa6Q5uOCJpawodJO54= =dGj1 -----END PGP SIGNATURE-----
On Thu, 2003-02-06 at 07:59, Scott Morrison wrote: > > My current query: > select a.* from sample a where (id,date) in (select a.id,max(date) from > sample where date<='<<the date>>' and id=a.id) order by id; max(date) will require a scan on the recordset in the subselect. You might be better to use 'ORDER BY date DESC LIMIT 1' which will use an index on date if one exists. SELECT * FROM sample a WHERE (id, date) IN (SELECT a.id, date FROM sample WHERE id=a.id ORDER BY date DESC LIMIT 1) When you had 20,000 records you would definitely be wanting an index on date and an index on id. I think that the LIMIT clause on subselects is only implemented in recent PostgreSQL however - 7.2 on, I believe. To examine the query plans that PostgreSQL comes up with, and help you choose a better structure for your query, you should use "EXPLAIN <<the query>>" . Regards, Andrew. -- --------------------------------------------------------------------- Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)916-7201 MOB: +64(21)635-694 OFFICE: +64(4)499-2267 Survey for nothing with http://survey.net.nz/ ---------------------------------------------------------------------