Thread: index on timestamp performance
I have this schema: motid | integer | not null objid | integer | not null date | timestamp without time zone | not null Indexes: dico_frs_motid_date btree (motid, date) dico_frs_objid btree (objid) The performance I'm getting from the index that contains 'date' is much slower than when using the objid index (different queries of course). This is a 10 million row table. Am I right to assume that postgres needs to do more work because it has to convert the dates to some internal (integer?) format? Consequently would I gain performance if I changed the schema so that 'date' is an integer (which would not be a problem for my application) ? Thanks, -- Eric Cholet
To whomever runs this list: I have tried repeatedly to get majordomo to unsubscribe me from this list, to no avail. I do not currently have the means of filtering my inbox at this address, so receiving these emails is EXTREMELY annoying for me. PLEASE PLEASE PLEASE PLEASE unsubscribe me. PLEASE. On Wed, 29 Jan 2003, Eric Cholet wrote: > I have this schema: > > > motid | integer | not null > objid | integer | not null > date | timestamp without time zone | not null > Indexes: dico_frs_motid_date btree (motid, date) > dico_frs_objid btree (objid) > > The performance I'm getting from the index that contains > 'date' is much slower than when using the objid index > (different queries of course). This is a 10 million row > table. Am I right to assume that postgres needs to do > more work because it has to convert the dates to some > internal (integer?) format? > Consequently would I gain performance if I changed the > schema so that 'date' is an integer (which would not > be a problem for my application) ? > > Thanks, > -- > Eric Cholet > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html >
On Wed, 29 Jan 2003, Eric Cholet wrote: > I have this schema: > > > motid | integer | not null > objid | integer | not null > date | timestamp without time zone | not null > Indexes: dico_frs_motid_date btree (motid, date) > dico_frs_objid btree (objid) > > The performance I'm getting from the index that contains > 'date' is much slower than when using the objid index > (different queries of course). This is a 10 million row > table. Am I right to assume that postgres needs to do > more work because it has to convert the dates to some > internal (integer?) format? What does explain (analyze if possible) show for the two queries? It could just be a difference in plans or estimates.
Eric Cholet <cholet@logilune.com> writes: > Consequently would I gain performance if I changed the > schema so that 'date' is an integer No, you wouldn't, at least not enough to notice. (But if you only need a date and not a time of day, consider changing the column type to "date".) I suspect the issue is selectivity of the queries, but with no actual query examples or EXPLAIN ANALYZE results to look at, that's just idle speculation. regards, tom lane
--On Wednesday, January 29, 2003 08:51:30 -0800 Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote: > On Wed, 29 Jan 2003, Eric Cholet wrote: > >> I have this schema: >> >> >> motid | integer | not null >> objid | integer | not null >> date | timestamp without time zone | not null >> Indexes: dico_frs_motid_date btree (motid, date) >> dico_frs_objid btree (objid) >> >> The performance I'm getting from the index that contains >> 'date' is much slower than when using the objid index >> (different queries of course). This is a 10 million row >> table. Am I right to assume that postgres needs to do >> more work because it has to convert the dates to some >> internal (integer?) format? > > What does explain (analyze if possible) show for the two queries? I can't really run two equivalent queries that will each use a different index. Here's a query that uses the index with 'date' (output wrapped manually) => explain analyze select objid from dico_frs where motid=1247 and date <= '2003-01-29 17:55:17' and date >= '2002-10-29 17:55:17' order by date desc limit 11; Limit (cost=4752.14..4752.17 rows=11 width=12) (actual time=63.20..63.37 rows=11 loops=1) -> Sort (cost=4752.14..4755.11 rows=1187 width=12) (actual time=63.17..63.23 rows=12 loops=1) Sort Key: date -> Index Scan using dico_frs_motid_date on dico_frs (cost=0.00..4691.50 rows=1187 width=12) (actual time=0.08..41.88 rows=2924 loops=1) Index Cond: ((motid = 1247) AND (date <= '2003-01-29 17:55:17'::timestamp without time zone) AND (date >= '2002-10-29 17:55:17'::timestamp without time zone)) Total runtime: 63.93 msec (6 rows) > It could just be a difference in plans or estimates. Right, but still I'd like to know whether the timestamp datatype in the index results in more work than an integer datatype. Thanks, -- Eric Cholet
On Wed, 29 Jan 2003, Eric Cholet wrote: > --On Wednesday, January 29, 2003 08:51:30 -0800 Stephan Szabo > <sszabo@megazone23.bigpanda.com> wrote: > > > On Wed, 29 Jan 2003, Eric Cholet wrote: > > > >> I have this schema: > >> > >> > >> motid | integer | not null > >> objid | integer | not null > >> date | timestamp without time zone | not null > >> Indexes: dico_frs_motid_date btree (motid, date) > >> dico_frs_objid btree (objid) > >> > >> The performance I'm getting from the index that contains > >> 'date' is much slower than when using the objid index > >> (different queries of course). This is a 10 million row > >> table. Am I right to assume that postgres needs to do > >> more work because it has to convert the dates to some > >> internal (integer?) format? > > > > What does explain (analyze if possible) show for the two queries? > > I can't really run two equivalent queries that will each use > a different index. Here's a query that uses the index with 'date' > (output wrapped manually) Well, I was wondering for example, the objid queries you were comparing to, were they returning an equivalent number of about 3000 rows from the index scan? I think part of the difference may be that here the plan grabs the rows from the index and then sorts them all so the limit doesn't actually save you significant time. For better speed on this particular sort of query, you might be better off with an order of : order by motid desc, date desc > => explain analyze select objid from dico_frs where motid=1247 > and date <= '2003-01-29 17:55:17' and date >= '2002-10-29 17:55:17' > order by date desc limit 11; > > Limit (cost=4752.14..4752.17 rows=11 width=12) (actual time=63.20..63.37 > rows=11 loops=1) > -> Sort (cost=4752.14..4755.11 rows=1187 width=12) (actual > time=63.17..63.23 rows=12 loops=1) > Sort Key: date > -> Index Scan using dico_frs_motid_date on dico_frs > (cost=0.00..4691.50 rows=1187 width=12) (actual time=0.08..41.88 rows=2924 > loops=1) > Index Cond: ((motid = 1247) AND (date <= '2003-01-29 > 17:55:17'::timestamp without time zone) AND (date >= '2002-10-29 > 17:55:17'::timestamp without time zone)) > Total runtime: 63.93 msec > (6 rows) > > > > It could just be a difference in plans or estimates. > > Right, but still I'd like to know whether the timestamp datatype in the > index > results in more work than an integer datatype. Almost certainly a little bit, but not so much that I would expect to see order of magnitude differences or anything.
--On Wednesday, January 29, 2003 10:45:34 -0800 Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote: > >> >> motid | integer | not null >> >> objid | integer | not null >> >> date | timestamp without time zone | not null >> >> Indexes: dico_frs_motid_date btree (motid, date) >> >> dico_frs_objid btree (objid) >> >> > Well, I was wondering for example, the objid queries you were comparing > to, were they returning an equivalent number of about 3000 rows from the > index scan? I think part of the difference may be that here the plan > grabs the rows from the index and then sorts them all so the limit doesn't > actually save you significant time. For better speed on this particular > sort of query, you might be better off with an order of : > order by motid desc, date desc Indeed, the speed gain is amazing, thanks for your explanation, I understand things a little better now. Note to self: next time don't make any assumptions, just post the explain analyze! >> => explain analyze select objid from dico_frs where motid=1247 >> and date <= '2003-01-29 17:55:17' and date >= '2002-10-29 17:55:17' >> order by date desc limit 11; >> >> Limit (cost=4752.14..4752.17 rows=11 width=12) (actual >> time=63.20..63.37 rows=11 loops=1) >> -> Sort (cost=4752.14..4755.11 rows=1187 width=12) (actual >> time=63.17..63.23 rows=12 loops=1) >> Sort Key: date >> -> Index Scan using dico_frs_motid_date on dico_frs >> (cost=0.00..4691.50 rows=1187 width=12) (actual time=0.08..41.88 >> rows=2924 loops=1) >> Index Cond: ((motid = 1247) AND (date <= '2003-01-29 >> 17:55:17'::timestamp without time zone) AND (date >= '2002-10-29 >> 17:55:17'::timestamp without time zone)) >> Total runtime: 63.93 msec >> (6 rows) -- Eric Cholet