Thread: Query Plan choice with timestamps
Hi, I have a timestamptz field that I want to use with a query, but I don’t need the full timestamp resolution, so I’ve created a day_trunc(timestamptz) immutable function which I’ll use with the query and with a new index: logs=> create index test_idx on blackbox (day_trunc(ts)); However, the query plan doesn’t use the index: logs=>explain select count(*) from blackbox group by day_trunc(ts) order by day_trunc(ts); QUERY PLAN ------------------------------------------------------------------------------------------ GroupAggregate (cost=98431.58..119773.92 rows=74226 width=8) -> Sort (cost=98431.58..99050.92 rows=247736 width=8) Sort Key: (day_trunc(ts)) -> Seq Scan on blackbox (cost=0.00..72848.36 rows=247736 width=8) (4 rows) while with this index: logs=>create index test_2_idx on blackbox (ts); the query plan is the expected one: logs=>explain select count(*) from blackbox group by ts order by ts; QUERY PLAN ------------------------------------------------------------------------------------------ GroupAggregate (cost=0.00..19109.66 rows=74226 width=8) -> Index Scan using test_2_idx on blackbox (cost=0.00..16943.16 rows=247736 width=8) But I fail to see why. Any hints? Thank you in advance -- Giorgio Valoti
Giorgio Valoti wrote: > Hi, I have a timestamptz field that I want to use with a query, but I > don’t need the full timestamp resolution, so I’ve created a > day_trunc(timestamptz) immutable function which I’ll use with the query > and with a new index: > > logs=> create index test_idx on blackbox (day_trunc(ts)); > > However, the query plan doesn’t use the index: Does it use it ever? e.g. with SELECT * FROM blackbox WHERE day_trunk(ts) = '...' -- Richard Huxton Archonet Ltd
On 07/ago/08, at 10:35, Richard Huxton wrote: > Giorgio Valoti wrote: >> Hi, I have a timestamptz field that I want to use with a query, but >> I don’t need the full timestamp resolution, so I’ve created a >> day_trunc(timestamptz) immutable function which I’ll use with the >> query and with a new index: >> logs=> create index test_idx on blackbox (day_trunc(ts)); >> However, the query plan doesn’t use the index: > > Does it use it ever? e.g. with > SELECT * FROM blackbox WHERE day_trunk(ts) = '...' It’s used: logs=> explain select * from blackbox where day_trunc(ts) = day_trunc(now()); QUERY PLAN --------------------------------------------------------------------------- Bitmap Heap Scan on blackbox (cost=22.38..3998.43 rows=1239 width=264) Recheck Cond: (day_trunc(ts) = day_trunc(now())) -> Bitmap Index Scan on date_idx (cost=0.00..22.07 rows=1239 width=0) Index Cond: (day_trunc(ts) = day_trunc(now())) -- Giorgio Valoti
Giorgio Valoti wrote: > > On 07/ago/08, at 10:35, Richard Huxton wrote: > >> Giorgio Valoti wrote: >>> Hi, I have a timestamptz field that I want to use with a query, but I >>> don’t need the full timestamp resolution, so I’ve created a >>> day_trunc(timestamptz) immutable function which I’ll use with the >>> query and with a new index: >>> logs=> create index test_idx on blackbox (day_trunc(ts)); >>> However, the query plan doesn’t use the index: >> >> Does it use it ever? e.g. with >> SELECT * FROM blackbox WHERE day_trunk(ts) = '...' > > It’s used: [snip] OK - so the index is working. If you disable seq-scans before running the query, does it use it then? SET enable_seqscan = off; > logs=>explain select count(*) from blackbox group by day_trunc(ts) order > by day_trunc(ts); > QUERY PLAN > ------------------------------------------------------------------------------------------ > > GroupAggregate (cost=98431.58..119773.92 rows=74226 width=8) In particular: 1. Is the estimated cost more or less than 119773.92? 2. How does that match the actual time taken? -- Richard Huxton Archonet Ltd
Giorgio Valoti <giorgio_v@mac.com> writes: > GroupAggregate (cost=98431.58..119773.92 rows=74226 width=8) > -> Sort (cost=98431.58..99050.92 rows=247736 width=8) > Sort Key: (day_trunc(ts)) > -> Seq Scan on blackbox (cost=0.00..72848.36 rows=247736 width=8) > GroupAggregate (cost=0.00..19109.66 rows=74226 width=8) > -> Index Scan using test_2_idx on blackbox (cost=0.00..16943.16 rows=247736 width=8) These numbers seem pretty bogus: there is hardly any scenario in which a full-table indexscan should be costed as significantly cheaper than a seqscan. Have you put in silly values for random_page_cost? If you haven't mucked with the cost parameters, the only way I can think of to get this result is to have an enormously bloated table that's mostly empty. Maybe you need to review your vacuuming procedures. regards, tom lane
On 07/ago/08, at 17:50, Tom Lane wrote: > Giorgio Valoti <giorgio_v@mac.com> writes: >> GroupAggregate (cost=98431.58..119773.92 rows=74226 width=8) >> -> Sort (cost=98431.58..99050.92 rows=247736 width=8) >> Sort Key: (day_trunc(ts)) >> -> Seq Scan on blackbox (cost=0.00..72848.36 rows=247736 >> width=8) > >> GroupAggregate (cost=0.00..19109.66 rows=74226 width=8) >> -> Index Scan using test_2_idx on blackbox >> (cost=0.00..16943.16 rows=247736 width=8) > > These numbers seem pretty bogus: there is hardly any scenario in > which a > full-table indexscan should be costed as significantly cheaper than a > seqscan. Have you put in silly values for random_page_cost? No, > > > If you haven't mucked with the cost parameters, the only way I can > think > of to get this result is to have an enormously bloated table that's > mostly empty. Maybe you need to review your vacuuming procedures. I’ll review them. Thank you -- Giorgio Valoti
On 07/ago/08, at 14:36, Richard Huxton wrote: > Giorgio Valoti wrote: >> On 07/ago/08, at 10:35, Richard Huxton wrote: >>> Giorgio Valoti wrote: >>>> Hi, I have a timestamptz field that I want to use with a query, >>>> but I don’t need the full timestamp resolution, so I’ve created a >>>> day_trunc(timestamptz) immutable function which I’ll use with the >>>> query and with a new index: >>>> logs=> create index test_idx on blackbox (day_trunc(ts)); >>>> However, the query plan doesn’t use the index: >>> >>> Does it use it ever? e.g. with >>> SELECT * FROM blackbox WHERE day_trunk(ts) = '...' >> It’s used: > [snip] > > OK - so the index is working. > > If you disable seq-scans before running the query, does it use it > then? > > SET enable_seqscan = off; Yes > […] > > In particular: > 1. Is the estimated cost more or less than 119773.92? QUERY PLAN ----------------------------------------------------------------------------------------- GroupAggregate (cost=0.00..122309.32 rows=74226 width=8) -> Index Scan using date_idx on blackbox (cost=0.00..101586.31 rows=247736 width=8) > > 2. How does that match the actual time taken? QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=0.00..122309.32 rows=74226 width=8) (actual time=0.222..1931.651 rows=428 loops=1) -> Index Scan using date_idx on blackbox (cost=0.00..101586.31 rows=247736 width=8) (actual time=0.072..1861.367 rows=247736 loops=1) Total runtime: 1931.782 ms But I haven’t revised the vacuum settings. Thank you -- Giorgio Valoti
On 07/ago/08, at 20:37, Giorgio Valoti wrote: > > […] > >> >> >> If you haven't mucked with the cost parameters, the only way I can >> think >> of to get this result is to have an enormously bloated table that's >> mostly empty. Maybe you need to review your vacuuming procedures. > > I’ll review them. I’ve manually vacuum’ed the table: logs=> VACUUM FULL verbose analyze blackbox; INFO: vacuuming "public.blackbox" INFO: "blackbox": found 0 removable, 247736 nonremovable row versions in 8436 pages DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 137 to 1210 bytes long. There were 0 unused item pointers. Total free space (including removable row versions) is 894432 bytes. 0 pages are or will become empty, including 0 at the end of the table. 2926 pages containing 564212 free bytes are potential move destinations. CPU 0.00s/0.04u sec elapsed 0.04 sec. INFO: index "blackbox_pkey" now contains 247736 row versions in 1602 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.01s/0.00u sec elapsed 0.01 sec. INFO: index "vhost_idx" now contains 247736 row versions in 1226 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.01 sec. INFO: index "remoteip_idx" now contains 247736 row versions in 682 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "date_idx" now contains 247736 row versions in 547 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "test_2_idx" now contains 247736 row versions in 682 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "blackbox": moved 0 row versions, truncated 8436 to 8436 pages DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuuming "pg_toast.pg_toast_45532" INFO: "pg_toast_45532": found 0 removable, 0 nonremovable row versions in 0 pages DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 0 to 0 bytes long. There were 0 unused item pointers. Total free space (including removable row versions) is 0 bytes. 0 pages are or will become empty, including 0 at the end of the table. 0 pages containing 0 free bytes are potential move destinations. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "pg_toast_45532_index" now contains 0 row versions in 1 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: analyzing "public.blackbox" INFO: "blackbox": scanned 3000 of 8436 pages, containing 87941 live rows and 0 dead rows; 3000 rows in sample, 247290 estimated total rows VACUUM And here the explain results: logs=> explain select count(*) from blackbox group by day_trunc(ts) order by day_trunc(ts); QUERY PLAN ----------------------------------------------------------------------------- Sort (cost=74210.52..74211.54 rows=407 width=8) Sort Key: (day_trunc(ts)) -> HashAggregate (cost=74086.04..74192.88 rows=407 width=8) -> Seq Scan on blackbox (cost=0.00..72847.36 rows=247736 width=8) (4 rows) logs=> explain select count(*) from blackbox group by ts order by ts; QUERY PLAN ------------------------------------------------------------------------------------------ GroupAggregate (cost=0.00..18381.54 rows=77738 width=8) -> Index Scan using test_2_idx on blackbox (cost=0.00..16171.13 rows=247736 width=8) (2 rows) Maybe it’s the silly test queries that prove nothing: logs=> explain select * from blackbox where day_trunc(ts) = day_trunc(now()); QUERY PLAN ------------------------------------------------------------------------------- Index Scan using date_idx on blackbox (cost=0.50..158.65 rows=569 width=237) Index Cond: (day_trunc(ts) = day_trunc(now())) (2 rows) Ciao -- Giorgio Valoti
Giorgio Valoti <giorgio_v@mac.com> writes: > On 07/ago/08, at 17:50, Tom Lane wrote: >> These numbers seem pretty bogus: there is hardly any scenario in >> which a >> full-table indexscan should be costed as significantly cheaper than a >> seqscan. Have you put in silly values for random_page_cost? > No, I looked at it more closely and realized that the cost discrepancy is from the evaluation of the function: having to evaluate a SQL or plpgsql function 247736 times more than explains the cost estimate differential compared to a query that involves no function call. Some experiments here suggest that it hardly matters whether the query uses indexscan or seqscan because the time is dominated by the function calls anyway. regards, tom lane
On 07/ago/08, at 23:01, Tom Lane wrote: > Giorgio Valoti <giorgio_v@mac.com> writes: >> On 07/ago/08, at 17:50, Tom Lane wrote: >>> These numbers seem pretty bogus: there is hardly any scenario in >>> which a >>> full-table indexscan should be costed as significantly cheaper >>> than a >>> seqscan. Have you put in silly values for random_page_cost? > >> No, > > I looked at it more closely and realized that the cost discrepancy is > from the evaluation of the function: having to evaluate a SQL or > plpgsql > function 247736 times more than explains the cost estimate > differential > compared to a query that involves no function call. Some experiments > here suggest that it hardly matters whether the query uses indexscan > or > seqscan because the time is dominated by the function calls anyway. I see, thank you Tom. Could it be a good idea adding some notes about it in <http://www.postgresql.org/docs/8.3/interactive/indexes-expressional.html >? As you said, since the function call dominates the query cost, in this case, I think there’s no point to use an index expression. Ciao -- Giorgio Valoti