Thread: FW: FW: Index usage
Thanks, Tim. I tried adding an upper limit and its still the same as follows: ============== db=# explain analyze select date from chatlogs where date>='11/24/04'; NOTICE: QUERY PLAN: Index Scan using chatlogs_date_idx on chatlogs (cost=0.00..145.72 rows=37 width=4) (actual time=0.18..239.69 rows=10737 loops=1) Total runtime: 246.22 msec EXPLAIN db=# explain analyze select date from chatlogs where date>='11/23/04' and date<'11/24/04'; NOTICE: QUERY PLAN: Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual time=0.44..4447.01 rows=13029 loops=1) Total runtime: 4455.56 msec EXPLAIN db=# explain analyze select date from chatlogs where date>='11/23/04' and date<'11/25/04'; NOTICE: QUERY PLAN: Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual time=0.45..4268.00 rows=23787 loops=1) Total runtime: 4282.81 msec ============== How come a query on the current date filter uses an index and the others does not? This makes indexing to speed up queries quite difficult. -----Original Message----- From: Leeuw van der, Tim [mailto:tim.leeuwvander@nl.unisys.com] Sent: Wednesday, November 24, 2004 3:35 PM To: BBI Edwin Punzalan; pgsql-performance@postgresql.org Subject: RE: [PERFORM] FW: Index usage Well you just selected a whole lot more rows... What's the total number of rows in the table? In general, what I remember from reading on the list, is that when there's no upper bound on a query like this, the planner is more likely to choose a seq. scan than an index scan. Try to give your query an upper bound like: select date from chatlogs where date>='11/23/04' and date < '12/31/99'; select date from chatlogs where date>='10/23/04' and date < '12/31/99'; This should make it easier for the planner to give a proper estimate of the number of rows returned. If it doesn't help yet, please post 'explain analyze' output rather than 'explain' output, for it allows much better investigation into why the planner chooses what it chooses. cheers, --Tim -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of BBI Edwin Punzalan Sent: Wednesday, November 24, 2004 7:52 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] FW: Index usage Hi everyone, Can anyone please explain postgres' behavior on our index. I did the following query tests on our database: ==================== db=# create index chatlogs_date_idx on chatlogs (date); CREATE db=# explain select date from chatlogs where date>='11/23/04'; NOTICE: QUERY PLAN: Index Scan using chatlogs_date_idx on chatlogs (cost=0.00..144.11 rows=36 width=4) EXPLAIN db=# explain select date from chatlogs where date>='10/23/04'; NOTICE: QUERY PLAN: Seq Scan on chatlogs (cost=0.00..23938.06 rows=253442 width=4) EXPLAIN==================== Date's datatype is date. Its just odd that I just change the actual date of search and the index is not being used anymore. ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org
From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> > db=# explain analyze select date from chatlogs where date>='11/23/04' and > date<'11/25/04'; > NOTICE: QUERY PLAN: > > Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual > time=0.45..4268.00 rows=23787 loops=1) > Total runtime: 4282.81 msec > ============== > > How come a query on the current date filter uses an index and the others > does not? This makes indexing to speed up queries quite difficult. have you ANALYZED the table lately ? what version postgres are you using ? gnari
Yes, the database is being vacuum-ed and analyzed on a daily basis. Our version is 7.2.1 -----Original Message----- From: gnari [mailto:gnari@simnet.is] Sent: Wednesday, November 24, 2004 4:35 PM To: BBI Edwin Punzalan; pgsql-performance@postgresql.org Subject: Re: [PERFORM] FW: Index usage From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> > db=# explain analyze select date from chatlogs where date>='11/23/04' > and date<'11/25/04'; > NOTICE: QUERY PLAN: > > Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual > time=0.45..4268.00 rows=23787 loops=1) Total runtime: 4282.81 msec > ============== > > How come a query on the current date filter uses an index and the > others does not? This makes indexing to speed up queries quite > difficult. have you ANALYZED the table lately ? what version postgres are you using ? gnari
BBI Edwin Punzalan wrote: > Thanks, Tim. > > I tried adding an upper limit and its still the same as follows: > > ============== > db=# explain analyze select date from chatlogs where date>='11/24/04'; > NOTICE: QUERY PLAN: > > Index Scan using chatlogs_date_idx on chatlogs (cost=0.00..145.72 rows=37 > width=4) (actual time=0.18..239.69 rows=10737 loops=1) > Total runtime: 246.22 msec > > EXPLAIN > db=# explain analyze select date from chatlogs where date>='11/23/04' and > date<'11/24/04'; > NOTICE: QUERY PLAN: > > Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual > time=0.44..4447.01 rows=13029 loops=1) > Total runtime: 4455.56 msec We have two issues here 1. In the first example it only picks an index because it thinks it is going to get 37 rows, it actually gets 10737 2. It's taking 4455ms to run a seq-scan but only 246ms to run an index-scan over 10737 rows (and then fetch the rows too). Questions: 1. How many rows do you have in chatlogs? 2. Is this the only problem you are experiencing, or just one from many? 3. Have you tuned any configuration settings? e.g. as suggested in: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html -- Richard Huxton Archonet Ltd
Hi. 1) chatlogs rows increases every now and then (its in a live environment) and currently have 538,696 rows 2) this is the only problem we experienced. So far, all our other indexes are being used correctly. 3) I don't remember tuning any post-installation configuration of our postgreSQL except setting fsync to false. Thanks for taking a look at our problem. :D -----Original Message----- From: Richard Huxton [mailto:dev@archonet.com] Sent: Wednesday, November 24, 2004 6:17 PM To: BBI Edwin Punzalan Cc: pgsql-performance@postgresql.org Subject: Re: FW: [PERFORM] FW: Index usage BBI Edwin Punzalan wrote: > Thanks, Tim. > > I tried adding an upper limit and its still the same as follows: > > ============== > db=# explain analyze select date from chatlogs where date>='11/24/04'; > NOTICE: QUERY PLAN: > > Index Scan using chatlogs_date_idx on chatlogs (cost=0.00..145.72 > rows=37 > width=4) (actual time=0.18..239.69 rows=10737 loops=1) > Total runtime: 246.22 msec > > EXPLAIN > db=# explain analyze select date from chatlogs where date>='11/23/04' > and date<'11/24/04'; > NOTICE: QUERY PLAN: > > Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual > time=0.44..4447.01 rows=13029 loops=1) Total runtime: 4455.56 msec We have two issues here 1. In the first example it only picks an index because it thinks it is going to get 37 rows, it actually gets 10737 2. It's taking 4455ms to run a seq-scan but only 246ms to run an index-scan over 10737 rows (and then fetch the rows too). Questions: 1. How many rows do you have in chatlogs? 2. Is this the only problem you are experiencing, or just one from many? 3. Have you tuned any configuration settings? e.g. as suggested in: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html -- Richard Huxton Archonet Ltd
BBI Edwin Punzalan wrote: > Hi. > > 1) chatlogs rows increases every now and then (its in a live environment) > and currently have 538,696 rows OK, so as a rule of thumb I'd say if you were fetching less than 5000 rows it's bound to use an index. If more than 50,000 always use a seqscan, otherwise it'll depend on configuration settings. It looks like you settings are suggesting the cost of an index-scan vs seq-scan are greater than they are. > 2) this is the only problem we experienced. So far, all our other indexes > are being used correctly. Good. > 3) I don't remember tuning any post-installation configuration of our > postgreSQL except setting fsync to false. So long as you know why this can cause data loss. It won't affect this problem. Read that performance article I linked to in the last message, it's written by two people who know what they're talking about. The standard configuration settings are designed to work on any machine, not provide good performance. Work through the basics there and we can look at random_page_cost etc. if it's still causing you problems. -- Richard Huxton Archonet Ltd
From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> > > Yes, the database is being vacuum-ed and analyzed on a daily basis. > then you should consider increating the statistics on the date column, as the estimates were a bit off in the plan > Our version is 7.2.1 upgrade time ? gnari
Hi, what do you mean by increasing the statistics on the date column? We never had any upgrade on it. -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of gnari Sent: Thursday, November 25, 2004 3:13 AM To: BBI Edwin Punzalan; pgsql-performance@postgresql.org Subject: Re: [PERFORM] FW: Index usage From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> > > Yes, the database is being vacuum-ed and analyzed on a daily basis. > then you should consider increating the statistics on the date column, as the estimates were a bit off in the plan > Our version is 7.2.1 upgrade time ? gnari ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> > > Hi, what do you mean by increasing the statistics on the date column? alter table chatlogs alter column date set statistics 300; analyze chatlogs; > > > Our version is 7.2.1 > > > > upgrade time ? > > We never had any upgrade on it. 7.2 is a bit dated now that 8.0 is in beta if you want to stay with 7.2, you should at least upgrade to the latest point release (7.2.6 ?), as several serious bugs have been fixed gnari
Thanks but whatever it does, it didn't work. :D Do you think upgrading will fix this problem? ========================= db=# alter table chatlogs alter column date set statistics 300; ALTER db=# analyze chatlogs; ANALYZE db=# explain analyze select * from chatlogs where date >= '12/1/04'; NOTICE: QUERY PLAN: Index Scan using chatlogs_type_idx on chatlogs (cost=0.00..6053.61 rows=3357 width=212) (actual time=22.14..138.53 rows=1312 loops=1) Total runtime: 139.42 msec EXPLAIN morphTv=# explain analyze select * from chatlogs where date >= '11/03/04'; NOTICE: QUERY PLAN: Seq Scan on chatlogs (cost=0.00..27252.86 rows=271882 width=212) (actual time=12.24..13419.36 rows=257137 loops=1) Total runtime: 13573.70 msec EXPLAIN ========================= -----Original Message----- From: gnari [mailto:gnari@simnet.is] Sent: Wednesday, December 01, 2004 10:08 AM To: BBI Edwin Punzalan; pgsql-performance@postgresql.org Subject: Re: [PERFORM] FW: Index usage From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> > > Hi, what do you mean by increasing the statistics on the date column? alter table chatlogs alter column date set statistics 300; analyze chatlogs; > > > Our version is 7.2.1 > > > > upgrade time ? > > We never had any upgrade on it. 7.2 is a bit dated now that 8.0 is in beta if you want to stay with 7.2, you should at least upgrade to the latest point release (7.2.6 ?), as several serious bugs have been fixed gnari
If it's any help, i just ran this test on 7.4.6, my table has about 7000000 rows and the index is an integer. The item id ranges from 1 to 20000. As you can see from the following plans, the optimizer changed it's plan depending on the value of the item id condition, and will use an index when it determines that the number of values that will be returned is a low % of the total table size. The item_id is an integer, but It looked like you are using a character field to store date information. Also, the dates you entered in your test case seem to be in the format DD/MM/YY which won't be amenable to useful comparative searching (I didn't read any of the earlier posts so if that isn't the case, just ignore this). If this is the case, try storing the data in a date column and see what happens then. regards Iain test=# explain analyse select * from bigtable where item_id <= 1000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- -------------------------------------------- Index Scan using d_bigtable_idx2 on bigtable (cost=0.00..118753.57 rows=59553 width=80) (actual time=0.069..704.401 rows=58102 loops=1) Index Cond: ((item_id)::integer <= 1000) Total runtime: 740.786 ms (3 rows) test=# explain analyse select * from bigtable where item_id <= 100000000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- --------------- Seq Scan on d_hi_mise_item_uri (cost=0.00..194285.15 rows=7140589 width=80) (actual time=0.027..18599.032 rows=71 14844 loops=1) Filter: ((item_id)::integer <= 100000000) Total runtime: 23024.986 ms ----- Original Message ----- From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> To: "'gnari'" <gnari@simnet.is>; <pgsql-performance@postgresql.org> Sent: Wednesday, December 01, 2004 11:33 AM Subject: Re: [PERFORM] FW: Index usage > > Thanks but whatever it does, it didn't work. :D > > Do you think upgrading will fix this problem? > > ========================= > db=# alter table chatlogs alter column date set statistics 300; > ALTER > db=# analyze chatlogs; > ANALYZE > db=# explain analyze select * from chatlogs where date >= '12/1/04'; > NOTICE: QUERY PLAN: > > Index Scan using chatlogs_type_idx on chatlogs (cost=0.00..6053.61 > rows=3357 width=212) (actual time=22.14..138.53 rows=1312 > loops=1) > Total runtime: 139.42 msec > > EXPLAIN > morphTv=# explain analyze select * from chatlogs where date >= '11/03/04'; > NOTICE: QUERY PLAN: > > Seq Scan on chatlogs (cost=0.00..27252.86 rows=271882 width=212) (actual > time=12.24..13419.36 rows=257137 loops=1) > Total runtime: 13573.70 msec > > EXPLAIN > ========================= > > > > -----Original Message----- > From: gnari [mailto:gnari@simnet.is] > Sent: Wednesday, December 01, 2004 10:08 AM > To: BBI Edwin Punzalan; pgsql-performance@postgresql.org > Subject: Re: [PERFORM] FW: Index usage > > > From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> > > >> >> Hi, what do you mean by increasing the statistics on the date column? > > alter table chatlogs alter column date set statistics 300; analyze > chatlogs; > >> > > Our version is 7.2.1 >> > >> > upgrade time ? >> >> We never had any upgrade on it. > > 7.2 is a bit dated now that 8.0 is in beta > > if you want to stay with 7.2, you should at least upgrade > to the latest point release (7.2.6 ?), as several serious bugs have been > fixed > > gnari > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Hi. Thanks for your reply. The date column data type is date already. :D -----Original Message----- From: Iain [mailto:iain@mst.co.jp] Sent: Wednesday, December 01, 2004 12:00 PM To: BBI Edwin Punzalan; 'gnari'; pgsql-performance@postgresql.org Subject: Re: [PERFORM] FW: Index usage If it's any help, i just ran this test on 7.4.6, my table has about 7000000 rows and the index is an integer. The item id ranges from 1 to 20000. As you can see from the following plans, the optimizer changed it's plan depending on the value of the item id condition, and will use an index when it determines that the number of values that will be returned is a low % of the total table size. The item_id is an integer, but It looked like you are using a character field to store date information. Also, the dates you entered in your test case seem to be in the format DD/MM/YY which won't be amenable to useful comparative searching (I didn't read any of the earlier posts so if that isn't the case, just ignore this). If this is the case, try storing the data in a date column and see what happens then. regards Iain test=# explain analyse select * from bigtable where item_id <= 1000; QUERY PLAN ---------------------------------------------------------------------------- --------------------------------------- -------------------------------------------- Index Scan using d_bigtable_idx2 on bigtable (cost=0.00..118753.57 rows=59553 width=80) (actual time=0.069..704.401 rows=58102 loops=1) Index Cond: ((item_id)::integer <= 1000) Total runtime: 740.786 ms (3 rows) test=# explain analyse select * from bigtable where item_id <= 100000000; QUERY PLAN ---------------------------------------------------------------------------- --------------------------------------- --------------- Seq Scan on d_hi_mise_item_uri (cost=0.00..194285.15 rows=7140589 width=80) (actual time=0.027..18599.032 rows=71 14844 loops=1) Filter: ((item_id)::integer <= 100000000) Total runtime: 23024.986 ms ----- Original Message ----- From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> To: "'gnari'" <gnari@simnet.is>; <pgsql-performance@postgresql.org> Sent: Wednesday, December 01, 2004 11:33 AM Subject: Re: [PERFORM] FW: Index usage > > Thanks but whatever it does, it didn't work. :D > > Do you think upgrading will fix this problem? > > ========================= > db=# alter table chatlogs alter column date set statistics 300; ALTER > db=# analyze chatlogs; > ANALYZE > db=# explain analyze select * from chatlogs where date >= '12/1/04'; > NOTICE: QUERY PLAN: > > Index Scan using chatlogs_type_idx on chatlogs (cost=0.00..6053.61 > rows=3357 width=212) (actual time=22.14..138.53 rows=1312 > loops=1) > Total runtime: 139.42 msec > > EXPLAIN > morphTv=# explain analyze select * from chatlogs where date >= > '11/03/04'; > NOTICE: QUERY PLAN: > > Seq Scan on chatlogs (cost=0.00..27252.86 rows=271882 width=212) > (actual time=12.24..13419.36 rows=257137 loops=1) Total runtime: > 13573.70 msec > > EXPLAIN > ========================= > > > > -----Original Message----- > From: gnari [mailto:gnari@simnet.is] > Sent: Wednesday, December 01, 2004 10:08 AM > To: BBI Edwin Punzalan; pgsql-performance@postgresql.org > Subject: Re: [PERFORM] FW: Index usage > > > From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> > > >> >> Hi, what do you mean by increasing the statistics on the date column? > > alter table chatlogs alter column date set statistics 300; analyze > chatlogs; > >> > > Our version is 7.2.1 >> > >> > upgrade time ? >> >> We never had any upgrade on it. > > 7.2 is a bit dated now that 8.0 is in beta > > if you want to stay with 7.2, you should at least upgrade > to the latest point release (7.2.6 ?), as several serious bugs have > been fixed > > gnari > > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Sorry, i can't check this easily as I don't have any date fields in my data (they all held has character strings - do as i say, not as i do) but maybe you should cast or convert the string representation of the date to a date in the where clause. Postgres might be doing some implicit conversion but if it is, I'd expect it to use a YYYY-MM-DD format which is what I see here. Something like ... WHERE date>= to_date('11/03/04','DD/MM/YY') regards Iain ----- Original Message ----- From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> To: "'Iain'" <iain@mst.co.jp>; "'gnari'" <gnari@simnet.is>; <pgsql-performance@postgresql.org> Sent: Wednesday, December 01, 2004 1:05 PM Subject: RE: [PERFORM] FW: Index usage > > Hi. Thanks for your reply. The date column data type is date already. :D > > -----Original Message----- > From: Iain [mailto:iain@mst.co.jp] > Sent: Wednesday, December 01, 2004 12:00 PM > To: BBI Edwin Punzalan; 'gnari'; pgsql-performance@postgresql.org > Subject: Re: [PERFORM] FW: Index usage > > > If it's any help, i just ran this test on 7.4.6, my table has about > 7000000 > rows and the index is an integer. > > The item id ranges from 1 to 20000. > > As you can see from the following plans, the optimizer changed it's plan > depending on the value of the item id condition, and will use an index > when > it determines that the number of values that will be returned is a low % > of > the total table size. > > The item_id is an integer, but It looked like you are using a character > field to store date information. Also, the dates you entered in your test > case seem to be in the format DD/MM/YY which won't be amenable to useful > comparative searching (I didn't read any of the earlier posts so if that > isn't the case, just ignore this). If this is the case, try storing the > data > > in a date column and see what happens then. > > regards > Iain > > test=# explain analyse select * from bigtable where item_id <= 1000; > > QUERY > PLAN > > ---------------------------------------------------------------------------- > --------------------------------------- > -------------------------------------------- > Index Scan using d_bigtable_idx2 on bigtable (cost=0.00..118753.57 > rows=59553 width=80) (actual > time=0.069..704.401 rows=58102 loops=1) > Index Cond: ((item_id)::integer <= 1000) > Total runtime: 740.786 ms > (3 rows) > > > test=# explain analyse select * from bigtable where item_id <= 100000000; > QUERY PLAN > > ---------------------------------------------------------------------------- > --------------------------------------- > --------------- > Seq Scan on d_hi_mise_item_uri (cost=0.00..194285.15 rows=7140589 > width=80) (actual time=0.027..18599.032 rows=71 > 14844 loops=1) > Filter: ((item_id)::integer <= 100000000) > Total runtime: 23024.986 ms > > ----- Original Message ----- > From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> > To: "'gnari'" <gnari@simnet.is>; <pgsql-performance@postgresql.org> > Sent: Wednesday, December 01, 2004 11:33 AM > Subject: Re: [PERFORM] FW: Index usage > > >> >> Thanks but whatever it does, it didn't work. :D >> >> Do you think upgrading will fix this problem? >> >> ========================= >> db=# alter table chatlogs alter column date set statistics 300; ALTER >> db=# analyze chatlogs; >> ANALYZE >> db=# explain analyze select * from chatlogs where date >= '12/1/04'; >> NOTICE: QUERY PLAN: >> >> Index Scan using chatlogs_type_idx on chatlogs (cost=0.00..6053.61 >> rows=3357 width=212) (actual time=22.14..138.53 rows=1312 >> loops=1) >> Total runtime: 139.42 msec >> >> EXPLAIN >> morphTv=# explain analyze select * from chatlogs where date >= >> '11/03/04'; >> NOTICE: QUERY PLAN: >> >> Seq Scan on chatlogs (cost=0.00..27252.86 rows=271882 width=212) >> (actual time=12.24..13419.36 rows=257137 loops=1) Total runtime: >> 13573.70 msec >> >> EXPLAIN >> ========================= >> >> >> >> -----Original Message----- >> From: gnari [mailto:gnari@simnet.is] >> Sent: Wednesday, December 01, 2004 10:08 AM >> To: BBI Edwin Punzalan; pgsql-performance@postgresql.org >> Subject: Re: [PERFORM] FW: Index usage >> >> >> From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> >> >> >>> >>> Hi, what do you mean by increasing the statistics on the date column? >> >> alter table chatlogs alter column date set statistics 300; analyze >> chatlogs; >> >>> > > Our version is 7.2.1 >>> > >>> > upgrade time ? >>> >>> We never had any upgrade on it. >> >> 7.2 is a bit dated now that 8.0 is in beta >> >> if you want to stay with 7.2, you should at least upgrade >> to the latest point release (7.2.6 ?), as several serious bugs have >> been fixed >> >> gnari >> >> >> ---------------------------(end of >> broadcast)--------------------------- >> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
From: "BBI Edwin Punzalan" <edwin@bluebamboo.ph> > Thanks but whatever it does, it didn't work. : > Do you think upgrading will fix this problem? are you sure there is a problem here to solve ? > Seq Scan on chatlogs (cost=0.00..27252.86 rows=271882 width=212) (actual > time=12.24..13419.36 rows=257137 loops=1) you see that the actual rowcount matches the estimate, so the planner is not being misled by wrong statistics. you realize that an indexscan is not allways faster than sequential scan unless the number of rows are a small percentage of the total number of rows did you try to add a 'order by date' clause to your query ? gnari