Thread: slow seqscan
My first post to this list :)

Scenario:
I have a database used only for search queries, with a single table that
holds about 450,000-500,000 records. The table is well indexed, so most
queries are executed with an index scan, but since there is a big text
field in the table (360 chars), some search operations (with certain
filters) end up with seq scans.
This table is not written during normal operation: twice per week a
batch program inserts about 35,000 records and updates another 40,000.

Last Friday morning, after that batch had been executed, the database
started responding really slowly to queries (especially seq scans);
after a "vacuum full analyze" things got somewhat better. Yesterday the
same: before the batch everything was perfect, afterwards every query
was really slow. I vacuumed it again and now it's ok.
Until now the db was working fine; it's 4 months old, with two updates
per week, and I vacuum about once per month.

I am using version 7.3 - do I need to upgrade to 7.4? Also, I was
thinking about putting this table into a kind of "read-only" mode to
improve performance - is this possible?

Thank you for your help
Edoardo Ceccarelli
Hi Edoardo,

> The table is well indexed, so most queries are executed with an index
> scan, but since there is a big text field in the table (360 chars),
> some search operations (with certain filters) end up with seq scans.

Please paste the exact SELECT query that uses a seqscan, plus the
EXPLAIN ANALYZE of the SELECT, and the psql output of \d <table>.

> This table is not written during normal operation: twice per week a
> batch program inserts about 35,000 records and updates another 40,000.

After such an update, you need to run:

VACUUM ANALYZE <table>;

Run it before the update as well, if it doesn't take that long.

> Last Friday morning, after that batch had been executed, the database
> started responding really slowly to queries (especially seq scans);
> after a "vacuum full analyze" things got somewhat better. Yesterday
> the same: before the batch everything was perfect, afterwards every
> query was really slow. I vacuumed it again and now it's ok.
> Until now the db was working fine; it's 4 months old, with two updates
> per week, and I vacuum about once per month.

You need to vacuum analyze (NOT full) once an HOUR, not once a month.
Add this command to your crontab to run once an hour and verify that
it's working:

vacuumdb -a -z -q

Otherwise, install the autovacuum utility found in
contrib/pg_autovacuum in the postgres source. Set this up. It will
monitor postgres and run vacuums and analyzes when necessary. You can
then remove your cron job.

> I am using version 7.3 - do I need to upgrade to 7.4? Also, I was
> thinking about putting this table into a kind of "read-only" mode to
> improve performance - is this possible?

There's no read-only mode to improve performance. Upgrading to 7.4 will
more than likely improve the performance of your database in general.
Be careful to read the upgrade notes, because there were a few
incompatibilities.
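For reference, the crontab entry could look roughly like this - just a
sketch, since the user and PATH depend on your installation, and
vacuumdb must be able to connect without a password prompt:

# run VACUUM ANALYZE on all databases at the top of every hour
0 * * * *  vacuumdb -a -z -q

Chris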
Edoardo Ceccarelli wrote:
> My first post to this list :)
>
> Scenario:
> I have a database used only for search queries, with a single table
> that holds about 450,000-500,000 records. The table is well indexed,
> so most queries are executed with an index scan, but since there is a
> big text field in the table (360 chars), some search operations (with
> certain filters) end up with seq scans.
> This table is not written during normal operation: twice per week a
> batch program inserts about 35,000 records and updates another 40,000.
>
> Last Friday morning, after that batch had been executed, the database
> started responding really slowly to queries (especially seq scans);
> after a "vacuum full analyze" things got somewhat better. Yesterday
> the same: before the batch everything was perfect, afterwards every
> query was really slow. I vacuumed it again and now it's ok.
> Until now the db was working fine; it's 4 months old, with two updates
> per week, and I vacuum about once per month.
>
> I am using version 7.3 - do I need to upgrade to 7.4? Also, I was
> thinking about putting this table into a kind of "read-only" mode to
> improve performance - is this possible?
>
> Thank you for your help
> Edoardo Ceccarelli

In general we are going to need more information, like what kind of
search filters you are using on the text field, and an EXPLAIN ANALYZE.
But can you try and run the following, bearing in mind it will take a
while to complete:

REINDEX TABLE <table_name>

From what I remember, there were issues with index space not being
reclaimed by a vacuum. I believe this was fixed in 7.4. By not
reclaiming the space, the indexes grow larger and larger over time,
causing PG to prefer a sequential scan over an index scan (I think).
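A quick way to see whether the indexes have grown out of proportion is
to compare page counts in pg_class - a sketch, substituting your table
name, assuming your indexes share the table's name prefix, and bearing
in mind that relpages is only as fresh as the last VACUUM or ANALYZE:

-- relation and index sizes in 8 KB pages; bloated indexes stand out
SELECT relname, relkind, relpages, reltuples
  FROM pg_class
 WHERE relname LIKE '<table_name>%'
 ORDER BY relpages DESC;

Hope that helps
Nick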
> In general we are going to need more information, like what kind of
> search filters you are using on the text field, and an EXPLAIN ANALYZE.
> But can you try and run the following, bearing in mind it will take a
> while to complete:
>
> REINDEX TABLE <table_name>
>
> From what I remember, there were issues with index space not being
> reclaimed by a vacuum. I believe this was fixed in 7.4. By not
> reclaiming the space, the indexes grow larger and larger over time,
> causing PG to prefer a sequential scan over an index scan (I think).

The query is this:

SELECT *, oid FROM annuncio400
WHERE rubric = 'DD' AND LOWER(testo) Like LOWER('cbr%')
OFFSET 0 LIMIT 11

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE rubric =
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=51.47..56.42 rows=11 loops=1)
   ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546) (actual time=51.47..56.40 rows=12 loops=1)
         Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~ 'cbr%'::text))
 Total runtime: 56.53 msec
(4 rows)

But the strangest thing ever is that if I change the filter to another
one that represents a smaller amount of data, it uses the index scan!!!
Check this (same table, same query, different rubric value 'MA'):

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE rubric =
'MA' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..6630.72 rows=9 width=546) (actual time=42.74..42.74 rows=0 loops=1)
   ->  Index Scan using rubric on annuncio400  (cost=0.00..6968.48 rows=9 width=546) (actual time=42.73..42.73 rows=0 loops=1)
         Index Cond: (rubric = 'MA'::bpchar)
         Filter: (lower((testo)::text) ~~ 'cbr%'::text)
 Total runtime: 42.81 msec
(5 rows)

Thanks for your help
Edoardo
> dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE rubric =
> 'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
>
>  Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=51.47..56.42 rows=11 loops=1)
>    ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546) (actual time=51.47..56.40 rows=12 loops=1)
>          Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~ 'cbr%'::text))
>  Total runtime: 56.53 msec
> (4 rows)

What happens if you go:

CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
LOWER(testo));

or even just:

CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));

> But the strangest thing ever is that if I change the filter to another
> one that represents a smaller amount of data, it uses the index scan!!!

What's strange about that? The less data is going to be retrieved, the
more likely postgres is to use the index.

I suggest maybe increasing the amount of stats recorded for your rubric
column:

ALTER TABLE annuncio400 ALTER rubric SET STATISTICS 100;
ANALYZE annuncio400;

You could also try reducing the random_page_cost value in your
postgresql.conf a little, say to 3 (if it's currently 4). That will make
postgres more likely to use index scans over seq scans.
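You can also check what the planner currently believes about the
distribution of rubric values - a sketch, reading the pg_stats view
after the ANALYZE above:

-- 'DD' should show up in most_common_vals with a realistic frequency
SELECT attname, n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'annuncio400' AND attname = 'rubric';

Chris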
> What happens if you go:
>
> CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
> LOWER(testo));
>
> or even just:
>
> CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));

I wasn't able to create this two-field index with lower:

dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON
annuncio400(rubric, LOWER(testo));
ERROR:  parser: parse error at or near "(" at character 71

It seems impossible to create two-field indexes with the lower function.

The other one does not make it use the index.

> > But the strangest thing ever is that if I change the filter to
> > another one that represents a smaller amount of data, it uses the
> > index scan!!!
>
> What's strange about that? The less data is going to be retrieved,
> the more likely postgres is to use the index.

I can't understand this policy:

dba400=# SELECT count(*) from annuncio400 where rubric='DD';
 count
-------
  6753
(1 row)

dba400=# SELECT count(*) from annuncio400 where rubric='MA';
 count
-------
  2165
(1 row)

So it's using the index for 2,000 rows and not for 6,000? That's not
such a big difference, is it?

> I suggest maybe increasing the amount of stats recorded for your
> rubric column:
>
> ALTER TABLE annuncio400 ALTER rubric SET STATISTICS 100;
> ANALYZE annuncio400;

Done; almost the same, still not using the index.

> You could also try reducing the random_page_cost value in your
> postgresql.conf a little, say to 3 (if it's currently 4). That will
> make postgres more likely to use index scans over seq scans.

I changed the setting in postgresql.conf and restarted the server;
nothing has changed.

What about setting this to false?

#enable_seqscan = true

thanks again
Edoardo
tried the

enable_seqscan = false

and I'm having all index scans; timing has improved from 600ms to 18ms.

Wondering what other implications I might expect.

Edoardo Ceccarelli wrote:
> [...]
>
> I changed the setting in postgresql.conf and restarted the server;
> nothing has changed.
>
> What about setting this to false?
>
> #enable_seqscan = true
>
> thanks again
> Edoardo
> enable_seqscan = false
>
> and I'm having all index scans; timing has improved from 600ms to 18ms.
>
> Wondering what other implications I might expect.

Lots of really bad implications... it's really not a good idea.
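If you need to nudge one problem query, it's safer to flip the setting
for just that session rather than globally in postgresql.conf - a
sketch:

SET enable_seqscan = false;
SELECT *, oid FROM annuncio400
WHERE rubric = 'DD' AND LOWER(testo) LIKE LOWER('cbr%')
OFFSET 0 LIMIT 11;
SET enable_seqscan = true;

That way the rest of your queries still get the planner's normal
choices.

Chris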
Just created a copy of the same database, and it shows that it's the
analyze that's messing things up:

Slow seqscan query executed on dba400:

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE rubric =
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=46.66..51.40 rows=11 loops=1)
   ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546) (actual time=46.66..51.38 rows=12 loops=1)
         Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~ 'cbr%'::text))
 Total runtime: 51.46 msec
(4 rows)

Fast index scan query on dba400b (exact copy of dba400):

dba400b=# explain analyze SELECT *, oid FROM annuncio400 WHERE rubric =
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7058.40 rows=9 width=546) (actual time=1.36..8.18 rows=11 loops=1)
   ->  Index Scan using rubric on annuncio400  (cost=0.00..7369.42 rows=9 width=546) (actual time=1.35..8.15 rows=12 loops=1)
         Index Cond: (rubric = 'DD'::bpchar)
         Filter: (lower((testo)::text) ~~ 'cbr%'::text)
 Total runtime: 8.28 msec
(5 rows)

What about the index you suggested? It gives me a syntax error while
trying to create it:

CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
LOWER(testo));

Thanks
Edoardo

Christopher Kings-Lynne wrote:
>> enable_seqscan = false
>>
>> and I'm having all index scans; timing has improved from 600ms to 18ms.
>>
>> Wondering what other implications I might expect.
>
> Lots of really bad implications... it's really not a good idea.
>
> Chris
On Wed, 21 Apr 2004, Edoardo Ceccarelli wrote:

> > What happens if you go:
> >
> > CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
> > LOWER(testo));
> >
> > or even just:
> >
> > CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));
>
> I wasn't able to create this two-field index with lower:
>
> dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON
> annuncio400(rubric, LOWER(testo));
> ERROR:  parser: parse error at or near "(" at character 71

That's a 7.4 feature, I think (and I think the version with two columns
may need extra parens around the lower()). I think the only way to do
something equivalent in 7.3 is to make a function that concatenates the
two, after having applied lower() to the one part, and then use that
function in the queries as well (see the sketch at the end of this
message). Plus, if you're not in the "C" locale, I'm not sure that it
would help in 7.3 anyway.

> >> But the strangest thing ever is that if I change the filter to
> >> another one that represents a smaller amount of data, it uses the
> >> index scan!!!
> >
> > What's strange about that? The less data is going to be retrieved,
> > the more likely postgres is to use the index.
>
> I can't understand this policy:
>
> dba400=# SELECT count(*) from annuncio400 where rubric='DD';
>  count
> -------
>   6753
> (1 row)
>
> dba400=# SELECT count(*) from annuncio400 where rubric='MA';
>  count
> -------
>   2165
> (1 row)
>
> So it's using the index for 2,000 rows and not for 6,000? That's not
> such a big difference, is it?

It's a question of how many pages it thinks it's going to have to
retrieve in order to handle the request. If it, say, needs (or thinks
it needs) to retrieve 50% of the pages, then given a random_page_cost
of 4, it's going to expect the index scan to be about twice the cost.

Generally speaking, one good way to compare is to try the query with
EXPLAIN ANALYZE, then change parameters like enable_seqscan and try the
query with EXPLAIN ANALYZE again, and compare the estimated rows and
costs. That'll give an idea of how it expects the two versions of the
query to compare speed-wise.
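The 7.3 workaround I have in mind would look roughly like this - purely
a sketch (the function name is made up, the argument types are guesses
at your schema, and I haven't tried it against 7.3):

-- immutable function combining both columns into one indexable value
CREATE FUNCTION rubric_lower_testo(bpchar, varchar) RETURNS text AS
  'SELECT rtrim($1) || lower($2)' LANGUAGE sql IMMUTABLE;

CREATE INDEX annuncio400_rlt_idx
    ON annuncio400 (rubric_lower_testo(rubric, testo));

-- queries then have to use exactly the same expression, e.g.
-- WHERE rubric_lower_testo(rubric, testo) LIKE 'DD' || lower('cbr%')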
>> I can't understand this policy:
>>
>> dba400=# SELECT count(*) from annuncio400 where rubric='DD';
>>  count
>> -------
>>   6753
>> (1 row)
>>
>> dba400=# SELECT count(*) from annuncio400 where rubric='MA';
>>  count
>> -------
>>   2165
>> (1 row)
>>
>> So it's using the index for 2,000 rows and not for 6,000? That's not
>> such a big difference, is it?
>
> It's a question of how many pages it thinks it's going to have to
> retrieve in order to handle the request. If it, say, needs (or thinks
> it needs) to retrieve 50% of the pages, then given a random_page_cost
> of 4, it's going to expect the index scan to be about twice the cost.
>
> Generally speaking, one good way to compare is to try the query with
> EXPLAIN ANALYZE, then change parameters like enable_seqscan and try
> the query with EXPLAIN ANALYZE again, and compare the estimated rows
> and costs. That'll give an idea of how it expects the two versions of
> the query to compare speed-wise.

Ok, then how do you explain this? I just created a copy of the same
database.

Slow seqscan query executed on dba400:

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE rubric =
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=46.66..51.40 rows=11 loops=1)
   ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546) (actual time=46.66..51.38 rows=12 loops=1)
         Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~ 'cbr%'::text))
 Total runtime: 51.46 msec
(4 rows)

Fast index scan query on dba400b (exact copy of dba400):

dba400b=# explain analyze SELECT *, oid FROM annuncio400 WHERE rubric =
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7058.40 rows=9 width=546) (actual time=1.36..8.18 rows=11 loops=1)
   ->  Index Scan using rubric on annuncio400  (cost=0.00..7369.42 rows=9 width=546) (actual time=1.35..8.15 rows=12 loops=1)
         Index Cond: (rubric = 'DD'::bpchar)
         Filter: (lower((testo)::text) ~~ 'cbr%'::text)
 Total runtime: 8.28 msec
(5 rows)

Anyway, shall I try to lower the random_page_cost value, since I get an
index scan? I mean, in my case I've already noted that with an index
scan the query executes in 1/10 of the seqscan time.
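Something like this, per session, before touching postgresql.conf (just
a sketch of what I'd try):

SET random_page_cost = 2;
EXPLAIN ANALYZE SELECT *, oid FROM annuncio400
WHERE rubric = 'DD' AND LOWER(testo) LIKE LOWER('cbr%')
OFFSET 0 LIMIT 11;

Thank you
Edoardo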
Edoardo Ceccarelli <eddy@axa.it> writes:
> I wasn't able to create this two-field index with lower:

> dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON
> annuncio400(rubric, LOWER(testo));
> ERROR:  parser: parse error at or near "(" at character 71

> It seems impossible to create two-field indexes with the lower function.

You need 7.4 to do that; previous releases don't support multi-column
functional indexes.
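On 7.4 the command works essentially as you wrote it - a sketch (extra
parentheses are only required around index expressions that are not
plain function calls):

CREATE INDEX annuncio400_rubric_testo_idx
    ON annuncio400 (rubric, LOWER(testo));

			regards, tom lane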