Thread: slow seqscan

slow seqscan

From
Edoardo Ceccarelli
Date:
My first post to this list :)

Scenario:
I have a database used only for search queries, with a single table that
holds about 450.000/500.000 records.
The table is well indexed, so that most queries are executed with an
index scan, but since there is a big text field in the table (360 chars),
some search operations (with certain filters) end up as seq scans.
This table is not written to during normal operation: twice per week a
batch program inserts about 35.000 records and updates another 40.000.

Last Friday morning, after that batch had been executed, the database
started responding really slowly to queries (especially seq scans);
after a VACUUM FULL ANALYZE things got somewhat better.
Yesterday the same: before the batch everything was perfect, afterwards
every query was really slow; I vacuumed it again and now it's ok.
Until now the db had been working fine; it's 4 months old, with two
updates per week, and I vacuum about once per month.

I am using version 7.3; do I need to upgrade to 7.4? Also, I was
thinking about putting this table into a kind of "read-only" mode to
improve performance. Is this possible?

Thank you for your help
Edoardo Ceccarelli

Re: slow seqscan

From
Christopher Kings-Lynne
Date:
Hi Edoardo,

> The table is well indexed, so that most queries are executed with an
> index scan, but since there is a big text field in the table (360 chars),
> some search operations (with certain filters) end up as seq scans.

Please paste the exact SELECT query that uses a seqscan, plus the
EXPLAIN ANALYZE of the SELECT, and the psql output of \d <table>.

> This table is not written to during normal operation: twice per week a
> batch program inserts about 35.000 records and updates another 40.000.

After such an update, you need to run VACUUM ANALYZE <table>;  Run it
before the update as well, if it doesn't take that long.

> Last Friday morning, after that batch had been executed, the database
> started responding really slowly to queries (especially seq scans);
> after a VACUUM FULL ANALYZE things got somewhat better.
> Yesterday the same: before the batch everything was perfect, afterwards
> every query was really slow; I vacuumed it again and now it's ok.
> Until now the db had been working fine; it's 4 months old, with two
> updates per week, and I vacuum about once per month.

You need to VACUUM ANALYZE (NOT FULL) once an HOUR, not once a month.
Add this command to your crontab to run once an hour and verify that
it's working:

vacuumdb -a -z -q
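
As a sketch, the crontab entry could look like the following (the user
it runs as and whether you need a full path to vacuumdb depend on your
installation, so treat those as assumptions):

```
# min hour dom mon dow  command -- run at the top of every hour
0 * * * *  vacuumdb -a -z -q
```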

Otherwise, install the autovacuum utility found in contrib/pg_autovacuum
in the PostgreSQL source, and set it up.  It will monitor postgres and
run vacuums and analyzes when necessary.  You can then remove your cron
job.

> I am using version 7.3; do I need to upgrade to 7.4? Also, I was
> thinking about putting this table into a kind of "read-only" mode to
> improve performance. Is this possible?

There's no read-only mode that would improve performance.

Upgrading to 7.4 will more than likely improve the performance of your
database in general.  Be careful to read the upgrade notes because there
were a few incompatibilities.

Chris


Re: slow seqscan

From
Nick Barr
Date:
Edoardo Ceccarelli wrote:

> My first post to this list :)
>
> Scenario:
> I have a database used only for search queries, with a single table that
> holds about 450.000/500.000 records.
> The table is well indexed, so that most queries are executed with an
> index scan, but since there is a big text field in the table (360 chars),
> some search operations (with certain filters) end up as seq scans.
> This table is not written to during normal operation: twice per week a
> batch program inserts about 35.000 records and updates another 40.000.
>
> Last Friday morning, after that batch had been executed, the database
> started responding really slowly to queries (especially seq scans);
> after a VACUUM FULL ANALYZE things got somewhat better.
> Yesterday the same: before the batch everything was perfect, afterwards
> every query was really slow; I vacuumed it again and now it's ok.
> Until now the db had been working fine; it's 4 months old, with two
> updates per week, and I vacuum about once per month.
>
> I am using version 7.3; do I need to upgrade to 7.4? Also, I was
> thinking about putting this table into a kind of "read-only" mode to
> improve performance. Is this possible?
>
> Thank you for your help
> Edoardo Ceccarelli
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend

In general we are going to need more information, like what kind of
search filters you are using on the text field, and an EXPLAIN ANALYZE.
But can you try running the following, bearing in mind it will take a
while to complete:

REINDEX TABLE <table_name>;

From what I remember, there were issues with index space not being
reclaimed by VACUUM. I believe this was fixed in 7.4. Because the space
is not reclaimed, the indexes grow larger and larger over time, causing
PG to prefer a sequential scan over an index scan (I think).


Hope that helps

Nick



Re: slow seqscan

From
Edoardo Ceccarelli
Date:
>
> In general we are going to need more information, like what kind of
> search filters you are using on the text field, and an EXPLAIN ANALYZE.
> But can you try running the following, bearing in mind it will take a
> while to complete:
>
> REINDEX TABLE <table_name>;
>
> From what I remember, there were issues with index space not being
> reclaimed by VACUUM. I believe this was fixed in 7.4. Because the space
> is not reclaimed, the indexes grow larger and larger over time, causing
> PG to prefer a sequential scan over an index scan (I think).
>
>

The query is this:
SELECT *, oid FROM annuncio400
WHERE  rubric = 'DD' AND LOWER(testo) Like LOWER('cbr%')
OFFSET 0 LIMIT 11

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric =
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=51.47..56.42 rows=11 loops=1)
   ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546) (actual time=51.47..56.40 rows=12 loops=1)
         Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~ 'cbr%'::text))
 Total runtime: 56.53 msec
(4 rows)


But the strangest thing ever is that if I change the filter to another
one that represents a smaller amount of data, it uses the index scan!!!
Check this (same table, same query, different rubric value, 'MA'):

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric =
'MA' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..6630.72 rows=9 width=546) (actual time=42.74..42.74 rows=0 loops=1)
   ->  Index Scan using rubric on annuncio400  (cost=0.00..6968.48 rows=9 width=546) (actual time=42.73..42.73 rows=0 loops=1)
         Index Cond: (rubric = 'MA'::bpchar)
         Filter: (lower((testo)::text) ~~ 'cbr%'::text)
 Total runtime: 42.81 msec
(5 rows)


Thanks for your help
Edoardo

>
>

Re: slow seqscan

From
Christopher Kings-Lynne
Date:
> dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric =
> 'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
>                                                    QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=51.47..56.42 rows=11 loops=1)
>    ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546) (actual time=51.47..56.40 rows=12 loops=1)
>          Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~ 'cbr%'::text))
>  Total runtime: 56.53 msec
> (4 rows)

What happens if you go:

CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
LOWER(testo));

or even just:

CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));

> But the strangest thing ever is that if I change the filter to another
> one that represents a smaller amount of data, it uses the index scan!!!

What's strange about that?  The less data that is going to be retrieved,
the more likely postgres is to use the index.

I suggest maybe increasing the amount of stats recorded for your rubric
column:

ALTER TABLE annuncio400 ALTER rubric SET STATISTICS 100;
ANALYZE annuncio400;

You could also try reducing the random_page_cost value in your
postgresql.conf a little, say to 3 (if it's currently 4).  That will
make postgres more likely to use index scans over seq scans.
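
As a sketch, you can also try the effect within a single session before
editing postgresql.conf (SET works for this parameter, so only your
current connection is affected):

```sql
-- lower the planner's estimate of random I/O cost for this session only
SET random_page_cost = 3;

-- then re-check which plan the planner picks
EXPLAIN ANALYZE SELECT *, oid FROM annuncio400
WHERE rubric = 'DD' AND LOWER(testo) LIKE LOWER('cbr%') OFFSET 0 LIMIT 11;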

Chris


Re: slow seqscan

From
Edoardo Ceccarelli
Date:
> What happens if you go:
>
> CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
> LOWER(testo));
>
> or even just:
>
> CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));
>
I wasn't able to create this 2-field index with lower:

dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON
annuncio400(rubric, LOWER(testo));
ERROR:  parser: parse error at or near "(" at character 71

It seems impossible to create 2-field indexes with the lower function.

The other index (on LOWER(testo) alone) does not make the query use an
index scan.


>> But the strangest thing ever is that if I change the filter to another
>> one that represents a smaller amount of data, it uses the index scan!!!
>
>
> What's strange about that?  The less data that is going to be retrieved,
> the more likely postgres is to use the index.
>
I can't understand this policy:

dba400=# SELECT count(*) from annuncio400 where rubric='DD';
 count
-------
  6753
(1 row)

dba400=# SELECT count(*) from annuncio400 where rubric='MA';
 count
-------
  2165
(1 row)

So it's using the index for 2000 rows but not for 6000? That's not such
a big difference, is it?


> I suggest maybe increasing the amount of stats recorded for your
> rubric column:
>
> ALTER TABLE annuncio400 ALTER rubric SET STATISTICS 100;
> ANALYZE annuncio400;
>
Done; almost the same, still not using the index.

> You could also try reducing the random_page_cost value in your
> postgresql.conf a little, say to 3 (if it's currently 4).  That will
> make postgres more likely to use index scans over seq scans.
>

I changed the setting in postgresql.conf and restarted the server;
nothing has changed.

What about setting this to false?
#enable_seqscan = true

thanks again
Edoardo

Re: slow seqscan

From
Edoardo Ceccarelli
Date:
I tried

enable_seqscan = false

and I'm now getting all index scans; timing has improved from 600ms to
18ms.

I'm wondering what other implications I might expect.





Re: slow seqscan

From
Christopher Kings-Lynne
Date:
> enable_seqscan = false
>
> and I'm having all index scans, timing has improved from 600ms to 18ms
>
> wondering what other implications I might expect.

Lots of really bad implications...it's really not a good idea.

Chris


Re: slow seqscan

From
Edoardo Ceccarelli
Date:
I just created a copy of the same database, and it shows that it's the
ANALYZE that's messing things up:

Slow seqscan query executed on dba400

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric =
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=46.66..51.40
rows=11 loops=1)
   ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546)
(actual time=46.66..51.38 rows=12 loops=1)
         Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~
'cbr%'::text))
 Total runtime: 51.46 msec
(4 rows)


fastest index scan query on dba400b (exact copy of dba400)


dba400b=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric =
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7058.40 rows=9 width=546) (actual time=1.36..8.18
rows=11 loops=1)
   ->  Index Scan using rubric on annuncio400  (cost=0.00..7369.42
rows=9 width=546) (actual time=1.35..8.15 rows=12 loops=1)
         Index Cond: (rubric = 'DD'::bpchar)
         Filter: (lower((testo)::text) ~~ 'cbr%'::text)
 Total runtime: 8.28 msec
(5 rows)



What about this index you suggested? It gives me a syntax error while
trying to create it:

CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
LOWER(testo));


Thanks
Edoardo

Christopher Kings-Lynne wrote:

>
>> enable_seqscan = false
>>
>> and I'm having all index scans, timing has improved from 600ms to 18ms
>>
>> wondering what other implications I might expect.
>
>
> Lots of really bad implications...it's really not a good idea.
>
> Chris
>
>

Re: slow seqscan

From
Stephan Szabo
Date:
On Wed, 21 Apr 2004, Edoardo Ceccarelli wrote:

>
> > What happens if you go:
> >
> > CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
> > LOWER(testo));
> >
> > or even just:
> >
> > CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));
> >
> I wasn't able to make this 2 field index with lower:
>
> dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON
> annuncio400(rubric, LOWER(testo));
> ERROR:  parser: parse error at or near "(" at character 71

That's a 7.4 feature, I think (and I think the version with two columns
may need extra parentheses around the lower()). I think the only way to
do something equivalent in 7.3 is to make a function that concatenates
the two in some fashion, after applying lower() to the one part, and
then use that in the queries as well.  Plus, if you're not in the "C"
locale, I'm not sure it'd help in 7.3 anyway.

> >> But the strangest thing ever is that if I change the filter to another
> >> one that represents a smaller amount of data, it uses the index scan!!!
> >
> >
> > What's strange about that?  The less data that is going to be retrieved,
> > the more likely postgres is to use the index.
> >
> I can't understand this policy:
>
> dba400=# SELECT count(*) from annuncio400 where rubric='DD';
>  count
> -------
>   6753
> (1 row)
>
> dba400=# SELECT count(*) from annuncio400 where rubric='MA';
>  count
> -------
>   2165
> (1 row)
>
> So it's using the index for 2000 rows but not for 6000? That's not such
> a big difference, is it?

It's a question of how many pages it thinks it's going to have to
retrieve in order to handle the request.  If it, say, needs (or thinks
it needs) to retrieve 50% of the pages, then given a random_page_cost of
4, it's going to expect the index scan to be about twice the cost.

Generally speaking, one good way to compare is to run the query with
EXPLAIN ANALYZE, then change parameters like enable_seqscan, run it with
EXPLAIN ANALYZE again, and compare the estimated rows and costs.
That'll give an idea of how it expects the two versions of the query to
compare speed-wise.
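
To make the arithmetic above concrete, here's a toy sketch in Python of
the page-cost trade-off (the page counts are made up, and the real
planner model includes further terms such as per-tuple CPU costs):

```python
# Toy model of the planner trade-off described above: a seq scan reads
# every page at sequential cost, while an index scan pays
# random_page_cost for each page it fetches.
SEQ_PAGE_COST = 1.0      # cost of one sequentially-read page
RANDOM_PAGE_COST = 4.0   # default random_page_cost

def seq_scan_cost(total_pages: int) -> float:
    """Cost to read the whole table sequentially."""
    return total_pages * SEQ_PAGE_COST

def index_scan_cost(pages_fetched: int) -> float:
    """Cost to fetch the needed pages via random I/O."""
    return pages_fetched * RANDOM_PAGE_COST

total = 10_000          # hypothetical table size, in pages
fetched = total // 2    # filter expected to touch 50% of the pages

print(seq_scan_cost(total))      # 10000.0
print(index_scan_cost(fetched))  # 20000.0 -> about twice the seq scan
```

With these numbers the index scan looks twice as expensive even though
it touches half the pages, which is why the planner picks the seq scan
for the less selective rubric value.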

Re: slow seqscan

From
Edoardo Ceccarelli
Date:
>>can't understand this policy:
>>
>>dba400=# SELECT count(*) from annuncio400 where rubric='DD';
>> count
>>-------
>>  6753
>>(1 row)
>>
>>dba400=# SELECT count(*) from annuncio400 where rubric='MA';
>> count
>>-------
>>  2165
>>(1 row)
>>
>>So it's using the index for 2000 rows but not for 6000? That's not such
>>a big difference, is it?
>>
>>
>
>It's a question of how many pages it thinks it's going to have to
>retrieve in order to handle the request.  If it, say, needs (or thinks
>it needs) to retrieve 50% of the pages, then given a random_page_cost of
>4, it's going to expect the index scan to be about twice the cost.
>
>Generally speaking, one good way to compare is to run the query with
>EXPLAIN ANALYZE, then change parameters like enable_seqscan, run it with
>EXPLAIN ANALYZE again, and compare the estimated rows and costs.
>That'll give an idea of how it expects the two versions of the query to
>compare speed-wise.
>
>
>
>
Ok, then how do you explain this?
I just created a copy of the same database.

Slow seqscan query executed on dba400

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric =
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=46.66..51.40 rows=11 loops=1)
   ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546) (actual time=46.66..51.38 rows=12 loops=1)
         Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~ 'cbr%'::text))
 Total runtime: 51.46 msec
(4 rows)


fastest index scan query on dba400b (exact copy of dba400)


dba400b=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric =
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7058.40 rows=9 width=546) (actual time=1.36..8.18 rows=11 loops=1)
   ->  Index Scan using rubric on annuncio400  (cost=0.00..7369.42 rows=9 width=546) (actual time=1.35..8.15 rows=12 loops=1)
         Index Cond: (rubric = 'DD'::bpchar)
         Filter: (lower((testo)::text) ~~ 'cbr%'::text)
 Total runtime: 8.28 msec
(5 rows)


Anyway, shall I try to lower the random_page_cost value, since I get an
index scan? I mean, in my case I've already noticed that with an index
scan the query executes in 1/10 of the seqscan time.

Thank you
Edoardo

Re: slow seqscan

From
Tom Lane
Date:
Edoardo Ceccarelli <eddy@axa.it> writes:
> I wasn't able to create this 2-field index with lower:

> dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON
> annuncio400(rubric, LOWER(testo));
> ERROR:  parser: parse error at or near "(" at character 71

> It seems impossible to create 2-field indexes with the lower function.

You need 7.4 to do that; previous releases don't support multi-column
functional indexes.
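
[For reference, on 7.4 the expression form that should work puts its own
parentheses around the function call inside the column list; a sketch,
untested against this schema:

CREATE INDEX annuncio400_rubric_testo_idx
    ON annuncio400 (rubric, (lower(testo)));
]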

            regards, tom lane