Thread: Enabling and Disabling Sequential Scan

Enabling and Disabling Sequential Scan

From
Yusuf
Date:
In the application that I'm working on, I have a query that'll be a lot
(60%) faster if I disable sequential scan, forcing it to use my index.

Is it bad practice to disable sequential scan ( set
enable_seqscan=false), run my query then enable sequential scan,
whenever I'm running this query?  Why?
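
Concretely, the pattern I mean looks like this (the SELECT is just a
placeholder for my real query):

    SET enable_seqscan = false;
    SELECT * FROM my_table WHERE indexed_col = 42;  -- the query that wants the index
    SET enable_seqscan = true;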

Thanks in advance

- David Wendy


Re: Enabling and Disabling Sequential Scan

From
"scott.marlowe"
Date:
On Fri, 30 May 2003, Yusuf wrote:

> In the application that I'm working on, I have a query that'll be a lot
> (60%) faster if I disable sequential scan, forcing it to use my index.
>
> Is it bad practice to disable sequential scan ( set
> enable_seqscan=false), run my query then enable sequential scan,
> whenever I'm running this query?  Why?

Setting enable_seqscan to off is more of a troubleshooting tool than a tuning
tool, albeit sometimes it's the only tuning tool that MIGHT work.

Once you've determined that the database is picking the wrong plan when
you turn seqscan back on, you need to figure out how to convince the
database to use the right plan more often.

The best parameters to change and see how they affect this are the
*cost* parameters and the effective cache size.

Running "show all;" will show them to you; the ones we're interested in are these:

NOTICE:  effective_cache_size is 100000
NOTICE:  random_page_cost is 1
NOTICE:  cpu_tuple_cost is 0.01
NOTICE:  cpu_index_tuple_cost is 0.0001
NOTICE:  cpu_operator_cost is 0.0025

To change them for one session, just use the set command.  To make the
changes the permanent default, edit the $PGDATA/postgresql.conf file.
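
For example (the values below are purely illustrative, not
recommendations):

    -- for one session, from psql:
    SET effective_cache_size = 100000;
    SET random_page_cost = 2;

    # or as permanent defaults, in $PGDATA/postgresql.conf:
    effective_cache_size = 100000
    random_page_cost = 2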

effective_cache_size tells the planner about how big the kernel's file
level cache is.  On my machine it's about 800 meg. It's measured in 8k
blocks, so 100,000 * 8k ~ 800 meg.  The smaller this is, the more likely
the database will have to access the hard drive, and therefore the more
likely it will pick a seqscan if the other numbers point to it.
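
For instance, if the kernel cache under load were around 800 meg (just an
assumption for illustration), that's roughly 800 * 1024 / 8 = 102,400 8k
blocks, so a setting in the neighborhood of:

    SET effective_cache_size = 100000;  -- ~800 meg of kernel cache, in 8k blocks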

random_page_cost tells the planner how much more a random page access
costs.  The default is 4.  Most systems seem to work well with numbers
from 1 to 2.

Lowering the cpu_index_tuple_cost also favors index scans.



Re: Enabling and Disabling Sequential Scan

From
Robert Creager
Date:
On Fri, 30 May 2003 14:46:12 -0600 (MDT)
"scott.marlowe" <scott.marlowe@ihs.com> said something like:

>
> > level cache is.  On my machine it's about 800 meg. It's measured in 8k
> > blocks, so 100,000 * 8k ~ 800 meg.  The smaller this is, the more

Any thoughts on how to figure this out (disk buffer size)?  For some
reason, my system (2xAMD 2800+, 2Gb RAM, kernel 2.4.21 - /proc/meminfo) only
shows 88kb of 'Buffers' usage, and that never changes.  My 'Cached' usage
is 1.7Gb.  I've hit the kernel mailing list, and the one response I got
said don't worry about it :-(

Cheers,
Rob


Re: Enabling and Disabling Sequential Scan

From
"Nikolaus Dilger"
Date:
David,

I say go ahead and use it since you get a significant
performance gain.  This is a special case where you
know more about your data than the planner does with
general system wide settings.  In Oracle you could use
"hints".  Since there are no hints in PostgreSQL,
disabling and re-enabling an option just before and
after a query has the same effect.

Regards,
Nikolaus

On Fri, 30 May 2003 16:33:07 -0400, Yusuf wrote:

>
> In the application that I'm working on, I have a query that'll be a lot
> (60%) faster if I disable sequential scan, forcing it to use my index.
>
> Is it bad practice to disable sequential scan ( set
> enable_seqscan=false), run my query then enable sequential scan,
> whenever I'm running this query?  Why?
>
> Thanks in advance
>
> - David Wendy

why Sequential Scan when selecting on primary key of table?

From
"Brian Tarbox"
Date:
I have a simple table with a dozen integer fields and a primary key.

When I say "explain select * from Patient where Patient_primary_key = 100"

I get sequential scan.

I've just converted my application from MySQL and am seeing everything run
about 3X slower.  What do I have to do to get postgres to use indexes?

Brian Tarbox


Re: why Sequential Scan when selecting on primary key of table?

From
Andrew Sullivan
Date:
On Sat, May 31, 2003 at 12:30:40PM -0400, Brian Tarbox wrote:
> I have a simple table with a dozen integer fields and a primary key.
>
> When I say "explain select * from Patient where Patient_primary_key = 100"
>
> I get sequential scan.
>
> I've just converted my application from MySQL and am seeing everything run
> about 3X slower.  What do I have to do to get postgres to use indexes?

Usual questions: have you vacuumed?  EXPLAIN ANALYSE output, schema,
&c.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: why Sequential Scan when selecting on primary key of table?

From
Tom Lane
Date:
"Brian Tarbox" <btarbox@theworld.com> writes:
> When I say "explain select * from Patient where Patient_primary_key = 100"
> I get sequential scan.

Perhaps Patient_primary_key is not an integer field?  If not, you need
to cast the constant 100 to the right type.  Or write '100' with
single quotes around it, which leaves Postgres to choose the constant's
datatype.  (Yeah, I know, it's a pain in the neck.  We've had a lot of
discussions about how to fix this without breaking datatype extensibility;
no luck so far.)
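
For illustration only, supposing the column were int8 (it may not be),
either of these forms lets the index be considered:

    SELECT * FROM Patient WHERE Patient_primary_key = 100::int8;
    SELECT * FROM Patient WHERE Patient_primary_key = '100';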

            regards, tom lane

Re: why Sequential Scan when selecting on primary key of table?

From
"Brian Tarbox"
Date:
The primary key field is an integer and I have performed vacuum analyse but
that does not seem to change anything.

I've also heard that postgres will not use indexes when JOINing tables.  Can
that really be true??

Brian




Re: why Sequential Scan when selecting on primary key of table?

From
Rod Taylor
Date:
On Sat, 2003-05-31 at 13:13, Tom Lane wrote:
> "Brian Tarbox" <btarbox@theworld.com> writes:
> > When I say "explain select * from Patient where Patient_primary_key = 100"
> > I get sequential scan.
>
> Perhaps Patient_primary_key is not an integer field?  If not, you need
> to cast the constant 100 to the right type.  Or write '100' with
> single quotes around it, which leaves Postgres to choose the constant's
> datatype.

Out of curiosity, why don't we confirm the unquoted value is an integer,
numeric, etc., then change it into type 'unknown'?  From that point
forward it would be treated like its quoted counterpart.

Is this noticeably slower or am I missing something?

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc


Re: why Sequential Scan when selecting on primary key of table?

From
Tom Lane
Date:
"Brian Tarbox" <btarbox@theworld.com> writes:
> The primary key field is an integer and I have performed vacuum analyse but
> that does not seem to change anything.

Hm.  So how big is the table, exactly?  On small tables a seqscan will
be preferred because the extra I/O to examine the index costs more than
the CPU to examine all the tuples on a disk page.

> I've also heard that postgres will not use indexes when JOINing tables.  Can
> that really be true??

We have some join methods that like indexes and we have some that find
no benefit in 'em.  Again, testing on toy-size tables is not a good
guide to what will happen on larger tables.

            regards, tom lane

Re: why Sequential Scan when selecting on primary key of table?

From
Andrew Sullivan
Date:
On Sat, May 31, 2003 at 01:45:50PM -0400, Brian Tarbox wrote:
> The primary key field is an integer and I have performed vacuum analyse but
> that does not seem to change anything.

int4?  int8?  int2?  Makes a difference.  Please post the results of
EXPLAIN ANALYSE on the query you're having trouble with, and someone
may be able to help you.  (You'll need to show us the table, too.)

> I've also heard that postgres will not use indexes when JOINing tables.  Can
> that really be true??

No.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: why Sequential Scan when selecting on primary key of table?

From
Tom Lane
Date:
Rod Taylor <rbt@rbt.ca> writes:
> Out of curiosity, why don't we confirm the unquoted value is an integer,
> numeric, etc, then change it into type 'unknown'?

UNKNOWNNUMERIC is one of the ideas that's been proposed, but it's not
clear to me that it is better than other alternatives.  In particular,
I don't like losing knowledge of the form and size of the constant.
Something like "WHERE int4col = 4.8" should yield FALSE, not "ERROR:
pg_atoi: unable to parse '4.8'" which is what you're likely to get with
a naive "unknown numeric type" approach.  A perhaps-more-realistic
objection is that it only solves the problem for trivial "var = const"
cases.  As soon as you look at even slightly more complicated
expressions, it stops doing much good.

I'm still of the opinion that the best solution in the long run is to
get rid of most of the cross-datatype numeric operators, but there are
pitfalls there too.  See last thread on the issue:
http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php

            regards, tom lane

are views typically any faster/slower than equivalent joins?

From
"Brian Tarbox"
Date:
I am working with a highly normalized database.  Most of my meaningful
queries involve joins from a primary table to 4-6 secondary tables.

Would my query performance be significantly faster/slower using a View as
opposed to a prepared query using join?

(Assume all join fields are ints, say 10,000 records in main table and a few
dozen records in each of the secondary tables).

Thank you.

Brian Tarbox


Re: are views typically any faster/slower than equivalent joins?

From
Rod Taylor
Date:
> Would my query performance be significantly faster/slower using a View as
> opposed to a prepared query using join?

I missed this part.  Views and prepared queries are not the same thing.
Use of a view still needs to be optimized.

Prepared queries will run the optimization portion on the entire query
including the view segments of it.  Think of a view as a MACRO.
Depending on the context of what surrounds it, the view may be executed
very differently.
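
For instance (table and column names invented), a query against a view is
planned as if its definition had been pasted inline:

    CREATE VIEW patient_info AS
      SELECT p.*, d.name AS doctor_name
        FROM patient p JOIN doctor d ON d.id = p.doctor_id;

    SELECT * FROM patient_info WHERE patient_id = 42;
    -- planned the same as writing the join and the WHERE clause directly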

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc


Re: are views typically any faster/slower than equivalent joins?

From
Rod Taylor
Date:
On Sat, 2003-05-31 at 22:55, Brian Tarbox wrote:
> I am working with a highly normalized database.  Most of my meaningful
> queries involve joins from a primary table to 4-6 secondary tables.
>
> Would my query performance be significantly faster/slower using a View as
> opposed to a prepared query using join?

There are some corner cases where a view would be slower than a standard
query in 7.3 (bug fix / disabled optimization -- fixed right in 7.4),
but generally you can assume it will be about the same speed.


Some views such as unions will not be as fast as you would like, but
that's a general issue with PostgreSQL's inability to throw away selects
when it won't find results on one side of a union.

CREATE VIEW sales AS
  SELECT * FROM sales_archive_2002
  UNION ALL
  SELECT * FROM sales_current;

SELECT * FROM sales
 WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 day';

The above query would not be so quick.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc


Re: are views typically any faster/slower than equivalent joins?

From
Bruno Wolff III
Date:
On Sun, Jun 01, 2003 at 00:02:39 -0400,
  Rod Taylor <rbt@rbt.ca> wrote:
>
> Some views such as unions will not be as fast as you would like, but
> that's a general issue with PostgreSQL's inability to throw away selects
> when it won't find results on one side of a union.
>
> CREATE VIEW sales AS
>   SELECT * FROM sales_archive_2002
>   UNION ALL
>   SELECT * FROM sales_current;
>
> SELECT * FROM sales
>  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 day';
>
> The above query would not be so quick.

I thought some work had been done on pushing where conditions down into
unions?  If so the above wouldn't be too bad.  It would still look at
the archive table, but it should return no rows relatively quickly,
assuming an appropriate index exists.

Re: are views typically any faster/slower than equivalent joins?

From
Rod Taylor
Date:
On Sun, 2003-06-01 at 01:43, Bruno Wolff III wrote:
> On Sun, Jun 01, 2003 at 00:02:39 -0400,
>   Rod Taylor <rbt@rbt.ca> wrote:
> >
> > Some views such as unions will not be as fast as you would like, but
> > that's a general issue with PostgreSQL's inability to throw away selects
> > when it won't find results on one side of a union.
> >
> > CREATE VIEW sales AS
> >   SELECT * FROM sales_archive_2002
> >   UNION ALL
> >   SELECT * FROM sales_current;
> >
> > SELECT * FROM sales
> >  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 day';
> >
> > The above query would not be so quick.
>
> I thought some work had been done on pushing where conditions down into
> unions?  If so the above wouldn't be too bad.  It would still look at
> the archive table, but it should return no rows relatively quickly,
> assuming an appropriate index exists.

Certainly, if the index exists it won't be so bad (for any single
archive table).  It's when the index doesn't exist or there are several
hundred archive tables then it starts to get a little worse.

Anyway, anyone doing the above in PostgreSQL should probably be looking
at partial indexes and merging the information back into a single table
again.
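
A sketch of the partial-index idea, with invented names and dates:

    CREATE INDEX sales_2003_idx ON sales (timestamp)
      WHERE timestamp >= '2003-01-01';

    -- a query whose WHERE clause implies the index predicate can use it:
    SELECT * FROM sales WHERE timestamp >= '2003-05-01';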

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc


Re: Enabling and Disabling Sequential Scan

From
"scott.marlowe"
Date:
On Fri, 30 May 2003, Robert Creager wrote:

> On Fri, 30 May 2003 14:46:12 -0600 (MDT)
> "scott.marlowe" <scott.marlowe@ihs.com> said something like:
>
> >
> > level cache is.  On my machine it's about 800 meg. It's measured in 8k
> > blocks, so 100,000 * 8k ~ 800 meg.  The smaller this is, the more
>
> Any thoughts on how to figure this out (disk buffer size)?  For some
> reason, my system (2xAMD 2800+, 2Gb RAM, kernel 2.4.21 - /proc/meminfo) only
> shows 88kb of 'Buffers' usage, and that never changes.  My 'Cached' usage
> is 1.7Gb.  I've hit the kernel mailing list, and the one response I got
> said don't worry about it :-(

Are you sure that's not 88213Kb or so of buffers?  88kb is awfully small.

It's normal to have a cache size many times larger than the buffer size.
Buffers are assigned to individual disks, and sit under the larger single
pool that is the cache.

I just take the approximate size of the cache under load and use that for
the effective_cache_size, since it's pretty much a "fudge factor"
variable anyway.

P.S. My use of the term fudge factor here is in no way meant to be
derogatory.  It's just that as long as the effective cache size is within
some reasonable range of the actual cache/buffer in the machine, it'll be
close enough to push the query planner in the right direction.

Note that you can adopt two philosophies on the planner.  One is that the
planner will always make certain mistakes, and you've got to fool it in
order to get the right query plan.

The other philosophy is that you give the query planner all the variables
you can reasonably give it to let it decide the proper course of action,
and you fine tune each one so that eventually it makes the right choice
for all the querys you throw at it.

The first philosophy provides the fastest functional solutions on a static
platform (i.e. "we're running postgresql 7.0.2 and aren't going to
upgrade"), but it kind of concedes that the query planner can never be
given the right information and programmed with the right code to make the
right decision 99% of the time, and to be only a little wrong when it does
make the wrong decision.

The second choice will require you to spend more time fine tuning all the
parameters fed to the query planner with your own queries using explain
analyze and repeated testing with different settings.

What I look for are the corner cases.  I.e. if I do some select that
returns 500 records with a seq scan, and it takes 5 seconds, and with 450
records it switches to index scan and takes 1 second, then likely the
planner is choosing to switch to seq scans too quickly when I raise the
result size from 450 to 500.

At this point, use the enable_seqscan setting to test the database's
performance with it on and off as you increase the result-set size.
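
For example (table, column, and row counts hypothetical):

    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT * FROM t WHERE val < 500;
    SET enable_seqscan = on;
    EXPLAIN ANALYZE SELECT * FROM t WHERE val < 500;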

Somewhere around 2,000 rows in this scenario, we'll notice that the seq
scan now has the same speed as the index scan, and as we raise the number
of rows we are getting, the index scan becomes slower than the seq scan.

Assuming we set effective_cache_size right at the beginning, we can now
turn enable_seqscan back on and adjust the default cost options until the
planner chooses a seq scan at the break point we found (in our imaginary
case of 2,000).  It doesn't have to be perfect, since the performance at
or around the break point is similar for index and seq scans alike.

Then, throw the next query at it and see how it does.

I've found that on fast machines, it's good to lower the cpu costs,
especially the index one.  I usually drop these by a divisor of 2 to 10.
For the random_page_cost, settings of 1.x to 2.x seem a good choice for
fast I/O subsystems.
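
As a sketch rather than a recommendation, starting from defaults like the
ones shown earlier:

    SET random_page_cost = 1.5;
    SET cpu_index_tuple_cost = 0.00005;  -- i.e. 0.0001 divided by 2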


Re: Enabling and Disabling Sequential Scan

From
"scott.marlowe"
Date:
On Fri, 30 May 2003, Robert Creager wrote:

> On Fri, 30 May 2003 14:46:12 -0600 (MDT)
> "scott.marlowe" <scott.marlowe@ihs.com> said something like:
>
> >
> > level cache is.  On my machine it's about 800 meg. It's measured in 8k
> > blocks, so 100,000 * 8k ~ 800 meg.  The smaller this is, the more
>
> My 'Cached' usage is 1.7Gb.  I've hit the kernel mailing list, and the
> one response I got said don't worry about it :-(

Oh yeah, just a bit on that.  As far as the kernel developers are
concerned, the buffer / cache is working perfectly, and they're right, it
is.  What they probably don't understand is your need to tell postgresql
how much cache/buffer is allocated to it.

So don't worry about the kernel; the Linux kernel really is pretty good at
caching disk access.