Thread: disabling seq scans

disabling seq scans

From

"Marcus Andree S. Magalhaes"

Date:

01 April 2004, 16:29:06

Hi, guys,

We're experiencing a little problem with one of our queries.
It isn't using an index specially created for it. When we
disable seq scans with psql, we can ensure the query finishes
much faster than without using index, as it should be.

So, whats the best procedure in this case, but when have a
JDBC based client? Do we mess around with planner
settings even when all other queries are using the best
index for them?

Is it safe (but some may find ugly) to issue a command to
disable seq scanning from the java side?

Since we're using the pooled connection classes that comes
with the JDBC3 driver, once a connection is got from the pool,
do we need to explicitly set seq scanning to true? This is
assuming the later option is the more recommended one...

TIA

Re: disabling seq scans

From

Sean Shanny

Date:

01 April 2004, 16:47:16

I do the following in several reports I run.....

statement = m_conn.createStatement();
statement.executeUpdate( "set enable_seqscan = false" );
do your thing....
statement.executeUpdate( "set enable_seqscan = true" );

--sean

Marcus Andree S. Magalhaes wrote:

>Hi, guys,
>
>We're experiencing a little problem with one of our queries.
>It isn't using an index specially created for it. When we
>disable seq scans with psql, we can ensure the query finishes
>much faster than without using index, as it should be.
>
>So, whats the best procedure in this case, but when have a
>JDBC based client? Do we mess around with planner
>settings even when all other queries are using the best
>index for them?
>
>Is it safe (but some may find ugly) to issue a command to
>disable seq scanning from the java side?
>
>Since we're using the pooled connection classes that comes
>with the JDBC3 driver, once a connection is got from the pool,
>do we need to explicitly set seq scanning to true? This is
>assuming the later option is the more recommended one...
>
>TIA
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 8: explain analyze is your friend
>
>
>

Re: disabling seq scans

From

"Marcus Andree S. Magalhaes"

Date:

01 April 2004, 17:12:56

Yes, that is what I thought 1st...
But, since we're using pooled connections, we must be extra
careful and ensure all connections have the enable_seqscan
reset to its default.
So, the main question is:

Do the pooling system resets the "enable_seqscan" variable
when the connections are delivered to the clients or we
*must* do it ourselves?

Another issue: when enable_seqscan is false, I found that
the sequential scanning is performed normally (as in
"select count (*) from <table>") but have them any performance
issues?

Last one... This time, specifically to the java side: what
do you think about adding new methods, say, setEnableSeqScan(boolean)
and getEnableSeqScan() to our (by our I mean
postgres) drivers?

> I do the following in several reports I run.....
>
> statement = m_conn.createStatement();
> statement.executeUpdate( "set enable_seqscan = false" );
> do your thing....
> statement.executeUpdate( "set enable_seqscan = true" );
>
> --sean
>
> Marcus Andree S. Magalhaes wrote:
>
>>Hi, guys,
>>
>>We're experiencing a little problem with one of our queries.
>>It isn't using an index specially created for it. When we
>>disable seq scans with psql, we can ensure the query finishes
>>much faster than without using index, as it should be.
>>
>>So, whats the best procedure in this case, but when have a
>>JDBC based client? Do we mess around with planner
>>settings even when all other queries are using the best
>>index for them?
>>
>>Is it safe (but some may find ugly) to issue a command to
>>disable seq scanning from the java side?
>>
>>Since we're using the pooled connection classes that comes
>>with the JDBC3 driver, once a connection is got from the pool,
>>do we need to explicitly set seq scanning to true? This is
>>assuming the later option is the more recommended one...
>>
>>TIA
>>
>>
>>
>>---------------------------(end of
>> broadcast)--------------------------- TIP 8: explain analyze is your
>> friend
>>
>>
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
>       joining column's datatypes do not match

Re: disabling seq scans

From

Sean Shanny

Date:

01 April 2004, 17:32:14

See answers in-line....

Marcus Andree S. Magalhaes wrote:

>Yes, that is what I thought 1st...
>But, since we're using pooled connections, we must be extra
>careful and ensure all connections have the enable_seqscan
>reset to its default.
>So, the main question is:
>
>Do the pooling system resets the "enable_seqscan" variable
>when the connections are delivered to the clients or we
>*must* do it ourselves?
>
>
We use pooling as well and there is NO logic in the pool that would
reset anything on a connection.  It is up to the user of the connection
to set it back to an acceptable state, hence the call to set it back to
true prior to returning the connection back to the pool.

>Another issue: when enable_seqscan is false, I found that
>the sequential scanning is performed normally (as in
>"select count (*) from <table>") but have them any performance
>issues?
>
>
Not sure I understand you question.  I set enable_indexscan=false in my
code as I am working with a 350GB DB with some tables having 100 million
rows.  With the hardware we have it is sometimes faster to do a full
table scan the bounce around with indexes.

>Last one... This time, specifically to the java side: what
>do you think about adding new methods, say, setEnableSeqScan(boolean)
>and getEnableSeqScan() to our (by our I mean
>postgres) drivers?
>
>
This would be up the folks that write the code, I am just a user. :-)

>
>
>>I do the following in several reports I run.....
>>
>>statement = m_conn.createStatement();
>>statement.executeUpdate( "set enable_seqscan = false" );
>>do your thing....
>>statement.executeUpdate( "set enable_seqscan = true" );
>>
>>--sean
>>
>>Marcus Andree S. Magalhaes wrote:
>>
>>
>>
>>>Hi, guys,
>>>
>>>We're experiencing a little problem with one of our queries.
>>>It isn't using an index specially created for it. When we
>>>disable seq scans with psql, we can ensure the query finishes
>>>much faster than without using index, as it should be.
>>>
>>>So, whats the best procedure in this case, but when have a
>>>JDBC based client? Do we mess around with planner
>>>settings even when all other queries are using the best
>>>index for them?
>>>
>>>Is it safe (but some may find ugly) to issue a command to
>>>disable seq scanning from the java side?
>>>
>>>Since we're using the pooled connection classes that comes
>>>with the JDBC3 driver, once a connection is got from the pool,
>>>do we need to explicitly set seq scanning to true? This is
>>>assuming the later option is the more recommended one...
>>>
>>>TIA
>>>
>>>
>>>
>>>---------------------------(end of
>>>broadcast)--------------------------- TIP 8: explain analyze is your
>>>friend
>>>
>>>
>>>
>>>
>>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 9: the planner will ignore your desire to choose an index scan if
>>your
>>      joining column's datatypes do not match
>>
>>
>
>
>
>
>
>

Re: disabling seq scans

From

"scott.marlowe"

Date:

01 April 2004, 17:36:52

On Thu, 1 Apr 2004, Marcus Andree S. Magalhaes wrote:

>
> Hi, guys,
>
> We're experiencing a little problem with one of our queries.
> It isn't using an index specially created for it. When we
> disable seq scans with psql, we can ensure the query finishes
> much faster than without using index, as it should be.
>
> So, whats the best procedure in this case, but when have a
> JDBC based client? Do we mess around with planner
> settings even when all other queries are using the best
> index for them?
>
> Is it safe (but some may find ugly) to issue a command to
> disable seq scanning from the java side?
>
> Since we're using the pooled connection classes that comes
> with the JDBC3 driver, once a connection is got from the pool,
> do we need to explicitly set seq scanning to true? This is
> assuming the later option is the more recommended one...

What's likely happening is that the queries that are choosing the right
plan (i.e. the other queries) are such obviously bad candidates for a seq
scan that they still get an index scan.  It might be that as you increase
the % of the table being read by those other queries that they might
switch to a seq scan too soon for your setup.

I'm assuming you've read the tuning guide here:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

And upped shared buffers, effect_cache_size, etc...

random page costs of 1.0 to 1.4 make sense for certain setups with lots of
RAM and medium to small datasets that can (mostly) fit in memory.  On our
machine with 2G ram at work, that works out to about 1.2 to 1.3 as
optimal.

Re: disabling seq scans

From

Kris Jurka

Date:

01 April 2004, 18:40:07

On Thu, 1 Apr 2004, Marcus Andree S. Magalhaes wrote:

> Last one... This time, specifically to the java side: what
> do you think about adding new methods, say, setEnableSeqScan(boolean)
> and getEnableSeqScan() to our (by our I mean
> postgres) drivers?
>

I don't see this as a useful general purpose method worth adding.  What
about all the other GUC parameters?  There was a discussion a while ago
about making these and other variables configurable through the connection
URL or datasource, but that would be to establish defaults, not to modify
it on the fly.

http://archives.postgresql.org/pgsql-jdbc/2004-02/msg00022.php

Kris Jurka

Re: disabling seq scans

From

"Marcus Andree S. Magalhaes"

Date:

01 April 2004, 18:51:24

<snip>

> We use pooling as well and there is NO logic in the pool that would
> reset anything on a connection.  It is up to the user of the connection
> to set it back to an acceptable state, hence the call to set it back to
> true prior to returning the connection back to the pool.
>

Ok. So we should set the connections to a default, accepted state
before using it.

>>Another issue: when enable_seqscan is false, I found that
>>the sequential scanning is performed normally (as in
>>"select count (*) from <table>") but have them any performance
>>issues?
>>
>>
> Not sure I understand you question.  I set enable_indexscan=false in my
> code as I am working with a 350GB DB with some tables having 100 million
>  rows.  With the hardware we have it is sometimes faster to do a full
> table scan the bounce around with indexes.
>

Let me try to explain better. I understand that, in some circumstances,
a seq scan can be faster than using indexes, so I'm talking about a
more general, or theoretical point of view.

Consider two queries, identical and causing a sequential scan on a
more or less large table (we have about ~150k records). The system
they're running differ
only in a single setting: enable_indexscan is false in one system and
true in the other. Which one is faster? Or they should _not_ be
affected by this setting?

Sorry if it sounded like a 4th grade question, but I believe I
made my point clearer now. ;-)


>
>>
>>
>>>I do the following in several reports I run.....
>>>
>>>statement = m_conn.createStatement();
>>>statement.executeUpdate( "set enable_seqscan = false" );
>>>do your thing....
>>>statement.executeUpdate( "set enable_seqscan = true" );
>>>
>>>--sean
>>>
>>>Marcus Andree S. Magalhaes wrote:
>>>
>>>
>>>
>>>>Hi, guys,
>>>>
>>>>We're experiencing a little problem with one of our queries.
>>>>It isn't using an index specially created for it. When we
>>>>disable seq scans with psql, we can ensure the query finishes
>>>>much faster than without using index, as it should be.
>>>>
>>>>So, whats the best procedure in this case, but when have a
>>>>JDBC based client? Do we mess around with planner
>>>>settings even when all other queries are using the best
>>>>index for them?
>>>>
>>>>Is it safe (but some may find ugly) to issue a command to
>>>>disable seq scanning from the java side?
>>>>
>>>>Since we're using the pooled connection classes that comes
>>>>with the JDBC3 driver, once a connection is got from the pool,
>>>>do we need to explicitly set seq scanning to true? This is
>>>>assuming the later option is the more recommended one...
>>>>
>>>>TIA
>>>>
>>>>
>>>>
>>>>---------------------------(end of
>>>>broadcast)--------------------------- TIP 8: explain analyze is your
>>>> friend
>>>>
>>>>
>>>>
>>>>
>>>>
>>>---------------------------(end of
>>> broadcast)--------------------------- TIP 9: the planner will ignore
>>> your desire to choose an index scan if your
>>>      joining column's datatypes do not match
>>>
>>>
>>
>>
>>
>>
>>
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend

Re: disabling seq scans

From

"Marcus Andree S. Magalhaes"

Date:

01 April 2004, 18:53:19

Thanks, Kris.

I remembr this thread. Searched the archives for a hint but found
nothing that could happen.

Your point is clear and I believe a local solution is the best answer
for this problem, be it messing around with planner settings or
by sending a "set enable_seqscan=false" to the backend.
>
>
> On Thu, 1 Apr 2004, Marcus Andree S. Magalhaes wrote:
>
>> Last one... This time, specifically to the java side: what
>> do you think about adding new methods, say, setEnableSeqScan(boolean)
>> and getEnableSeqScan() to our (by our I mean
>> postgres) drivers?
>>
>
> I don't see this as a useful general purpose method worth adding.  What
> about all the other GUC parameters?  There was a discussion a while ago
> about making these and other variables configurable through the
> connection  URL or datasource, but that would be to establish defaults,
> not to modify  it on the fly.
>
> http://archives.postgresql.org/pgsql-jdbc/2004-02/msg00022.php
>
> Kris Jurka
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: disabling seq scans

From

"Marcus Andree S. Magalhaes"

Date:

01 April 2004, 19:00:58

<snip>
> What's likely happening is that the queries that are choosing the right
> plan (i.e. the other queries) are such obviously bad candidates for a
> seq  scan that they still get an index scan.  It might be that as you
> increase  the % of the table being read by those other queries that they
> might  switch to a seq scan too soon for your setup.
>

Agreed. This seems to be the "better of all" solution, but, in our case,
it's not that feasible. We can't reproduce the load, now, on a spare
and identical machine and begin testing differente parameters.

On the other side, we just can't restart the server to endure the proper
values are loaded and provide a standard reference to any tuning...

> I'm assuming you've read the tuning guide here:
>
> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
>

Yes. I've read them. That's why I asked if we can deal with it in
another way (sending an enable_seqscan = false) without creating
colateral damage to other parts of the program that use the
same (pooled) connection

> And upped shared buffers, effect_cache_size, etc...
>

Shared buffers was set up a couple weeks ago. Didn't change the
default values to planner-specific variables. I guess the random page
cost is set to 4. So, I think a bit weird a seq scan is choosen, after all
(yes, we do constant vacuum analyze).

> random page costs of 1.0 to 1.4 make sense for certain setups with lots
> of  RAM and medium to small datasets that can (mostly) fit in memory.
> On our  machine with 2G ram at work, that works out to about 1.2 to 1.3
> as  optimal.
>

Un-register

From

"Sanjay Singh"

Date:

01 April 2004, 23:57:19

Re: disabling seq scans

From

Tom Lane

Date:

02 April 2004, 00:16:50

"Marcus Andree S. Magalhaes" <marcus.magalhaes@vlinfo.com.br> writes:
> But, since we're using pooled connections, we must be extra
> careful and ensure all connections have the enable_seqscan
> reset to its default.

If you are concerned about failing to reset parameters, use "SET LOCAL
foo" instead of plain "SET foo" (and of course, do this inside a
transaction block).  With the LOCAL option the parameter is guaranteed
to be reset at transaction end.

> Do the pooling system resets the "enable_seqscan" variable
> when the connections are delivered to the clients or we
> *must* do it ourselves?

I think it is reasonable to expect a pooling system to deliver you a
fresh transaction.  It is not reasonable to expect it to go and issue
explicit SET commands to undo things you've done.

            regards, tom lane

Re: disabling seq scans

From

"scott.marlowe"

Date:

02 April 2004, 12:02:47

On Thu, 1 Apr 2004, Marcus Andree S. Magalhaes wrote:

> > I'm assuming you've read the tuning guide here:
> >
> > http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
> >
>
> Yes. I've read them. That's why I asked if we can deal with it in
> another way (sending an enable_seqscan = false) without creating
> colateral damage to other parts of the program that use the
> same (pooled) connection

Like Tom said in his post, using set local and transactions should take
care of it.

> > And upped shared buffers, effect_cache_size, etc...
> >
>
> Shared buffers was set up a couple weeks ago. Didn't change the
> default values to planner-specific variables. I guess the random page
> cost is set to 4. So, I think a bit weird a seq scan is choosen, after all
> (yes, we do constant vacuum analyze).

I'm not sure here, are you saying you didn't change effective_cache_size,
or or the cpu_*_cost vars?  Or random_page_cost?

Basically, random pages cost random_page_cost*1 seq_scan_page, unless it's
likely to be in memory.  If effective cache size is small, the planner
is less likely to assume that the data is in kernel cache and will tend to
choose a sequential scan.  I.e. pages in kernel cache have an effective
random_page_cost of 1.something where the .something is the extra cost of
hitting the index in memory as well as the table.

So if effective_cache_size is set to the default, then the planner is
assuming a random page cost of 4 and picking a seq scan.  Without lowering
the random_page_cost, setting the effective_cache_size to be larger should
result in more index scan usage.

If you've got lots o ram and / or a fast RAID setup, you might do well to
drop random_page_cost to something lower but still conservative, like 2.0
with no bad consequences.  Note that the effect on the planner seems
non-linear, and noticeably so as you approach 1.0.