Thread: disabling seq scans
Hi, guys, We're experiencing a little problem with one of our queries. It isn't using an index specially created for it. When we disable seq scans with psql, we can ensure the query finishes much faster than without using index, as it should be. So, whats the best procedure in this case, but when have a JDBC based client? Do we mess around with planner settings even when all other queries are using the best index for them? Is it safe (but some may find ugly) to issue a command to disable seq scanning from the java side? Since we're using the pooled connection classes that comes with the JDBC3 driver, once a connection is got from the pool, do we need to explicitly set seq scanning to true? This is assuming the later option is the more recommended one... TIA
I do the following in several reports I run..... statement = m_conn.createStatement(); statement.executeUpdate( "set enable_seqscan = false" ); do your thing.... statement.executeUpdate( "set enable_seqscan = true" ); --sean Marcus Andree S. Magalhaes wrote: >Hi, guys, > >We're experiencing a little problem with one of our queries. >It isn't using an index specially created for it. When we >disable seq scans with psql, we can ensure the query finishes >much faster than without using index, as it should be. > >So, whats the best procedure in this case, but when have a >JDBC based client? Do we mess around with planner >settings even when all other queries are using the best >index for them? > >Is it safe (but some may find ugly) to issue a command to >disable seq scanning from the java side? > >Since we're using the pooled connection classes that comes >with the JDBC3 driver, once a connection is got from the pool, >do we need to explicitly set seq scanning to true? This is >assuming the later option is the more recommended one... > >TIA > > > >---------------------------(end of broadcast)--------------------------- >TIP 8: explain analyze is your friend > > >
Yes, that is what I thought 1st... But, since we're using pooled connections, we must be extra careful and ensure all connections have the enable_seqscan reset to its default. So, the main question is: Do the pooling system resets the "enable_seqscan" variable when the connections are delivered to the clients or we *must* do it ourselves? Another issue: when enable_seqscan is false, I found that the sequential scanning is performed normally (as in "select count (*) from <table>") but have them any performance issues? Last one... This time, specifically to the java side: what do you think about adding new methods, say, setEnableSeqScan(boolean) and getEnableSeqScan() to our (by our I mean postgres) drivers? > I do the following in several reports I run..... > > statement = m_conn.createStatement(); > statement.executeUpdate( "set enable_seqscan = false" ); > do your thing.... > statement.executeUpdate( "set enable_seqscan = true" ); > > --sean > > Marcus Andree S. Magalhaes wrote: > >>Hi, guys, >> >>We're experiencing a little problem with one of our queries. >>It isn't using an index specially created for it. When we >>disable seq scans with psql, we can ensure the query finishes >>much faster than without using index, as it should be. >> >>So, whats the best procedure in this case, but when have a >>JDBC based client? Do we mess around with planner >>settings even when all other queries are using the best >>index for them? >> >>Is it safe (but some may find ugly) to issue a command to >>disable seq scanning from the java side? >> >>Since we're using the pooled connection classes that comes >>with the JDBC3 driver, once a connection is got from the pool, >>do we need to explicitly set seq scanning to true? This is >>assuming the later option is the more recommended one... >> >>TIA >> >> >> >>---------------------------(end of >> broadcast)--------------------------- TIP 8: explain analyze is your >> friend >> >> >> > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if > your > joining column's datatypes do not match
See answers in-line.... Marcus Andree S. Magalhaes wrote: >Yes, that is what I thought 1st... >But, since we're using pooled connections, we must be extra >careful and ensure all connections have the enable_seqscan >reset to its default. >So, the main question is: > >Do the pooling system resets the "enable_seqscan" variable >when the connections are delivered to the clients or we >*must* do it ourselves? > > We use pooling as well and there is NO logic in the pool that would reset anything on a connection. It is up to the user of the connection to set it back to an acceptable state, hence the call to set it back to true prior to returning the connection back to the pool. >Another issue: when enable_seqscan is false, I found that >the sequential scanning is performed normally (as in >"select count (*) from <table>") but have them any performance >issues? > > Not sure I understand you question. I set enable_indexscan=false in my code as I am working with a 350GB DB with some tables having 100 million rows. With the hardware we have it is sometimes faster to do a full table scan the bounce around with indexes. >Last one... This time, specifically to the java side: what >do you think about adding new methods, say, setEnableSeqScan(boolean) >and getEnableSeqScan() to our (by our I mean >postgres) drivers? > > This would be up the folks that write the code, I am just a user. :-) > > >>I do the following in several reports I run..... >> >>statement = m_conn.createStatement(); >>statement.executeUpdate( "set enable_seqscan = false" ); >>do your thing.... >>statement.executeUpdate( "set enable_seqscan = true" ); >> >>--sean >> >>Marcus Andree S. Magalhaes wrote: >> >> >> >>>Hi, guys, >>> >>>We're experiencing a little problem with one of our queries. >>>It isn't using an index specially created for it. When we >>>disable seq scans with psql, we can ensure the query finishes >>>much faster than without using index, as it should be. >>> >>>So, whats the best procedure in this case, but when have a >>>JDBC based client? Do we mess around with planner >>>settings even when all other queries are using the best >>>index for them? >>> >>>Is it safe (but some may find ugly) to issue a command to >>>disable seq scanning from the java side? >>> >>>Since we're using the pooled connection classes that comes >>>with the JDBC3 driver, once a connection is got from the pool, >>>do we need to explicitly set seq scanning to true? This is >>>assuming the later option is the more recommended one... >>> >>>TIA >>> >>> >>> >>>---------------------------(end of >>>broadcast)--------------------------- TIP 8: explain analyze is your >>>friend >>> >>> >>> >>> >>> >>---------------------------(end of broadcast)--------------------------- >>TIP 9: the planner will ignore your desire to choose an index scan if >>your >> joining column's datatypes do not match >> >> > > > > > >
On Thu, 1 Apr 2004, Marcus Andree S. Magalhaes wrote: > > Hi, guys, > > We're experiencing a little problem with one of our queries. > It isn't using an index specially created for it. When we > disable seq scans with psql, we can ensure the query finishes > much faster than without using index, as it should be. > > So, whats the best procedure in this case, but when have a > JDBC based client? Do we mess around with planner > settings even when all other queries are using the best > index for them? > > Is it safe (but some may find ugly) to issue a command to > disable seq scanning from the java side? > > Since we're using the pooled connection classes that comes > with the JDBC3 driver, once a connection is got from the pool, > do we need to explicitly set seq scanning to true? This is > assuming the later option is the more recommended one... What's likely happening is that the queries that are choosing the right plan (i.e. the other queries) are such obviously bad candidates for a seq scan that they still get an index scan. It might be that as you increase the % of the table being read by those other queries that they might switch to a seq scan too soon for your setup. I'm assuming you've read the tuning guide here: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html And upped shared buffers, effect_cache_size, etc... random page costs of 1.0 to 1.4 make sense for certain setups with lots of RAM and medium to small datasets that can (mostly) fit in memory. On our machine with 2G ram at work, that works out to about 1.2 to 1.3 as optimal.
On Thu, 1 Apr 2004, Marcus Andree S. Magalhaes wrote: > Last one... This time, specifically to the java side: what > do you think about adding new methods, say, setEnableSeqScan(boolean) > and getEnableSeqScan() to our (by our I mean > postgres) drivers? > I don't see this as a useful general purpose method worth adding. What about all the other GUC parameters? There was a discussion a while ago about making these and other variables configurable through the connection URL or datasource, but that would be to establish defaults, not to modify it on the fly. http://archives.postgresql.org/pgsql-jdbc/2004-02/msg00022.php Kris Jurka
<snip> > We use pooling as well and there is NO logic in the pool that would > reset anything on a connection. It is up to the user of the connection > to set it back to an acceptable state, hence the call to set it back to > true prior to returning the connection back to the pool. > Ok. So we should set the connections to a default, accepted state before using it. >>Another issue: when enable_seqscan is false, I found that >>the sequential scanning is performed normally (as in >>"select count (*) from <table>") but have them any performance >>issues? >> >> > Not sure I understand you question. I set enable_indexscan=false in my > code as I am working with a 350GB DB with some tables having 100 million > rows. With the hardware we have it is sometimes faster to do a full > table scan the bounce around with indexes. > Let me try to explain better. I understand that, in some circumstances, a seq scan can be faster than using indexes, so I'm talking about a more general, or theoretical point of view. Consider two queries, identical and causing a sequential scan on a more or less large table (we have about ~150k records). The system they're running differ only in a single setting: enable_indexscan is false in one system and true in the other. Which one is faster? Or they should _not_ be affected by this setting? Sorry if it sounded like a 4th grade question, but I believe I made my point clearer now. ;-) > >> >> >>>I do the following in several reports I run..... >>> >>>statement = m_conn.createStatement(); >>>statement.executeUpdate( "set enable_seqscan = false" ); >>>do your thing.... >>>statement.executeUpdate( "set enable_seqscan = true" ); >>> >>>--sean >>> >>>Marcus Andree S. Magalhaes wrote: >>> >>> >>> >>>>Hi, guys, >>>> >>>>We're experiencing a little problem with one of our queries. >>>>It isn't using an index specially created for it. When we >>>>disable seq scans with psql, we can ensure the query finishes >>>>much faster than without using index, as it should be. >>>> >>>>So, whats the best procedure in this case, but when have a >>>>JDBC based client? Do we mess around with planner >>>>settings even when all other queries are using the best >>>>index for them? >>>> >>>>Is it safe (but some may find ugly) to issue a command to >>>>disable seq scanning from the java side? >>>> >>>>Since we're using the pooled connection classes that comes >>>>with the JDBC3 driver, once a connection is got from the pool, >>>>do we need to explicitly set seq scanning to true? This is >>>>assuming the later option is the more recommended one... >>>> >>>>TIA >>>> >>>> >>>> >>>>---------------------------(end of >>>>broadcast)--------------------------- TIP 8: explain analyze is your >>>> friend >>>> >>>> >>>> >>>> >>>> >>>---------------------------(end of >>> broadcast)--------------------------- TIP 9: the planner will ignore >>> your desire to choose an index scan if your >>> joining column's datatypes do not match >>> >>> >> >> >> >> >> >> > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend
Thanks, Kris. I remembr this thread. Searched the archives for a hint but found nothing that could happen. Your point is clear and I believe a local solution is the best answer for this problem, be it messing around with planner settings or by sending a "set enable_seqscan=false" to the backend. > > > On Thu, 1 Apr 2004, Marcus Andree S. Magalhaes wrote: > >> Last one... This time, specifically to the java side: what >> do you think about adding new methods, say, setEnableSeqScan(boolean) >> and getEnableSeqScan() to our (by our I mean >> postgres) drivers? >> > > I don't see this as a useful general purpose method worth adding. What > about all the other GUC parameters? There was a discussion a while ago > about making these and other variables configurable through the > connection URL or datasource, but that would be to establish defaults, > not to modify it on the fly. > > http://archives.postgresql.org/pgsql-jdbc/2004-02/msg00022.php > > Kris Jurka > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
<snip> > What's likely happening is that the queries that are choosing the right > plan (i.e. the other queries) are such obviously bad candidates for a > seq scan that they still get an index scan. It might be that as you > increase the % of the table being read by those other queries that they > might switch to a seq scan too soon for your setup. > Agreed. This seems to be the "better of all" solution, but, in our case, it's not that feasible. We can't reproduce the load, now, on a spare and identical machine and begin testing differente parameters. On the other side, we just can't restart the server to endure the proper values are loaded and provide a standard reference to any tuning... > I'm assuming you've read the tuning guide here: > > http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html > Yes. I've read them. That's why I asked if we can deal with it in another way (sending an enable_seqscan = false) without creating colateral damage to other parts of the program that use the same (pooled) connection > And upped shared buffers, effect_cache_size, etc... > Shared buffers was set up a couple weeks ago. Didn't change the default values to planner-specific variables. I guess the random page cost is set to 4. So, I think a bit weird a seq scan is choosen, after all (yes, we do constant vacuum analyze). > random page costs of 1.0 to 1.4 make sense for certain setups with lots > of RAM and medium to small datasets that can (mostly) fit in memory. > On our machine with 2G ram at work, that works out to about 1.2 to 1.3 > as optimal. >
"Marcus Andree S. Magalhaes" <marcus.magalhaes@vlinfo.com.br> writes: > But, since we're using pooled connections, we must be extra > careful and ensure all connections have the enable_seqscan > reset to its default. If you are concerned about failing to reset parameters, use "SET LOCAL foo" instead of plain "SET foo" (and of course, do this inside a transaction block). With the LOCAL option the parameter is guaranteed to be reset at transaction end. > Do the pooling system resets the "enable_seqscan" variable > when the connections are delivered to the clients or we > *must* do it ourselves? I think it is reasonable to expect a pooling system to deliver you a fresh transaction. It is not reasonable to expect it to go and issue explicit SET commands to undo things you've done. regards, tom lane
On Thu, 1 Apr 2004, Marcus Andree S. Magalhaes wrote: > > I'm assuming you've read the tuning guide here: > > > > http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html > > > > Yes. I've read them. That's why I asked if we can deal with it in > another way (sending an enable_seqscan = false) without creating > colateral damage to other parts of the program that use the > same (pooled) connection Like Tom said in his post, using set local and transactions should take care of it. > > And upped shared buffers, effect_cache_size, etc... > > > > Shared buffers was set up a couple weeks ago. Didn't change the > default values to planner-specific variables. I guess the random page > cost is set to 4. So, I think a bit weird a seq scan is choosen, after all > (yes, we do constant vacuum analyze). I'm not sure here, are you saying you didn't change effective_cache_size, or or the cpu_*_cost vars? Or random_page_cost? Basically, random pages cost random_page_cost*1 seq_scan_page, unless it's likely to be in memory. If effective cache size is small, the planner is less likely to assume that the data is in kernel cache and will tend to choose a sequential scan. I.e. pages in kernel cache have an effective random_page_cost of 1.something where the .something is the extra cost of hitting the index in memory as well as the table. So if effective_cache_size is set to the default, then the planner is assuming a random page cost of 4 and picking a seq scan. Without lowering the random_page_cost, setting the effective_cache_size to be larger should result in more index scan usage. If you've got lots o ram and / or a fast RAID setup, you might do well to drop random_page_cost to something lower but still conservative, like 2.0 with no bad consequences. Note that the effect on the planner seems non-linear, and noticeably so as you approach 1.0.