Thread: Enabling and Disabling Sequential Scan
In the application that I'm working on, I have a query that'll be a lot (> 60%) faster if I disable sequential scan, forcing it to use my index.

Is it bad practice to disable sequential scan (set enable_seqscan=false), run my query, then re-enable sequential scan, whenever I'm running this query? Why?

Thanks in advance

- David Wendy
On Fri, 30 May 2003, Yusuf wrote:

> In the application that I'm working on, I have a query that'll be a lot
> (> 60%) faster if I disable sequential scan, forcing it to use my index.
>
> Is it bad practice to disable sequential scan (set
> enable_seqscan=false), run my query, then re-enable sequential scan,
> whenever I'm running this query? Why?

Setting enable_seqscan to off is more of a troubleshooting tool than a tuning tool, albeit sometimes it's the only tuning tool that MIGHT work. Once you've determined that the database is picking the wrong plan when you turn seqscan back on, you need to figure out how to convince the database to use the right plan more often.

The best parameters to change and see how they affect this are the *cost* parameters and the effective cache size. "show all;" will show them to you; the ones we're interested in are these:

NOTICE: effective_cache_size is 100000
NOTICE: random_page_cost is 1
NOTICE: cpu_tuple_cost is 0.01
NOTICE: cpu_index_tuple_cost is 0.0001
NOTICE: cpu_operator_cost is 0.0025

To change them for one session, just use the set command. To make the changes the permanent default, edit the $PGDATA/postgresql.conf file.

effective_cache_size tells the planner about how big the kernel's file-level cache is. On my machine it's about 800 meg. It's measured in 8k blocks, so 100,000 * 8k ~ 800 meg. The smaller this is, the more likely the database will have to access the hard drive, and therefore the more likely it will pick a seqscan if the other numbers point to it.

random_page_cost tells the planner how much more a random page access costs than a sequential one. The default is 4. Most systems seem to work well with numbers from 1 to 2.

Lowering cpu_index_tuple_cost also favors index scans.
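To make that concrete, a session-level experiment might look something like this (a minimal sketch; the values shown are illustrative, not recommendations, and the query is hypothetical):

    -- try out planner settings for this session only; nothing is written
    -- to postgresql.conf, so reconnecting restores the defaults
    SET effective_cache_size = 100000;   -- ~800 MB of kernel cache, in 8k blocks
    SET random_page_cost = 1.5;          -- assume reasonably fast random I/O
    SET cpu_index_tuple_cost = 0.0001;

    -- re-check the chosen plan after each change
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;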
On Fri, 30 May 2003 14:46:12 -0600 (MDT), "scott.marlowe" <scott.marlowe@ihs.com> said something like:

> level cache is. On my machine it's about 800 meg. It's measured in 8k
> blocks, so 100,000 * 8k ~ 800 meg. The smaller this is, the more

Any thoughts on how to figure this out (disk buffer size)? For some reason, my system (2xAMD 2800+, 2Gb RAM, 2.4.21 - /proc/meminfo) only shows a usage of 88kb of 'Buffers', and that never changes. My 'Cached' usage is 1.7Gb. I've hit the kernel mailing list, and the one response I got said don't worry about it :-(

Cheers,
Rob
-- 
O_
David,

I say go ahead and use it, since you get a significant performance gain. This is a special case where you know more about your data than the planner does with general system-wide settings. In Oracle you could use "hints". Since there are no hints in PostgreSQL, disabling and re-enabling an option just before and after a query has the same effect.

Regards, Nikolaus

On Fri, 30 May 2003 16:33:07 -0400, Yusuf wrote:
>
> In the application that I'm working on, I have a query
> that'll be a lot (> 60%) faster if I disable sequential
> scan, forcing it to use my index.
>
> Is it bad practice to disable sequential scan (set
> enable_seqscan=false), run my query, then re-enable
> sequential scan, whenever I'm running this query? Why?
>
> Thanks in advance
>
> - David Wendy
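The usual shape of that per-query toggle looks something like this (a sketch; the table and column names are hypothetical, and SET LOCAL assumes 7.3 or later, where the setting reverts automatically at transaction end):

    BEGIN;
    SET LOCAL enable_seqscan = off;   -- affects only this transaction
    SELECT * FROM patient WHERE patient_id = 100;
    COMMIT;                           -- enable_seqscan reverts here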
I have a simple table with a dozen integer fields and a primary key.

When I say "explain select * from Patient where Patient_primary_key = 100" I get a sequential scan.

I've just converted my application from MySQL and am seeing everything run about 3X slower. What do I have to do to get postgres to use indexes?

Brian Tarbox
On Sat, May 31, 2003 at 12:30:40PM -0400, Brian Tarbox wrote:
> I have a simple table with a dozen integer fields and a primary key.
>
> When I say "explain select * from Patient where Patient_primary_key = 100"
> I get a sequential scan.
>
> I've just converted my application from MySQL and am seeing everything run
> about 3X slower. What do I have to do to get postgres to use indexes?

Usual questions: have you vacuumed? EXPLAIN ANALYSE output, schema, &c.

A
-- 
Andrew Sullivan, Liberty RMS
204-4141 Yonge Street, Toronto, Ontario, Canada M2P 2A8
<andrew@libertyrms.info>  +1 416 646 3304 x110
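In other words, a first diagnostic pass along these lines (table and column names taken from the question above):

    VACUUM ANALYZE Patient;   -- make sure the planner has fresh statistics
    EXPLAIN ANALYZE
        SELECT * FROM Patient WHERE Patient_primary_key = 100;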
"Brian Tarbox" <btarbox@theworld.com> writes: > When I say "explain select * from Patient where Patient_primary_key = 100" > I get sequential scan. Perhaps Patient_primary_key is not an integer field? If not, you need to cast the constant 100 to the right type. Or write '100' with single quotes around it, which leaves Postgres to choose the constant's datatype. (Yeah, I know, it's a pain in the neck. We've had a lot of discussions about how to fix this without breaking datatype extensibility; no luck so far.) regards, tom lane
The primary key field is an integer and I have performed vacuum analyse, but that does not seem to change anything.

I've also heard that postgres will not use indexes when JOINing tables. Can that really be true??

Brian

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Saturday, May 31, 2003 1:14 PM
To: Brian Tarbox
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] why Sequential Scan when selecting on primary key of table?

"Brian Tarbox" <btarbox@theworld.com> writes:
> When I say "explain select * from Patient where Patient_primary_key = 100"
> I get a sequential scan.

Perhaps Patient_primary_key is not an integer field? If not, you need to cast the constant 100 to the right type. Or write '100' with single quotes around it, which leaves Postgres to choose the constant's datatype. (Yeah, I know, it's a pain in the neck. We've had a lot of discussions about how to fix this without breaking datatype extensibility; no luck so far.)

regards, tom lane
On Sat, 2003-05-31 at 13:13, Tom Lane wrote:
> "Brian Tarbox" <btarbox@theworld.com> writes:
> > When I say "explain select * from Patient where Patient_primary_key = 100"
> > I get a sequential scan.
>
> Perhaps Patient_primary_key is not an integer field? If not, you need
> to cast the constant 100 to the right type. Or write '100' with
> single quotes around it, which leaves Postgres to choose the constant's
> datatype.

Out of curiosity, why don't we confirm the unquoted value is an integer, numeric, etc., then change it into type 'unknown'? From that point forward it would be treated like its quoted counterpart. Is this noticeably slower, or am I missing something?

-- 
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc
"Brian Tarbox" <btarbox@theworld.com> writes: > The primary key field is an integer and I have performed vacuum analyse but > that does not seem to change anything. Hm. So how big is the table, exactly? On small tables a seqscan will be preferred because the extra I/O to examine the index costs more than the CPU to examine all the tuples on a disk page. > I've also heard that postgres will not indexes when JOINing tables. Can > that really be true?? We have some join methods that like indexes and we have some that find no benefit in 'em. Again, testing on toy-size tables is not a good guide to what will happen on larger tables. regards, tom lane
On Sat, May 31, 2003 at 01:45:50PM -0400, Brian Tarbox wrote:
> The primary key field is an integer and I have performed vacuum analyse, but
> that does not seem to change anything.

int4? int8? int2? Makes a difference. Please post the results of EXPLAIN ANALYSE on the query you're having trouble with, and someone may be able to help you. (You'll need to show us the table, too.)

> I've also heard that postgres will not use indexes when JOINing tables. Can
> that really be true??

No.

A
-- 
Andrew Sullivan, Liberty RMS
204-4141 Yonge Street, Toronto, Ontario, Canada M2P 2A8
<andrew@libertyrms.info>  +1 416 646 3304 x110
Rod Taylor <rbt@rbt.ca> writes:
> Out of curiosity, why don't we confirm the unquoted value is an integer,
> numeric, etc., then change it into type 'unknown'?

UNKNOWNNUMERIC is one of the ideas that's been proposed, but it's not clear to me that it is better than other alternatives. In particular, I don't like losing knowledge of the form and size of the constant. Something like "WHERE int4col = 4.8" should yield FALSE, not "ERROR: pg_atoi: unable to parse '4.8'", which is what you're likely to get with a naive "unknown numeric type" approach.

A perhaps-more-realistic objection is that it only solves the problem for trivial "var = const" cases. As soon as you look at even slightly more complicated expressions, it stops doing much good.

I'm still of the opinion that the best solution in the long run is to get rid of most of the cross-datatype numeric operators, but there are pitfalls there too. See the last thread on the issue:
http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php

regards, tom lane
I am working with a highly normalized database. Most of my meaningful queries involve joins from a primary table to 4-6 secondary tables. Would my query performance be significantly faster/slower using a View as opposed to a prepared query using join? (Assume all join fields are ints, say 10,000 records in main table and a few dozen records in each of the secondary tables). Thank you. Brian Tarbox
> Would my query performance be significantly faster/slower using a View as
> opposed to a prepared query using join?

I missed this part. Views and prepared queries are not the same thing. Use of a view still needs to be optimized. Prepared queries will run the optimization portion on the entire query, including the view segments of it.

Think of a view as a MACRO. Depending on the context of what surrounds it, the view may be executed very differently.

-- 
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc
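A hypothetical illustration of that macro-like expansion (all names invented):

    CREATE TABLE patient (id integer, name text);
    CREATE TABLE visit (patient_id integer, visit_date date);

    CREATE VIEW patient_summary AS
        SELECT p.id, p.name, v.visit_date
        FROM patient p
        JOIN visit v ON v.patient_id = p.id;

    -- The rewriter expands the view in place, so this ...
    SELECT * FROM patient_summary WHERE id = 100;
    -- ... is planned as if you had written the join directly:
    --   SELECT p.id, p.name, v.visit_date
    --   FROM patient p JOIN visit v ON v.patient_id = p.id
    --   WHERE p.id = 100;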
On Sat, 2003-05-31 at 22:55, Brian Tarbox wrote:
> I am working with a highly normalized database. Most of my meaningful
> queries involve joins from a primary table to 4-6 secondary tables.
>
> Would my query performance be significantly faster/slower using a View as
> opposed to a prepared query using join?

There are some corner cases where a view would be slower than a standard query in 7.3 (bug fix / disabled optimization -- fixed right in 7.4), but generally you can assume it will be about the same speed.

Some views, such as unions, will not be as fast as you would like, but that's a general issue with PostgreSQL's inability to throw away selects when it won't find results on one side of a union:

    CREATE VIEW sales AS
        SELECT * FROM sales_archive_2002
        UNION ALL
        SELECT * FROM sales_current;

    SELECT * FROM sales
    WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 day';

The above query would not be so quick.

-- 
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc
On Sun, Jun 01, 2003 at 00:02:39 -0400, Rod Taylor <rbt@rbt.ca> wrote:
>
> Some views, such as unions, will not be as fast as you would like, but
> that's a general issue with PostgreSQL's inability to throw away selects
> when it won't find results on one side of a union.
>
> CREATE VIEW sales AS
>     SELECT * FROM sales_archive_2002
>     UNION ALL
>     SELECT * FROM sales_current;
>
> SELECT * FROM sales
> WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 day';
>
> The above query would not be so quick.

I thought some work had been done on pushing where conditions down into unions? If so, the above wouldn't be too bad. It would still look at the archive table, but it should return no rows relatively quickly, assuming an appropriate index exists.
On Sun, 2003-06-01 at 01:43, Bruno Wolff III wrote:
> I thought some work had been done on pushing where conditions down into
> unions? If so, the above wouldn't be too bad. It would still look at
> the archive table, but it should return no rows relatively quickly,
> assuming an appropriate index exists.

Certainly, if the index exists it won't be so bad (for any single archive table). It's when the index doesn't exist, or there are several hundred archive tables, that it starts to get a little worse.

Anyway, anyone doing the above in PostgreSQL should probably be looking at partial indexes and merging the information back into a single table again.

-- 
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc
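For the single-table alternative, a partial index might look like this (a sketch; the table, columns, and cutoff date are all hypothetical):

    CREATE TABLE sales (
        id        integer,
        amount    numeric,
        sale_time timestamp
    );

    -- index only the recent rows; archived rows stay out of the index entirely
    CREATE INDEX sales_recent_idx ON sales (sale_time)
        WHERE sale_time >= '2003-01-01';

    -- repeating the index predicate in the query makes it easy for the
    -- planner to prove the partial index applies
    SELECT * FROM sales
    WHERE sale_time >= '2003-05-31'
      AND sale_time >= '2003-01-01';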
On Fri, 30 May 2003, Robert Creager wrote:

> On Fri, 30 May 2003 14:46:12 -0600 (MDT)
> "scott.marlowe" <scott.marlowe@ihs.com> said something like:
>
> > level cache is. On my machine it's about 800 meg. It's measured in 8k
> > blocks, so 100,000 * 8k ~ 800 meg. The smaller this is, the more
>
> Any thoughts on how to figure this out (disk buffer size)? For some
> reason, my system (2xAMD 2800+, 2Gb RAM, 2.4.21 - /proc/meminfo) only
> shows a usage of 88kb of 'Buffers', and that never changes. My
> 'Cached' usage is 1.7Gb. I've hit the kernel mailing list, and the one
> response I got said don't worry about it :-(

Are you sure that's not 88213Kb or so of buffers? 88kb is awfully small. It's normal to have a cache size many times larger than the buffer size. Buffers are assigned to individual disks, and sit under the larger single pool that is the cache. I just take the approximate size of the cache under load and use that for the effective_cache_size, since it's pretty much a "fudge factor" variable anyway.

P.S. My use of the term "fudge factor" here is in no way meant to be derogatory. It's just that as long as the effective cache size is within some reasonable range of the actual cache/buffer in the machine, it'll be close enough to push the query planner in the right direction.

Note that you can adopt two philosophies on the planner. One is that the planner will always make certain mistakes, and you've got to fool it in order to get the right query plan. The other philosophy is that you give the query planner all the variables you can reasonably give it to let it decide the proper course of action, and you fine-tune each one so that eventually it makes the right choice for all the queries you throw at it.

The first philosophy provides the fastest functional solutions on a static platform (i.e. "we're running postgresql 7.0.2 and aren't going to upgrade"), but it supports the idea that the query planner can never be given the right information and programmed with the right code to make the right decision 99% of the time, and be only a little wrong when it makes the wrong decision. The second philosophy requires you to spend more time fine-tuning all the parameters fed to the query planner with your own queries, using explain analyze and repeated testing with different settings.

What I look for are the corner cases. I.e. if I do some select that returns 500 records with a seq scan and it takes 5 seconds, and with 450 records it switches to an index scan and takes 1 second, then likely the planner is choosing to switch to seq scans too quickly as I raise the result size from 450 to 500. At this point, use the enable_seqscan setting to test the database performance with it on and off at increasing result-set sizes. Somewhere around 2,000 or so in this scenario, we'll notice that the seq scan now has the same speed as the index scan, and as we raise the number of rows we are getting, the index scan becomes slower than the seq scan.

Assuming we set effective_cache_size right at the beginning, we can now turn seqscan back on and adjust the default cost options until the planner chooses a seq scan at the break point we found (in our imaginary case, 2,000). It doesn't have to be perfect, since the performance at or around the break point is similar for index and seq scans alike. Then throw the next query at it and see how it does.

I've found that on fast machines it's good to lower the cpu costs, especially the index one. I usually drop these by a divisor of 2 to 10.
For the random_page_cost, settings of 1.x to 2.x seem a good choice for fast I/O subsystems.
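A sketch of that break-point hunt (the table, column, and cutoffs are hypothetical):

    -- hypothetical setup: a table with an indexed integer column
    CREATE TABLE events (event_id integer);
    CREATE INDEX events_id_idx ON events (event_id);

    -- compare the forced index plan against the planner's own choice
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT * FROM events WHERE event_id < 500;
    SET enable_seqscan = on;
    EXPLAIN ANALYZE SELECT * FROM events WHERE event_id < 500;

    -- once the crossover row count is known, nudge the cost knobs and re-test
    SET random_page_cost = 1.5;
    SET cpu_index_tuple_cost = 0.00005;   -- default 0.0001 dropped by a divisor of 2
    EXPLAIN ANALYZE SELECT * FROM events WHERE event_id < 2000;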
On Fri, 30 May 2003, Robert Creager wrote:

> On Fri, 30 May 2003 14:46:12 -0600 (MDT)
> "scott.marlowe" <scott.marlowe@ihs.com> said something like:
>
> > level cache is. On my machine it's about 800 meg. It's measured in 8k
> > blocks, so 100,000 * 8k ~ 800 meg. The smaller this is, the more
>
> My 'Cached' usage is 1.7Gb. I've hit the kernel mailing list, and the
> one response I got said don't worry about it :-(

Oh, yeah, just a bit on that. As far as the kernel developers are concerned, the buffer/cache is working perfectly, and they're right, it is. What they probably don't understand is your need to tell postgresql how much cache/buffer is allocated to it. So don't worry about the kernel; the linux kernel really is pretty good at caching disk access.