Thread: 8.2rc1 (much) slower than 8.2dev?
Hi List,

We've been doing some benchmarks lately (one of them made it to the PostgreSQL frontpage) with PostgreSQL 8.2dev (CVS checkout of 3 June 2006). But we prefer, of course, to run a more production-like version, so we installed PostgreSQL 8.2rc1. As it turns out, after a dump/restore (to go from 820 to 822), copying the configuration files, and doing a fresh 'vacuumdb -z' (-z is analyze) and 'clusterdb', RC1 processes about 50% *fewer* (webpage) requests than the 8.2dev we had, on the same machine/Linux kernel/etc., for all CPU configurations and loads we throw at it. Since it's a read-mostly database, the location on disk should matter only very slightly.

For instance, with the system currently at hand, 8.2dev peaks at about 20 concurrent clients, with 465,407 requests processed in a 10-minute timeframe. 8.2rc1 only achieves 332,499 requests in that same timeframe under the same load, and peaks at 335,995 requests with 35 concurrent clients (with 30 it only reached 287,624). We see the same for all loads we throw at them.

So either I'm missing something, there is a (significant enough) difference in how the tables were analyzed, or there have been code changes since then that alter the behaviour and thereby decrease performance in our set-up. Preferably I'd load the statistics from the 8.2dev database into the 8.2rc1 one, but a simple INSERT or COPY statement won't work due to the 'anyarray' fields of pg_statistic, will it?

I'll run another analyze on the database to see if that makes any difference, but after that I'm not sure what to check first to figure out where things go wrong?

Best regards,

Arjen van der Meijden
Tweakers.net
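[For readers following along, the migration steps described above look roughly like this; the port numbers are hypothetical, and the commands need a running server for each cluster:]

```shell
# Dump from the old 8.2dev cluster and restore into the 8.2rc1 one.
# Ports 5432 (old) and 5433 (new) are made up for illustration.
pg_dumpall -p 5432 > dump.sql
psql -p 5433 -f dump.sql postgres

# Regenerate planner statistics and re-cluster tables on the new cluster.
vacuumdb -p 5433 --all --analyze   # --analyze is the long form of -z
clusterdb -p 5433 --all            # re-orders tables on their clustered indexes
```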
Arjen van der Meijden <acmmailing@tweakers.net> writes:
> I'll run another analyze on the database to see if that makes any
> difference, but after that I'm not sure what to check first to figure
> out where things go wrong?

Look for changes in plans?

			regards, tom lane
Tom Lane wrote:
> Arjen van der Meijden <acmmailing@tweakers.net> writes:
>> I'll run another analyze on the database to see if that makes any
>> difference, but after that I'm not sure what to check first to figure
>> out where things go wrong?
>
> Look for changes in plans?

Yeah, there are a number of small changes in the plans and estimated costs. I have a large list of queries executed against both databases, and I haven't seen any differences in row estimates, so the analyzes have yielded similar enough results. I'm not sure whether some of the changes are for better or worse; you can probably spot that a bit faster than I can. I saw a few index scans replaced by seq scans (on small tables), all index scans seem to have doubled in cost, and there are a few bitmap scans instead of normal index scans, and more such small changes. Not so small, though, if you execute hundreds of thousands of those queries.

Since I'd rather not send the entire list of queries to the entire world, is it OK to send both explain-analyze files to you off-list?

Best regards,

Arjen
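[Comparing plans between the two versions amounts to running the same query under EXPLAIN ANALYZE on both clusters and diffing the output; the table and column names below are hypothetical:]

```sql
-- Run on both the 8.2dev and the 8.2rc1 cluster and compare the output.
-- Table and column names are made up for illustration.
EXPLAIN ANALYZE
SELECT *
FROM products
WHERE category_id = 42;

-- Things to compare between the two outputs:
--   * node types (Index Scan vs Seq Scan vs Bitmap Heap/Index Scan)
--   * estimated cost=... and rows=... figures
--   * actual time=... and actual rows figures
```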
On Mon, Dec 04, 2006 at 05:41:14PM +0100, Arjen van der Meijden wrote:
> Since I'd rather not send the entire list of queries to the entire
> world, is it OK to send both explain analyze-files to you off list?

Can you post them on the web somewhere so everyone can look at them?

Also, are you looking at EXPLAIN or EXPLAIN ANALYZE?
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
On 7-12-2006 7:01 Jim C. Nasby wrote:
> Can you post them on the web somewhere so everyone can look at them?

No, it's not (only) the size that matters, it's the confidentiality, which I'm not allowed to just break by myself; well, at least not on a scale like that. I've been mailing off-list with Tom, and we found at least one query that in some circumstances takes a lot more time than it should, due to it mistakenly choosing to do a bitmap index scan rather than a normal index scan.

> Also, are you looking at EXPLAIN or EXPLAIN ANALYZE?

EXPLAIN ANALYZE, plus the normal execution times of several million queries run on both versions of PostgreSQL, so we can say something about them statistically.

Best regards,

Arjen
Arjen van der Meijden <acmmailing@tweakers.net> writes:
> I've been mailing off-list with Tom, and we found at least one
> query that in some circumstances takes a lot more time than it should,
> due to it mistakenly choosing to do a bitmap index scan rather than a
> normal index scan.

Just to clue folks in: the problem queries seem to be cases like

	WHERE col1 = 'const' AND col2 = othertab.colx
	  AND col3 IN (several hundred integers)

where the table has an index on (col1, col2, col3). 8.2 is generating a plan involving a nestloop with an inner bitmap indexscan on this index, using all three of these WHERE clauses as index conditions. The ability to use an IN clause (ie, a ScalarArrayOpExpr) in an index condition is new in 8.2, and we seem to have a few bugs left in the cost estimation for it. The problem is that a ScalarArrayOpExpr effectively causes a BitmapOr across N index scans, using each of the array elements as an individual scan qualifier. So the above amounts to several hundred index probes for each outer row. In Arjen's scenario the first two index columns are already pretty selective, and it comes out a lot faster if you just do one indexscan using the first two columns and then apply the IN condition as a filter to the relatively small number of rows you get that way.

What's not clear to me yet is why the 8.2dev code didn't fall into this same trap, because the ScalarArrayOpExpr indexing code was already there on 3 June. But we didn't, and still don't, have any code that considers the possibility that a potential indexqual condition should deliberately *not* be used with the index.

			regards, tom lane
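[A minimal sketch of the query shape Tom describes; every table, column, and index name here is hypothetical:]

```sql
-- Hypothetical schema matching the problematic pattern.
CREATE TABLE bigtab (
    col1 text,
    col2 integer,
    col3 integer
);
CREATE INDEX bigtab_idx ON bigtab (col1, col2, col3);

-- 8.2rc1 may use all three conditions as index quals: the IN list then
-- becomes a BitmapOr of one index probe per array element, repeated for
-- every row of othertab.
SELECT count(*)
FROM othertab
JOIN bigtab ON bigtab.col1 = 'const'
           AND bigtab.col2 = othertab.colx
           AND bigtab.col3 IN (1, 2, 3 /* ... several hundred values */);
```

The cheaper plan in Arjen's case would probe the index on (col1, col2) only and apply the IN list as a plain filter on the few resulting rows; as Tom notes, the planner at this point has no mechanism for deliberately ignoring a usable index column, which is why it falls into the expensive plan.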