Thread: Slow PostgreSQL server
Hello all,

My website has been having issues with our new Linux/PostgreSQL server being somewhat slow. I have done tests using Apache Benchmark, and for pages that do not connect to Postgres the speeds are much faster (334 requests/second vs. 1-2 requests/second), so it seems that Postgres is what's causing the problem and not Apache. I did some research, and it seems that the bottleneck is in fact the hard drives! Here's an excerpt from vmstat:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
 1  1    140  24780 166636 575144    0    0     0  3900  1462  3299  1  4 49 48  0
 0  1    140  24780 166636 575144    0    0     0  3828  1455  3391  0  4 48 48  0
 1  1    140  24780 166636 575144    0    0     0  2440   960  2033  0  3 48 48  0
 0  1    140  24780 166636 575144    0    0     0  2552  1001  2131  0  2 50 49  0
 0  1    140  24780 166636 575144    0    0     0  3188  1233  2755  0  3 49 48  0
 0  1    140  24780 166636 575144    0    0     0  2048   868  1812  0  2 49 49  0
 0  1    140  24780 166636 575144    0    0     0  2720  1094  2386  0  3 49 49  0

As you can see, almost 50% of the CPU is waiting on I/O. This doesn't seem like it should be happening, however, since we are using a RAID 1 setup (160+160). We have 1GB of RAM, and have upped shared_buffers to 13000 and work_mem to 8096. What would cause the computer to only use such a small percentage of the CPU, with more than half of it waiting on I/O requests?

Thanks a lot
Jason
Jason Lustig skrev:
> and work_mem to 8096. What would cause the computer to only use such a
> small percentage of the CPU, with more than half of it waiting on I/O
> requests?

Do your webpages write things to the database on each connect? Maybe they do a bunch of writes, each individually committed? For every commit, pg will wait for the data to be written down to the disk platter before it moves on. So if you do several writes, you want to do them in one transaction so you only need one commit.

/Dennis
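To illustrate that point (the table and column names below are made up, not from Jason's schema): three statements committed one by one pay for three synchronous waits on the disk, while the same work wrapped in a single transaction pays for only one.

BEGIN;
INSERT INTO page_hits (url) VALUES ('/index');              -- no WAL flush yet
INSERT INTO page_hits (url) VALUES ('/about');               -- no WAL flush yet
UPDATE counters SET hits = hits + 2 WHERE name = 'total';
COMMIT;                                                      -- one wait for the disk, not three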
On Wed, 11 Apr 2007, Jason Lustig wrote:

> Hello all,
>
> My website has been having issues with our new Linux/PostgreSQL server being
> somewhat slow. I have done tests using Apache Benchmark, and for pages that do
> not connect to Postgres the speeds are much faster (334 requests/second vs.
> 1-2 requests/second), so it seems that Postgres is what's causing the problem
> and not Apache. I did some research, and it seems that the bottleneck is in
> fact the hard drives! Here's an excerpt from vmstat:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
>  1  1    140  24780 166636 575144    0    0     0  3900  1462  3299  1  4 49 48  0
>  0  1    140  24780 166636 575144    0    0     0  3828  1455  3391  0  4 48 48  0
>  1  1    140  24780 166636 575144    0    0     0  2440   960  2033  0  3 48 48  0
>  0  1    140  24780 166636 575144    0    0     0  2552  1001  2131  0  2 50 49  0
>  0  1    140  24780 166636 575144    0    0     0  3188  1233  2755  0  3 49 48  0
>  0  1    140  24780 166636 575144    0    0     0  2048   868  1812  0  2 49 49  0
>  0  1    140  24780 166636 575144    0    0     0  2720  1094  2386  0  3 49 49  0
>
> As you can see, almost 50% of the CPU is waiting on I/O. This doesn't seem
> like it should be happening, however, since we are using a RAID 1 setup
> (160+160). We have 1GB of RAM, and have upped shared_buffers to 13000 and
> work_mem to 8096. What would cause the computer to only use such a small
> percentage of the CPU, with more than half of it waiting on I/O requests?

Well, the simple answer is a slow disk subsystem. Is it hardware or software RAID1? If hardware, what's the RAID controller? Based on your vmstat output, I'd guess that this query activity is all writes, since I see only blocks out. Can you identify what the slow queries are? What version of postgres? How large is the database? Can you post the non-default values in your postgresql.conf?

I'd suggest you test your disk subsystem with bonnie++ to see if it's as performant as you think. Here's some output from my RAID1 test server:

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pgtest           4G 47090  92 52348  11 30954   6 41838  65 73396   8 255.9   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   894   2 +++++ +++   854   1   817   2 +++++ +++   969   2

So, that's 52MB/sec block writes and 73MB/sec block reads. That's typical of a RAID1 on 2 semi-fast SATA drives.

If you're doing writes to the DB on every web page, you might consider playing with the commit_delay and commit_siblings parameters in postgresql.conf. Also, if you're doing multiple inserts as separate transactions, you should consider batching them up in one transaction.

--
Jeff Frost, Owner       <jeff@frostconsultingllc.com>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908     FAX: 650-649-1954
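If the per-page writes really cannot be batched in the application, the two parameters Jeff mentions can be experimented with. A rough sketch only; the values are illustrative, not recommendations:

-- postgresql.conf:
--   commit_delay = 10000     -- microseconds to wait before flushing WAL at commit,
--                            -- so that concurrent commits can share one fsync
--   commit_siblings = 5      -- only wait if at least this many other transactions are active
-- For a quick experiment, in releases of this vintage they can also be tried per session:
SET commit_delay = 10000;
SET commit_siblings = 5;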
1= RAID 1 improves data *integrity*, not IO performance. Your HD IO performance is essentially that of 1 160GB HD of whatever performance one of those HDs has. (What kind of HDs are they anyway? For instance, 7200rpm 160GB HDs are not particularly "high performance".) BEST case is streaming IO involving no seeks => ~50 MBps. You can't get even that as the back end of a website.

2= 1GB of RAM is -small- for a DB server.

You need to buy RAM and HD.

Boost the RAM to 4GB, change pg config parameters appropriately, and see how much it helps. Non-ECC RAM is currently running ~$60-$75 per GB for 1 or 2 GB sticks. ECC RAM prices will be ~1.5x - 2x that, $120 - $150 per GB for 1 or 2 GB sticks. (Do !not! buy 4GB sticks unless you have a large budget. Their price per GB is still too high.)

If adding RAM helps as much as I suspect it will, find out how big the "hot" section of your DB is and see if you can buy enough RAM to make it RAM resident. If you can do this, it will result in the lowest long-term DB maintenance.

If you can't do that for whatever reason, the next step is to improve your HD subsystem. Cheap RAID cards with enough BB cache to allow writes to be coalesced into larger streams (reducing seeks) will help, but fundamentally you will need more HDs.

RAID 5 is a reasonable option for most website DB workloads. To hit the 300MBps speeds attainable by the cheap RAID cards, you are going to need at least 7 HDs (6 HDs * 50MBps ASTR = 300MBps ASTR + the equivalent of 1 HD gets used for the "R" in RAID). A minimum of 8 HDs is needed for this performance if you want to use RAID 6. Most tower cases (not mini-tower, tower) can hold this internally.

Price per MBps of HD is all over the map. The simplest (but not necessarily best) option is to buy more of the 160GB HDs you already have. Optimizing the money spent when buying HDs for a RAID set is a bit more complicated than doing so for RAM. Lots of context-dependent things affect the final decision.

I see you are mailing from Brandeis. I'm local. Drop me some private email at the address I'm posting from if you want and I'll send you further contact info so we can talk in more detail.

Cheers, Ron Peacetree

At 06:02 PM 4/11/2007, Jason Lustig wrote:
>Hello all,
>
>My website has been having issues with our new Linux/PostgreSQL
>server being somewhat slow. I have done tests using Apache Benchmark,
>and for pages that do not connect to Postgres the speeds are much
>faster (334 requests/second vs. 1-2 requests/second), so it seems that
>Postgres is what's causing the problem and not Apache. I did some
>research, and it seems that the bottleneck is in fact the hard
>drives! Here's an excerpt from vmstat:
>
>procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
> 1  1    140  24780 166636 575144    0    0     0  3900  1462  3299  1  4 49 48  0
> 0  1    140  24780 166636 575144    0    0     0  3828  1455  3391  0  4 48 48  0
> 1  1    140  24780 166636 575144    0    0     0  2440   960  2033  0  3 48 48  0
> 0  1    140  24780 166636 575144    0    0     0  2552  1001  2131  0  2 50 49  0
> 0  1    140  24780 166636 575144    0    0     0  3188  1233  2755  0  3 49 48  0
> 0  1    140  24780 166636 575144    0    0     0  2048   868  1812  0  2 49 49  0
> 0  1    140  24780 166636 575144    0    0     0  2720  1094  2386  0  3 49 49  0
>
>As you can see, almost 50% of the CPU is waiting on I/O. This doesn't
>seem like it should be happening, however, since we are using a RAID
>1 setup (160+160). We have 1GB of RAM, and have upped shared_buffers to
>13000 and work_mem to 8096. What would cause the computer to only use
>such a small percentage of the CPU, with more than half of it waiting
>on I/O requests?
>
>Thanks a lot
>Jason
On 12.04.2007, at 07:26, Ron wrote:

> You need to buy RAM and HD.

Before he does that, wouldn't it be more useful to find out WHY he has so much IO? Have I missed that, or has nobody suggested finding the slow queries (when you have much IO on them, they might be slow even with a high shared memory setting)?

So, my first idea is to turn on query logging for queries longer than xy milliseconds, "explain analyse" these queries and see whether there are a lot of seq scans involved, which would explain the high IO.

Just an idea, perhaps I missed that step in the discussion somewhere ...

But yes, it might also be that the server is swapping; that's another thing to find out.

cug
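A rough sketch of that approach (the threshold, table and column are illustrative only):

-- postgresql.conf: log_min_duration_statement = 250    -- log every statement slower than 250 ms
-- Then run each statement that shows up in the log under EXPLAIN ANALYZE and
-- look for Seq Scan nodes on large tables:
EXPLAIN ANALYZE SELECT * FROM some_large_table WHERE some_column = 42;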
At 10:08 AM 4/12/2007, Guido Neitzer wrote:
>On 12.04.2007, at 07:26, Ron wrote:
>
>>You need to buy RAM and HD.
>
>Before he does that, wouldn't it be more useful to find out WHY he
>has so much IO?

1= Unless I missed something, the OP described pg being used as a backend DB for a webserver. I know the typical IO demands of that scenario better than I sometimes want to. :-(

2= 1GB of RAM + effectively 1 160GB HD = p*ss poor DB IO support. ~1/2 that RAM is going to be used for OS stuff, leaving only ~512MB of RAM to be used supporting pg. That RAID 1 set is effectively 1 HD head that all IO requests are going to contend for. Even if the HD in question is a 15Krpm screamer, that level of HW contention has very adverse implications.

Completely agree that at some point the queries need to be examined (ditto the table schema, etc), but this system is starting off in a Bad Place for its stated purpose IME. Some minimum stuff is obvious even w/o spending time looking at anything beyond the HW config.

Cheers, Ron Peacetree

>Have I missed that, or has nobody suggested finding the slow queries
>(when you have much IO on them, they might be slow even with a
>high shared memory setting)?
>
>So, my first idea is to turn on query logging for queries longer
>than xy milliseconds, "explain analyse" these queries and see
>whether there are a lot of seq scans involved, which would explain the
>high IO.
>
>Just an idea, perhaps I missed that step in the discussion
>somewhere ...
>
>But yes, it might also be that the server is swapping; that's
>another thing to find out.
>
>cug
On 12.04.2007, at 08:59, Ron wrote:

> 1= Unless I missed something, the OP described pg being used as a
> backend DB for a webserver.

Yep.

> I know the typical IO demands of that scenario better than I
> sometimes want to.
> :-(

Yep. Same here. ;-)

> 2= 1GB of RAM + effectively 1 160GB HD = p*ss poor DB IO support.

Absolutely right. It depends a little bit on the DB and website layout and on the actual requirements, but yes - it's not really a kick-ass machine ...

> Completely agree that at some point the queries need to be examined
> (ditto the table schema, etc), but this system is starting off in a
> Bad Place for its stated purpose IME.
> Some minimum stuff is obvious even w/o spending time looking at
> anything beyond the HW config.

Depends. As I said - if the whole DB fits into the remaining space, and a lot of website backend DBs do, it might just work out. But this seems not to be the case - either the site is chewing on seq scans all the time, which will cause I/O, or it is bound by the lack of memory and swaps the whole time ... He has to find out.

cug
On Thu, 12 Apr 2007, Jason Lustig wrote:

>  0 <-- BM starts here
> 10  0    180 700436  16420  91740    0    0     0   176   278  2923 59 41  0  0  0
> 11  0    180 696736  16420  91740    0    0     0     0   254  2904 57 43  0  0  0
> 12  0    180 691272  16420  91740    0    0     0     0   255  3043 60 39  1  0  0
>  9  0    180 690396  16420  91740    0    0     0     0   254  3078 63 36  2  0  0
>
> Obviously, I've turned off logging now but I'd like to get it running again
> (without bogging down the server) so that I can profile the system and find
> out which queries I need to optimize. My logging settings (with unnecessary
> comments taken out) were:

So what did you get in the logs when you had logging turned on? If you have the statement logging, perhaps it's worth running it through pgfouine to generate a report.

> log_destination = 'syslog'          # Valid values are combinations of
> redirect_stderr = off               # Enable capturing of stderr into log
> log_min_duration_statement = 0      # -1 is disabled, 0 logs all statements
> silent_mode = on                    # DO NOT USE without syslog or
> log_duration = off
> log_line_prefix = 'user=%u,db=%d'   # Special values:
> log_statement = 'none'              # none, ddl, mod, all

Perhaps you just want to log slow queries > 100ms? But since you don't seem to know what queries you're running on each web page, I'd suggest you just turn on the following and run your benchmark against it, then turn it back off:

log_duration = on
log_statement = 'all'

Then go grab pgfouine and run the report against the logs to see what queries are chewing up all your time.

> So you know, we're using Postgres 8.2.3. The database currently is pretty
> small (we're just running a testing database right now with a few megabytes
> of data). No doubt some of our queries are slow, but I was concerned because
> no matter how slow the queries were (at most the worst were taking a couple
> of msecs anyway), I was getting ridiculously slow responses from the server.
> Outside of logging, our only other non-default postgresql.conf items are:
>
> shared_buffers = 13000              # min 128kB or max_connections*16kB
> work_mem = 8096                     # min 64kB
>
> In terms of the server itself, I think that it uses software raid. How can I
> tell? Our hosting company set it up with the server, so I guess I could ask
> them, but is there a program I can run which will tell me the information? I
> also ran bonnie++ and got this output:
>
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> pgtest        2000M 29277  67 33819  15 15446   4 35144  62 48887   5 152.7   0
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 17886  77 +++++ +++ +++++ +++ 23258  99 +++++ +++ +++++ +++
>
> So I'm getting 33MB and 48MB write/read respectively. Is this slow? Is there
> anything I should be doing to optimize our RAID configuration?

It's not fast, but at least it's about the same speed as an average IDE drive from this era. More disks would help, but since you indicate the DB fits in RAM with plenty of room to spare, how about you update your effective_cache_size to something reasonable. You can use the output of the 'free' command and take the cache number and divide by 8 to get a reasonable value on linux (see the worked example below). Then turn on logging and run your benchmark. After that, run a pgfouine report against the log and post us the explain analyze from your slow queries.
And if Ron is indeed local, it might be worthwhile to contact him. Someone onsite would likely get this taken care of much faster than we can on the mailing list. -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
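As a worked example of that rule of thumb (the figure is illustrative, not from Jason's server): if 'free' reports about 600000 kB in its "cached" column, then 600000 / 8 gives roughly 75000, so effective_cache_size = 75000 (that many 8kB pages, a bit under 600MB) would be a reasonable starting point in postgresql.conf.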
On Thu, 2007-04-12 at 10:19, Guido Neitzer wrote:
> On 12.04.2007, at 08:59, Ron wrote:
>
> Depends. As I said - if the whole DB fits into the remaining space,
> and a lot of website backend DBs do, it might just work out. But this
> seems not to be the case - either the site is chewing on seq scans
> all the time, which will cause I/O, or it is bound by the lack of
> memory and swaps the whole time ... He has to find out.

It could also be something as simple as a very bloated data store.

I'd ask the user what vacuum verbose says at the end.
On Thu, 12 Apr 2007, Scott Marlowe wrote:

> On Thu, 2007-04-12 at 10:19, Guido Neitzer wrote:
>> On 12.04.2007, at 08:59, Ron wrote:
>>
>> Depends. As I said - if the whole DB fits into the remaining space,
>> and a lot of website backend DBs do, it might just work out. But this
>> seems not to be the case - either the site is chewing on seq scans
>> all the time, which will cause I/O, or it is bound by the lack of
>> memory and swaps the whole time ... He has to find out.
>
> It could also be something as simple as a very bloated data store.
>
> I'd ask the user what vacuum verbose says at the end.

You know, I should answer emails at night... We didn't ask when the data was last vacuumed or analyzed, and I believe he indicated that the only non-default values were memory related, so no autovacuum running.

Jason,

Before you go any further, run 'vacuum analyze;' on your DB if you're not doing this with regularity, and strongly consider enabling autovacuum.

--
Jeff Frost, Owner       <jeff@frostconsultingllc.com>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908     FAX: 650-649-1954
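A minimal version of that first step (one-off, database-wide; in 8.2 the tail of the VERBOSE output also summarizes free space map usage, which is the part Scott was asking about):

VACUUM VERBOSE ANALYZE;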
Jeff Frost wrote:
>
> You know, I should answer emails at night...

Indeed you shouldN'T ;-)

Carlos
--
Hey there; On a Postgres 8.2.3 server, I've got a query that is running very slow in some cases. With some work, I've determined the 'slow part' of the query. :) This is a query on a table with like 10 million rows or something like that. encounter_id is an integer and receipt is of type 'date'. This query runs really slow [minutes] (query and explain below): select extract(epoch from ds.receipt) from detail_summary ds where ds.receipt >= '1998-12-30 0:0:0' and ds.encounter_id in (8813186,8813187,8813188,8813189,8813190,8813191,8813192, 8813193,8813194,8813195,8813196,8813197,8813198,8813199, 8813200,8813201,8813202,8813203,8813204,8813205,8813206, 8813207,8813208,8813209,8813210,8813211,8813212,8813213, 8813214,8813215,8813216,8813217,8813218,8813219,8813220, 8813221,8813222,8813223,8813224,8813225,8813226,8813227, 8813228,8813229,8813230,8813231,8813232,8813233,8813234, 8813235,8813236,8813237,8813238,8813239,8813240,8813241, 8813242,8813243,8813244,8813245,8813246,8813247,8813248, 8813249,8813250,8813251,8813252,8813253,8813254,8813255, 8813256,8813257,8813258,8813259,8813260,8813261,8813262, 8813263,8813264,8813265,8813266,8813267,8813268,8813269, 8813270,8813271,8813272,8813273,8813274,8813275,8813276, 8813277,8813278,8813279,8813280,8813281,8813282,8813283, 8813284,8815534) Results in the 'explain' : Seq Scan on detail_summary ds (cost=0.00..1902749.83 rows=9962 width=4) Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))) (2 rows) Turning enable_seqscan to off results in a slightly more interesting explain, but an equally slow query. HOWEVER! 
The simple removal of the receipt date paramater results in a fast query, as such: select extract(epoch from ds.receipt) from detail_summary ds where ds.encounter_id in (8813186,8813187,8813188,8813189,8813190,8813191,8813192, 8813193,8813194,8813195,8813196,8813197,8813198,8813199, 8813200,8813201,8813202,8813203,8813204,8813205,8813206, 8813207,8813208,8813209,8813210,8813211,8813212,8813213, 8813214,8813215,8813216,8813217,8813218,8813219,8813220, 8813221,8813222,8813223,8813224,8813225,8813226,8813227, 8813228,8813229,8813230,8813231,8813232,8813233,8813234, 8813235,8813236,8813237,8813238,8813239,8813240,8813241, 8813242,8813243,8813244,8813245,8813246,8813247,8813248, 8813249,8813250,8813251,8813252,8813253,8813254,8813255, 8813256,8813257,8813258,8813259,8813260,8813261,8813262, 8813263,8813264,8813265,8813266,8813267,8813268,8813269, 8813270,8813271,8813272,8813273,8813274,8813275,8813276, 8813277,8813278,8813279,8813280,8813281,8813282,8813283, 8813284 ,8815534) This query returns instantly and explains as: Bitmap Heap Scan on detail_summary ds (cost=161.00..14819.81 rows=9963 width=4) Recheck Cond: (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])) -> Bitmap Index Scan on detail_summary_encounter_id_idx (cost=0.00..160.75 rows=9963 width=0) Index Cond: (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])) Any thoughts? Both encounter_id and receipt date are indexed columns. I've vacuumed and analyzed the table. I tried making a combined index of encounter_id and receipt and it hasn't worked out any better. Thanks! Steve
Steve <cheetah@tanabi.org> writes:
> On a Postgres 8.2.3 server, I've got a query that is running very slow in
> some cases.

Could we see the exact definition of that table and its indexes? It looks like the planner is missing the bitmap scan for some reason, but I've not seen a case like that before.

Also, I assume the restriction on receipt date is very nonselective? It doesn't seem to have changed the estimated rowcount much.

			regards, tom lane
> Could we see the exact definition of that table and its indexes? > It looks like the planner is missing the bitmap scan for some reason, > but I've not seen a case like that before. > > Also, I assume the restriction on receipt date is very nonselective? > It doesn't seem to have changed the estimated rowcount much. > This is true -- This particular receipt date is actually quite meaningless. It's equivalent to saying 'all receipt dates'. I don't think there's even any data that goes back before 2005. Here's the table and it's indexes. Before looking, a note; there's several 'revop' indexes, this is for sorting. The customer insisted on, frankly, meaninglessly complicated sorts. I don't think any of that matters for our purposes here though :) Column | Type | Modifiers -----------------------+------------------------+-------------------------------------------------------------------- detailsummary_id | integer | not null default nextval(('detailsummary_id_seq'::text)::regclass) detailgroup_id | integer | receipt | date | batchnum | integer | encounternum | integer | procedureseq | integer | procedurecode | character varying(5) | wrong_procedurecode | character varying(5) | batch_id | integer | encounter_id | integer | procedure_id | integer | carrier_id | integer | product_line | integer | provider_id | integer | member_num | character varying(20) | wrong_member_num | character varying(20) | member_name | character varying(40) | patient_control | character varying(20) | rendering_prov_id | character varying(15) | rendering_prov_name | character varying(30) | referring_prov_id | character varying(15) | referring_prov_name | character varying(30) | servicedate | date | wrong_servicedate | date | diagnosis_codes | character varying(5)[] | wrong_diagnosis_codes | character varying(5)[] | ffs_charge | double precision | export_date | date | hedis_date | date | raps_date | date | diagnosis_pointers | character(1)[] | modifiers | character(2)[] | units | double precision | pos | character(2) | isduplicate | boolean | duplicate_id | integer | encounter_corrected | boolean | procedure_corrected | boolean | numerrors | integer | encerrors_codes | integer[] | procerror_code | integer | error_servicedate | text | e_duplicate_id | integer | ecode_counts | integer[] | p_record_status | integer | e_record_status | integer | e_delete_date | date | p_delete_date | date | b_record_status | integer | b_confirmation | character varying(20) | b_carrier_cobol_id | character varying(16) | b_provider_cobol_id | character varying(20) | b_provider_tax_id | character varying(16) | b_carrier_name | character varying(50) | b_provider_name | character varying(50) | b_submitter_file_id | character varying(40) | e_hist_carrier_id | integer | p_hist_carrier_id | integer | e_duplicate_id_orig | character varying(25) | p_duplicate_id_orig | character varying(25) | num_procerrors | integer | num_encerrors | integer | export_id | integer | raps_id | integer | hedis_id | integer | Indexes: "detail_summary_b_record_status_idx" btree (b_record_status) "detail_summary_batch_id_idx" btree (batch_id) "detail_summary_batchnum_idx" btree (batchnum) "detail_summary_carrier_id_idx" btree (carrier_id) "detail_summary_duplicate_id_idx" btree (duplicate_id) "detail_summary_e_record_status_idx" btree (e_record_status) "detail_summary_encounter_id_idx" btree (encounter_id) "detail_summary_encounternum_idx" btree (encounternum) "detail_summary_export_date_idx" btree (export_date) "detail_summary_hedis_date_idx" btree (hedis_date) 
"detail_summary_member_name_idx" btree (member_name) "detail_summary_member_num_idx" btree (member_num) "detail_summary_p_record_status_idx" btree (p_record_status) "detail_summary_patient_control_idx" btree (patient_control) "detail_summary_procedurecode_idx" btree (procedurecode) "detail_summary_product_line_idx" btree (product_line) "detail_summary_provider_id_idx" btree (provider_id) "detail_summary_raps_date_idx" btree (raps_date) "detail_summary_receipt_encounter_idx" btree (receipt, encounter_id) "detail_summary_receipt_id_idx" btree (receipt) "detail_summary_referrering_prov_id_idx" btree (referring_prov_id) "detail_summary_rendering_prov_id_idx" btree (rendering_prov_id) "detail_summary_rendering_prov_name_idx" btree (rendering_prov_name) "detail_summary_servicedate_idx" btree (servicedate) "ds_sort_1" btree (receipt date_revop, carrier_id, batchnum, encounternum, procedurecode, encounter_id) "ds_sort_10" btree (receipt date_revop, carrier_id, batchnum, encounternum, procedurecode, encounter_id, procedure_id) "ed_cbee_norev" btree (export_date, carrier_id, batchnum, encounternum, encounter_id) "ed_cbee_norev_p" btree (export_date, carrier_id, batchnum, encounternum, encounter_id, procedure_id) "ed_cbee_rev" btree (export_date date_revop, carrier_id, batchnum, encounternum, encounter_id) "ed_cbee_rev_p" btree (export_date date_revop, carrier_id, batchnum, encounternum, encounter_id, procedure_id) "mcbe" btree (member_name, carrier_id, batchnum, encounternum, encounter_id) "mcbe_p" btree (member_name, carrier_id, batchnum, encounternum, encounter_id, procedure_id) "mcbe_rev" btree (member_name text_revop, carrier_id, batchnum, encounternum, encounter_id) "mcbe_rev_p" btree (member_name text_revop, carrier_id, batchnum, encounternum, encounter_id, procedure_id) "mcbee_norev" btree (member_num, carrier_id, batchnum, encounternum, encounter_id) "mcbee_norev_p" btree (member_num, carrier_id, batchnum, encounternum, encounter_id, procedure_id) "mcbee_rev" btree (member_num text_revop, carrier_id, batchnum, encounternum, encounter_id) "mcbee_rev_p" btree (member_num text_revop, carrier_id, batchnum, encounternum, encounter_id, procedure_id) "pcbee_norev" btree (patient_control, carrier_id, batchnum, encounternum, encounter_id) "pcbee_norev_p" btree (patient_control, carrier_id, batchnum, encounternum, encounter_id, procedure_id) "pcbee_rev" btree (patient_control text_revop, carrier_id, batchnum, encounternum, encounter_id) "pcbee_rev_p" btree (patient_control text_revop, carrier_id, batchnum, encounternum, encounter_id, procedure_id) "rcbee_norev" btree (receipt, carrier_id, batchnum, encounternum, encounter_id) "rcbee_norev_p" btree (receipt, carrier_id, batchnum, encounternum, encounter_id, procedure_id) "rp_cbee_norev" btree (rendering_prov_name, carrier_id, batchnum, encounternum, encounter_id) "rp_cbee_norev_p" btree (rendering_prov_name, carrier_id, batchnum, encounternum, encounter_id, procedure_id) "rp_cbee_rev" btree (rendering_prov_name text_revop, carrier_id, batchnum, encounternum, encounter_id) "rp_cbee_rev_p" btree (rendering_prov_name text_revop, carrier_id, batchnum, encounternum, encounter_id, procedure_id) "sd_cbee_norev" btree (servicedate, carrier_id, batchnum, encounternum, encounter_id) "sd_cbee_norev_p" btree (servicedate, carrier_id, batchnum, encounternum, encounter_id, procedure_id) "sd_cbee_rev" btree (servicedate date_revop, carrier_id, batchnum, encounternum, encounter_id) "sd_cbee_rev_p" btree (servicedate date_revop, carrier_id, batchnum, encounternum, 
encounter_id, procedure_id) "testrev" btree (receipt date_revop, carrier_id, batchnum, encounternum, encounter_id) "testrev_p" btree (receipt date_revop, carrier_id, batchnum, encounternum, encounter_id, procedure_id)
On Thu, 2007-04-12 at 16:03, Steve wrote: > Hey there; > > On a Postgres 8.2.3 server, I've got a query that is running very slow in > some cases. With some work, I've determined the 'slow part' of the query. > :) This is a query on a table with like 10 million rows or something like > that. encounter_id is an integer and receipt is of type 'date'. SNIP > Seq Scan on detail_summary ds (cost=0.00..1902749.83 rows=9962 width=4) > Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY > ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))) > (2 rows) How accurate is the row estimate made by the planner? (explain analyze to be sure)
Hi all,

Wow! That's a lot to respond to. Let me go through some of the ideas... First, I just turned on autovacuum; I forgot to do that. I'm not seeing a major impact though. Also, I know that it's not optimal for a dedicated server. It's not just for postgres, it's also got our apache server on it. We're just getting started and didn't want to make the major investment right now in getting the most expensive server we can get. Within the next year, as our traffic grows, we will most likely upgrade, but for now, while we're in the beginning phases of our project, we're going to work with this server.

In terms of RAID not helping speed-wise (only making an impact in data integrity) - I was under the impression that even a mirrored disk set improves speed, because read requests can be sent to either of the disk controllers. Is this incorrect?

I turned on logging again, only logging queries > 5ms, and it caused the same problems. I think it might be an issue within the OS's logging facilities, since it's going through stderr.

Some of the queries are definitely making an impact on the speed. We are constantly trying to improve performance, and part of that is reassessing our indexes and denormalizing data where it would help. We're also doing work with memcached to cache the results of some of the more expensive operations.

Thanks for all your help guys - it's really fantastic to see the community here! I've got a lot of database experience (mostly with ms sql and mysql) but this is my first time doing serious work with postgres and it's really a great system with great people too.

Jason

On Apr 12, 2007, at 11:35 AM, Jeff Frost wrote:

> On Thu, 12 Apr 2007, Jason Lustig wrote:
>
>>  0 <-- BM starts here
>> 10  0    180 700436  16420  91740    0    0     0   176   278  2923 59 41  0  0  0
>> 11  0    180 696736  16420  91740    0    0     0     0   254  2904 57 43  0  0  0
>> 12  0    180 691272  16420  91740    0    0     0     0   255  3043 60 39  1  0  0
>>  9  0    180 690396  16420  91740    0    0     0     0   254  3078 63 36  2  0  0
>>
>> Obviously, I've turned off logging now but I'd like to get it
>> running again (without bogging down the server) so that I can
>> profile the system and find out which queries I need to optimize.
>> My logging settings (with unnecessary comments taken out) were:
>
> So what did you get in the logs when you had logging turned on? If
> you have the statement logging, perhaps it's worth running it through
> pgfouine to generate a report.
>
>> log_destination = 'syslog'          # Valid values are combinations of
>> redirect_stderr = off               # Enable capturing of stderr into log
>> log_min_duration_statement = 0      # -1 is disabled, 0 logs all statements
>> silent_mode = on                    # DO NOT USE without syslog or
>> log_duration = off
>> log_line_prefix = 'user=%u,db=%d'   # Special values:
>> log_statement = 'none'              # none, ddl, mod, all
>
> Perhaps you just want to log slow queries > 100ms? But since you
> don't seem to know what queries you're running on each web page,
> I'd suggest you just turn on the following and run your benchmark
> against it, then turn it back off:
>
> log_duration = on
> log_statement = 'all'
>
> Then go grab pgfouine and run the report against the logs to see
> what queries are chewing up all your time.
>
>> So you know, we're using Postgres 8.2.3. The database currently is
>> pretty small (we're just running a testing database right now with
>> a few megabytes of data). No doubt some of our queries are slow,
>> but I was concerned because no matter how slow the queries were
>> (at most the worst were taking a couple of msecs anyway), I was
>> getting ridiculously slow responses from the server. Outside of
>> logging, our only other non-default postgresql.conf items are:
>>
>> shared_buffers = 13000              # min 128kB or max_connections*16kB
>> work_mem = 8096                     # min 64kB
>>
>> In terms of the server itself, I think that it uses software raid.
>> How can I tell? Our hosting company set it up with the server, so I
>> guess I could ask them, but is there a program I can run which
>> will tell me the information? I also ran bonnie++ and got this output:
>>
>> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> pgtest        2000M 29277  67 33819  15 15446   4 35144  62 48887   5 152.7   0
>>                     ------Sequential Create------ --------Random Create--------
>>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>                  16 17886  77 +++++ +++ +++++ +++ 23258  99 +++++ +++ +++++ +++
>>
>> So I'm getting 33MB and 48MB write/read respectively. Is this slow?
>> Is there anything I should be doing to optimize our RAID configuration?
>
> It's not fast, but at least it's about the same speed as an average
> IDE drive from this era. More disks would help, but since you
> indicate the DB fits in RAM with plenty of room to spare, how about
> you update your effective_cache_size to something reasonable. You
> can use the output of the 'free' command and take the cache number
> and divide by 8 to get a reasonable value on linux. Then turn on
> logging and run your benchmark. After that, run a pgfouine report
> against the log and post us the explain analyze from your slow queries.
>
> And if Ron is indeed local, it might be worthwhile to contact him.
> Someone onsite would likely get this taken care of much faster than
> we can on the mailing list.
>
> --
> Jeff Frost, Owner       <jeff@frostconsultingllc.com>
> Frost Consulting, LLC   http://www.frostconsultingllc.com/
> Phone: 650-780-7908     FAX: 650-649-1954
Steve <cheetah@tanabi.org> writes:
> Here's the table and it's indexes. Before looking, a note; there's
> several 'revop' indexes, this is for sorting. The customer insisted on,
> frankly, meaninglessly complicated sorts. I don't think any of that
> matters for our purposes here though :)

Oy vey ... I hope this is a read-mostly table, because having that many indexes has got to be killing your insert/update performance.

I see that some of the revop indexes might be considered relevant to this query, so how exactly have you got those opclasses defined? There's built-in support for reverse sort as of CVS HEAD, but in existing releases you must have cobbled something together, and I wonder if that could be a contributing factor ...

			regards, tom lane
> Oy vey ... I hope this is a read-mostly table, because having that many
> indexes has got to be killing your insert/update performance.

Hahaha yeah these are read-only tables. Nightly inserts/updates. Takes a few hours, depending on how many records (between 4 and 10 usually). But during the day, while querying, read only.

> I see that some of the revop indexes might be considered relevant to
> this query, so how exactly have you got those opclasses defined?
> There's built-in support for reverse sort as of CVS HEAD, but in
> existing releases you must have cobbled something together, and I wonder
> if that could be a contributing factor ...

Here's the revops (the C functions are at the bottom):

CREATE FUNCTION ddd_date_revcmp(date, date) RETURNS integer
    AS '/usr/local/pgsql/contrib/cmplib.so', 'ddd_date_revcmp'
    LANGUAGE c STRICT;

CREATE FUNCTION ddd_int_revcmp(integer, integer) RETURNS integer
    AS '/usr/local/pgsql/contrib/cmplib.so', 'ddd_int_revcmp'
    LANGUAGE c STRICT;

CREATE FUNCTION ddd_text_revcmp(text, text) RETURNS integer
    AS '/usr/local/pgsql/contrib/cmplib.so', 'ddd_text_revcmp'
    LANGUAGE c STRICT;

CREATE OPERATOR CLASS date_revop FOR TYPE date USING btree AS
    OPERATOR 1 >(date,date),
    OPERATOR 2 >=(date,date),
    OPERATOR 3 =(date,date),
    OPERATOR 4 <=(date,date),
    OPERATOR 5 <(date,date),
    FUNCTION 1 ddd_date_revcmp(date,date);

CREATE OPERATOR CLASS int4_revop FOR TYPE integer USING btree AS
    OPERATOR 1 >(integer,integer),
    OPERATOR 2 >=(integer,integer),
    OPERATOR 3 =(integer,integer),
    OPERATOR 4 <=(integer,integer),
    OPERATOR 5 <(integer,integer),
    FUNCTION 1 ddd_int_revcmp(integer,integer);

CREATE OPERATOR CLASS text_revop FOR TYPE text USING btree AS
    OPERATOR 1 >(text,text),
    OPERATOR 2 >=(text,text),
    OPERATOR 3 =(text,text),
    OPERATOR 4 <=(text,text),
    OPERATOR 5 <(text,text),
    FUNCTION 1 ddd_text_revcmp(text,text);

Datum ddd_date_revcmp(PG_FUNCTION_ARGS){
    DateADT arg1=PG_GETARG_DATEADT(0);
    DateADT arg2=PG_GETARG_DATEADT(1);

    PG_RETURN_INT32(arg2 - arg1);
}

Datum ddd_int_revcmp(PG_FUNCTION_ARGS){
    int32 arg1=PG_GETARG_INT32(0);
    int32 arg2=PG_GETARG_INT32(1);

    PG_RETURN_INT32(arg2 - arg1);
}

Datum ddd_text_revcmp(PG_FUNCTION_ARGS){
    char* arg1=(char*)VARDATA(PG_GETARG_TEXT_P(0));
    char* arg2=(char*)VARDATA(PG_GETARG_TEXT_P(1));

    if((*arg1) != (*arg2)){
        PG_RETURN_INT32(*arg2 - *arg1);
    }else{
        PG_RETURN_INT32(strcmp(arg2,arg1));
    }
}
>> Seq Scan on detail_summary ds (cost=0.00..1902749.83 rows=9962 width=4) >> Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY >> ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))) >> (2 rows) > > How accurate is the row estimate made by the planner? (explain analyze > to be sure) > Results: Seq Scan on detail_summary ds (cost=0.00..1902749.83 rows=9962 width=4) (actual time=62871.386..257258.249 rows=112 loops=1) Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))) Total runtime: 257258.652 ms
On Thu, 2007-04-12 at 17:04, Steve wrote: > >> Seq Scan on detail_summary ds (cost=0.00..1902749.83 rows=9962 width=4) > >> Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY > >> ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))) > >> (2 rows) > > > > How accurate is the row estimate made by the planner? (explain analyze > > to be sure) > > > > Results: > > Seq Scan on detail_summary ds (cost=0.00..1902749.83 rows=9962 width=4) > (actual time=62871.386..257258.249 rows=112 loops=1) > Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY > ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))) > Total runtime: 257258.652 ms So there's a misjudgment of the number of rows returned by a factor of about 88. That's pretty big. Since you had the same number without the receipt date (I think...) then it's the encounter_id that's not being counted right. Try upping the stats target on that column and running analyze again and see if you get closer to 112 in your analyze or not.
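For example, acting on that suggestion for the column in question (1000 is the maximum per-column target in 8.2):

ALTER TABLE detail_summary ALTER COLUMN encounter_id SET STATISTICS 1000;
ANALYZE detail_summary;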
On 12.04.2007, at 15:58, Jason Lustig wrote:

> Wow! That's a lot to respond to. Let me go through some of the
> ideas... First, I just turned on autovacuum; I forgot to do that.
> I'm not seeing a major impact though. Also, I know that it's not
> optimal for a dedicated server.

Hmm, why not? Have you recently vacuumed your db manually so it gets cleaned up? Even a vacuum full might be useful if the db is really bloated.

> It's not just for postgres, it's also got our apache server on it.
> We're just getting started and didn't want to make the major
> investment right now in getting the most expensive server we can get

Hmmm, but more RAM would definitely make sense, especially in that scenario. It really sounds like your machine is swapping to death. What does the system say about memory usage?

> Some of the queries are definitely making an impact on the speed.
> We are constantly trying to improve performance, and part of that
> is reassessing our indexes and denormalizing data where it would
> help. We're also doing work with memcached to cache the results of
> some of the more expensive operations.

Hmmm, that kills you even more, as it uses RAM. I really don't think at the moment that it has something to do with PG itself, but with not enough memory for what you want to achieve.

What perhaps helps might be connection pooling, so that not so many processes are created for the requests. It depends on your "middleware" what you can do about that. pg_pool might be an option.

cug
Steve <cheetah@tanabi.org> writes:
> Datum ddd_text_revcmp(PG_FUNCTION_ARGS){
>     char* arg1=(char*)VARDATA(PG_GETARG_TEXT_P(0));
>     char* arg2=(char*)VARDATA(PG_GETARG_TEXT_P(1));
>     if((*arg1) != (*arg2)){
>         PG_RETURN_INT32(*arg2 - *arg1);
>     }else{
>         PG_RETURN_INT32(strcmp(arg2,arg1));
>     }
> }

[ itch... ] That code is just completely wrong, because the contents of a TEXT datum aren't guaranteed null-terminated. It'd be better to invoke bttextcmp and negate its result.

That's not relevant to your immediate problem, but if you've noticed any strange behavior with your text_revop indexes, that's the reason...

			regards, tom lane
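As a quick illustration of the ordering Tom describes (bttextcmp is also callable from SQL; the literals are arbitrary):

SELECT bttextcmp('abc', 'abd');    -- negative: 'abc' sorts first under the normal ordering
SELECT -bttextcmp('abc', 'abd');   -- positive: negating the result yields the reverse ordering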
> [ itch... ] That code is just completely wrong, because the contents
> of a TEXT datum aren't guaranteed null-terminated. It'd be better to
> invoke bttextcmp and negate its result.
>
> That's not relevant to your immediate problem, but if you've noticed
> any strange behavior with your text_revop indexes, that's the reason...

The indexes have all worked, though I'll make the change anyway. Documentation on how to code these things is pretty sketchy, and I believe I followed an example on the site, if I remember right. :/

Thanks for the info though :)

Steve
> > So there's a misjudgment of the number of rows returned by a factor of > about 88. That's pretty big. Since you had the same number without the > receipt date (I think...) then it's the encounter_id that's not being > counted right. > > Try upping the stats target on that column and running analyze again and > see if you get closer to 112 in your analyze or not. > If I max the statistics targets at 1000, I get: Seq Scan on detail_summary ds (cost=0.00..1903030.26 rows=1099 width=4) Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))) Setting it ot 500 makes the estimated rows twice as much. It seems to have no effect on anything though, either way. :) Steve
Scott Marlowe <smarlowe@g2switchworks.com> writes: > So there's a misjudgment of the number of rows returned by a factor of > about 88. That's pretty big. Since you had the same number without the > receipt date (I think...) then it's the encounter_id that's not being > counted right. I don't think that's Steve's problem, though. It's certainly misestimating, but nonetheless the cost estimate for the seqscan is 1902749.83 versus 14819.81 for the bitmap scan; it should've picked the bitmap scan anyway. I tried to duplicate the problem here, without any success; I get Bitmap Heap Scan on detail_summary ds (cost=422.01..801.27 rows=100 width=4) Recheck Cond: (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])) Filter: (receipt >= '1998-12-30'::date) -> Bitmap Index Scan on detail_summary_encounter_id_idx (cost=0.00..421.98 rows=100 width=0) Index Cond: (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])) so either this has been fixed by a post-8.2.3 bug fix (which I doubt, it doesn't seem familiar at all) or there's some additional contributing factor. Steve, are you using any nondefault planner parameters? regards, tom lane
Here's my planner parameters: seq_page_cost = 1.0 # measured on an arbitrary scale random_page_cost = 1.5 # same scale as above cpu_tuple_cost = 0.001 # same scale as above cpu_index_tuple_cost = 0.0005 # same scale as above cpu_operator_cost = 0.00025 # same scale as above effective_cache_size = 8192MB default_statistics_target = 100 # range 1-1000 On a machine with 16 gigs of RAM. I tried to make it skew towards indexes. However, even if I force it to use the indexes (enable_seqscan=off) it doesn't make it any faster really :/ Steve On Thu, 12 Apr 2007, Tom Lane wrote: > Scott Marlowe <smarlowe@g2switchworks.com> writes: >> So there's a misjudgment of the number of rows returned by a factor of >> about 88. That's pretty big. Since you had the same number without the >> receipt date (I think...) then it's the encounter_id that's not being >> counted right. > > I don't think that's Steve's problem, though. It's certainly > misestimating, but nonetheless the cost estimate for the seqscan is > 1902749.83 versus 14819.81 for the bitmap scan; it should've picked > the bitmap scan anyway. > > I tried to duplicate the problem here, without any success; I get > > Bitmap Heap Scan on detail_summary ds (cost=422.01..801.27 rows=100 width=4) > Recheck Cond: (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])) > Filter: (receipt >= '1998-12-30'::date) > -> Bitmap Index Scan on detail_summary_encounter_id_idx (cost=0.00..421.98 rows=100 width=0) > Index Cond: (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])) > > so either this has been fixed by a post-8.2.3 bug fix (which I doubt, > it doesn't seem familiar at all) or there's some additional contributing > factor. Steve, are you using any nondefault planner parameters? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match >
Steve <cheetah@tanabi.org> writes:
> Here's my planner parameters:

I copied all these, and my 8.2.x still likes the bitmap scan a lot better than the seqscan. Furthermore, I double-checked the CVS history and there definitely haven't been any changes in that area in the REL8_2 branch since 8.2.3. So I'm a bit baffled. Maybe the misbehavior is platform-specific ... what are you on exactly? Is there anything nonstandard about your Postgres installation?

			regards, tom lane
Steve <cheetah@tanabi.org> writes: > ... even if I force it to use the indexes > (enable_seqscan=off) it doesn't make it any faster really :/ Does that change the plan, or do you still get a seqscan? BTW, how big is this table really (how many rows)? regards, tom lane
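A quick way to answer the row-count question without scanning the whole table is to read the planner's stored estimate; this is only a sketch, and the figure is approximate (it is only as fresh as the last VACUUM or ANALYZE):

SELECT reltuples::bigint AS estimated_rows,
       pg_size_pretty(pg_relation_size('detail_summary')) AS on_disk_size
FROM pg_class
WHERE relname = 'detail_summary';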
It's a Red Hat Enterprise Linux machine running AMD x64 processors. Linux ers3.dddcorp.com 2.6.9-42.0.10.ELsmp #1 SMP Fri Feb 16 17:13:42 EST 2007 x86_64 x86_64 x86_64 GNU/Linux It was compiled by me, straight up, nothing weird at all, no odd compiler options or whatever :) So yeah :/ I'm quite baffled as well, Talk to you later, Steve On Thu, 12 Apr 2007, Tom Lane wrote: > Steve <cheetah@tanabi.org> writes: >> Here's my planner parameters: > > I copied all these, and my 8.2.x still likes the bitmap scan a lot > better than the seqscan. Furthermore, I double-checked the CVS history > and there definitely haven't been any changes in that area in REL8_2 > branch since 8.2.3. So I'm a bit baffled. Maybe the misbehavior is > platform-specific ... what are you on exactly? Is there anything > nonstandard about your Postgres installation? > > regards, tom lane >
Table size: 16,037,728 rows With enable_seqscan=off I get: Bitmap Heap Scan on detail_summary ds (cost=4211395.20..4213045.32 rows=1099 width=4) Recheck Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))) -> Bitmap Index Scan on detail_summary_receipt_encounter_idx (cost=0.00..4211395.17 rows=1099 width=0) Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))) The explain analyze is pending, running it now (however it doens't really appear to be any faster using this plan). Steve On Thu, 12 Apr 2007, Tom Lane wrote: > Steve <cheetah@tanabi.org> writes: >> ... even if I force it to use the indexes >> (enable_seqscan=off) it doesn't make it any faster really :/ > > Does that change the plan, or do you still get a seqscan? > > BTW, how big is this table really (how many rows)? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org >
Here's the explain analyze with seqscan = off: Bitmap Heap Scan on detail_summary ds (cost=4211395.20..4213045.32 rows=1099 width=4) (actual time=121288.825..121305.908 rows=112 loops=1) Recheck Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))) -> Bitmap Index Scan on detail_summary_receipt_encounter_idx (cost=0.00..4211395.17 rows=1099 width=0) (actual time=121256.681..121256.681 rows=112 loops=1) Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))) Total runtime: 121306.233 ms Your other question is answered in the other mail along with the non-analyze'd query plan :D Steve On Thu, 12 Apr 2007, Tom Lane wrote: > Steve <cheetah@tanabi.org> writes: >> ... even if I force it to use the indexes >> (enable_seqscan=off) it doesn't make it any faster really :/ > > Does that change the plan, or do you still get a seqscan? > > BTW, how big is this table really (how many rows)? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org >
Steve <cheetah@tanabi.org> writes: > With enable_seqscan=off I get: > -> Bitmap Index Scan on detail_summary_receipt_encounter_idx > (cost=0.00..4211395.17 rows=1099 width=0) > Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = > ANY ... > The explain analyze is pending, running it now (however it doens't really > appear to be any faster using this plan). Yeah, that index is nearly useless for this query --- since the receipt condition isn't really eliminating anything, it'll have to look at every index entry :-( ... in fact, do so again for each of the IN arms :-( :-( So it's definitely right not to want to use that plan. Question is, why is it seemingly failing to consider the "right" index? I'm busy setting up my test case on an x86_64 machine right now, but I rather fear it'll still work just fine for me. Have you got any nondefault parameter settings besides the ones you already mentioned? Another thing that might be interesting, if you haven't got a problem with exclusive-locking the table for a little bit, is BEGIN; DROP INDEX each index except detail_summary_encounter_id_idx EXPLAIN the problem query ROLLBACK; just to see if it does the right thing when it's not distracted by all the "wrong" indexes (yeah, I'm grasping at straws here). If you set up the above as a SQL script it should only take a second to run. Please try this with both settings of enable_seqscan --- you don't need to do "explain analyze" though, we just want to know which plan it picks and what the cost estimate is. regards, tom lane
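Spelled out as a script, the test Tom suggests might look roughly like this. Only two of the receipt-related index names that come up later in the thread are listed as examples; the full list of indexes to drop can be read from the pg_indexes view or from \d detail_summary in psql, and the query is abbreviated as before:

BEGIN;
DROP INDEX detail_summary_receipt_encounter_idx;
DROP INDEX detail_summary_receipt_id_idx;
-- ... drop every index on the table except detail_summary_encounter_id_idx ...
EXPLAIN
SELECT ds.encounter_id
FROM detail_summary ds
WHERE ds.receipt >= DATE '1998-12-30'
  AND ds.encounter_id IN (8813186, 8813187, 8815534);  -- abbreviated list
ROLLBACK;

Running it once as-is and once with SET enable_seqscan = off; before the EXPLAIN covers both settings Tom asks about. The dropped indexes come back at ROLLBACK, though the table is exclusively locked for the duration, as Tom notes.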
If the other indexes are removed, with enable_seqscan=on: Bitmap Heap Scan on detail_summary ds (cost=154.10..1804.22 rows=1099 width=4) Recheck Cond: (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])) Filter: (receipt >= '1998-12-30'::date) -> Bitmap Index Scan on detail_summary_encounter_id_idx (cost=0.00..154.07 rows=1099 width=0) Index Cond: (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])) With it off: Bitmap Heap Scan on detail_summary ds (cost=154.10..1804.22 rows=1099 width=4) Recheck Cond: (encounter_id = ANY ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])) Filter: (receipt >= '1998-12-30'::date) -> Bitmap Index Scan on detail_summary_encounter_id_idx (cost=0.00..154.07 rows=1099 width=0) Index Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])) Either way, it runs perfectly fast. So it looks like the indexes are confusing this query like you suspected. Any advise? This isn't the only query we run on this table, much as I'd absolutely love to kill off some indexes to imrpove our nightly load times I can't foul up the other queries :) Thank you very much for all your help on this issue, too! Steve On Thu, 12 Apr 2007, Tom Lane wrote: > Steve <cheetah@tanabi.org> writes: >> With enable_seqscan=off I get: > >> -> Bitmap Index Scan on detail_summary_receipt_encounter_idx >> (cost=0.00..4211395.17 rows=1099 width=0) >> Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = >> ANY ... > >> The explain analyze is pending, running it now (however it doens't really >> appear to be any faster using this plan). > > Yeah, that index is nearly useless for this query --- since the receipt > condition isn't really eliminating anything, it'll have to look at every > index entry :-( ... in fact, do so again for each of the IN arms :-( :-( > So it's definitely right not to want to use that plan. Question is, why > is it seemingly failing to consider the "right" index? > > I'm busy setting up my test case on an x86_64 machine right now, but > I rather fear it'll still work just fine for me. Have you got any > nondefault parameter settings besides the ones you already mentioned? > > Another thing that might be interesting, if you haven't got a problem > with exclusive-locking the table for a little bit, is > > BEGIN; > DROP INDEX each index except detail_summary_encounter_id_idx > EXPLAIN the problem query > ROLLBACK; > > just to see if it does the right thing when it's not distracted by > all the "wrong" indexes (yeah, I'm grasping at straws here). If you > set up the above as a SQL script it should only take a second to run. > Please try this with both settings of enable_seqscan --- you don't need > to do "explain analyze" though, we just want to know which plan it picks > and what the cost estimate is. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org >
Steve <cheetah@tanabi.org> writes: > Either way, it runs perfectly fast. So it looks like the indexes are > confusing this query like you suspected. Any advise? Wow --- sometimes grasping at straws pays off. I was testing here with just a subset of the indexes to save build time, but I bet that one of the "irrelevant" ones is affecting this somehow. Time to re-test. If you have some time to kill, it might be interesting to vary that begin/rollback test script to leave one or two other indexes in place, and see if you can identify exactly which other index(es) get it confused. I'm about to go out to dinner with the wife, but will have a closer look when I get back, or tomorrow morning. We'll figure this out. regards, tom lane
Okay -- I started leaving indexes on one by one. The explain broke when the detail_summary_receipt_encounter_idx index was left on (receipt, encounter_id). Just dropping that index had no effect, but there's a LOT of indexes that refer to receipt. So on a hunch I tried dropping all indexes that refer to receipt date and that worked -- so it's the indexes that contain receipt date that are the problem. For more fun, I tried leaving the index that's just receipt date alone (index detail_summary_receipt_id_idx) and THAT produced the correct query plan; it's all these multi-column indexes that are fouling things up, it would seem! .... So does this mean I should experiment with dropping those indexes? I'm not sure if that will result in 'bad things' as there are other complicated actions like sorts that may go real slow if I drop those indexes. BUT I think it'll be easy to convince the customer to drop their absurdly complicated sorts if I can come back with serious results like what we've worked out here. And thanks again -- have a good dinner! :) Steve On Thu, 12 Apr 2007, Tom Lane wrote: > Steve <cheetah@tanabi.org> writes: >> Either way, it runs perfectly fast. So it looks like the indexes are >> confusing this query like you suspected. Any advise? > > Wow --- sometimes grasping at straws pays off. I was testing here with > just a subset of the indexes to save build time, but I bet that one of > the "irrelevant" ones is affecting this somehow. Time to re-test. > > If you have some time to kill, it might be interesting to vary that > begin/rollback test script to leave one or two other indexes in place, > and see if you can identify exactly which other index(es) get it > confused. > > I'm about to go out to dinner with the wife, but will have a closer > look when I get back, or tomorrow morning. We'll figure this out. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate >
Steve <cheetah@tanabi.org> writes: > Okay -- I started leaving indexes on one by one. > ... > .... So does this mean I should experiment with dropping those indexes? No, I think this means there's a planner bug to fix. I haven't quite scoped out what it is yet, though. regards, tom lane
Steve <cheetah@tanabi.org> writes: > Just dropping that index had no effect, but there's a LOT of indexes that > refer to receipt. So on a hunch I tried dropping all indexes that refer > to receipt date and that worked -- so it's the indexes that contain > receipt date that are teh problem. I'm still not having any luck reproducing the failure here. Grasping at straws again, I wonder if it's got something to do with the order in which the planner examines the indexes --- which is OID order. Could you send me the results of select indexrelid::regclass from pg_index where indrelid = 'detail_summary'::regclass order by indexrelid; regards, tom lane
Here you go: detail_summary_b_record_status_idx detail_summary_batch_id_idx detail_summary_batchnum_idx detail_summary_carrier_id_idx detail_summary_duplicate_id_idx detail_summary_e_record_status_idx detail_summary_encounter_id_idx detail_summary_encounternum_idx detail_summary_export_date_idx detail_summary_hedis_date_idx detail_summary_member_name_idx detail_summary_member_num_idx detail_summary_p_record_status_idx detail_summary_patient_control_idx detail_summary_procedurecode_idx detail_summary_product_line_idx detail_summary_provider_id_idx detail_summary_raps_date_idx detail_summary_receipt_id_idx detail_summary_referrering_prov_id_idx detail_summary_rendering_prov_id_idx detail_summary_rendering_prov_name_idx detail_summary_servicedate_idx ds_sort_1 ds_sort_10 ed_cbee_norev ed_cbee_norev_p ed_cbee_rev ed_cbee_rev_p mcbe mcbe_p mcbe_rev mcbe_rev_p mcbee_norev mcbee_norev_p mcbee_rev mcbee_rev_p pcbee_norev pcbee_norev_p pcbee_rev pcbee_rev_p rcbee_norev rcbee_norev_p rp_cbee_norev rp_cbee_norev_p rp_cbee_rev rp_cbee_rev_p sd_cbee_norev sd_cbee_norev_p sd_cbee_rev sd_cbee_rev_p testrev testrev_p detail_summary_receipt_encounter_idx On Thu, 12 Apr 2007, Tom Lane wrote: > Steve <cheetah@tanabi.org> writes: >> Just dropping that index had no effect, but there's a LOT of indexes that >> refer to receipt. So on a hunch I tried dropping all indexes that refer >> to receipt date and that worked -- so it's the indexes that contain >> receipt date that are teh problem. > > I'm still not having any luck reproducing the failure here. Grasping at > straws again, I wonder if it's got something to do with the order in > which the planner examines the indexes --- which is OID order. Could > you send me the results of > > select indexrelid::regclass from pg_index where indrelid = 'detail_summary'::regclass order by indexrelid; > > regards, tom lane >
Steve <cheetah@tanabi.org> writes: > On Thu, 12 Apr 2007, Tom Lane wrote: >> I'm still not having any luck reproducing the failure here. Grasping at >> straws again, I wonder if it's got something to do with the order in >> which the planner examines the indexes --- which is OID order. Could >> you send me the results of > Here you go: Nope, still doesn't fail for me. [ baffled and annoyed... ] There must be something about your situation that's relevant but we aren't recognizing it. Are you in a position to let me ssh into your machine and try to debug it? Or other options like sending me a dump of your database? I'm about out of steam for tonight, but let's talk about it off-list tomorrow. regards, tom lane
Steve <cheetah@tanabi.org> writes: > [ strange planner misbehavior in 8.2.3 ] After some off-list investigation (thanks, Steve, for letting me poke at your machine), the short answer is that the heuristics used by choose_bitmap_and() suck. The problem query is like select ... from ds where ds.receipt >= '1998-12-30 0:0:0' and ds.encounter_id in ( ... 100 distinct values ... ); and the table has a truly remarkable variety of indexes on encounter_id, receipt, and combinations of them with other columns. The receipt condition is actually in effect a no-op, because all receipt dates are later than that, but because ineq_histogram_selectivity doesn't trust histogram data unreservedly we compute a selectivity of about 0.99997 for it. That means that the indexes that cover both receipt and encounter_id are given a selectivity score just fractionally better than those involving encounter_id alone, and therefore they sort first in choose_bitmap_and's sort step, and the way that that routine is coded, only combinations of the very first index with other ones will be considered for a bitmap heap scan. So the possibility of using just the index on encounter_id alone is never considered, even though that alternative is vastly cheaper than the alternatives that are considered. (It happens that encounter_id is a low-order column in all the indexes that include receipt, and so these scans end up covering the whole index ... multiple times even. The cost estimation is fine --- the thing knows these are expensive --- what's falling down is the heuristic for which combinations of indexes to consider using in a bitmap scan.) The original coding of choose_bitmap_and involved a "fuzzy" comparison of selectivities, which would have avoided this problem, but we got rid of that later because it had its own problems. In fact, choose_bitmap_and has caused us enough problems that I'm thinking we need a fundamental rethink of how it works, rather than just marginal tweaks. If you haven't looked at this code before, the comments explain the idea well enough: /* * choose_bitmap_and * Given a nonempty list of bitmap paths, AND them into one path. * * This is a nontrivial decision since we can legally use any subset of the * given path set. We want to choose a good tradeoff between selectivity * and cost of computing the bitmap. * * The result is either a single one of the inputs, or a BitmapAndPath * combining multiple inputs. */ ... /* * In theory we should consider every nonempty subset of the given paths. * In practice that seems like overkill, given the crude nature of the * estimates, not to mention the possible effects of higher-level AND and * OR clauses. As a compromise, we sort the paths by selectivity. We * always take the first, and sequentially add on paths that result in a * lower estimated cost. * * We also make some effort to detect directly redundant input paths, as * can happen if there are multiple possibly usable indexes. (Another way * it can happen is that best_inner_indexscan will find the same OR join * clauses that create_or_index_quals has pulled OR restriction clauses * out of, and then both versions show up as duplicate paths.) We * consider an index redundant if any of its index conditions were already * used by earlier indexes. (We could use predicate_implied_by to have a * more intelligent, but much more expensive, check --- but in most cases * simple pointer equality should suffice, since after all the index * conditions are all coming from the same RestrictInfo lists.) 
* * You might think the condition for redundancy should be "all index * conditions already used", not "any", but this turns out to be wrong. * For example, if we use an index on A, and then come to an index with * conditions on A and B, the only way that the second index can be later * in the selectivity-order sort is if the condition on B is completely * non-selective. In any case, we'd surely be drastically misestimating * the selectivity if we count the same condition twice. * * We include index predicate conditions in the redundancy test. Because * the test is just for pointer equality and not equal(), the effect is * that use of the same partial index in two different AND elements is * considered redundant. (XXX is this too strong?) * * Note: outputting the selected sub-paths in selectivity order is a good * thing even if we weren't using that as part of the selection method, * because it makes the short-circuit case in MultiExecBitmapAnd() more * likely to apply. */ One idea I thought about was to sort by index scan cost, using selectivity only as a tiebreaker for cost, rather than the other way around as is currently done. This seems fairly plausible because indexscans that are cheaper than other indexscans likely return fewer rows too, and so selectivity is already accounted for to some extent --- at least you can't have an enormously worse selectivity at lower cost, whereas Steve's example proves it doesn't work the other way. But I'm worried about breaking the reasoning about redundant indexes that's mentioned in the comments. Another alternative that would respond to the immediate problem is to maintain the current sort order, but as we come to each index, consider using that one alone, and throw away whatever AND we might have built up if that one alone beats the AND-so-far. This seems more conservative, as it's unlikely to break any cases that work well now, but on the other hand it feels like plastering another wart atop a structure that's already rather rickety. Has anyone got any thoughts about the best way to do this? regards, tom lane
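One way to see, on the live table, why the receipt condition is nearly a no-op is to look at the column statistics the planner works from; a sketch (the pg_stats output depends on the statistics target, but if even the lowest histogram bound is later than 1998-12-30, the condition excludes almost nothing):

SELECT null_frac, n_distinct, histogram_bounds
FROM pg_stats
WHERE tablename = 'detail_summary'
  AND attname = 'receipt';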
Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)
From: Alvaro Herrera
Tom Lane wrote: > One idea I thought about was to sort by index scan cost, using > selectivity only as a tiebreaker for cost, rather than the other way > around as is currently done. This seems fairly plausible because > indexscans that are cheaper than other indexscans likely return fewer > rows too, and so selectivity is already accounted for to some extent --- > at least you can't have an enormously worse selectivity at lower cost, > whereas Steve's example proves it doesn't work the other way. But I'm > worried about breaking the reasoning about redundant indexes that's > mentioned in the comments. > > Another alternative that would respond to the immediate problem is to > maintain the current sort order, but as we come to each index, consider > using that one alone, and throw away whatever AND we might have built up > if that one alone beats the AND-so-far. This seems more conservative, > as it's unlikely to break any cases that work well now, but on the other > hand it feels like plastering another wart atop a structure that's > already rather rickety. > > Has anyone got any thoughts about the best way to do this? How about doing both: sort the index by index scan cost; then pick the first index on the list and start adding indexes when they lower the cost. When adding each index, consider it by itself against the already stacked indexes. If the cost is lower, put this index at the top of the list, and restart the algorithm (after the sorting step of course). I think the concern about condition redundancy should be attacked separately. How about just comparing whether they have common prefixes of conditions? I admit I don't understand what would happen with indexes defined like (lower(A), B, C) versus (A, B) for example. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)
From: Tom Lane
"Dann Corbit" <DCorbit@connx.com> writes: > Instead of sorting, I suggest the quickselect() algorithm, which is > O(n). What for? Common cases have less than half a dozen entries. That is not the place we need to be spending engineering effort --- what we need to worry about is what's the choice algorithm, not implementation details. regards, tom lane
Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)
From: Tom Lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > I think the concern about condition redundancy should be attacked > separately. How about just comparing whether they have common prefixes > of conditions? I admit I don't understand what would happen with > indexes defined like (lower(A), B, C) versus (A, B) for example. I understand that issue a bit better than I did when I wrote the comment (so I suppose I better rewrite it). The $64 reason for rejecting AND-combinations of indexes that are using some of the same WHERE-conditions is that if we don't, we effectively double-count the selectivity of those conditions, causing us to prefer useless AND-combinations. An example is "WHERE A > 42 AND B < 100" where we have an index on A and one on (A,B). The selectivity calculation will blindly assume that the selectivities of the two indexes are independent and thus prefer to BitmapAnd them, when obviously there is no point in using both. Ideally we should improve the selectivity calculation to not get fooled like this, but that seems hard and probably slow. So for the moment we have the heuristic that no WHERE-clause should be used twice in any AND-combination. Given that we are using that heuristic, it becomes important that we visit the indexes in the "right order" --- as the code stands, in the (A) vs (A,B) case it will consider only the first index it arrives at in the selectivity sort order, because the second will be rejected on the basis of re-use of the WHERE A > 42 condition. Sorting by selectivity tends to ensure that we pick the index that can make use of as many WHERE-clauses as possible. The idea of considering each index alone fixes the order dependency for cases where a single index is the best answer, but it doesn't help much for cases where you really do want a BitmapAnd, only not one using the index with the individually best selectivity. We really need a heuristic here --- exhaustive search will be impractical in cases where there are many indexes, because you'd be looking at 2^N combinations. (In Steve's example there are actually 38 relevant indexes, which is bad database design if you ask me, but in any case we cannot afford to search through 2^38 possibilities.) But the one we're using now is too fragile. Maybe we should use a cutoff similar to the GEQO one: do exhaustive search if there are less than N relevant indexes, for some N. But that's not going to help Steve; we still need a smarter heuristic for what to look for above the cutoff. regards, tom lane
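As a self-contained illustration of the scenario described above (the table and index names here are hypothetical, not from Steve's schema):

CREATE TABLE t (a integer, b integer);
CREATE INDEX t_a_idx ON t (a);
CREATE INDEX t_a_b_idx ON t (a, b);
-- After loading data and running ANALYZE, the planner may consider a
-- BitmapAnd of both indexes for the query below, even though t_a_b_idx
-- alone already covers both conditions; the "no WHERE-clause used twice"
-- heuristic is what normally rejects that combination, at the price of
-- the order-sensitivity discussed in this thread.
EXPLAIN SELECT * FROM t WHERE a > 42 AND b < 100;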
Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)
From: Tom Lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> Has anyone got any thoughts about the best way to do this? > How about doing both: sort the index by index scan cost; then pick the > first index on the list and start adding indexes when they lower the > cost. When adding each index, consider it by itself against the > already stacked indexes. If the cost is lower, put this index at the > top of the list, and restart the algorithm (after the sorting step of > course). The "restart" part of that bothers me --- it's not entirely clear that you couldn't get into an infinite loop. (Imagine that A+B is better than A alone, so we adopt it, but worse than C alone, so we take C as the new leader and start over. Then perhaps C+B is better than C alone but worse than A alone, so we take A as the leader and start over. Maybe this is impossible but I'm unsure.) I looked back at the gdb results I'd gotten from Steve's example and noticed that for his 38 indexes there were only three distinct index selectivity values, and the sort step grouped indexes by cost within those groups. In hindsight of course this is obvious: the selectivity depends on the set of WHERE-clauses used, so with two WHERE-clauses there are three possible selectivities (to within roundoff error anyway) depending on whether the index uses one or both clauses. So the existing algorithm gets one thing right: for any two indexes that make use of just the same set of WHERE-clauses, it will always take the one with cheaper scan cost. Thinking more about this leads me to the following proposal: 1. Explicitly group the indexes according to the subset of WHERE-conditions (and partial index conditions, if any) they use. Within each such group, discard all but the cheapest-scan-cost one. 2. Sort the remaining indexes according to scan cost. 3. For each index in order, consider it as a standalone scan, and also consider adding it on to the AND-group led by each preceding index, using the same logic as now: reject using any WHERE-condition twice in a group, and then add on only if the total cost of the AND-group scan is reduced. This would be approximately O(N^2) in the number of relevant indexes, whereas the current scheme is roughly linear (handwaving a bit here because the number of WHERE-clauses is a factor too). But that seems unlikely to be a problem for sane numbers of indexes, unlike the O(2^N) behavior of an exhaustive search. It would get rid of (most of) the order-sensitivity problem, since we would definitely consider the AND-combination of every pair of combinable indexes. I can imagine cases where it wouldn't notice useful three-or-more-way combinations because the preceding two-way combination didn't win, but they seem pretty remote possibilities. Comments? regards, tom lane
Hi Tom / Steve,
I am one of the silent readers of the performance issues that come up on this list (and are discussed in detail) ... just like this one.
If and when you do come up with a solution, please do post some details about it here... (I say that because it seems that, for obvious reasons, things must have gone off air after Tom's last email, and one can understand that). But an analysis, or at least a pointer, may be of help to someone (like me) reading this list.
Thanks
Robins
---------- Forwarded message ----------
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Apr 13, 2007 10:08 AM
Subject: Re: [PERFORM] Strangely Variable Query Performance
To: Steve <cheetah@tanabi.org>
Cc: Scott Marlowe <smarlowe@g2switchworks.com>, PostgreSQL Performance <pgsql-performance@postgresql.org>
Steve <cheetah@tanabi.org> writes:
> On Thu, 12 Apr 2007, Tom Lane wrote:
>> I'm still not having any luck reproducing the failure here. Grasping at
>> straws again, I wonder if it's got something to do with the order in
>> which the planner examines the indexes --- which is OID order. Could
>> you send me the results of
> Here you go:
Nope, still doesn't fail for me. [ baffled and annoyed... ] There must
be something about your situation that's relevant but we aren't
recognizing it. Are you in a position to let me ssh into your machine
and try to debug it? Or other options like sending me a dump of your
database? I'm about out of steam for tonight, but let's talk about it
off-list tomorrow.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Robins <tharakan@gmail.com> writes: > If and when you do come up with a solution, please do post some details > about them here... (i say that coz it seems that for obvious reasons, things > must have gone off air after tom's last email, and one can understand that). > But an analysis, or atleast a pointer may be of help to someone (like me) > reading this list. Oh, sorry, the subsequent discussion moved over to pgsql-hackers: http://archives.postgresql.org/pgsql-hackers/2007-04/msg00621.php and -patches: http://archives.postgresql.org/pgsql-patches/2007-04/msg00374.php Those are good places to look if a discussion on -bugs or other lists seems to tail off... regards, tom lane PS: the reason I couldn't reproduce the behavior was just that the dummy data I was using didn't have the right statistics.