Thread: Performance
I have been wrestling with the configuration of the dedicated Postgres 9.0.3 server at work. Granted, there's more activity on the production server, but the same queries take twice as long on the beefier server than on my Mac at home. I have pasted what I have changed in postgresql.conf - I am wondering if anyone can help me change things around to be more efficient.
Dedicated PostgreSQL 9.0.3 Server with 16GB Ram
Heavy write and read (for reporting and calculations) server.
max_connections = 350
shared_buffers = 4096MB
work_mem = 32MB
maintenance_work_mem = 512MB
seq_page_cost = 0.02 # measured on an arbitrary scale
random_page_cost = 0.03
cpu_tuple_cost = 0.02
effective_cache_size = 8192MB
The planner costs seem a bit low, but they came from suggestions on this very list a while ago.
Thank you
Ogden
Ogden <lists@darkstatic.com> wrote:

> max_connections = 350
> shared_buffers = 4096MB
> work_mem = 32MB
> maintenance_work_mem = 512MB

That's okay.

> seq_page_cost = 0.02    # measured on an arbitrary scale
> random_page_cost = 0.03

Do you have super, Super, SUPER fast disks? I think these (seq_page_cost and random_page_cost) are completely wrong.

> cpu_tuple_cost = 0.02
> effective_cache_size = 8192MB
>
> The planner costs seem a bit low, but they came from suggestions on this very list a while ago.

Sure? Can you tell us a link into the archive?

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                            (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°
On Apr 12, 2011, at 12:18 PM, Andreas Kretschmer wrote:

>> seq_page_cost = 0.02    # measured on an arbitrary scale
>> random_page_cost = 0.03
>
> Do you have super, Super, SUPER fast disks? I think these (seq_page_cost
> and random_page_cost) are completely wrong.

No, I don't have super fast disks - just 15K SCSI over RAID. I find that by raising them to:

seq_page_cost = 1.0
random_page_cost = 3.0
cpu_tuple_cost = 0.3
#cpu_index_tuple_cost = 0.005    # same scale as above
#cpu_operator_cost = 0.0025      # same scale as above
effective_cache_size = 8192MB

some queries run much faster. Is this better? I will find the archive and post.

Thank you

Ogden
On 12.4.2011 19:23, Ogden wrote:

> No, I don't have super fast disks - just 15K SCSI over RAID. I find that by raising them to:
>
> seq_page_cost = 1.0
> random_page_cost = 3.0
> cpu_tuple_cost = 0.3
> effective_cache_size = 8192MB
>
> some queries run much faster. Is this better?

I guess it is. What really matters with those cost variables is the relative scale - the original values

seq_page_cost = 0.02
random_page_cost = 0.03
cpu_tuple_cost = 0.02

suggest that random reads are almost as expensive as sequential reads (which is usually not true - random reads are significantly more expensive), and that processing each row is about as expensive as reading the page from disk (again, reading data from disk is much more expensive than processing it).

So yes, the current values are much more likely to give good results.

You've mentioned those values were recommended on this list - can you point out the actual discussion?

regards
Tomas
On Apr 12, 2011, at 1:16 PM, Tomas Vondra wrote:

> So yes, the current values are much more likely to give good results.
>
> You've mentioned those values were recommended on this list - can you
> point out the actual discussion?

Thank you for your reply.

http://archives.postgresql.org/pgsql-performance/2010-09/msg00169.php is how I first played with those values...

Ogden
On 12.4.2011 20:28, Ogden wrote:

> Thank you for your reply.
>
> http://archives.postgresql.org/pgsql-performance/2010-09/msg00169.php is how I first played with those values...

OK, what JD said there generally makes sense, although those values are a bit extreme - in most cases it's recommended to leave seq_page_cost=1 and decrease random_page_cost (to 2; the default value is 4). That usually pushes the planner towards index scans.

I'm not saying those small values (0.02 etc.) are bad, but I guess the effect is about the same, and it changes the impact of the other cost variables (cpu_tuple_cost, etc.).

I see there is 16GB of RAM but shared_buffers is just 4GB. So there's nothing else running and the rest of the RAM is used for the pagecache? I've noticed the previous discussion mentions there are 8GB of RAM and the DB size is 7GB (so it might fit into memory). Is this still the case?

regards
Tomas
On Apr 12, 2011, at 4:09 PM, Tomas Vondra wrote:

> I see there is 16GB of RAM but shared_buffers is just 4GB. So there's
> nothing else running and the rest of the RAM is used for the pagecache? I've
> noticed the previous discussion mentions there are 8GB of RAM and the DB
> size is 7GB (so it might fit into memory). Is this still the case?

Tomas,

By decreasing random_page_cost to 2 (instead of 4), there is a slight performance decrease as opposed to leaving it at 4. For example, if I set it to 3 (or 4), a query may take 0.057 seconds; the same query takes 0.144s when I set random_page_cost to 2. Should I keep it at 3 (or 4) as I have done now?

Yes, there is 16GB of RAM but the database is much bigger than that. Should I increase shared_buffers?

Thank you so very much

Ogden
On 12.4.2011 23:19, Ogden wrote:

> Tomas,
>
> By decreasing random_page_cost to 2 (instead of 4), there is a slight performance decrease as opposed to leaving it at 4. For example, if I set it to 3 (or 4), a query may take 0.057 seconds; the same query takes 0.144s when I set random_page_cost to 2. Should I keep it at 3 (or 4) as I have done now?
>
> Yes, there is 16GB of RAM but the database is much bigger than that. Should I increase shared_buffers?

OK, that's very important information and it kinda explains all the problems you had. When the planner decides which execution plan to use, it computes a 'virtual cost' for different plans and then chooses the cheapest one.

Decreasing 'random_page_cost' decreases the expected cost of plans involving index scans, so that at a certain point they seem cheaper than a plan using sequential scans etc.

You can see this when using EXPLAIN - do it with the original cost values, then change the values (for that session only) and run the EXPLAIN again. You'll see how the execution plan suddenly changes and starts to use index scans.

The problem with random I/O is that it's usually much more expensive than sequential I/O, as the drives need to seek etc. The only case when random I/O is just as cheap as sequential I/O is when all the data is cached in memory, because within RAM there's no difference between random and sequential access (right, that's why it's called Random Access Memory).

So in the previous post setting both random_page_cost and seq_page_cost to the same value makes sense, because when the whole database fits into memory, there's no difference and index scans are favorable.

In this case (the database is much bigger than the available RAM) this no longer holds - index scans hit the drives, resulting in a lot of seeks etc. So it's a serious performance killer ...

Not sure about increasing shared_buffers - if a block is not found in shared buffers, it still might be found in the pagecache (without the need to do a physical read). There are ways to check whether the current size of shared buffers is enough or not - I usually use the pg_stat views (bgwriter and database).

regards
Tomas
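For illustration, a minimal psql sketch of the session-level experiment described above - the table and column names (orders, customer_id) are hypothetical, not from this thread:

-- Plan chosen with the server's current cost settings.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Lower the cost estimates for this session only (other sessions and the
-- postgresql.conf defaults are untouched), then look at the plan again.
SET random_page_cost = 2.0;
SET seq_page_cost = 1.0;
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Undo the experiment for this session.
RESET random_page_cost;
RESET seq_page_cost;

If the new costs make the index path look cheaper, the plan flips from a sequential scan to an index scan, which is exactly the effect described above.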
On Apr 12, 2011, at 5:36 PM, Tomas Vondra wrote:

> So in the previous post setting both random_page_cost and seq_page_cost
> to the same value makes sense, because when the whole database fits into
> memory, there's no difference and index scans are favorable.
>
> In this case (the database is much bigger than the available RAM) this
> no longer holds - index scans hit the drives, resulting in a lot of
> seeks etc. So it's a serious performance killer ...

Tomas,

Thank you for your very detailed and well written description. In conclusion, I should keep my random_page_cost (3.0) at a value higher than seq_page_cost (1.0)? Is this bad practice, or will this suffice for my setup (where the database is much bigger than the RAM in the system)? Or is this not what you are suggesting at all?

Thank you

Ogden
> Tomas,
>
> Thank you for your very detailed and well written description. In
> conclusion, I should keep my random_page_cost (3.0) at a value higher than
> seq_page_cost (1.0)? Is this bad practice, or will this suffice for my
> setup (where the database is much bigger than the RAM in the system)? Or
> is this not what you are suggesting at all?

Yes, keep it that way. The fact that 'random_page_cost >= seq_page_cost' generally means that random reads are more expensive than sequential reads. The actual values depend on your setup, but 4:1 is usually OK, unless your db fits into memory etc.

The decrease in performance after decreasing random_page_cost to 3 is due to changes in some execution plans (an index scan becomes slightly less expensive than a seq scan), but in your case that's a false assumption. So keep it at 4 (you may even try to increase it, just to see if that improves the performance).

regards
Tomas
Ogden <lists@darkstatic.com> wrote:

> In conclusion, I should keep my random_page_cost (3.0) at a value
> higher than seq_page_cost (1.0)? Is this bad practice, or will this
> suffice for my setup (where the database is much bigger than the
> RAM in the system)?

The idea is to adjust the costing factors to model the actual relative costs of the various actions in your environment with your workload. The best way to determine whether your settings are good is to gauge how happy those using the database are with performance. :-)

The degree of caching has a large effect on the page costs. We've managed to keep the active portion of our databases cached to a degree that we have always benefited by reducing random_page_cost to 2 or less. Where the entire database is cached, we get the best plans with seq_page_cost and random_page_cost set to equal values in the 0.1 to 0.05 range. We've occasionally needed to bump cpu_tuple_cost up a bit relative to the other cpu costs, too.

On the other hand, I've seen reports of people who have found it necessary to increase random_page_cost to get good plans. These have been people with large databases where the entire database is "active" (versus our databases, where recent, active data is accessed much more heavily than, say, 20-year-old data).

If you model the costing to reflect the reality on your server, good plans will be chosen.

-Kevin
On Wed, Apr 13, 2011 at 4:32 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > If you model the costing to reflect the reality on your server, good > plans will be chosen. Wouldn't it be "better" to derive those costs from actual performance data measured at runtime? Say, pg could measure random/seq page cost, *per tablespace* even. Has that been tried?
Claudio Freire <klaussfreire@gmail.com> writes: > On Wed, Apr 13, 2011 at 4:32 PM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: >> If you model the costing to reflect the reality on your server, good >> plans will be chosen. > Wouldn't it be "better" to derive those costs from actual performance > data measured at runtime? > Say, pg could measure random/seq page cost, *per tablespace* even. > Has that been tried? Getting numbers that mean much of anything is a slow, expensive process. You really don't want the database trying to do that for you. Once you've got them, you *really* don't want the database editorializing on them. regards, tom lane
On Wed, Apr 13, 2011 at 11:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Getting numbers that mean much of anything is a slow, expensive > process. You really don't want the database trying to do that for you. > Once you've got them, you *really* don't want the database > editorializing on them. > So it hasn't even been tried.
Claudio Freire <klaussfreire@gmail.com> wrote: > So it hasn't even been tried. If you want to do that, I would be interested in your benchmark numbers. Or if you're not up to that, there are a number of companies which I'd bet would be willing to spend the time if they had a sponsor to pay for their hours. So far nobody has felt it likely enough to be beneficial to want to put their time or money on the line for it. Here's your chance to be first. -Kevin
>> If you model the costing to reflect the reality on your server, good >> plans will be chosen. > > Wouldn't it be "better" to derive those costs from actual performance > data measured at runtime? > > Say, pg could measure random/seq page cost, *per tablespace* even. > > Has that been tried? FWIW, awhile ago I wrote a simple script to measure this and found that the *actual* random_page / seq_page cost ratio was much higher than 4/1. The problem is that caching effects have a large effect on the time it takes to access a random page, and caching effects are very workload dependent. So anything automated would probably need to optimize the parameter values over a set of 'typical' queries, which is exactly what a good DBA does when they set random_page_cost... Best, Nathan
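For what it's worth, a rough psql-level version of that kind of measurement might look like the sketch below - the table bench and its indexed column scattered_col (with values uncorrelated with the physical row order) are hypothetical, and this is not Nathan's actual script:

-- Sequential read timing: full scan of the table.
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM bench;

-- Random read timing: force scattered heap fetches through the index.
SET enable_seqscan = off;
SET enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM bench
 WHERE scattered_col BETWEEN 1 AND 50000;
RESET enable_seqscan;
RESET enable_bitmapscan;

As noted above, the ratio you get this way is dominated by how much of the table is already cached, so repeated runs (with warm or cold caches) give very different answers.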
Nathan Boley <npboley@gmail.com> wrote: > The problem is that caching effects have a large effect on the > time it takes to access a random page, and caching effects are > very workload dependent. So anything automated would probably need > to optimize the parameter values over a set of 'typical' queries, > which is exactly what a good DBA does when they set > random_page_cost... Another database product I've used has a stored procedure you can run to turn on monitoring of workload, another to turn it off and report on what happened during the interval. It drags performance enough that you don't want to leave it running except as a tuning exercise, but it does produce very detailed statistics and actually offers suggestions on what you might try tuning to improve performance. If someone wanted to write something to deal with this issue, that seems like a sound overall strategy. -Kevin
On 14.4.2011 00:05, Nathan Boley wrote:

> The problem is that caching effects have a large effect on the time it
> takes to access a random page, and caching effects are very workload
> dependent. So anything automated would probably need to optimize the
> parameter values over a set of 'typical' queries, which is exactly
> what a good DBA does when they set random_page_cost...

Plus there's a separate pagecache outside shared_buffers, which adds another layer of complexity.

What I was thinking about was a kind of 'autotuning' using a real workload. I mean - measure the time it takes to process a request (depends on the application - could be the time to load a page, process an invoice, whatever ...) and compute some reasonable metric on it (average, median, variance, ...). Move the cost variables a bit (e.g. random_page_cost) and see how that influences performance. If it improved, do another step in the same direction; otherwise do a step in the other direction (or do not change the values at all).

Yes, I've had some lectures on non-linear programming, so I'm aware that this won't work if the cost function has multiple extremes (valleys / hills etc.), but I somehow suppose that's not the case for cost estimates.

Another issue is that when measuring multiple values (processing of different requests), the decisions may be contradictory, so it really can't be fully automatic.

regards
Tomas
On Thu, Apr 14, 2011 at 12:19 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

> Another issue is that when measuring multiple values (processing of
> different requests), the decisions may be contradictory, so it really
> can't be fully automatic.

I don't think it's soooo dependent on workload. It's dependent on access patterns (and working set sizes), and all of that can be quantified, as opposed to "workload".

I've been meaning to try this for a while, and it need not be as expensive as one would imagine. It just needs a clever implementation that isn't too intrusive and that is customizable enough not to alienate DBAs.

I'm not doing database stuff ATM (though I've been doing it for several years), and I don't expect to return to database tasks for a few months. But whenever I get back to it, sure, I'd be willing to invest time in it.

What an automated system can do that a DBA cannot, and why this idea occurred to me in the first place, is tailor the metrics for variable contexts and situations. Like, I had a DB that was working perfectly fine most of the time, but some days it got "overworked", and sticking with fixed cost variables made no sense - in those situations, random page cost was insanely high because of the workload, but sequential scans would have run much faster because of OS read-ahead and because of synchroscans.

I'm talking about a decision support system that did lots of heavy-duty queries, where sequential scans are an alternative. I reckon most OLTP systems are different.

So, to make things short, adaptability to varying conditions is what I'd imagine this technique would provide, and a DBA cannot, no matter how skilled. That, and the advent of SSDs and the really, really different characteristics of different tablespaces, only strengthens my intuition that automation might be better than parameterization.
On 14.4.2011 01:10, Claudio Freire wrote:

> I don't think it's soooo dependent on workload. It's dependent on
> access patterns (and working set sizes), and all of that can be
> quantified, as opposed to "workload".

Well, think about a database that's much bigger than the available RAM.

Workload A: Touches just a very small portion of the database, so the 'active' part actually fits into memory. In this case the cache hit ratio can easily be close to 99%.

Workload B: Touches a large portion of the database, so it hits the drive very often. In this case the cache hit ratio is usually around RAM/(size of the database).

So yes, it may be very workload dependent. In the first case you may actually significantly lower random_page_cost (even to seq_page_cost) and it's going to be quite fast (thanks to the cache). If you do the same thing with workload B, the database is going to burn.

I'm not saying it's not possible to do some autotuning, but it's a bit tricky and it's not just about hardware. The workload *is* a very important part of the equation.

But I have to admit this post probably sounds like overengineering. If you can develop something simple (even if it does not consider workload at all), it might be a useful starting point. If I could help you in any way with this, let me know.

regards
Tomas
Nathan Boley <npboley@gmail.com> writes: > FWIW, awhile ago I wrote a simple script to measure this and found > that the *actual* random_page / seq_page cost ratio was much higher > than 4/1. That 4:1 ratio is based on some rather extensive experimentation that I did back in 2000. In the interim, disk transfer rates have improved quite a lot more than disk seek times have, and the CPU cost to process a page's worth of data has also improved compared to the seek time. So yeah, you'd likely get a higher number if you redid those experiments on modern hardware (at least assuming it was rotating media and not SSD). On the other hand, the effects of caching push the numbers in the other direction, and modern machines also have a lot more RAM to cache in than was typical ten years ago. I'm not sure how much point there is in trying to improve the default number in the abstract --- we'd really need to have a more robust model of cache effects before I'd trust any automatic tuning procedure to set the value for me. regards, tom lane
On 04/13/2011 05:03 PM, Tom Lane wrote: > That 4:1 ratio is based on some rather extensive experimentation that > I did back in 2000. In the interim, disk transfer rates have improved > quite a lot more than disk seek times have, and the CPU cost to process > a page's worth of data has also improved compared to the seek time. My experience is that at least a 1/1 is more appropriate. JD
On Wed, Apr 13, 2011 at 5:26 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

> Workload A: Touches just a very small portion of the database, so the
> 'active' part actually fits into memory. In this case the cache hit
> ratio can easily be close to 99%.
>
> Workload B: Touches a large portion of the database, so it hits the drive
> very often. In this case the cache hit ratio is usually around RAM/(size
> of the database).

I've had this kind of split-brain operation in the past, where 99% of all accesses would be cached, and the 1% that weren't needed their own tuning. Luckily you can tune by user (alter user set random_page_cost etc), so I was able to do that. One of the best features of pgsql, imnsho.
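A minimal sketch of that per-user tuning; the role name reporting_user is hypothetical (ALTER ROLE and ALTER USER are interchangeable here):

-- Give one role its own planner cost settings; everyone else keeps the
-- server-wide defaults from postgresql.conf.
ALTER ROLE reporting_user SET random_page_cost = 2.0;
ALTER ROLE reporting_user SET seq_page_cost = 1.0;

-- Revert to the server-wide defaults later if needed.
ALTER ROLE reporting_user RESET random_page_cost;
ALTER ROLE reporting_user RESET seq_page_cost;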
On Thu, Apr 14, 2011 at 1:26 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

> Workload A: Touches just a very small portion of the database, so the
> 'active' part actually fits into memory. In this case the cache hit
> ratio can easily be close to 99%.
>
> Workload B: Touches a large portion of the database, so it hits the drive
> very often. In this case the cache hit ratio is usually around RAM/(size
> of the database).

You've answered it yourself without even realizing it.

This particular factor is not about an abstract and opaque "Workload" the server can't know about. It's about cache hit rate, and the server can indeed measure that.
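For what it's worth, a rough look at the hit rate the server already tracks - this only covers PostgreSQL's own buffer cache, not the OS pagecache or drive caches (a limitation raised later in the thread):

-- Buffer-cache hit ratio per database, from the statistics collector.
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 3) AS hit_ratio
  FROM pg_stat_database
 ORDER BY blks_read DESC;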
> On Thu, Apr 14, 2011 at 1:26 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> Workload A: Touches just a very small portion of the database, so the
>> 'active' part actually fits into memory. In this case the cache hit
>> ratio can easily be close to 99%.
>>
>> Workload B: Touches a large portion of the database, so it hits the drive
>> very often. In this case the cache hit ratio is usually around RAM/(size
>> of the database).
>
> You've answered it yourself without even realizing it.
>
> This particular factor is not about an abstract and opaque "Workload"
> the server can't know about. It's about cache hit rate, and the server
> can indeed measure that.

OK, so it's not a matter of tuning random_page_cost/seq_page_cost? Because tuning based on cache hit ratio is something completely different (IMHO).

Anyway, I'm not an expert in this field, but AFAIK something like this already happens - btw, that's the purpose of effective_cache_size. But I'm afraid there might be serious fail cases where the current model works better, e.g. what if you ask for data that's completely uncached (was inactive for a long time)?

But if you have an idea on how to improve this, great - start a discussion on the hackers list and let's see.

regards
Tomas
2011/4/14 Tom Lane <tgl@sss.pgh.pa.us>:

> I'm not sure how much point there is in
> trying to improve the default number in the abstract --- we'd really
> need to have a more robust model of cache effects before I'd trust any
> automatic tuning procedure to set the value for me.

Well, in my spare time I am doing some POC work with "ANALYZE OSCACHE relation;", where pg stats are updated accordingly with the new data (it is not finished yet): at least the percentage in the OS cache, maybe the number of groups in cache and/or the distribution.

Anyway, the idea is to allow the planner to apply random and seq page cost to the part that is not in cache, without replacing the algorithm using effective_cache_size. The planner may have one other GUC like 'mem_page_cost' to set a cost on access from cache and use it while estimating the cost...

A side effect is that random_page_cost and seq_page_cost should be more stable and easier to set with a script, because they won't have the mixed sources of disk/memory, only the disk access cost (if ANALYZE OSCACHE is good enough).

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
On Apr 14, 2011, at 2:49 AM, Claudio Freire <klaussfreire@gmail.com> wrote:

> This particular factor is not about an abstract and opaque "Workload"
> the server can't know about. It's about cache hit rate, and the server
> can indeed measure that.

The server can and does measure hit rates for the PG buffer pool, but to my knowledge there is no clear-cut way for PG to know whether a read() is satisfied from the OS cache or a drive cache or the platter.

...Robert
On Apr 13, 2011, at 6:19 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

> Yes, I've had some lectures on non-linear programming, so I'm aware that
> this won't work if the cost function has multiple extremes (valleys /
> hills etc.), but I somehow suppose that's not the case for cost estimates.

I think that supposition might turn out to be incorrect, though. Probably what will happen on simple queries is that a small change will make no difference, and a large enough change will cause a plan change. On complex queries it will approach continuous variation, but why shouldn't there be local minima?

...Robert
On Tue, Apr 26, 2011 at 7:30 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> The server can and does measure hit rates for the PG buffer pool, but to my
> knowledge there is no clear-cut way for PG to know whether a read() is
> satisfied from the OS cache or a drive cache or the platter.

Isn't latency an indicator?

If you plot latencies, you should see three markedly obvious clusters: OS cache (microseconds), drive cache (slightly slower), platter (tail).

I think I had seen a study of sorts somewhere[0]... Ok, that link is about sequential/random access, but I distinctly remember one about caches and CAV...

[0] http://blogs.sun.com/brendan/entry/heat_map_analytics
On 26.4.2011 07:35, Robert Haas wrote:

> On Apr 13, 2011, at 6:19 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> Yes, I've had some lectures on non-linear programming, so I'm aware that
>> this won't work if the cost function has multiple extremes (valleys /
>> hills etc.), but I somehow suppose that's not the case for cost estimates.
>
> I think that supposition might turn out to be incorrect, though. Probably
> what will happen on simple queries is that a small change will make no
> difference, and a large enough change will cause a plan change. On
> complex queries it will approach continuous variation, but why
> shouldn't there be local minima?

Aaaah, damn! I was not talking about cost estimates - those obviously do not have this feature, as you've pointed out (thanks!). I was talking about the 'response time' I mentioned when describing the autotuning using a real workload. The idea is to change the costs a bit and then measure the average response time - if the overall performance improved, do another step in the same direction. Etc.

I wonder if there are cases where an increase of random_page_cost would hurt performance, and another increase would improve it ... And I'm not talking about individual queries, I'm talking about overall performance.

regards
Tomas
tv@fuzzy.cz wrote:

> Anyway, I'm not an expert in this field, but AFAIK something like this
> already happens - btw, that's the purpose of effective_cache_size.

effective_cache_size probably doesn't do as much as you suspect. It is used for one of the computations for whether an index is small enough that it can likely be read into memory efficiently. It has no impact on caching decisions outside of that.

As for the ideas bouncing around here for tinkering with random_page_cost more automatically, I have a notebook with about a dozen different ways to do that I've come up with over the last few years. The reason no work can be done in this area is that there are no standardized benchmarks of query execution in PostgreSQL being run regularly right now. Bringing up ideas for changing the computation is easy; proving that such a change is positive on enough workloads to be worth considering is the hard part. There is no useful discussion to be had on the hackers list that doesn't start with "here's the mix of benchmarks I intend to test this new model against".

Performance regression testing for the query optimizer is a giant pile of boring work that we get minimal volunteers interested in. Nobody gets to do the fun model-change work without doing that first, though. For this type of change, you're guaranteed to just be smacking around parameters to optimize for only a single case without some broader benchmarking context.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Greg Smith <greg@2ndquadrant.com> wrote: > The reason no work can be done in this area is because there are > no standardized benchmarks of query execution in PostgreSQL being > run regularly right now. Bringing up ideas for changing the > computation is easy; proving that such a change is positive on > enough workloads to be worth considering is the hard part. There > is no useful discussion to be made on the hackers list that > doesn't start with "here's the mix the benchmarks I intend to test > this new model against". This is looming as an ever-more-acute need for the project, in several areas. -Kevin
On Tue, Apr 26, 2011 at 8:54 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > I wonder if there are cases where an increase of random_page_cost would > hurt performance, and another increase would improve it ... And I'm not > talking about individual queries, I'm talking about overall performance. I don't think there are many. But I don't think you can assume that there are none. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Apr 26, 2011 at 9:49 AM, Claudio Freire <klaussfreire@gmail.com> wrote: > On Tue, Apr 26, 2011 at 7:30 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Apr 14, 2011, at 2:49 AM, Claudio Freire <klaussfreire@gmail.com> wrote: >>> This particular factor is not about an abstract and opaque "Workload" >>> the server can't know about. It's about cache hit rate, and the server >>> can indeed measure that. >> >> The server can and does measure hit rates for the PG buffer pool, but to my knowledge there is no clear-cut way for PGto know whether read() is satisfied from the OS cache or a drive cache or the platter. > > Isn't latency an indicator? > > If you plot latencies, you should see three markedly obvious clusters: > OS cache (microseconds), Drive cache (slightly slower), platter > (tail). What if the user is using an SSD or ramdisk? Admittedly, in many cases, we could probably get somewhat useful numbers this way. But I think it would be pretty expensive. gettimeofday() is one of the reasons why running EXPLAIN ANALYZE on a query is significantly slower than just running it normally. I bet if we put such calls around every read() and write(), it would cause a BIG slowdown for workloads that don't fit in shared_buffers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 27.4.2011 20:56, Kevin Grittner wrote:

> Greg Smith <greg@2ndquadrant.com> wrote:
>
>> There is no useful discussion to be had on the hackers list that
>> doesn't start with "here's the mix of benchmarks I intend to test
>> this new model against".
>
> This is looming as an ever-more-acute need for the project, in
> several areas.

Hmmm, just wondering - what would be needed to build such a 'workload library'? Building it from scratch is not feasible IMHO, but I guess people could provide their own scripts (as simple as 'set up a bunch of tables, fill them with data, run some queries'), and there's a pile of such examples on the pgsql-performance list.

regards
Tomas
On Wed, Apr 27, 2011 at 10:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> Admittedly, in many cases, we could probably get somewhat useful
> numbers this way. But I think it would be pretty expensive.
> gettimeofday() is one of the reasons why running EXPLAIN ANALYZE on a
> query is significantly slower than just running it normally. I bet if
> we put such calls around every read() and write(), it would cause a
> BIG slowdown for workloads that don't fit in shared_buffers.

I've just been reading an article in ACM about something intimately related to that. The article was about cache-conscious scheduling - mostly memory cache, but disk cache isn't that different. There is a lot of work, real, serious work, in characterizing cache contention, and the article showed how a simplified version of the cache reuse profile model behaves under various workloads. The simplified model simply used cache miss rates, and it performed even better than the more complex model - they went on and analyzed why.

Long story short, there is indeed a lot of literature about the subject; there are a lot of formal and experimental results. One of those models has to be embodied into a patch and tested - that's about it. The patch may be simple, the testing not so much. I know that.

What tools do we have to do that testing? There are lots, and all imply a lot of work. Is that work worth the trouble? Because if it is... why not do it?

I would propose a step in the right direction: a patch to compute and log periodic estimations of the main I/O tunables: random_page_cost, seq_page_cost and effective_cache_size. Maybe per tablespace. Evaluate the performance impact, and work from there.

Because probably just using those values as input to the optimizer won't work - dbas will want a way to tune the optimizer, the system may not be stable enough, and even with accurate estimates for those values the optimizer may not perform as expected. I mean, right now those values are tunables, not real metrics, so perhaps the optimizer won't respond well to real values. But having the ability to measure them without a serious performance impact is a step in the right direction, right?
Tomas Vondra wrote:

> Hmmm, just wondering - what would be needed to build such a 'workload
> library'? Building it from scratch is not feasible IMHO, but I guess
> people could provide their own scripts (as simple as 'set up a bunch
> of tables, fill them with data, run some queries'), and there's a pile of
> such examples on the pgsql-performance list.

The easiest place to start is by re-using the work already done by the TPC for benchmarking commercial databases. There are ports of the TPC workloads to PostgreSQL available in the DBT-2, DBT-3, and DBT-5 tests; see http://wiki.postgresql.org/wiki/Category:Benchmarking for initial information on those (the page on TPC-H is quite relevant too). I'd like to see all three of those DBT tests running regularly, as well as two tests it's possible to simulate with pgbench or sysbench: an in-cache read-only test, and a write-as-fast-as-possible test.

The main problem with re-using posts from this list for workload testing is getting an appropriately sized data set for them that stays relevant. The nature of this sort of benchmark always includes some notion of the size of the database, and you get different results based on how large things are relative to RAM and the database parameters. That said, some sort of systematic collection of "hard queries" would also be a very useful project for someone to take on.

People show up regularly who want to play with the optimizer in some way. It's still possible to do that by targeting specific queries you want to accelerate, where it's obvious (or, more likely, hard but still straightforward) how to do better. But I don't think any of these proposed exercises adjusting the caching model or default optimizer parameters in the database is going anywhere without some sort of benchmarking framework for evaluating the results. And the TPC tests are a reasonable place to start. They're a good mixed set of queries, and improving results on those does turn into a real commercial benefit to PostgreSQL in the future too.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Just want to share the DBT(2&5) thing
http://archives.postgresql.org/pgsql-performance/2011-04/msg00145.php
http://sourceforge.net/mailarchive/forum.php?forum_name=osdldbt-general&max_rows=25&style=nested&viewmonth=201104
On 27.4.2011 23:55, Greg Smith wrote:
> The easiest place to start is by re-using the work already done by the
> TPC for benchmarking commercial databases. There are ports of the TPC
> workloads to PostgreSQL available in the DBT-2, DBT-3, and DBT-5 tests;
> see http://wiki.postgresql.org/wiki/Category:Benchmarking for initial
> information on those (the page on TPC-H is quite relevant too). I'd
> like to see all three of those DBT tests running regularly, as well as
> two tests it's possible to simulate with pgbench or sysbench: an
> in-cache read-only test, and a write as fast as possible test.

That's a natural first step, I guess.

> The main problem with re-using posts from this list for workload testing
> is getting an appropriately sized data set for them that stays
> relevant. The nature of this sort of benchmark always includes some
> notion of the size of the database, and you get different results based
> on how large things are relative to RAM and the database parameters.
> That said, some sort of systematic collection of "hard queries" would
> also be a very useful project for someone to take on.

Yes, I'm aware of that. The examples posted to the lists usually lack the
data, but I guess we could get it at least from some of the posters
(anonymized etc.). And some of the examples are rather simple, so it's
possible to generate as much data as you want using a PL/pgSQL function or so.
Anyway, I hesitate to call those examples 'workloads' - it's usually just one
query, sometimes two. But it's still a useful test IMHO.

I was thinking about several VMs, each with a different configuration (amount
of RAM, CPU, ...). The benchmarks might be a bunch of very simple scripts I
guess, each one taking care of preparing the data, running the test, and
uploading the results somewhere. And I guess it'd be useful to make this
available for download, so that everyone can run the tests locally ...

A bit of a naive question - where to run this? I know there's a build farm,
but I guess it's mostly for building and not for such benchmarks.

> People show up regularly who want to play with the optimizer in some
> way. It's still possible to do that by targeting specific queries you
> want to accelerate, where it's obvious (or, more likely, hard but still
> straightforward) how to do better. But I don't think any of these
> proposed exercises adjusting the caching model or default optimizer
> parameters in the database is going anywhere without some sort of
> benchmarking framework for evaluating the results. And the TPC tests
> are a reasonable place to start. They're a good mixed set of queries,
> and improving results on those does turn into a real commercial benefit
> to PostgreSQL in the future too.

100% true.

regards
Tomas
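(Generating data of an arbitrary size for such single-query examples is
usually a one-liner; a minimal sketch with made-up table and column names:

    -- size the row count to whatever RAM ratio the test calls for
    CREATE TABLE bench_data AS
        SELECT i AS id,
               md5(i::text) AS payload,
               (random() * 10000)::int AS category
        FROM generate_series(1, 10000000) AS s(i);
    CREATE INDEX bench_data_category_idx ON bench_data (category);
    ANALYZE bench_data;

The interesting part is then running the reported "hard query" against it at
several sizes relative to shared_buffers and RAM.)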
All,

> The easiest place to start is by re-using the work already done by the
> TPC for benchmarking commercial databases. There are ports of the TPC
> workloads to PostgreSQL available in the DBT-2, DBT-3, and DBT-5
> tests;

Also EAStress, which I think the project still has a license for.

The drawback to these is that they're quite difficult and time-consuming to
run, making them unsuitable for doing, say, incremental tuning tests which
need to run 100 iterations. At least, now that we don't have access to the
OSDL or Sun labs anymore.

On the other hand, Greg has made the first steps in a benchmark constructor
kit by making it possible for pgbench to run arbitrary workloads. Someone
could build on Greg's foundation by:

a) building a more complex database model with random data generators,

b) designing a wide series of queries designed to test specific performance
problems, i.e., "large object reads", "complex nested subqueries", "mass bulk
correlated updates", and

c) finally creating scripts which generate benchmarks by choosing a database
size and a "mix" from the query menu.

This would give us a kit capable of testing performance regressions and
improvements for PostgreSQL.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco
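(For reference, the pgbench piece of (c) already works: several custom script
files can be passed and pgbench picks between them at random for each
transaction. The file names and mix below are invented for illustration:

    pgbench -n -c 8 -T 600 \
        -f large_object_read.sql \
        -f nested_subquery.sql \
        -f bulk_correlated_update.sql \
        bench

Each .sql file holds one query pattern, with \setrandom used to pick random
parameters per transaction; what's missing is the data generator and the
size/mix chooser around it.)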
Robert Haas wrote:
> The server can and does measure hit rates for the PG buffer pool, but to my
> knowledge there is no clear-cut way for PG to know whether read() is
> satisfied from the OS cache or a drive cache or the platter.

Does the server know which IO it thinks is sequential, and which it thinks is
random? Could it not time the IOs (perhaps optionally) and at least keep some
sort of statistics of the actual observed times?

It might not be appropriate for the server to attempt auto-tuning, but it
might be able to provide some information that can be used by a DBA to make
informed decisions.

James
James Mansion wrote:
> Does the server know which IO it thinks is sequential, and which it
> thinks is random? Could it not time the IOs (perhaps optionally) and
> at least keep some sort of statistics of the actual observed times?

It makes some assumptions based on what the individual query nodes are doing.
Sequential scans are obviously sequential; index lookups are random; bitmap
index scans are random.

The "measure the I/O and determine cache state from latency profile" idea has
been tried; I believe it was Greg Stark who ran a good experiment of that a
few years ago. Based on the difficulties of figuring out what you're actually
going to do with that data, I don't think the idea will ever go anywhere.
There are some really nasty feedback loops possible in all these approaches
for better modeling what's in cache, and this one suffers the worst from that
possibility. If for example you discover that accessing index blocks is slow,
you might avoid using them in favor of a measured fast sequential scan. Once
you've fallen into that local minimum, you're stuck there. Since you never
access the index blocks, they'll never get into RAM so that accessing them
becomes fast--even though doing that once might be much more efficient,
long-term, than avoiding the index.

There are also some severe query plan stability issues with this idea beyond
that. The idea that your plan might vary based on execution latency, that the
system load going up can make query plans alter with it, is terrifying for a
production server.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On 4/29/2011 1:55 PM, Greg Smith wrote:
> James Mansion wrote:
>> Does the server know which IO it thinks is sequential, and which it
>> thinks is random? Could it not time the IOs (perhaps optionally) and
>> at least keep some sort of statistics of the actual observed times?
>
> It makes some assumptions based on what the individual query nodes are
> doing. Sequential scans are obviously sequential; index lookups are
> random; bitmap index scans are random.
>
> The "measure the I/O and determine cache state from latency profile" idea
> has been tried; I believe it was Greg Stark who ran a good experiment of
> that a few years ago. Based on the difficulties of figuring out what
> you're actually going to do with that data, I don't think the idea will
> ever go anywhere. There are some really nasty feedback loops possible in
> all these approaches for better modeling what's in cache, and this one
> suffers the worst from that possibility. If for example you discover
> that accessing index blocks is slow, you might avoid using them in favor
> of a measured fast sequential scan. Once you've fallen into that local
> minimum, you're stuck there. Since you never access the index blocks,
> they'll never get into RAM so that accessing them becomes fast--even
> though doing that once might be much more efficient, long-term, than
> avoiding the index.
>
> There are also some severe query plan stability issues with this idea
> beyond that. The idea that your plan might vary based on execution
> latency, that the system load going up can make query plans alter with
> it, is terrifying for a production server.

How about if the stats were kept, but had no effect on plans, the optimizer,
or anything else? It would be a diagnostic tool. When someone writes the list
saying "AH! It used the wrong index!", you could say: "please post your config
settings, and the stats from 'select * from pg_stats_something'".

We (or, you really) could compare the seq_page_cost and random_page_cost from
the config to the stats collected by PG and determine they are way off... and
you should edit your config a little and restart PG.

-Andy
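(Part of that picture is already exposed today, at the shared-buffers level at
least; a hypothetical pg_stats_something would extend the same idea with
OS-cache and latency information. What can already be queried:

    -- per-table hit ratios from the existing statistics views
    SELECT relname,
           heap_blks_read, heap_blks_hit,
           round(100.0 * heap_blks_hit
                 / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS heap_hit_pct,
           idx_blks_read, idx_blks_hit
    FROM pg_statio_user_tables
    ORDER BY heap_blks_read + idx_blks_read DESC
    LIMIT 20;

What it cannot tell you is whether the misses were served from the OS cache or
the platters, which is exactly the gap being discussed.)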
Greg Smith wrote:
> There are also some severe query plan stability issues with this idea
> beyond that. The idea that your plan might vary based on execution
> latency, that the system load going up can make query plans alter with
> it, is terrifying for a production server.

I thought I was clear that it should present some stats to the DBA, not that
it would try to auto-tune?

This thread started, I believe, with a discussion of appropriate tunings for
random page cost vs sequential page cost, based on some finger-in-the-air
estimate of total size vs available disk cache. And it was observed that on
systems that have very large databases but modest hot data, you can perform
like a fully cached system for much of the time.

I'm just suggesting providing statistical information to the DBA which will
indicate whether the system has 'recently' been behaving like a system that
runs from buffer cache and/or subsystem caches, or one that runs from disk
platters, and what the actual observed latency difference is. It may well be
that this varies with time of day or day of week. Whether the actual latencies
translate directly into the relative costs is another matter.
James Mansion wrote:
> I thought I was clear that it should present some stats to the DBA,
> not that it would try to auto-tune?

You were. But people are bound to make decisions about how to retune their
database based on that information. The situation when doing manual tuning
isn't that much different, it just occurs more slowly, and with the potential
to not react at all if the data is incoherent. That might be better, but you
have to assume that a naive person will just follow suggestions on how to
re-tune based on that the same way an auto-tune process would.

I don't like this whole approach because it takes something the database and
DBA have no control over (read timing) and makes it a primary input to the
tuning model. Plus, the overhead of collecting this data is big relative to
its potential value.

Anyway, how to collect this data is a separate problem from what should be
done with it in the optimizer. I don't actually care about the collection part
very much; there are a bunch of approaches with various trade-offs. Deciding
how to tell the optimizer about what's cached already is the more important
problem that needs to be solved before any of this takes you somewhere useful,
and focusing on the collection part doesn't move that forward. Trying to map
the real world into the currently exposed parameter set isn't a solvable
problem. We really need cached_page_cost and random_page_cost, plus a way to
model the cached state per relation that doesn't fall easily into feedback
loops.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
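(To make the shape of that proposal concrete - none of this exists in
PostgreSQL today, it is purely a hypothetical sketch of the parameter set
being described:

    # existing knobs
    seq_page_cost    = 1.0     # uncached sequential read
    random_page_cost = 4.0     # uncached random read
    # hypothetical additions
    cached_page_cost = 0.05    # page expected to be found in RAM
    # plus some per-relation cache-state model, e.g. a settable/estimated
    # fraction:  ALTER TABLE orders SET (cached_fraction = 0.8);   -- not real

The hard part being pointed at is how such per-relation cache fractions would
be maintained without the feedback loops described earlier in the thread.)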
On Fri, Apr 29, 2011 at 11:37 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Anyway, how to collect this data is a separate problem from what should be
> done with it in the optimizer. I don't actually care about the collection
> part very much; there are a bunch of approaches with various trade-offs.
> Deciding how to tell the optimizer about what's cached already is the more
> important problem that needs to be solved before any of this takes you
> somewhere useful, and focusing on the collection part doesn't move that
> forward. Trying to map the real world into the currently exposed parameter
> set isn't a solvable problem. We really need cached_page_cost and
> random_page_cost, plus a way to model the cached state per relation that
> doesn't fall easily into feedback loops.

This is valuable input... I was already worried about feedback loops, and
hearing that it has been tried and resulted in them is invaluable.

From my experience, what really blows up in your face when your servers are
saturated is effective_cache_size. Postgres thinks an index will fit in the
cache, but at times of high load it doesn't, meaning that a sequential scan
would actually be orders of magnitude better - if it's a "small enough table".

Perhaps just adjusting effective_cache_size would provide a good enough
benefit without the disastrous feedback loops? I'll have to test that idea...
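(That particular experiment is cheap to run, since effective_cache_size can be
changed per session; the table and query below are invented for illustration:

    SET effective_cache_size = '8GB';   -- the optimistic, idle-server figure
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    SET effective_cache_size = '1GB';   -- roughly what survives under heavy load
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    RESET effective_cache_size;

Comparing the two plans shows whether the index-vs-seqscan choice for the
query in question actually flips at the lower setting.)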
On Apr 29, 2011, at 10:25 AM, James Mansion <james@mansionfamily.plus.com> wrote:
> Robert Haas wrote:
>> The server can and does measure hit rates for the PG buffer pool, but to my
>> knowledge there is no clear-cut way for PG to know whether read() is
>> satisfied from the OS cache or a drive cache or the platter.
>
> Does the server know which IO it thinks is sequential, and which it thinks
> is random?

No. It models this in the optimizer, but the executor has no clue. And
sometimes we model I/O as partly random, partly sequential, as in the case of
heap fetches on a clustered index. So the answer isn't even a Boolean.

...Robert
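(The "partly random, partly sequential" modelling mentioned here is the
correlation-based interpolation in the index cost estimate; paraphrasing the
logic in src/backend/optimizer/path/costsize.c (cost_index), so check the
source for the exact form:

    csquared = indexCorrelation^2
    io_cost  = max_IO_cost + csquared * (min_IO_cost - max_IO_cost)

where max_IO_cost prices every heap fetch at random_page_cost and min_IO_cost
prices an ordered scan mostly at seq_page_cost. A well-clustered index
(correlation near 1) is therefore costed almost like a sequential read of the
heap; an uncorrelated one as pure random I/O.)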
On Apr 27, 2011, at 11:01 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> The patch may be simple, the testing not so much. I know that.
>
> What tools do we have to do that testing? There are lots, and all
> imply a lot of work. Is that work worth the trouble? Because if it
> is... why not do the work?
>
> I would propose a step in the right direction: a patch to compute and
> log periodic estimates of the main I/O tunables: random_page_cost,
> seq_page_cost and effective_cache_size. Maybe per-tablespace.
> Evaluate the performance impact, and work from there.
>
> Because, probably just using those values as input to the optimizer
> won't work: DBAs will want a way to tune the optimizer, the system may
> not be stable enough, and even with accurate estimates for those
> values the optimizer may not perform as expected. I mean, right now
> those values are tunables, not real metrics, so perhaps the optimizer
> won't respond well to real values.
>
> But having the ability to measure them without a serious performance
> impact is a step in the right direction, right?

Sure. It's not a real easy problem, but don't let that discourage you from
working on it. Getting more eyeballs on these issues can only be a good thing.

...Robert