Thread: Performance
I have been wrestling with the configuration of the dedicated Postgres 9.0.3 server at work. Granted, there's more activity on the production server, but the same queries take twice as long on the beefier server than on my Mac at home. I have pasted what I have changed in postgresql.conf - I am wondering if anyone can help me change things around to be more efficient.
Dedicated PostgreSQL 9.0.3 Server with 16GB Ram
Heavy write and read (for reporting and calculations) server.
max_connections = 350
shared_buffers = 4096MB
work_mem = 32MB
maintenance_work_mem = 512MB
seq_page_cost = 0.02 # measured on an arbitrary scale
random_page_cost = 0.03
cpu_tuple_cost = 0.02
effective_cache_size = 8192MB
The planner costs seem a bit low, but they came from suggestions on this very list a while ago.
Thank you
Ogden
Ogden <lists@darkstatic.com> wrote:

> max_connections = 350
> shared_buffers = 4096MB
> work_mem = 32MB
> maintenance_work_mem = 512MB

That's okay.

> seq_page_cost = 0.02    # measured on an arbitrary scale
> random_page_cost = 0.03

Do you have super, Super, SUPER fast disks? I think these (seq_page_cost and random_page_cost) are completely wrong.

> cpu_tuple_cost = 0.02
> effective_cache_size = 8192MB
>
> The planner costs seem a bit low, but they came from suggestions on this very list a while ago.

Sure? Can you tell us a link into the archive?

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                            (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°
On Apr 12, 2011, at 12:18 PM, Andreas Kretschmer wrote:

>> seq_page_cost = 0.02    # measured on an arbitrary scale
>> random_page_cost = 0.03
>
> Do you have super, Super, SUPER fast disks? I think these (seq_page_cost
> and random_page_cost) are completely wrong.

No, I don't have super fast disks - just 15K SCSI over RAID. I find that by raising them to:

seq_page_cost = 1.0
random_page_cost = 3.0
cpu_tuple_cost = 0.3
#cpu_index_tuple_cost = 0.005    # same scale as above
#cpu_operator_cost = 0.0025      # same scale as above
effective_cache_size = 8192MB

some queries run much faster. Is this better? I will find the archive and post.

Thank you

Ogden
On 12.4.2011 19:23, Ogden wrote:

> No, I don't have super fast disks - just 15K SCSI over RAID. I find that by raising them to:
>
> seq_page_cost = 1.0
> random_page_cost = 3.0
> cpu_tuple_cost = 0.3
> effective_cache_size = 8192MB
>
> some queries run much faster. Is this better?

I guess it is. What really matters with those cost variables is the relative scale - the original values

seq_page_cost = 0.02
random_page_cost = 0.03
cpu_tuple_cost = 0.02

suggest that random reads are almost as expensive as sequential reads (which is usually not true - random reads are significantly more expensive), and that processing each row is about as expensive as reading the page from disk (again, reading data from disk is much more expensive than processing it).

So yes, the current values are much more likely to give good results.

You've mentioned those values were recommended on this list - can you point out the actual discussion?

regards
Tomas
On Apr 12, 2011, at 1:16 PM, Tomas Vondra wrote:

> So yes, the current values are much more likely to give good results.
>
> You've mentioned those values were recommended on this list - can you
> point out the actual discussion?

Thank you for your reply.

http://archives.postgresql.org/pgsql-performance/2010-09/msg00169.php is how I first played with those values...

Ogden
On 12.4.2011 20:28, Ogden wrote:

> Thank you for your reply.
>
> http://archives.postgresql.org/pgsql-performance/2010-09/msg00169.php is how I first played with those values...

OK, what JD said there generally makes sense, although those values are a bit extreme - in most cases it's recommended to leave seq_page_cost=1 and decrease random_page_cost (to 2; the default value is 4). That usually pushes the planner towards index scans.

I'm not saying those small values (0.02 etc.) are bad, but I guess the effect is about the same, and it changes the impact of the other cost variables (cpu_tuple_cost, etc.).

I see there is 16GB of RAM but shared_buffers is just 4GB. So there's nothing else running and the rest of the RAM is used for the pagecache? I've noticed the previous discussion mentions there are 8GB of RAM and the DB size is 7GB (so it might fit into memory). Is this still the case?

regards
Tomas
On Apr 12, 2011, at 4:09 PM, Tomas Vondra wrote:

> I see there is 16GB of RAM but shared_buffers is just 4GB. So there's
> nothing else running and the rest of the RAM is used for the pagecache? I've
> noticed the previous discussion mentions there are 8GB of RAM and the DB
> size is 7GB (so it might fit into memory). Is this still the case?

Tomas,

By decreasing random_page_cost to 2 (instead of 4), there is a slight performance decrease as opposed to leaving it at 4. For example, if I set it to 3 (or 4), a query may take 0.057 seconds; the same query takes 0.144s when I set random_page_cost to 2. Should I keep it at 3 (or 4) as I have done now?

Yes, there is 16GB of RAM but the database is much bigger than that. Should I increase shared_buffers?

Thank you so very much

Ogden
On 12.4.2011 23:19, Ogden wrote:

> Tomas,
>
> By decreasing random_page_cost to 2 (instead of 4), there is a slight performance decrease as opposed to leaving it at 4. For example, if I set it to 3 (or 4), a query may take 0.057 seconds; the same query takes 0.144s when I set random_page_cost to 2. Should I keep it at 3 (or 4) as I have done now?
>
> Yes, there is 16GB of RAM but the database is much bigger than that. Should I increase shared_buffers?

OK, that's very important information and it kinda explains all the problems you had. When the planner decides which execution plan to use, it computes a 'virtual cost' for different plans and then chooses the cheapest one.

Decreasing 'random_page_cost' decreases the expected cost of plans involving index scans, so that at a certain point they seem cheaper than a plan using sequential scans etc.

You can see this when using EXPLAIN - do it with the original cost values, then change the values (for that session only) and run the EXPLAIN again. You'll see how the execution plan suddenly changes and starts to use index scans.

The problem with random I/O is that it's usually much more expensive than sequential I/O, as the drives need to seek etc. The only case when random I/O is just as cheap as sequential I/O is when all the data is cached in memory, because within RAM there's no difference between random and sequential access (right, that's why it's called Random Access Memory).

So in the previous post setting both random_page_cost and seq_page_cost to the same value makes sense, because when the whole database fits into memory, there's no difference and index scans are favorable.

In this case (the database is much bigger than the available RAM) this no longer holds - index scans hit the drives, resulting in a lot of seeks etc. So it's a serious performance killer ...

Not sure about increasing shared_buffers - if a block is not found in shared buffers, it still might be found in the pagecache (without the need to do a physical read). There are ways to check whether the current size of shared buffers is enough or not - I usually use the pg_stat views (bgwriter and database).

regards
Tomas
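For illustration, a minimal psql sketch of the session-level experiment described above - the table and column names (orders, customer_id) are hypothetical, not from this thread:

-- Plan chosen with the server's current cost settings.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Lower the cost estimates for this session only (other sessions and the
-- postgresql.conf defaults are untouched), then look at the plan again.
SET random_page_cost = 2.0;
SET seq_page_cost = 1.0;
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Undo the experiment for this session.
RESET random_page_cost;
RESET seq_page_cost;

If the new costs make the index path look cheaper, the plan flips from a sequential scan to an index scan, which is exactly the effect described above.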
On Apr 12, 2011, at 5:36 PM, Tomas Vondra wrote:

> So in the previous post setting both random_page_cost and seq_page_cost
> to the same value makes sense, because when the whole database fits into
> memory, there's no difference and index scans are favorable.
>
> In this case (the database is much bigger than the available RAM) this
> no longer holds - index scans hit the drives, resulting in a lot of
> seeks etc. So it's a serious performance killer ...

Tomas,

Thank you for your very detailed and well written description. In conclusion, I should keep my random_page_cost (3.0) at a value higher than seq_page_cost (1.0)? Is this bad practice, or will this suffice for my setup (where the database is much bigger than the RAM in the system)? Or is this not what you are suggesting at all?

Thank you

Ogden
> Tomas,
>
> Thank you for your very detailed and well written description. In
> conclusion, I should keep my random_page_cost (3.0) at a value higher than
> seq_page_cost (1.0)? Is this bad practice, or will this suffice for my
> setup (where the database is much bigger than the RAM in the system)? Or
> is this not what you are suggesting at all?

Yes, keep it that way. The fact that 'random_page_cost >= seq_page_cost' generally means that random reads are more expensive than sequential reads. The actual values depend on your setup, but 4:1 is usually OK, unless your db fits into memory etc.

The decrease in performance after decreasing random_page_cost to 3 is due to changes in some execution plans (an index scan becomes slightly less expensive than a seq scan), but in your case that's a false assumption. So keep it at 4 (you may even try to increase it, just to see if that improves the performance).

regards
Tomas
Ogden <lists@darkstatic.com> wrote:

> In conclusion, I should keep my random_page_cost (3.0) at a value
> higher than seq_page_cost (1.0)? Is this bad practice, or will this
> suffice for my setup (where the database is much bigger than the
> RAM in the system)?

The idea is to adjust the costing factors to model the actual relative costs of the various actions in your environment with your workload. The best way to determine whether your settings are good is to gauge how happy those using the database are with performance. :-)

The degree of caching has a large effect on the page costs. We've managed to keep the active portion of our databases cached to a degree that we have always benefited by reducing random_page_cost to 2 or less. Where the entire database is cached, we get the best plans with seq_page_cost and random_page_cost set to equal values in the 0.1 to 0.05 range. We've occasionally needed to bump cpu_tuple_cost up a bit relative to the other cpu costs, too.

On the other hand, I've seen reports of people who have found it necessary to increase random_page_cost to get good plans. These have been people with large databases where the entire database is "active" (versus our databases, where recent, active data is accessed much more heavily than, say, 20-year-old data).

If you model the costing to reflect the reality on your server, good plans will be chosen.

-Kevin
On Wed, Apr 13, 2011 at 4:32 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > If you model the costing to reflect the reality on your server, good > plans will be chosen. Wouldn't it be "better" to derive those costs from actual performance data measured at runtime? Say, pg could measure random/seq page cost, *per tablespace* even. Has that been tried?
Claudio Freire <klaussfreire@gmail.com> writes: > On Wed, Apr 13, 2011 at 4:32 PM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: >> If you model the costing to reflect the reality on your server, good >> plans will be chosen. > Wouldn't it be "better" to derive those costs from actual performance > data measured at runtime? > Say, pg could measure random/seq page cost, *per tablespace* even. > Has that been tried? Getting numbers that mean much of anything is a slow, expensive process. You really don't want the database trying to do that for you. Once you've got them, you *really* don't want the database editorializing on them. regards, tom lane
On Wed, Apr 13, 2011 at 11:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Getting numbers that mean much of anything is a slow, expensive > process. You really don't want the database trying to do that for you. > Once you've got them, you *really* don't want the database > editorializing on them. > So it hasn't even been tried.
Claudio Freire <klaussfreire@gmail.com> wrote: > So it hasn't even been tried. If you want to do that, I would be interested in your benchmark numbers. Or if you're not up to that, there are a number of companies which I'd bet would be willing to spend the time if they had a sponsor to pay for their hours. So far nobody has felt it likely enough to be beneficial to want to put their time or money on the line for it. Here's your chance to be first. -Kevin
>> If you model the costing to reflect the reality on your server, good >> plans will be chosen. > > Wouldn't it be "better" to derive those costs from actual performance > data measured at runtime? > > Say, pg could measure random/seq page cost, *per tablespace* even. > > Has that been tried? FWIW, awhile ago I wrote a simple script to measure this and found that the *actual* random_page / seq_page cost ratio was much higher than 4/1. The problem is that caching effects have a large effect on the time it takes to access a random page, and caching effects are very workload dependent. So anything automated would probably need to optimize the parameter values over a set of 'typical' queries, which is exactly what a good DBA does when they set random_page_cost... Best, Nathan
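For what it's worth, a rough psql-level version of that kind of measurement might look like the sketch below - the table bench and its indexed column scattered_col (with values uncorrelated with the physical row order) are hypothetical, and this is not Nathan's actual script:

-- Sequential read timing: full scan of the table.
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM bench;

-- Random read timing: force scattered heap fetches through the index.
SET enable_seqscan = off;
SET enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM bench
 WHERE scattered_col BETWEEN 1 AND 50000;
RESET enable_seqscan;
RESET enable_bitmapscan;

As noted above, the ratio you get this way is dominated by how much of the table is already cached, so repeated runs (with warm or cold caches) give very different answers.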
Nathan Boley <npboley@gmail.com> wrote: > The problem is that caching effects have a large effect on the > time it takes to access a random page, and caching effects are > very workload dependent. So anything automated would probably need > to optimize the parameter values over a set of 'typical' queries, > which is exactly what a good DBA does when they set > random_page_cost... Another database product I've used has a stored procedure you can run to turn on monitoring of workload, another to turn it off and report on what happened during the interval. It drags performance enough that you don't want to leave it running except as a tuning exercise, but it does produce very detailed statistics and actually offers suggestions on what you might try tuning to improve performance. If someone wanted to write something to deal with this issue, that seems like a sound overall strategy. -Kevin
On 14.4.2011 00:05, Nathan Boley wrote:

> The problem is that caching effects have a large effect on the time it
> takes to access a random page, and caching effects are very workload
> dependent. So anything automated would probably need to optimize the
> parameter values over a set of 'typical' queries, which is exactly
> what a good DBA does when they set random_page_cost...

Plus there's a separate pagecache outside shared_buffers, which adds another layer of complexity.

What I was thinking about was a kind of 'autotuning' using a real workload. I mean - measure the time it takes to process a request (depends on the application - could be the time to load a page, process an invoice, whatever ...) and compute some reasonable metric on it (average, median, variance, ...). Move the cost variables a bit (e.g. random_page_cost) and see how that influences performance. If it improved, do another step in the same direction; otherwise do a step in the other direction (or do not change the values at all).

Yes, I've had some lectures on non-linear programming, so I'm aware that this won't work if the cost function has multiple extremes (valleys / hills etc.), but I somehow suppose that's not the case for cost estimates.

Another issue is that when measuring multiple values (processing of different requests), the decisions may be contradictory, so it really can't be fully automatic.

regards
Tomas
On Thu, Apr 14, 2011 at 12:19 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

> Another issue is that when measuring multiple values (processing of
> different requests), the decisions may be contradictory, so it really
> can't be fully automatic.

I don't think it's soooo dependent on workload. It's dependent on access patterns (and working set sizes), and all of that can be quantified, as opposed to "workload".

I've been meaning to try this for a while, and it need not be as expensive as one would imagine. It just needs a clever implementation that isn't too intrusive and that is customizable enough not to alienate DBAs.

I'm not doing database stuff ATM (though I've been doing it for several years), and I don't expect to return to database tasks for a few months. But whenever I get back to it, sure, I'd be willing to invest time in it.

What an automated system can do that a DBA cannot, and why this idea occurred to me in the first place, is tailor the metrics for variable contexts and situations. Like, I had a DB that was working perfectly fine most of the time, but some days it got "overworked", and sticking with fixed cost variables made no sense - in those situations, random page cost was insanely high because of the workload, but sequential scans would have run much faster because of OS read-ahead and because of synchroscans.

I'm talking about a decision support system that did lots of heavy-duty queries, where sequential scans are an alternative. I reckon most OLTP systems are different.

So, to make things short, adaptability to varying conditions is what I'd imagine this technique would provide, and a DBA cannot, no matter how skilled. That, and the advent of SSDs and the really, really different characteristics of different tablespaces, only strengthens my intuition that automation might be better than parameterization.
On 14.4.2011 01:10, Claudio Freire wrote:

> I don't think it's soooo dependent on workload. It's dependent on
> access patterns (and working set sizes), and all of that can be
> quantified, as opposed to "workload".

Well, think about a database that's much bigger than the available RAM.

Workload A: Touches just a very small portion of the database, so the 'active' part actually fits into memory. In this case the cache hit ratio can easily be close to 99%.

Workload B: Touches a large portion of the database, so it hits the drive very often. In this case the cache hit ratio is usually around RAM/(size of the database).

So yes, it may be very workload dependent. In the first case you may actually significantly lower random_page_cost (even to seq_page_cost) and it's going to be quite fast (thanks to the cache). If you do the same thing with workload B, the database is going to burn.

I'm not saying it's not possible to do some autotuning, but it's a bit tricky and it's not just about hardware. The workload *is* a very important part of the equation.

But I have to admit this post probably sounds like overengineering. If you can develop something simple (even if it does not consider workload at all), it might be a useful starting point. If I could help you in any way with this, let me know.

regards
Tomas
Nathan Boley <npboley@gmail.com> writes: > FWIW, awhile ago I wrote a simple script to measure this and found > that the *actual* random_page / seq_page cost ratio was much higher > than 4/1. That 4:1 ratio is based on some rather extensive experimentation that I did back in 2000. In the interim, disk transfer rates have improved quite a lot more than disk seek times have, and the CPU cost to process a page's worth of data has also improved compared to the seek time. So yeah, you'd likely get a higher number if you redid those experiments on modern hardware (at least assuming it was rotating media and not SSD). On the other hand, the effects of caching push the numbers in the other direction, and modern machines also have a lot more RAM to cache in than was typical ten years ago. I'm not sure how much point there is in trying to improve the default number in the abstract --- we'd really need to have a more robust model of cache effects before I'd trust any automatic tuning procedure to set the value for me. regards, tom lane
On 04/13/2011 05:03 PM, Tom Lane wrote: > That 4:1 ratio is based on some rather extensive experimentation that > I did back in 2000. In the interim, disk transfer rates have improved > quite a lot more than disk seek times have, and the CPU cost to process > a page's worth of data has also improved compared to the seek time. My experience is that at least a 1/1 is more appropriate. JD
On Wed, Apr 13, 2011 at 5:26 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

> Workload A: Touches just a very small portion of the database, so the
> 'active' part actually fits into memory. In this case the cache hit
> ratio can easily be close to 99%.
>
> Workload B: Touches a large portion of the database, so it hits the drive
> very often. In this case the cache hit ratio is usually around RAM/(size
> of the database).

I've had this kind of split-brain operation in the past, where 99% of all accesses would be cached, and the 1% that weren't needed their own tuning. Luckily you can tune by user (alter user set random_page_cost etc), so I was able to do that. One of the best features of pgsql, imnsho.
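A minimal sketch of that per-user tuning; the role name reporting_user is hypothetical (ALTER ROLE and ALTER USER are interchangeable here):

-- Give one role its own planner cost settings; everyone else keeps the
-- server-wide defaults from postgresql.conf.
ALTER ROLE reporting_user SET random_page_cost = 2.0;
ALTER ROLE reporting_user SET seq_page_cost = 1.0;

-- Revert to the server-wide defaults later if needed.
ALTER ROLE reporting_user RESET random_page_cost;
ALTER ROLE reporting_user RESET seq_page_cost;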
On Thu, Apr 14, 2011 at 1:26 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

> Workload A: Touches just a very small portion of the database, so the
> 'active' part actually fits into memory. In this case the cache hit
> ratio can easily be close to 99%.
>
> Workload B: Touches a large portion of the database, so it hits the drive
> very often. In this case the cache hit ratio is usually around RAM/(size
> of the database).

You've answered it yourself without even realizing it.

This particular factor is not about an abstract and opaque "Workload" the server can't know about. It's about cache hit rate, and the server can indeed measure that.
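For what it's worth, a rough look at the hit rate the server already tracks - this only covers PostgreSQL's own buffer cache, not the OS pagecache or drive caches (a limitation raised later in the thread):

-- Buffer-cache hit ratio per database, from the statistics collector.
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 3) AS hit_ratio
  FROM pg_stat_database
 ORDER BY blks_read DESC;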
> On Thu, Apr 14, 2011 at 1:26 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> Workload A: Touches just a very small portion of the database, so the
>> 'active' part actually fits into memory. In this case the cache hit
>> ratio can easily be close to 99%.
>>
>> Workload B: Touches a large portion of the database, so it hits the drive
>> very often. In this case the cache hit ratio is usually around RAM/(size
>> of the database).
>
> You've answered it yourself without even realizing it.
>
> This particular factor is not about an abstract and opaque "Workload"
> the server can't know about. It's about cache hit rate, and the server
> can indeed measure that.

OK, so it's not a matter of tuning random_page_cost/seq_page_cost? Because tuning based on cache hit ratio is something completely different (IMHO).

Anyway, I'm not an expert in this field, but AFAIK something like this already happens - btw, that's the purpose of effective_cache_size. But I'm afraid there might be serious fail cases where the current model works better, e.g. what if you ask for data that's completely uncached (was inactive for a long time)?

But if you have an idea on how to improve this, great - start a discussion on the hackers list and let's see.

regards
Tomas
2011/4/14 Tom Lane <tgl@sss.pgh.pa.us>:

> I'm not sure how much point there is in
> trying to improve the default number in the abstract --- we'd really
> need to have a more robust model of cache effects before I'd trust any
> automatic tuning procedure to set the value for me.

Well, in my spare time I am doing some POC work with "ANALYZE OSCACHE relation;", where pg stats are updated accordingly with the new data (it is not finished yet): at least the percentage in the OS cache, maybe the number of groups in cache and/or the distribution.

Anyway, the idea is to allow the planner to apply random and seq page cost to the part that is not in cache, without replacing the algorithm using effective_cache_size. The planner may have one other GUC like 'mem_page_cost' to set a cost on access from cache and use it while estimating the cost...

A side effect is that random_page_cost and seq_page_cost should be more stable and easier to set with a script, because they won't have the mixed sources of disk/memory, only the disk access cost (if ANALYZE OSCACHE is good enough).

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
On Apr 14, 2011, at 2:49 AM, Claudio Freire <klaussfreire@gmail.com> wrote:

> This particular factor is not about an abstract and opaque "Workload"
> the server can't know about. It's about cache hit rate, and the server
> can indeed measure that.

The server can and does measure hit rates for the PG buffer pool, but to my knowledge there is no clear-cut way for PG to know whether a read() is satisfied from the OS cache or a drive cache or the platter.

...Robert
On Apr 13, 2011, at 6:19 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

> Yes, I've had some lectures on non-linear programming, so I'm aware that
> this won't work if the cost function has multiple extremes (valleys /
> hills etc.), but I somehow suppose that's not the case for cost estimates.

I think that supposition might turn out to be incorrect, though. Probably what will happen on simple queries is that a small change will make no difference, and a large enough change will cause a plan change. On complex queries it will approach continuous variation, but why shouldn't there be local minima?

...Robert
On Tue, Apr 26, 2011 at 7:30 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> The server can and does measure hit rates for the PG buffer pool, but to my
> knowledge there is no clear-cut way for PG to know whether a read() is
> satisfied from the OS cache or a drive cache or the platter.

Isn't latency an indicator?

If you plot latencies, you should see three markedly obvious clusters: OS cache (microseconds), drive cache (slightly slower), platter (tail).

I think I had seen a study of sorts somewhere[0]... Ok, that link is about sequential/random access, but I distinctly remember one about caches and CAV...

[0] http://blogs.sun.com/brendan/entry/heat_map_analytics
On 26.4.2011 07:35, Robert Haas wrote:

> On Apr 13, 2011, at 6:19 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> Yes, I've had some lectures on non-linear programming, so I'm aware that
>> this won't work if the cost function has multiple extremes (valleys /
>> hills etc.), but I somehow suppose that's not the case for cost estimates.
>
> I think that supposition might turn out to be incorrect, though. Probably
> what will happen on simple queries is that a small change will make no
> difference, and a large enough change will cause a plan change. On
> complex queries it will approach continuous variation, but why
> shouldn't there be local minima?

Aaaah, damn! I was not talking about cost estimates - those obviously do not have this feature, as you've pointed out (thanks!). I was talking about the 'response time' I mentioned when describing the autotuning using a real workload. The idea is to change the costs a bit and then measure the average response time - if the overall performance improved, do another step in the same direction. Etc.

I wonder if there are cases where an increase of random_page_cost would hurt performance, and another increase would improve it ... And I'm not talking about individual queries, I'm talking about overall performance.

regards
Tomas
tv@fuzzy.cz wrote:

> Anyway, I'm not an expert in this field, but AFAIK something like this
> already happens - btw, that's the purpose of effective_cache_size.

effective_cache_size probably doesn't do as much as you suspect. It is used for one of the computations for whether an index is small enough that it can likely be read into memory efficiently. It has no impact on caching decisions outside of that.

As for the ideas bouncing around here for tinkering with random_page_cost more automatically, I have a notebook with about a dozen different ways to do that I've come up with over the last few years. The reason no work can be done in this area is that there are no standardized benchmarks of query execution in PostgreSQL being run regularly right now. Bringing up ideas for changing the computation is easy; proving that such a change is positive on enough workloads to be worth considering is the hard part. There is no useful discussion to be had on the hackers list that doesn't start with "here's the mix of benchmarks I intend to test this new model against".

Performance regression testing for the query optimizer is a giant pile of boring work that we get minimal volunteers interested in. Nobody gets to do the fun model-change work without doing that first, though. For this type of change, you're guaranteed to just be smacking around parameters to optimize for only a single case without some broader benchmarking context.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Greg Smith <greg@2ndquadrant.com> wrote: > The reason no work can be done in this area is because there are > no standardized benchmarks of query execution in PostgreSQL being > run regularly right now. Bringing up ideas for changing the > computation is easy; proving that such a change is positive on > enough workloads to be worth considering is the hard part. There > is no useful discussion to be made on the hackers list that > doesn't start with "here's the mix the benchmarks I intend to test > this new model against". This is looming as an ever-more-acute need for the project, in several areas. -Kevin
On Tue, Apr 26, 2011 at 8:54 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > I wonder if there are cases where an increase of random_page_cost would > hurt performance, and another increase would improve it ... And I'm not > talking about individual queries, I'm talking about overall performance. I don't think there are many. But I don't think you can assume that there are none. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Apr 26, 2011 at 9:49 AM, Claudio Freire <klaussfreire@gmail.com> wrote: > On Tue, Apr 26, 2011 at 7:30 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Apr 14, 2011, at 2:49 AM, Claudio Freire <klaussfreire@gmail.com> wrote: >>> This particular factor is not about an abstract and opaque "Workload" >>> the server can't know about. It's about cache hit rate, and the server >>> can indeed measure that. >> >> The server can and does measure hit rates for the PG buffer pool, but to my knowledge there is no clear-cut way for PGto know whether read() is satisfied from the OS cache or a drive cache or the platter. > > Isn't latency an indicator? > > If you plot latencies, you should see three markedly obvious clusters: > OS cache (microseconds), Drive cache (slightly slower), platter > (tail). What if the user is using an SSD or ramdisk? Admittedly, in many cases, we could probably get somewhat useful numbers this way. But I think it would be pretty expensive. gettimeofday() is one of the reasons why running EXPLAIN ANALYZE on a query is significantly slower than just running it normally. I bet if we put such calls around every read() and write(), it would cause a BIG slowdown for workloads that don't fit in shared_buffers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 27.4.2011 20:56, Kevin Grittner wrote:

> Greg Smith <greg@2ndquadrant.com> wrote:
>
>> There is no useful discussion to be had on the hackers list that
>> doesn't start with "here's the mix of benchmarks I intend to test
>> this new model against".
>
> This is looming as an ever-more-acute need for the project, in
> several areas.

Hmmm, just wondering - what would be needed to build such a 'workload library'? Building it from scratch is not feasible IMHO, but I guess people could provide their own scripts (as simple as 'set up a bunch of tables, fill them with data, run some queries'), and there's a pile of such examples on the pgsql-performance list.

regards
Tomas
On Wed, Apr 27, 2011 at 10:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> Admittedly, in many cases, we could probably get somewhat useful
> numbers this way. But I think it would be pretty expensive.
> gettimeofday() is one of the reasons why running EXPLAIN ANALYZE on a
> query is significantly slower than just running it normally. I bet if
> we put such calls around every read() and write(), it would cause a
> BIG slowdown for workloads that don't fit in shared_buffers.

I've just been reading an article in ACM about something intimately related to that. The article was about cache-conscious scheduling - mostly memory cache, but disk cache isn't that different. There is a lot of work, real, serious work, in characterizing cache contention, and the article showed how a simplified version of the cache reuse profile model behaves under various workloads. The simplified model simply used cache miss rates, and it performed even better than the more complex model - they went on and analyzed why.

Long story short, there is indeed a lot of literature about the subject; there are a lot of formal and experimental results. One of those models has to be embodied into a patch and tested - that's about it. The patch may be simple, the testing not so much. I know that.

What tools do we have to do that testing? There are lots, and all imply a lot of work. Is that work worth the trouble? Because if it is... why not do it?

I would propose a step in the right direction: a patch to compute and log periodic estimations of the main I/O tunables: random_page_cost, seq_page_cost and effective_cache_size. Maybe per tablespace. Evaluate the performance impact, and work from there.

Because probably just using those values as input to the optimizer won't work - dbas will want a way to tune the optimizer, the system may not be stable enough, and even with accurate estimates for those values the optimizer may not perform as expected. I mean, right now those values are tunables, not real metrics, so perhaps the optimizer won't respond well to real values. But having the ability to measure them without a serious performance impact is a step in the right direction, right?
Tomas Vondra wrote:

> Hmmm, just wondering - what would be needed to build such a 'workload
> library'? Building it from scratch is not feasible IMHO, but I guess
> people could provide their own scripts (as simple as 'set up a bunch
> of tables, fill them with data, run some queries'), and there's a pile of
> such examples on the pgsql-performance list.

The easiest place to start is by re-using the work already done by the TPC for benchmarking commercial databases. There are ports of the TPC workloads to PostgreSQL available in the DBT-2, DBT-3, and DBT-5 tests; see http://wiki.postgresql.org/wiki/Category:Benchmarking for initial information on those (the page on TPC-H is quite relevant too). I'd like to see all three of those DBT tests running regularly, as well as two tests it's possible to simulate with pgbench or sysbench: an in-cache read-only test, and a write-as-fast-as-possible test.

The main problem with re-using posts from this list for workload testing is getting an appropriately sized data set for them that stays relevant. The nature of this sort of benchmark always includes some notion of the size of the database, and you get different results based on how large things are relative to RAM and the database parameters. That said, some sort of systematic collection of "hard queries" would also be a very useful project for someone to take on.

People show up regularly who want to play with the optimizer in some way. It's still possible to do that by targeting specific queries you want to accelerate, where it's obvious (or, more likely, hard but still straightforward) how to do better. But I don't think any of these proposed exercises adjusting the caching model or default optimizer parameters in the database is going anywhere without some sort of benchmarking framework for evaluating the results. And the TPC tests are a reasonable place to start. They're a good mixed set of queries, and improving results on those does turn into a real commercial benefit to PostgreSQL in the future too.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Just want to share the DBT(2&5) thing
http://archives.postgresql.org/pgsql-performance/2011-04/msg00145.php
http://sourceforge.net/mailarchive/forum.php?forum_name=osdldbt-general&max_rows=25&style=nested&viewmonth=201104
On 27.4.2011 23:55, Greg Smith wrote:
> The easiest place to start is by re-using the work already done by the
> TPC for benchmarking commercial databases. There are ports of the TPC
> workloads to PostgreSQL available in the DBT-2, DBT-3, and DBT-5 tests;
> see http://wiki.postgresql.org/wiki/Category:Benchmarking for initial
> information on those (the page on TPC-H is quite relevant too). I'd
> like to see all three of those DBT tests running regularly, as well as
> two tests it's possible to simulate with pgbench or sysbench: an
> in-cache read-only test, and a write as fast as possible test.

That's a natural first step, I guess.

> The main problem with re-using posts from this list for workload testing
> is getting an appropriately sized data set for them that stays
> relevant. The nature of this sort of benchmark always includes some
> notion of the size of the database, and you get different results based
> on how large things are relative to RAM and the database parameters.
> That said, some sort of systematic collection of "hard queries" would
> also be a very useful project for someone to take on.

Yes, I'm aware of that. The examples posted to the lists usually lack the
data, but I guess we could get it at least from some of the posters
(anonymized etc.). And some of the examples are rather simple, so it's
possible to generate as much data as you want using a PL/pgSQL function or so.
Anyway, I hesitate to call those examples 'workloads' - it's usually just one
query, sometimes two. But it's still a useful test IMHO.

I was thinking about several VMs, each with a different configuration (amount
of RAM, CPU, ...). The benchmarks might be a bunch of very simple scripts I
guess, each one taking care of preparing the data, running the test, and
uploading the results somewhere. And I guess it'd be useful to make this
available for download, so that everyone can run the tests locally ...

A bit of a naive question - where to run this? I know there's a build farm,
but I guess it's mostly for building and not for such benchmarks.

> People show up regularly who want to play with the optimizer in some
> way. It's still possible to do that by targeting specific queries you
> want to accelerate, where it's obvious (or, more likely, hard but still
> straightforward) how to do better. But I don't think any of these
> proposed exercises adjusting the caching model or default optimizer
> parameters in the database is going anywhere without some sort of
> benchmarking framework for evaluating the results. And the TPC tests
> are a reasonable place to start. They're a good mixed set of queries,
> and improving results on those does turn into a real commercial benefit
> to PostgreSQL in the future too.

100% true.

regards
Tomas
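(Generating data of an arbitrary size for such single-query examples is
usually a one-liner; a minimal sketch with made-up table and column names:

    -- size the row count to whatever RAM ratio the test calls for
    CREATE TABLE bench_data AS
        SELECT i AS id,
               md5(i::text) AS payload,
               (random() * 10000)::int AS category
        FROM generate_series(1, 10000000) AS s(i);
    CREATE INDEX bench_data_category_idx ON bench_data (category);
    ANALYZE bench_data;

The interesting part is then running the reported "hard query" against it at
several sizes relative to shared_buffers and RAM.)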
All,

> The easiest place to start is by re-using the work already done by the
> TPC for benchmarking commercial databases. There are ports of the TPC
> workloads to PostgreSQL available in the DBT-2, DBT-3, and DBT-5
> tests;

Also EAStress, which I think the project still has a license for.

The drawback to these is that they're quite difficult and time-consuming to
run, making them unsuitable for doing, say, incremental tuning tests which
need to run 100 iterations. At least, now that we don't have access to the
OSDL or Sun labs anymore.

On the other hand, Greg has made the first steps in a benchmark constructor
kit by making it possible for pgbench to run arbitrary workloads. Someone
could build on Greg's foundation by:

a) building a more complex database model with random data generators,

b) designing a wide series of queries designed to test specific performance
problems, i.e., "large object reads", "complex nested subqueries", "mass bulk
correlated updates", and

c) finally creating scripts which generate benchmarks by choosing a database
size and a "mix" from the query menu.

This would give us a kit capable of testing performance regressions and
improvements for PostgreSQL.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco
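(For reference, the pgbench piece of (c) already works: several custom script
files can be passed and pgbench picks between them at random for each
transaction. The file names and mix below are invented for illustration:

    pgbench -n -c 8 -T 600 \
        -f large_object_read.sql \
        -f nested_subquery.sql \
        -f bulk_correlated_update.sql \
        bench

Each .sql file holds one query pattern, with \setrandom used to pick random
parameters per transaction; what's missing is the data generator and the
size/mix chooser around it.)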
Robert Haas wrote:
> The server can and does measure hit rates for the PG buffer pool, but to my
> knowledge there is no clear-cut way for PG to know whether read() is
> satisfied from the OS cache or a drive cache or the platter.

Does the server know which IO it thinks is sequential, and which it thinks is
random? Could it not time the IOs (perhaps optionally) and at least keep some
sort of statistics of the actual observed times?

It might not be appropriate for the server to attempt auto-tuning, but it
might be able to provide some information that can be used by a DBA to make
informed decisions.

James
James Mansion wrote:
> Does the server know which IO it thinks is sequential, and which it
> thinks is random? Could it not time the IOs (perhaps optionally) and
> at least keep some sort of statistics of the actual observed times?

It makes some assumptions based on what the individual query nodes are doing.
Sequential scans are obviously sequential; index lookups are random; bitmap
index scans are random.

The "measure the I/O and determine cache state from latency profile" idea has
been tried; I believe it was Greg Stark who ran a good experiment of that a
few years ago. Based on the difficulties of figuring out what you're actually
going to do with that data, I don't think the idea will ever go anywhere.
There are some really nasty feedback loops possible in all these approaches
for better modeling what's in cache, and this one suffers the worst from that
possibility. If for example you discover that accessing index blocks is slow,
you might avoid using them in favor of a measured fast sequential scan. Once
you've fallen into that local minimum, you're stuck there. Since you never
access the index blocks, they'll never get into RAM so that accessing them
becomes fast--even though doing that once might be much more efficient,
long-term, than avoiding the index.

There are also some severe query plan stability issues with this idea beyond
that. The idea that your plan might vary based on execution latency, that the
system load going up can make query plans alter with it, is terrifying for a
production server.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On 4/29/2011 1:55 PM, Greg Smith wrote:
> James Mansion wrote:
>> Does the server know which IO it thinks is sequential, and which it
>> thinks is random? Could it not time the IOs (perhaps optionally) and
>> at least keep some sort of statistics of the actual observed times?
>
> It makes some assumptions based on what the individual query nodes are
> doing. Sequential scans are obviously sequential; index lookups are
> random; bitmap index scans are random.
>
> The "measure the I/O and determine cache state from latency profile" idea
> has been tried; I believe it was Greg Stark who ran a good experiment of
> that a few years ago. Based on the difficulties of figuring out what
> you're actually going to do with that data, I don't think the idea will
> ever go anywhere. There are some really nasty feedback loops possible in
> all these approaches for better modeling what's in cache, and this one
> suffers the worst from that possibility. If for example you discover
> that accessing index blocks is slow, you might avoid using them in favor
> of a measured fast sequential scan. Once you've fallen into that local
> minimum, you're stuck there. Since you never access the index blocks,
> they'll never get into RAM so that accessing them becomes fast--even
> though doing that once might be much more efficient, long-term, than
> avoiding the index.
>
> There are also some severe query plan stability issues with this idea
> beyond that. The idea that your plan might vary based on execution
> latency, that the system load going up can make query plans alter with
> it, is terrifying for a production server.

How about if the stats were kept, but had no effect on plans, the optimizer,
or anything else? It would be a diagnostic tool. When someone writes the list
saying "AH! It used the wrong index!", you could say: "please post your config
settings, and the stats from 'select * from pg_stats_something'".

We (or, you really) could compare the seq_page_cost and random_page_cost from
the config to the stats collected by PG and determine they are way off... and
you should edit your config a little and restart PG.

-Andy
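(Part of that picture is already exposed today, at the shared-buffers level at
least; a hypothetical pg_stats_something would extend the same idea with
OS-cache and latency information. What can already be queried:

    -- per-table hit ratios from the existing statistics views
    SELECT relname,
           heap_blks_read, heap_blks_hit,
           round(100.0 * heap_blks_hit
                 / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS heap_hit_pct,
           idx_blks_read, idx_blks_hit
    FROM pg_statio_user_tables
    ORDER BY heap_blks_read + idx_blks_read DESC
    LIMIT 20;

What it cannot tell you is whether the misses were served from the OS cache or
the platters, which is exactly the gap being discussed.)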
Greg Smith wrote:
> There are also some severe query plan stability issues with this idea
> beyond that. The idea that your plan might vary based on execution
> latency, that the system load going up can make query plans alter with
> it, is terrifying for a production server.

I thought I was clear that it should present some stats to the DBA, not that
it would try to auto-tune?

This thread started, I believe, with a discussion of appropriate tunings for
random page cost vs sequential page cost, based on some finger-in-the-air
estimate of total size vs available disk cache. And it was observed that on
systems that have very large databases but modest hot data, you can perform
like a fully cached system for much of the time.

I'm just suggesting providing statistical information to the DBA which will
indicate whether the system has 'recently' been behaving like a system that
runs from buffer cache and/or subsystem caches, or one that runs from disk
platters, and what the actual observed latency difference is. It may well be
that this varies with time of day or day of week. Whether the actual latencies
translate directly into the relative costs is another matter.
James Mansion wrote:
> I thought I was clear that it should present some stats to the DBA,
> not that it would try to auto-tune?

You were. But people are bound to make decisions about how to retune their
database based on that information. The situation when doing manual tuning
isn't that much different, it just occurs more slowly, and with the potential
to not react at all if the data is incoherent. That might be better, but you
have to assume that a naive person will just follow suggestions on how to
re-tune based on that the same way an auto-tune process would.

I don't like this whole approach because it takes something the database and
DBA have no control over (read timing) and makes it a primary input to the
tuning model. Plus, the overhead of collecting this data is big relative to
its potential value.

Anyway, how to collect this data is a separate problem from what should be
done with it in the optimizer. I don't actually care about the collection part
very much; there are a bunch of approaches with various trade-offs. Deciding
how to tell the optimizer about what's cached already is the more important
problem that needs to be solved before any of this takes you somewhere useful,
and focusing on the collection part doesn't move that forward. Trying to map
the real world into the currently exposed parameter set isn't a solvable
problem. We really need cached_page_cost and random_page_cost, plus a way to
model the cached state per relation that doesn't fall easily into feedback
loops.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
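(To make the shape of that proposal concrete - none of this exists in
PostgreSQL today, it is purely a hypothetical sketch of the parameter set
being described:

    # existing knobs
    seq_page_cost    = 1.0     # uncached sequential read
    random_page_cost = 4.0     # uncached random read
    # hypothetical additions
    cached_page_cost = 0.05    # page expected to be found in RAM
    # plus some per-relation cache-state model, e.g. a settable/estimated
    # fraction:  ALTER TABLE orders SET (cached_fraction = 0.8);   -- not real

The hard part being pointed at is how such per-relation cache fractions would
be maintained without the feedback loops described earlier in the thread.)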
On Fri, Apr 29, 2011 at 11:37 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Anyway, how to collect this data is a separate problem from what should be
> done with it in the optimizer. I don't actually care about the collection
> part very much; there are a bunch of approaches with various trade-offs.
> Deciding how to tell the optimizer about what's cached already is the more
> important problem that needs to be solved before any of this takes you
> somewhere useful, and focusing on the collection part doesn't move that
> forward. Trying to map the real world into the currently exposed parameter
> set isn't a solvable problem. We really need cached_page_cost and
> random_page_cost, plus a way to model the cached state per relation that
> doesn't fall easily into feedback loops.

This is valuable input... I was already worried about feedback loops, and
hearing that it has been tried and resulted in them is invaluable.

From my experience, what really blows up in your face when your servers are
saturated is effective_cache_size. Postgres thinks an index will fit in the
cache, but at times of high load it doesn't, meaning that a sequential scan
would actually be orders of magnitude better - if it's a "small enough table".

Perhaps just adjusting effective_cache_size would provide a good enough
benefit without the disastrous feedback loops? I'll have to test that idea...
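(That particular experiment is cheap to run, since effective_cache_size can be
changed per session; the table and query below are invented for illustration:

    SET effective_cache_size = '8GB';   -- the optimistic, idle-server figure
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    SET effective_cache_size = '1GB';   -- roughly what survives under heavy load
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    RESET effective_cache_size;

Comparing the two plans shows whether the index-vs-seqscan choice for the
query in question actually flips at the lower setting.)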
On Apr 29, 2011, at 10:25 AM, James Mansion <james@mansionfamily.plus.com> wrote:
> Robert Haas wrote:
>> The server can and does measure hit rates for the PG buffer pool, but to my
>> knowledge there is no clear-cut way for PG to know whether read() is
>> satisfied from the OS cache or a drive cache or the platter.
>
> Does the server know which IO it thinks is sequential, and which it thinks
> is random?

No. It models this in the optimizer, but the executor has no clue. And
sometimes we model I/O as partly random, partly sequential, as in the case of
heap fetches on a clustered index. So the answer isn't even a Boolean.

...Robert
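(The "partly random, partly sequential" modelling mentioned here is the
correlation-based interpolation in the index cost estimate; paraphrasing the
logic in src/backend/optimizer/path/costsize.c (cost_index), so check the
source for the exact form:

    csquared = indexCorrelation^2
    io_cost  = max_IO_cost + csquared * (min_IO_cost - max_IO_cost)

where max_IO_cost prices every heap fetch at random_page_cost and min_IO_cost
prices an ordered scan mostly at seq_page_cost. A well-clustered index
(correlation near 1) is therefore costed almost like a sequential read of the
heap; an uncorrelated one as pure random I/O.)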
On Apr 27, 2011, at 11:01 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> The patch may be simple, the testing not so much. I know that.
>
> What tools do we have to do that testing? There are lots, and all
> imply a lot of work. Is that work worth the trouble? Because if it
> is... why not do the work?
>
> I would propose a step in the right direction: a patch to compute and
> log periodic estimates of the main I/O tunables: random_page_cost,
> seq_page_cost and effective_cache_size. Maybe per-tablespace.
> Evaluate the performance impact, and work from there.
>
> Because, probably just using those values as input to the optimizer
> won't work: DBAs will want a way to tune the optimizer, the system may
> not be stable enough, and even with accurate estimates for those
> values the optimizer may not perform as expected. I mean, right now
> those values are tunables, not real metrics, so perhaps the optimizer
> won't respond well to real values.
>
> But having the ability to measure them without a serious performance
> impact is a step in the right direction, right?

Sure. It's not a real easy problem, but don't let that discourage you from
working on it. Getting more eyeballs on these issues can only be a good thing.

...Robert