From: Robert Haas
Subject: Re: Cost limited statements RFC
Msg-id: CA+Tgmobii2+67BzC2O+PPsEBEChGEaEM1tQPwKvBi2TYD2p5WA@mail.gmail.com
In response to: Re: Cost limited statements RFC (Greg Smith <greg@2ndQuadrant.com>)
List: pgsql-hackers

On Fri, Jun 7, 2013 at 11:35 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> I wasn't talking about disruption of the data that's in the buffer cache.
> The only time the scenario I was describing plays out is when the data is
> already in shared_buffers.  The concern is damage done to the CPU's data
> cache by this activity.  Right now you can't even reach 100MB/s of damage to
> your CPU caches in an autovacuum process.  Ripping out the page hit cost
> will eliminate that cap.  Autovacuum could introduce gigabytes per second of
> memory -> L1 cache transfers.  That's what all my details about memory
> bandwidth were trying to put into context.  I don't think it really matters
> much because the new bottleneck will be the processing speed of a single
> core, and that's still a decent cap to most people now.

OK, I see.  No objection here; not sure how others feel.
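
For anyone following the numbers: the cap Greg describes falls out of the
stock autovacuum settings.  A rough back-of-the-envelope sketch, assuming the
defaults of vacuum_cost_limit = 200, autovacuum_vacuum_cost_delay = 20ms,
vacuum_cost_page_hit = 1, and 8kB pages (nothing measured here):

    # Maximum rate at which autovacuum can touch already-cached pages today.
    cost_limit = 200        # vacuum_cost_limit, spent per sleep cycle
    delay_s = 0.020         # autovacuum_vacuum_cost_delay
    page_hit_cost = 1       # vacuum_cost_page_hit
    page_bytes = 8192

    pages_per_sec = (cost_limit / page_hit_cost) / delay_s   # 10,000 pages/s
    print(pages_per_sec * page_bytes / 1e6)                  # ~82 MB/s, under 100 MB/s

Ripping out the page hit cost is what removes that ceiling.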

>> I think you're missing my point here, which is that we shouldn't
>> have any such things as a "cost limit".  We should limit reads and
>> writes *completely separately*.  IMHO, there should be a limit on
>> reading, and a limit on dirtying data, and those two limits should not
>> be tied to any common underlying "cost limit".  If they are, they will
>> not actually enforce precisely the set limit, but some other composite
>> limit which will just be weird.
>
> I see the distinction you're making now, don't need a mock up to follow you.
> The main challenge of moving this way is that read and write rates never end
> up being completely disconnected from one another.  A read will only cost
> some fraction of what a write does, but they shouldn't be completely
> independent.
>
> Just because I'm comfortable doing 10MB/s of reads and 5MB/s of writes, I
> may not be happy with the server doing 9MB/s read + 5MB/s write=14MB/s of
> I/O in an implementation where they float independently.  It's certainly
> possible to disconnect the two like that, and people will be able to work
> something out anyway.  I personally would prefer not to lose some ability to
> specify how expensive read and write operations should be considered in
> relation to one another.

OK.  I was hoping that wasn't a distinction that we needed to
preserve, but if it is, it is.

The trouble, though, is that I think it makes it hard to structure the
GUCs in terms of units that are meaningful to the user.  One could
have something like io_rate_limit (measured in MB/s),
io_read_multiplier = 1.0, io_dirty_multiplier = 1.0, and I think that
would be reasonably clear.  By default io_rate_limit would govern the
sum of read activity and dirtying activity, but you could overweight
or underweight either of those two things by adjusting the multiplier.
That's not a huge improvement in clarity, though, especially if the
default values aren't anywhere close to 1.0.
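
To make the proposed accounting concrete, here is a toy sketch (purely
illustrative; io_rate_limit, io_read_multiplier, and io_dirty_multiplier are
the hypothetical names from the previous paragraph, not GUCs that exist
today):

    # Reads and dirtied pages draw from one budget of io_rate_limit MB/s,
    # weighted by their multipliers; the weighted sum is what gets limited.
    io_rate_limit = 10.0
    io_read_multiplier = 1.0
    io_dirty_multiplier = 1.0

    def weighted_io(read_mb_s, dirty_mb_s):
        return read_mb_s * io_read_multiplier + dirty_mb_s * io_dirty_multiplier

    # Greg's earlier example: 9 MB/s of reads plus 5 MB/s of writes counts
    # as 14 against a 10 MB/s cap, so the backend would be throttled.
    print(weighted_io(9.0, 5.0) > io_rate_limit)   # True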

If the limits aren't independent, I really *don't* think it's OK to
name them as if they are.  That just seems like a POLA violation.

> Related aside:  shared_buffers is becoming a decreasing fraction of total
> RAM each release, because it's stuck with this rough 8GB limit right now.
> As the OS cache becomes a larger multiple of the shared_buffers size, the
> expense of the average read is dropping.  Reads are getting more likely to
> be in the OS cache but not shared_buffers, which makes the average cost of
> any one read shrink.  But writes are as expensive as ever.
>
> Real-world tunings I'm doing now, reflecting that, typically in servers with
> >128GB of RAM, have gone this far in that direction:
>
> vacuum_cost_page_hit = 0
> vacuum_cost_page_miss = 2
> vacuum_cost_page_dirty = 20
>
> That's 4MB/s of writes, 40MB/s of reads, or some blended mix that considers
> writes 10X as expensive as reads.  The blend is a feature.
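
Spelling out the arithmetic behind those numbers (assuming the usual budget
of vacuum_cost_limit = 200 spent per 20ms nap, i.e. 10,000 cost units per
second, and 8kB pages):

    # How those three settings turn into roughly 40 MB/s of reads and
    # 4 MB/s of writes:
    budget_per_sec = 200 / 0.020       # 10,000 cost units/s
    mb_per_page = 8192 / 1e6

    read_cap  = (budget_per_sec / 2)  * mb_per_page   # page_miss  = 2  -> ~41 MB/s
    write_cap = (budget_per_sec / 20) * mb_per_page   # page_dirty = 20 -> ~4 MB/s
    print(read_cap, write_cap)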

Fair enough, but note that limiting the two things independently, to
4MB/s and 40MB/s, would not be significantly different.  If the
workload is all reads or all writes, it won't be different at all.
The biggest difference would come when many or all writes also require reads, in
which case the write rate will drop from 4MB/s to perhaps as low as
3.6MB/s.  That's not a big difference.
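
That 3.6MB/s guess is just the blend at work: a page that has to be read and
then dirtied costs page_miss + page_dirty = 2 + 20 = 22 cost units instead of
20, so the write rate shrinks by a factor of 20/22:

    # Worst case for the blended limit: every dirtied page is also a miss.
    nominal_write_cap = 4.0                     # MB/s for a dirty-only workload
    print(nominal_write_cap * 20 / (20 + 2))    # ~3.6 MB/s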

In general, the benefits of the current system are greatest when the
costs of reads and writes are similar.  If reads and writes have equal
cost, it's clearly very important to have a blended cost.  But the
more the cost of writes dominates the costs of reads, the less it
really matters.  It sounds like we're already well on the way to a
situation where only the write cost really matters most of the time -
except for large scans that read a lot of data without changing it,
when only the read cost will matter.
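
To put a number on how quickly the blend stops mattering: a workload that
reads every page it dirties keeps about 91% of its write budget with Greg's
2/20 costs, but only 50% if reads and writes cost the same.

    # Fraction of the write budget retained when every dirtied page must
    # also be read, as a function of the relative costs:
    def write_share(miss_cost, dirty_cost):
        return dirty_cost / (miss_cost + dirty_cost)

    print(write_share(2, 20))    # ~0.91: write cost dominates, blend barely bites
    print(write_share(10, 10))   # 0.50: equal costs, blend matters a lot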

I'm not really questioning your conclusion that we need to keep the
blended limit.  I just want to make sure we're keeping it for a good
reason, because I think it increases the user-perceived complexity
here quite a bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


