Re: Slow count(*) again... - Mailing list pgsql-performance

From Craig Ringer
Subject Re: Slow count(*) again...
Msg-id 4CB167CA.9070109@postnewspapers.com.au
In response to Re: Slow count(*) again...  (Mladen Gogala <mladen.gogala@vmsinfo.com>)
Responses Re: Slow count(*) again...
List pgsql-performance
On 10/10/2010 9:54 AM, Mladen Gogala wrote:

> Unfortunately, the problem is in the rather primitive way PostgreSQL
> does I/O. It didn't change in 9.0 so there is nothing you could gain by
> upgrading. If you execute strace -o /tmp/pg.out -e read <PID of the
> sequential scan process> and inspect the file /tmp/pg.out when the query
> finishes, you will notice a gazillion of read requests, all of them 8192
> bytes in size. That means that PostgreSQL is reading the table block by
> block, without any merging of the requests.

I'd be really interested in any measurements you've done to determine
the cost of this over doing reads in larger chunks. If they're properly
detailed and thought out, the -hackers list is likely to be interested
as well.

The Linux kernel, at least, does request merging (and splitting, and
merging, and more splitting) along the request path, and I'd personally
expect that most of the cost of 8k requests would be in the increased
number of system calls, buffer copies, etc required. Measurements
demonstrating or contradicting this would be good to see.

It's worth being aware that there are memory costs to doing larger
reads, especially when you have many backends, each of which wants to
allocate a larger buffer for reading. If you can use a chunk of
shared_buffers as the direct destination for the read that's OK, but
otherwise you're up for (1MB - 8KB) * num_backends extra memory use on
I/O buffers that could otherwise be used as shared_buffers or OS cache.

Async I/O, too, has costs.

> PostgreSQL is in
> dire need of something similar and it wouldn't even be that hard to
> implement.

I'd really like to see both those assertions backed with data or patches ;-)

Personally, I know just enough about how PG's I/O path works to suspect
that "not that hard to implement" is probably a little ...
over-optimistic. Sure, it's not that hard to implement in a new program
with no wired-in architectural and design choices; that doesn't mean
it's easy to retrofit onto existing code, especially a bunch of
co-operating processes with their own buffer management.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/
