Thread: Scaling SELECT:s with the number of disks on a stripe

Scaling SELECT:s with the number of disks on a stripe

From
Peter Schuller
Date:
Hello,

I am looking to use PostgreSQL for storing some very simple flat data
mostly in a single table. The amount of data will be in the hundreds
of gigabytes range. Each row is on the order of 100-300 bytes in size;
in other words, small enough that I am expecting disk I/O to be seek
bound (even if PostgreSQL reads a full pg page at a time, since a page
is significantly smaller than the stripe size of the volume).

The only important performance characteristics are insertion/deletion
performance, and the performance of trivial SELECT queries whose WHERE
clause tests equality on one of the columns.

Other than absolute performance, an important goal is to be able to
scale fairly linearly with the number of underlying disk drives. We
are fully willing to take a disk seek per item selected, as long as it
scales.
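
(To put a rough number on "scales": if a single drive sustains on the
order of 100-200 random reads per second, the hope is that 12 drives
with enough concurrent readers approach roughly 12 times that, i.e.
somewhere in the 1200-2400 seeks per second range.)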

To this end I have been doing some benchmarking to see whether the
plan is going to be feasible. On a 12-disk hardware stripe, insertion
performance does scale somewhat with concurrent inserters. However, I
am seeing surprising effects with SELECTs: a single selector
generates the same amount of disk activity as two concurrent selectors
(I was expecting roughly twice as much).

The query is simple:

SELECT * FROM test WHERE value = 'xxx' LIMIT 1000;

No ordering, no joins, no nothing. Selecting concurrently with two
different values of 'xxx' yields the same amount of disk activity
(never any significant CPU activity). Note that the total amount of
data is too large to fit in RAM (> 500 million rows), and the number
of distinct values in the value column is 10000. The column in the
WHERE clause is indexed.
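
For concreteness, the setup is roughly equivalent to the following
sketch (column and index names as they appear in the query plans
further down; the types and the filler column are only approximations
of the real table):

CREATE TABLE test (
    val  varchar(32),   -- the indexed column, ~10000 distinct values
    data varchar(256)   -- stand-in for the rest of the ~100-300 bytes per row
);
CREATE INDEX test_val_ix ON test (val);

-- the benchmark query, run concurrently with different values:
SELECT * FROM test WHERE val = 'xxx' LIMIT 1000;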

So my first question is - why am I not seeing this scaling? The
absolute amount of disk activity with a single selector is consistent
with what I would expect from a SINGLE disk, which is completely
expected since I never thought PostgreSQL would introduce disk I/O
concurrency on its own. But this means that adding additional readers
doing random-access reads *should* scale very well with 12 underlying
disks in a stripe.

(Note that I have seen fairly similar results on other RAID variants
too, including software RAID5 (yes yes I know), in addition to the
hardware stripe.)

These tests were done on Linux 2.6.19.3 with PostgreSQL 8.1.

Secondly, I am seeing a query plan switch after a certain
threshold. Observe:

perftest=# explain select * from test where val='7433' limit 1000;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Limit  (cost=0.00..4016.50 rows=1000 width=143)
   ->  Index Scan using test_val_ix on test  (cost=0.00..206620.88 rows=51443 width=143)
         Index Cond: ((val)::text = '7433'::text)
(3 rows)

Now increasing to a limit of 10000:

perftest=# explain select * from test where val='7433' limit 10000;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Limit  (cost=360.05..38393.36 rows=10000 width=143)
   ->  Bitmap Heap Scan on test  (cost=360.05..196014.82 rows=51443 width=143)
         Recheck Cond: ((val)::text = '7433'::text)
         ->  Bitmap Index Scan on test_val_ix  (cost=0.00..360.05 rows=51443 width=0)
               Index Cond: ((val)::text = '7433'::text)
(5 rows)

The interesting part is that the latter query is entirely CPU bound
(no disk I/O at all) for an extended period of time before even
beginning to read data from disk. And when it *does* start performing
disk I/O, the performance is about the same as for the other case. In
other words, the change in query plan seems to do nothing but add
overhead.

What is the bitmap heap scan supposed to be doing that would increase
performance above a "seek once per matching row" plan? I haven't been
able to Google my way to what the intended benefit is of a heap scan
vs. a plain index scan.

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org



Re: Scaling SELECT:s with the number of disks on a stripe

From
"Marc Mamin"
Date:
Hello Peter,

If you are dealing with timed data or similar, you may consider
partitioning your table(s).

In order to deal with large data, I've built a "logical" partitioning
system, where the target partition is determined by the date of my data
(the date is part of the filenames that I import...).

Instead of using the Postgres partitioning framework, I keep the table
boundaries in a reference table.
I've then built a function that takes the various query parameters as
arguments (column list, where clause...).
This function retrieves the list of tables to query from my reference
table and builds the final query, binding the subqueries from each
partition together with "UNION ALL".
It also requires an additional reference table that describes the table
columns (data type, behaviour, e.g. groupable, summable...).
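
As a very rough sketch (all names here are invented for the example),
the reference table and the kind of query the function generates look
something like this:

CREATE TABLE partition_ref (
    table_name text PRIMARY KEY,
    date_from  date,
    date_to    date
);

-- the function looks up the matching tables and emits, e.g.:
SELECT col_a, col_b FROM data_20070329 WHERE col_a = '...'
UNION ALL
SELECT col_a, col_b FROM data_20070330 WHERE col_a = '...';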


This allowed me to replace many "delete" statements with "drop table"
statements, which is probably the main advantage of the solution.

The biggest issue was the implementation time ;-) but I'm really happy
with the resulting performance.

HTH,

Marc






Re: Scaling SELECT:s with the number of disks on a stripe

From
Andrew - Supernews
Date:
On 2007-03-30, Peter Schuller <peter.schuller@infidyne.com> wrote:
[...]
> Other than absolute performance, an important goal is to be able to
> scale fairly linearly with the number of underlying disk drives. We
> are fully willing to take a disk seek per item selected, as long as it
> scales.
>
> To this end I have been doing some benchmarking to see whether the
> plan is going to be feasible. On a 12-disk hardware stripe, insertion
> performance does scale somewhat with concurrent inserters. However, I
> am seeing surprising effects with SELECTs: a single selector
> generates the same amount of disk activity as two concurrent selectors
> (I was expecting roughly twice as much).
>
> The query is simple:
>
> SELECT * FROM test WHERE value = 'xxx' LIMIT 1000;

I tested this on a 14-way software raid10 on freebsd, using pg 8.1.6, and
couldn't reproduce anything like it. With one client I get about 200 disk
requests per second, scaling almost exactly linearly for the first 5 or so
clients, as expected. At 14 clients it was down to about 150 reqs/sec per
client, but the total throughput continued to increase with additional
concurrency up to about 60 clients, giving about 3600 reqs/sec (260 per
disk, which is about right for 10krpm scsi disks under highly concurrent
random loads).

> So my first question is - why am I not seeing this scaling?

A good question. Have you tried testing the disks directly? e.g. create
some huge files, and run a few concurrent random readers on them? That
would test the array and the filesystem without involving postgres.

> Secondly, I am seeing a query plan switch after a certain
> threshold. Observe:
[snip index scan changing to bitmapscan]

This is entirely expected. With the larger row count, it is more likely
(or so the planner estimates) that rows will need to be fetched from
adjacent or at least nearby blocks, thus a plan which fetches rows in
physical table order rather than index order would be expected to be
superior. The planner takes into account the estimated startup cost and
per-row cost when planning LIMIT queries; therefore it is no surprise
that for larger limits, it switches to a plan with a higher startup cost
but lower per-row cost.

> The interesting part is that the latter query is entirely CPU bound
> (no disk I/O at all) for an extended period of time before even
> beginning to read data from disk.

Most likely your index is small enough that large parts of it will be
cached in RAM, so that the scan of the index to build the bitmap does
not need to hit the disk much if at all.
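
You can check how large the index actually is (and thus how plausible
that is) with the size functions available in 8.1:

select pg_size_pretty(pg_relation_size('test_val_ix'));  -- index size
select pg_size_pretty(pg_relation_size('test'));         -- heap size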

> And when it *does* start performing
> disk I/O, the performance is about the same as for the other case. In
> other words, the change in query plan seems to do nothing but add
> overhead.

This is going to depend quite a bit on the physical ordering of the data
relative to the indexed values.

> What is the bitmap heap scan supposed to be doing that would increase
> performance above a "seek once per matching row" plan? I haven't been
> able to Google my way to what the intended benefit is of a heap scan
> vs. a plain index scan.

The bitmap scan visits the heap in heap order, rather than index order,
thus enabling it to take advantage of prefetch and other sequential-read
optimizations (in the underlying OS and disk subsystem, rather than in
pg itself).
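
If you want to measure the two plans against each other on the same
query, you can toggle the bitmap scan off for a session:

set enable_bitmapscan = off;
explain analyze select * from test where val='7433' limit 10000;
set enable_bitmapscan = on;
explain analyze select * from test where val='7433' limit 10000;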

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

Re: Scaling SELECT:s with the number of disks on a stripe

From
Peter Schuller
Date:
Hello,

> If you are dealing with timed data or similar, you may consider to
> partition your table(s).

Unfortunately this is not the case; the insertion is more or less
random (not quite, but for the purpose of this problem it is).

Thanks for the pointers though. That is sure to be useful in some
other context down the road.

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org



Re: Scaling SELECT:s with the number of disks on a stripe

From
Peter Schuller
Date:
Hello,

> > SELECT * FROM test WHERE value = 'xxx' LIMIT 1000;
>
> I tested this on a 14-way software raid10 on freebsd, using pg 8.1.6, and
> couldn't reproduce anything like it. With one client I get about 200 disk
> requests per second, scaling almost exactly linearly for the first 5 or so
> clients, as expected. At 14 clients it was down to about 150 reqs/sec per
> client, but the total throughput continued to increase with additional
> concurrency up to about 60 clients, giving about 3600 reqs/sec (260 per
> disk, which is about right for 10krpm scsi disks under highly concurrent
> random loads).

Ok. That is very interesting; so there is definitely nothing
fundamental in PG that prevents the scaling (even if on FreeBSD).

> A good question. Have you tried testing the disks directly? e.g. create
> some huge files, and run a few concurrent random readers on them? That
> would test the array and the filesystem without involving postgres.

I have confirmed that I am seeing expected performance for random
short and highly concurrent reads in one large (> 200 GB) file. The
I/O is done using libaio however, so depending on implementation I
suppose the I/O scheduling behavior of the fs/raid driver might be
affected compared to having a number of concurrent threads doing
synchronous reads. I will try to confirm performance in a way that
will more closely match PostgreSQL's behavior.

I have to say though that I will be pretty surprised if the
performance is not matched in that test.

Is there any chance there is some operating-system-specific code in
pg itself that might affect this behavior? Some kind of purposeful
serialization of I/O for example (even if that sounds like an
extremely strange thing to do)?

> This is entirely expected. With the larger row count, it is more likely
> (or so the planner estimates) that rows will need to be fetched from
> adjacent or at least nearby blocks, thus a plan which fetches rows in
> physical table order rather than index order would be expected to be
> superior. The planner takes into account the estimated startup cost and
> per-row cost when planning LIMIT queries; therefore it is no surprise
> that for larger limits, it switches to a plan with a higher startup cost
> but lower per-row cost.

Roger that, makes sense. I had misunderstood the meaning of the heap
scan.

> Most likely your index is small enough that large parts of it will be
> cached in RAM, so that the scan of the index to build the bitmap does
> not need to hit the disk much if at all.

Even so, several seconds of CPU activity to scan the index for
a few tens of thousands of entries sounds a bit excessive. Or does it
not? Because at that level, the CPU bound period alone is approaching
the time it would take to seek for each entry instead. But then I
presume the amount of work is similar/the same for the other case,
except it's being done at the beginning of the query instead of before
each seek.

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org



Re: Scaling SELECT:s with the number of disks on a stripe

From
Andrew - Supernews
Date:
On 2007-04-02, Peter Schuller <peter.schuller@infidyne.com> wrote:
> I have confirmed that I am seeing expected performance for random
> short and highly concurrent reads in one large (> 200 GB) file. The
> I/O is done using libaio however, so depending on implementation I
> suppose the I/O scheduling behavior of the fs/raid driver might be
> affected compared to having a number of concurrent threads doing
> synchronous reads. I will try to confirm performance in a way that
> will more closely match PostgreSQL's behavior.
>
> I have to say though that I will be pretty surprised if the
> performance is not matched in that test.

The next question then is whether anything in your postgres configuration
is preventing it getting useful performance from the OS. What settings
have you changed in postgresql.conf? Are you using any unusual settings
within the OS itself?

> Is there any chance there is some operating-system-specific code in
> pg itself that might affect this behavior?

Unlikely.

>> Most likely your index is small enough that large parts of it will be
>> cached in RAM, so that the scan of the index to build the bitmap does
>> not need to hit the disk much if at all.
>
> Even so, several seconds of CPU activity to scan the index for
> a few tens of thousands of entries sounds a bit excessive. Or does it
> not? Because at that level, the CPU bound period alone is approaching
> the time it would take to seek for each entry instead. But then I
> presume the amount of work is similar/the same for the other case,
> except it's being done at the beginning of the query instead of before
> each seek.

You're forgetting the LIMIT clause. For the straight index scan, the
query aborts when the LIMIT is reached having scanned only the specified
number of index rows (plus any index entries that turned out to be dead
in the heap). For the bitmap scan case, the limit can be applied only after
the heap scan is under way, therefore the index scan to build the bitmap
will need to scan ~50k rows, not the 10k specified in the limit, so the
amount of time spent scanning the index is 50 times larger than in the
straight index scan case.

However, I do suspect you have a problem here somewhere, because in my
tests the time taken to do the bitmap index scan on 50k rows, with the
index in cache, is on the order of 30ms (where the data is cached in
shared_buffers) to 60ms (where the data is cached by the OS). That's on
a 2.8GHz xeon.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

Re: Scaling SELECT:s with the number of disks on a stripe

From
Peter Schuller
Date:
Hello,

> The next question then is whether anything in your postgres configuration
> is preventing it getting useful performance from the OS. What settings
> have you changed in postgresql.conf?

The only options not commented out are the following (it's not even
tweaked for buffer sizes and such, since in this case I am not
interested in things like sort performance and cache locality other
than as an afterthought):

hba_file = '/etc/postgresql/8.1/main/pg_hba.conf'
ident_file = '/etc/postgresql/8.1/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/8.1-main.pid'
listen_addresses = '*'
port = 5432
max_connections = 100
unix_socket_directory = '/var/run/postgresql'
ssl = true
shared_buffers = 1000
log_line_prefix = '%t '
stats_command_string = on
stats_row_level = on
autovacuum = on
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'

> Are you using any unusual settings within the OS itself?

No. It's a pretty standard kernel. The only local tweaking done is
enabling/disabling various things; there are no special patches used
or attempts to create a minimalistic kernel or anything like that.

> You're forgetting the LIMIT clause. For the straight index scan, the
> query aborts when the LIMIT is reached having scanned only the specified
> number of index rows (plus any index entries that turned out to be dead
> in the heap). For the bitmap scan case, the limit can be applied only after
> the heap scan is under way, therefore the index scan to build the bitmap
> will need to scan ~50k rows, not the 10k specified in the limit, so the
> amount of time spent scanning the index is 50 times larger than in the
> straight index scan case.

Ok - makes sense that it has to scan the entire subset of the index
for the value in question. I will have to tweak the CPU/disk cost
settings (which I have, on purpose, not yet done).
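
For example, something along these lines per session, with the actual
values to be determined by testing (the numbers below are just
placeholders):

set random_page_cost = 2.0;        -- default is 4.0
set cpu_index_tuple_cost = 0.005;  -- default is 0.001
explain select * from test where val='7433' limit 10000;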

> However, I do suspect you have a problem here somewhere, because in my
> tests the time taken to do the bitmap index scan on 50k rows, with the
> index in cache, is on the order of 30ms (where the data is cached in
> shared_buffers) to 60ms (where the data is cached by the OS). That's on
> a 2.8GHz xeon.

This is on a machine with 2.33GHz xeons and I wasn't trying to
exaggerate. I timed it and it is CPU bound (in userspace; next to no
system CPU usage at all) for about 15 seconds for the case of
selecting with a limit of 10000.

Given that there is no disk activity I can't imagine any buffer sizes
or such affecting this other than userspace vs. kernelspace CPU
concerns (since obviously the data being worked on is in RAM). Or am I
missing something?

It is worth noting that the SELECT of fewer entries is entirely disk
bound; there is almost no CPU usage whatsoever. Even taking the
cumulative CPU usage into account (a gut-feeling estimate, nothing
scientific) and multiplying by 50, you get nowhere near 15 seconds of
CPU-bound time. So it is indeed strange.

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org



Re: Scaling SELECT:s with the number of disks on a stripe

From
Dave Cramer
Date:
On 4-Apr-07, at 2:01 AM, Peter Schuller wrote:

> Hello,
>
>> The next question then is whether anything in your postgres
>> configuration
>> is preventing it getting useful performance from the OS. What
>> settings
>> have you changed in postgresql.conf?
>
> The only options not commented out are the following (it's not even
> tweaked for buffer sizes and such, since in this case I am not
> interested in things like sort performance and cache locality other
> than as an afterthought):
>
> hba_file = '/etc/postgresql/8.1/main/pg_hba.conf'
> ident_file = '/etc/postgresql/8.1/main/pg_ident.conf'
> external_pid_file = '/var/run/postgresql/8.1-main.pid'
> listen_addresses = '*'
> port = 5432
> max_connections = 100
> unix_socket_directory = '/var/run/postgresql'
> ssl = true
> shared_buffers = 1000
This is way too low. If this is 8.x, set it to 25% of available
memory, and effective_cache_size should be about 3x that.


Re: Scaling SELECT:s with the number of disks on a stripe

From
Andrew - Supernews
Date:
On 2007-04-04, Peter Schuller <peter.schuller@infidyne.com> wrote:
>> The next question then is whether anything in your postgres configuration
>> is preventing it getting useful performance from the OS. What settings
>> have you changed in postgresql.conf?
>
> The only options not commented out are the following (it's not even
> tweaked for buffer sizes and such, since in this case I am not
> interested in things like sort performance and cache locality other
> than as an afterthought):
>
> shared_buffers = 1000

I'd always do benchmarks with a realistic value of shared_buffers (i.e.
much higher than that).

Another thought that comes to mind is that the bitmap index scan does
depend on the size of work_mem.

Try increasing your shared_buffers to a reasonable working value (say
10%-15% of RAM - I was testing on a machine with 4GB of RAM, using a
shared_buffers setting of 50000), and increase work_mem to 16384, and
see if there are any noticeable changes in behaviour.
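
work_mem can be changed per session for a quick test (the value is in
kB on 8.1), while shared_buffers has to be set in postgresql.conf and
needs a restart (and possibly a larger kernel SHMMAX):

set work_mem = 16384;   -- 16MB, for this session only
explain analyze select * from test where val='7433' limit 10000;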

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

Re: Scaling SELECT:s with the number of disks on a stripe

From
Peter Schuller
Date:
Hello,

> I'd always do benchmarks with a realistic value of shared_buffers (i.e.
> much higher than that).
>
> Another thought that comes to mind is that the bitmap index scan does
> depend on the size of work_mem.
>
> Try increasing your shared_buffers to a reasonable working value (say
> 10%-15% of RAM - I was testing on a machine with 4GB of RAM, using a
> shared_buffers setting of 50000), and increase work_mem to 16384, and
> see if there are any noticeable changes in behaviour.

Increasing the buffer size and work_mem did have a significant
effect. I can understand it in the case of the heap scan, but I am
still surprised at the index scan. Could pg be serializing the entire
query as a result of insufficient buffers/work_mem to satisfy multiple
concurrent queries?

With both turned up, not only is the heap scan no longer visibly CPU
bound, I am seeing some nice scaling in terms of disk I/O. I have not
yet benchmarked to the point of being able to say whether it's
entirely linear, but it certainly seems to at least be approaching the
ballpark.

Thank you for the help! I guess I made a bad call in not tweaking
this. My thinking was that I explicitly did not want to turn it up, so
that I could benchmark the raw performance of disk I/O rather than have
things cached in memory more than they already would be. But apparently
it had other side effects I did not consider.

Thanks again,

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org

