Re: RAID arrays and performance - Mailing list pgsql-performance

From Mark Mielke
Subject Re: RAID arrays and performance
Date
Msg-id 4755D15F.7080400@mark.mielke.cc
Whole thread Raw
In response to Re: RAID arrays and performance  (James Mansion <james@mansionfamily.plus.com>)
Responses Re: RAID arrays and performance
List pgsql-performance
James Mansion wrote:
> Mark Mielke wrote:
>> At a minimum, this breaks your query into: 1) Preload all the index
>> pages you will need
> Isn't this fairly predictable - the planner has chosen the index so it
> will be operating
> on a bounded subset.
What is the bounded subset? It is bounded by the value. What value? You
need to access the first page before you know what the second page is.
PostgreSQL or the kernel should already have the hottest pages in
memory, so the value of doing async I/O is very likely the cooler pages
that are unique to the query. We don't know what the cooler pages are
until we follow three tree down.

>> , 2) Scan the index pages you needed
> Yes, and AIO helps when you can scan them in arbitrary order, as they
> are returned.

I don't think you are talking about searching a B-Tree, as the order is
important when searching, and search performance would be reduced if one
reads and scans more pages than necessary to map from the value to the
row. I presume you are talking about scanning the entire index. Where
"needed" means "all". Again, this only benefits a subset of the queries.

>> , 3) Preload all the table page you will need
> No - just request that they load.  You can do work as soon as any page
> is returned.

The difference between preload and handling async I/O in terms of
performance is debatable. Greg reports that async I/O on Linux sucks,
but posix_fadvise*() has substantial benefits. posix_fadvise*() is
preload not async I/O (he also reported that async I/O on Solaris seems
to work well). Being able to do work as the first page is available is a
micro-optimization as far as I am concerned at this point (one that may
not yet work on Linux), as the real benefit comes from utilizing all 12
disks in Matthew's case, not from guaranteeing that data is processed as
soon as possible.

>> , 4) Scan the table pages you needed.
> In the order that is most naturally returned by the disks.

Micro-optimization.

>> But do you really need the whole index?
> I don't think I suggested that.
>> What if you only need parts of the index, and this plan now reads the
>> whole index using async I/O "just in case" it is useful?
> Where did you get that from?

I get it from your presumption that you can know which pages of the
index to load in advance. The only way you can know which pages must be
loaded, is to know that you need to query them all. Unless you have some
way of speculating with some degree of accuracy which colder pages in
the index you will need, without looking at the index?

>> index scan into a regular B-Tree scan, which is difficult to perform
>> async I/O for, as you don't necessarily know which pages to read next.
> Most B-trees are not so deep.  It would generally be a win to retain
> interior nodes of indices in
> shared memory, even if leaf pages are not present.  In such a case, it
> is quite quick to establish
> which leaf pages must be demand loaded.

This is bogus. The less deep the B-Tree is, the less there should be any
requirement for async I/O. Hot index pages will be in cache.

> I'm not suggesting that Postgres indices are structured in a way that
> would support this sort
> of thing now.

In your hand waving, you have assumed that PostgreSQL B-Tree index might
need to be replaced? :-)

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

pgsql-performance by date:

Previous
From: Julian Mehnle
Date:
Subject: Re: Bad query plans for queries on partitioned table
Next
From: James Mansion
Date:
Subject: Re: RAID arrays and performance