Re: Air-traffic benchmark - Mailing list pgsql-performance

From Matthew Wakeling
Subject Re: Air-traffic benchmark
Date Thu, 7 Jan 2010
Msg-id alpine.DEB.2.00.1001071608530.25000@aragorn.flymine.org
In response to Re: Air-traffic benchmark  ("Gurgel, Flavio" <flavio@4linux.com.br>)
List pgsql-performance
On Thu, 7 Jan 2010, Gurgel, Flavio wrote:
> If one single query execution had a step that brought a page to the
> buffercache, it's enough to increase another step speed and change the
> execution plan, since the data access in memory is (usually) faster than
> disk.

Postgres does not change a query plan according to the shared_buffers
setting. It does not anticipate one step contributing to another step in
this way. It does, however, use the effective_cache_size setting to
estimate this effect, and that does affect the planner.
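
To illustrate the distinction (the table name and value below are
illustrative, not taken from the benchmark): effective_cache_size
allocates nothing; it is purely an estimate of how much of the database
the OS plus shared buffers can cache, and raising it makes index scans
look cheaper to the planner.

    -- Planner estimate only; no memory is allocated by this setting.
    SET effective_cache_size = '4GB';
    EXPLAIN SELECT * FROM ontime WHERE "Year" = 2009;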

> The use of the index over seqscan has to be tested. I don't agree on a 50%
> gain, since simple integers stored in a B-tree have a good chance of
> being retrieved in the required order, and the discarded data will be
> discarded quickly too, so the gain has to be measured.
>
> I bet that an index scan will be a lot faster, but it's just a bet :)

In a situation like this, the opposite will be true. If you were accessing
a very small part of a table, say to order by a field with a small limit,
then an index can be very useful by providing the results in the correct
order. However, in this case, almost the entire table has to be read.
Changing the order in which it is read will mean that the disc access is
no longer sequential, which will slow things down, not speed them up.
The Postgres planner isn't stupid (mostly); there is probably a good
reason why it isn't using an index scan.
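
Rather than bet, this is easy to test (the query below is a stand-in for
whatever the benchmark actually runs):

    -- See what the planner picks, then heavily penalise sequential
    -- scans and compare. enable_seqscan = off does not forbid them;
    -- it just makes the planner avoid them wherever possible.
    EXPLAIN ANALYZE SELECT "Year", count(*) FROM ontime GROUP BY "Year";
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT "Year", count(*) FROM ontime GROUP BY "Year";
    RESET enable_seqscan;

If the index scan really is faster, the second EXPLAIN ANALYZE will show
it; on a query reading most of the table, it will usually be slower.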

>> The table is very wide, which is probably why the tested databases can
>> deal with it faster than PG. You could try and narrow the table down
>> (for instance: remove the Div* fields) to make the data more
>> "relational-like". In real life, speedups in this circumstances would
>> probably be gained by normalizing the data to make the basic table
>> smaller and easier to use with indexing.
>
> Ugh. I don't think so. That's why indexes were invented. PostgreSQL is
> smart enough to "jump" over columns using byte offsets.
> A better option for this table is to partition it into year (or year/month) chunks.

Postgres (mostly) stores all of a row's columns together in the row, so
what you say is completely wrong. Postgres does not "jump" over columns
using byte offsets in this way. The index references a row in a page on
disc, and that page is fetched separately in order to retrieve the row.
The expensive part is physically moving the disc head to the right part of
the disc in order to fetch the correct page from the disc - jumping over
columns will not help with that at all.

Reducing the width of the table will greatly improve the performance of a
sequential scan, as it will reduce the size of the table on disc, and
therefore the time taken to read the entire table sequentially.
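
You can see how much there is to gain by checking the on-disc size and
the rough average row width (the table name is assumed; any table works):

    -- Heap size on disc, and an approximate average row width in bytes.
    SELECT pg_size_pretty(pg_relation_size('ontime')) AS table_size,
           (pg_relation_size('ontime') / nullif(reltuples, 0))::int
               AS avg_row_bytes
    FROM pg_class
    WHERE relname = 'ontime';

Halving the row width roughly halves the sequential scan time, since the
scan is limited by how many pages have to be read.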

Moreover, your suggestion of partitioning the table may not help much with
this query. It will turn a single sequential scan into a UNION of many
tables, which may be harder for the planner to plan. Also, for queries
that access small parts of the table, indexes will help more than
partitioning will.

Partitioning will help most in the case where you want to summarise a
single year's data. Not really otherwise.
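
For reference, the inheritance-based partitioning being suggested looks
roughly like this (a sketch; the table and column names are assumed from
the benchmark schema):

    -- One child table per year, with a CHECK constraint the planner
    -- can use for constraint exclusion.
    CREATE TABLE ontime_2009 (CHECK ("Year" = 2009)) INHERITS (ontime);

    -- A query filtered on "Year" then scans only the matching child
    -- (plus the empty parent) instead of every partition.
    SET constraint_exclusion = on;
    EXPLAIN SELECT count(*) FROM ontime WHERE "Year" = 2009;

That is exactly the single-year summary case; a query over all years
still reads every partition.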

Matthew

--
 Q: What's the difference between ignorance and apathy?
 A: I don't know, and I don't care.
