Re: Air-traffic benchmark - Mailing list pgsql-performance

From Gurgel, Flavio
Subject Re: Air-traffic benchmark
Date
Msg-id 32176465.41521262879116337.JavaMail.root@mail.4linux.com.br
In response to Air-traffic benchmark  (Lefteris <lsidir@gmail.com>)
Responses Re: Air-traffic benchmark
Re: Air-traffic benchmark
List pgsql-performance
----- "Lefteris" <lsidir@gmail.com> escreveu:
> > Did you ever try increasing shared_buffers to what was suggested
> (around
> > 4 GB) and see what happens (I didn't see it in your posts)?
>
> No I did not to that yet, mainly because I need the admin of the
> machine to change the shmmax of the kernel and also because I have no
> multiple queries running. Does Seq scan uses shared_buffers?

Having multiple queries running is *not* the only reason you need lots of shared_buffers.
Think of shared_buffers as a page cache; data in PostgreSQL is organized in pages.
If a single query execution has a step that brings a page into the buffer cache, that is enough to speed up a later step
and even change the execution plan, since data access in memory is (usually) much faster than on disk.
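
Just to illustrate what the change looks like (the figures below are only assumptions for a box with around 16GB of RAM, not a recommendation), raising shared_buffers to the suggested 4GB means touching both postgresql.conf and the kernel limit:

    # postgresql.conf -- assuming ~16GB of RAM on the server
    shared_buffers = 4GB            # PostgreSQL's own page cache (8kB pages)
    effective_cache_size = 12GB     # hint about the OS cache, nothing is allocated

    # /etc/sysctl.conf -- shmmax has to be a bit larger than shared_buffers
    kernel.shmmax = 4831838208      # ~4.5GB; apply with sysctl -p

After that you need to restart the server, since shared_buffers cannot be changed on the fly.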

> > help performance very much on multiple executions of the same
> > query.

This is also true.
This kind of test should, and will, give different results in subsequent executions.
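
An easy way to see the cache effect is to run the same query twice in psql with timing turned on (the table and column names here are just placeholders, use the real ones):

    \timing on
    SELECT count(*) FROM ontime WHERE year BETWEEN 2000 AND 2009;  -- first run, cold cache
    SELECT count(*) FROM ontime WHERE year BETWEEN 2000 AND 2009;  -- second run, warm cache

The second execution is usually much faster if the pages fit in shared_buffers or in the OS cache.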

> > From the description of the data ("...from years 1988 to 2009...") it
> > looks like the query for "between 2000 and 2009" pulls out about half of
> > the data. If an index could be used instead of seqscan, it could be
> > perhaps only 50% faster, which is still not very comparable to others.

The use of an index over a seqscan has to be tested. I don't agree with the 50% estimate, since simple integers stored in a B-Tree
have a good chance of being retrieved in the required order, and the discarded data will be discarded quickly too,
so the gain has to be measured.

I bet that an index scan will be a lot faster, but it's just a bet :)
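
Something like this would settle the bet (again, ontime and year are placeholders for the real table and column names):

    CREATE INDEX ontime_year_idx ON ontime (year);
    ANALYZE ontime;

    EXPLAIN ANALYZE
    SELECT count(*) FROM ontime WHERE year BETWEEN 2000 AND 2009;

    -- and to compare against the seqscan plan without dropping the index:
    SET enable_seqscan = off;
    EXPLAIN ANALYZE
    SELECT count(*) FROM ontime WHERE year BETWEEN 2000 AND 2009;
    RESET enable_seqscan;

EXPLAIN ANALYZE shows both the plan chosen and the real execution time, so the gain (or lack of it) is measured, not guessed.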

> > The table is very wide, which is probably why the tested databases can
> > deal with it faster than PG. You could try and narrow the table down
> > (for instance: remove the Div* fields) to make the data more
> > "relational-like". In real life, speedups in these circumstances would
> > probably be gained by normalizing the data to make the basic table
> > smaller and easier to use with indexing.

Ugh. I don't think so. That's why indexes were invented. PostgreSQL is smart enough to "jump" over columns using byte
offsets.
A better option for this table is to partition it into year (or year/month) chunks.

45GB is not such a huge table compared to others I have seen before. I have systems where each partition is around 10
or 20GB and the data is very fast to access, even with aggregation queries.
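
A rough sketch of year partitioning with inheritance and constraint exclusion (the usual way to do it; table and column names are placeholders again):

    CREATE TABLE ontime_2008 (CHECK (year = 2008)) INHERITS (ontime);
    CREATE TABLE ontime_2009 (CHECK (year = 2009)) INHERITS (ontime);
    -- one child table per year, each with its own indexes

    -- with constraint_exclusion = partition (the default in 8.4) the planner
    -- skips every child whose CHECK constraint cannot match the WHERE clause
    SELECT count(*) FROM ontime WHERE year BETWEEN 2000 AND 2009;

You still need a trigger or rule on the parent to route the inserts, but the scans then touch only the years you ask for.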

Flavio Henrique A. Gurgel
tel. 55-11-2125.4765
fax. 55-11-2125.4777
www.4linux.com.br
FREE SOFTWARE SOLUTIONS
