Re: Air-traffic benchmark - Mailing list pgsql-performance

From Lefteris
Subject Re: Air-traffic benchmark
Date
Msg-id 852badbc1001070547p48b47175vf87efc473e3bd41b@mail.gmail.com
In response to Re: Air-traffic benchmark  (Arjen van der Meijden <acmmailing@tweakers.net>)
List pgsql-performance
Hi Arjen,

So I understand from all of you that you don't consider the 25k used
for sorting to be the cause of the slowdown? Probably I am missing
something about the specific sort algorithm used by PG. My RAM does fill
up, mainly with Linux file buffers, but the postgres process stays at
0.1% of main memory. Is there no way to force the sort to use, say,
blocks of 128MB? Wouldn't that make a difference?
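
For reference, the per-operation sort memory limit in PostgreSQL is the work_mem setting, and it can be raised for a single session. The following is only a sketch, assuming the ontime table and query quoted in this thread. Note that the final ORDER BY only sorts the grouped rows (one per "DayOfWeek" value), so a larger limit may well change nothing for this particular query:

```sql
-- Raise the per-operation sort/hash memory limit for this session only.
SET work_mem = '128MB';

-- Re-run the query with timing; the "Sort Method" line in the output
-- reports how much memory the sort actually used.
EXPLAIN ANALYZE
SELECT "DayOfWeek", count(*) AS c
FROM ontime
WHERE "Year" BETWEEN 2000 AND 2008
GROUP BY "DayOfWeek"
ORDER BY c DESC;

-- Restore the configured default afterwards.
RESET work_mem;
```

If the reported sort memory stays small after this change, the time is more likely going into the scan and aggregation than into the sort.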

lefteris

P.S. I already started the ANALYZE VERBOSE again, as Flavio suggested,
and reset the parameters, although I think some of Flavio's
suggestions, like shared_buffers, have to do with multiple users/queries
rather than one long-running query. Or not?

On Thu, Jan 7, 2010 at 2:36 PM, Arjen van der Meijden
<acmmailing@tweakers.net> wrote:
> On 7-1-2010 13:38 Lefteris wrote:
>>
>> I decided to run the benchmark over postgres to get some more
>> experience and insights. Unfortunately, the query times I got from
>> postgres were not the expected ones:
>
> Why were they not expected? In the given scenario, column databases
> have a huge advantage. The given simple example in particular is the type
> of query at which a column database *should* excel.
> You should, at the very least, compare the queries to MyISAM:
> http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/
>
> But unfortunately, that one also beats your postgresql-results.
>
>> The hardware characteristics are:
>> Platform Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz with 8GB RAM and
>> ample disk space (2x 500 GB SATA disk @ 7200 RPM as SW-RAID-0)
>
> Unfortunately, the blog post fails to mention the disk subsystem. So it may
> well be much faster than yours, although it's not a new, big or fast server,
> so unless it has external storage, it shouldn't be too different for
> sequential scans.
>
>> SELECT "DayOfWeek", count(*) AS c FROM ontime WHERE "Year" BETWEEN
>> 2000 AND 2008 GROUP BY "DayOfWeek" ORDER BY c DESC;
>>
>> Reported query times are (in sec):
>> MonetDB 7.9s
>> InfoBright 12.13s
>> LucidDB 54.8s
>>
>> For pg-8.4.2  I got with 3 consecutive runs on the server:
>> 5m52.384s
>> 5m55.885s
>> 5m54.309s
>
> Maybe an index on (year, dayofweek) will help for this query. But
> it'll have to scan about half the table anyway, so a seq scan isn't a bad
> idea.
> In this case, a partitioned table with partitions per year and constraint
> exclusion enabled would help a bit more.
>
> Best regards,
>
> Arjen
>
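
The index and partitioning suggestions quoted above could look roughly like the following on 8.4, which partitions through table inheritance. This is only a sketch: the child table name ontime_2000 and the index name ontime_year_dow_idx are hypothetical, one child table per year would be needed, and the rows would have to be loaded into the child tables rather than the parent for the pruning to pay off.

```sql
-- Hypothetical child table holding one year; the CHECK constraint is what
-- lets the planner skip it when the WHERE clause excludes that year.
CREATE TABLE ontime_2000 (
    CHECK ("Year" = 2000)
) INHERITS (ontime);

-- Allow the planner to prune child tables using their CHECK constraints.
SET constraint_exclusion = on;

-- Composite index of the kind mentioned in the quoted reply.
CREATE INDEX ontime_year_dow_idx ON ontime ("Year", "DayOfWeek");
```

With this layout, a query restricted to "Year" BETWEEN 2000 AND 2008 would only need to scan the child tables whose constraints overlap that range.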
