Lefteris wrote:
> So we all agree that the problem is on the scans:)
>
> So the next question is why changing shared memory buffers will fix
> that? i only have one session with one connection, do I have like many
> reader workers or something?
>
I wouldn't expect it to. Large sequential scans like this one are
optimized in PostgreSQL to only use up a small portion of the
shared_buffers cache. Allocating more RAM to the database won't improve
the fact that you're spending the whole time waiting for physical I/O to
happen very much.
What might help is increasing effective_cache_size a lot though, because
there you might discover the database switching to all new sorts of
plans for some of these queries. But, again, that doesn't impact the
situation where a sequential scan is the only approach.
I have this whole data set on my PC already and have been trying to find
time to get it loaded and start my own tests here, it is a quite
interesting set of information. Can you tell me what you had to do in
order to get it running in PostgreSQL? If you made any customizations
there, I'd like to get a copy of them. Would save me some time and help
me get to where I could give suggestions out if I had a "pgdumpall
--schema-only" dump from your database for example, or however you got
the schema into there, and the set of PostgreSQL-compatible queries
you're using.
By the way: if anybody else wants to join in, here's a script that
generates a script to download the whole data set:
#!/usr/bin/env python
for y in range(1988,2010):
for m in range(1,13):
print "wget --limit-rate=100k
http://www.transtats.bts.gov/Download/On_Time_On_Time_Performance_%s_%s.zip"
% (y,m)
It's 3.8GB of download that uncompresses into 46GB of CSV data, which is
why I put the rate limiter on there--kept it from clogging my entire
Internet connection.
--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com