Re: Performance of query - Mailing list pgsql-performance

From: Jeff Janes
Subject: Re: Performance of query
Msg-id: CAMkU=1yjnokhcJx9eEeec50AS6iYYJ_vtXywxwg1-F=5AqGeBA@mail.gmail.com
In response to: Performance of query (Cindy Makarowsky <cindymakarowsky@gmail.com>)
Responses: Re: Performance of query (Misa Simic <misa.simic@gmail.com>)
List: pgsql-performance
On Friday, March 22, 2013, Cindy Makarowsky wrote:
> I have two tables in Postgres 9.2 on a Linux server with 8GB of RAM.  The first table has 60 million records:

You have over 40GB of data in that table, so there is no way you are going to get it into 8GB RAM without some major reorganization.
 
>   company character(35),
>   address character(35),
>   city character(20),
>   contact character(35),
>   title character(20),


All of those fixed-width fields are probably taking up needless space, and in your case, space is time.  Varchar would probably be better, and probably with longer maximum lengths as well: most people don't need more than 35 characters for their addresses, but the people who do are going to be cheesed off when you inform them that you deem their address unreasonable.  (Unless your mailing labels only hold 35 characters.)
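As a minimal sketch, assuming the quoted column names and that the table is the "busbase" from your plan (the varchar lengths here are illustrative guesses, not recommendations), the conversion would look something like this.  Combining the changes in one statement means the table is only rewritten once:

  -- Sketch only: column names from the quoted schema, lengths are guesses.
  -- ALTER ... TYPE rewrites the whole table, so run it off-peak.
  ALTER TABLE busbase
      ALTER COLUMN company TYPE varchar(100),
      ALTER COLUMN address TYPE varchar(100),
      ALTER COLUMN city    TYPE varchar(50),
      ALTER COLUMN contact TYPE varchar(100),
      ALTER COLUMN title   TYPE varchar(50);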

 
> When I run this query:
>
> select state.state, count(table1.id) from state, table1 where table1.state = state.state group by state.state


The join to the "state" table is not necessary.  Between the foreign key and the primary key, you know that every state exists, and that every state exists only once.  But that will not solve your problem, as the join to the state table is not where the time goes.
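For the record, the equivalent query without the join would be something like this (a sketch, assuming the foreign key covers every table1.state value, so dropping the join changes nothing):

  SELECT table1.state, count(table1.id)
  FROM table1
  GROUP BY table1.state;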

 
"        ->  Seq Scan on busbase  (cost=0.00..6378172.28 rows=55402728 width=7) (actual time=0.004..250046.673 rows=60057057 loops=1)"

Assuming that your cost parameters are all default, the estimated cost of a sequential scan is pages * seq_page_cost + rows * cpu_tuple_cost, with seq_page_cost = 1 and cpu_tuple_cost = 0.01.  Solving for pages gives
(6378172.28 - 0.01 * 55402728)/1 = 5.8e6 pages, and at 8 kB per page that is 44.4 GB of table.  That is, less than 10 tuples per page.
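You can check the actual figures straight from the catalogs with a standard query against pg_class:

  SELECT relpages,
         reltuples,
         reltuples / relpages AS tuples_per_page
  FROM pg_class
  WHERE relname = 'busbase';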

Tightly packed, you should be able to hold over 30 tuples per page.  You are probably not vacuuming aggressively enough, or you were not doing so in the past and never did a "vacuum full" to reclaim the bloated space.
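If that is the case, something along these lines would reclaim the space (note that VACUUM FULL rewrites the table under an exclusive lock, so it is an off-peak operation):

  -- Rewrite the table compactly, then refresh planner statistics.
  VACUUM FULL VERBOSE busbase;
  ANALYZE busbase;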

In any event, your sequential scan is running at about 181 MB/s (44.4 GB in roughly 250 seconds).  Is this what you would expect given your IO hardware?

 

> I've tried playing around with the settings in the config file for shared_buffers, work_mem, etc., restarting Postgres each time, and nothing seems to help.

How fast do you think it should run?  How fast do you need it to run?  This seems like the type of query that would get run once per financial quarter, or maybe once per day at off-peak times.

Cheers,

Jeff
