Re: select on 1milion register = 6s - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: select on 1milion register = 6s
Date
Msg-id dcc563d10707282151m7a2de11ex5f593cfc19b98095@mail.gmail.com
Whole thread Raw
In response to RES: select on 1milion register = 6s  ("Bruno Rodrigues Siqueira" <bruno@ravnus.com>)
Responses Re: select on 1milion register = 6s  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: select on 1milion register = 6s  (Jan Dittmer <jdi@l4x.org>)
List pgsql-performance
On 7/28/07, Bruno Rodrigues Siqueira <bruno@ravnus.com> wrote:
>
> Ok.
> QUERY PLAN
> Sort  (cost=11449.37..11449.40 rows=119 width=8) (actual
> time=14431.537..14431.538 rows=2 loops=1)
>   Sort Key: to_char(data_encerramento, 'yyyy-mm'::text)
>   ->  HashAggregate  (cost=11448.79..11448.96 rows=119 width=8) (actual
> time=14431.521..14431.523 rows=2 loops=1)
>         ->  Index Scan using detalhamento_bas_idx3003 on detalhamento_bas
> (cost=0.00..11442.95 rows=11679 width=8) (actual time=0.135..12719.155
> rows=2335819 loops=1)

See the row mismatch there?  It expects about 11k rows, gets back 2.3
million.  That's a pretty big misestimate.  Have you run analyze
recently on this table?

Is there a reason you're doing this:


to_char( data_encerramento ,'yyyy-mm')
between   '2006-12' and  '2007-01'

when you should be able to just do:

data_encerramento between   '2006-12-01' and  '2007-01-31'
?  that should be able to use good estimates from analyze.  My guess
is the planner is making a bad guess because of the way you're
handling the dates.

>         SERVER
>                   DELL PowerEdge 2950
>                   XEON Quad-Core 3.0Ghz
>                   4Gb RAM
>                   Linux CentOS 5.0 64-bits
>      Postgres 8.1.4

>      Postgresql.conf
> # - Memory -
>
> shared_buffers = 50000                  # min 16 or max_connections*2, 8KB

400 Meg is kind of low for a server with 4 G ram.  25% is more
reasonable (i.e. 125000 buffers)

> work_mem = 3145728                      # min 64, size in KB
> maintenance_work_mem = 4194304          # min 1024, size in KB

Whoa nellie!  thats ~ 3 Gig of work mem, and 4 gig of maintenance work
mem.  In a machine with 4 gig ram, that's a recipe for disaster.

Something more reasonable would be 128000 (~125Meg) for each since
you've limited your machine to 10 connections you should be ok.
setting work_mem too high can run your machine out of memory and into
a swap storm that will kill performance.

> fsync = off                             # turns forced synchronization on or
> off

So, the data in this database isn't important?  Cause that's what
fsync = off says to me.  Better to buy yourself a nice battery backed
caching RAID controller than turn off fsync.

> effective_cache_size = 41943040         # typically 8KB each

And you're machine has 343,604,830,208 bytes of memory available for
caching?  Seems a little high to me.

> random_page_cost = 1                    # units are one sequential page
> fetch

Seldom if ever is it a good idea to bonk the planner on the head with
random_page_cost=1.  setting it to 1.2 ot 1.4 is low enough, but 1.4
to 2.0 is more realistic.

> stats_start_collector = off
> #stats_command_string = off
> #stats_block_level = off
> #stats_row_level = off
> #stats_reset_on_server_start = off

I think you need stats_row_level on for autovacuum, but I'm not 100% sure.

Let us know what happens after fixing these settings and running
analyze and running explain analyze, with possible changes to the
query.

pgsql-performance by date:

Previous
From: "Bruno Rodrigues Siqueira"
Date:
Subject: RES: select on 1milion register = 6s
Next
From: "Jay Kang"
Date:
Subject: Questions on Tags table schema