Re: Massive performance issues - Mailing list pgsql-performance
From: Ron
Subject: Re: Massive performance issues
Date:
Msg-id: 6.2.3.4.0.20050901145054.021b1678@pop.earthlink.net
In response to: Massive performance issues (Matthew Sackman <matthew@lshift.net>)
Responses: Re: Massive performance issues (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
This should be able to run _very_ fast.

At 01:42 PM 9/1/2005, Matthew Sackman wrote:
>Hi,
>
>I'm having performance issues with a table consisting of 2,043,133 rows. The
>schema is:
>
>\d address
>                   Table "public.address"
>        Column        |          Type          | Modifiers
>----------------------+------------------------+-----------
> postcode_top         | character varying(2)   | not null
> postcode_middle      | character varying(4)   | not null
> postcode_bottom      | character varying(7)   | not null
> postcode             | character varying(10)  | not null
> property_type        | character varying(15)  | not null
> sale_type            | character varying(10)  | not null
> flat_extra           | character varying(100) | not null
> number               | character varying(100) | not null
> street               | character varying(100) | not null
> locality_1           | character varying(100) | not null
> locality_2           | character varying(100) | not null
> city                 | character varying(100) | not null
> county               | character varying(100) | not null
>Indexes:
>    "address_city_index" btree (city)
>    "address_county_index" btree (county)
>    "address_locality_1_index" btree (locality_1)
>    "address_locality_2_index" btree (locality_2)
>    "address_pc_bottom_index" btree (postcode_bottom)
>    "address_pc_middle_index" btree (postcode_middle)
>    "address_pc_top_index" btree (postcode_top)
>    "address_pc_top_middle_bottom_index" btree (postcode_top,
>        postcode_middle, postcode_bottom)
>    "address_pc_top_middle_index" btree (postcode_top, postcode_middle)
>    "address_postcode_index" btree (postcode)
>    "address_property_type_index" btree (property_type)
>    "address_street_index" btree (street)
>    "street_prefix" btree (lower("substring"((street)::text, 1, 1)))

IOW, each row takes ~1KB on HD.

First suggestion: format your HD to use 8KB pages with 1KB segments. That'll put each row down on HD as an atomic unit. 8KB pages also "play nice" with pg.

At 1KB per row, this table takes up ~2.1GB and should fit into RAM fairly easily on a decently configured DB server (my _laptop_ has 2GB of RAM, after all...)
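As a quick sanity check on that ~2.1GB estimate, you can ask the catalog how many 8KB pages the table actually occupies (a sketch only; `relpages` is only as fresh as the last VACUUM/ANALYZE, and 8192 assumes the default block size):

```sql
-- Approximate on-disk size of the table from the planner's page count.
-- relpages is updated by VACUUM/ANALYZE; 8192 is the default PostgreSQL
-- block size.
SELECT relname,
       relpages,
       relpages * 8192 AS approx_bytes
FROM pg_class
WHERE relname = 'address';
```

If `approx_bytes` comes out much larger than ~2.1GB, that's a hint the table is bloated and could use a VACUUM FULL.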
Since you are using ~2.1GB for 2 years' worth of data, 15 years' worth should take no more than 2.1GB * 7.5 = 15.75GB.

If you replace some of those 100 char fields with integers for code numbers, and keep an auxiliary table for each such field mapping the code numbers to the associated 100 char strings, you should be able to shrink a row considerably. Your target is to have each row take <= 512B. Once a row fits into one 512B sector on HD, there's no point in making it smaller unless you can shrink it enough to fit 2 rows into one sector (<= 256B). Once two rows fit into one sector, there's no point shrinking a row further unless you can make 3 rows fit into a sector. Etc.

Assuming each 100 char (i.e., 100B) field can be replaced with a 4B int, each row could be as small as 76B. That makes 85B per row the goal, as it would allow you to fit 6 rows per 512B HD sector. So in the best case your table will be 12x smaller in terms of real HD space.

Fitting one (or more) row(s) into one sector will cut the real space used on HD for the table down to ~7.88GB (or 1.32GB in the best case). Any such streamlining will make the table faster to load, shrink the working set that needs to be in RAM for best performance, etc., etc.

>This is with postgresql 7.4 running on linux 2.6.11 with a 3GHz P4
>and a SATA harddrive.

Upgrade pg to 8.0.3 and make sure you have enough RAM for your real day-to-day load. Frankly, RAM is so cheap ($75-$150/GB) that I'd just upgrade the machine to 4GB as a matter of course. P4s have PAE, so if your mainboard can hold it, put more than 4GB of RAM in if you find you need it.

Since you describe your workload as predominantly reads, you can get away with far less HD capability as long as you crank RAM up high enough to hold the working set of the DB. The indications from the OP are that you may very well be able to hold the entire DB in RAM. That's a big win whenever you can achieve it.
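The code-number idea might look like the sketch below. The table and column names (`city_code`, `city_id`) are made up for illustration, and which of the varchar(100) fields are worth encoding depends on how repetitive their values actually are:

```sql
-- Hypothetical lookup table for one of the repetitive varchar(100)
-- fields: one row per distinct city name.  The address table then
-- stores only the 4-byte id instead of up to ~100 bytes of text.
CREATE TABLE city_code (
    id   serial PRIMARY KEY,
    name character varying(100) NOT NULL UNIQUE
);

-- In the main table, the 100-char column shrinks to a 4B integer:
--     city_id integer NOT NULL REFERENCES city_code(id)
-- Queries join back to recover the text:
SELECT a.street, c.name AS city
FROM address a
JOIN city_code c ON c.id = a.city_id
WHERE c.name = 'London';
```

Repeating this for county, locality_1, locality_2, etc. is what gets the row from ~1KB toward the 76B-85B range discussed above.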
After these steps, there may still be performance issues that need attention, but the DBMS should be _much_ faster.

Ron Peacetree