Re: Massive performance issues - Mailing list pgsql-performance
From: Ron
Subject: Re: Massive performance issues
Date:
Msg-id: 6.2.3.4.0.20050901145054.021b1678@pop.earthlink.net
In response to: Massive performance issues (Matthew Sackman <matthew@lshift.net>)
Responses: Re: Massive performance issues (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
This should be able to run _very_ fast.

At 01:42 PM 9/1/2005, Matthew Sackman wrote:
>Hi,
>
>I'm having performance issues with a table consisting of 2,043,133 rows. The
>schema is:
>
>\d address
>                   Table "public.address"
>        Column        |          Type          | Modifiers
>----------------------+------------------------+-----------
> postcode_top         | character varying(2)   | not null
> postcode_middle      | character varying(4)   | not null
> postcode_bottom      | character varying(7)   | not null
> postcode             | character varying(10)  | not null
> property_type        | character varying(15)  | not null
> sale_type            | character varying(10)  | not null
> flat_extra           | character varying(100) | not null
> number               | character varying(100) | not null
> street               | character varying(100) | not null
> locality_1           | character varying(100) | not null
> locality_2           | character varying(100) | not null
> city                 | character varying(100) | not null
> county               | character varying(100) | not null
>Indexes:
>    "address_city_index" btree (city)
>    "address_county_index" btree (county)
>    "address_locality_1_index" btree (locality_1)
>    "address_locality_2_index" btree (locality_2)
>    "address_pc_bottom_index" btree (postcode_bottom)
>    "address_pc_middle_index" btree (postcode_middle)
>    "address_pc_top_index" btree (postcode_top)
>    "address_pc_top_middle_bottom_index" btree (postcode_top,
>        postcode_middle, postcode_bottom)
>    "address_pc_top_middle_index" btree (postcode_top, postcode_middle)
>    "address_postcode_index" btree (postcode)
>    "address_property_type_index" btree (property_type)
>    "address_street_index" btree (street)
>    "street_prefix" btree (lower("substring"((street)::text, 1, 1)))

IOW, each row takes ~1KB on HD.

First suggestion: format your HD to use 8KB pages with 1KB segments. That'll put each row down on HD as an atomic unit. 8KB pages also "play nice" with pg.

At 1KB per row, this table takes up ~2.1GB and should fit into RAM fairly easily on a decently configured DB server (my _laptop_ has 2GB of RAM, after all...)
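As a quick sanity check on that ~2.1GB estimate, you can ask the catalog how many 8KB pages the table actually occupies (a sketch only; `relpages` is only as fresh as the last VACUUM/ANALYZE, and 8192 assumes the default block size):

```sql
-- Approximate on-disk size of the table from the planner's page count.
-- relpages is updated by VACUUM/ANALYZE; 8192 is the default PostgreSQL
-- block size.
SELECT relname,
       relpages,
       relpages * 8192 AS approx_bytes
FROM pg_class
WHERE relname = 'address';
```

If `approx_bytes` comes out much larger than ~2.1GB, that's a hint the table is bloated and could use a VACUUM FULL.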
Since you are using ~2.1GB for 2 years' worth of data, 15 years' worth should take no more than 2.1GB * 7.5 = 15.75GB.

If you replace some of those 100 char fields with integers for code numbers, and keep an auxiliary table for each such field mapping the code numbers to the associated 100 char strings, you should be able to shrink a row considerably. Your target is to have each row take <= 512B. Once a row fits into one 512B sector on HD, there's no point in making it smaller unless you can shrink it enough to fit 2 rows into one sector (<= 256B). Once two rows fit into one sector, there's no point shrinking a row further unless you can make 3 rows fit into a sector. Etc.

Assuming each 100 char (i.e., 100B) field can be replaced with a 4B int, each row could be as small as 76B. That makes 85B per row the goal, as it would allow you to fit 6 rows per 512B HD sector. So in the best case your table will be 12x smaller in terms of real HD space.

Fitting one (or more) row(s) into one sector will cut the real space used on HD for the table down to ~7.88GB (or 1.32GB in the best case). Any such streamlining will make the table faster to load, shrink the working set that needs to be in RAM for best performance, etc., etc.

>This is with postgresql 7.4 running on linux 2.6.11 with a 3GHz P4
>and a SATA harddrive.

Upgrade pg to 8.0.3 and make sure you have enough RAM for your real day-to-day load. Frankly, RAM is so cheap ($75-$150/GB) that I'd just upgrade the machine to 4GB as a matter of course. P4s have PAE, so if your mainboard can hold it, put more than 4GB of RAM in if you find you need it.

Since you describe your workload as predominantly reads, you can get away with far less HD capability as long as you crank RAM up high enough to hold the working set of the DB. The indications from the OP are that you may very well be able to hold the entire DB in RAM. That's a big win whenever you can achieve it.
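The code-number idea might look like the sketch below. The table and column names (`city_code`, `city_id`) are made up for illustration, and which of the varchar(100) fields are worth encoding depends on how repetitive their values actually are:

```sql
-- Hypothetical lookup table for one of the repetitive varchar(100)
-- fields: one row per distinct city name.  The address table then
-- stores only the 4-byte id instead of up to ~100 bytes of text.
CREATE TABLE city_code (
    id   serial PRIMARY KEY,
    name character varying(100) NOT NULL UNIQUE
);

-- In the main table, the 100-char column shrinks to a 4B integer:
--     city_id integer NOT NULL REFERENCES city_code(id)
-- Queries join back to recover the text:
SELECT a.street, c.name AS city
FROM address a
JOIN city_code c ON c.id = a.city_id
WHERE c.name = 'London';
```

Repeating this for county, locality_1, locality_2, etc. is what gets the row from ~1KB toward the 76B-85B range discussed above.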
After these steps, there may still be performance issues that need attention, but the DBMS should be _much_ faster.

Ron Peacetree