Re: Massive performance issues - Mailing list pgsql-performance
From | Sebastian Hennebrueder |
---|---|
Subject | Re: Massive performance issues |
Date | |
Msg-id | 431766F1.6080501@laliluna.de Whole thread Raw |
In response to | Massive performance issues (Matthew Sackman <matthew@lshift.net>) |
List | pgsql-performance |
Matthew Sackman schrieb: >Hi, > >I'm having performance issues with a table consisting of 2,043,133 rows. The >schema is: > >\d address > Table "public.address" > Column | Type | Modifiers >----------------------+------------------------+----------- > postcode_top | character varying(2) | not null > postcode_middle | character varying(4) | not null > postcode_bottom | character varying(7) | not null > postcode | character varying(10) | not null > property_type | character varying(15) | not null > sale_type | character varying(10) | not null > flat_extra | character varying(100) | not null > number | character varying(100) | not null > street | character varying(100) | not null > locality_1 | character varying(100) | not null > locality_2 | character varying(100) | not null > city | character varying(100) | not null > county | character varying(100) | not null >Indexes: > "address_city_index" btree (city) > "address_county_index" btree (county) > "address_locality_1_index" btree (locality_1) > "address_locality_2_index" btree (locality_2) > "address_pc_bottom_index" btree (postcode_bottom) > "address_pc_middle_index" btree (postcode_middle) > "address_pc_top_index" btree (postcode_top) > "address_pc_top_middle_bottom_index" btree (postcode_top, > postcode_middle, postcode_bottom) > "address_pc_top_middle_index" btree (postcode_top, postcode_middle) > "address_postcode_index" btree (postcode) > "address_property_type_index" btree (property_type) > "address_street_index" btree (street) > "street_prefix" btree (lower("substring"((street)::text, 1, 1))) > >This is with postgresql 7.4 running on linux 2.6.11 with a 3GHz P4 and a >SATA harddrive. > >Queries such as: > >select locality_2 from address where locality_2 = 'Manchester'; > >are taking 14 seconds to complete, and this is only 2 years worth of >data - we will have up to 15 years (so over 15 million rows). > >Interestingly, doing: >explain select locality_2 from address where locality_2 = 'Manchester'; >gives > QUERY PLAN >---------------------------------------------------------------- > Seq Scan on address (cost=0.00..80677.16 rows=27923 width=12) > Filter: ((locality_2)::text = 'Manchester'::text) > >but: >explain select locality_1 from address where locality_1 = 'Manchester'; >gives > QUERY PLAN >---------------------------------------------------------------- > Index Scan using address_locality_1_index on address >(cost=0.00..69882.18 rows=17708 width=13) > Index Cond: ((locality_1)::text = 'Manchester'::text) > >Sadly, using the index makes things worse, the query taking 17 seconds. > >locality_1 has 16650 distinct values and locality_2 has 1156 distinct >values. > >Whilst the locality_2 query is in progress, both the disk and the CPU >are maxed out with the disk constantly reading at 60MB/s and the CPU >rarely dropping under 100% load. > >With the locality_1 query in progress, the CPU is maxed out but the disk >is reading at just 3MB/s. > >Obviously, to me, this is a problem, I need these queries to be under a >second to complete. Is this unreasonable? What can I do to make this "go >faster"? I've considered normalising the table but I can't work out >whether the slowness is in dereferencing the pointers from the index >into the table or in scanning the index in the first place. And >normalising the table is going to cause much pain when inserting values >and I'm not entirely sure if I see why normalising it should cause a >massive performance improvement. > > > Just an idea: When you do not want to adapt your application to use a normalized database you may push the data into normalized table using triggers. Example: Add a table city with column id, name and add a column city_id to your main table. In this case you have redundant data in your main table (locality_1 and city_id) but you could make queries to the city table when searching for 'Man%' -- Best Regards / Viele Grüße Sebastian Hennebrueder ---- http://www.laliluna.de Tutorials for JSP, JavaServer Faces, Struts, Hibernate and EJB Get support, education and consulting for these technologies.
pgsql-performance by date: