Re: Massive performance issues - Mailing list pgsql-performance

From Sebastian Hennebrueder
Subject Re: Massive performance issues
Date
Msg-id 431766F1.6080501@laliluna.de
In response to Massive performance issues  (Matthew Sackman <matthew@lshift.net>)
List pgsql-performance
Matthew Sackman wrote:

>Hi,
>
>I'm having performance issues with a table consisting of 2,043,133 rows. The
>schema is:
>
>\d address
>                  Table "public.address"
>        Column        |          Type          | Modifiers
>----------------------+------------------------+-----------
> postcode_top         | character varying(2)   | not null
> postcode_middle      | character varying(4)   | not null
> postcode_bottom      | character varying(7)   | not null
> postcode             | character varying(10)  | not null
> property_type        | character varying(15)  | not null
> sale_type            | character varying(10)  | not null
> flat_extra           | character varying(100) | not null
> number               | character varying(100) | not null
> street               | character varying(100) | not null
> locality_1           | character varying(100) | not null
> locality_2           | character varying(100) | not null
> city                 | character varying(100) | not null
> county               | character varying(100) | not null
>Indexes:
>    "address_city_index" btree (city)
>    "address_county_index" btree (county)
>    "address_locality_1_index" btree (locality_1)
>    "address_locality_2_index" btree (locality_2)
>    "address_pc_bottom_index" btree (postcode_bottom)
>    "address_pc_middle_index" btree (postcode_middle)
>    "address_pc_top_index" btree (postcode_top)
>    "address_pc_top_middle_bottom_index" btree (postcode_top,
>                             postcode_middle, postcode_bottom)
>    "address_pc_top_middle_index" btree (postcode_top, postcode_middle)
>    "address_postcode_index" btree (postcode)
>    "address_property_type_index" btree (property_type)
>    "address_street_index" btree (street)
>    "street_prefix" btree (lower("substring"((street)::text, 1, 1)))
>
>This is with postgresql 7.4 running on linux 2.6.11 with a 3GHz P4 and a
>SATA harddrive.
>
>Queries such as:
>
>select locality_2 from address where locality_2 = 'Manchester';
>
>are taking 14 seconds to complete, and this is only 2 years' worth of
>data - we will have up to 15 years (so over 15 million rows).
>
>Interestingly, doing:
>explain select locality_2 from address where locality_2 = 'Manchester';
>gives
>                           QUERY PLAN
>----------------------------------------------------------------
> Seq Scan on address  (cost=0.00..80677.16 rows=27923 width=12)
>   Filter: ((locality_2)::text = 'Manchester'::text)
>
>but:
>explain select locality_1 from address where locality_1 = 'Manchester';
>gives
>                           QUERY PLAN
>----------------------------------------------------------------
> Index Scan using address_locality_1_index on address
>(cost=0.00..69882.18 rows=17708 width=13)
>   Index Cond: ((locality_1)::text = 'Manchester'::text)
>
>Sadly, using the index makes things worse, the query taking 17 seconds.
>
>locality_1 has 16650 distinct values and locality_2 has 1156 distinct
>values.
>
>Whilst the locality_2 query is in progress, both the disk and the CPU
>are maxed out with the disk constantly reading at 60MB/s and the CPU
>rarely dropping under 100% load.
>
>With the locality_1 query in progress, the CPU is maxed out but the disk
>is reading at just 3MB/s.
>
>Obviously, to me, this is a problem; I need these queries to complete in
>under a second. Is this unreasonable? What can I do to make this "go
>faster"? I've considered normalising the table but I can't work out
>whether the slowness is in dereferencing the pointers from the index
>into the table or in scanning the index in the first place. And
>normalising the table is going to cause much pain when inserting values
>and I'm not entirely sure I see why normalising it should cause a
>massive performance improvement.
>
>
>
Just an idea: if you do not want to adapt your application to use a
normalized database, you could push the data into normalized tables
using triggers.
Example:
Add a table city with columns id and name, and add a column city_id to
your main table.
In this case you have redundant data in your main table (locality_1 and
city_id), but you can query the city table when searching for 'Man%'.
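
Something along these lines might work. This is just an untested sketch;
the function and trigger names are made up, and plpgsql has to be
installed in the database first (e.g. with createlang plpgsql yourdb):

CREATE TABLE city (
    id   serial PRIMARY KEY,
    name character varying(100) NOT NULL UNIQUE
);

ALTER TABLE address ADD COLUMN city_id integer;

-- Keep address.city_id in sync whenever a row is inserted or updated.
CREATE OR REPLACE FUNCTION address_set_city_id() RETURNS trigger AS '
DECLARE
    cid integer;
BEGIN
    -- Look up the city; create it if it does not exist yet.
    SELECT id INTO cid FROM city WHERE name = NEW.city;
    IF NOT FOUND THEN
        INSERT INTO city (name) VALUES (NEW.city);
        SELECT id INTO cid FROM city WHERE name = NEW.city;
    END IF;
    NEW.city_id := cid;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER address_city_trigger
    BEFORE INSERT OR UPDATE ON address
    FOR EACH ROW EXECUTE PROCEDURE address_set_city_id();

A prefix search can then go against the small city table instead of
scanning the big one:

SELECT a.*
FROM address a
JOIN city c ON c.id = a.city_id
WHERE c.name LIKE 'Man%';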

--
Best Regards / Viele Grüße

Sebastian Hennebrueder

----

http://www.laliluna.de

Tutorials for JSP, JavaServer Faces, Struts, Hibernate and EJB

Get support, education and consulting for these technologies.

