Re: Massive performance issues - Mailing list pgsql-performance
| From | Matthew Sackman |
|---|---|
| Subject | Re: Massive performance issues |
| Date | |
| Msg-id | 20050901210626.GC7131@pongo.lshift.net |
| In response to | Re: Massive performance issues ("Steinar H. Gunderson" <sgunderson@bigfoot.com>) |
| Responses | Re: Massive performance issues |
| List | pgsql-performance |
On Thu, Sep 01, 2005 at 10:09:30PM +0200, Steinar H. Gunderson wrote:
> > "address_city_index" btree (city)
> > "address_county_index" btree (county)
> > "address_locality_1_index" btree (locality_1)
> > "address_locality_2_index" btree (locality_2)
> > "address_pc_bottom_index" btree (postcode_bottom)
> > "address_pc_middle_index" btree (postcode_middle)
> > "address_pc_top_index" btree (postcode_top)
> > "address_pc_top_middle_bottom_index" btree (postcode_top,
> >     postcode_middle, postcode_bottom)
> > "address_pc_top_middle_index" btree (postcode_top, postcode_middle)
> > "address_postcode_index" btree (postcode)
> > "address_property_type_index" btree (property_type)
> > "address_street_index" btree (street)
> > "street_prefix" btree (lower("substring"((street)::text, 1, 1)))
>
> Wow, that's quite a lot of indexes... but your problem isn't reported as
> being in insert/update/delete.

Hah, well now that you mention it. Basically, 100,000 rows arrive in a
bulk import every month, and the only way I can get it to complete in any
sane time frame at all is to drop the indexes, do the import and then
recreate the indexes. But that's something I'm OK with - the imports don't
have to be that fast, and whilst important, they're not *the* critical
path. Selection from the database is, hence the indexes.

> > This is with postgresql 7.4 running on linux 2.6.11 with a 3GHz P4 and a
> > SATA harddrive.
>
> 8.0 or 8.1 might help you some -- better (and more!) disks will probably
> help a _lot_.

Ok, I did try 8.0 when I started this and found that server-side bind
parameters (both via DBD::Pg (with pg_prepare_server => 1) and via JDBC
(various versions I tried)) failed - the parameters were clearly not being
substituted. This was PostgreSQL 8.0 from Debian unstable. That was a
couple of weeks ago and I've not been back to check whether it's been
fixed. Anyway, because of these problems I dropped back to 7.4.
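[The drop/import/recreate cycle described above might be sketched roughly as follows. The COPY source path and file format are hypothetical, and only a few of the indexes from the list are shown; this is an illustration of the pattern, not the poster's actual script.]

```sql
BEGIN;

-- Drop the secondary indexes so the load doesn't maintain them row by row
DROP INDEX address_city_index;
DROP INDEX address_county_index;
-- ... and the remaining address_* indexes ...

-- Bulk-load the monthly batch (path and format are hypothetical)
COPY address FROM '/tmp/monthly_batch.csv' WITH CSV;

-- Recreate each index in a single pass over the full table,
-- which is far cheaper than 100,000 incremental index updates
CREATE INDEX address_city_index ON address (city);
CREATE INDEX address_county_index ON address (county);
-- ... etc. ...

COMMIT;
```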
> > Queries such as:
> >
> > select locality_2 from address where locality_2 = 'Manchester';
> >
> > are taking 14 seconds to complete, and this is only 2 years worth of
> > data - we will have up to 15 years (so over 15 million rows).
>
> As Tom pointed out; you're effectively doing random searches here, and
> using CLUSTER might help. Normalizing your data to get smaller rows (and
> avoid possibly costly string comparisons if your strcoll() is slow) will
> probably also help.

Ok, so you're saying that joining the address table to an address_city
table (the obvious normalization) will help here? The locale settings in
postgresql.conf all have en_GB, and \l shows an encoding of LATIN1, so I
don't think I've set anything to UTF8 or suchlike.

> > I need to get to the stage where I can run queries such as:
> >
> > select street, locality_1, locality_2, city from address
> > where (city = 'Nottingham' or locality_2 = 'Nottingham'
> >        or locality_1 = 'Nottingham')
> > and upper(substring(street from 1 for 1)) = 'A'
> > group by street, locality_1, locality_2, city
> > order by street
> > limit 20 offset 0
>
> This might be a lot quicker than pulling all the records like in your
> example queries...

Yes, that certainly does seem to be the case - around 4 seconds. But I
need it to be 10 times faster (or thereabouts), otherwise I have big
problems!

Many thanks for all the advice so far.

Matthew
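[The normalization Steinar suggests might look something like the sketch below. The table and column names here are illustrative assumptions, not from the thread; the CLUSTER statement uses the PostgreSQL 7.4 syntax, since that is the version in use above.]

```sql
-- Split the repeated city strings out into their own table,
-- so address rows carry a small integer instead of a text value
CREATE TABLE city (
    city_id serial PRIMARY KEY,
    name    text NOT NULL UNIQUE
);

ALTER TABLE address ADD COLUMN city_id integer REFERENCES city (city_id);

-- The 'Manchester' lookup then filters on an integer join key,
-- avoiding per-row strcoll() comparisons on the big table
SELECT a.locality_2
FROM address a
JOIN city c ON c.city_id = a.city_id
WHERE c.name = 'Manchester';

-- Alternatively (or additionally), physically reorder the table so rows
-- with equal locality_2 values are adjacent on disk, turning the random
-- heap fetches into sequential ones:
CLUSTER address_locality_2_index ON address;
```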