Thread: Decrease in performance with 7.3/optimizing a query
Hello, I have a fairly simple table of about 250,000 rows as so: Column | Type | Modifiers --------+---------+----------- seqid | integer | set | text | contig | integer | Indexes: cap3_set_key btree ("set") I have occasion to perform the following query on it: => select set,max(contig) from cap3 where contig!=0 group by set order by set; Previously, with Postgres 7.2.1, this query would take about 10 seconds - not exactly speedy, but tolerable. However, since upgrading to 7.3, this query now takes about 30 seconds. I've looked through the release notes, tweaked postgresql.conf, and played around with creating various indices but nothing I've done has made a difference. While it would be interesting to know why the decrease going from 7.2.1 to 7.3, I'm more concerned about just getting this query to run in a more reasonable time. Does anyone have any suggestions? FWIW: "seqid" is a foreign key to another table of about 25,000 rows, "set" is a string of about 20-30 characters of which there are currently about a dozen distinct ones, and "contig" is a sequence of up to a few thousand. Rows are unique on set,contig. The "explain" of the query looks like this: QUERY PLAN ------------------------------------------------------------------------------- Aggregate (cost=22656.46..23897.71 rows=16550 width=29) -> Group (cost=22656.46..23483.96 rows=165501 width=29) -> Sort (cost=22656.46..23070.21 rows=165501 width=29) Sort Key: "set" -> Seq Scan on cap3 (cost=0.00..4650.46 rows=165501 width=29) Filter: (contig <> 0) The aforementioned upgrade was actually part of a larger upgrade of the entire system (RH 6.2 -> RH 7.3), so the performance decrease may be the consequence of something outside of Postgres (I'm hoping this isn't the case). Hardware: 650 MHz PIII, 1 GB RAM Thanks, Harry
Harry <h3@x-maru.org> writes: > Previously, with Postgres 7.2.1, this query would take about 10 seconds - not > exactly speedy, but tolerable. However, since upgrading to 7.3, this query now > takes about 30 seconds. Don't suppose you have the query plan that was used in 7.2? (Though I'd expect it to be the same, offhand; and it's hard to see how to improve it given your statements about the data.) Are you sure 7.3 is configured the same as 7.2? I'd wonder about sort_mem and locale setting in particular. You could be losing just on the extra sort time needed for a non-C-locale sort. regards, tom lane
On Sun, 22 Dec 2002 21:31:30 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Don't suppose you have the query plan that was used in 7.2? First, a correction: the previous version of Postgres was 7.1.2 - I apologize for transposing the minor version digits. Unfortunately, that previous version no longer runs due to incompatibilities with upgraded libraries - however, I can look into rebuilding it if you think that's a worthwhile avenue to pursue. > Are you sure 7.3 is configured the same as 7.2? Comparing the current and previous conf files, the only changes to default I had made were to sort_mem and shared_buffers (to 8192 and 256 respectively). I tried increasing the sort_mem (to 65536, higher values made no additional difference) and was able to shave off about 7-8 seconds. It makes me wonder how it would've affected the query under 7.1.2. > sort_mem and locale setting in particular. You could be losing just on > the extra sort time needed for a non-C-locale sort. I don't see any setting described in postgresql.conf and playing around with LANG in the environment didn't make any difference. Is there somewhere in particular that I should be looking to set this? It sounds intriguing with regards to my problem. Thanks, Harry
Harry <h3@x-maru.org> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... You could be losing just on >> the extra sort time needed for a non-C-locale sort. > I don't see any setting described in postgresql.conf and playing around with > LANG in the environment didn't make any difference. Is there somewhere in > particular that I should be looking to set this? It sounds intriguing with > regards to my problem. I'm thinking the same. Unfortunately, you cannot alter PG's locale settings short of an initdb and reload :-( --- the interesting values are frozen by initdb based on what it sees in its environment. Before 7.3 the default build switches didn't include --enable-locale, and so people wouldn't get bit by locale-related performance lossage. I'm guessing that you've been so bit. regards, tom lane
On Mon, 23 Dec 2002 01:51:28 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm thinking the same. Unfortunately, you cannot alter PG's locale > settings short of an initdb and reload :-( --- the interesting values > are frozen by initdb based on what it sees in its environment. I just did a test where I did an initdb after setting LANG=C (previously en_US) and my query times went from 19 seconds to 3. Woo hoo! Before I go ahead and do this on my production box, are there any ramifications I should be aware of? I noticed that though my sort was no longer case insensitive, I was able to "order by lower(blah)" without a performance hit. Any other gotchas like this? Thanks, -Harry
On Sun, 22 Dec 2002 13:59:26 -0800, Harry <h3@x-maru.org> wrote: >=> select set,max(contig) from cap3 where contig!=0 group by set order by set; >"set" is >a string of about 20-30 characters of which there are currently about a dozen >distinct ones, and "contig" is a sequence of up to a few thousand. Rows are >unique on set,contig. This seems to call for normalization: CREATE TABLE set ( id int NOT NULL PRIMARY KEY, txt text NOT NULL ); CREATE TABLE cap3 ( seqid INT, setid INT REFERENCES set, contig INT, CONSTRAINT cap3_sc_uq UNIQUE (setid, contig) ); Postgres automatically creates the indices you need. Given the low number of set entries, you would write your query like SELECT txt, (SELECT contig FROM cap3 WHERE cap3.setid = set.id AND contig != 0 ORDER BY setid desc, contig desc LIMIT 1) AS maxcontig FROM set ORDER BY txt; which should perform like | Sort (cost=1.34..1.37 rows=12 width=32) | (actual time=859.27..859.33 rows=12 loops=1) | Sort Key: txt | -> Seq Scan on "set" (cost=0.00..1.12 rows=12 width=32) (actual time=72.80..857.80 rows=12 loops=1) | SubPlan | -> Limit (cost=0.00..0.15 rows=1 width=8) (actual time=71.11..71.24 rows=1 loops=12) | -> Index Scan Backward using cap3_sc_uq on cap3 (cost=0.00..2470.74 rows=16383 width=8) (actual time=70.99..71.10rows=2 loops=12) | Index Cond: (setid = $0) | Filter: (contig <> 0) | Total runtime: 860.82 msec ... on a K5, 105 MHz, 48 MB :-) BTW, this is one of the rare cases where I recommed using a subselect instead of a join. Servus Manfred