Thread: Decrease in performance with 7.3/optimizing a query

Decrease in performance with 7.3/optimizing a query

From
Harry
Date:
Hello,

I have a fairly simple table of about 250,000 rows as so:

 Column |  Type   | Modifiers
--------+---------+-----------
 seqid  | integer |
 set    | text    |
 contig | integer |
Indexes: cap3_set_key btree ("set")

I have occasion to perform the following query on it:

=> select set,max(contig) from cap3 where contig!=0 group by set order by set;

Previously, with Postgres 7.2.1, this query would take about 10 seconds - not
exactly speedy, but tolerable. However, since upgrading to 7.3, this query now
takes about 30 seconds.

I've looked through the release notes, tweaked postgresql.conf, and played
around with creating various indices but nothing I've done has made a
difference.

While it would be interesting to know why the decrease going from 7.2.1 to 7.3,
I'm more concerned about just getting this query to run in a more reasonable
time. Does anyone have any suggestions?

FWIW: "seqid" is a foreign key to another table of about 25,000 rows, "set" is
a string of about 20-30 characters of which there are currently about a dozen
distinct ones, and "contig" is a sequence of up to a few thousand. Rows are
unique on set,contig. The "explain" of the query looks like this:

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=22656.46..23897.71 rows=16550 width=29)
   ->  Group  (cost=22656.46..23483.96 rows=165501 width=29)
         ->  Sort  (cost=22656.46..23070.21 rows=165501 width=29)
               Sort Key: "set"
               ->  Seq Scan on cap3  (cost=0.00..4650.46 rows=165501 width=29)
                     Filter: (contig <> 0)

The aforementioned upgrade was actually part of a larger upgrade of the entire
system (RH 6.2 -> RH 7.3), so the performance decrease may be the consequence
of something outside of Postgres (I'm hoping this isn't the case).

Hardware: 650 MHz PIII, 1 GB RAM

Thanks,
Harry

Re: Decrease in performance with 7.3/optimizing a query

From
Tom Lane
Date:
Harry <h3@x-maru.org> writes:
> Previously, with Postgres 7.2.1, this query would take about 10 seconds - not
> exactly speedy, but tolerable. However, since upgrading to 7.3, this query now
> takes about 30 seconds.

Don't suppose you have the query plan that was used in 7.2?  (Though I'd
expect it to be the same, offhand; and it's hard to see how to improve it
given your statements about the data.)

Are you sure 7.3 is configured the same as 7.2?  I'd wonder about
sort_mem and locale setting in particular.  You could be losing just on
the extra sort time needed for a non-C-locale sort.

            regards, tom lane

Re: Decrease in performance with 7.3/optimizing a query

From
Harry
Date:
On Sun, 22 Dec 2002 21:31:30 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Don't suppose you have the query plan that was used in 7.2?

First, a correction: the previous version of Postgres was 7.1.2 - I apologize
for transposing the minor version digits.

Unfortunately, that previous version no longer runs due to incompatibilities
with upgraded libraries - however, I can look into rebuilding it if you think
that's a worthwhile avenue to pursue.

> Are you sure 7.3 is configured the same as 7.2?

Comparing the current and previous conf files, the only changes to default I
had made were to sort_mem and shared_buffers (to 8192 and 256 respectively). I
tried increasing the sort_mem (to 65536, higher values made no additional
difference) and was able to shave off about 7-8 seconds. It makes me wonder how
it would've affected the query under 7.1.2.

> sort_mem and locale setting in particular.  You could be losing just on
> the extra sort time needed for a non-C-locale sort.

I don't see any setting described in postgresql.conf and playing around with
LANG in the environment didn't make any difference. Is there somewhere in
particular that I should be looking to set this? It sounds intriguing with
regards to my problem.

Thanks,
Harry

Re: Decrease in performance with 7.3/optimizing a query

From
Tom Lane
Date:
Harry <h3@x-maru.org> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ...  You could be losing just on
>> the extra sort time needed for a non-C-locale sort.

> I don't see any setting described in postgresql.conf and playing around with
> LANG in the environment didn't make any difference. Is there somewhere in
> particular that I should be looking to set this? It sounds intriguing with
> regards to my problem.

I'm thinking the same.  Unfortunately, you cannot alter PG's locale
settings short of an initdb and reload :-( --- the interesting values
are frozen by initdb based on what it sees in its environment.

Before 7.3 the default build switches didn't include --enable-locale,
and so people wouldn't get bit by locale-related performance lossage.
I'm guessing that you've been so bit.

            regards, tom lane

Re: Decrease in performance with 7.3/optimizing a query

From
Harry
Date:
On Mon, 23 Dec 2002 01:51:28 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I'm thinking the same.  Unfortunately, you cannot alter PG's locale
> settings short of an initdb and reload :-( --- the interesting values
> are frozen by initdb based on what it sees in its environment.

I just did a test where I did an initdb after setting LANG=C (previously en_US)
and my query times went from 19 seconds to 3. Woo hoo!

Before I go ahead and do this on my production box, are there any ramifications
I should be aware of? I noticed that though my sort was no longer case
insensitive, I was able to "order by lower(blah)" without a performance hit.
Any other gotchas like this?

Thanks,
-Harry

Re: Decrease in performance with 7.3/optimizing a query

From
Manfred Koizar
Date:
On Sun, 22 Dec 2002 13:59:26 -0800, Harry <h3@x-maru.org> wrote:
>=> select set,max(contig) from cap3 where contig!=0 group by set order by set;

>"set" is
>a string of about 20-30 characters of which there are currently about a dozen
>distinct ones, and "contig" is a sequence of up to a few thousand. Rows are
>unique on set,contig.

This seems to call for normalization:

    CREATE TABLE set (
        id int NOT NULL PRIMARY KEY,
        txt text NOT NULL
    );
    CREATE TABLE cap3 (
        seqid INT,
        setid INT REFERENCES set,
        contig INT,
        CONSTRAINT cap3_sc_uq UNIQUE (setid, contig)
    );

Postgres automatically creates the indices you need.  Given the low
number of set entries, you would write your query like

    SELECT txt, (SELECT contig
                   FROM cap3
                  WHERE cap3.setid = set.id AND contig != 0
                  ORDER BY setid desc, contig desc
                  LIMIT 1) AS maxcontig
      FROM set
     ORDER BY txt;

which should perform like

| Sort  (cost=1.34..1.37 rows=12 width=32)
|       (actual time=859.27..859.33 rows=12 loops=1)
|   Sort Key: txt
|   ->  Seq Scan on "set"  (cost=0.00..1.12 rows=12 width=32) (actual time=72.80..857.80 rows=12 loops=1)
|         SubPlan
|           ->  Limit  (cost=0.00..0.15 rows=1 width=8) (actual time=71.11..71.24 rows=1 loops=12)
|                 ->  Index Scan Backward using cap3_sc_uq on cap3 (cost=0.00..2470.74 rows=16383 width=8) (actual
time=70.99..71.10rows=2 loops=12) 
|                       Index Cond: (setid = $0)
|                       Filter: (contig <> 0)
| Total runtime: 860.82 msec

... on a K5, 105 MHz, 48 MB :-)

BTW, this is one of the rare cases where I recommed using a subselect
instead of a join.

Servus
 Manfred