Re: Improving Query - Mailing list pgsql-performance

From Ketema Harris
Subject Re: Improving Query
Date
Msg-id 17A10AA7-73A1-4543-93C3-DAABA352C149@gmail.com
In response to Re: Improving Query  (Richard Huxton <dev@archonet.com>)
List pgsql-performance
On Oct 30, 2007, at 9:23 AM, Richard Huxton wrote:

> Ketema wrote:
>> I have the following query that is a part of a function:
>
> Yikes! Difficult to get a clear view of what this query is doing.
It seems complicated because you're only seeing a small subset of the
schema these tables tie into.  I'd be happy to share the whole thing
if it is needed.
>
> OK, I'm assuming you're vacuumed and analysed on all these tables...
Yes.  Auto-vacuum is on, and I do a full vacuum every 2 days.
>
>
>> My concern is with the sort step that takes 15 seconds by itself:
>> ->  Sort  (cost=1235567017.53..1238002161.29 rows=974057502
>> width=290)
>> (actual time=16576.997..16577.513 rows=3366 loops=1)
>
> That's taking hardly any time, the startup time is 16576.997
> already. Of course, the row estimate is *way* out of line.
OK. I misread the plan and took the startup time as the time it took
to perform the operation.  Thanks for the explain analyze link.
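If I'm reading it right now, the numbers on that Sort node break down
roughly like this:

    Sort (actual time=16576.997..16577.513 rows=3366 loops=1)
      startup: 16576.997 ms before the first sorted row comes out
               (gathering the input from the child nodes and sorting it)
      total:   16577.513 ms to hand back all 3366 rows
      sort output itself: 16577.513 - 16576.997 = ~0.5 ms

So nearly all of the 16.5 s is really spent in the nodes feeding the
Sort, not in the sort step.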
>
> If you look here (where the explain is a bit easier to see)
> http://explain-analyze.info/query_plans/1258-query-plan-224
>
> The two main things to look at seem to be the nested loops near the
> top and a few lines down the materialise (cost=326...
>
> These two nested loops seem to be pushing the row estimates wildly
> out of reality. They also consume much of the time.
>
> The immediate thing that leaps out here is that you are trying to
> join an int to an array of ints. Why are you using this setup
> rather than a separate table?
I see what you are talking about.  When I initially used this setup it
was because I wanted to avoid a table with a ton of rows in it that I
knew I would have to join to often.  So I made a column that holds on
average 4 or 5 ints representing "products" on a particular "order".
I did not realize that using a function in the join would be worse
than simply having a large table.
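A separate table along these lines is what you mean, I take it (just a
rough sketch -- table and column names invented, since the real schema
isn't shown here):

    -- one row per product on an order, instead of an int[] column
    CREATE TABLE order_products (
        order_id   integer NOT NULL,
        product_id integer NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    CREATE INDEX order_products_product_idx ON order_products (product_id);

    -- the join then becomes a plain, indexable equi-join:
    -- ... FROM orders o
    --       JOIN order_products op ON op.order_id = o.order_id
    --       JOIN products p ON p.product_id = op.product_id ...

With 4 or 5 products per order that table gets big, but at least the
planner can come up with sane row estimates for a normal join.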
>
>> How can I improve this step?
>> Things I have thought about:
>> 1)Creating indexes on the aggregates...Found out this can't be done.
>
> Nope - not sure what it would mean in any case.
My initial thought was that the counts were causing the slowdown.
This is not the issue though, as you have shown.
>
>> 2)Create Views of the counts and the sub select...is this any faster
>> as the view is executed at run time anyway?
>
> Might make the query easier to write, won't make it faster. Not
> without materialised views which are the fancy name for #3...
>
>> 3)Create actual tables of the sub select and aggregates...How would
>> this be maintained to ensure it was always accurate?
>
> Triggers.
Because of the way this system is used I may take this route, as I
think it will mean fewer changes.
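That is, keep a real table of the pre-computed counts and let triggers
keep it in step with the base data -- something along these lines, I
imagine (rough sketch, table and column names invented):

    CREATE TABLE order_counts (
        account_id  integer PRIMARY KEY,
        order_count integer NOT NULL DEFAULT 0
    );

    CREATE OR REPLACE FUNCTION maintain_order_counts() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE order_counts SET order_count = order_count + 1
             WHERE account_id = NEW.account_id;
            IF NOT FOUND THEN
                INSERT INTO order_counts (account_id, order_count)
                VALUES (NEW.account_id, 1);
            END IF;
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE order_counts SET order_count = order_count - 1
             WHERE account_id = OLD.account_id;
        END IF;
        RETURN NULL;  -- AFTER trigger, return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_count_trg
        AFTER INSERT OR DELETE ON orders
        FOR EACH ROW EXECUTE PROCEDURE maintain_order_counts();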
>
>> 4)Increasing hardware resources.  Currently box is on a single
>> processor amd64 with 8Gb of RAM.  below are the settings for resource
>> usage.
>> shared_buffers = 65536
>> temp_buffers = 5000
>> max_prepared_transactions = 2000
>
> ????
These are settings out of postgresql.conf.  Currently sysctl.conf is
set to kernel.shmmax = 805306368.
Connections are set to 300, and I usually have about 200 open.
>
>> work_mem = 131072
>> maintenance_work_mem = 512000
>
> Can't say about these without knowing whether you've got only one
> connection or 100.
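As I said above, there are usually around 200 connections open, so if
I understand work_mem correctly the worst case is something like:

    work_mem = 131072 kB = 128MB per sort/hash operation
    200 connections x 128MB = ~25GB if they all sorted at once
    (and a single query can use more than one work_mem allocation)

which is well over the 8GB of RAM in the box.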
>
>> max_stack_depth = 7168
>> max_fsm_pages = 160000
>> max_fsm_relations = 4000
>> The only function of this box if for Pg, so I do not mind it using
>> every last drop of ram and resources that it can.
>> 5)Upgrade version of pg..currently is running 8.1.4
>
> Well every version gets better at planning, so it can't hurt.
At one point I did go to 8.2.3 on a dev box and performance was
horrible.  I have not had the opportunity to work out how to make the
postgresql.conf file in 8.2 match the settings in 8.1, as some things
have changed.
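If my unit math is right, the old values would translate to something
like this in an 8.2-style postgresql.conf (8.2 accepts memory units,
and raw numbers still mean the same thing they did in 8.1):

    shared_buffers = 512MB           # was 65536 (8kB pages)
    temp_buffers = 5000              # ~39MB; raw page count still accepted
    work_mem = 128MB                 # was 131072 (kB)
    maintenance_work_mem = 500MB     # was 512000 (kB)
    max_stack_depth = 7MB            # was 7168 (kB)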
>
> --
>   Richard Huxton
>   Archonet Ltd

