Re: Tuning queries and distinct behaviour - Mailing list pgsql-general

From Stephan Szabo
Subject Re: Tuning queries and distinct behaviour
Date
Msg-id Pine.BSF.4.21.0101180919480.28645-100000@megazone23.bigpanda.com
Whole thread Raw
In response to Tuning queries and distinct behaviour  ("Gordan Bobic" <gordan@freeuk.com>)
List pgsql-general
On Thu, 18 Jan 2001, Gordan Bobic wrote:

> Is there a way to tune queries?
>
> I'm doing queries that join around 5-6 tables. All join fields are indexed
> either in hash (where tables are small enough and join is done on "="), or
> btree (big tables, not joined on "="). The tables have between several
> hundred and several tens of millions of records. The problem is that this
> tends to take a _VERY_ long time. I know that I'm asking for a bit much on
> such a huge task, but if I break the queries down manually into 10 simper
> ones that I could run manually with temporary tables, each of those would
> take a few seconds at most. The optimizer occasionally decides to do
> sequential scans, and this is probably what is killing the performance. Is
> there any way to give the optimizer hints? I suspect that it would help in
> a vast majority of cases if it looked ad the where clauses in views and
> selects on views and tried cutting down the working set through that, and
> then pruning down the rest as it went along. It just seems a bit strange
> that doing subqueries with temporary tables should be so much faster.

Is it deciding that a lot of rows will match when it does the sequential
scans?  I guess a copy of the schema and query would help.

> Doing VACUUM ANALYZE often helps, but not always. Is there any way to give
> the optimizer hints on how to speed up selects on views that do big joins,
> both inner and outer?
>
> Another question - I have to do a join on the "<" operation. Something
> like:
>
> SELECT * FROM Table1, Table2 WHERE Table1.Field1 < Table2.Field2 ORDER BY
> Table1.Field1 DESC, Table2.Field2 DESC;
>
> This will give me a very large set of records. However, I only really need
> a few of those records. I only want the highest Field1, Field2 combination
> records for some other field in Table1. Effectively, something like:
>
> CREATE VIEW SomeView AS SELECT * FROM Table1, Table2 WHERE Table1.Field1 <
> Table2.Field2 ORDER BY Table1.Field1 DESC, Table2.Field2 DESC;
>
> and then doing:
>
> SELECT DISTINCT ON (Table1.Field3) * FROM SomeView. I would hope that
> DISTINCT would pick the first record returned for each of the different
> occurences in SomeView.
>
> Unfortunately it doesn't. By having a quick scan at the returned records,
> it doesn't seem to pick them in any particular order. Is this the case? And
> is there an SQL standard that says which records should DISTINCT return
> (first, last, or arbitrary)? And is there a way to enforce this behaviour,
> just in case the default behaviour changes in the future?

DISTINCT ON is a non-standard "feature" and yes, without additional help,
it will not pick a particular row (well, it's probably the first one
it comes across).  I believe if you use order by you can get the
first/last by the metric you ordered by. So if you did Field3, Field1
DESC, Field2 DESC in the order by it should give you the highest Field1
valued row, and then higest Field2 for ties.


pgsql-general by date:

Previous
From: "Gordan Bobic"
Date:
Subject: Another optimizer question
Next
From: Adam Haberlach
Date:
Subject: Re: Help with query. (*)