Tuning queries and distinct behaviour - Mailing list pgsql-general

From Gordan Bobic
Subject Tuning queries and distinct behaviour
Date
Msg-id 002f01c08167$de6d6620$8000000a@localdomain
Whole thread Raw
In response to Re: Re: MySQL file system  (Marc SCHAEFER <schaefer@alphanet.ch>)
Responses Re: Tuning queries and distinct behaviour  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Re: Tuning queries and distinct behaviour  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Is there a way to tune queries?

I'm doing queries that join around 5-6 tables. All join fields are indexed
either in hash (where tables are small enough and join is done on "="), or
btree (big tables, not joined on "="). The tables have between several
hundred and several tens of millions of records. The problem is that this
tends to take a _VERY_ long time. I know that I'm asking for a bit much on
such a huge task, but if I break the queries down manually into 10 simper
ones that I could run manually with temporary tables, each of those would
take a few seconds at most. The optimizer occasionally decides to do
sequential scans, and this is probably what is killing the performance. Is
there any way to give the optimizer hints? I suspect that it would help in
a vast majority of cases if it looked ad the where clauses in views and
selects on views and tried cutting down the working set through that, and
then pruning down the rest as it went along. It just seems a bit strange
that doing subqueries with temporary tables should be so much faster.

Doing VACUUM ANALYZE often helps, but not always. Is there any way to give
the optimizer hints on how to speed up selects on views that do big joins,
both inner and outer?

Another question - I have to do a join on the "<" operation. Something
like:

SELECT * FROM Table1, Table2 WHERE Table1.Field1 < Table2.Field2 ORDER BY
Table1.Field1 DESC, Table2.Field2 DESC;

This will give me a very large set of records. However, I only really need
a few of those records. I only want the highest Field1, Field2 combination
records for some other field in Table1. Effectively, something like:

CREATE VIEW SomeView AS SELECT * FROM Table1, Table2 WHERE Table1.Field1 <
Table2.Field2 ORDER BY Table1.Field1 DESC, Table2.Field2 DESC;

and then doing:

SELECT DISTINCT ON (Table1.Field3) * FROM SomeView. I would hope that
DISTINCT would pick the first record returned for each of the different
occurences in SomeView.

Unfortunately it doesn't. By having a quick scan at the returned records,
it doesn't seem to pick them in any particular order. Is this the case? And
is there an SQL standard that says which records should DISTINCT return
(first, last, or arbitrary)? And is there a way to enforce this behaviour,
just in case the default behaviour changes in the future?

Alternatively, can anyone think of a solution to this problem?

Thanks.

Gordan


pgsql-general by date:

Previous
From: "Martin A. Marques"
Date:
Subject: compilation error
Next
From: Travis Bauer
Date:
Subject: Re: plperl and sqrt.