Thread: Slow join query optimisation?
I am trying to execute a moderately complex join to combine student
enrolment data with the various tables that define the foreign keys
used by the enrolment table. Here is a sample SELECT statement:

    SELECT users.name, subjects.canonical, periods.description,
           courses.description, campuses.description,
           studyModes.description
    FROM t25 agents, t27 offerings, t30 enrolments, t26 subjects,
         t29 periods, t28 courses, t23 campuses, t21 studyModes,
         t24 users
    WHERE agents.agent='/Subjects/1999/sem1/GCO/1812'
      AND offerings.masterPeriod=agents.period
      AND offerings.masterSubject=agents.subject
      AND enrolments.offeringPeriod=offerings.period
      AND enrolments.offeringSubject=offerings.subject
      AND subjects.subject=enrolments.offeringSubject
      AND periods.period=enrolments.offeringPeriod
      AND courses.course=enrolments.course
      AND campuses.campus=enrolments.campus
      AND studyModes.studyMode=enrolments.studyMode
      AND users.studentID=enrolments.studentID;

This executes unusually slowly, regardless of how little data is in
the tables. The time is all in the backend: it takes almost a second
(PII 333MHz, 128M RAM, repeated test so no disk I/O involved) even
with all the tables empty. However, each time I simplify the query
slightly (by removing all reference to one table) the time drops:

    11 tables: 6.97s
    10 tables: 2.47s
     9 tables: 0.99s   (this is the SELECT shown above)
     8 tables: 0.39s
     7 tables: 0.19s
     6 tables: 0.12s
     5 tables: 0.09s

It seems to make no difference which tables I remove; what matters is
the total number of tables being joined. If it matters, all the tables
have indexes on the combinations of attributes used in my WHERE clause.

My question for the list: is there something in the query optimiser
that takes time something like exponential in the number of tables
being joined? And if so, is there any way around it (other than using
several smaller selects and combining the data myself, or
denormalising my tables)?

Doug.

P.S. I am using version 6.5.1, but I couldn't see anything in the
HISTORY for 6.5.3 that sounded likely to affect this particular
issue...
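(One way to separate the optimiser's cost from everything else: EXPLAIN
plans a statement without executing it, so timing an EXPLAIN from the
client isolates planning from fetch time. A minimal sketch, using a
3-table subset of the schema above; growing it one table at a time
should reproduce the scaling in the list:

    -- Planning happens in full for EXPLAIN even though nothing is
    -- executed, so the elapsed time is essentially pure optimiser cost.
    EXPLAIN
    SELECT users.name, subjects.canonical
    FROM t30 enrolments, t26 subjects, t24 users
    WHERE subjects.subject = enrolments.offeringSubject
      AND users.studentID = enrolments.studentID;

Note also that the successive ratios in the timings grow, from about
1.3x between 5 and 6 tables up to about 2.8x between 10 and 11, which
points at combinatorial growth in the number of join orderings
considered rather than anything data-dependent.)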
Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> My question for the list: Is there something in the query optimiser
> that will take a time that is something like exponential on the number
> of tables being joined?

Yes.  See the manual's discussion of genetic query optimization.  You
might try setting the GEQO threshold to something less than 10.

FWIW, the time you are looking at here is strictly planning time and
will not increase when you actually put data in the tables.  Unless
you foresee fairly small final tables, the planning time is not likely
to be a significant factor in this range of query sizes.

Still, if the purpose of the join is just to substitute descriptions
for IDs, you'd be well advised to consider doing it via table lookup
on the frontend side, assuming your frontend is coded in a language
that makes that reasonable to do.  The Postgres optimizer cannot help
but waste many cycles per query reverse-engineering your intent, even
assuming that it ultimately comes up with the best possible query plan
for the join...

			regards, tom lane
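(For concreteness, the GEQO threshold is a per-session setting. The
form below follows the 6.5-era SET GEQO syntax as best it can be
reconstructed here, so treat it as a sketch and check the manual for
your exact version:

    -- Hand joins of 8 or more tables to the genetic optimiser instead
    -- of the exhaustive join-order search; the default cutoff in this
    -- era was above 10, so a 9-table join normally gets the
    -- exhaustive search.
    SET GEQO TO 'ON=8';

    -- Return the setting to its default for the session.
    RESET GEQO;

The trade-off is that the genetic optimiser samples the space of join
orders instead of enumerating it, so it can pick a worse plan.)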
> Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> > My question for the list: Is there something in the query optimiser
> > that will take a time that is something like exponential on the number
> > of tables being joined?
>
> Yes.  See the manual's discussion of genetic query optimization.  You
> might try setting the GEQO threshold to something less than 10.

This seemed to make the query take *much* longer (about five times as
long as before!) even with empty tables?

> FWIW, the time you are looking at here is strictly planning time and
> will not increase when you actually put data in the tables.  Unless
> you foresee fairly small final tables, the planning time is not likely
> to be a significant factor in this range of query sizes.

That would be true if I were typically selecting a large proportion of
the table data. However, in my application a typical query only
retrieves 50 tuples, so the query optimisation dominates the total
time. I don't think the absolute table size matters much, since the
tables are indexed. Here is a summary of my 9-table query timings:

                                              GEQO=11   GEQO=3
                                              =======   ======
    Empty tables (hence 0 tuples selected)     0.94s     5.09s
    Full tables (26 tuples selected)           0.98s     5.15s

These times include executing the SELECT, which (via libpq) means they
also include the fetch times for all the selected tuples.

> Still, if the purpose of the join is just to substitute descriptions
> for IDs, you'd be well advised to consider doing it via table lookup
> on the frontend side, assuming your frontend is coded in a language
> that makes that reasonable to do.

The drawback for me here is that either I must load up all the lookup
tables (when I may only need a few of the values from each of them),
or else I have to look up just the values I need (which is possible
but messy).

Is there any way to turn off the optimisation? Or perhaps some way to
work out the optimal strategy once, and then provide this information
directly? After all, presumably the optimiser will work out the same
answer every time (for the same table structures and the same SELECT),
so it seems pointless doing this work every time the SELECT gets
executed...

Doug.
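(For what the split-query fallback might look like with the tables
from the original post: keep only the request-specific tables in the
join, and fetch each small lookup table once per session to cache on
the frontend. The column lists here are inferred from the original
SELECT and WHERE clauses, so treat this as a sketch:

    -- Core query: just the tables whose rows vary per request.  With
    -- only 4 tables, planning should cost well under 0.1s by the
    -- timings in the original post.
    SELECT enrolments.studentID, users.name, enrolments.course,
           enrolments.campus, enrolments.studyMode,
           enrolments.offeringSubject, enrolments.offeringPeriod
    FROM t25 agents, t27 offerings, t30 enrolments, t24 users
    WHERE agents.agent='/Subjects/1999/sem1/GCO/1812'
      AND offerings.masterPeriod=agents.period
      AND offerings.masterSubject=agents.subject
      AND enrolments.offeringPeriod=offerings.period
      AND enrolments.offeringSubject=offerings.subject
      AND users.studentID=enrolments.studentID;

    -- Small, stable lookup tables: fetch once, cache client-side, and
    -- substitute descriptions for IDs in the frontend.
    SELECT subject, canonical FROM t26;
    SELECT period, description FROM t29;
    SELECT course, description FROM t28;
    SELECT campus, description FROM t23;
    SELECT studyMode, description FROM t21;

The single-table lookups plan in negligible time; the cost is exactly
the client-side bookkeeping described as messy above.)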
Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> Is there any way to turn off the optimisation? Or perhaps some way to
> work out the optimal strategy once, and then provide this information
> directly?

Not at the moment.  There's been some talk of caching query plans,
which would get the job done, but none of the current developers are
working on that now.  It'll probably happen someday...

			regards, tom lane
On Thu, 2 Dec 1999, Tom Lane wrote:

> Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> > Is there any way to turn off the optimisation? Or perhaps some way to
> > work out the optimal strategy once, and then provide this information
> > directly?
>
> Not at the moment.  There's been some talk of caching query plans,
> which would get the job done, but none of the current developers are
> working on that now.  It'll probably happen someday...

What do you need to make 'someday' happen tomorrow? :-)

Seriously, recent threads about Postgres performance indicate that
many people now seriously consider Postgres as a DB engine for real
business applications. And because business is shifting more and more
to the Internet, we also need to think about Web-friendly Postgres
features. One feature (LIMIT), implemented by Jan, has already brought
in many Web developers. Next was MVCC (and we're waiting for WAL). We
need DB connection pooling, which would be very useful in
high-performance Web applications, and a CORBA interface is very
important (I'm currently just reading books on it). Caching query
plans is very desirable; probably not so much caching as storing
(naming) them, so the developer could decide which queries' plans
should be stored.

	Regards,
		Oleg

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83