Thread: queries on 2+ indices

queries on 2+ indices

From
Marc Tardif
Date:
How does postgresql perform queries on one table using more than one
index? For example, assuming the following:

create table t1 ( f1 int, f2 int);
create index t1_f1 on t1 (f1);
create index t1_f2 on t1 (f2);
select * from t1 where f1=123 and f2=456;

By default, both indices will be btree's making them sorted. Therefore,
does postgresql retrieve all matching records from t1_f1 and t1_f2 into
intermediate tables and then performs somekind of merge sort before
retrieving the final results from t1? If so or if not, are intermediate
files created for this kind of operation or does the postgresql support
queries to multiple fields directly in its indexing system (perhaps aided
by "analyze")? Or, does this kind of operation rely much on memory?

I have tried making heads or tails out of the source code, but postgresql
is far more daunting than I had expected. Nevertheless, for future
reference, how could I find answers to questions about query management by
postgresql?

Many thanks for your excellent ordbms,
Marc Tardif



Re: [HACKERS] queries on 2+ indices

From
Tom Lane
Date:
Marc Tardif <admin@wtbwts.com> writes:
> How does postgresql perform queries on one table using more than one
> index?

It doesn't.  Simple enough, eh?
For example, assuming the following:

> create table t1 ( f1 int, f2 int);
> create index t1_f1 on t1 (f1);
> create index t1_f2 on t1 (f2);
> select * from t1 where f1=123 and f2=456;

The optimizer will attempt to guess which index is more selective
(will return fewer tuples for its part of the WHERE clause).  That
index would be used for the indexscan, and the rest of the WHERE
clause would be applied as a "qpqual", ie actually evaluated as
an expression against each tuple found by the index.

As you note, there's not any really efficient way to make use of
independent indexes to evaluate an AND condition like this one.
While Postgres' approach is pretty simplistic, I'm not sure that
a more-complicated approach would actually be any faster.

If you have a multi-column index, eg

create index t1_f1_f2 on t1 (f1, f2);

then the system can and will use both clauses of the WHERE with
that single index.  But again, it's not entirely clear that that's
all that much faster than just using the more-selective clause
in a smaller index.  Furthermore, a multi-column index is more
specialized than single-column indexes because it is useful for
only a narrower range of queries; so you have to consider the extra
work done at insert/update to manage the extra index, and decide
if it's really a win overall for your application.

> how could I find answers to questions about query management by
> postgresql?

Asking questions on the mailing lists isn't a bad way to start.
Seeing what EXPLAIN says about how queries will be executed is
another nice learning tool.

There is some high-level implementation info in the SGML documentation,
and more scattered in various README files, but you won't really
understand a lot until you start burrowing into the source code.
        regards, tom lane


Re: [HACKERS] queries on 2+ indices

From
Marc Tardif
Date:
I'm not sure I understand what is "qpqual" in your explanation. Once the
first indexscan is performed, is a temporary table created (in-file or
in-memory) containing the relevant tuples? If not, how can the remaining
part of the WHERE clause be evaluated against the previously selected
tuples during the first indexscan? Or, is the remaining part of the WHERE
clause re-evaluated again and again for each of the found tuples in the
first indexscan?

On Fri, 18 Feb 2000, Tom Lane wrote:

> Marc Tardif <admin@wtbwts.com> writes:
> 
> >  For example, assuming the following:
> >
> > create table t1 ( f1 int, f2 int);
> > create index t1_f1 on t1 (f1);
> > create index t1_f2 on t1 (f2);
> > select * from t1 where f1=123 and f2=456;
> 
> The optimizer will attempt to guess which index is more selective
> (will return fewer tuples for its part of the WHERE clause).  That
> index would be used for the indexscan, and the rest of the WHERE
> clause would be applied as a "qpqual", ie actually evaluated as
> an expression against each tuple found by the index.
> 
> As you note, there's not any really efficient way to make use of
> independent indexes to evaluate an AND condition like this one.
> While Postgres' approach is pretty simplistic, I'm not sure that
> a more-complicated approach would actually be any faster.
> 



Re: [HACKERS] queries on 2+ indices

From
Tom Lane
Date:
Marc Tardif <admin@wtbwts.com> writes:
> I'm not sure I understand what is "qpqual" in your explanation. Once the
> first indexscan is performed, is a temporary table created (in-file or
> in-memory) containing the relevant tuples?

No, it's all done on-the-fly as each tuple is scanned.

> is the remaining part of the WHERE
> clause re-evaluated again and again for each of the found tuples in the
> first indexscan?

That's what I said.
        regards, tom lane