Re: Index on two columns not used - Mailing list pgsql-performance

From Markus Schaber
Subject Re: Index on two columns not used
Date
Msg-id 453CC2DD.4070608@logix-tt.com
Whole thread Raw
In response to Re: Index on two columns not used  (Péter Kovács <peter.kovacs@chemaxon.hu>)
Responses Re: Index on two columns not used
Re: Index on two columns not used
List pgsql-performance
Hi, Peter,

Péter Kovács wrote:
> Sorry for the amateurish question, but what are "heap tuples"?
>
> Also, my understanding is that the following statement applies only for
> composite indexes: "PostgreSQL can't use the values stored in the index
> to check the join condition". I assume that PostgreSQL will be able to
> use single-column-indexes for join conditions. Is this correct?

Both questions are tightly related:

First, the "heap" is the part of the table where the actual tuples are
stored.

PostgreSQL uses an MVCC system, that means that multiple versions (with
their transaction information) of a single row can coexist in the heap.
This allows for higher concurrency in the backend.

Now, the index basically stores pointers like "pages 23 and 42 contain
rows with value 'foo'", but version information is not replicated to the
index pages, this keeps the index' size requirements low.

Additionally, in most UPDATE cases, the new row version will fit into
the same page as the old version. In this case, the index does not have
to be changed, which is an additional speed improvement.

But when accessing the data via the index, it can only give a
preselection of pages that contain interesting data, and PostgreSQL has
to look into the actual heap pages to check whether there really are row
versions that are visible in the current transaction.


A further problem is that some GIST index types are lossy, that means
the index does not retain the full information, but only an
approximation, for efficiency reasons.

A prominent example are the PostGIS geometry indices, they only store
the bounding box (4 float values) instead of the whole geometry (may be
millions of double precision coordinates). So it may be necessary to
re-check the condition with the real data, using the lossy index for
preselection.

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


pgsql-performance by date:

Previous
From: John Philips
Date:
Subject: Re: Optimizing disk throughput on quad Opteron
Next
From: Alvaro Herrera
Date:
Subject: Re: Index on two columns not used