Home > mailing lists

Re: Index on two columns not used - Mailing list pgsql-performance

From	Markus Schaber
Subject	Re: Index on two columns not used
Date	October 23, 2006 10:26:12
Msg-id	453CC2DD.4070608@logix-tt.com Whole thread Raw
In response to	Re: Index on two columns not used (Péter Kovács <peter.kovacs@chemaxon.hu>)
Responses	Re: Index on two columns not used Re: Index on two columns not used
List	pgsql-performance

Tree view

Hi, Peter,

Péter Kovács wrote:
> Sorry for the amateurish question, but what are "heap tuples"?
>
> Also, my understanding is that the following statement applies only for
> composite indexes: "PostgreSQL can't use the values stored in the index
> to check the join condition". I assume that PostgreSQL will be able to
> use single-column-indexes for join conditions. Is this correct?

Both questions are tightly related:

First, the "heap" is the part of the table where the actual tuples are
stored.

PostgreSQL uses an MVCC system, that means that multiple versions (with
their transaction information) of a single row can coexist in the heap.
This allows for higher concurrency in the backend.

Now, the index basically stores pointers like "pages 23 and 42 contain
rows with value 'foo'", but version information is not replicated to the
index pages, this keeps the index' size requirements low.

Additionally, in most UPDATE cases, the new row version will fit into
the same page as the old version. In this case, the index does not have
to be changed, which is an additional speed improvement.

But when accessing the data via the index, it can only give a
preselection of pages that contain interesting data, and PostgreSQL has
to look into the actual heap pages to check whether there really are row
versions that are visible in the current transaction.

A further problem is that some GIST index types are lossy, that means
the index does not retain the full information, but only an
approximation, for efficiency reasons.

A prominent example are the PostGIS geometry indices, they only store
the bounding box (4 float values) instead of the whole geometry (may be
millions of double precision coordinates). So it may be necessary to
re-check the condition with the real data, using the lossy index for
preselection.

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

pgsql-performance by date:

From: John Philips
Date: 23 October 2006, 09:17:03
Subject: Re: Optimizing disk throughput on quad Opteron

From: Alvaro Herrera
Date: 23 October 2006, 10:51:16
Subject: Re: Index on two columns not used

Re: Index on two columns not used - Mailing list pgsql-performance

Previous

Next