Re: Performace comparison of indexes over timestamp fields - Mailing list pgsql-performance

From Alexander Staubo
Subject Re: Performace comparison of indexes over timestamp fields
Date
Msg-id 88daf38c0705220539q5807d2c2t1f20a03922bf8d33@mail.gmail.com
Whole thread Raw
In response to Performace comparison of indexes over timestamp fields  (Arnau <arnaulist@andromeiberica.com>)
Responses Re: Performace comparison of indexes over timestamp fields  ("Steinar H. Gunderson" <sgunderson@bigfoot.com>)
List pgsql-performance
On 5/22/07, Arnau <arnaulist@andromeiberica.com> wrote:
>    On older versions of PostgreSQL, at least in my experience, queries
> on timestamps fields even having indexes where performing quite bad
> mainly sequential scans where performed.

PostgreSQL uses B-tree indexes for scalar values. For an expression
such as "t between a and b", I believe it's going to match both sides
of the table independently (ie., t >= a and t <= b) and intersect
these subsets. This is inefficient.

You should get better performance by mapping timestamps to a
one-dimensional plane and indexing them using GiST. GiST implements an
R-tree-like structure that supports bounding-box searches.

This involves setting up a functional index:

  create index ... on payment_transactions using gist (
    box(point(extract(epoch from time), 0), point(extract(epoch from
time), 0)) box_ops)

I'm using box() here because GiST doesn't have a concept of points.

Then insert as usual, and then query with something like:

  select ... from payment_transactions
  where box(
    point(extract(epoch from '2006-04-01'::date), 0),
    point(extract(epoch from '2006-08-01'::date), 0)) && box(
    point(extract(epoch from time), 0),
    point(extract(epoch from time), 0));

PostgreSQL should be able to exploit the GiST index by recognizing
that the result of box() expression operand is already computed in the
index.

This much less inconvenient and portable -- I would love for
PostgreSQL to be provide syntactic sugar and special-casing to make
this transparent -- but worth it if you are dealing with a lot of
range searches.

>    Now I have a newer version of PostgreSQL and I've done some tests
> comparing the performance of an index over a timestamp field with a
> numeric field. To do so, I have the following table:
>
>                       Table "public.payment_transactions"
>      Column     |            Type             |            Modifiers
> ----------------+-----------------------------+---------------------------------
> transaction_id | character varying(32)       | not null
> timestamp_in   | timestamp without time zone | default now()
> credits        | integer                     |
> epoch_in       | bigint                      |
> epoch_in2      | double precision            |
[snip]

A timestamp is stored internally as an 8-byte double-precision float.
Therefore, timestamp_in and epoch_in2 should behave identically.

> While doing the tests this table has about 100.000 entries.

Make sure PostgreSQL is able to keep the entire table in memory by
setting shared_buffers; you don't want to be hitting to the disk.

Make sure you run "analyze" on the table before you execute the test.

> To test the diferent indexes I have executed the following:

Your query plans are roughly identical. The difference in the timings
implies that you only ran the queries once. I suggest you run each
query at least 10 times, and report the individual numbers (the "total
runtime" parts of the output) you get. Arithmetic means are not that
interesting.

Alexander.

pgsql-performance by date:

Previous
From: cedric
Date:
Subject: Re: Key/Value reference table generation: INSERT/UPDATE performance
Next
From: "Steinar H. Gunderson"
Date:
Subject: Re: Performace comparison of indexes over timestamp fields