Thread: Re: Question about clustering multiple columns

Re: Question about clustering multiple columns

From
Bruno Wolff III
Date:
On Tue, Jun 13, 2006 at 09:04:15 -0700,
  Benjamin Arai <benjamin@araisoft.com> wrote:
> Hi,
>
> I have a database where there are three columns (name,date,data).  The
> queries are almost always something like SELECT date,data FROM table WHERE
> name=blah AND date > 1/1/2005 AND date < 1/1/2006;.  I currently have three
> B-tree indexes, one for each of the columns.  Is clustering on date index
> going to be what I want, or do I need a index that contains both name and
> date?

I would expect that clustering on the name would be better for the above
query.
You probably want an index on name and date combined.

Re: Question about clustering multiple columns

From
Bruno Wolff III
Date:
On Fri, Jun 16, 2006 at 11:11:59 -0700,
  Benjamin Arai <benjamin@araisoft.com> wrote:
> Hi,
>
> Thanks for the reply. I have one more question.  Does it matter in which
> order that I make the index?

Please keep replies copied to the lists so that other people can learn from
and crontibute to the discussion.

In this case I am just going to copy back to the performance list, since it
is generally better for perfomance questions than the general list.

> For example, should I create an index cusip,date or date,cusip, does it
> matter which order.  My goal is to cluster the entries by cusip, then for
> each cusip order the data by date (maybe the order by data occurs
> automatically).  Hm, in that case maybe I only need to cluster by cusip, but
> then how do I ensure that each cusip had its data ordered by date?

I think that you want to order by cusip (assuming that corresponds to "name"
in you sample query below) first. You won't end up having to go through values
in the index that will be filtered out if you do it that way.

The documentation for the cluster command says that it clusters on indexes,
not columns. So if the index is on (cusip, date), then the records will be
ordered by cusip, date immediately after the cluster. (New records added
after the cluster are not guarenteed to be ordered by the index.)

>
> Benjamin
>
> -----Original Message-----
> From: Bruno Wolff III [mailto:bruno@wolff.to]
> Sent: Friday, June 16, 2006 8:32 AM
> To: Benjamin Arai
> Cc: pgsql-general@postgresql.org; pgsql-performance@postgresql.org
> Subject: Re: Question about clustering multiple columns
>
> On Tue, Jun 13, 2006 at 09:04:15 -0700,
>   Benjamin Arai <benjamin@araisoft.com> wrote:
> > Hi,
> >
> > I have a database where there are three columns (name,date,data).  The
> > queries are almost always something like SELECT date,data FROM table
> > WHERE name=blah AND date > 1/1/2005 AND date < 1/1/2006;.  I currently
> > have three B-tree indexes, one for each of the columns.  Is clustering
> > on date index going to be what I want, or do I need a index that
> > contains both name and date?
>
> I would expect that clustering on the name would be better for the above
> query.
> You probably want an index on name and date combined.
>
> !DSPAM:4492ce0d180368658827628!
>

Re: Question about clustering multiple columns

From
"Benjamin Arai"
Date:
Thanks! This exactly what I was looking for.

Benjamin Arai
Benjamin@araisoft.com
http://www.benjaminarai.com

-----Original Message-----
From: Bruno Wolff III [mailto:bruno@wolff.to]
Sent: Friday, June 16, 2006 11:56 AM
To: Benjamin Arai
Cc: pgsql-performance@postgresql.org
Subject: Re: Question about clustering multiple columns

On Fri, Jun 16, 2006 at 11:11:59 -0700,
  Benjamin Arai <benjamin@araisoft.com> wrote:
> Hi,
>
> Thanks for the reply. I have one more question.  Does it matter in
> which order that I make the index?

Please keep replies copied to the lists so that other people can learn from
and crontibute to the discussion.

In this case I am just going to copy back to the performance list, since it
is generally better for perfomance questions than the general list.

> For example, should I create an index cusip,date or date,cusip, does
> it matter which order.  My goal is to cluster the entries by cusip,
> then for each cusip order the data by date (maybe the order by data
> occurs automatically).  Hm, in that case maybe I only need to cluster
> by cusip, but then how do I ensure that each cusip had its data ordered by
date?

I think that you want to order by cusip (assuming that corresponds to "name"
in you sample query below) first. You won't end up having to go through
values in the index that will be filtered out if you do it that way.

The documentation for the cluster command says that it clusters on indexes,
not columns. So if the index is on (cusip, date), then the records will be
ordered by cusip, date immediately after the cluster. (New records added
after the cluster are not guarenteed to be ordered by the index.)

>
> Benjamin
>
> -----Original Message-----
> From: Bruno Wolff III [mailto:bruno@wolff.to]
> Sent: Friday, June 16, 2006 8:32 AM
> To: Benjamin Arai
> Cc: pgsql-general@postgresql.org; pgsql-performance@postgresql.org
> Subject: Re: Question about clustering multiple columns
>
> On Tue, Jun 13, 2006 at 09:04:15 -0700,
>   Benjamin Arai <benjamin@araisoft.com> wrote:
> > Hi,
> >
> > I have a database where there are three columns (name,date,data).
> > The queries are almost always something like SELECT date,data FROM
> > table WHERE name=blah AND date > 1/1/2005 AND date < 1/1/2006;.  I
> > currently have three B-tree indexes, one for each of the columns.
> > Is clustering on date index going to be what I want, or do I need a
> > index that contains both name and date?
>
> I would expect that clustering on the name would be better for the
> above query.
> You probably want an index on name and date combined.
>
>
>

!DSPAM:4492fdfd193631139819016!