Thread: RE: pgsql-general-digest V1 #365

RE: pgsql-general-digest V1 #365

From
"Ansley, Michael"
Date:
On Fri, 25 Jun 1999, Jim Jennis wrote:
>>>  Not only legacy apps, but data warehousing. Frequently in a production
>>>  environment you use two sets of tables -- production and data
>>>  warehousing...One (production) with "bare bones" indicies to maximize
>>>  transaction performance, and one (a replicate in the data warehouse)
that
>>>  you "index the living daylights out of" so that the non db saavy
managers
>>>  who want to do ungodly joints and sorts on tables for organizational
>>>  reporting get decent performance.
I'm so pleased to find out that somebody else has picked this up.  In fact,
the summary tables that I'm working on are a kind of mini-warehouse, the
just happen to reside in the same tablespace as the transactional tables.
However, data warehousing is quite an important issue.  I know that most of
the people who use PG work on transactional systems, but if anyone tries to
run even a small warehouse on PG it's going to get complicated very quickly.

On Sat, 26 Jun 1999, Dustin Sallings wrote:
>>>      Creating lots of indices is far different from creating a single
>>>  index on a lot of fields.  Data warehousing is the former.  The problem
is
>>>  that you can't create a single index with a large number of fields.
Not entirely true.  Sometimes the level of summary can require more than
seven fields in an index, normally the primary index.

MikeA...

Re: Data warehousing

From
Herouth Maoz
Date:
At 10:29 +0300 on 28/06/1999, Ansley, Michael wrote:


> >>>  you "index the living daylights out of" so that the non db saavy
> managers
> >>>  who want to do ungodly joints and sorts on tables for organizational
> >>>  reporting get decent performance.
> I'm so pleased to find out that somebody else has picked this up.  In fact,
> the summary tables that I'm working on are a kind of mini-warehouse, the
> just happen to reside in the same tablespace as the transactional tables.
> However, data warehousing is quite an important issue.  I know that most of
> the people who use PG work on transactional systems, but if anyone tries to
> run even a small warehouse on PG it's going to get complicated very quickly.

Maybe I'm missing the point here, but it seems to me that if you simply use
indices not as key definitions but as query accelerators (as in "index the
living daylights out of"), then you may as well define a separate index on
each and every field. Why do multiple-field indices in such a case?

The case for defining uniqueness is a good case.

Other than that, we should remember that indices are merely a method of
acceleration of queries. There will not me much of a difference in
efficiency of queries if the index is split into the individual fields.
When postgres finally has built-in referential integrity, the question may
rise again.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma



Re: [GENERAL] Re: Data warehousing

From
Chris Bitmead
Date:
Herouth Maoz wrote:

> Maybe I'm missing the point here, but it seems to me that if
> you simply use indices not as key definitions but as query
> accelerators (as in "index the living daylights out of"), then
> you may as well define a separate index on each and every
> field. Why do multiple-field indices in such a case?

Why do multi-field indexes? Umm. How about to avoid doing a sequential
scan? If your query depends on multiple fields then once you've found
the set based on the first index you would have to sequential scan that
subset based on the second condition, unless you have a multi-field
index.

Re: [GENERAL] Re: Data warehousing

From
Stephen Davies
Date:
Chris Bitmead <chris.bitmead@bigfoot.com>  wrote:
> Herouth Maoz wrote:
>
> > Maybe I'm missing the point here, but it seems to me that if
> > you simply use indices not as key definitions but as query
> > accelerators (as in "index the living daylights out of"), then
> > you may as well define a separate index on each and every
> > field. Why do multiple-field indices in such a case?
>
> Why do multi-field indexes? Umm. How about to avoid doing a sequential
> scan? If your query depends on multiple fields then once you've found
> the set based on the first index you would have to sequential scan that
> subset based on the second condition, unless you have a multi-field
> index.

Is this correct?

With most other relational databases databases that I have used, relational algebra
is used to intersect sets from each index to give the final set rather than just
stopping after the first index. Surely PostgreSQL could do the same.

The usual argument for multi-field indexes is that they are significantly
smaller than the equivalent multiple single indexes and also faster: iff the
fields you searching on are leading parts of the index.

Cheers,
Stephen Davies



========================================================================
Stephen Davies Consulting                                         scldad@sdc.com.au
Adelaide, South Australia.                                          Voice: 041-980 1417
Computing & Network solutions.                                   Fax: 08-82728863



Re: [GENERAL] Re: Data warehousing

From
Herouth Maoz
Date:
At 16:00 +0300 on 28/06/1999, Chris Bitmead wrote:


> Why do multi-field indexes? Umm. How about to avoid doing a sequential
> scan? If your query depends on multiple fields then once you've found
> the set based on the first index you would have to sequential scan that
> subset based on the second condition, unless you have a multi-field
> index.

Well, I may be wrong, but I seem to recall reading in one of these mailing
lists once that there is no real difference performance-wise between
multiple-field indices and single-field indices on same columns. There was
a bit of overhead. I think Vadim should know the answer to this question.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma



Re: [GENERAL] Re: Data warehousing

From
Chris Bitmead
Date:
Stephen Davies wrote:

> > Why do multi-field indexes? Umm. How about to avoid doing a sequential
> > scan? If your query depends on multiple fields then once you've found
> > the set based on the first index you would have to sequential scan that
> > subset based on the second condition, unless you have a multi-field
> > index.
>
> Is this correct?
>
> With most other relational databases databases that I have
> used, relational algebra is used to intersect sets from
> each index to give the final set rather than just
> stopping after the first index. Surely PostgreSQL could do
> the same.

How do you think a database would find an intersection of two sets using
indexes? Say there were 3 clauses in the WHERE condition. The first
index lookup you do gets you all the records for that condition. Then
you would sequentially, for each result do a lookup on the second index.
For each of those results you would do a 3rd lookup.

So to get the first clause taken care of is 1 lookup. Lets say you get
50,000 results.
To take care of the 2nd lookup you do 50,000 index lookups. Lets say
there are 20,000 results.
To take care of the 3rd clause you do 20,000 lookups for a total of
70,001 index lookups. On the other hand if you had a single index on the
three attributes you would have 1 single index lookup.

I'm sure someone will correct me if I'm wrong, but that's my
understanding. BTW I have no idea if Postgres does or doesn't use
indexes for the secondary lookups, but I can't see that it would be a
huge difference anyway.

> The usual argument for multi-field indexes is that they
> are significantly smaller than the equivalent multiple
> single indexes and also faster: iff the
> fields you searching on are leading parts of the index.

Of course it will work without any indexes, but making it fast is what
indexing is all about.