Thread: RE: pgsql-general-digest V1 #365
On Fri, 25 Jun 1999, Jim Jennis wrote: >>> Not only legacy apps, but data warehousing. Frequently in a production >>> environment you use two sets of tables -- production and data >>> warehousing...One (production) with "bare bones" indicies to maximize >>> transaction performance, and one (a replicate in the data warehouse) that >>> you "index the living daylights out of" so that the non db saavy managers >>> who want to do ungodly joints and sorts on tables for organizational >>> reporting get decent performance. I'm so pleased to find out that somebody else has picked this up. In fact, the summary tables that I'm working on are a kind of mini-warehouse, the just happen to reside in the same tablespace as the transactional tables. However, data warehousing is quite an important issue. I know that most of the people who use PG work on transactional systems, but if anyone tries to run even a small warehouse on PG it's going to get complicated very quickly. On Sat, 26 Jun 1999, Dustin Sallings wrote: >>> Creating lots of indices is far different from creating a single >>> index on a lot of fields. Data warehousing is the former. The problem is >>> that you can't create a single index with a large number of fields. Not entirely true. Sometimes the level of summary can require more than seven fields in an index, normally the primary index. MikeA...
At 10:29 +0300 on 28/06/1999, Ansley, Michael wrote: > >>> you "index the living daylights out of" so that the non db saavy > managers > >>> who want to do ungodly joints and sorts on tables for organizational > >>> reporting get decent performance. > I'm so pleased to find out that somebody else has picked this up. In fact, > the summary tables that I'm working on are a kind of mini-warehouse, the > just happen to reside in the same tablespace as the transactional tables. > However, data warehousing is quite an important issue. I know that most of > the people who use PG work on transactional systems, but if anyone tries to > run even a small warehouse on PG it's going to get complicated very quickly. Maybe I'm missing the point here, but it seems to me that if you simply use indices not as key definitions but as query accelerators (as in "index the living daylights out of"), then you may as well define a separate index on each and every field. Why do multiple-field indices in such a case? The case for defining uniqueness is a good case. Other than that, we should remember that indices are merely a method of acceleration of queries. There will not me much of a difference in efficiency of queries if the index is split into the individual fields. When postgres finally has built-in referential integrity, the question may rise again. Herouth -- Herouth Maoz, Internet developer. Open University of Israel - Telem project http://telem.openu.ac.il/~herutma
Herouth Maoz wrote: > Maybe I'm missing the point here, but it seems to me that if > you simply use indices not as key definitions but as query > accelerators (as in "index the living daylights out of"), then > you may as well define a separate index on each and every > field. Why do multiple-field indices in such a case? Why do multi-field indexes? Umm. How about to avoid doing a sequential scan? If your query depends on multiple fields then once you've found the set based on the first index you would have to sequential scan that subset based on the second condition, unless you have a multi-field index.
Chris Bitmead <chris.bitmead@bigfoot.com> wrote: > Herouth Maoz wrote: > > > Maybe I'm missing the point here, but it seems to me that if > > you simply use indices not as key definitions but as query > > accelerators (as in "index the living daylights out of"), then > > you may as well define a separate index on each and every > > field. Why do multiple-field indices in such a case? > > Why do multi-field indexes? Umm. How about to avoid doing a sequential > scan? If your query depends on multiple fields then once you've found > the set based on the first index you would have to sequential scan that > subset based on the second condition, unless you have a multi-field > index. Is this correct? With most other relational databases databases that I have used, relational algebra is used to intersect sets from each index to give the final set rather than just stopping after the first index. Surely PostgreSQL could do the same. The usual argument for multi-field indexes is that they are significantly smaller than the equivalent multiple single indexes and also faster: iff the fields you searching on are leading parts of the index. Cheers, Stephen Davies ======================================================================== Stephen Davies Consulting scldad@sdc.com.au Adelaide, South Australia. Voice: 041-980 1417 Computing & Network solutions. Fax: 08-82728863
At 16:00 +0300 on 28/06/1999, Chris Bitmead wrote: > Why do multi-field indexes? Umm. How about to avoid doing a sequential > scan? If your query depends on multiple fields then once you've found > the set based on the first index you would have to sequential scan that > subset based on the second condition, unless you have a multi-field > index. Well, I may be wrong, but I seem to recall reading in one of these mailing lists once that there is no real difference performance-wise between multiple-field indices and single-field indices on same columns. There was a bit of overhead. I think Vadim should know the answer to this question. Herouth -- Herouth Maoz, Internet developer. Open University of Israel - Telem project http://telem.openu.ac.il/~herutma
Stephen Davies wrote: > > Why do multi-field indexes? Umm. How about to avoid doing a sequential > > scan? If your query depends on multiple fields then once you've found > > the set based on the first index you would have to sequential scan that > > subset based on the second condition, unless you have a multi-field > > index. > > Is this correct? > > With most other relational databases databases that I have > used, relational algebra is used to intersect sets from > each index to give the final set rather than just > stopping after the first index. Surely PostgreSQL could do > the same. How do you think a database would find an intersection of two sets using indexes? Say there were 3 clauses in the WHERE condition. The first index lookup you do gets you all the records for that condition. Then you would sequentially, for each result do a lookup on the second index. For each of those results you would do a 3rd lookup. So to get the first clause taken care of is 1 lookup. Lets say you get 50,000 results. To take care of the 2nd lookup you do 50,000 index lookups. Lets say there are 20,000 results. To take care of the 3rd clause you do 20,000 lookups for a total of 70,001 index lookups. On the other hand if you had a single index on the three attributes you would have 1 single index lookup. I'm sure someone will correct me if I'm wrong, but that's my understanding. BTW I have no idea if Postgres does or doesn't use indexes for the secondary lookups, but I can't see that it would be a huge difference anyway. > The usual argument for multi-field indexes is that they > are significantly smaller than the equivalent multiple > single indexes and also faster: iff the > fields you searching on are leading parts of the index. Of course it will work without any indexes, but making it fast is what indexing is all about.