From: Jeff Janes
Subject: Re: Alternatives to very large tables with many performance-killing indicies?
Msg-id: CAMkU=1xja7P-TTBvwTBEUFgSBb0XpraCT_oN9Boi8rt-iadcdQ@mail.gmail.com
In response to: Alternatives to very large tables with many performance-killing indicies? (Wells Oliver <wellsoliver@gmail.com>)
Responses: Re: Alternatives to very large tables with many performance-killing indicies? (Martijn van Oosterhout <kleptog@svana.org>)
List: pgsql-general
On Thu, Aug 16, 2012 at 1:54 PM, Wells Oliver <wellsoliver@gmail.com> wrote:
> Hey folks, a question. We have a table that's getting large (6 million rows
> right now, but hey, no end in sight).

Does it grow in chunks, or one row at a time?

> It's wide-ish, too, 98 columns.

How many of the columns are NULL for any given row?  Or perhaps
better, what is the distribution of values for any given column?  For
a given column, is there some magic value (NULL, 0, 1, -1, 9999, '')
which most of the rows have?
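
The reason this matters is that a column dominated by one magic value
is a good candidate for a partial index, which can be a small fraction
of the size of a full index.  A minimal sketch, assuming a hypothetical
table named wide_table whose col17 is NULL in most rows:

  -- index only the rows where col17 carries real data
  CREATE INDEX ON wide_table (col17) WHERE col17 IS NOT NULL;

A query like "WHERE col17 = :1" can still use that index (a non-null
constant implies the predicate), while the index skips the bulk of the
table and stays cheap to maintain.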

> The problem is that each of these columns needs to be searchable quickly at
> an application level, and I'm far too responsible an individual to put 98
> indexes on a table.

That is somewhat melodramatic.  Sure, creating 98 indexes does not
come for free.  And it is great that you are aware of this.  But just
because they are not free does not mean they are not worth their cost.
Look at all the other costs of using an RDBMS.  Each letter of ACID
does not come for free.  But it is often worth the price.

In the generic case, you have a large amount of data to index.
Indexing a lot of data requires a lot of resources.  There is no
magic bullet for this.

> Wondering what you folks have come across in terms of
> creative solutions that might be native to postgres. I can build something
> that indexes the data and caches it and runs separately from PG, but I
> wanted to exhaust all native options first.

If the data is frequently updated/inserted, then how would you
invalidate the cache when needed?  And if the data is not frequently
updated/inserted, then what is the problem with the obvious PG
solution (building 98 indexes)?
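
If you do go that route, you do not have to type 98 CREATE INDEX
statements by hand.  A sketch of generating them in a DO block,
again assuming the hypothetical table name wide_table:

  DO $$
  DECLARE
    col name;
  BEGIN
    -- one plain btree index per user column of wide_table
    FOR col IN
      SELECT attname
      FROM pg_attribute
      WHERE attrelid = 'wide_table'::regclass
        AND attnum > 0            -- skip system columns
        AND NOT attisdropped      -- skip dropped columns
    LOOP
      EXECUTE format('CREATE INDEX ON wide_table (%I)', col);
    END LOOP;
  END $$;

Filter the pg_attribute query if some columns (the primary key, say)
should be skipped.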

If your queries are of the nature of:

where
  col1=:1 or
  col2=:1 or
  col3=:1 or
...
  col98=:1

then a full text index would probably be a better option.
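
A minimal sketch of that, assuming wide_table again, with three text
columns standing in for all 98 (concatenation with coalesce keeps the
expression immutable, which an expression index requires):

  CREATE INDEX wide_table_fts ON wide_table USING gin (
    to_tsvector('simple',
      coalesce(col1, '') || ' ' || coalesce(col2, '') || ' ' ||
      coalesce(col3, ''))
  );

  -- the WHERE expression must match the index expression exactly
  SELECT *
  FROM wide_table
  WHERE to_tsvector('simple',
          coalesce(col1, '') || ' ' || coalesce(col2, '') || ' ' ||
          coalesce(col3, ''))
        @@ plainto_tsquery('simple', :1);

The 'simple' configuration skips stemming and stop words, which is
what you want for matching literal values rather than prose.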

Otherwise, it is hard to say.  You could replace the 98 columns with a
single hstore column holding 98 keys.  But from what I can tell,
maintaining a GIN index on that hstore column would probably be slower
than maintaining 98 individual btree indexes.
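
For concreteness, a sketch of that layout (same hypothetical table
name; the old column names become hstore keys):

  CREATE EXTENSION hstore;

  CREATE TABLE wide_table (
    id    serial PRIMARY KEY,
    attrs hstore    -- the 98 former columns as key/value pairs
  );

  CREATE INDEX ON wide_table USING gin (attrs);

  -- find rows where the col17 key has the value '42'
  SELECT * FROM wide_table WHERE attrs @> hstore('col17', '42');

Keep in mind that hstore values are untyped text, so the GIN index
only helps containment/existence searches, not range queries.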

And if you go with a GiST index on the single hstore column, the cost
of maintenance is greatly reduced relative to GIN.  But the index is
basically useless; you might as well drop it and do the full table
scan instead.

Cheers,

Jeff

