> What can I do to prevent the index from getting bloated, or in whatever
> state it was in?
>
>
> What else can I do to further improve queries on this table? Someone
> suggested posting details of my conf file. Which settings are most
> likely to
> be useful for this?
If you often do range queries on date, consider partitioning your table
by date (something like 1 partition per month).
Of course, if you also often do range queries on something other than
date, and uncorrelated, forget it.
If you make a lot of big aggregate queries, consider materialized views :
Like "how many games player X won this week", etc
- create "helper" tables which contain the query results
- every night, recompute the results taking into account the most recent
data
- don't recompute results based on old data that never changes
This is only interesting if the aggregation reduces the data volume by
"an appreciable amount". For instance, if you run a supermarket with 1000
distinct products in stock and you sell 100.000 items a day, keeping a
cache of "count of product X sold each day" will reduce your data load by
about 100 on the query "count of product X sold this month".
The two suggestion above are not mutually exclusive.
You could try bizgres also. Or even MySQL !... MySQL's query engine is
slower than pg but the tables take much less space than Postgres, and it
can do index-only queries. So you can fit more in the cache. This is only
valid for MyISAM (InnoDB is a bloated hog). Of course, noone would want to
use MyISAM for the "safe" storage, but it's pretty good as a read-only
storage. You can even use the Archive format for even more compactness and
use of cache. Of course you'd have to devise a way to dump from pg and
load into MySQL but that's not hard. MySQL can be good if you target a
table with lots of small rows with a few ints, all of them in a
multicolumn index, so it doesn't need to hit the table itself.
Note that one in his right mind would never run aggregate queries on a
live R/W MyISAM table since the long queries will block all writes and
blow up the reaction time. But for a read-only cache updated at night, or
replication slave, it's okay.