Re: Large table search question - Mailing list pgsql-general

From Greg Stark
Subject Re: Large table search question
Date
Msg-id 874qpv8czt.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: Large table search question  (Richard Huxton <dev@archonet.com>)
Responses Re: Large table search question  ("John Wells" <jb@sourceillustrated.com>)
List pgsql-general
Richard Huxton <dev@archonet.com> writes:

> If you execute a lot of queries for last_name="Smith" AND first_name="John"
> then it might well help, there are a lot of "Smith"s to choose from.

I think this is the crux of the argument. Even for a common last name like
Smith, you're going to be talking about what, a few thousand records? Probably
selective enough that the extra column isn't really necessary and you pay the
cost for that extra column on every single update and even on other lookups.

On the other hand this logic applies best to DSS style databases where you're
looking for the fastest average throughput. For OLTP databases it may not
hold: if 'Smith' breaks your performance guarantee then the application could
break. For systems like that it may be worth paying a 1% penalty everywhere
that's within your time budget to avoid paying a 100% penalty on the rare
query that would cause failures, even if on average that means a performance
loss.

In practice I find two column indexes are not uncommon, especially on
many-to-many relationship tables. Three column indexes are rare but they do
happen. Offhand I don't ever recall defining any indexes with four or more
columns.

> There's really no alternative to testing. The statistics tables are very useful
> here. Unless you have good reason not to, always turn the statistics gathering
> on, and take snapshot regularly to keep an eye on where PG is exerting the most
> effort.

IMHO many people rely too heavily on empirical data rather than having a good
sense of what they want to be happening. Statistics can obscure situations
like what I described above.

I do have statistics on though, and have the same thinking about always
leaving it on, but I'm unclear how to make use of these data. What tools
should I be looking at to use them?

--
greg

pgsql-general by date:

Previous
From: zhicheng wang
Date:
Subject: Re: after using pg_resetxlog, db lost
Next
From: Greg Stark
Date:
Subject: Problems with pgsql mail servers?