Re: Setting Statistics on Functional Indexes - Mailing list pgsql-performance

From Shaun Thomas
Subject Re: Setting Statistics on Functional Indexes
Date
Msg-id 508EA041.6090101@optionshouse.com
Whole thread Raw
In response to Re: Setting Statistics on Functional Indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 10/26/2012 04:08 PM, Tom Lane wrote:

> So the bottom line is that this is a case where you need a lot of
> resolution in the histogram.  I'm not sure there's anything good
> we can do to avoid that.

I kinda hoped it wouldn't be something like that. For the particularly
painful instance, it was easy to replace the index with a better (if
larger) dual index and drop the bad old one. But in some cases, I'm
having to maintain two indexes that make me sad:

CREATE TABLE activity (
   activity_id  SERIAL NOT NULL PRIMARY KEY,
   account_id   BIGINT NOT NULL,
   action_date  TIMESTAMP WITHOUT TIME ZONE
);

CREATE INDEX idx_activity_action_date_account_id
     ON activity (action_date, activity_id);

CREATE INDEX idx_activity_account_id_action_date
     ON activity (activity_id, action_date);

Because in the first case, we needed the action_date to be first for
analytics that *don't* supply account_id. But in the second case, we
need the account_id first, so we can get the most recent action(s) for
that account without a very expensive backwards index scan on the first
index.

I know that current_date seems like an edge case, but I can't see how
getting the most recent activity for something is an uncommon activity.
Tip tracking is actually the most frequent pattern in the systems I've
seen. Admittedly, those are almost always high TPS trading systems.

At this point, I'm almost willing to start putting in optimization
fences to force it along the right path. Which is gross, because that's
effectively no better than Oracle hints. But I also don't like setting
my statistics to 5000+ on problematic column/index combos to get the
right heuristics, or having semi-duplicate multi-column indexes to
exploit sorting performance.

I mean, I get it. I just wonder if this particular tweak isn't more of a
regression than initially thought.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


pgsql-performance by date:

Previous
From: "Albe Laurenz"
Date:
Subject: Re: Replaying 48 WAL files takes 80 minutes
Next
From: Shaun Thomas
Date:
Subject: Re: Tons of free RAM. Can't make it go away.