On Wed, 2003-07-09 at 14:42, Rod Taylor wrote:
Clustering definitely helped with that case, and appears to have helped
with all of the dates I had been seeing high execution times for...
thanks for the tip.
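
For anyone finding this in the archives, the operation is roughly the
following (the table name "some_table" is my placeholder, guessed from
the index name mentioned below; on the 7.x releases CLUSTER takes the
index name first):

    CLUSTER some_table_ix_0 ON some_table;
    ANALYZE some_table;

Note that CLUSTER takes an exclusive lock and rewrites the table, so
with ~89 million rows it needs a maintenance window and enough free
disk space for a second copy; the ANALYZE afterwards refreshes the
planner statistics for the rewritten table.
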
> > To give you some perspective on the size of the dataset and the
> > performance level we are hitting, here are some "good" results based on
> > some explains:
>
> Before Tom jumps in taking all the fun out of trying to solve it...
>
>
> The estimates in the slow queries seem perfectly reasonable. In fact,
> the cost estimates of the slow and fast queries are the same, which is
> what you would expect if the data were distributed evenly throughout
> the table.
>
> Given that it's a date, I would guess that the data is generally
> inserted into the table in date order, but for some reason the data
> for those 'high' dates is scattered more evenly throughout the table.
> Clustered data requires fewer disk seeks and touches fewer pages in
> general, which makes for a much faster query. Scattered data has to
> pull significantly more information off the disk, throwing most of it
> away.
>
> I would guess that sometime on 2002-05-25 someone did a bit of data
> cleaning (deleting records). The next day the free space map had
> entries available in various locations within the table, and new rows
> went into that space rather than being appended to the end. With 89
> million records where the date is significant, I'm guessing there
> aren't very many updates or deletes on it.
>
> So... how to solve the problem? If this is the type of query that
> occurs most often, you do primarily inserts, and the inserts generally
> arrive in date order, then cluster the table on the index
> "some_table_ix_0". The clustering won't degrade very much, since that
> is the order in which you naturally insert the data.
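
Rod's point about clustered versus scattered data can also be checked
directly: ANALYZE stores a physical-ordering statistic per column, which
is visible in the pg_stats view. A sketch, using placeholder table and
column names:

    SELECT attname, correlation
      FROM pg_stats
     WHERE tablename = 'some_table'
       AND attname = 'some_date_column';

A correlation near 1.0 (or -1.0) means the rows sit on disk in roughly
date order; a value near zero means the rows for a given date are
scattered across the table, which is the expensive case described above.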
--
Chris Bowlby <excalibur@hub.org>
Hub.Org Networking Services