Re: Expected performance of querying 5k records from 4 million records? - Mailing list pgsql-performance

From Samuel Gendler
Subject Re: Expected performance of querying 5k records from 4 million records?
Date
Msg-id CAEV0TzBC_MVO_D65dWMoRXFSMCnFfFa7ZDikcom9aEKFXsjL0A@mail.gmail.com
In response to Re: Expected performance of querying 5k records from 4 million records?  (Anish Kejariwal <anishkej@gmail.com>)
List pgsql-performance
On Mon, Jun 18, 2012 at 9:39 AM, Anish Kejariwal <anishkej@gmail.com> wrote:

So, it looks like clustering the index appropriately fixes things!  Also, I'll recreate the index switching the order to (dataset_id, stat_id, data_id)

Just keep in mind that clustering is a one-time operation.  Inserts and updates will change the physical order of records in the table, so if there are a lot of inserts and updates you'll need to re-cluster periodically to keep performance high.  I didn't re-read the thread, but I seem to recall a partitioned table; assuming you are partitioning in a manner that keeps the number of partitions actively receiving inserts and updates to a minimum, you only need to cluster the active partitions, which isn't usually terribly painful.  Also, if you are bulk loading data (and not creating gaps in the table by deleting and updating), you can potentially order the data on the way into the table and avoid the need to re-cluster at all.
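For concreteness, here is a minimal sketch of what that looks like.  The partition, index, and staging table names below are placeholders, not the actual ones from this thread:

  -- One-time physical reorder of a partition along its index
  -- (partition and index names are hypothetical).
  CLUSTER data_2012_q2 USING data_2012_q2_dataset_stat_idx;

  -- Refresh planner statistics after the reorder.
  ANALYZE data_2012_q2;

  -- New inserts/updates are not kept in clustered order, so repeat the
  -- CLUSTER on active partitions periodically.

  -- Alternatively, when bulk loading, order the rows on the way in so the
  -- partition is laid out in index order from the start:
  INSERT INTO data_2012_q2
  SELECT * FROM staging_load
  ORDER BY dataset_id, stat_id, data_id;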

--sam
