Home > mailing lists

Thread: Question on partitioning

Question on partitioning

From

"Oliveiros Cristina"

Date:

21 August 2008, 11:25:40

Hello , All.

I am not sure if this is the right mailing list to place this question.

If it doesn't, please kindly redirect me to the right list.

I have a giant table with about 10,6 million records.

Queries on it are usually slow, and if I try to do something more elaborate like an INNER JOIN with itself it becomes unnacceptably slow.

I am looking for a way to improve performance.

One of the columns is of type date. Each "day" includes about a few tens of thousands records

And the older a date is the less likely I am to do queries on it.

The objective of the "self join" is to compare data from two different days, looking for diferences.

Ive read that one of the benefits of partitioning is to speed up queries by separating less used records.

My question is if partitioning can be a good way to make the queries faster (specially the self joins) or if it isn't worth trying because it doesn't help on my particular situation.

Please kindly advice me on this

Many thanks in advance for your kind help

Best,

Oliveiros

Re: Question on partitioning

From

"Scott Marlowe"

Date:

21 August 2008, 13:50:36

On Thu, Aug 21, 2008 at 8:25 AM, Oliveiros Cristina
<oliveiros.cristina@marktest.pt> wrote:
> Hello , All.
>
> I am not sure if this is the right mailing list to place this question.
> If it doesn't, please kindly redirect me to the right list.
>
> I have a giant table with about 10,6 million records.

10.6 million rows isn't really gigantic.  It's a good size, but unless
you're hitting a lot of the records at once, you should be able to get
pretty good performance if it's indexed properly.

> Queries on it are usually slow, and if I try to do something more elaborate
> like an INNER JOIN with itself it becomes unnacceptably slow.
>
> I am looking for a way to improve performance.
> One of the columns is of type date. Each "day" includes about a few tens of
> thousands records
> And the older  a date is the less likely I am to do queries on it.

Well, first let's look at your indexes and if they're being used.
Got an explain analyze output of your slow queries for us to look at?

Re: Question on partitioning

From

Mark Roberts

Date:

21 August 2008, 15:00:56

On Thu, 2008-08-21 at 15:25 +0100, Oliveiros Cristina wrote:
> Hello , All.
>  
> I am not sure if this is the right mailing list to place this
> question.
> If it doesn't, please kindly redirect me to the right list.
>  
> I have a giant table with about 10,6 million records.
>  
> Queries on it are usually slow, and if I try to do something more
> elaborate like an INNER JOIN with itself it becomes unnacceptably
> slow.
>  
> I am looking for a way to improve performance.
> One of the columns is of type date. Each "day" includes about a few
> tens of thousands records
> And the older  a date is the less likely I am to do queries on it.
>  
> The objective of the "self join" is to compare data from two different
> days, looking for diferences.
>  
> Ive read that one of the benefits of partitioning is to speed up
> queries by separating less used records.
>  
> My question is if partitioning can be a good way to make the queries
> faster (specially the self joins) or if it isn't worth trying because
> it doesn't help on my particular situation.
>  
> Please kindly advice me on this
>  
> Many thanks in advance for your kind help
>  
> Best,
> Oliveiros

I would expect partitioning to work.  I've heard tell that fine grained
partitioning coupled with check constraints can even eliminate the need
for certain indexes.  I do know that check constraints on the date will
help you tremendously if you decide to partition.

-Mark

Re: Question on partitioning

From

s.caillet@free.fr

Date:

22 August 2008, 06:02:11

Hi Christina,

Quoting Oliveiros Cristina <oliveiros.cristina@marktest.pt>:

> Hello , All.
>
> I am not sure if this is the right mailing list to place this question.
> If it doesn't, please kindly redirect me to the right list.
>
> I have a giant table with about 10,6 million records.
>
> Queries on it are usually slow, and if I try to do something more elaborate
> like an INNER JOIN with itself it becomes unnacceptably slow.
>
> I am looking for a way to improve performance.
> One of the columns is of type date. Each "day" includes about a few tens of
> thousands records
> And the older  a date is the less likely I am to do queries on it.
>
> The objective of the "self join" is to compare data from two different days,
> looking for diferences.
>
> Ive read that one of the benefits of partitioning is to speed up queries by
> separating less used records.
>
> My question is if partitioning can be a good way to make the queries faster
> (specially the self joins) or if it isn't worth trying because it doesn't
> help on my particular situation.
>
> Please kindly advice me on this
>
> Many thanks in advance for your kind help
>
> Best,
> Oliveiros
>

I'm sure partitioning is a good choice to improve your performance. But, as
Scott noticed, be sure first to have done all possible improvements on your
table such as indexes, of course (only if they don't damage your insert
performances).

I use partitioning with check constraints on a timestamp field and it works well
without any optimization of the conf files of postgresql. It's ok for massive
inserts (csv records to copy up to 60000 times per second) without indexes (they
cost a lot for insert statements), as to select many datas on the same tables.

Best regards,

Sylvain Caillet
IT Performance Software Project Manager