Thread: performance of partitioning?

performance of partitioning?

From
George Nychis
Date:
Hey all,

So I have a master table called "flows" and 400 partitions in the format
"flow_*" where * is equal to some epoch.

Each partition contains ~700,000 rows and has a check such that 1 field is equal
to a value:
    "flows_1107246900_interval_check" CHECK ("interval" = '2005-02-01
03:35:00'::timestamp without time zone)

Each partition has a different and unique non-overlapping check.

This query takes about 5 seconds to execute:
dp=> select count(*) from flows_1107246900;
  count
--------
  696836
(1 row)

This query has been running for 10 minutes now and hasn't stopped:
dp=> select count(*) from flows where interval='2005-02-01 03:35:00';

Isn't partitioning supposed to make the second query almost as fast?  My WHERE
is exactly the partitioning constraint, therefore it only needs to go to 1
partition and execute the query.

Why would it take magnitudes longer to run?  Am i misunderstanding something?

Thanks!
George

Re: performance of partitioning?

From
Jorge Godoy
Date:
George Nychis <gnychis@cmu.edu> writes:

> Hey all,
>
> So I have a master table called "flows" and 400 partitions in the format
> "flow_*" where * is equal to some epoch.
>
> Each partition contains ~700,000 rows and has a check such that 1 field is
> equal to a value:
>    "flows_1107246900_interval_check" CHECK ("interval" = '2005-02-01
> 03:35:00'::timestamp without time zone)
>
> Each partition has a different and unique non-overlapping check.
>
> This query takes about 5 seconds to execute:
> dp=> select count(*) from flows_1107246900;
>  count
> --------
>  696836
> (1 row)
>
> This query has been running for 10 minutes now and hasn't stopped:
> dp=> select count(*) from flows where interval='2005-02-01 03:35:00';
>
> Isn't partitioning supposed to make the second query almost as fast?  My WHERE
> is exactly the partitioning constraint, therefore it only needs to go to 1
> partition and execute the query.
>
> Why would it take magnitudes longer to run?  Am i misunderstanding something?

When checking from the partition it only contains the records from that
specific partition.  When checking from the parent table it contains records
for all partitions.

Also note that an index on interval wouldn't be helpful here, I believe, due
to the fact that data is in a different table and not on the parent one.


--
Jorge Godoy      <jgodoy@gmail.com>

Re: performance of partitioning?

From
cedric
Date:
Le mardi 27 février 2007 15:00, George Nychis a écrit :
> Hey all,
>
> So I have a master table called "flows" and 400 partitions in the format
> "flow_*" where * is equal to some epoch.
>
> Each partition contains ~700,000 rows and has a check such that 1 field is
> equal to a value:
>     "flows_1107246900_interval_check" CHECK ("interval" = '2005-02-01
> 03:35:00'::timestamp without time zone)
>
> Each partition has a different and unique non-overlapping check.
>
> This query takes about 5 seconds to execute:
> dp=> select count(*) from flows_1107246900;
>   count
> --------
>   696836
> (1 row)
>
> This query has been running for 10 minutes now and hasn't stopped:
> dp=> select count(*) from flows where interval='2005-02-01 03:35:00';
>
> Isn't partitioning supposed to make the second query almost as fast?  My
> WHERE is exactly the partitioning constraint, therefore it only needs to go
> to 1 partition and execute the query.
>
> Why would it take magnitudes longer to run?  Am i misunderstanding
> something?
perhaps you should consider constraint_exclusion
http://www.postgresql.org/docs/current/static/ddl-partitioning.html
http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-CONSTRAINT-EXCLUSION
>
> Thanks!
> George
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org/

Re: performance of partitioning?

From
George Nychis
Date:

cedric wrote:
> Le mardi 27 février 2007 15:00, George Nychis a écrit :
>> Hey all,
>>
>> So I have a master table called "flows" and 400 partitions in the format
>> "flow_*" where * is equal to some epoch.
>>
>> Each partition contains ~700,000 rows and has a check such that 1 field is
>> equal to a value:
>>     "flows_1107246900_interval_check" CHECK ("interval" = '2005-02-01
>> 03:35:00'::timestamp without time zone)
>>
>> Each partition has a different and unique non-overlapping check.
>>
>> This query takes about 5 seconds to execute:
>> dp=> select count(*) from flows_1107246900;
>>   count
>> --------
>>   696836
>> (1 row)
>>
>> This query has been running for 10 minutes now and hasn't stopped:
>> dp=> select count(*) from flows where interval='2005-02-01 03:35:00';
>>
>> Isn't partitioning supposed to make the second query almost as fast?  My
>> WHERE is exactly the partitioning constraint, therefore it only needs to go
>> to 1 partition and execute the query.
>>
>> Why would it take magnitudes longer to run?  Am i misunderstanding
>> something?
> perhaps you should consider constraint_exclusion
> http://www.postgresql.org/docs/current/static/ddl-partitioning.html
> http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-CONSTRAINT-EXCLUSION
>> Thanks!
>> George
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 4: Have you searched our list archives?
>>
>>                http://archives.postgresql.org/
>

That sounds like what i'm looking for, thanks.  I'll give it a try and report back.

- George

Re: performance of partitioning?

From
George Nychis
Date:

George Nychis wrote:
>
>
> cedric wrote:
>> Le mardi 27 février 2007 15:00, George Nychis a écrit :
>>> Hey all,
>>>
>>> So I have a master table called "flows" and 400 partitions in the format
>>> "flow_*" where * is equal to some epoch.
>>>
>>> Each partition contains ~700,000 rows and has a check such that 1
>>> field is
>>> equal to a value:
>>>     "flows_1107246900_interval_check" CHECK ("interval" = '2005-02-01
>>> 03:35:00'::timestamp without time zone)
>>>
>>> Each partition has a different and unique non-overlapping check.
>>>
>>> This query takes about 5 seconds to execute:
>>> dp=> select count(*) from flows_1107246900;
>>>   count
>>> --------
>>>   696836
>>> (1 row)
>>>
>>> This query has been running for 10 minutes now and hasn't stopped:
>>> dp=> select count(*) from flows where interval='2005-02-01 03:35:00';
>>>
>>> Isn't partitioning supposed to make the second query almost as fast?  My
>>> WHERE is exactly the partitioning constraint, therefore it only needs
>>> to go
>>> to 1 partition and execute the query.
>>>
>>> Why would it take magnitudes longer to run?  Am i misunderstanding
>>> something?
>> perhaps you should consider constraint_exclusion
>> http://www.postgresql.org/docs/current/static/ddl-partitioning.html
>> http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-CONSTRAINT-EXCLUSION
>>
>>> Thanks!
>>> George
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 4: Have you searched our list archives?
>>>
>>>                http://archives.postgresql.org/
>>
>
> That sounds like what i'm looking for, thanks.  I'll give it a try and
> report back.
>
> - George
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
>

Worked perfectly, the two commands have near exact execution time now.  Thank you!

- George

Re: performance of partitioning?

From
Brent Wood
Date:
George Nychis wrote:
> Hey all,
>
> So I have a master table called "flows" and 400 partitions in the
> format "flow_*" where * is equal to some epoch.
>
> Each partition contains ~700,000 rows and has a check such that 1
> field is equal to a value:
>    "flows_1107246900_interval_check" CHECK ("interval" = '2005-02-01
> 03:35:00'::timestamp without time zone)
>
> Each partition has a different and unique non-overlapping check.
>
> This query takes about 5 seconds to execute:
> dp=> select count(*) from flows_1107246900;
>  count
> --------
>  696836
> (1 row)
>
> This query has been running for 10 minutes now and hasn't stopped:
> dp=> select count(*) from flows where interval='2005-02-01 03:35:00';
>
> Isn't partitioning supposed to make the second query almost as fast?
> My WHERE is exactly the partitioning constraint, therefore it only
> needs to go to 1 partition and execute the query.
>
> Why would it take magnitudes longer to run?  Am i misunderstanding
> something?

We have a db with only 200,000,000 records, partitioned by year with
about 15 partitions. There is a clustered index on the timestamp field
and queries like a 25 wide self join for 3 months data are around 20
seconds. On a desktop box with a single SATA drive.

If you are querying by timestamp, I suggest a clustered index will help.

Brent Wood