Thread: Partitioning

Partitioning

From
James Sewell
Date:
Hello,

I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id' column. I have an index on the id column on each sub-table.

Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do index/sequential scans of every sub-table?

I want to select all rows which have an id which is in another query, so something like:

  WITH idlist as (SELECT id from othertable) 
    SELECT id from mastertable WHERE id = idlist.id);

I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my subquery at plan time?

Any tricks I am overlooking?

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
 

Level 2, 50 Queen St, Melbourne VIC 3000

(+61) 3 8370 8000  W www.lisasoft.com  (+61) 3 8370 8099
 


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Partitioning

From
John R Pierce
Date:
On 1/18/2015 5:58 PM, James Sewell wrote:
  WITH idlist as (SELECT id from othertable) 
    SELECT id from mastertable WHERE id = idlist.id);


    select mt.id, ...   from mastertable mt join othertable ot on mt.id=ot.id;

might optimize better.



-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Partitioning

From
James Sewell
Date:
Sadly not ... I still hit all the tables.



Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
 

Level 2, 50 Queen St, Melbourne VIC 3000

(+61) 3 8370 8000  W www.lisasoft.com  (+61) 3 8370 8099
 

On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote:
On 1/18/2015 5:58 PM, James Sewell wrote:
  WITH idlist as (SELECT id from othertable) 
    SELECT id from mastertable WHERE id = idlist.id);


    select mt.id, ...   from mastertable mt join othertable ot on mt.id=ot.id;

might optimize better.



-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast



The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Partitioning

From
Kyotaro HORIGUCHI
Date:
Hello,

an 2015 14:13:37 +1100, James Sewell <james.sewell@lisasoft.com> wrote in
<CANkGpBs8GypQ3TQGKdjTD+n-w1rkq5uO97h3tuhg5eWaKR6RbA@mail.gmail.com>
> Sadly not ... I still hit all the tables.

| 5.9.4. Partitioning and Constraint Exclusion

http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html

Constraint exclusion is a mechanism to omit tables that are known
to have no hit by the query *beforehand* execution. So the
criteria cannot rely on out of the query itself (and CHECK
constraints, of course).

Your query uses the result of the WITH-clause-query in the WHERE
clause which is unknown to the planner so constraint exclusion
does not work. JOINs don't change the situation.


> On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote:
>
> >  On 1/18/2015 5:58 PM, James Sewell wrote:
> >
> >   WITH idlist as (SELECT id from othertable)
> >     SELECT id from mastertable WHERE id = idlist.id);
> >
> >
> >
> >     select mt.id, ...   from mastertable mt join othertable ot on mt.id=
> > ot.id;
> >
> > might optimize better.

As the result, the query inevitably scans all the tables, but not
necessariry in sequqntial scans or simple index scans. The
suggestion above seeems showing the notation which the planner
can find the better plans on that premise.

For example, if you have an index on id of one of the two tables,
(and some other conditions match, of course) index only scan will
be selected for it and the suggested query will give you a
seemingly better plan than your query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



Re: Partitioning

From
James Sewell
Date:
Hello,

Thanks for the reply. I can write queries which get index scans, but they are still slow. 

Each index is about 2.5GB, I suspect I am trying to read a these into memory in entirety.

Perhaps there is no way to tune this?

Cheers, james 



On Monday, 19 January 2015, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello,

an 2015 14:13:37 +1100, James Sewell <james.sewell@lisasoft.com> wrote in <CANkGpBs8GypQ3TQGKdjTD+n-w1rkq5uO97h3tuhg5eWaKR6RbA@mail.gmail.com>
> Sadly not ... I still hit all the tables.

| 5.9.4. Partitioning and Constraint Exclusion

http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html

Constraint exclusion is a mechanism to omit tables that are known
to have no hit by the query *beforehand* execution. So the
criteria cannot rely on out of the query itself (and CHECK
constraints, of course).

Your query uses the result of the WITH-clause-query in the WHERE
clause which is unknown to the planner so constraint exclusion
does not work. JOINs don't change the situation.


> On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote:
>
> >  On 1/18/2015 5:58 PM, James Sewell wrote:
> >
> >   WITH idlist as (SELECT id from othertable)
> >     SELECT id from mastertable WHERE id = idlist.id);
> >
> >
> >
> >     select mt.id, ...   from mastertable mt join othertable ot on mt.id=
> > ot.id;
> >
> > might optimize better.

As the result, the query inevitably scans all the tables, but not
necessariry in sequqntial scans or simple index scans. The
suggestion above seeems showing the notation which the planner
can find the better plans on that premise.

For example, if you have an index on id of one of the two tables,
(and some other conditions match, of course) index only scan will
be selected for it and the suggested query will give you a
seemingly better plan than your query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
 

Level 2, 50 Queen St, Melbourne VIC 3000

(+61) 3 8370 8000  W www.lisasoft.com  (+61) 3 8370 8099
 



The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Partitioning

From
John R Pierce
Date:
On 1/18/2015 11:13 PM, James Sewell wrote:
>
> Each index is about 2.5GB, I suspect I am trying to read a these into
> memory in entirety.

an 11GB table with a (presumably integer) primary key requires an 2.5GB
index ?      100 of these would need 250GB of shared_buffers to stay
resident, not likely.





--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: Partitioning

From
James Sewell
Date:
Yeah definitely not staying resident once read, although the machine does gave 256GB of memory so some will persist in the OS cache.

Actually this brings up another question, if I have an unique integer index of 2.gb what percentage would I expect to read for a value that was higher or lower than all index values?

Cheers,
James





On Monday, 19 January 2015, John R Pierce <pierce@hogranch.com> wrote:
On 1/18/2015 11:13 PM, James Sewell wrote:

Each index is about 2.5GB, I suspect I am trying to read a these into memory in entirety.

an 11GB table with a (presumably integer) primary key requires an 2.5GB index ?      100 of these would need 250GB of shared_buffers to stay resident, not likely.





--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


--

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
 

Level 2, 50 Queen St, Melbourne VIC 3000

(+61) 3 8370 8000  W www.lisasoft.com  (+61) 3 8370 8099
 



The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Partitioning

From
John R Pierce
Date:
On 1/18/2015 11:59 PM, James Sewell wrote:

Actually this brings up another question, if I have an unique integer index of 2.gb what percentage would I expect to read for a value that was higher or lower than all index values?

a couple 8k blocks.  its a b-tree.

-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Partitioning

From
Kyotaro HORIGUCHI
Date:
Hi,

At Mon, 19 Jan 2015 00:14:55 -0800, John R Pierce <pierce@hogranch.com> wrote in <54BCBCFF.5040100@hogranch.com>
> On 1/18/2015 11:59 PM, James Sewell wrote:
> >
> > Actually this brings up another question, if I have an unique integer
> > index of 2.gb <http://2.gb> what percentage would I expect to read for
> > a value that was higher or lower than all index values?
>
> a couple 8k blocks.  its a b-tree.

Yeah, scanning children reading highest/lowest values for each of
them would make it faster than the first single query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



Re: Partitioning

From
Kyotaro HORIGUCHI
Date:

Sorry,

> Yeah, scanning children reading highest/lowest values for each of them would make it faster than the first single query.

Mmm..no. It has nothing different from querieng on the parent table. Please Ignore  the silly thing.

--
Kyotaro Horiguchi

Re: Partitioning

From
François Beausoleil
Date:

Le 2015-01-18 à 20:58, James Sewell <james.sewell@lisasoft.com> a écrit :

Hello,

I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id' column. I have an index on the id column on each sub-table.

Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do index/sequential scans of every sub-table?

I want to select all rows which have an id which is in another query, so something like:

  WITH idlist as (SELECT id from othertable) 
    SELECT id from mastertable WHERE id = idlist.id);

I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my subquery at plan time?

How many rows in idlist? Can you do two queries? Fetch the ids, then call the 2nd query with those values hard-coded in the query.

If necessary, and you know where each partition starts, tou could sort in the app and query the correct ranges, in multiple queries.

Hope that helps!
François Beausoleil

Re: Partitioning

From
Kyotaro HORIGUCHI
Date:
Hi,

19 Jan 2015 06:58:21 -0500, François Beausoleil <francois@teksol.info> wrote in
<CC2FD572-320A-4225-B98C-48F20992527C@teksol.info>
>
> > Le 2015-01-18 à 20:58, James Sewell <james.sewell@lisasoft.com> a écrit :
> >
> > Hello,
> >
> > I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id'
column.I have an index on the id column on each sub-table. 
> >
> > Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do
index/sequentialscans of every sub-table? 
> >
> > I want to select all rows which have an id which is in another query, so something like:
> >
> >   WITH idlist as (SELECT id from othertable)
> >     SELECT id from mastertable WHERE id = idlist.id);
> >
> > I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my
subqueryat plan time? 
> >
> How many rows in idlist? Can you do two queries? Fetch the ids, then call the 2nd query with those values hard-coded
inthe query. 
>
> If necessary, and you know where each partition starts, tou could sort in the app and query the correct ranges, in
multiplequeries. 

The strategy's effectiveness is depends mainly on how many ids
come from the othertable. It wins if fewer than certain number or
converged in a few partitions, however, straight joins will win
elsewise.

The result of EXPLAIN ANALYZE might draw more precise advices.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



Re: Partitioning

From
Kyotaro HORIGUCHI
Date:
Hello,

an 2015 14:13:37 +1100, James Sewell <james.sewell@lisasoft.com> wrote in
<CANkGpBs8GypQ3TQGKdjTD+n-w1rkq5uO97h3tuhg5eWaKR6RbA@mail.gmail.com>
> Sadly not ... I still hit all the tables.

| 5.9.4. Partitioning and Constraint Exclusion

http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html

Constraint exclusion is a mechanism to omit tables that are known
to have no hit by the query *beforehand* execution. So the
criteria cannot rely on out of the query itself (and CHECK
constraints, of course).

Your query uses the result of the WITH-clause-query in the WHERE
clause which is unknown to the planner so constraint exclusion
does not work. JOINs don't change the situation.


> On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote:
>
> >  On 1/18/2015 5:58 PM, James Sewell wrote:
> >
> >   WITH idlist as (SELECT id from othertable)
> >     SELECT id from mastertable WHERE id = idlist.id);
> >
> >
> >
> >     select mt.id, ...   from mastertable mt join othertable ot on mt.id=
> > ot.id;
> >
> > might optimize better.

As the result, the query inevitably scans all the tables, but not
necessariry in sequqntial scans or simple index scans. The
suggestion above seeems showing the notation which the planner
can find the better plans on that premise.

For example, if you have an index on id of one of the two tables,
(and some other conditions match, of course) index only scan will
be selected for it and the suggested query will give you a
seemingly better plan than your query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



Re: Partitioning

From
Kyotaro HORIGUCHI
Date:
Hi,

At Mon, 19 Jan 2015 00:14:55 -0800, John R Pierce <pierce@hogranch.com> wrote in <54BCBCFF.5040100@hogranch.com>
> On 1/18/2015 11:59 PM, James Sewell wrote:
> >
> > Actually this brings up another question, if I have an unique integer
> > index of 2.gb <http://2.gb> what percentage would I expect to read for
> > a value that was higher or lower than all index values?
>
> a couple 8k blocks.  its a b-tree.

Yeah, scanning children reading highest/lowest values for each of
them would make it faster than the first single query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



Re: Partitioning

From
Kyotaro HORIGUCHI
Date:
Hi,

19 Jan 2015 06:58:21 -0500, François Beausoleil <francois@teksol.info> wrote in
<CC2FD572-320A-4225-B98C-48F20992527C@teksol.info>
>
> > Le 2015-01-18 à 20:58, James Sewell <james.sewell@lisasoft.com> a écrit :
> >
> > Hello,
> >
> > I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id'
column.I have an index on the id column on each sub-table. 
> >
> > Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do
index/sequentialscans of every sub-table? 
> >
> > I want to select all rows which have an id which is in another query, so something like:
> >
> >   WITH idlist as (SELECT id from othertable)
> >     SELECT id from mastertable WHERE id = idlist.id);
> >
> > I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my
subqueryat plan time? 
> >
> How many rows in idlist? Can you do two queries? Fetch the ids, then call the 2nd query with those values hard-coded
inthe query. 
>
> If necessary, and you know where each partition starts, tou could sort in the app and query the correct ranges, in
multiplequeries. 

The strategy's effectiveness is depends mainly on how many ids
come from the othertable. It wins if fewer than certain number or
converged in a few partitions, however, straight joins will win
elsewise.

The result of EXPLAIN ANALYZE might draw more precise advices.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general