Re: [HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers

From David Rowley
Subject Re: [HACKERS] path toward faster partition pruning
Date
Msg-id CAKJS1f_UBRUtmX29r-3ayysGAT3PGcs-z9DQDqGhuat87dbFrg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] path toward faster partition pruning  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: [HACKERS] path toward faster partition pruning
List pgsql-hackers
On 9 January 2018 at 21:40, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Sorry about the absence in the last few days.  I will post a new version
> addressing various review comments by the end of this week.

I'm sorry for the flood of emails today.

Beena's tests on the run-time partition pruning patch also indirectly
exposed a problem with this patch.

Basically, the changes to add_paths_to_append_rel() are causing
duplication in partition_rels.

A test case is:

create table part (a int, b int) partition by list(a);
create table part1 partition of part for values in(1) partition by list (b);
create table part2 partition of part1 for values in(1);

select * from part;

partition_rels ends up with 3 items in the list, but there's only 2
partitions here. The reason for this is that, since planning here is
recursively calling add_paths_to_append_rel, the list for part ends up
with itself and part1 in it, then since part1's list already contains
itself, per set_append_rel_size's "rel->live_partitioned_rels =
list_make1_int(rti);", then part1 ends up in the list twice.

It would be nicer if you could use a RelIds for this, but you'd also
need some way to store the target partition relation since
nodeModifyTable.c does:

/* The root table RT index is at the head of the partitioned_rels list */
if (node->partitioned_rels)
{
    Index root_rti;
    Oid root_oid;

    root_rti = linitial_int(node->partitioned_rels);
    root_oid = getrelid(root_rti, estate->es_range_table);
    rel = heap_open(root_oid, NoLock); /* locked by InitPlan */
}

You could also fix it by instead of doing:

/*
* Accumulate the live partitioned children of this child, if it's
* itself partitioned rel.
*/
if (childrel->part_scheme)
    partitioned_rels = list_concat(partitioned_rels,
       childrel->live_partitioned_rels);

do something along the lines of:

if (childrel->part_scheme)
{
    ListCell *lc;
    ListCell *start = lnext(list_head(childrel->live_partitioned_rels));

    for_each_cell(lc, start)
       partitioned_rels = lappend_int(partitioned_rels,
       lfirst_int(lc));
}

Although it seems pretty fragile. It would probably be better to find
a nicer way of handling all this.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] SQL/JSON in PostgreSQL
Next
From: Jeff Davis
Date:
Subject: Re: Rangejoin rebased