Re: [HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] path toward faster partition pruning |
Date | |
Msg-id | CA+TgmoaVLDLc8=YESRwD32gPhodU_ELmXyKs77gveiYp+JE4vQ@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] path toward faster partition pruning (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] path toward faster partition pruning
|
List | pgsql-hackers |
+/* + * PartitionBoundCmpArg - Caller-defined argument to be passed to + * partition_bound_cmp() + * + * The first (fixed) argument involved in a comparison is the partition bound + * found in the catalog, while an instance of the following struct describes + * either a new partition bound being compared against existing bounds + * (caller should set is_bound to true and set bound), or a new tuple's + * partition key specified in datums (caller should set ndatums to the number + * of valid datums that are passed in the array). + */ +typedef struct PartitionBoundCmpArg +{ + bool is_bound; + union + { + PartitionListValue *lbound; + PartitionRangeBound *rbound; + PartitionHashBound *hbound; + } bound; + + Datum *datums; + int ndatums; +} PartitionBoundCmpArg; This is a pretty strange definition. datums/ndatums are never valid at the same time as any of lbound/rbound/hbound, but are not included in the union. Also, is_bound doesn't tell you which of rbound/lbound/hbound is actually valid. Granted, the current calling convention looks like a mess, too. Apparently, the argument to partition_bound_cmp is a PartitionBoundSpec if using hash partitioning, a Datum if list partitioning, and either a PartitionRangeBound or a Datum * if range partitioning depending on the value of probe_is_bound, and I use the word "apparently" because there are zero words of comments explaining what the argument to partition_bound_cmp of type "void *" is supposed to mean. I really should have noticed that and insisted that it be fixed before partitioning got committed. Looking a bit further, there are exactly two calls to partition_bound_cmp(). One is in partition_bound_bsearch() and the other is in check_new_partition_bound(). Now, looking at this, both the call to partition_bound_cmp() and every single call to partition_bound_bsearch() are inside a switch branch where we've dispatched on the partitioning type, which means that from code that is already specialized by partitioning type we are calling generic code which then turns back around and calls code that is specialized by partitioning type. Now, that could make sense if the generic code is pretty complex, but here's it's basically just the logic to do a bsearch. It seems to me that a cleaner solution here would be to duplicate that logic. Then we could have... static int partition_list_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, Datum value, bool *is_equal); static int partition_range_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, PartitionRangeBound *probe); static int partition_range_datum_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, int nvalues, Datum *values); static int partition_hash_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, int modulus, int remainder, bool *is_equal); ...which would involve fewer branches at runtime and more type-safety at compile time. partition_hash_bsearch could directly call partition_hbound_cmp, partition_list_bsearch could directly invoke FunctionCall2Coll, partition_range_bsearch could directly call partition_rbound_cmp, and partition_range_datum_bsearch could directly call partition_rbound_datum_cmp. All-in-all that seems a lot nicer to me than what we have here now. IIUC, the purpose of this patch is to let you search on a prefix of the partition keys, but I think that's really only possible for range partitioning, and it seems like the proposed nvalues argument to partition_range_datum_bsearch would give you what you need. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: