Re: [HACKERS] [POC] hash partitioning - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] [POC] hash partitioning |
Date | |
Msg-id | CA+TgmoZ_UGXfq5ygeDDMdUSJ4J_VX7nFnjC6mfY6BgOJ3qZCmw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] [POC] hash partitioning (amul sul <sulamul@gmail.com>) |
Responses |
Re: [HACKERS] [POC] hash partitioning
|
List | pgsql-hackers |
On Tue, Oct 24, 2017 at 1:21 PM, amul sul <sulamul@gmail.com> wrote: > Updated patch attached. This patch needs a rebase. It appears that satisfies_hash_func is declared incorrectly in pg_proc.h. ProcedureCreate seems to think that provariadic should be ANYOID if the type of the last element is ANYOID, ANYELEMENTOID if the type of the last element is ANYARRAYOID, and otherwise the element type corresponding to the array type. But here you have the last element as int4[] but provariadic is any. I wrote the following query to detect problems of this type, and I think we might want to just go ahead and add this to the regression test suite, verifying that it returns no rows: select oid::regprocedure, provariadic::regtype, proargtypes::regtype[] from pg_proc where provariadic != 0 and case proargtypes[array_length(proargtypes, 1)-1] when 2276 then 2276 -- any -> any when 2277 then 2283 -- anyarray-> anyelement else (select t.oid from pg_type t where t.typarray = proargtypes[array_length(proargtypes, 1)-1]) end != provariadic; The simple fix is change provariadic to int4 and call it good. It's tempting to go the other way and actually make it satisfies_hash_partition(int4, int4, variadic "any"), passing the column values directly and letting satisfies_hash_partition doing the hashing itself. Any arguments that had a partition key type different from the column type would have a RelabelType node placed on top of the column, so that get_fn_expr_argtype would return the partition key type. Then, the function could look up the hash function for that type and call it directly on the value. That way, we'd be doing only one function call instead of many, and the partition constraint would look nicer in \d+ output, too. :-) On the other hand, that would also mean that we'd have to look up the extended hash function every time through this function, though maybe that could be prevented by using fn_extra to cache FmgrInfos for all the hash functions on the first time through. I'm not sure how that would compare in terms of speed with what you have now, but maybe it's worth trying. The second paragraph of the CREATE TABLE documentation for PARTITION OF needs to be updated like this: "The form with <literal>IN</literal> is used for list partitioning, the form with <literal>FROM</literal> and <literal>TO</literal> is used for range partitioning, and the form with <literal>WITH</literal> is used for hash partitioning." The CREATE TABLE documentation says "When using range partitioning, the partition key can include multiple columns or expressions (up to 32,"; this should be changed to say "When using range or hash partitioning". - expression. If no B-tree operator class is specified when creating a - partitioned table, the default B-tree operator class for the datatype will - be used. If there is none, an error will be reported. + expression. If no operator class is specified when creating a partitioned + table, the default operator class of the appropriate type (btree for list + and range partitioning, hash for hash partitioning) will be used. If + there is none, an error will be reported. + </para> + + <para> + Since hash operator class provides only equality, not ordering, collation + is not relevant for hash partitioning. The behaviour will be unaffected + if a collation is specified. + </para> + + <para> + Hash partitioning will use support function 2 routines from the operator + class. If there is none, an error will be reported. See <xref + linkend="xindex-support"> for details of operator class support + functions. I think we should rework this a little more heavily. I suggest the following, starting after "a single column or expression": <para> Range and list partitioning require a btree operator class, while hash partitioning requires a hash operator class. If no operator class is specified explicitly, the default operator class of the appropriate type will be used; if no default operator class exists, an error will be raised. When hash partitioning is used, the operator class used must implement support function 2 (see <xref linkend="xindex-support"> for details). </para> I think we can leave out the part about collations. It's possibly worth a longer explanation here at some point: for range partitioning, collation can affect which rows go into which partitions; for list partitioning, it can't, but it can affect the order in which partitions are expanded (which is a can of worms I'm not quite ready to try to explain in user-facing documentation); for hash partitioning, it makes no difference at all. Although at some point we may want to document this, I think it's a job for a separate patch, since (1) the existing documentation doesn't document the precise import of collations on existing partitioning types and (2) I'm not sure that CREATE TABLE is really the best place to explain this. The example commands for creating a hash-partitioned table are missing spaces between WITH and the parenthesis which follows. In 0003, the changes to partition_bounds_copy claim that I shouldn't worry about the fact that typlen is set to 4 because datumCopy won't use it for a pass-by-value datatype, but I think that calling functions with incorrect arguments and hoping that they ignore them and therefore nothing bad happens doesn't sound like a very good idea. Fortunately, I think the actual code is fine; I think we just need to change the comments. For hash partitioning, the datums array always contains two integers, which are of type int4, which is indeed a pass-by-value type of length 4 (note that if we were using int8 for the modulus and remainder, we'd need to set byval to FLOAT8PASSBYVAL). I would just write this as: if (hash_part) { typlen = sizeof(int32); /* always int4 */ byval = true; /* int4 is pass-by-value */ } + for (i = 0; i < nkeys; i++) + { + if (!isnull[i]) + rowHash = hash_combine64(rowHash, DatumGetUInt64(hash_array[i])); + } Excess braces. I think it might be better to inline the logic in mix_hash_value() into each of the two callers. Then, the callers wouldn't need Datum hash_array[PARTITION_MAX_KEYS]; they could just fold each new hash value into a uint64 value. That seems likely to be slightly faster and I don't see any real downside. rhaas=# create table natch (a citext, b text) partition by hash (a); ERROR: XX000: missing support function 2(16398,16398) in opfamily 16437 LOCATION: RelationBuildPartitionKey, relcache.c:954 It shouldn't be possible to reach an elog() from SQL, and this is not a friendly error message. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: