Re: Do we want a hashset type? - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Do we want a hashset type? |
Date | |
Msg-id | 136e97fe-a41c-7e32-f58f-312f1a0e4caa@enterprisedb.com Whole thread Raw |
In response to | Re: Do we want a hashset type? ("Joel Jacobson" <joel@compiler.org>) |
Responses |
Re: Do we want a hashset type?
|
List | pgsql-hackers |
On 6/20/23 00:50, Joel Jacobson wrote: > On Mon, Jun 19, 2023, at 14:59, Tomas Vondra wrote: >> What unexpected issues you mean? Sure, if someone uses multisets as if >> they were sets (so ignoring the handling of duplicates), things will go >> booom! quickly. > > The unexpected issues I had in mind are subtle bugs due to treating multisets > as sets, which could go undetected due to having no duplicates initially. > Multisets might initially therefore seem equal, but later diverge due to > different element counts, leading to hard-to-detect issues. > Understood. >> I imagined (if we ended up doing MULTISET) we'd provide interface (e.g. >> operators) that'd allow perhaps help with this. > > Might help. But still think providing both structures would be a more foolproof > solution, offering users the choice to select what's best for their use-case. > Yeah. Not confusing people is better. >>> Despite SQL's multiset possibility, a distinct hashset type is my preference, >>> helping appropriate data structure choice and reducing misuse. >>> >>> The necessity of multisets is vague beyond standards compliance. >> >> True - we haven't had any requests/proposal to implement MULTISETs. >> >> I've looked at the SQL standard primarily to check if maybe there's some >> precedent that'd give us guidance on the SQL syntax etc. And I think >> multisets are that - even if we end up not implementing them, it'd be >> sad to have unnecessarily inconsistent syntax (in case someone decides >> to add multisets in the future). >> >> We could invent "SET" data type, so while standard has ARRAY / MULTISET, >> we'd have ARRAY / MULTISET / SET, and the difference between the last >> two would be just handling of duplicates. > > Is the idea to use the "SET" keyword for the syntax? > Isn't it a risk that will be confusing, since "SET" is currently > only used for configuration and update operations? > I haven't tried doing that, so not sure if there would be any conflicts in the grammar. But I can't think of a case that'd be confusing for users - when setting internal GUC variables it's a completely different context, there's no use for SQL-level collections (arrays, sets, ...). For UPDATE, it'd be pretty clear too, I think. It's possible to do UPDATE table SET col = SET[1,2,3] and it's clear the first is the command SET, while the second is a set constructor. For SELECT there'd be conflict, and for ALTER TABLE it'd be possible to do ALTER TABLE table ALTER COLUMN col SET DEFAULT SET[1,2,3]; Seems clear to me too, I think. >> The other way to look at sets is that they are pretty similar to arrays, >> except that there are no duplicates and order does not matter. Sure, the >> on-disk format and code is different, but from the SQL perspective it'd >> be nice to allow using sets in most places where arrays are allowed >> (which is what the standard does for MULTISETS, more or less). >> >> That'd mean we could probably search through gram.y for places working >> with arrays ("ARRAY array_expr", "ARRAY select_with_parens", ...) and >> make them work with sets too, say by having SET_SUBLINK instead of >> ARRAY_SUBLINK, set_expression instead of array_expression, etc. >> >> This might be also "consistent" with defining hashset type using CREATE >> TYPE with ELEMENT, because we consider the type to be "array". So that >> would be polymorphic type, but we don't have pre-defined array for every >> type (and I'm not sure we want to). >> >> Of course, maybe there's some fatal flaw in these idea, I don't know. >> And I don't want to move the goalposts too far - but it seems like this >> might make some stuff actually simpler to implement (by piggy-backing on >> the existing array infrastructure). > > I think it's very interesting thoughts and ambitions. > > I wonder though, from a user-perspective, if a new hashset type still > wouldn't just be considered simpler, than introducing new SQL syntax? > It's a matter of personal taste, I guess. I'm fine with calling function API and what not, but a sensible SQL syntax seems nicer. > However, it would be interesting to see how the piggy-backing on the > existing array infrastructure would look in practise code-wise though. > > I think it's still meaningful to continue hacking on the int4-type > hashset extension, to see if we can agree on the semantics, > especially around null handling and sorting. > Definitely. It certainly was not my intention to derail the work by proposing more and more stuff. So feel free to pursue what makes sense to you / helps the use case. TBH I don't particularly see why we'd want to sort sets. I wonder if the SQL standard says something about these things (for MULTISETs), especially for the NULL handling. If it does, I'd try to stick with those rules. >> A mostly unrelated thought - I wonder if this might be somehow related >> to the foreign key array patch ([1] might be the most recent attempt in >> this direction). Not to hashset itself, but I recalled these patches >> because it'd mean we don't need the separate "edges" link table (so the >> hashset column would be the think backing the FK). >> >> [1] >> https://www.postgresql.org/message-id/CAJvoCut7zELHnBSC8HrM6p-R6q-NiBN1STKhqnK5fPE-9%3DGq3g%40mail.gmail.com > > I remember that one! We tried to revive that one, but didn't manage to keep it alive. > It's a really good idea though. Good idea to see if there might be synergies > between arrays and hashsets in this area, since if we envision the elements in > a hashset mostly will be PKs, then it would be nice to enforce reference > integrity. I haven't followed that at all, but I wonder how difficult would it be to also support other collection types (like sets) and not just arrays. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: