Thread: multiple UNIQUE indices for FK
Hi,

For some time I've been struggling to get my schema "optimised" for a sort of "message exchange" (or "document circulation") system.

For every record in the table of those messages I have:
1. SENDER
2. RECIPIENT
3. a unique (sender-assigned) serial number (SSN)
4. ... and naturally all the other stuff, like the message itself, timestamps, etc.

My plan was to have a unique constraint on 1+3, for joins and to keep the "sanity bonds" in force all the time. So I figured:

ALTER ... msgs ADD CONSTRAINT sender_uniq UNIQUE (sender, ssn);

Unfortunately all that proved to be "not so good" at the application level, since there I "almost always" need to select "MY" messages, which leads to:

SELECT * FROM msgs WHERE sender = "ME"
UNION ALL
SELECT * FROM msgs WHERE recipient = "ME";

That does not look so bad, but when one has to JOIN it with other stuff, the application becomes "obfuscated" by the complexity of those joins.

So I tried another approach. A table with columns like:
1. ME
2. THEM
3. FROMME bool (true if ME is the sender, false otherwise)
4. a sender-unique serial (SSN)
5. ... and the rest of it.

But this time I had to partition the table (on the FROMME value) to be able to create different constraints depending on FROMME being true or false. So I have:

ALTER ... msgs_from_me ADD CONSTRAINT me_uniq UNIQUE (me, ssn);
ALTER ... msgs_to_me ADD CONSTRAINT them_uniq UNIQUE (them, ssn);

Now application-level selects and joins are much simpler, like:

SELECT * FROM msgs m JOIN partners p USING (them);

But along the way the specs for the system have evolved, and now I need to assign an additional unique serial which sequentially labels every message that "belongs" to ME, irrespective of whether ME originated it or ME is a recipient. And it has to be explicitly unique-constrained, for an FK.

My problem is that the table is currently partitioned.

Is there a way to have a unique constraint across partitions (inherited tables)?

And I'm not looking back to the initial (single-table) schema, since I'm unable to wrap my head around the concept of a unique constraint that covers IDs which are sometimes in the SENDER column and at other times in the RECIPIENT column.

Can anybody suggest any other way out of this mess? That is, apart from signing off ;7

Thanks,

-R
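For concreteness, a minimal sketch of the inherited-tables layout described above; the column types and the CHECK/INHERITS setup are assumptions for illustration, not the original DDL:

CREATE TABLE msgs (
    me     int,
    them   int,
    fromme boolean,
    ssn    int
    -- ... message body, timestamps, etc.
);

CREATE TABLE msgs_from_me (CHECK (fromme))     INHERITS (msgs);
CREATE TABLE msgs_to_me   (CHECK (NOT fromme)) INHERITS (msgs);

ALTER TABLE msgs_from_me ADD CONSTRAINT me_uniq   UNIQUE (me,   ssn);
ALTER TABLE msgs_to_me   ADD CONSTRAINT them_uniq UNIQUE (them, ssn);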
Hi,

Maybe someone could help me with this SQL problem. I'm still fighting with the "message system" I explained in my initial mail (appended here, since some time has passed since then).

Currently I've tried "partial indexes" for the FK, but either that is entirely unsuitable for the purpose ... or I haven't learned how to use it this way. Literally I have:

CREATE TABLE msgs (me INT, them INT, ssn INT, fromme BOOL, ...);
CREATE UNIQUE INDEX fromme ON msgs (me, ssn) WHERE fromme = true;
CREATE UNIQUE INDEX fromthem ON msgs (them, ssn) WHERE fromme = false;
CREATE UNIQUE INDEX my_global ON msgs (me, them, ssn);

In addition to that, some other tables "are supposed to" point to the messages in the MSGS table, like the following table containing information pertaining to the last message a sender generated:

CREATE TABLE most_recent (me INT, last INT, ...);
ALTER TABLE most_recent ADD CONSTRAINT msgs_fk FOREIGN KEY (me, last) REFERENCES msgs (me, ssn);

Only it doesn't work that way, since (ME, SSN) is only partially unique. And it's utterly pointless to have MOST_RECENT use the MY_GLOBAL index for the FK, since the recipient (the THEM column) varies from one MOST_RECENT update to another.

So my question is: is there a way to point (using a foreign key) a record from the MOST_RECENT table at a record within the MSGS table, provided the uniqueness within MSGS is assured only partially, depending on FROMME, as shown above?

Any help appreciated.

-R

On 23.02.2016 09:39, Rafal Pietrak wrote:
[--------------------]
On 23.02.2016 09:39, Rafal Pietrak wrote:
> Can anybody suggest any other way out of this mess?
The only thought that sticks while reading your prose is:
message ----> message-person <---- person
message-person (message_id, person_id, relationship_type[sender, receiver])
Partitioning and partial indexes both have considerable limitations that you might need to work around. That said, normalization exists for a reason, and having multiple "person" columns in a table is a form of duplication that, if left in place, presents just the problems you are seeing.
I suspect your SSN should fit onto the message-person table.
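A minimal sketch of that shape; the table name message_person, the column types, and the placement of the SSN are illustrative assumptions:

CREATE TABLE person (
    person_id bigint PRIMARY KEY
    -- ...
);

CREATE TABLE message (
    message_id bigint PRIMARY KEY,
    body       text
    -- timestamps, etc.
);

CREATE TABLE message_person (
    message_id        bigint REFERENCES message,
    person_id         bigint REFERENCES person,
    relationship_type text CHECK (relationship_type IN ('sender', 'receiver')),
    ssn               integer,   -- the sender-assigned serial
    PRIMARY KEY (message_id, person_id, relationship_type)
);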
The following doesn't make sense - if the SSN is sender unique then there is no expectation that a receiver would not receive two messages with the same SSN from different senders.
ALTER ... msgs_to_me ADD CONSTRAINT them_uniq UNIQUE (them, ssn);
David J.
On 28.02.2016 03:35, David G. Johnston wrote:
> On 23.02.2016 09:39, Rafal Pietrak wrote:
>> Can anybody suggest any other way out of this mess?
>
> The only thought that sticks while reading your prose is:
>
> message ----> message-person <---- person
>
> message-person (message_id, person_id, relationship_type[sender, receiver])

Sorry for the prose. The only way I think I can explain myself about concepts I don't fully grasp is ... a flood of words.

But regarding the matter at hand. If I understand it correctly, your suggestion is that I have:

CREATE TABLE persons (person_id PRIMARY KEY, ...);
CREATE TABLE msgs_person (msg_id, person_id REFERENCES persons (person_id), rel_type, the_message_itself, PRIMARY KEY (msg_id, person_id, rel_type), ...);

I must say, this is something like the 10th version of my response to your post ... with every iteration I've figured out more functionality that I get from your suggestion ... and actually the message-person table is pretty much what I currently have.

Even the FK from the MOST_RECENT table was doable, after I supplemented it with a RELATION_TYPE field with a constant value of "SENDER", thus hitting the index on the messages-person table. The latter got me thinking that the SQL definition is missing a way to put a constant into an FK definition, like this:

... ADD CONSTRAINT messages_fk FOREIGN KEY (person_id, msg_id, "sender") REFERENCES msgs_person (person_id, msg_id, rel_type);

but that's beyond this thread.

> Partitioning and partial indexes both have considerable limitations that
> you might need to work around. That said, normalization exists for a
> reason, and having multiple "person" columns in a table is a form of
> duplication that, if left in place, presents just the problems you are
> seeing.
>
> I suspect your SSN should fit onto the message-person table.
>
> The following doesn't make sense - if the SSN is sender unique then
> there is no expectation that a receiver would not receive two messages
> with the same SSN from different senders.

I don't get it. Of course it's possible to receive two messages with the same SSN. By "sender unique" I mean that every sender has full control over whatever he/she wishes to use for the SSN, provided he/she does not assign duplicates. It also means that there is no relation between SSNs assigned by different senders, and collisions *should* be expected unless the UNIQUE covers both THEM/SENDER and SSN. Thus:

> ALTER ... msgs_to_me ADD CONSTRAINT them_uniq UNIQUE (them, ssn);
>
> David J.

But thanks for the answer.

-R
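The usual way to express that "constant in the FK" idea is to materialise the constant as a CHECK-constrained column on the referencing table. A hedged sketch, assuming the msgs_person layout above; the column types are guesses (assumes msgs_person(msg_id int, person_id bigint, rel_type text, PRIMARY KEY (msg_id, person_id, rel_type))):

CREATE TABLE most_recent (
    person_id bigint,
    last      integer,                 -- the sender-assigned msg_id
    rel_type  text NOT NULL DEFAULT 'SENDER' CHECK (rel_type = 'SENDER'),
    FOREIGN KEY (last, person_id, rel_type)
        REFERENCES msgs_person (msg_id, person_id, rel_type)  -- the primary key above
);

Because rel_type in most_recent can only ever be 'SENDER', the FK effectively points only at sender rows while still targeting an unconditional (primary key) index.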
On 28.02.2016 03:35, David G. Johnston wrote:
> On 23.02.2016 09:39, Rafal Pietrak wrote:
>> Can anybody suggest any other way out of this mess?
>
> The only thought that sticks while reading your prose is:
>
> message ----> message-person <---- person
>
> message-person (message_id, person_id, relationship_type[sender, receiver])

It just occurred to me: how do I make sure (i.e. enforce within the database), with the above structure, that a message can have *only one* sender, but allow for multiple recipients?

-R
On 28.02.2016 03:35, David G. Johnston wrote:
> On 23.02.2016 09:39, Rafal Pietrak wrote:
>> Can anybody suggest any other way out of this mess?
>
> The only thought that sticks while reading your prose is:
>
> message ----> message-person <---- person
>
> message-person (message_id, person_id, relationship_type[sender, receiver])

It just occurred to me: how do I make sure (i.e. enforce within the database), with the above structure, that a message can have *only one* sender, but allow for multiple recipients?

-R

____________________________________________________________________________________

If I'm not too late: a unique partial index.

CREATE UNIQUE INDEX unique_message_sender ON message_person (message_id) WHERE relationship_type = 'sender';

Regards,
Igor
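A quick illustration of what that partial index permits and rejects, assuming a message_person table shaped like the sketch earlier in the thread (names and values here are illustrative):

-- one sender plus any number of recipients for the same message is fine:
INSERT INTO message_person (message_id, person_id, relationship_type)
VALUES (1, 10, 'sender'),
       (1, 20, 'receiver'),
       (1, 30, 'receiver');

-- a second 'sender' row for message 1, e.g. (1, 40, 'sender'),
-- would be rejected by unique_message_sender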
Sorry for the delay - I'm used to getting replied-to-all on messages I send, but you didn't, and I didn't notice the response until now.
>
> Partitioning and partial indexes both have considerable limitations that
> you might need to work around. That said, normalization exists for a
> reason, and having multiple "person" columns in a table is a form of
> duplication that, if left in place, presents just the problems you are
> seeing.
>
> I suspect your SSN should fit onto the message-person table.
>
> The following doesn't make sense - if the SSN is sender unique then
> there is no expectation that a receiver would not receive two messages
> with the same SSN from different senders.
I don't get it.
Of course it's possible to receive two messages with the same SSN.
By "sender unique" I mean that every sender has full control over
whatever he/she wishes to use for the SSN, provided he/she does not
assign duplicates. It also means that there is no relation between SSNs
assigned by different senders, and collisions *should* be expected
unless the UNIQUE covers both THEM/SENDER and SSN.
Thus:
> ALTER ... msgs_to_me ADD CONSTRAINT them_uniq UNIQUE (them, ssn);
Yeah, mental lapse on my part.
David J.
On 01.03.2016 20:02, Igor Neyman wrote:
[--------------------]
>> It just occurred to me: how do I make sure (i.e. enforce within the
>> database), with the above structure, that a message can have *only
>> one* sender, but allow for multiple recipients?
>>
>> -R
>
> ____________________________________________________________________________________
>
> If I'm not too late: a unique partial index.
>
> CREATE UNIQUE INDEX unique_message_sender ON message_person (message_id) WHERE relationship_type = 'sender';

Of course. Apparently I'm so twisted after the numerous rewrites of my schema for this project that I forgot the obvious.

Thanks,

-R
Hmmm... apparently it takes more time to rewrite the schema+app to the new layout :(

While doing so I fell into another problem, to which I cannot find any resolution so far.

1. The partial index assuring the SSN is unique per sender works fine.

2. But in the original schema I had an additional field NEXT, which allowed me to daisy-chain all messages originating from a particular sender; just one message (the most recent one) had a NULL there, so it was easy to pick the last message (which is a frequent operation).

3. Currently, having just a partially-unique index on the messages-persons table for senders, I'm unable to FK (person, role, next) to (person, role, ssn). Postgres complains that FK columns MUST have an unconditional unique index at the target columns.

A) How do I get around it?

B) Why does that unique index at the target have to be unconditional? I mean:
--> If the table were split into two inherited tables (one for role=sender, one for role=recipient), the partition containing only role=sender could have a fully-unique index and thus could become the target of FK (sender, next).
--> So why can't the RDBMS treat partial indexes just like that: as if they were fully-unique indexes, but only covering part of the data? Consequently, if there were a "partial-index miss", the target key would be assumed not present.

I'd appreciate any help on how I should implement the chaining of rows in the messages-persons table (as above), and some info on the "theory of RDBMS" (or a clause in the standard specs) which leads to the restriction preventing partial indexes as FK targets.

Thanks,

-R
Hi Rafal:

On Fri, Mar 4, 2016 at 11:44 AM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
> While doing so I fell into another problem, to which I cannot find any
> resolution so far.
...
> 2. But in the original schema I had an additional field NEXT, which
> allowed me to daisy-chain all messages originating from a particular
> sender; just one message (the most recent one) had a NULL there, so it
> was easy to pick the last message (which is a frequent operation).

I do not recall your original schema too well, but IMO doing linked lists with database records is not usually a good idea. They are very procedural, and relational is declarative. Normally, to peek at the last message of a chain you just declare your intentions in SQL, typically by having a timestamp column and doing "select whatever where whatelse order by xxx_ts desc limit 1".

> 3. Currently, having just a partially-unique index on the
> messages-persons table for senders, I'm unable to FK (person, role,
> next) to (person, role, ssn).

This is normally a symptom of your schema not being appropriately normalized.

> Postgres complains that FK columns MUST have an unconditional unique
> index at the target columns.

I would expect this: a foreign key must uniquely determine a row in another table, and Postgres ensures this with an unconditional unique index. This is because FKs target tables, not indexes. If you are targeting a conditional index, you probably want another type of constraint.

> A) How do I get around it?

Do not use FK. Try to use generic constraints. Better, normalize your schema; IIRC it was not even in 2NF, and this tends to be asking for problems.

> B) Why does that unique index at the target have to be unconditional? I mean:
> --> If the table were split into two inherited tables (one for
> role=sender, one for role=recipient), the partition containing only
> role=sender could have a fully-unique index and thus could become the
> target of FK (sender, next).
> --> So why can't the RDBMS treat partial indexes just like that: as if
> they were fully-unique indexes, but only covering part of the data?
> Consequently, if there were a "partial-index miss", the target key
> would be assumed not present.
> I'd appreciate any help on how I should implement the chaining of rows
> in the messages-persons table (as above), and some info on the "theory
> of RDBMS" (or a clause in the standard specs) which leads to the
> restriction preventing partial indexes as FK targets.

Partial indexes, and indexes in general, are implementation details in the "theory of RDBMS". That theory is more mathematics, based on tuples, sets, and the like. Normally an FK wants unique keys as targets; the fact that many DBMSs force a unique index for these is an implementation detail. You could have a unique constraint enforced by just scanning the table on every insertion/update - it would be slow, but it would work (and in some cases, like extremely small tables, it could even be much better than an index).

From what you write, your way of operating reminds me of when I worked with COBOL and indexed files: you try to use the indexes directly, and make the RDBMS use them automatically for some operations, like you would in one of those systems. But that is not the way an RDBMS works; it likes to have a declared structure and decide by itself what to do. This is nice in that once you write a query you can partition, add indexes, create views, and let the RDBMS work out how to do it, but it imposes some (some would say a lot of) constraints on how you put your data in.

Francisco Olarte.
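A concrete version of that "declare your intentions" lookup; the table and column names here (msgs_person, person_id, msg_ts) are placeholders for whatever the real schema uses:

SELECT *
FROM   msgs_person
WHERE  person_id = 42        -- the sender whose chain we are peeking at
ORDER  BY msg_ts DESC        -- the assumed timestamp column
LIMIT  1;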
Hi,

On 04.03.2016 12:59, Francisco Olarte wrote:
> Hi Rafal:
>
> On Fri, Mar 4, 2016 at 11:44 AM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
>> While doing so I fell into another problem, to which I cannot find any
>> resolution so far.
> ...
>> 2. But in the original schema I had an additional field NEXT, which
>> allowed me to daisy-chain all messages originating from a particular
>> sender; just one message (the most recent one) had a NULL there, so it
>> was easy to pick the last message (which is a frequent operation).
>
> I do not recall your original schema too well, but IMO doing linked
> lists with database records is not usually a good idea. They are very
> procedural, and relational is declarative. Normally, to peek at the
> last message of a chain you just declare your intentions in SQL,
> typically by having a timestamp column and doing "select whatever where
> whatelse order by xxx_ts desc limit 1".
>
>> 3. Currently, having just a partially-unique index on the
>> messages-persons table for senders, I'm unable to FK (person, role,
>> next) to (person, role, ssn).
>
> This is normally a symptom of your schema not being appropriately
> normalized.

OK. I'd appreciate some guidance here. I've settled on a schema suggested in this thread a few posts back. In short it goes like this:

CREATE TABLE persons (person_id PRIMARY KEY, ...);
CREATE TABLE msgs_person (msg_id, person_id REFERENCES persons (person_id), rel_type, the_message_itself, PRIMARY KEY (msg_id, person_id, rel_type), ...);

where:
person_id - sender or recipient of the message
msg_id    - an ID uniquely assigned by the sender
rel_type  - the role a row in the msgs_person table plays in this particular person+message relation; it can be either SENDER or RECIPIENT

Then I have a partial unique index:

CREATE UNIQUE INDEX by_sender ON msgs_person (msg_id, person_id, rel_type) WHERE rel_type = 'SENDER';

which ensures that a message can have just one SENDER. And now, apart from the above, I'm trying to put a NEXT field into the MSGS_PERSON table, so that I can (sort of):

ALTER TABLE msgs_person ADD CONSTRAINT next_fk FOREIGN KEY (next, person_id, rel_type) REFERENCES msgs_person (msg_id, person_id, rel_type); ... (just for rel_type = SENDER).

What should I do with this to have it "appropriately normalized"?

>> Postgres complains that FK columns MUST have an unconditional unique
>> index at the target columns.
>
> I would expect this: a foreign key must uniquely determine a row in
> another table, and Postgres ensures this with an unconditional unique
> index. This is because FKs target tables, not indexes. If you are
> targeting a conditional index, you probably want another type of
> constraint.

But if that is so, a partial unique index should suffice, since it does support locating a *single* row in a table... and that should be all that's required for the FK to be consistent. Right?

>> A) How do I get around it?
>
> Do not use FK. Try to use generic constraints. Better, normalize your

"Generic constraints"? - please elaborate, with respect to the schema I've just sketched above.

[-------------]
>
> Partial indexes, and indexes in general, are implementation details in
> the "theory of RDBMS". That theory is more mathematics, based on
> tuples, sets, and the like. Normally an FK wants unique keys as
> targets; the fact that many DBMSs force a unique index for these is an
> implementation detail. You could have a unique constraint enforced by
> just

In that case, please forgive my language - all I meant is that this "implementation detail" is in fact "enforced policy" in PostgreSQL ...
But this isn't actually the problem here. I'm quite happy with a system that helps me avoid performance pitfalls. The problem I see is different - and the reason I'm asking about the theoretical background of the implementation is different.

The problem is that once one accepts the requirement for a unique index on FK target columns as "mandatory performance support", then I fail to see the real reason why *ANY* unique index shouldn't serve that role too. They are unique (within the domains of their conditions) and by definition yield a single row (or nothing) for an FK; that should be sufficient for the engine to keep data consistency as expected, shouldn't it?

Naturally I understand that there might be some deep reasons for the exclusion of partial indexes as FK target "selectors" - I'd appreciate further explanation. But if those reasons exist, I'd expect workarounds to exist too - like "SQL idioms" - that people normally use for cases like these. Would those "generic constraints" be such an idiom?

So, as I said before, I'm looking for some guidance here. (I'm really emotionally bound to that NEXT field there :)

-R
Hi Rafal:

These are my opinions; somebody else may think they are not correct, and comments are welcome.

On Fri, Mar 4, 2016 at 2:30 PM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
>> This is normally a symptom of your schema not being appropriately
>> normalized.
> OK. I'd appreciate some guidance here. I've settled on a schema
> suggested in this thread a few posts back. In short it goes like this:
>
> CREATE TABLE persons (person_id PRIMARY KEY, ...);
> CREATE TABLE msgs_person (msg_id, person_id REFERENCES persons
> (person_id), rel_type, the_message_itself, PRIMARY KEY (msg_id,
> person_id, rel_type), ...);
>
> where:
> person_id - sender or recipient of the message
> msg_id    - an ID uniquely assigned by the sender
> rel_type  - the role a row in the msgs_person table plays in this
> particular person+message relation; it can be either SENDER or
> RECIPIENT

For this particular case (a person must be either a UNIQUE sender or one of multiple recipients) my opinion would be:

0.- The persons table is OK.

1.- Move the sender to the message:

CREATE TABLE messages (
    message_id primary key,
    sender_person_id NOT NULL references persons,
    message_ts timestamp,   -- or another ordering-imposing field, more on this below
    ... rest of the fields
);

2.- Make a recipients table:

CREATE TABLE recipients (
    message_id references messages,
    recipient_person_id references persons,
    primary key (message_id, recipient_person_id)   -- to avoid duplicate recipients
);

Also, it will be useful to know whether the sender can be a recipient too (as with e-mail).

Make sender_person_id NOT NULL in messages if you want to ensure every message has exactly ONE SENDER; leave that out if you want to allow senderless messages. An FK column must either link to a record or be null.

Then, if you want to have a msgs-person "table", I would use a view:

CREATE VIEW msgs_persons AS
SELECT message_id, sender_person_id AS person_id, 'SENDER' AS role FROM messages
UNION ALL
SELECT message_id, recipient_person_id AS person_id, 'RECIPIENT' AS role FROM recipients;

> And now, apart from the above, I'm trying to put a NEXT field into the
> MSGS_PERSON table, so that I can (sort of):
> ALTER TABLE msgs_person ADD CONSTRAINT next_fk FOREIGN KEY
> (next, person_id, rel_type) REFERENCES msgs_person (msg_id, person_id,
> rel_type); ... (just for rel_type = SENDER).
> What should I do with this to have it "appropriately normalized"?

The normalization problem is more in the msgs-person table, as it contains heterogeneous data; this is why I would use the recipients-table approach.

I do not know what you want to achieve with the NEXT. It seems you are trying to build a linked list of sent messages. In my approach, if you really need this, you just put the next field on messages, index it on (sender_id + message_id), and make (sender_id + next_message_id) reference that index.
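A hedged sketch of that last suggestion, assuming the messages table above and guessing bigint for message_id (the names next_message_id and messages_next_fk are illustrative):

ALTER TABLE messages ADD COLUMN next_message_id bigint;

-- the composite key the self-referencing FK can target
ALTER TABLE messages
    ADD CONSTRAINT messages_sender_msg_uniq UNIQUE (sender_person_id, message_id);

-- the "next" message must belong to the same sender; rows where
-- next_message_id IS NULL (the most recent message) pass the check
ALTER TABLE messages
    ADD CONSTRAINT messages_next_fk
    FOREIGN KEY (sender_person_id, next_message_id)
    REFERENCES messages (sender_person_id, message_id);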
But if you are just trying to know the last message for a sender, I would use an ordering column instead. My personal preference, unless message_id already has a total ordering (I suppose it's an integer or a string), would be to add message_ts, fill it with the message sending/reception timestamp (I do not know which kind of messages you are storing, so let's say the row-insertion timestamp), and use message_id as a tie-breaker in case you get many messages with the same timestamp (or, if you have an external ordered field, use that - more knowledge of the particular system would be needed). Index the table on message_ts (or message_ts + message_id, if there are lots of duplicates) to speed up queries, and then just query messages for a sender id, ORDER BY message_ts DESC, message_id DESC LIMIT 1 for the last message.

> "Generic constraints"? - please elaborate, with respect to the schema
> I've just sketched above.

You can check anything in a constraint on a table by using a function. Although, IIRC, there were some problems if you made queries, and that kind of thing is better done with triggers. The problem is difficult, and this is why, when designing tables, things like linked lists are best avoided: you enter a difficult zone when you begin to consider visibility rules for the data (when many transactions, with different isolation levels, are concurrently manipulating the same set of tuples). Your app may not do this, but servers are designed to allow it.

> In that case, please forgive my language - all I meant is that this
> "implementation detail" is in fact "enforced policy" in PostgreSQL ...

Nothing to forgive, I was just trying to explain some things.

> But this isn't actually the problem here. I'm quite happy with a system
> that helps me avoid performance pitfalls.
> The problem I see is different - and the reason I'm asking about the
> theoretical background of the implementation is different.
> The problem is that once one accepts the requirement for a unique index
> on FK target columns as "mandatory performance support", then I fail to
> see the real reason why *ANY* unique index shouldn't serve that role
> too. They are unique (within the domains of their conditions) and by
> definition yield a single row (or nothing) for an FK; that should be
> sufficient for the engine to keep data consistency as expected,
> shouldn't it?

Maybe. I'm not really following what you are trying to do here. Either you are way above my level, or you have a mental model of how Postgres should work which does not correspond to how it does.

> Naturally I understand that there might be some deep reasons for the
> exclusion of partial indexes as FK target "selectors" - I'd appreciate
> further explanation. But if those reasons exist, I'd expect workarounds
> to exist too - like "SQL idioms" - that people normally use for cases
> like these. Would those "generic constraints" be such an idiom?

Please forget anything I said about generic constraints; my fault, I was trying to express something, failed at it, and do not know how to fix it.

What I personally do to avoid this kind of problem is to avoid designing something which needs references to a partial index. And I've been successful at it for a long time. I'm not going to recover the complete thread to recap on why exactly you are doing these kinds of really advanced things, but I suppose you need them for some reason, and I do not have the resources to study it.

> So, as I said before, I'm looking for some guidance here.
> (I'm really emotionally bound to that NEXT field there :)

On this I cannot help you much. I do not see what you are trying to achieve with the NEXT field. That would need more explanation, and more study, and as I said before, I do not have the available resources for it. Sorry for the time I've taken, but I feel I cannot be of any help here.

Best regards.

Francisco Olarte, over & out.
The problem is that once one accepts the requirement for a unique index
on FK target columns as "mandatory performance support", then I fail to
see the real reason why *ANY* unique index shouldn't serve that role
too. They are unique (within the domains of their conditions) and by
definition yield a single row (or nothing) for an FK; that should be
sufficient for the engine to keep data consistency as expected,
shouldn't it?
A foreign key doesn't get to use a WHERE clause so the planner has no ability to know just by looking at a query that the partial unique index should be used.
In other words the presence or absence of an FK constraint between two tables should not alter the results of any question. But since a partial unique constraint could result in the full table having duplicates on the constrained columns when ignoring the partial's WHERE clause, this would not be true.
For the example data you could construct a partial unique index [(a,b) WHERE c = true]
(a,b,c)
(1,1,true),
(1,1,false),
(1,2,true)
This Query:
SELECT a, b, c
FROM src
JOIN abc USING (a,b)
Would return 1 row if the FK restricted the executor to only looking at rows in the partial index but would return 2 rows if it considers (say, because of using a sequential scan) the table as a whole.
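That example as runnable SQL; the table and column names src, abc, a, b, c come from the post, everything else is illustrative:

CREATE TABLE abc (a int, b int, c boolean);
CREATE UNIQUE INDEX abc_partial ON abc (a, b) WHERE c = true;
INSERT INTO abc VALUES (1, 1, true), (1, 1, false), (1, 2, true);

CREATE TABLE src (a int, b int);
INSERT INTO src VALUES (1, 1);

SELECT a, b, c
FROM src
JOIN abc USING (a, b);
-- returns both (1,1,true) and (1,1,false): the whole table is considered,
-- not just the rows covered by the partial index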
This seems simply like an implementation artifact. INDEX is used only upon data entry and for performance gains and never in order to ensure correctness.
I'm wandering into novel territory (for me) in my explanation above but it seems to cover the concept well even if I'm imprecise in some areas.
David J.
Hi,

On 04.03.2016 18:59, Francisco Olarte wrote:
[----------------]
>
> Make sender_person_id NOT NULL in messages if you want to ensure every
> message has exactly ONE SENDER; leave that out if you want to allow
> senderless messages. An FK column must either link to a record or be
> null.
>
> Then, if you want to have a msgs-person "table", I would use a view:
>
> CREATE VIEW msgs_persons AS
> SELECT message_id, sender_person_id AS person_id, 'SENDER' AS role FROM messages
> UNION ALL
> SELECT message_id, recipient_person_id AS person_id, 'RECIPIENT' AS role FROM recipients;

Ha! This was my initial schema ... with the addition of one "super table" that the two above (sender_person_id and recipient_person_id) both inherited from (to avoid the UNION ALL when selecting everything).

With that layout, the NEXT column worked just fine.

Only then came the requirement to have a "possibly sequence-continuous" unique ID assigned to every message, irrespective of whether a particular person was the sender or a recipient of that message_id. And I couldn't figure out how to implement that across separate (even if inherited) tables.

So came the concept of a single table of messages, with a ROLE field and a partial unique constraint on sender+sender-message-id ... and I started to rewrite the schema, but at a certain point I realized that it broke the NEXT functionality, and I cannot imagine any way to reintroduce it into the new table layout.

Now I'm quite stuck here.

[--------------]
>
> Sorry for the time I've taken, but I feel I cannot be of any help here.

It's quite all right. Sometimes help comes from the discussion alone (as opposed to direct explanations)... It has also happened to me that the solution came while I was describing my problem to the list ... before I had even sent the email, so I didn't have to, since the problem got solved :)

So thanks.

BTW: I'm considering your suggestion of replacing NEXT with a timestamp. The primary reason for the NEXT is to be able to fetch the row "just preceding" the currently inserted one AFTER the insert is done (in an AFTER trigger), so that some elaborate "statistics" get updated in that "one before" message record. Maybe a timestamp would do instead...

-R
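A hedged sketch of what that timestamp variant could look like; msg_ts and stats_counter are made-up columns, and the statistics update is a placeholder, not the real logic:

CREATE OR REPLACE FUNCTION update_previous_message() RETURNS trigger AS $$
BEGIN
    -- bump the "statistics" on the row immediately preceding the new one
    UPDATE msgs_person
    SET    stats_counter = stats_counter + 1
    WHERE  person_id = NEW.person_id
      AND  msg_ts = (SELECT max(q.msg_ts)
                     FROM   msgs_person q
                     WHERE  q.person_id = NEW.person_id
                       AND  q.msg_ts    < NEW.msg_ts);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER msgs_person_prev_stats
    AFTER INSERT ON msgs_person
    FOR EACH ROW EXECUTE PROCEDURE update_previous_message();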
On 04.03.2016 19:33, David G. Johnston wrote:
> On Fri, Mar 4, 2016 at 6:30 AM, Rafal Pietrak <rafal@ztk-rp.eu
> <mailto:rafal@ztk-rp.eu>> wrote:
>
>     The problem is that once one accepts the requirement for a unique
>     index on FK target columns as "mandatory performance support", then
>     I fail to see the real reason why *ANY* unique index shouldn't
>     serve that role too. They are unique (within the domains of their
>     conditions) and by definition yield a single row (or nothing) for
>     an FK; that should be sufficient for the engine to keep data
>     consistency as expected, shouldn't it?
>
> A foreign key doesn't get to use a WHERE clause so the planner has no
> ability to know just by looking at a query that the partial unique
> index should be used.

Hmm. IMHO quite the contrary: as an FK does not have a WHERE declaration, the planner, seeing a relevant index (i.e. an index covering the columns of interest), should use it regardless. Whatever the index hits, that data/row is hit; whatever isn't hit through such a partial index, that target data/row is simply missed. And if documented, such behavior becomes a feature. I personally would be quite happy with such a feature.

> In other words the presence or absence of an FK constraint between two
> tables should not alter the results of any question. But since a

It wouldn't. There are three cases:

1. The FK is defined without a unique index (I think Oracle allows that), so every UPDATE/INSERT needs a full scan of the target ... but as you've said, for small tables that might be OK.

2. The FK is defined with a unique index over the target columns - PostgreSQL requires that. Such a unique index guarantees a single target row for the FK to point to.

3. The FK is defined with a partially-unique index. This is new, and ... would it create ambiguity between queries? No, I don't think so (provided that the FK and partial index are used consistently).

The only "ambiguity" arises when one allows an "unindexed" FK, while subsequent changes to the schema add a partially-unique index on the target columns. But this wouldn't happen in PostgreSQL ... and who cares about Oracle :7 And even then, such an index may fail to get created if the currently present FK has records pointing outside that newly created index. Once the index gets created, queries become consistent again - just like creating a full unique index may fail, and when the data is corrected and the index gets created, the queries become consistent (with it).

> partial unique constraint could result in the full table having
> duplicates on the constrained columns when ignoring the partial's WHERE
> clause this would not be true.
>
> For the example data you could construct a partial unique index [(a,b)
> WHERE c = true]
> (a,b,c)
> (1,1,true),
> (1,1,false),
> (1,2,true)
>
> This Query:
>
> SELECT a, b, c
> FROM src
> JOIN abc USING (a,b)
>
> Would return 1 row if the FK restricted the executor to only looking at
> rows in the partial index but would return 2 rows if it considers (say,
> because of using a sequential scan) the table as a whole.

I'd say that if there is an implementation requirement for the FK target column set to be covered by a unique index, then the executor should never ignore it in favour of any other search plan. If it does, it's a bug.

> This seems simply like an implementation artifact. INDEX is used only
> upon data entry and for performance gains and never in order to ensure
> correctness.

But I understand that there may be more implementation details than my unacquainted eye can see.

Thanks for the info,

-R
>
> In other words the presence or absence of an FK constraint between two
> tables should not alter the results of any question. But since a
It wouldn't.
3. The FK is defined with a partially-unique index. This is new, and ...
would it create ambiguity between queries? No, I don't think so (provided
that the FK and partial index are used consistently).
My knowledge in this area is somewhat limited, but that is a very large "provided that".
The only "ambiquity" arises when one allows for "unindexed" FK, while
subsequent changes to schema add partially-unique index at target
columns. But this wouldn't happen in postgresql ... and who cares about
Oracle :7 And even then. such index may fail to get created of currently
present FK have records pointing outside that newly created index, Once
index get created, queries become consistent again. just like creating
full unique index may fail, and when data is corrected and index get
created - the queries become consistent (with it).
> partial unique constraint could result in the full table having
> duplicates on the constrained columns when ignoring the partial's WHERE
> clause this would not be true.
>
> For the example data you could construct a partial unique index [(a,b)
> WHERE c = true]
> (a,b,c)
> (1,1,true),
> (1,1,false),
> (1,2,true)
>
> This Query:
>
> SELECT a, b, c
> FROM src
> JOIN abc USING (a,b)
>
> Would return 1 row if the FK restricted the executor to only looking at
> rows in the partial index but would return 2 rows if it considers (say,
> because of using a sequential scan) the table as a whole.
I'd say that if there is an implementation requirement for the FK target
column set to be covered by a unique index, then the executor should
never ignore it in favour of any other search plan. If it does, it's a
bug.
Since the current requirement is that the index and the sequential scan for an FK lookup return the same data, the most efficient plan is chosen. Lacking a WHERE clause, a join between two tables with a defined PK-FK relationship can be fulfilled more quickly by simply scanning both tables completely and then merging them together once the scanned data has been sorted on the keyed columns. While the index is already sorted, the extra hits to the heap to check visibility are likely going to make working with the index less performant. There is no bug in this case because the exact same results are returned in either scenario.
I've made my point and am not fluent enough to discuss the issues that would need to be addressed to implement a FK-to-a-partial-unique-index feature.
I recall having hit this limitation myself previously, so I too wouldn't mind seeing it implemented - but doing so without degrading the performance of all FK-related queries is important too - and the increase in risk of having bugs in the short term is quite high, given that the fundamental operation of a key module needs to be changed. It's not something whose cost-benefit analysis reports favorably.
David J.
On 05.03.2016 01:03, David G. Johnston wrote:
[----------------]
>
> I've made my point and am not fluent enough to discuss the issues that
> would need to be addressed to implement a FK-to-a-partial-unique-index
> feature.
>
> I recall having hit this limitation myself previously, so I too
> wouldn't mind seeing it implemented - but doing so without degrading
> the performance of all FK-related queries is important too - and the
> increase in risk of having bugs in the short term is quite high, given
> that the fundamental operation of a key module needs to be changed.
> It's not something whose cost-benefit analysis reports favorably.

I see your point, and mostly (though not fully) I agree. The questions I raised were aimed at separating "mathematical" constraints from "implementation" ones. I just wanted to know whether it's "impossible" or just "hard/dangerous to do". I think I have a broader picture now.

Thanks for all the explanations,

-R
Hi Rafal:

On Fri, Mar 4, 2016 at 11:46 PM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
> On 04.03.2016 18:59, Francisco Olarte wrote:
>> Make sender_person_id NOT NULL in messages if you want to ensure every
>> message has exactly ONE SENDER; leave that out if you want to allow
>> senderless messages. An FK column must either link to a record or be
>> null.
>>
>> Then, if you want to have a msgs-person "table", I would use a view:
>>
>> CREATE VIEW msgs_persons AS
>> SELECT message_id, sender_person_id AS person_id, 'SENDER' AS role FROM messages
>> UNION ALL
>> SELECT message_id, recipient_person_id AS person_id, 'RECIPIENT' AS role FROM recipients;

> This was my initial schema ... with the addition of one "super table"
> that the two above (sender_person_id and recipient_person_id) both
> inherited from (to avoid the UNION ALL when selecting everything).

Wuf. I do not like it. I would add a column named sender_person_id to messages (to distinguish its role) and put a recipient_person_id, or just person_id, in recipients (the role is clear in that table) to avoid problems. Otherwise, what do you call the parent table and the fields? It's a naming issue, but I've found out the hard way that naming is important in these things. Bear in mind you only avoid TYPING the union all when selecting everything (as inheritance DOES do a union all, and it would have to do it with both kids AND the parent, so it MAY be slower). And you introduce several problems: the naming ones, a very strange foreign-key relationship between the kids, and the possibility of having a row inserted into the parent.

> With that layout, the NEXT column worked just fine.

I do not doubt the NEXT column works, I just doubt it's a good thing in a relational design.

> Only then came the requirement to have a "possibly sequence-continuous"
> unique ID assigned to every message, irrespective of whether a
> particular person was the sender or a recipient of that message_id.
> And I couldn't figure out how to implement that across separate (even
> if inherited) tables.
> So came the concept of a single table of messages, with a ROLE field
> and a partial unique constraint on sender+sender-message-id ... and I
> started to rewrite the schema, but at a certain point I realized that
> it broke the NEXT functionality, and I cannot imagine any way to
> reintroduce it into the new table layout.

Which is exactly the functionality of the NEXT column? I mean, I see you have messages with ONE sender and MANY? (can they be zero?) recipients. What are you trying to achieve with it? How are you planning to maintain it in your design?

> Now I'm quite stuck here.

I ask these questions because I think we are in a case of http://xyproblem.info/ .

> BTW: I'm considering your suggestion of replacing NEXT with a
> timestamp. The primary reason for the NEXT is to be able to fetch the
> row "just preceding" the currently inserted one AFTER the insert is
> done (in an AFTER trigger), so that some elaborate "statistics" get
> updated in that "one before" message record. Maybe a timestamp would do
> instead...

If you are planning on updating the previous row for a message (or a person?) in a trigger, this smells fishy. You may have a reason - not knowing what you are exactly planning to do, I cannot tell - but it sounds really weird.

Francisco Olarte.
On 05.03.2016 19:53, Francisco Olarte wrote:
> Hi Rafal:
>
> On Fri, Mar 4, 2016 at 11:46 PM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
>> On 04.03.2016 18:59, Francisco Olarte wrote:
>>> Make sender_person_id NOT NULL in messages if you want to ensure
>>> every message has exactly ONE SENDER; leave that out if you want to
>>> allow senderless messages. An FK column must either link to a record
>>> or be null.
>>>
>>> Then, if you want to have a msgs-person "table", I would use a view:
>>>
>>> CREATE VIEW msgs_persons AS
>>> SELECT message_id, sender_person_id AS person_id, 'SENDER' AS role FROM messages
>>> UNION ALL
>>> SELECT message_id, recipient_person_id AS person_id, 'RECIPIENT' AS role FROM recipients;
>> This was my initial schema ... with the addition of one "super table"
>> that the two above (sender_person_id and recipient_person_id) both
>> inherited from (to avoid the UNION ALL when selecting everything).
>
> Wuf. I do not like it. I would add a column named sender_person_id to
> messages (to distinguish its role) and put a recipient_person_id, or
> just person_id, in recipients (the role is clear in that table) to
> avoid problems. Otherwise, what do you call the parent table and the
> fields? It's a naming issue, but I've found out the hard way that
> naming is important in these things. Bear in mind you only avoid TYPING
> the union all when selecting everything (as inheritance DOES do a union
> all, and it would have to do it with both kids AND the parent, so it
> MAY be slower). And you introduce several problems: the naming ones, a
> very strange foreign-key relationship between the kids, and the
> possibility of having a row inserted into the parent.

No, no. It was OK: the parent table was appropriately "ruled on insert" for the inherited tables to work as partitions. And the table was called "messages" :)

[--------------]
>> ... I started to rewrite the schema, but at a certain point I realized
>> that it broke the NEXT functionality, and I cannot imagine any way to
>> reintroduce it into the new table layout.
>
> Which is exactly the functionality of the NEXT column? I mean, I see
> you have messages with ONE sender and MANY? (can they be zero?)
> recipients. What are you trying to achieve with it? How are you
> planning to maintain it in your design?

A NULL NEXT indicates the last message inserted, and, as I explain below, that is necessary to access/update the row that loses its "most recently inserted" status just after that happens.

The access/update of the row that loses its "last" status after a new insert is sufficiently expensive that it pays to do it once, after the insert. Otherwise I'd have to perform it for pretty much every row that is selected, every time it is selected ... which would be overkill.

And please note that if the performance of locating one NULL field among millions of records becomes a problem, I'm prepared to use a reserved value (a reserved record, like "a null message") for NEXT, instead of the current NULL.

>> Now I'm quite stuck here.
>
> I ask these questions because I think we are in a case of
> http://xyproblem.info/ .

:) this is a good one!!!

Actually I'm very, very acquainted with this "XY problem", i.e. quite often when "end users" ask me for additional functionality, they (like in the XY case) suggest a "technical solution". And (just like you :) I always ask: please tell me what you do "traditionally", e.g. "when you currently do this on paper, how does it go, step by step" - then I'll find a good IT solution for you.

But there is a flip side to that coin.
With complex, intertwined systems, where a well-defined "critical point" shows up, it's more efficient to extract the "show case" that causes the problem and focus on that, instead of digressing on the overall design (which may be flawed, but cannot be rewritten at this point).

>> BTW: I'm considering your suggestion of replacing NEXT with a
>> timestamp. The primary reason for the NEXT is to be able to fetch the
>> row "just preceding" the currently inserted one AFTER the insert is
>> done (in an AFTER trigger), so that some elaborate "statistics" get
>> updated in that "one before" message record. Maybe a timestamp would
>> do instead...
>
> If you are planning on updating the previous row for a message (or a
> person?) in a trigger, this smells fishy. You may have a reason - not

Yes, it does.

> knowing what you are exactly planning to do, I cannot tell - but it
> sounds really weird.

I can understand that. But all I can tell you without the full documentation is that it is statistics gathering, whose results are subsequently used by almost every query made; and that it is a major performance hog of the system, so it has to be done just once, at the moment it becomes well defined, which is just after the insert of the "next" row.

It actually does not matter to me how the "single row" is located within the system (i.e. using NEXT or not), but it has to be located robustly - there may never be a situation where such a last record is missed, or more than one gets updated. I like an FK as NEXT because I get that guarantee from PostgreSQL itself.

BTW: using a timestamp instead of an FK to message_id does not work quite so well. To see the problem, ponder this question: what time resolution would such a timestamp need in order to be as robust as an FK ... irrespective of the transaction load?

I did, however, consider NEXT becoming a boolean field: FALSE for "the last message" and TRUE for "the message just before the last", with a UNIQUE index on it and every other message having this field NULL. But the system also has an operation of "popping/purging the last message" from the pool, and in that case I don't see how to restore TRUE for the message immediately before the one that becomes the last after such an operation. After that "pop" operation I'm currently adjusting the "row before last" statistics, though possibly that could be avoided. So as of now (being afraid of breaking a working system) I still need to update statistics in the row that has just become "the one just before the last", so I need to know which row that is ... and thus I ruled out NEXT becoming a BOOL field.

... again in dead waters.

-R
Hi Rafal:

On Mon, Mar 7, 2016 at 9:29 AM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
.....
>> ... so it MAY be slower). And you introduce several problems: the
>> naming ones, a very strange foreign-key relationship between the kids,
>> and the possibility of having a row inserted into the parent.
> No, no. It was OK: the parent table was appropriately "ruled on insert"
> for the inherited tables to work as partitions.

But you have to rule a lot to keep people from inserting into the kids anyway; without seeing the whole lot I'm not going to comment more.

> and the table was called "messages" :)

But it did not contain messages, it contained message-person relations.

..... snipped, too complex without seeing the whole design.

>> http://xyproblem.info/ .
> :) this is a good one!!!
> Actually I'm very, very acquainted with this "XY problem", i.e. quite
> often when "end users" ask me for additional functionality, they (like
> in the XY case) suggest a "technical solution". And (just like you :) I
> always ask: please tell me what you do "traditionally", e.g. "when you
> currently do this on paper, how does it go, step by step" - then I'll
> find a good IT solution for you.

Well, now you have a URL to mail them.

> With complex, intertwined systems, where a well-defined "critical
> point" shows up, it's more efficient to extract the "show case" that
> causes the problem and focus on that, instead of digressing on the
> overall design (which may be flawed, but cannot be rewritten at this
> point).

Maybe, but for me your solutions are so complex I cannot follow them.

> BTW: using a timestamp instead of an FK to message_id does not work
> quite so well. To see the problem, ponder this question: what time
> resolution would such a timestamp need in order to be as robust as an
> FK ... irrespective of the transaction load?

That's irrelevant. "Timestamp" is a concept; as I told you, it's just a value which defines a full order. Normally the system has a timestamp source which ensures it. If you do not have one, you can use a cache=1 sequence. In a single-process problem like this I normally use a sub-second timestamp which I auto-increment if repeated, something like:

get_timestamp_for_id() {
    Locked(mutex) {
        now = time();
        if (last_returned_id >= now) {
            return ++last_returned_id;
        } else {
            return last_returned_id = now;
        }
    }
}

This has the nice property that it eventually drops back to a timestamp after a burst, so the IDs do double service as generation timestamps - but a single locked counter, a sequence, works as well.

... More snipping.

I cannot recommend more things. The only thing: for easier locating of a message in a person, cache the last message id in the person (which you can use as a lock for updating) and just use the next for linking the chain (because, as you said, a message can stop being the last, so unless that only happens when you destructively pop the last message in the chain for a user, you need a full linked list to recover the previous one). (If you determine the new last message by other means you do not need any of these things: just cache the last message in each person record; then, when you insert a new one, you update each sender/recipient with the last message id at the same time you insert the records, preferably sorting the ids first to avoid deadlocks if your concurrency is high. Although I suspect you'll need a linked-list-per-user if it has to be the previous one.)

Francisco Olarte.
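In PostgreSQL terms, the "single locked counter" alternative is just a sequence; a hedged sketch (the sequence and column names are made up, and the messages table is the one from Francisco's earlier sketch):

CREATE SEQUENCE msg_order_seq CACHE 1;

ALTER TABLE messages
    ADD COLUMN order_id bigint NOT NULL DEFAULT nextval('msg_order_seq');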
On 07.03.2016 20:11, Francisco Olarte wrote:
[---------------]
>
> In a single-process problem like this I normally use a sub-second
> timestamp which I auto-increment if repeated, something like:
>
> get_timestamp_for_id() {
>     Locked(mutex) {
>         now = time();
>         if (last_returned_id >= now) {
>             return ++last_returned_id;
>         } else {
>             return last_returned_id = now;
>         }
>     }
> }
>
> This has the nice property that it eventually drops back to a timestamp
> after a burst, so the IDs do double service as generation timestamps -
> but a single locked counter, a sequence, works as well.

OK. But in this case I really fail to see the usefulness of "timestamp semantics" for this field. A plain INT/BIGINT would do, particularly if the system has a chance to grow beyond one message per second sustained. If you used "timestamp" while actually meaning "something sequential/monotonic, like a timestamp is", that was misleading to me. I'd say: an inverse XY problem :)

... but INT/SERIAL semantics might actually be a workaround for me. Finding the "largest but smaller than" is somewhat more expensive than a plain hit with NEXT=CURRENT ... but maybe the penalty will not be so great after all. And if there are no hidden race conditions/locking problems, that might just do the trick.

>
> ... More snipping.
>
> I cannot recommend more things. The only thing: for easier locating of
> a message in a person, cache the last message id in the person (which
> you can use as a lock for updating) and just use the next for linking
> the chain (because, as you said, a message can stop being the last, so
> unless that only happens when you destructively pop the last message in
> the chain for a user, you need a full linked list to recover the
> previous one). (If you determine the new last message by other means
> you do not need any of these things: just cache the last message in
> each person record; then, when you insert a new one, you update each
> sender/recipient with the last message id at the same time you insert
> the records, preferably sorting the ids first to avoid deadlocks if
> your concurrency is high. Although I suspect you'll need a
> linked-list-per-user if it has to be the previous one.)

Yes, I do. The NEXT field has been there for some time now, and it worked exactly as expected. That's why I'm so desperate to keep it. In particular, it followed the "business model" the schema serves (in the language of the XY problem, for the pop operation), literally: "efficiently find the message that this person sent before this one".

But thanks for tackling the problem with me.

-R