Re: multiple UNIQUE indices for FK - Mailing list pgsql-general

From Rafal Pietrak
Subject Re: multiple UNIQUE indices for FK
Date
Msg-id 56DD3BE6.7050909@ztk-rp.eu
Whole thread Raw
In response to Re: multiple UNIQUE indices for FK  (Francisco Olarte <folarte@peoplecall.com>)
Responses Re: multiple UNIQUE indices for FK  (Francisco Olarte <folarte@peoplecall.com>)
List pgsql-general

W dniu 05.03.2016 o 19:53, Francisco Olarte pisze:
> Hi Rafal:
>
> On Fri, Mar 4, 2016 at 11:46 PM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
>> W dniu 04.03.2016 o 18:59, Francisco Olarte pisze:
>>> Make sender_person_id NOT NULL in messages if you want to insure every
>>> message ahs exactly ONE SENDER, leave it out if you want to allow
>>> senderless messages. An FK column must either link to a record or be
>>> null.
>>>
>>> Then, if you want to have a msgs-person ''table'' I would use a view:
>>>
>>> CREATE VIEW msgs_persons as
>>> SELECT message_id, sender_person_id as person_id, 'SENDER' as role from messages
>>> UNION ALL
>>> SELECT message_id, recipient_person_id as person_id, 'RECIPIENT' as
>>> role from recipients
>> This was my initial schema .. with the addition of one "super table",
>> that the two above (sender_person_id and recipient_person_id) both
>> inharited from (to avoid the UNION ALL when selecting everything).
>
> Wuf. I do not like it. I would add a column named sender_person_id to
> messages ( to distinguish its role ) and put a recipient_person_id, or
> just person_id, in recipients ( the role is clear in that table ) to
> avoid problems. Otherwise, what do you call the parent table and the
> fields? It's a naming issue, nut I've found the hard way naming is
> important in this things. Bear in mind you do only avoid TYPING the
> union all when selecting everything ( as inheritance DOES do a union
> all, it would have to do it with both kids AND the parent, so it MAY
> be slower ). And you introduce several problems, the naming ones, a
> very strange foreign-key relationship between kids, the possibility of
> having a row inserted in the parent.

No, no. It was OK. the parent table was appropriately "ruled on insert"
for inherited tables to work as partitions.

and the table was called "messages" :)

[--------------]
>> broke the NEXT functionality and I cannot imagine any way to reintroduce
>> it into the new table layouts.
>
> Which is exactly the functionality of the NEXT column ? I mean, I see
> you have messages with ONE sender and MANY? (Can they be zero? )
> recipients. What are you trying to achieve with it? How are you
> planning to maintain it in your dessign?

NULL NEXT indicates the last message inserted, and as I explain below,
this is necessary to access/update the row that looses it's "most
recently inserted" status just after that happens.

The access/update of the row that looses it's "last" status after new
insert is sufficiently expensive, that it pays to do so after an insert.
Otherwise I'd have to perform it for pretty much every row that is
selected, every time it is selected ... which will be an overkill.

and pls note, that when performance of locating one NULL field in
millions of records becomes a problem, I'm prepared to use a reserved
value (a reserved record, like: "a null message") for NEXT, instead of
current NULL.

>
>> Now I'm quite stuck here.
>
> I ask these questions because I think we are in a case of
> http://xyproblem.info/ .

:) this is a good one!!!

Actually I'm very, very acquainted with this "XY problem". i.e quite
often when "end-users" ask me for additional functionality, they (like
in the XY case) suggest a "technical solution". And (just like you :) I
always ask: pls tell me what you do "traditionally", e.g "when you are
currently doing this on paper: how does it go - step by step", then I'll
find a good IT solution for you.

But there is a flip side of this coin.

In case of a complex inter-mangled systems, where a well defined
"critical point" shows up, it's more efficient to extract the "show
case" that causes the problem and focus on this, instead of digressing
on overall design. (which may be flowed, but cannot be rewritten at this
point).

>
>> BTW: I'm considering your sugestion of replaceing NEXT with the
>> timestamp. The primary reason for the NEXT is to be able to fetch a row
>> "just preceeding" currently inserted new one AFTER the insert is done
>> (in trigger after), so that some elaborated "statistics" get updated in
>> that "one before" message record. May be timestap would do instead...
>
> If you are planning on updating the previous row for a message ( or a
> person ? ) on a trigger, this smells fishy. You may have a reason, not

Yes it does.

> knowing what you are exactly planning to do, I cannot tell, but it
> sounds really weird.

I can understand that.

But all I can tell you without full documentation is that it's
statistics gathering, which results are subsequently used for almost
every query made; and that it's a major performance hog of the system,
so it have to be done just once, at the moment is becomes well defined,
which is just after insert of a "next" row.

And It actually does not matter (to me) how the "single row" is located
within the system (i.e using NEXT or not), but it have to be located
robustly - there may not be a situation, where such last record is
missed or more then one get updated. I like FK as NEXT because I get
that guarantee from the postgresql itself.

BTW: using timestamp instead of FK to message_id does not work quite so
well. To see the problem, ponder a question: what time resolution should
such timestamp have, to be as robust as FK ... irrespectively of the
transaction load?

I did however considered NEXT becoming a binary field: FALSE for "the
last message, and TRUE for the message "just before the last" ... with
UNIQUE index on it end every other message having this field NULL; but
the system does have an operation of "popping/purging the last message"
from the pool and in such case I don't see how to restore TRUE for the
message immediately before the one that became the last one after such
operation.

After that "pop" operation I'm currently adjusting "row before last"
statistics, but possibly this could be avoided. So as of now (being
afraid of breaking a working system) I will need to update statistics
within a row that just have become "the one just before the last", so I
will need to know which row becomes that .... and thus I ruled out NEXT
becoming a BOOL field.

... again in dead waters.

-R


pgsql-general by date:

Previous
From:
Date:
Subject: Re: "missing chunk number XX for toast value YY in pg_toast ..." after pg_basebackup.
Next
From:
Date:
Subject: Re: Does a call to a language handler provide a context/session, and somewhere to keep session data?