Re: coalesce in plpgsql, and other style questions - Mailing list pgsql-novice

From Jeff Davis
Subject Re: coalesce in plpgsql, and other style questions
Date
Msg-id 1339543839.12295.102.camel@sussancws0025
Whole thread Raw
In response to coalesce in plpgsql, and other style questions  (Ross Boylan <ross@biostat.ucsf.edu>)
Responses Re: coalesce in plpgsql, and other style questions
List pgsql-novice
On Tue, 2012-06-12 at 10:46 -0700, Ross Boylan wrote:
> I just wrote my first pl/pgsql function, and would appreciate any
> comments people have on it.  I'll be writing a bunch of similar
> functions, with semantics "give me the id of the object if exists,
> otherwise create it and give me the id."
>
> My solution seems excessively procedural to me.

I think what you're trying to do is procedural, and that's not
necessarily bad. You are trying to get an existing ID if available, or
assign a new ID if not; and I assume you are doing so to save the space
associated with storing the full names (essentially like dictionary
encoding).

>   I thought I could get
> the right semantics with something like
> select coalesce((select id from mytable where name='foo'),
>     (insert into mytable ('name') values('foo') returning id))
> but I could not get that to work in plgsql.

In pl/pgsql, you can't just use SELECT, you have to use something like
SELECT INTO the_id ... or use PERFORM.

Also, you have a potential race condition there, because someone might
insert "foo" into mytable after the select (first arg to coalesce) and
before the insert (second arg to coalesce).

> Also, I wonder if calling a column 'name' is asking for trouble.

I recommend against it.

> The actual function is a little more complicated because the table has
> the possibility of canonical entries with other entries pointing to
> them,  I use 'did' for database id to avoid confusion with various other
> message ids the system uses; it's for tracking emails.
>
> Here's the code.  As I said, I'd love comments on any aspect of it.
>
> /* gom = get or make functions retrieve did if available, otherwise create record
>    and return its did. */
>
> create or replace function gom_hostids(IN hostname text, IN canonicalname text = NULL,
>        OUT hostid bigint, OUT canonicalid bigint) language plpgsql as $$

Sometimes it's good to declare your variables with names like
in_hostname or out_canonicalid, to prevent confusion reading
similarly-named variables coming from the tables.

> DECLARE
> BEGIN
>         select did, canonical into hostid, canonicalid from host
>         where name = hostname;
>         if FOUND then
>            return;
>            end if;
>         if canonicalname is not NULL then
>            select did  into canonicalid from host where name = canonicalname;
>            if not FOUND then
>               insert into host (name) values(canonicalname) returning did into canonicalid;
>               end if;
>            end if;
>         if hostname != canonical then

Is canonical a proper variable here? It's not in the argument list, and
it's not DECLAREd. Did you mean canonicalname?

>                 insert into host (name, canonical) values(hostname, canonicalid)
>                        returning did into hostid;
>         else
>                 hostid := canonicalid;

I would recommend keeping more explicit track of NULL values and what
should happen to them. It looks like a NULL canonicalname would fall
into this else branch (I could be mistaken), but it's a little hard to
follow exactly what you want to happen. A few comments would help.

>         return;
> END
> $$;

Again, you have a potential race between checking for the canonicalname
in host, and inserting it into host when it doesn't exist. If another
transaction runs in between those two, you will get duplicates in the
host table.

> Table definition:
> create table host (
>        did bigint primary key default (nextval('rb_id_seq')), ---database id, like oid
>        name  text, --IP address legal, usually enclosed in [].  More commonly an internet domain.
>                    -- domain is a sql keyword and so I used host.
>        canonical bigint references host (did) --if not null, preferred name for host
>                  --may refer to self
> ) inherits (RBObject);
>
> I'm using Posgresql 8.4.

I recommend a unique constraint on host.name. That will catch the race
conditions I mentioned above, and turn them into errors. You can catch
those errors in plpgsql, allowing you to do what you want in that case.

Regards,
    Jeff Davis


pgsql-novice by date:

Previous
From: Tom Lane
Date:
Subject: Re: inserting a column into a view
Next
From: Ross Boylan
Date:
Subject: Re: coalesce in plpgsql, and other style questions