Thread: How to represent a bi-directional list in db?
I am creating an application for a manufacturing scenario. To represent stages in an assembly line, I wanted to create following table, CREATE TABLE stages ( id SERIAL PRIMARY KEY, name VARCHAR(80) NOT NULL, created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, prev_stage_id SERIAL REFERENCES stages NULL, next_stage_id SERIAL REFERENCES stages NULL, process_id SERIAL REFERENCES processes NOT NULL ); But it: Failed with: conflicting NULL/NOT NULL declarations for column "prev_stage_id" of table "stages" Is it not possible to create "nullable" self referencing foreign keys? -- Pankaj Jangid
Pankaj: On Sun, Sep 22, 2019 at 4:25 PM Pankaj Jangid <pankaj.jangid@gmail.com> wrote: > CREATE TABLE stages ( > id SERIAL PRIMARY KEY, > name VARCHAR(80) NOT NULL, > created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, > updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, > prev_stage_id SERIAL REFERENCES stages NULL, > next_stage_id SERIAL REFERENCES stages NULL, > process_id SERIAL REFERENCES processes NOT NULL > ); > Failed with: conflicting NULL/NOT NULL declarations for column > "prev_stage_id" of table "stages" > Is it not possible to create "nullable" self referencing foreign keys? Serial seems wrong. It means integer, not null, defaul next value from a sequence. What you probably want is just "prev_stage_id INTEGER" ( NULL by default ), as you do not want the prev/next stage ids to be generated, you normally would want to assign values from other tuples. Also, you may have problems populating this kind of table, as you will not have the ids from either prev or next stage when building it. And lastly, in SQL you do not really need a doubly linked list, just populate prev_stage_id, and index it and you can query next stage of a tuple using it. Francisco Olarte.
Francisco Olarte <folarte@peoplecall.com> writes: > On Sun, Sep 22, 2019 at 4:25 PM Pankaj Jangid <pankaj.jangid@gmail.com> wrote: >> CREATE TABLE stages ( >> id SERIAL PRIMARY KEY, >> name VARCHAR(80) NOT NULL, >> created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, >> updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, >> prev_stage_id SERIAL REFERENCES stages NULL, >> next_stage_id SERIAL REFERENCES stages NULL, >> process_id SERIAL REFERENCES processes NOT NULL >> ); >> Failed with: conflicting NULL/NOT NULL declarations for column >> "prev_stage_id" of table "stages" >> Is it not possible to create "nullable" self referencing foreign keys? > > Serial seems wrong. It means integer, not null, defaul next value from > a sequence. > > What you probably want is just "prev_stage_id INTEGER" ( NULL by > default ), as you do not want the prev/next stage ids to be generated, > you normally would want to assign values from other tuples. > Thanks. This resolved my problem of NULL/NOT NULL conflict. I wasn't aware that SERIAL is by default NOT NULL. > Also, you may have problems populating this kind of table, as you will > not have the ids from either prev or next stage when building it. > If NULL value is allowed I can fill it up with NULL initially. Right? Or is there something wrong here. > And lastly, in SQL you do not really need a doubly linked list, just > populate prev_stage_id, and index it and you can query next stage of a > tuple using it. > Could you please elaborate? Suppose I have this table, CREATE TABLE stages ( id SERIAL PRIMARY KEY, name VARCHAR(80) NOT NULL, next_id INTEGER REFERENCE stages NULL, ); What would be the backward query in that case? Forward is clear. This is forward query, SELECT name FROM stages WHERE next_id = 123; -- Pankaj Jangid
Pankaj: On Mon, Sep 23, 2019 at 4:07 AM Pankaj Jangid <pankaj.jangid@gmail.com> wrote: > Thanks. This resolved my problem of NULL/NOT NULL conflict. I wasn't > aware that SERIAL is by default NOT NULL. Not only that. Once you strip the annoying NOT NULL the only thing remaining on a serial column is a "default nextval", which you normally do not want ( you could think of populating the table in creative ways, but they are on a different sequence than the one you use for the ID column ). > > Also, you may have problems populating this kind of table, as you will > > not have the ids from either prev or next stage when building it. > If NULL value is allowed I can fill it up with NULL initially. Right? Or > is there something wrong here. There is not, you can use (id,prev,next) = (1,null,null) and then update, but you are going to need to travel up and down a lot, or store a lot of data. If you use the trick I comment later of just using "prev", you can do, on a table having (id=serial, prev=int), build a sequence by doing "prev_id=null"; insert (id,prev,other_data) returning id; copy return value to prev_id, rinse and repeat. Also note that you can query the sequence AND advance it and then insert all rows without default values. > > And lastly, in SQL you do not really need a doubly linked list, just > > populate prev_stage_id, and index it and you can query next stage of a > > tuple using it. > Could you please elaborate? Suppose I have this table, > CREATE TABLE stages ( > id SERIAL PRIMARY KEY, > name VARCHAR(80) NOT NULL, > next_id INTEGER REFERENCE stages NULL, > ); > What would be the backward query in that case? Forward is clear. This is > forward query, > SELECT name FROM stages WHERE next_id = 123; No. That is a BACKWARDS QUERY. You are in row 123, you go BACK to its preceedeing one. If you need a traversable list containing (ListID, id,name) = x,1,A; x,2,b; x,3;c ( I've added the ListId column to make it more interesting/reallistic, you normally do not have a single table) In sql you can build a (ListId, id, prev_id, name ) table ( PREV is easier, as when you insert a row, in a normal application, you know the previous one, but not the next one ) with the data (x,1,null,a),(x,2,1,b),(x,3,2,c) ( the last one is a synthetic sentinel and assumes nullable id), you can do it in a lot of ways. To traverse it forward you just querying "select id where listid=x and next_id is null" to locate the head (1), and then just go forward by selecting with prev_id = last got id until you hit zero results. To traverse backwards there are several ways. In the real cases I've used I always had a "list row" where I could store the node for the 1st stage. In that cases i linked them circularly, (id=1, prev=3), so bidirectional traversing was easy. Or you can use a special sentinel node ( with a marker, like name=null). The thing is you locate the last row, and then just query with id=last got prev_id. I do not remember the details, but probably your "stages" are stages of something which has a row, which can readily have a "first_stage_id" or something similar. Lists in tables are not the same as in C, where you directly store pointers which point outwards. In this case any unique data serves as a pointer, slow ( table scan ) by default, faster if you index the column. Anyway, unless you need the "linked list" functionality for something ( really heavy manipulation of large stage lists, splicing things around ), I've normally found it's easier, in sql, to model this kind of thing with a master-detail + order column. ( whatever = (id, ...., first_stage_id), stages=(id, order, .... ) ) Francisco Olarte.
Francisco Olarte <folarte@peoplecall.com> writes: >> Could you please elaborate? Suppose I have this table, >> CREATE TABLE stages ( >> id SERIAL PRIMARY KEY, >> name VARCHAR(80) NOT NULL, >> next_id INTEGER REFERENCE stages NULL, >> ); >> What would be the backward query in that case? Forward is clear. This is >> forward query, >> SELECT name FROM stages WHERE next_id = 123; > > No. That is a BACKWARDS QUERY. You are in row 123, you go BACK to its > preceedeing one. > If you need a traversable list containing (ListID, id,name) = x,1,A; > x,2,b; x,3;c ( I've added the ListId column to make it more > interesting/reallistic, you normally do not have a single table) > In sql you can build a (ListId, id, prev_id, name ) table ( PREV is > easier, as when you insert a row, in a normal application, you know > the previous one, but not the next one ) with the data > (x,1,null,a),(x,2,1,b),(x,3,2,c) ( the last one is a synthetic > sentinel and assumes nullable id), you can do it in a lot of ways. > > To traverse it forward you just querying "select id where listid=x and > next_id is null" to locate the head (1), and then just go forward by > selecting with prev_id = last got id until you hit zero results. > > To traverse backwards there are several ways. In the real cases I've > used I always had a "list row" where I could store the node for the > 1st stage. In that cases i linked them circularly, (id=1, prev=3), so > bidirectional traversing was easy. Or you can use a special sentinel > node ( with a marker, like name=null). The thing is you locate the > last row, and then just query with id=last got prev_id. I do not > remember the details, but probably your "stages" are stages of > something which has a row, which can readily have a "first_stage_id" > or something similar. > > Lists in tables are not the same as in C, where you directly store > pointers which point outwards. In this case any unique data serves as > a pointer, slow ( table scan ) by default, faster if you index the > column. > > Anyway, unless you need the "linked list" functionality for something > ( really heavy manipulation of large stage lists, splicing things > around ), I've normally found it's easier, in sql, to model this kind > of thing with a master-detail + order column. > ( whatever = (id, ...., first_stage_id), stages=(id, order, .... ) ) > Thanks a lot Francisco. This is great help. My stages are stages of processes. So yes processes are also stored in a table. I got the idea. I'll add another column in the processes table which points to the first stage (first_stage_id). And quries Forward pass: 1. select name from stages where id = <first_stage_id from process table> 2. select name from stages where prev_id = <last fetched id> 3. repeat (2) Backward pass: 1. select name from stages where prev_id = <first_stage_id from process table> 2. select name from stages where id = <last fetched prev_id> 3. repeat (2) This is assuming I also create a circular list. I can also store last_stage_id in the process table if we don't want to create circular list in db. Regards. -- Pankaj Jangid
Pankaj: On Mon, Sep 23, 2019 at 4:07 PM Pankaj Jangid <pankaj.jangid@gmail.com> wrote: ... > My stages are stages of processes. So yes processes are also stored in a > table. I got the idea. I'll add another column in the processes table > which points to the first stage (first_stage_id). And quries > Forward pass: ... > Backward pass: .. > This is assuming I also create a circular list. I can also store > last_stage_id in the process table if we don't want to create circular > list in db. That's exactly the idea. A pointer in C becomes a pair of fields with corresponding value, and it can be traversed in any direction, slowly by table scan or fastly with indexes. In fact a pair of values becomes equivalent to two pointers, as it can be traversed either way ( but the indexing acceleration has to be applied to each direction ). That being said, linked lists are procedural data structures, SQL is declarative, so they are not a great match, that's one of the reasons why they are rarely seen. Things like master-detail have less impedance mismatch. Francisco Olarte.
Francisco Olarte <folarte@peoplecall.com> writes: > That being said, linked lists are procedural data structures, SQL is > declarative, so they are not a great match, that's one of the reasons > why they are rarely seen. Things like master-detail have less > impedance mismatch. Thanks Francisco. Got the idea. -- Pankaj Jangid