Thread: Re: system catalog relation of a table and a serial sequence
[2001-12-14 22:53] Brent Verner said: | I propose adding a function to pg_dump.c for now. Patch adding a getSerialSequenceName() function to pg_dump.[ch] is attached. I'm aware that this is not the /best/ solution to this problem, but it is better than the current breakage in pg_dump. feedback appreciated. cheers. brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
Brent Verner <brent@rcfile.org> writes: > I'm aware that this is not the /best/ solution to this problem, but > it is better than the current breakage in pg_dump. I'd dispute that, primarily because the patch blithely assumes that there is no other kind of default value than a serial-created nextval(). It looks to me like it will either coredump or do the wrong thing with other default-value strings. regards, tom lane
[2001-12-15 21:02] Tom Lane said: | Brent Verner <brent@rcfile.org> writes: | > I'm aware that this is not the /best/ solution to this problem, but | > it is better than the current breakage in pg_dump. | | I'd dispute that, primarily because the patch blithely assumes that | there is no other kind of default value than a serial-created nextval(). | It looks to me like it will either coredump or do the wrong thing | with other default-value strings. monkey me! Yes, quite a nasty oversight! I'll clean this up. thanks. brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
[2001-12-15 21:12] Tom Lane said: | While you're at it, why not fix the code so that it can deal with | multiple SERIALs attached to a table? will do. I'd appreciate a bit of advice on both of the issues to be addressed. 1) Is a strcmp(firststrtok,"nextval('") == 0 sufficient to determine that the adsrc is indeed one that we're looking for? If not, suggestions are greatly appreciated :-) 2) Should this function now look like .. ? char** getSerialSequenceNames(const char* table) Or would you suggest it return a smarter struct? thanks. brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
While you're at it, why not fix the code so that it can deal with multiple SERIALs attached to a table? regards, tom lane
Brent Verner <brent@rcfile.org> writes: > 1) Is a strcmp(firststrtok,"nextval('") == 0 sufficient to determine > that the adsrc is indeed one that we're looking for? If not, > suggestions are greatly appreciated :-) I would not use strtok at all, but look for nextval('" at the start of the string and "'::text) at the end. If both match, and there's at least one character between, then the intervening text can be presumed to be a sequence name. You might further check that the apparent sequence name ends with _seq --- if not, it wasn't generated by SERIAL. > 2) Should this function now look like .. ? > char** getSerialSequenceNames(const char* table) > Or would you suggest it return a smarter struct? char** (null-terminated vector) would probably work. BTW, don't forget you have the OID of the table available from the table list, so you can avoid the subselect, as well as the relname quoting issues that you didn't take care of. When a tablename argument is provided, I'd be inclined to make a pre-pass over the table list to see if it matches any non-sequence table names, and if so build a list of their associated sequence name(s). Keep in mind that we'll probably generalize the tablename argument to support wildcarding someday soon, so it'd be good if the code could cope with more than one matching table. regards, tom lane
[2001-12-15 21:43] Tom Lane said: | Brent Verner <brent@rcfile.org> writes: | > 1) Is a strcmp(firststrtok,"nextval('") == 0 sufficient to determine | > that the adsrc is indeed one that we're looking for? If not, | > suggestions are greatly appreciated :-) | | I would not use strtok at all, but look for nextval('" at the start | of the string and "'::text) at the end. If both match, and there's | at least one character between, then the intervening text can be | presumed to be a sequence name. You might further check that the | apparent sequence name ends with _seq --- if not, it wasn't generated | by SERIAL. Why not use strtok? The following should be safe, no? t1 = strtok(adsrc,"\""); t2 = strtok(NULL,"\""); t3 = strtok(NULL,"\""); if( t0 && t2 && strcmp(t0,"nextval('") == 0 && strcmp(t2,"'::text)") == 0 ){ /* this is a call to nextval, check for t1 =~ /_seq$/ */ } | > 2) Should this function now look like .. ? | > char** getSerialSequenceNames(const char* table) | > Or would you suggest it return a smarter struct? | | char** (null-terminated vector) would probably work. | | BTW, don't forget you have the OID of the table available from the table | list, so you can avoid the subselect, as well as the relname quoting | issues that you didn't take care of. When a tablename argument is | provided, I'd be inclined to make a pre-pass over the table list to see | if it matches any non-sequence table names, and if so build a list of | their associated sequence name(s). Keep in mind that we'll probably | generalize the tablename argument to support wildcarding someday soon, | so it'd be good if the code could cope with more than one matching | table. gotcha. thanks. brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
[2001-12-15 22:26] Rod Taylor said: | | > Brent Verner <brent@rcfile.org> writes: | > > 1) Is a strcmp(firststrtok,"nextval('") == 0 sufficient to | determine | > > that the adsrc is indeed one that we're looking for? If not, | > > suggestions are greatly appreciated :-) | > | > I would not use strtok at all, but look for nextval('" at the start | > of the string and "'::text) at the end. If both match, and there's | > at least one character between, then the intervening text can be | > presumed to be a sequence name. You might further check that the | > apparent sequence name ends with _seq --- if not, it wasn't | generated | > by SERIAL. | | Wouldn't you want to include user sequences that are required for | using the table? If someone has used their own sequence as the | default value for a column it would be nice to have it dumped as well. This is my thought as well. Hopefully Tom will concur. cheers. brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
Brent Verner <brent@rcfile.org> writes: > Why not use strtok? Well, it's ugly (I don't like non-reentrant library routines), it's not really buying anything, and I don't think you've got the corner cases right anyway. I'd go for something like if (strlen(adsrc) > 19 && strncmp(adsrc, "nextval('\"", 10) == 0 && strcmp(adsrc + strlen(adsrc) - 9, "\"'::text)") == 0) regards, tom lane
> | > You might further check that the > | > apparent sequence name ends with _seq --- if not, it wasn't > | > generated by SERIAL. > | > | Wouldn't you want to include user sequences that are required for > | using the table? If someone has used their own sequence as the > | default value for a column it would be nice to have it dumped as well. > This is my thought as well. Hopefully Tom will concur. Well, that's why I said "might". I'm not sure what the correct behavior is here. If we had an actual SERIAL datatype --- that is, we could unambiguously tell that a given column was SERIAL --- then a case could be made that "pg_dump -t table" should dump only those sequences associated with table's SERIAL columns. I think it'd be a bit surprising if "pg_dump -t table" would dump sequences declared independently of the table. An example where you'd likely not be happy with that is if the same sequence is being used to feed multiple tables. I agree that dumping all such sequences will often be the desired behavior, but that doesn't leave me convinced that it's the right thing to do. Any comments out there? regards, tom lane
[2001-12-15 23:17] Tom Lane said: | > | > You might further check that the | > | > apparent sequence name ends with _seq --- if not, it wasn't | > | > generated by SERIAL. | > | | > | Wouldn't you want to include user sequences that are required for | > | using the table? If someone has used their own sequence as the | > | default value for a column it would be nice to have it dumped as well. | | > This is my thought as well. Hopefully Tom will concur. | | Well, that's why I said "might". I'm not sure what the correct behavior | is here. If we had an actual SERIAL datatype --- that is, we could | unambiguously tell that a given column was SERIAL --- then a case could | be made that "pg_dump -t table" should dump only those sequences | associated with table's SERIAL columns. | | I think it'd be a bit surprising if "pg_dump -t table" would dump | sequences declared independently of the table. An example where you'd | likely not be happy with that is if the same sequence is being used to | feed multiple tables. | | I agree that dumping all such sequences will often be the desired | behavior, but that doesn't leave me convinced that it's the right | thing to do. | | Any comments out there? sure :-) What we can do is determine /any/ sequence referenced by a nextval(..) attribute default with the following SELECT query. create sequence non_serial_sequence; create table aaa ( id serial, nonid int default nextval('non_serial_sequence') ); SELECT adsrc FROM pg_attrdef WHERE adrelid=( SELECT oid FROM pg_class WHERE relname='aaa' ); adsrc -------------------------------------- nextval('"aaa_id_seq"'::text) nextval('non_serial_sequence'::text) We get the nextval(..) calls to both of the referenced sequences, and the strtok code I'm using extracts the proper sequence names. Am I overlooking something here? Is there any other way a nextval(..) adsrc would appear not containing a sequence related to this relation? cheers. brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
[2001-12-15 23:25] Tom Lane said: | Brent Verner <brent@rcfile.org> writes: | > Why not use strtok? | | Well, it's ugly (I don't like non-reentrant library routines), it's | not really buying anything, and I don't think you've got the corner | cases right anyway. I'd go for something like How about strtok_r? I /really/ like the fact that strtok will eat either of the tokens ['"] that might be around the sequence name... just call me lazy :-) thanks. brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
Brent Verner <brent@rcfile.org> writes: > How about strtok_r? I /really/ like the fact that strtok will > eat either of the tokens ['"] that might be around the sequence > name... just call me lazy :-) That behavior creates one of the "corner cases" I was alluding to. Shall I leave the difficulty as an exercise for the student? regards, tom lane
[2001-12-16 00:42] Tom Lane said: | Brent Verner <brent@rcfile.org> writes: | > How about strtok_r? I /really/ like the fact that strtok will | > eat either of the tokens ['"] that might be around the sequence | > name... just call me lazy :-) | | That behavior creates one of the "corner cases" I was alluding to. | Shall I leave the difficulty as an exercise for the student? sure. I'm assuming the strtok_r is not acceptable :-) I'll work on this a bit more tonight. Expect a better patch sometiime tomorrow. thanks. brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
At 01:20 16/12/01 -0500, Brent Verner wrote: > > Expect a better patch sometiime tomorrow. > The hard part in this might be getting pg_restore to know that the sequence is part of the table definition; perhaps it is adequate to use the 'deps' field. I think it is currently unused for SEQUENCE (and SEQUENCE SET) entries, so we could assume that if it is set, the sequence is logically part of a table. You could set the deps to the table OID, then the restore operation (_tocEntryRequired) could scan the TOC for a matching table, and see if the matching table is being restored (ie. _tocEntryRequired would, in the case of 'SEQUENCE' and 'SEQUENCE SET' entries, scan the entire TOC for the matching table then call itself recursively on the table entry. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
[2001-12-16 00:42] Tom Lane said: | Brent Verner <brent@rcfile.org> writes: | > How about strtok_r? I /really/ like the fact that strtok will | > eat either of the tokens ['"] that might be around the sequence | > name... just call me lazy :-) | | That behavior creates one of the "corner cases" I was alluding to. | Shall I leave the difficulty as an exercise for the student? Ok... I ended up working longer than I'd thought :-) * no strtok were used in this patch. ;-) * Handles both serial-sequences and user-sequences referenced in nextval(...) default column defs. * Loop over tables so we can check wildcard table name in the future per your suggestion. I've only noted a TODO: regarding the wildcard matching. * Instead of using a NULL terminated char** array to hold the collected sequence names, I put in a simple strarray ADT -- mostly so I could have the strarrayContains() test to call from the conditional around dumpSequence(). If this is just dumb, I'll replace it with a simple char** implementation. Did I overlook some utility funcs in the PG source that already does this? If so, I'll gladly use those. * Patch is really attached :-P comments? tired. b -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
Attachment
[2001-12-16 06:30] Brent Verner said: | [2001-12-16 00:42] Tom Lane said: | | Brent Verner <brent@rcfile.org> writes: | | > How about strtok_r? I /really/ like the fact that strtok will | | > eat either of the tokens ['"] that might be around the sequence | | > name... just call me lazy :-) | | | | That behavior creates one of the "corner cases" I was alluding to. | | Shall I leave the difficulty as an exercise for the student? | | Ok... I ended up working longer than I'd thought :-) | | * no strtok were used in this patch. ;-) | * Handles both serial-sequences and user-sequences referenced in | nextval(...) default column defs. | * Loop over tables so we can check wildcard table name in the future | per your suggestion. I've only noted a TODO: regarding the wildcard | matching. | * Instead of using a NULL terminated char** array to hold the collected | sequence names, I put in a simple strarray ADT -- mostly so I could | have the strarrayContains() test to call from the conditional around | dumpSequence(). If this is just dumb, I'll replace it with a simple | char** implementation. Did I overlook some utility funcs in the | PG source that already does this? If so, I'll gladly use those. | * Patch is really attached :-P This patch needs a fix already... I just realized (while playing with this code in a different context) that I forgot to change the malloc line in strarrayInit() after typedef'ing strarray as pointer to struct, instead of just the struct. - strarray _ary = (strarray)malloc(sizeof(strarray)); + strarray _ary = (strarray)malloc(sizeof(struct strarray)); cheers. brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
On Sun, 2001-12-16 at 17:17, Tom Lane wrote: > > | > You might further check that the > > | > apparent sequence name ends with _seq --- if not, it wasn't > > | > generated by SERIAL. > > | > > | Wouldn't you want to include user sequences that are required for > > | using the table? If someone has used their own sequence as the > > | default value for a column it would be nice to have it dumped as well. > > > This is my thought as well. Hopefully Tom will concur. > > Well, that's why I said "might". I'm not sure what the correct behavior > is here. If we had an actual SERIAL datatype --- that is, we could > unambiguously tell that a given column was SERIAL --- then a case could > be made that "pg_dump -t table" should dump only those sequences > associated with table's SERIAL columns. > > I think it'd be a bit surprising if "pg_dump -t table" would dump > sequences declared independently of the table. An example where you'd > likely not be happy with that is if the same sequence is being used to > feed multiple tables. > > I agree that dumping all such sequences will often be the desired > behavior, but that doesn't leave me convinced that it's the right > thing to do. > > Any comments out there? Along with "DROP COLUMN" this is probably one of the biggest "I can't believe it doesn't" things out there. I would tend to say that Brent's patch, in dumping all of the sequences used by a table, is erring on the _correct_ side of caution. Remember that someone who this is a problem for can easily post-process the sequence out of the dump with sed or something, but someone for whom the opposite is true doesn't have anything like as trivial a job to put it back in there. Cheers, Andrew. -- -------------------------------------------------------------------- Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)916-7201 MOB: +64(21)635-694 OFFICE: +64(4)499-2267
Brent, do you have a new, final patch that you want to submit for this? --------------------------------------------------------------------------- Brent Verner wrote: > [2001-12-16 06:30] Brent Verner said: > | [2001-12-16 00:42] Tom Lane said: > | | Brent Verner <brent@rcfile.org> writes: > | | > How about strtok_r? I /really/ like the fact that strtok will > | | > eat either of the tokens ['"] that might be around the sequence > | | > name... just call me lazy :-) > | | > | | That behavior creates one of the "corner cases" I was alluding to. > | | Shall I leave the difficulty as an exercise for the student? > | > | Ok... I ended up working longer than I'd thought :-) > | > | * no strtok were used in this patch. ;-) > | * Handles both serial-sequences and user-sequences referenced in > | nextval(...) default column defs. > | * Loop over tables so we can check wildcard table name in the future > | per your suggestion. I've only noted a TODO: regarding the wildcard > | matching. > | * Instead of using a NULL terminated char** array to hold the collected > | sequence names, I put in a simple strarray ADT -- mostly so I could > | have the strarrayContains() test to call from the conditional around > | dumpSequence(). If this is just dumb, I'll replace it with a simple > | char** implementation. Did I overlook some utility funcs in the > | PG source that already does this? If so, I'll gladly use those. > | * Patch is really attached :-P > > This patch needs a fix already... I just realized (while playing with > this code in a different context) that I forgot to change the malloc > line in strarrayInit() after typedef'ing strarray as pointer to struct, > instead of just the struct. > > - strarray _ary = (strarray)malloc(sizeof(strarray)); > + strarray _ary = (strarray)malloc(sizeof(struct strarray)); > > cheers. > brent > > -- > "Develop your talent, man, and leave the world something. Records are > really gifts from people. To think that an artist would love you enough > to share his music with anyone is a beautiful thing." -- Duane Allman > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026