Thread: Re: psql and bytea
Sorry for my poor English. First, postgresql is a beautiful DBMS ! I try to use bytea type with psql to store some files (i know, there are \lo_*). There are some informations in the man page of psql : ------------------------------------ Another possible use of this mechanism is to copy the contents of a file into a field. First load the file into a variable and then proceed as above. testdb=> \set content ’\’’ ‘cat my_file.txt‘ ’\’’ testdb=> INSERT INTO my_table VALUES (:content); One possible problem with this approach is that my_file.txt might contain single quotes. These need to be escaped so that they don’t cause a syntax error when the third line is processed. This could be done with the program sed: testdb=> \set content ’\’’ ‘sed -e "s/’/\\\\\\’/g" < my_file.txt‘ ’\’’ But this does not work with binary string (bytea) since i want to store any kind of files. ------------------------------------ I check the documentation (really fantastic documentation ! Great job): http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=datatype-binary.html I build two very simple tools to convert a binary stream to a bytea stream. ------------------------------------ tobytea_sql.c : /* * convert a stream to bytea. * The '\' is escaped, so it can be used in a SQL statement. */ #include <stdio.h> int main(void) { int c ; while ((c = getchar()) != EOF) { if ( c == '\'' || c == '\\' || c < 32 || c > 126 ) { printf("\\\\%d%d%d", (c % (8*8*8)) / (8*8), (c % (8*8)) / 8, c % 8) ; } else { putchar(c) ; } } return 0 ; } ------------------------------------ /* * convert a bytea stream * This is not compatible with the output of tobytea_sql ! * tobytea_sql put two \ . frombytea espect only one ! */ #include <stdio.h> int main(void) { int c ; int car ; while ((c = getchar()) != EOF) { if ( c < 32 || c > 126) { fprintf(stderr, "Char ignored\n") ; break ; } if ( c == '\\' ) { if ((c = getchar()) == EOF) return 0 ; car = (c - '0') * 8 * 8 ; if ((c = getchar()) == EOF) return 0 ; car += (c - '0') * 8 ; if ((c = getchar()) == EOF) return 0 ; car += c - '0' ; putchar(car) ; } else { putchar(c) ; } } return 0 ; } ------------------------------------ How to use it ? insert into a table : $ psql [...] one=> create temp table tmp (data bytea) ; CREATE TABLE one=> -- push some data in a psql variable one=> \set content '\''`tobytea_sql < data`'\'' one=> \echo :content 'o\\030\\251\\273C7\\266K\\331 ... [snip] ... \323W\\027\\256D' one=> -- insert one=> insert into tmp (data) values (:content) ; INSERT 35042356 1 one=> -- vrite the content of temp.data to a file one=> \a Output format is unaligned. one=> \t Showing only tuples. one=> \o | frombytea > data one=> select data from tmp ; one=> Char ignored -- psql send a CR at the end of the row (frombytea drop it since CR = 10 and it is not a "normal" char for a bytea stream). Is there a bug in psql ? Check this : WARNING : this example take a lot of memory ! Create a big file with some random data : $ dd if=/dev/urandom bs=1M count=20 of=data $ #the bytea stream is about 74 Mo !) $ tobytea < data | wc 0 81574 74379444 $ psql [...] one=> \set content '\''`tobytea_sql < data`'\'' one=> insert into tmp (data) values(:content) ; one'> -- Problem ! one'> ' -- try to continue one(> ) ; INSERT 35042373 1 one=> select length(data) from tmp ; length -- 18924736 (1 row) one=> -- this is not the original size (20M : 20971520). I don't have any problems with smaller files (10Mo is always fine). All of this is used with postgresql 7.3.2 shipped with Red Hat Linux 9 . -- Féliciano Matias <feliciano.matias@free.fr>
Attachment
=?ISO-8859-1?Q?F=E9liciano?= Matias <feliciano.matias@free.fr> writes: > one=3D> -- this is not the original size (20M : 20971520). > I don't have any problems with smaller files (10Mo is always fine). > All of this is used with postgresql 7.3.2 shipped with Red Hat Linux 9 . Hmm. There used to be some off-by-one type bugs in psql's variable-substitution code, but those were fixed long before 7.3.2. In any case, it's hard to see why such a problem would only arise when you got past 10Mb string lengths. I couldn't duplicate the problem here, so I'm going to suggest that maybe you have a hardware problem? Perhaps there's a flaky RAM chip in an area of memory that doesn't get used until you push up the size of psql quite a bit. It'd be worth running memtest86 for awhile to check. regards, tom lane
Le jeu 15/05/2003 à 16:47, Tom Lane a écrit : > =?ISO-8859-1?Q?F=E9liciano?= Matias <feliciano.matias@free.fr> writes: > > one=3D> -- this is not the original size (20M : 20971520). > > I don't have any problems with smaller files (10Mo is always fine). > > All of this is used with postgresql 7.3.2 shipped with Red Hat Linux 9 . > > Hmm. There used to be some off-by-one type bugs in psql's > variable-substitution code, but those were fixed long before 7.3.2. > In any case, it's hard to see why such a problem would only arise > when you got past 10Mb string lengths. > More then 10Mb string lengths : > > $ tobytea < data | wc > > 0 81574 74379444 (71 Mo) > I couldn't duplicate the problem here, It seems it's a bug in glibc-2.3.2-27.9.i686.rpm coming with RH9. > so I'm going to suggest that > maybe you have a hardware problem? Perhaps there's a flaky RAM chip in > an area of memory that doesn't get used until you push up the size > of psql quite a bit. It'd be worth running memtest86 for awhile to > check. The hardware is ok. I found the problem. It's bug in sprintf() =============================================== #include <stdio.h> #include <stdlib.h> #include <string.h> #define SIZE (1024*70000) int main(void) { char * s = malloc(SIZE) ; char * d = malloc(SIZE) ; memset(s, 'a', SIZE-1) ; d[SIZE-1] = '\0' ; sprintf(d,"%s",s) ; printf("%zi\n", strlen(d)) ; return 0 ; } =============================================== $ ./a.out 67108863 (2^26-1) My libc have a problem. I will fill a bug report to http://bugzilla.redhat.com/ . I apply this patch to the postgresql source : =============================================== diff -urN postgresql-7.3.2.orig/src/bin/psql/mainloop.c postgresql-7.3.2/src/bin/psql/mainloop.c --- postgresql-7.3.2.orig/src/bin/psql/mainloop.c 2002-10-13 01:09:34.000000000 +0200 +++ postgresql-7.3.2/src/bin/psql/mainloop.c 2003-05-16 04:46:03.000000000 +0200 @@ -389,8 +389,9 @@ exit(EXIT_FAILURE); } - sprintf(new, "%.*s%s%s", i, line, value, - &line[i + thislen + in_length]); + sprintf(new, "%.*s", i, line) ; + strcat(&new[i], value) ; + strcat(&new[i+out_length], &line[i + thislen + in_length]) ; free(line); line = new; =============================================== Also, i want to know if someone is interesting by tools such as (to|from)bytea. I can put this little toys (with some enhancement. "--help" :-) ) in contrib and update the man page of psql to show how to use this tools to copy the content of a file (or stream) into a field. Since Postgresql have binary string, it is interesting and better than \lo_* for little files (this use more memory than \lo_*). By the way, i can add an option to psql that toggle the output of carrier return at the end of each records. Perhaps this can be add to \t. > > regards, tom lane > -- Féliciano Matias <feliciano.matias@free.fr>
Attachment
** Reply to message from Féliciano Matias <feliciano.matias@free.fr> on 16 May 2003 06:56:02 +0200 Hi, I'm interested :) Regards, Wayne > Le jeu 15/05/2003 à 16:47, Tom Lane a écrit : > > =?ISO-8859-1?Q?F=E9liciano?= Matias <feliciano.matias@free.fr> writes: > > > one=3D> -- this is not the original size (20M : 20971520). > > > I don't have any problems with smaller files (10Mo is always fine). > > > All of this is used with postgresql 7.3.2 shipped with Red Hat Linux 9 . > > > > Hmm. There used to be some off-by-one type bugs in psql's > > variable-substitution code, but those were fixed long before 7.3.2. > > In any case, it's hard to see why such a problem would only arise > > when you got past 10Mb string lengths. > > > > More then 10Mb string lengths : > > > $ tobytea < data | wc > > > 0 81574 74379444 (71 Mo) > > > I couldn't duplicate the problem here, > > It seems it's a bug in glibc-2.3.2-27.9.i686.rpm coming with RH9. > > > so I'm going to suggest that > > maybe you have a hardware problem? Perhaps there's a flaky RAM chip in > > an area of memory that doesn't get used until you push up the size > > of psql quite a bit. It'd be worth running memtest86 for awhile to > > check. > > The hardware is ok. > > I found the problem. It's bug in sprintf() > =============================================== > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > #define SIZE (1024*70000) > int main(void) { > char * s = malloc(SIZE) ; > char * d = malloc(SIZE) ; > memset(s, 'a', SIZE-1) ; > d[SIZE-1] = '\0' ; > sprintf(d,"%s",s) ; > printf("%zi\n", strlen(d)) ; > return 0 ; > } > =============================================== > $ ./a.out > 67108863 (2^26-1) > > My libc have a problem. I will fill a bug report to > http://bugzilla.redhat.com/ . > > I apply this patch to the postgresql source : > =============================================== > diff -urN postgresql-7.3.2.orig/src/bin/psql/mainloop.c postgresql-7.3.2/src/bin/psql/mainloop.c > --- postgresql-7.3.2.orig/src/bin/psql/mainloop.c 2002-10-13 01:09:34.000000000 +0200 > +++ postgresql-7.3.2/src/bin/psql/mainloop.c 2003-05-16 04:46:03.000000000 +0200 > @@ -389,8 +389,9 @@ > exit(EXIT_FAILURE); > } > > - sprintf(new, "%.*s%s%s", i, line, value, > - &line[i + thislen + in_length]); > + sprintf(new, "%.*s", i, line) ; > + strcat(&new[i], value) ; > + strcat(&new[i+out_length], &line[i + thislen + in_length]) ; > > free(line); > line = new; > =============================================== > > Also, i want to know if someone is interesting by tools such as > (to|from)bytea. I can put this little toys (with some enhancement. > "--help" :-) ) in contrib and update the man page of psql to show how > to use this tools to copy the content of a file (or stream) into a > field. > Since Postgresql have binary string, it is interesting and better than > \lo_* for little files (this use more memory than \lo_*). > > By the way, i can add an option to psql that toggle the output of > carrier return at the end of each records. Perhaps this can be add to > \t. > > > > > regards, tom lane > > > > -- > Féliciano Matias <feliciano.matias@free.fr>
> My libc have a problem. I will fill a bug report to > http://bugzilla.redhat.com/ . > done : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=90987 -- Féliciano Matias <feliciano.matias@free.fr>
Attachment
Le ven 16/05/2003 à 06:56, Féliciano Matias a écrit : > > All of this is used with postgresql 7.3.2 shipped with Red Hat Linux 9 > [...] > I found the problem. It's bug in sprintf() > =============================================== > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > #define SIZE (1024*70000) > int main(void) { > char * s = malloc(SIZE) ; > char * d = malloc(SIZE) ; > memset(s, 'a', SIZE-1) ; > d[SIZE-1] = '\0' ; > sprintf(d,"%s",s) ; > printf("%zi\n", strlen(d)) ; > return 0 ; > } > =============================================== > $ ./a.out > 67108863 (2^26-1) > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=90987 from bugzilla@redhat.com ------- Additional Comments From drepper@redhat.com 2003-06-09 23:22 ------- The current glibc CVS code has been changed to not have this liimtation anymore. -- Féliciano Matias <feliciano.matias@free.fr>
Attachment
I've seen that come up several times now. Féliciano Matias wrote: > Le ven 16/05/2003 à 06:56, Féliciano Matias a écrit : > > >>>All of this is used with postgresql 7.3.2 shipped with Red Hat Linux 9 > > >>[...] > > >>I found the problem. It's bug in sprintf() >>=============================================== >>#include <stdio.h> >>#include <stdlib.h> >>#include <string.h> >>#define SIZE (1024*70000) >>int main(void) { >> char * s = malloc(SIZE) ; >> char * d = malloc(SIZE) ; >> memset(s, 'a', SIZE-1) ; >> d[SIZE-1] = '\0' ; >> sprintf(d,"%s",s) ; >> printf("%zi\n", strlen(d)) ; >> return 0 ; >>} >>=============================================== >>$ ./a.out >>67108863 (2^26-1) > > >>https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=90987 > > > from > > bugzilla@redhat.com > > > ------- Additional Comments From drepper@redhat.com 2003-06-09 23:22 ------- > The current glibc CVS code has been changed to not have this liimtation anymore. >