Thread: importing db as text files
What's the big deal with importing text files? I have a 70 MB file to import and it's been one problem after another. I used the copy command and it appears that it's just not possible. I finally massaged the file into a .sql file and ran that using \i db.sql but that failed too because I overlooked ' in names like D'Adario. The other problem I encountered was that a numeric field had to have data in it, pg would not default to the default value. So instead of massaging all the data again I decided to change the data type for that column.

This is my first experience with postgresql and I'm wondering if I should expect to encounter similar pain as I go further into this? So far it's been very painful trying to do what I thought would be easy and what I think should be easy.

PostgreSQL 7.3.4 on linux redhat 9
On Wed, Aug 13, 2003 at 14:14:20 -0700, expect <expect@ihubbell.com> wrote:
> What's the big deal with importing text files? I have a 70 MB file to import
> and it's been one problem after another. I used the copy command and it appears
> that it's just not possible. I finally massaged the file into a .sql file and

That is unlikely, but without more details it is hard to say what you need to do.

> ran that using \i db.sql but that failed too because I overlooked ' in names
> like D'Adario. The other problem I encountered was that a numeric field had
> to have data in it, pg would not default to the default value. So instead of

You can use the keyword default in insert statements to get a default value.

> massaging all the data again I decided to change the data type for that column.
> This is my first experience with postgresql and I'm wondering if I should expect
> to encounter similar pain as I go further into this? So far it's been very
> painful trying to do what I thought would be easy and what I think should be
> easy.

The impression I get is that you expect postgres to make a best guess when presented with ambiguous data. That is a very dangerous thing to do. I would much prefer ambiguous data be rejected so that I can make sure what I think the value is, is the same as what the database thinks the value is.
> -----Original Message-----
> From: expect [mailto:expect@ihubbell.com]
> Sent: Wednesday, August 13, 2003 2:14 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] importing db as text files
>
> What's the big deal with importing text files? I have a 70
> MB file to import and it's been one problem after another. I
> used the copy command and it appears that it's just not
> possible. I finally massaged the file into a .sql file and
> ran that using \i db.sql but that failed too because I
> overlooked ' in names like D'Adario. The other problem I
> encountered was that a numeric field had to have data in it,
> pg would not default to the default value. So instead of
> massaging all the data again I decided to change the data
> type for that column. This is my first experience with
> postgresql and I'm wondering if I should expect to encounter
> similar pain as I go further into this? So far it's been
> very painful trying to do what I thought would be easy and
> what I think should be easy.

I use the copy command all the time without problems. If the data follows the format you describe for copy, there will be no problems. If the data is some other format, then you will have to use perl or sed or something to reformat it.

If your data is in some format that is not easy to massage into something that copy wants to eat, then use an ODBC driver for text files and move all the data with insert/select.

Expect no miracles. GI/GO.
Hi expect,

Best way in my opinion is to use the copy table command. This way Pg will actually 'massage' the data (string escapes and all) for you.

If you have complex files best to use a combo of sed/awk to pre-massage the field delimiters and import the data to a temporary table. In most instances it is best to pipe the data to the psql command using copy table from stdin as from file requires that you are the process owner.

Always use a temporary import table and perform validation/further conversion in that table (IMO).

eg (from one of my earlier posts):

cat unprocesseddata.txt | sed "s/\",\"/$TAB/g" | sed "s/\"//g" | sed "s/.$//g" | psql -dmydatabase -c "copy importtable from stdin delimiter '\t';"

The first sed will replace all instances of "," with a tab character (I achieve tab in my shell scripts with a line like TAB=$(printf "\t")).

The second sed will remove the start and end " because the first only checks for ",".

The third sed will remove the Windows carriage returns (if any - depends on the platform the file was generated from).

Note here that my file format never contains any " in a field so I can safely run the second sed. If your import file contains these then you will need to change the regex or use awk.

Regards,

Jason

On Thu, 14 Aug 2003 07:14 am, expect wrote:
> What's the big deal with importing text files? I have a 70 MB file to
> import and it's been one problem after another. I used the copy command
> and it appears that it's just not possible. I finally massaged the file
> into a .sql file and ran that using \i db.sql but that failed too because I
> overlooked ' in names like D'Adario. The other problem I encountered was
> that a numeric field had to have data in it, pg would not default to the
> default value. So instead of massaging all the data again I decided to
> change the data type for that column. This is my first experience with
> postgresql and I'm wondering if I should expect to encounter similar pain
> as I go further into this? So far it's been very painful trying to do what
> I thought would be easy and what I think should be easy.
>
> PostgreSQL 7.3.4 on linux redhat 9
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
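To make the staging-table side of this concrete, here is a minimal sketch; the table and column names are invented for illustration, and the data is whatever the sed pipeline above feeds into the copy command:

-- loose staging table: everything text, no constraints, so the load itself never fails
create table importtable (
    acct   text,
    name   text,
    amount text
);

-- this is the table the "copy importtable from stdin delimiter '\t'" above loads into

-- sanity checks in SQL before anything touches the production tables
select count(*) from importtable where acct is null or acct = '';
select * from importtable where amount !~ '^[0-9]*$';   -- rows with a non-numeric amount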
I tend to use perl to preprocess dumps (in our case from Informix). It tends to work much faster than shell scripts (although your mileage may vary). I have to fix missing numeric values (if the column allows nulls why can't the copy command accept an empty field, I wonder?), missing dates. In other cases it massages date formats, spatial data, etc. For example, a crude program below to clean DOQQ metadata:

Greg W.
==================================
firenze% more infxunl2psql
#!/usr/dist/bin/perl -w

$FILE = $ARGV[0];
$OUTPUT = $ARGV[1];
$MODE = $ARGV[2];

open (INFILE,"$FILE");
open (OUTFILE,">$OUTPUT");

foreach $line (<INFILE>)
{
    chop($line);
    chop($line);
    if (($MODE cmp "DOQ") == 0) {
        ($t_source_filename, $t_quadrangle_name, $t_west_longitude, $t_east_longitude,
         $t_north_latitude, $t_south_latitude, $t_production_date, $t_raster_order,
         $t_band_organization, $t_band_content, $t_bits_per_pixel, $t_samples_and_lines,
         $t_horizontal_datum, $t_horizontal_coordinate_system, $t_coordinate_zone,
         $t_horizontal_units, $t_horizontal_resolution, $t_secondary_horizontal_datum,
         $t_xy_origin, $t_secondary_xy_origin, $t_nw_quad_corner_xy, $t_ne_quad_corner_xy,
         $t_se_quad_corner_xy, $t_sw_quad_corner_xy, $t_secondary_nw_quad_xy,
         $t_secondary_ne_quad_xy, $tsecondary_se_quad_xy, $t_secondary_sw_quad_xy,
         $t_rmse_xy, $t_image_source, $t_source_dem_date, $t_agency, $t_producer,
         $t_production_system, $t_standard_version, $t_metadata_date, $t_data_file_size,
         $byte_count) = split(/\|/,$line);
        if (length($t_production_date) == 0) {
            $t_production_date = "\\N";   # psql seems to dump a blank data with this nomenclature
        }
        if (length($t_coordinate_zone) == 0) {   # an integer
            $t_coordinate_zone = 0;
        }
        if (length($t_band_content) == 0) {
            $t_band_content = 0;   # also an int
        }
        if (length($t_bits_per_pixel) == 0) {
            $t_bits_per_pixel = 0;   # reasonable default for an int ?
        }
        if (length($t_horizontal_resolution) == 0) {
            $t_horizontal_resolution = 0.0;
        }
        if (length($t_secondary_horizontal_datum) == 0) {
            $t_secondary_horizontal_datum = "\'\'";
        }
        if (length($t_rmse_xy) == 0) {
            $t_rmse_xy = 0.0;
        }
        if (length($t_metadata_date) == 0) {
            $t_metadata_date = "\\N";
        }
        if (length($t_data_file_size) == 0) {
            $t_data_file_size = 0;   # a big int
        }
        if (length($byte_count) == 0) {
            $byte_count = 0;   # reasonable default ? for an int
        }
        $out_line = $t_source_filename . "|" . $t_quadrangle_name . "|" . $t_west_longitude . "|" .
            $t_east_longitude . "|" . $t_north_latitude . "|" . $t_south_latitude . "|" .
            $t_production_date . "|" . $t_raster_order . "|" . $t_band_organization . "|" .
            $t_band_content . "|" . $t_bits_per_pixel . "|" . $t_samples_and_lines . "|" .
            $t_horizontal_datum . "|" . $t_horizontal_coordinate_system . "|" . $t_coordinate_zone . "|" .
            $t_horizontal_units . "|" . $t_horizontal_resolution . "|" . $t_secondary_horizontal_datum . "|" .
            $t_xy_origin . "|" . $t_secondary_xy_origin . "|" . $t_nw_quad_corner_xy . "|" .
            $t_ne_quad_corner_xy . "|" . $t_se_quad_corner_xy . "|" . $t_sw_quad_corner_xy . "|" .
            $t_secondary_nw_quad_xy . "|" . $t_secondary_ne_quad_xy . "|" . $tsecondary_se_quad_xy . "|" .
            $t_secondary_sw_quad_xy . "|" . $t_rmse_xy . "|" . $t_image_source . "|" .
            $t_source_dem_date . "|" . $t_agency . "|" . $t_producer . "|" . $t_production_system . "|" .
            $t_standard_version . "|" . $t_metadata_date . "|" . $t_data_file_size . "|" . $byte_count;
        print OUTFILE "$out_line\n";
    }
    else {
        print OUTFILE "$line\n";
    }
}
close INFILE;
close OUTFILE;

-----Original Message-----
From: Jason Godden [mailto:jasongodden@optushome.com.au]
Sent: Wednesday, August 13, 2003 2:35 PM
To: expect; pgsql-general@postgresql.org
Subject: Re: [GENERAL] importing db as text files

Hi expect,

Best way in my opinion is to use the copy table command. This way Pg will actually 'massage' the data (string escapes and all) for you.

If you have complex files best to use a combo of sed/awk to pre-massage the field delimiters and import the data to a temporary table. In most instances it is best to pipe the data to the psql command using copy table from stdin as from file requires that you are the process owner.

Always use a temporary import table and perform validation/further conversion in that table (IMO).

eg (from one of my earlier posts):

cat unprocesseddata.txt | sed "s/\",\"/$TAB/g" | sed "s/\"//g" | sed "s/.$//g" | psql -dmydatabase -c "copy importtable from stdin delimiter '\t';"

The first sed will replace all instances of "," with a tab character (I achieve tab in my shell scripts with a line like TAB=$(printf "\t")).

The second sed will remove the start and end " because the first only checks for ",".

The third sed will remove the Windows carriage returns (if any - depends on the platform the file was generated from).

Note here that my file format never contains any " in a field so I can safely run the second sed. If your import file contains these then you will need to change the regex or use awk.

Regards,

Jason

On Thu, 14 Aug 2003 07:14 am, expect wrote:
> What's the big deal with importing text files? I have a 70 MB file to
> import and it's been one problem after another. I used the copy command
> and it appears that it's just not possible. I finally massaged the file
> into a .sql file and ran that using \i db.sql but that failed too because I
> overlooked ' in names like D'Adario. The other problem I encountered was
> that a numeric field had to have data in it, pg would not default to the
> default value. So instead of massaging all the data again I decided to
> change the data type for that column. This is my first experience with
> postgresql and I'm wondering if I should expect to encounter similar pain
> as I go further into this? So far it's been very painful trying to do what
> I thought would be easy and what I think should be easy.
>
> PostgreSQL 7.3.4 on linux redhat 9
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
On Wed, 13 Aug 2003 16:30:04 -0500
Bruno Wolff III <bruno@wolff.to> wrote:

> On Wed, Aug 13, 2003 at 14:14:20 -0700,
>   expect <expect@ihubbell.com> wrote:
> >
> > What's the big deal with importing text files? I have a 70 MB file to import
> > and it's been one problem after another. I used the copy command and it appears
> > that it's just not possible. I finally massaged the file into a .sql file and
>
> That is unlikely, but without more details it is hard to say what you need
> to do.

What's unlikely? That it didn't work? But it didn't work.

> > ran that using \i db.sql but that failed too because I overlooked ' in names
> > like D'Adario. The other problem I encountered was that a numeric field had
> > to have data in it, pg would not default to the default value. So instead of
>
> You can use the keyword default in insert statements to get a default value.

From what I've read it should use the default value when there is no value.
Is that not true? Is this a known issue?

> > massaging all the data again I decided to change the data type for that column.
> > This is my first experience with postgresql and I'm wondering if I should expect
> > to encounter similar pain as I go further into this? So far it's been very
> > painful trying to do what I thought would be easy and what I think should be
> > easy.
>
> The impression I get is that you expect postgres to make a best guess when

No you've come to the wrong impression. I believe that the problem lies in the
fact that pg will not default to the default value when no value is present.

> presented with ambiguous data. That is a very dangerous thing to do. I would
> much prefer ambiguous data be rejected so that I can make sure what I think
> the value is, is the same as what the database thinks the value is.
On Thu, 14 Aug 2003 07:34:55 +1000 Jason Godden <jasongodden@optushome.com.au> wrote: > Hi expect, > > Best way in my opinion is to use the copy table command. This way Pg will > actually 'massage' the data (string escapes and all) for you. I guess we're of the same opinion. I did use the copy table command. I believe the problem is that pg is unable to use the default value when a value is not present. > > If you have complex files best to use a combo of sed/awk to pre-massage the > field delimiters and import the data to a temporary table. In most instances > it is best to pipe the data to the psql command using copy table from stdin > as from file requires that you are the process owner. > > Always use a temporary import table and perform validation/further conversion > in that table (IMO). > > eg (from one of my earlier posts) > > cat unprocesseddata.txt | sed "s/\",\"/$TAB/g" | sed "s/\"//g" | sed "s/.$//g" > | psql -dmydatabase -c "copy importtable from stdin delimiter '\t';" > > The first sed will replace all instances of "," with a tab character > (I achieve tab in my shell scripts with a line like TAB=$(printf "\t")) > > The second sed will remove the start and end " because the first only checks > for ",". > > The third sed will remove the Windows carriage returns (if any - depends on > the platform the file was generated from. > > Note here that my file format never contains any " in a field so I can safely > run the second sed. If your import file contains these then you will need to > change the regex or use awk. > > Regards, > > Jason > > On Thu, 14 Aug 2003 07:14 am, expect wrote: > > What's the big deal with importing text files? I have a 70 MB file to > > import and it's been one problem after another. I used the copy command > > and it appears that it's just not possible. I finally massaged the file > > into a .sql file and ran that using \i db.sql but that failed too because I > > overlooked ' in names like D'Adario. The other problem I encountered was > > that a numeric field had to have data in it, pg would not default to the > > default value. So instead of massaging all the data again I decided to > > change the data type for that column. This is my first experience with > > postgresql and I'm wondering if I should expect to encounter similar pain > > as I go further into this? So far it's been very painful trying to do what > > I thought would be easy and what I think should be easy. > > > > PostgreSQL 7.3.4 on linux redhat 9 > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 7: don't forget to increase your free space map settings > > >
On Wed, 13 Aug 2003 14:24:30 -0700 "Dann Corbit" <DCorbit@connx.com> wrote: > > -----Original Message----- > > From: expect [mailto:expect@ihubbell.com] > > Sent: Wednesday, August 13, 2003 2:14 PM > > To: pgsql-general@postgresql.org > > Subject: [GENERAL] importing db as text files > > > > What's the big deal with importing text files? I have a 70 > > MB file to import and it's been one problem after another. I > > used the copy command and it appears that it's just not > > possible. I finally massaged the file into a .sql file and > > ran that using \i db.sql but that failed too because I > > overlooked ' in names like D'Adario. The other problem I > > encountered was that a numeric field had to have data in it, > > pg would not default to the default value. So instead of > > massaging all the data again I decided to change the data > > type for that column. This is my first experience with > > postgresql and I'm wondering if I should expect to encounter > > similar pain as I go further into this? So far it's been > > very painful trying to do what I thought would be easy and > > what I think should be easy. > > I use the copy command all the time without problems. If the data > follows the format you describe for copy, there will be no problems. If > the data is some other format, then you will have to use perl or sed or > something to reformat it. I believe the problem is that pg is unable to use the default value when a value is not present. > > If your data is in some format that is not easy to massage into > something that copy wants to eat, then use an ODBC driver for text files > and move all the data with insert/select. > > Expect no miracles. GI/GO. Importing a text file is a miracle? You're scaring me. ;^) > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html > >
On Wed, 13 Aug 2003 14:59:29 -0700 "Gregory S. Williamson" <gsw@globexplorer.com> wrote: > I tend to use perl to preprocess dumps (in our case from Informix). It tends to work much faster than shell scripts (althoughyour mileage may vary). I have to fix missing numeric values (if the column allows nulls why can't the copy commandaccept an empty field, I wonder?), missing dates. In other cases is massages date formats, spatial data, etc. Forexample, a crude program below to clean DOQQ metadata: I'm not having any problems massaging the data. I believe the problem is that pg is unable to use the default value when a value is not present. BTW FWIW I took a few liberties with your perl code below for ease of reading. For $out_line you can use the little append operator like: $out_line = $t_source_filename . "|"; $out_line .= $t_quadrangle_name . "|"; etc., etc. > > Greg W. > ================================== > firenze% more infxunl2psql > #!/usr/dist/bin/perl -w > > $FILE = $ARGV[0]; > $OUTPUT = $ARGV[1]; > $MODE = $ARGV[2]; > > open (INFILE,"$FILE"); > open (OUTFILE,">$OUTPUT"); > > foreach $line (<INFILE>) > { > chop($line); > chop($line); > if (($MODE cmp "DOQ") == 0) { ($t_source_filename, $t_quadrangle_name, $t_west_longitude, $t_east_longitude, $t_north_latitude, $t_south_latitude, $t_production_date, $t_raster_order, $t_band_organization, $t_band_content, $t_bits_per_pixel, $t_samples_and_lines, $t_horizontal_datum, $t_horizontal_coordinate_system, $t_coordinate_zone, $t_horizontal_units, $t_horizontal_resolution, $t_secondary_horizontal_datum, $t_xy_origin, $t_secondary_xy_origin, $t_nw_quad_corner_xy, $t_ne_quad_corner_xy, $t_se_quad_corner_xy, $t_sw_quad_corner_xy, $t_secondary_nw_quad_xy, $t_secondary_ne_quad_xy, $tsecondary_se_quad_xy, $t_secondary_sw_quad_xy, $t_rmse_xy, $t_image_source, $t_source_dem_date, $t_agency, $t_producer, $t_production_system, $t_standard_version, $t_metadata_date, $t_data_file_size, $byte_count) = split(/\|/,$line); > if (length($t_production_date) == 0) { > $t_production_date = "\\N"; # psql seems to dump a blank data with this nomenclature > } > if (length($t_coordinate_zone) == 0) { # an integer > $t_coordinate_zone = 0; > } > if (length($t_band_content) == 0) { > $t_band_content = 0; # also an int > } > if (length($t_bits_per_pixel) == 0) { > $t_bits_per_pixel = 0; # reasonable default for an int ? > } > if (length($t_horizontal_resolution) == 0) { > $t_horizontal_resolution = 0.0; > } > if (length($t_secondary_horizontal_datum) == 0) { > $t_secondary_horizontal_datum = "\'\'"; > } > if (length($t_rmse_xy) == 0) { > $t_rmse_xy = 0.0; > } > if (length($t_metadata_date) == 0) { > $t_metadata_date = "\\N"; > } > if (length($t_data_file_size) == 0) { > $t_data_file_size = 0; # a big int > } > if (length($byte_count) == 0) { > $byte_count = 0; # reasonable default ? for an int > } > $out_line = $t_source_filename . "|" . $t_quadrangle_name . "|" . $t_west_longitude . "|" . $t_east_longitude ."|" . $t_north_latitude . "|" . $t_south_latitude . "|" . $t_production_date . "|" . $t_raster_order . "|" . $t_band_org > anization . "|" . $t_band_content . "|" . $t_bits_per_pixel . "|" . $t_samples_and_lines . "|" . $t_horizontal_datum ."|" . $t_horizontal_coordinate_system . "|" . $t_coordinate_zone . "|" . $t_horizontal_units . "|" . $t_horizontal_resolu > tion . "|" . $t_secondary_horizontal_datum . "|" . $t_xy_origin . "|" . $t_secondary_xy_origin . "|" . $t_nw_quad_corner_xy. "|" . $t_ne_quad_corner_xy . "|" . $t_se_quad_corner_xy . "|" . 
$t_sw_quad_corner_xy . "|" . $t_secondary_nw_quad_xy. "|" . $t_secondary_ne_quad_xy . "|" . $tsecondary_se_quad_xy . "|" . $t_secondary_sw_quad_xy . "|". $t_rmse_xy . "|" . $t_image_source . "|" . $t_source_dem_date . "|" . $t_agency . "|" . $t_producer . "|" . $t_production_system. "|" . $t_standard_version . "|" . $t_metadata_date . "|" . $t_data_file_size . "|" . $byte_count; > print OUTFILE "$out_line\n"; > } > else { > print OUTFILE "$line\n"; > } > } > close INFILE; > close OUTFILE; > > -----Original Message----- > From: Jason Godden [mailto:jasongodden@optushome.com.au] > Sent: Wednesday, August 13, 2003 2:35 PM > To: expect; pgsql-general@postgresql.org > Subject: Re: [GENERAL] importing db as text files > > > Hi expect, > > Best way in my opinion is to use the copy table command. This way Pg will > actually 'massage' the data (string escapes and all) for you. > > If you have complex files best to use a combo of sed/awk to pre-massage the > field delimiters and import the data to a temporary table. In most instances > it is best to pipe the data to the psql command using copy table from stdin > as from file requires that you are the process owner. > > Always use a temporary import table and perform validation/further conversion > in that table (IMO). > > eg (from one of my earlier posts) > > cat unprocesseddata.txt | sed "s/\",\"/$TAB/g" | sed "s/\"//g" | sed "s/.$//g" > | psql -dmydatabase -c "copy importtable from stdin delimiter '\t';" > > The first sed will replace all instances of "," with a tab character > (I achieve tab in my shell scripts with a line like TAB=$(printf "\t")) > > The second sed will remove the start and end " because the first only checks > for ",". > > The third sed will remove the Windows carriage returns (if any - depends on > the platform the file was generated from. > > Note here that my file format never contains any " in a field so I can safely > run the second sed. If your import file contains these then you will need to > change the regex or use awk. > > Regards, > > Jason > > On Thu, 14 Aug 2003 07:14 am, expect wrote: > > What's the big deal with importing text files? I have a 70 MB file to > > import and it's been one problem after another. I used the copy command > > and it appears that it's just not possible. I finally massaged the file > > into a .sql file and ran that using \i db.sql but that failed too because I > > overlooked ' in names like D'Adario. The other problem I encountered was > > that a numeric field had to have data in it, pg would not default to the > > default value. So instead of massaging all the data again I decided to > > change the data type for that column. This is my first experience with > > postgresql and I'm wondering if I should expect to encounter similar pain > > as I go further into this? So far it's been very painful trying to do what > > I thought would be easy and what I think should be easy. > > > > PostgreSQL 7.3.4 on linux redhat 9 > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 7: don't forget to increase your free space map settings > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > >
On Wed, Aug 13, 2003 at 02:59:29PM -0700, Gregory S. Williamson wrote:
> I tend to use perl to preprocess dumps (in our case from Informix). It tends
> to work much faster than shell scripts (although your mileage may vary). I
> have to fix missing numeric values (if the column allows nulls why can't the
> copy command accept an empty field, I wonder?), missing dates. In other cases
> it massages date formats, spatial data, etc. For example, a crude program
> below to clean DOQQ metadata:

[snip]

> $t_production_date = "\\N"; # psql seems to dump a blank data with this nomenclature

Umm, \N represents NULL. NULL is not a blank field, it's null. You could also tell copy that a blank field represents a null but that might have unexpected effects on text fields which are supposed to be blank.

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> "All that is needed for the forces of evil to triumph is for enough good
> men to do nothing." - Edmond Burke
> "The penalty good people pay for not being interested in politics is to be
> governed by people worse than themselves." - Plato
On Wed, 13 Aug 2003 14:59:29 -0700 "Gregory S. Williamson" <gsw@globexplorer.com> wrote: > I tend to use perl to preprocess dumps (in our case from Informix). It tends > to work much faster than shell scripts (although your mileage may vary). I > have to fix missing numeric values (if the column allows nulls why can't the > copy command accept an empty field, I wonder?), missing dates. In other cases This seems to be the question I need answered as well. Does anyone know the answer? > is massages date formats, spatial data, etc. For example, a crude program > below to clean DOQQ metadata: > > Greg W. > ================================== > firenze% more infxunl2psql > #!/usr/dist/bin/perl -w > > $FILE = $ARGV[0]; > $OUTPUT = $ARGV[1]; > $MODE = $ARGV[2]; > > open (INFILE,"$FILE"); > open (OUTFILE,">$OUTPUT"); > > foreach $line (<INFILE>) > { > chop($line); > chop($line); > if (($MODE cmp "DOQ") == 0) { > ($t_source_filename, $t_quadrangle_name, $t_west_longitude, > $t_east_longitude, $t_north_latitude, $t_south_latitude, > $t_production_date, $t_raster_order, $t_band_organization, > $t_band_content, $t_bits_per_pixel, $t_samples_and_lines, > $t_horizontal_datum, $t_horizontal_coordinate_system, > $t_coordinate_zone, $t_horizontal_units, $t_horizontal_resolution, > $t_secondary_horizontal_datum, $t_xy_origin, $t_secondary_xy_origin, > $t_nw_quad_corner_xy, $t_ne_quad_corner_xy, $t_se_quad_corner_xy, > $t_sw_quad_corner_xy, $t_secondary_nw_quad_xy, > $t_secondary_ne_quad_xy, $tsecondary_se_quad_xy, > $t_secondary_sw_quad_xy, $t_rmse_xy, $t_image_source, > $t_source_dem_date, $t_agency, $t_producer, $t_production_system, > $t_standard_version, $t_metadata_date, $t_data_file_size, $byte_count) > = split(/\|/,$line); if (length($t_production_date) == 0) { > $t_production_date = "\\N"; # psql seems to dump a blank data with > this nomenclature} > if (length($t_coordinate_zone) == 0) { # an integer > $t_coordinate_zone = 0; > } > if (length($t_band_content) == 0) { > $t_band_content = 0; # also an int > } > if (length($t_bits_per_pixel) == 0) { > $t_bits_per_pixel = 0; # reasonable default for an int ? > } > if (length($t_horizontal_resolution) == 0) { > $t_horizontal_resolution = 0.0; > } > if (length($t_secondary_horizontal_datum) == 0) { > $t_secondary_horizontal_datum = "\'\'"; > } > if (length($t_rmse_xy) == 0) { > $t_rmse_xy = 0.0; > } > if (length($t_metadata_date) == 0) { > $t_metadata_date = "\\N"; > } > if (length($t_data_file_size) == 0) { > $t_data_file_size = 0; # a big int > } > if (length($byte_count) == 0) { > $byte_count = 0; # reasonable default ? for an int > } > $out_line = $t_source_filename . "|" . $t_quadrangle_name . "|" . > $t_west_longitude . "|" . $t_east_longitude . "|" . $t_north_latitude > . "|" . $t_south_latitude . "|" . $t_production_date . "|" . > $t_raster_order . "|" . $t_band_org > anization . "|" . $t_band_content . "|" . $t_bits_per_pixel . "|" . > $t_samples_and_lines . "|" . $t_horizontal_datum . "|" . > $t_horizontal_coordinate_system . "|" . $t_coordinate_zone . "|" . > $t_horizontal_units . "|" . $t_horizontal_resolu tion . "|" . > $t_secondary_horizontal_datum . "|" . $t_xy_origin . "|" . > $t_secondary_xy_origin . "|" . $t_nw_quad_corner_xy . "|" . > $t_ne_quad_corner_xy . "|" . $t_se_quad_corner_xy . "|" . $t_sw_quad_corner_xy > . "|" . $t_secondary_nw_quad_xy . "|" . $t_secondary_ne_quad_xy . "|" . > $tsecondary_se_quad_xy . "|" . $t_secondary_sw_quad_xy . "|" . $t_rmse_xy . > "|" . $t_image_source . "|" . $t_source_dem_date . "|" . $t_agency . "|" . 
> $t_producer . "|" . $t_production_system . "|" . $t_standard_version . "|" . > $t_metadata_date . "|" . $t_data_file_size . "|" . $byte_count; > print OUTFILE "$out_line\n"; > } > else { > print OUTFILE "$line\n"; > } > } > close INFILE; > close OUTFILE; > > -----Original Message----- > From: Jason Godden [mailto:jasongodden@optushome.com.au] > Sent: Wednesday, August 13, 2003 2:35 PM > To: expect; pgsql-general@postgresql.org > Subject: Re: [GENERAL] importing db as text files > > > Hi expect, > > Best way in my opinion is to use the copy table command. This way Pg will > actually 'massage' the data (string escapes and all) for you. > > If you have complex files best to use a combo of sed/awk to pre-massage the > field delimiters and import the data to a temporary table. In most instances > it is best to pipe the data to the psql command using copy table from stdin > as from file requires that you are the process owner. > > Always use a temporary import table and perform validation/further conversion > in that table (IMO). > > eg (from one of my earlier posts) > > cat unprocesseddata.txt | sed "s/\",\"/$TAB/g" | sed "s/\"//g" | sed "s/.$//g" > > | psql -dmydatabase -c "copy importtable from stdin delimiter '\t';" > > The first sed will replace all instances of "," with a tab character > (I achieve tab in my shell scripts with a line like TAB=$(printf "\t")) > > The second sed will remove the start and end " because the first only checks > for ",". > > The third sed will remove the Windows carriage returns (if any - depends on > the platform the file was generated from. > > Note here that my file format never contains any " in a field so I can safely > run the second sed. If your import file contains these then you will need to > change the regex or use awk. > > Regards, > > Jason > > On Thu, 14 Aug 2003 07:14 am, expect wrote: > > What's the big deal with importing text files? I have a 70 MB file to > > import and it's been one problem after another. I used the copy command > > and it appears that it's just not possible. I finally massaged the file > > into a .sql file and ran that using \i db.sql but that failed too because I > > overlooked ' in names like D'Adario. The other problem I encountered was > > that a numeric field had to have data in it, pg would not default to the > > default value. So instead of massaging all the data again I decided to > > change the data type for that column. This is my first experience with > > postgresql and I'm wondering if I should expect to encounter similar pain > > as I go further into this? So far it's been very painful trying to do what > > I thought would be easy and what I think should be easy. > > > > PostgreSQL 7.3.4 on linux redhat 9 > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 7: don't forget to increase your free space map settings > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > >
On Thu, 14 Aug 2003 02:17 pm, expect wrote:
> On Wed, 13 Aug 2003 14:59:29 -0700
> "Gregory S. Williamson" <gsw@globexplorer.com> wrote:
> > have to fix missing numeric values (if the column allows nulls why can't
> > the copy command accept an empty field, I wonder?), missing dates. In
> > other cases
>
> This seems to be the question I need answered as well. Does anyone know
> the answer?

Copy can accept an empty field (representing null):

copy datatable from stdin delimiter '\t' null '';

by default pg expects '\N' to mean null but you can override it:

Description: copy data between files and tables
Syntax:
COPY table [ ( column [, ...] ) ]
    FROM { 'filename' | stdin }
    [ [ WITH ]
          [ BINARY ]
          [ OIDS ]
          [ DELIMITER [ AS ] 'delimiter' ]
          [ NULL [ AS ] 'null string' ] ]

COPY table [ ( column [, ...] ) ]
    TO { 'filename' | stdout }
    [ [ WITH ]
          [ BINARY ]
          [ OIDS ]
          [ DELIMITER [ AS ] 'delimiter' ]
          [ NULL [ AS ] 'null string' ] ]

Rgds,

Jason
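A small worked example of the null-string option; the table is hypothetical, and note that the empty field comes through as NULL, not as the column default:

create table people (
    name  text,
    grade int default 5
);

copy people from stdin delimiter ',' null '';
Bob,7
Elma,
\.

select * from people;
--  name | grade
-- ------+-------
--  Bob  |     7
--  Elma |          <- NULL; the default of 5 is not applied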
Two issues raised (other than my atrocious coding):

a) if defaults for a column are defined and copy is given a value of <> (implicit NULL) why not use the default? [does an explicit \N work -- didn't try, I have to confess]

b) more generally, if copy finds a column that allows null and the data is a null (not explicitly defined as such, just no data) isn't that value valid, e.g. a NULL value?

I kludged a simple way to make my data load, without really understanding why -- other than how "C" converts strings to integers ("atoi") -- a simple NULL can't be a NULL entry or at least a '0' if that is the default.

Not meant in ANY way as criticism, really, just trying to get a handle on this cool tool, and how it differs from ones I do know. Apologies for any double postings I may have caused.

Greg W.

-----Original Message-----
From: Martijn van Oosterhout [mailto:kleptog@svana.org]
Sent: Wed 8/13/2003 9:24 PM
To: Gregory S. Williamson
Cc: expect; pgsql-general@postgresql.org
Subject: Re: [GENERAL] importing db as text files
On Thu, 14 Aug 2003 01:52 pm, you wrote:
> On Thu, 14 Aug 2003 07:34:55 +1000
> Jason Godden <jasongodden@optushome.com.au> wrote:
> > Hi expect,
> >
> > Best way in my opinion is to use the copy table command. This way Pg
> > will actually 'massage' the data (string escapes and all) for you.
>
> I guess we're of the same opinion. I did use the copy table command.
> I believe the problem is that pg is unable to use the default value when a
> value is not present.

PG is behaving correctly IMO:

create table data (data1 int4 not null, data2 int4, data3 int4 not null default 10);

insert into data values (2,null,default)        - OK
insert into data values (null,2,default)        - Fail: not null data1
insert into data values (2,null)                - Fail? missing field?
insert into data (data1,data2) values (2,null)  - OK, data3 = default with explicit field nomination

copy from... essentially becomes (although it does something a bit different behind the scenes):

insert into data (data1,data2,data3) values (x,y,z)

If data3 is specified not null default 10 and you have a line in your import file which is translated thus:

2  2  \N     - \N is the default null marker, but you can nominate what it is

insert into data (data1,data2,data3) values (2,2,null);

this will fail because you are explicitly saying put null in a not null field. So use an intermediary table without not null constraints with copy from... then use a query:

insert into realtable (data1,data2,data3)
select data1, case when data2 is null then default else data2 end, data3
from data
where data1 is not null;   <- because data1 has no option to be null or a default value

etc... Problem solved...

I'd be curious as to how many ppl actually import their data STRAIGHT into their production tables without integrity checking. Of course if you massage and manage it externally such as the method Greg uses then you're in business too - either way I believe the pg copy syntax is correct and makes sense. PG Copy CAN'T make a guess that you intend null or to skip that field so the default pops in there - null and default are two very different things. In fact null is oo (infinity?) different things...

Unless someone changes the insert behaviour in a later release then you will have to come up with a 'massaged' way (Greg/Perl, intermediary tables and pl/pgsql functions). But then think about this:

insert into data values (1,2,null)

which by the proposed new null behaviour suddenly becomes:

insert into data values (1,2,default)    (say default = 10);

but in reality the user simply mucked up on data entry, didn't actually mean default and meant to insert 70 instead - they don't get a warning about it and your data integrity is screwed. Not only that, the db isn't paying strict attention to the intended SQL syntax and the constraint management is moved to the client - PG (and any other decent database - ie Oracle) is far above the MySQL cram-it-in-even-if-it-breaks-all-your-business-rules-in-your-client way of doing things!

Default should be used in an explicit sense IMHO:

insert into data (data1,data2) values (1,2)   - now data3 becomes default and all is good

VERY different to:

insert into data values (1,2,null);
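One wrinkle with the query sketched above: the default keyword cannot actually appear inside a select list, so the insert ... select has to spell the default value out (coalesce does the job). A minimal working version along those lines, reusing Jason's hypothetical table names:

-- production table with the constraints from the example above
create table realtable (
    data1 int4 not null,
    data2 int4,
    data3 int4 not null default 10
);

-- loose intermediary table loaded by copy, no constraints
create table data (data1 int4, data2 int4, data3 int4);

-- DEFAULT is not an expression, so the default value (10) is written out explicitly;
-- rows that would violate the not null constraint on data1 are filtered first
insert into realtable (data1, data2, data3)
select data1, data2, coalesce(data3, 10)
from data
where data1 is not null;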
I have a query which definitely runs faster when sequential scans are turned off. Which parameters do I tune? and which way? After quickly perusing the docs, and google, I think it is the random_page_cost? Any help would be appreciated. Dave -- Dave Cramer <dave@fastcrypt.com> fastcrypt -- Dave Cramer <Dave@micro-automation.net>
On Wed, Aug 13, 2003 at 20:45:55 -0700,
  expect <expect@ihubbell.com> wrote:
> On Wed, 13 Aug 2003 16:30:04 -0500
> Bruno Wolff III <bruno@wolff.to> wrote:
>
> > On Wed, Aug 13, 2003 at 14:14:20 -0700,
> >   expect <expect@ihubbell.com> wrote:
> > >
> > > What's the big deal with importing text files? I have a 70 MB file to import
> > > and it's been one problem after another. I used the copy command and it appears
> > > that it's just not possible. I finally massaged the file into a .sql file and
> >
> > That is unlikely, but without more details it is hard to say what you need
> > to do.
>
> What's unlikely? That it didn't work? But it didn't work.

That it isn't possible to load your data using the copy command.
You may need to do some transformation before sending it to copy.
Using copy is probably going to give you the fastest load time.

> From what I've read it should use the default value when there is no value.
> Is that not true? Is this a known issue?

Where did you read that? What do you mean by 'no value'?

The syntax for the insert command is given in the manual:

INSERT INTO table [ ( column [, ...] ) ]
    { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) | query }

> No you've come to the wrong impression. I believe that the problem lies in the
> fact that pg will not default to the default value when no value is present.

And how do you think 'no value' is represented? If you use default to represent 'no value' then things will work as you expect for insert statements. When you are using copy, defaults can't be used on a row by row basis (but unlisted columns will get default values) and you will need to include the value that you want in each row.
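For example, with a hypothetical two-column table: the DEFAULT keyword works per value in insert statements, while copy applies defaults only to columns left out of its column list, the same way for every row:

create table t (a int, b int default 5);

insert into t values (1, default);    -- b gets 5
insert into t (a) values (2);         -- unlisted column b gets 5

copy t (a) from stdin;                -- b gets 5 on every row; it cannot vary per row
3
4
\.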
On Thu, Aug 14, 2003 at 14:50:41 +1000,
  Jason Godden <jasongodden@optushome.com.au> wrote:
>
> Copy can accept an empty field (representing null):

That assumes that an empty field is what he means by 'no value'. For some data types an empty string is a perfectly valid value.

> copy datatable from stdin delimiter '\t' null '';
>
> by default pg expects '\N' to mean null but you can override it:

He wants the default value, not null though. Copy doesn't have a way to specify a string to be replaced by the default value. It probably wouldn't be too hard to add this option, but it is too late for 7.4.

If the default value is a constant, preprocessing should be an easy way to handle the issue. Another option may be to change nulls to the default after importing the data (assuming there aren't actual nulls in the data). If he is trying to use nextval for all rows on one column, then not entering that column would be the way to go.

We haven't seen any example of the data he is trying to import so it is hard to give him specific advice.
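A sketch of the load-then-fix-up route described above, assuming the default is a constant; the table, file path, and column names are made up for illustration:

create table orders (id text, qty int default 5);

-- treat empty fields as NULL on the way in
copy orders from '/usr/tmp/orders.txt' delimiter ',' null '';

-- then rewrite the NULLs; on 7.3 the default expression has to be repeated
update orders set qty = 5 where qty is null;
-- from 7.4 on, the keyword can be used directly:
-- update orders set qty = default where qty is null;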
Please don't reply to messages to start a new thread.

On Thu, Aug 14, 2003 at 08:09:23 -0400,
  Dave Cramer <Dave@micro-automation.net> wrote:
> I have a query which definitely runs faster when sequential scans are
> turned off. Which parameters do I tune? and which way?
>
> After quickly perusing the docs, and google, I think it is the
> random_page_cost?

Have you run vacuum analyze?
Can you supply a copy of the query and explain analyze output?
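That diagnostic sequence usually looks roughly like the following; the table and query shown are only placeholders, not Dave's actual query:

vacuum analyze;

explain analyze select * from orders where customer_id = 42;   -- plan with current settings

set enable_seqscan = off;     -- force the index plan, for comparison only
explain analyze select * from orders where customer_id = 42;

set enable_seqscan = on;
set random_page_cost = 2;     -- default is 4; lower values make index scans look cheaper
explain analyze select * from orders where customer_id = 42;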
On Thu, Aug 14, 2003 at 00:50:27 -0700,
  "Gregory S. Williamson" <gsw@globexplorer.com> wrote:
>
> Two issues raised (other than my atrocious coding)
>
> a) if defaults for a column are defined and copy is given a value of <> (implicit NULL)
> why not use the default? [does an explicit \N work -- didn't try, I have to confess]

Because empty strings are valid data and you can't go assuming they are null, since you would then need a different string to represent the empty string.

> b) more generally, if copy finds a column that allows null and the data is a null
> (not explicitly defined as such, just no data) isn't that value valid, e.g. a NULL value?

It might be bad data. The copy statement provides a way to define a string that represents the null string. It would probably be reasonable to add a way to specify a string to be replaced by the default value.
On Thu, 14 Aug 2003 08:45:20 -0500 Bruno Wolff III <bruno@wolff.to> wrote: > On Wed, Aug 13, 2003 at 20:45:55 -0700, > expect <expect@ihubbell.com> wrote: > > On Wed, 13 Aug 2003 16:30:04 -0500 > > Bruno Wolff III <bruno@wolff.to> wrote: > > > > > On Wed, Aug 13, 2003 at 14:14:20 -0700, > > > expect <expect@ihubbell.com> wrote: > > > > > > > > What's the big deal with importing text files? I have a 70 MB file to import > > > > and it's been one problem after another. I used the copy command and it appears > > > > that it's just not possible. I finally massaged the file into a .sql file and > > > > > > That is unlikely, but without more details it is hard to say what you need > > > to do. > > > > > > What's unlikely? That it didn't work? But it didn't work. > > That it isn't possible to load your data using the copy command. > You may need to do some transformation before sending it to copy. > Using copy is probably going to give you the fastest load time. > > > From what I've read it should use the default value when there is no value. > > Is that not true? Is this a known issue? > > Where did you read that? What do you mean by 'no value'? The data is really very simple. Here's a pseudo-example of 2 rows of data: 191922C,Bob Cobb,D'Obbalina Sr.,312 Elm Street,Yountville,CA,94599,5,062001,082009 339111C,Elma Thelma,Velma,98 Oak Lane,St. Louis,MO,63119-2065,,, > > The syntax for the insert command is given in the manual: > INSERT INTO table [ ( column [, ...] ) ] > { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) | query } > > > > > No you've come to the wrong impression. I believe that the problem lies in the > > fact that pg will not default to the default value when no value is present. > > And how do you think 'no value' is represented? If you use default to > respresent 'no value' then things will work as you expect for insert > statements. When you are using copy, defaults can't be used on a row > by row basis (but unlisted columns will get default values) and you will > need to inlcude the value that you want in each row. > >
On Thu, Aug 14, 2003 at 09:08:16 -0700,
  expect <expect@ihubbell.com> wrote:
>
> 191922C,Bob Cobb,D'Obbalina Sr.,312 Elm Street,Yountville,CA,94599,5,062001,082009
> 339111C,Elma Thelma,Velma,98 Oak Lane,St. Louis,MO,63119-2065,,,

That should be handled by copy pretty easily. If the columns after what appears to be the zip code are some type of number rather than a string, then the empty string is going to be an invalid value.

If you want the empty string to be interpreted as the default and you don't have any NULL values in your input, then the simplest thing to do is set the code for NULLs to be the empty string. You can then do queries after the data is in the database to change the NULLs to the appropriate default. If you are running the 7.4 beta you can use the DEFAULT keyword in the update, otherwise you will need to duplicate the default expression in the update commands. You will want to run update once for each column that can have NULL values, using IS NULL in the WHERE clause.

If these columns have a NOT NULL constraint, you may want to use a temporary table to load the data and then copy it over (with a single insert statement) after it has been cleaned up.

Note that it isn't obvious what empty strings should map to for numbers. NULL and 0 make about as much sense as using the default value.
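Tying that to the two sample rows quoted above, a sketch of the temporary-table route; the column names and the "real" table are invented for illustration:

create table members (
    acct text, firstname text, lastname text, street text, city text,
    state text, zip text, rating int not null default 0,
    start_mm text, end_mm text
);

-- loose temporary load table: every field is text and may be empty
create temp table members_load (
    acct text, firstname text, lastname text, street text, city text,
    state text, zip text, rating text, start_mm text, end_mm text
);

-- empty fields come in as NULL; the apostrophe in D'Obbalina needs no escaping,
-- since copy data never goes through the SQL parser
copy members_load from '/usr/tmp/db' delimiter ',' null '';

-- single insert into the real table, applying the default for the numeric column
insert into members
select acct, firstname, lastname, street, city, state, zip,
       coalesce(cast(rating as int), 0),
       start_mm, end_mm
from members_load;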
On Wednesday, August 13, 2003 17:59, Gregory S. Williamson [mailto:gsw@globexplorer.com] wrote:
> I tend to use perl to preprocess dumps (in our case from Informix). It tends
> to work much faster than shell scripts (although your mileage may vary). I
> have to fix missing numeric values (if the column allows nulls why can't the
> copy command accept an empty field, I wonder?)

Have you tried "copy <tbl> from stdin with null as '';" or am I missing something?

See http://techdocs.postgresql.org/techdocs/usingcopy.php or the postgresql documentation for more info.

Cheers,
Murthy
On Thu, 14 Aug 2003 12:46:07 -0500 Bruno Wolff III <bruno@wolff.to> wrote: > On Thu, Aug 14, 2003 at 09:08:16 -0700, > expect <expect@ihubbell.com> wrote: > > > > 191922C,Bob Cobb,D'Obbalina Sr.,312 Elm Street,Yountville,CA,94599,5,062001,082009 > > 339111C,Elma Thelma,Velma,98 Oak Lane,St. Louis,MO,63119-2065,,, > > That should be handled by copy pretty easily. If the columns after what > appears to be the zip code are some type of number rather than a string, > then the empty string is going to be an invalid value. > > If you want the empty string to be interpreted as the default and you don't > have any NULL values in your input, then the simplest thing to do is set > the code for NULLs to be the empty string. You can then do queries after I'm new to this so everything has to be explicit, I don't want to make any leaps with what anyone is saying here. copy db from '/usr/tmp/db' with delimiter ',' null ''; Is this the command that you're recommending? > the data is in the database to change the NULLs to the appropiate default. > If you are running the 7.4 beta you can use the DEFAULT key in the update, > otherwise you will need to duplicate the default expression in the update > commands. You will want to run update once for each column that can have > NULL values using IS NULL in the WHERE clause. > If these columns have a NOT NULL constraint, you may want to use a temporary > table to load the data and then copy it over (with a single insert statement) > after it has been cleaned up. > > Note that it isn't obvious what empty strings should map to for numbers. > NULL and 0 make about as much sense as using the default value. Well I'm new here but it seems to me they should map to the default value for that column. Why wouldn't they? > >
On Thu, 14 Aug 2003, expect wrote:

> On Thu, 14 Aug 2003 12:46:07 -0500
> Bruno Wolff III <bruno@wolff.to> wrote:
> > Note that it isn't obvious what empty strings should map to for numbers.
> > NULL and 0 make about as much sense as using the default value.
>
> Well I'm new here but it seems to me they should map to the default value
> for that column. Why wouldn't they?

One problem with doing that is that it's inconsistent.

Given

create table test(
    a text default 'abc',
    b int default 5
);

copy test from stdin with delimiter ',';
,
\.

What would you expect the values of the row in test to be?
On Fri, 15 Aug 2003 00:32:01 -0700 (PDT)
Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:

> On Thu, 14 Aug 2003, expect wrote:
>
> > On Thu, 14 Aug 2003 12:46:07 -0500
> > Bruno Wolff III <bruno@wolff.to> wrote:
> > > Note that it isn't obvious what empty strings should map to for numbers.
> > > NULL and 0 make about as much sense as using the default value.
> >
> > Well I'm new here but it seems to me they should map to the default value
> > for that column. Why wouldn't they?
>
> One problem with doing that is that it's inconsistent.

Please elaborate. How is it inconsistent, exactly?

> Given
> create table test(
>     a text default 'abc',
>     b int default 5
> );
>
> copy test from stdin with delimiter ',';
> ,
> \.
>
> What would you expect the values of the row in test
> to be?

Oh a test....

Does the \. end the STDIN input?

Where's the null option? Don't you mean:

copy test from stdin with delimiter ',' null '';

In this case I would expect the row to have:

  a  | b
----------
     |
 abc | 5

Is this too much to expect?
Whilst I agree with what Bruno said regarding adding a default option to the copy syntax, I think that the basic principles Stephan and I outlined in how copy treats your import are correct and the developers did the right thing. Now if the devs, you or I want to add a default option to copy in the future, that's all good, but for the moment you will have to use some pre or post process to get the absolute effect you require.

It's not that hard.. literally one line of piped commands on the command line where you specify the defaults, or a post process that does the final import into your production tables from your data import table.

Remember NULL != Default. These are VERY different concepts and I believe that enough examples of how this behaviour can lead to confusion have been shown.

With copy it is also important to remember that it is not part of the SQL standard and PG doesn't use SQL statements to do the import but rather a low-level C operation. If you need the behaviour you're referring to, sed/awk the data up and generate BEGIN... INSERT... COMMIT... statements and pipe that to PG. It's not that difficult and I'll give you some suggestions with it if you want.

I'm not trying to be difficult with my view of this but there are always other ways (sometimes one liners) that can achieve the behaviour you're after.

Rgds,

Jason

On Fri, 15 Aug 2003 05:53 pm, you wrote:
> On Fri, 15 Aug 2003 00:32:01 -0700 (PDT)
> Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> > On Thu, 14 Aug 2003, expect wrote:
> > > On Thu, 14 Aug 2003 12:46:07 -0500
> > > Bruno Wolff III <bruno@wolff.to> wrote:
> > > > Note that it isn't obvious what empty strings should map to for
> > > > numbers. NULL and 0 make about as much sense as using the default
> > > > value.
> > >
> > > Well I'm new here but it seems to me they should map to the default
> > > value for that column. Why wouldn't they?
> >
> > One problem with doing that is that it's inconsistent.
>
> Please elaborate. How is it inconsistent, exactly?
>
> > Given
> > create table test(
> >     a text default 'abc',
> >     b int default 5
> > );
> >
> > copy test from stdin with delimiter ',';
> > ,
> > \.
> >
> > What would you expect the values of the row in test
> > to be?
>
> Oh a test....
>
> Does the \. end the STDIN input?
>
> Where's the null option? Don't you mean:
>
> copy test from stdin with delimiter ',' null '';
>
> In this case I would expect the row to have:
>
>   a  | b
> ----------
>      |
>  abc | 5
>
> Is this too much to expect?
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
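For the two sample rows posted earlier, a generated script of the kind Jason describes would come out roughly like this (reusing the hypothetical members table from the earlier sketch). The embedded quote is doubled for SQL, and the empty fields become an explicit DEFAULT, which is exactly the behaviour the original poster was after:

begin;
insert into members (acct, firstname, lastname, street, city, state, zip, rating, start_mm, end_mm)
    values ('191922C', 'Bob Cobb', 'D''Obbalina Sr.', '312 Elm Street',
            'Yountville', 'CA', '94599', 5, '062001', '082009');
insert into members (acct, firstname, lastname, street, city, state, zip, rating, start_mm, end_mm)
    values ('339111C', 'Elma Thelma', 'Velma', '98 Oak Lane',
            'St. Louis', 'MO', '63119-2065', default, default, default);
commit;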
On Fri, 15 Aug 2003, expect wrote: > On Fri, 15 Aug 2003 00:32:01 -0700 (PDT) > Stephan Szabo <sszabo@megazone.bigpanda.com> wrote: > > > > > On Thu, 14 Aug 2003, expect wrote: > > > > > On Thu, 14 Aug 2003 12:46:07 -0500 > > > Bruno Wolff III <bruno@wolff.to> wrote: > > > > Note that it isn't obvious what empty strings should map to for numbers. > > > > NULL and 0 make about as much sense as using the default value. > > > > > > Well I'm new here but it seems to me they should map to the default value > > > for that column. Why wouldn't they? > > > > One problem with doing that is that it's inconsistent. > > Please elaborate. How is it inconsistent, exactly? See my comments below. > > > > Given > > create table test( > > a text default 'abc', > > b int default 5 > > ); > > > > copy test from stdin with delimiter ','; > > , > > \. > > > > What would you expect the values of the row in test > > to be? > > > Oh a test.... > > Does the \. end the STDIN input? > > Where's the null option? Don't you mean: > > copy test from stdin with delimiter ',' null ''; No, I'm saying without a null specifier. > In this case I would expect the row to have: > > a | b > ---------- > | > abc | 5 > > > Is this too much to expect? Without a null specifier of '', empty string is a valid value for a. Why would it get the default, and how would you insert an empty string? Should it treat b differently because '' isn't valid for that type, that'd be inconsistent. With a null specifier of '', empty string is valid for both and means NULL. Otherwise, how would you specify a null when you have that null specifier? What you probably really want is another specifier that inserts the default for a column, so you could say something like: copy test from stdin with delimiter ',' default '';
AFAICT,

NULL    == NULL
''      == empty string
default == column default

Seems pretty clear, and about as intuitive as it gets. Trying to use an empty string for anything else just means it's not available to represent itself. Any other DB using anything else is complicating things.

Now, for exports/dumps that don't use the ' char to delineate the start and begin of a column, i.e.

454wetoit,Four score and seven years ago,default,,

what would the last three values be? Empty strings would be my guess.

'454wetoit','Four score and seven years ago','default','',''

seems like quoted column values would be necessary to distinguish the constants NULL and default from strings that contain those two text sequences.

Stephan Szabo wrote:
> On Fri, 15 Aug 2003, expect wrote:
>
>> On Fri, 15 Aug 2003 00:32:01 -0700 (PDT)
>> Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
>>
>>> On Thu, 14 Aug 2003, expect wrote:
>>>
>>>> On Thu, 14 Aug 2003 12:46:07 -0500
>>>> Bruno Wolff III <bruno@wolff.to> wrote:
>>>>
>>>>> Note that it isn't obvious what empty strings should map to for numbers.
>>>>> NULL and 0 make about as much sense as using the default value.
>>>>
>>>> Well I'm new here but it seems to me they should map to the default value
>>>> for that column. Why wouldn't they?
>>>
>>> One problem with doing that is that it's inconsistent.
>>
>> Please elaborate. How is it inconsistent, exactly?
>
> See my comments below.
>
>>> Given
>>> create table test(
>>>   a text default 'abc',
>>>   b int default 5
>>> );
>>>
>>> copy test from stdin with delimiter ',';
>>> ,
>>> \.
>>>
>>> What would you expect the values of the row in test
>>> to be?
>>
>> Oh a test....
>>
>> Does the \. end the STDIN input?
>>
>> Where's the null option? Don't you mean:
>>
>> copy test from stdin with delimiter ',' null '';
>
> No, I'm saying without a null specifier.
>
>> In this case I would expect the row to have:
>>
>>   a  | b
>> ----------
>>      |
>>  abc | 5
>>
>> Is this too much to expect?
>
> Without a null specifier of '', empty string is a valid value for a.
> Why would it get the default, and how would you insert an empty string?
> Should it treat b differently because '' isn't valid for that type, that'd
> be inconsistent.
>
> With a null specifier of '', empty string is valid for both and means
> NULL. Otherwise, how would you specify a null when you have that
> null specifier?
>
> What you probably really want is another specifier that inserts the
> default for a column, so you could say something like:
> copy test from stdin with delimiter ',' default '';
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
On Fri, 15 Aug 2003 11:32:20 -0400 Murthy Kambhampaty <murthy.kambhampaty@goeci.com> wrote: > On Friday, August 15, 2003 03:53, expect [mailto:expect@ihubbell.com] wrote: > >On Fri, 15 Aug 2003 00:32:01 -0700 (PDT) > >Stephan Szabo <sszabo@megazone.bigpanda.com> wrote: > > > ... > >> Given > >> create table test( > >> a text default 'abc', > >> b int default 5 > >> ); > >> > >> copy test from stdin with delimiter ','; > >> , > >> \. > >> > >> What would you expect the values of the row in test > >> to be? > > > > > ... > >In this case I would expect the row to have: > > > > a | b > >---------- > > | > > abc | 5 > > > > > >Is this too much to expect? > > Your expectations don't seem consistent with the table definition, which > permits Null values are allowed in (it's ironic you would add "null ''" to > the copy statement, which says you want to treat empty strings as default > values, not null!). > Why is it ironic? I don't think you've followed the thread correctly. It's what I wanted to have happen.
>-----Original Message----- >From: expect [mailto:expect@ihubbell.com] >Sent: Friday, August 15, 2003 11:56 >To: pgsql-general@postgresql.org >Subject: Re: [GENERAL] importing db as text files > > >On Fri, 15 Aug 2003 11:32:20 -0400 >Murthy Kambhampaty <murthy.kambhampaty@goeci.com> wrote: > >> On Friday, August 15, 2003 03:53, expect >[mailto:expect@ihubbell.com] wrote: >> >On Fri, 15 Aug 2003 00:32:01 -0700 (PDT) >> >Stephan Szabo <sszabo@megazone.bigpanda.com> wrote: >> > >> ... >> >> Given >> >> create table test( >> >> a text default 'abc', >> >> b int default 5 >> >> ); >> >> >> >> copy test from stdin with delimiter ','; >> >> , >> >> \. >> >> >> >> What would you expect the values of the row in test >> >> to be? >> > >> > >> ... >> >In this case I would expect the row to have: >> > >> > a | b >> >---------- >> > | >> > abc | 5 >> > >> > >> >Is this too much to expect? >> >> Your expectations don't seem consistent with the table >definition, which >> permits Null values are allowed in (it's ironic you would >add "null ''" to >> the copy statement, which says you want to treat empty >strings as default >> values, not null!). >> > > >Why is it ironic? I don't think you've followed the thread correctly. >It's what I wanted to have happen. > Pardon my dyslexic typing; that was to read: "it's ironic you would add "null ''" to the copy statement, which says you want to treat empty strings as nulls, not default values!"
On Friday, August 15, 2003 03:53, expect [mailto:expect@ihubbell.com] wrote: >On Fri, 15 Aug 2003 00:32:01 -0700 (PDT) >Stephan Szabo <sszabo@megazone.bigpanda.com> wrote: > ... >> Given >> create table test( >> a text default 'abc', >> b int default 5 >> ); >> >> copy test from stdin with delimiter ','; >> , >> \. >> >> What would you expect the values of the row in test >> to be? > > ... >In this case I would expect the row to have: > > a | b >---------- > | > abc | 5 > > >Is this too much to expect? Your expectations don't seem consistent with the table definition, in which Null values are allowed (it's ironic you would add "null ''" to the copy statement, which says you want to treat empty strings as default values, not null!). It can be frustrating to deal with postgresql, but I've come to appreciate the requirement that you clean your data before you COPY it. Cheers, Murthy PS: Sorry "expect", I meant to send to the list, not directly to you. (I guess I'd better just hit "reply to all", and not edit the "To:" line.)
On Fri, 15 Aug 2003 08:03:04 -0700 (PDT) Stephan Szabo <sszabo@megazone.bigpanda.com> wrote: > On Fri, 15 Aug 2003, expect wrote: > > > On Fri, 15 Aug 2003 00:32:01 -0700 (PDT) > > Stephan Szabo <sszabo@megazone.bigpanda.com> wrote: > > > > > > > > On Thu, 14 Aug 2003, expect wrote: > > > > > > > On Thu, 14 Aug 2003 12:46:07 -0500 > > > > Bruno Wolff III <bruno@wolff.to> wrote: > > > > > Note that it isn't obvious what empty strings should map to for numbers. > > > > > NULL and 0 make about as much sense as using the default value. > > > > > > > > Well I'm new here but it seems to me they should map to the default value > > > > for that column. Why wouldn't they? > > > > > > One problem with doing that is that it's inconsistent. > > > > Please elaborate. How is it inconsistent, exactly? > > See my comments below. > > > > > > > Given > > > create table test( > > > a text default 'abc', > > > b int default 5 > > > ); > > > > > > copy test from stdin with delimiter ','; > > > , > > > \. > > > > > > What would you expect the values of the row in test > > > to be? > > > > > > Oh a test.... > > > > Does the \. end the STDIN input? > > > > Where's the null option? Don't you mean: > > > > copy test from stdin with delimiter ',' null ''; > > No, I'm saying without a null specifier. > > > In this case I would expect the row to have: > > > > a | b > > ---------- > > | > > abc | 5 > > > > > > Is this too much to expect? > > Without a null specifier of '', empty string is a valid value for a. > Why would it get the default, and how would you insert an empty string? > Should it treat b differently because '' isn't valid for that type, that'd > be inconsistent. > > With a null specifier of '', empty string is valid for both and means > NULL. Otherwise, how would you specify a null when you have that > null specifier? Don't want to specify a null. > > What you probably really want is another specifier that inserts the > default for a column, so you could say something like: > copy test from stdin with delimiter ',' default ''; No, I don't want anyone else to do anymore work. I'd much rather see improvements than new features, unless the new feature is a big improvement. > > > >
I usually reduce cpu_index_tuple_cost in postgresql.conf from 0.001 to 0.000001; this way the optimizer is "more willing" to use indexes.
Of course you should try different values until you find the ones that adjust to your system configuration.
On Thu, 2003-08-14 at 09:00, Dave Cramer wrote:
I have a query which definitely runs faster when sequential scans are turned off. Which parameters do I tune? and which way? After quickly perusing the docs, and google, I think it is the random_page_cost? Any help would be appreciated. Dave
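Before anything goes into postgresql.conf, these cost settings can be tried per session and the plans compared with EXPLAIN ANALYZE. A rough sketch, with an invented table and purely illustrative numbers:

  -- cheaper index-related costs, this session only
  SET random_page_cost = 2;            -- default is 4
  SET cpu_index_tuple_cost = 0.0001;   -- default is 0.001
  EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

  -- back to the stock settings for comparison
  RESET random_page_cost;
  RESET cpu_index_tuple_cost;
  EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

Whatever values make the planner choose the index plan on its own are the candidates worth putting in postgresql.conf.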
On Fri, 15 Aug 2003, expect wrote: > On Fri, 15 Aug 2003 08:03:04 -0700 (PDT) > Stephan Szabo <sszabo@megazone.bigpanda.com> wrote: > > > On Fri, 15 Aug 2003, expect wrote: > > > Oh a test.... > > > > > > Does the \. end the STDIN input? > > > > > > Where's the null option? Don't you mean: > > > > > > copy test from stdin with delimiter ',' null ''; > > > > No, I'm saying without a null specifier. > > > > > In this case I would expect the row to have: > > > > > > a | b > > > ---------- > > > | > > > abc | 5 > > > > > > > > > Is this too much to expect? > > > > Without a null specifier of '', empty string is a valid value for a. > > Why would it get the default, and how would you insert an empty string? > > Should it treat b differently because '' isn't valid for that type, that'd > > be inconsistent. > > > > With a null specifier of '', empty string is valid for both and means > > NULL. Otherwise, how would you specify a null when you have that > > null specifier? > > Don't want to specify a null. That doesn't answer the question. If you say that the NULL specifier is some string and then that string were to give you something other than a NULL, that wouldn't be reasonable behavior for people who do want a NULL.
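To spell out the two cases against the test table (what follows is what 7.3 should do, not a transcript of an actual run):

  -- no null specifier: '' is a legal (empty) value for the text column a,
  -- but it is not a legal integer for b, so this load should simply fail
  copy test from stdin with delimiter ',';
  ,
  \.

  -- with a null specifier of '': both empty fields are read as NULL,
  -- so the row comes out as (NULL, NULL); the defaults 'abc' and 5 are never used
  copy test from stdin with delimiter ',' null '';
  ,
  \.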
SET enable_seqscan TO FALSE; <your query> SET enable_seqscan TO TRUE; Is this what you were looking for? Regards, Fernando On Thu, 2003-08-14 at 08:00, Dave Cramer wrote: > I have a query which definitely runs faster when sequential scans are > turned off. Which parameters do I tune? and which way? > > After quickly perusing the docs, and google, I think it is the > random_page_cost? > > Any help would be appreciated. > > Dave
I understand why NULL is not the same as a default value. I do not understand why an integer column: some_val INT, will not accept a null value (note that there is no default). NULL is NULL ... why do I have to massage load data to substitute a '0' when what I really want is NULL (no data)? Just one of those things I'll never get, I guess. Thanks for the information ... Greg W. -----Original Message----- From: Jason Godden [mailto:jasongodden@optushome.com.au] Sent: Fri 8/15/2003 3:11 AM To: pgsql-general@postgresql.org Cc: Subject: Re: [GENERAL] importing db as text files Whilst I agree with what Bruno said regarding adding a default option to the copy syntax, I think that the basic principles Stephan and I outlined in how copy treats your import are correct and the developers did the right thing. Now if the devs, you or I want to add a default option to copy in the future that's all good, but for the moment you will have to use some pre or post process to get the absolute effect you require. It's not that hard... literally one line of piped commands on the command line where you specify the defaults, or a post process that does the final import into your production tables from your data import table. Remember NULL != Default. These are VERY different concepts and I believe that enough examples of how this behaviour can lead to confusion have been shown. With copy it is also important to remember that it is not part of the SQL standard and PG doesn't use SQL statements to do the import but rather a low-level C operation. If you need the behaviour you're referring to, sed/awk the data up and generate BEGIN... INSERT... COMMIT... statements and pipe that to PG. It's not that difficult and I'll give you some suggestions with it if you want. I'm not trying to be difficult with my view of this but there are always other ways (sometimes one liners) that can achieve the behaviour you're after. Rgds, Jason On Fri, 15 Aug 2003 05:53 pm, you wrote: > On Fri, 15 Aug 2003 00:32:01 -0700 (PDT) > > Stephan Szabo <sszabo@megazone.bigpanda.com> wrote: > > On Thu, 14 Aug 2003, expect wrote: > > > On Thu, 14 Aug 2003 12:46:07 -0500 > > > > > > Bruno Wolff III <bruno@wolff.to> wrote: > > > > Note that it isn't obvious what empty strings should map to for > > > > numbers. NULL and 0 make about as much sense as using the default > > > > value. > > > > > > Well I'm new here but it seems to me they should map to the default > > > value for that column. Why wouldn't they? > > > > One problem with doing that is that it's inconsistent. > > Please elaborate. How is it inconsistent, exactly? > > > Given > > create table test( > > a text default 'abc', > > b int default 5 > > ); > > > > copy test from stdin with delimiter ','; > > , > > \. > > > > What would you expect the values of the row in test > > to be? > > Oh a test.... > > Does the \. end the STDIN input? > > Where's the null option? Don't you mean: > > copy test from stdin with delimiter ',' null ''; > > > In this case I would expect the row to have: > > a | b > ---------- > > abc | 5 > > > Is this too much to expect?
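In Greg's case (an empty field that should land as NULL in some_val INT) no '0' should be needed at all: either tell COPY that empty strings are NULLs, or rewrite them to COPY's default null marker \N first. A sketch only, untested, with made-up file, table and database names:

  copy mytable from '/tmp/data.txt' with delimiter ',' null '';

or, as the kind of one-liner Jason mentions, rewrite empties in (say) the third field before piping the data in:

  awk -F',' 'BEGIN { OFS="," } { if ($3 == "") $3 = "\\N"; print }' data.txt \
    | psql mydb -c "copy mytable from stdin with delimiter ','"

Note that null '' turns every empty field into NULL, including text columns where an empty string might actually be wanted.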
On 14 Aug 2003, Dave Cramer wrote: > I have a query which definitely runs faster when sequential scans are > turned off. Which parameters do I tune? and which way? > > After quickly perusing the docs, and google, I think it is the > random_page_cost? Set your effective_cache_size correctly first, then lower random page cost until you hit the sweet spot. As someone else pointed out, you can also lower the cpu_***_cost parameters, and often this is preferable. On boxes with fast memory / CPUs, it's often good to get the planner in the state of mind of using the CPU a little more, so it will pick a faster join method than it otherwise would.
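For reference, the knobs mentioned here live in postgresql.conf (7.3 names); the numbers below are only illustrative starting points to experiment with, not recommendations:

  effective_cache_size = 50000     # in 8k disk pages, i.e. roughly 400 MB of OS cache
  random_page_cost = 2             # default is 4; lower it when most data fits in cache
  cpu_index_tuple_cost = 0.0005    # default is 0.001
  cpu_operator_cost = 0.0025       # the default; lowering CPU costs favours CPU-heavy plans

These can also be overridden per session with SET, which is the easiest way to try a value before committing it to the config file.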