Thread: importing db as text files

importing db as text files

From
expect
Date:
What's the big deal with importing text files?  I have a 70 MB file to import
and it's been one problem after another.  I used the copy command and it appears
that it's just not possible. I finally massaged the file into a .sql file and
ran that using \i db.sql but that failed too because I overlooked ' in names
like D'Adario.  The other problem I encountered was that a numeric field had
to have data in it, pg would not default to the default value.  So instead of
massaging all the data again I decided to change the data type for that column.
This is my first experience with postgresql and I'm wondering if I should expect
to encounter similar pain as I go further into this?  So far it's been very
painful trying to do what I thought would be easy and what I think should be
easy.

PostgreSQL 7.3.4 on linux redhat 9


Re: importing db as text files

From
Bruno Wolff III
Date:
On Wed, Aug 13, 2003 at 14:14:20 -0700,
  expect <expect@ihubbell.com> wrote:
>
> What's the big deal with importing text files?  I have a 70 MB file to import
> and it's been one problem after another.  I used the copy command and it appears
> that it's just not possible. I finally massaged the file into a .sql file and

That is unlikely, but without more details it is hard to say what you need
to do.

> ran that using \i db.sql but that failed too because I overlooked ' in names
> like D'Adario.  The other problem I encountered was that a numeric field had
> to have data in it, pg would not default to the default value.  So instead of

You can use the keyword default in insert statements to get a default value.
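For example (the table and column names here are only for illustration):

create table people (name text, months numeric default 0);

-- DEFAULT in the value list asks for the column default instead of a literal value
insert into people (name, months) values ('Smith', default);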

> massaging all the data again I decided to change the data type for that column.
> This is my first experience with postgresql and I'm wondering if I should expect
> to encounter similar pain as I go further into this?  So far it's been very
> painful trying to do what I thought would be easy and what I think should be
> easy.

The impression I get is that you expect postgres to make a best guess when
presented with ambiguous data. That is a very dangerous thing to do. I would
much prefer ambiguous data be rejected so that I can make sure what I think
the value is, is the same as what the database thinks the value is.

Re: importing db as text files

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: expect [mailto:expect@ihubbell.com]
> Sent: Wednesday, August 13, 2003 2:14 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] importing db as text files
>
> What's the big deal with importing text files?  I have a 70
> MB file to import and it's been one problem after another.  I
> used the copy command and it appears that it's just not
> possible. I finally massaged the file into a .sql file and
> ran that using \i db.sql but that failed too because I
> overlooked ' in names like D'Adario.  The other problem I
> encountered was that a numeric field had to have data in it,
> pg would not default to the default value.  So instead of
> massaging all the data again I decided to change the data
> type for that column. This is my first experience with
> postgresql and I'm wondering if I should expect to encounter
> similar pain as I go further into this?  So far it's been
> very painful trying to do what I thought would be easy and
> what I think should be easy.

I use the copy command all the time without problems.  If the data
follows the format you describe for copy, there will be no problems.  If
the data is some other format, then you will have to use perl or sed or
something to reformat it.
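For the stray single quotes in particular, the fix at the SQL level is just to
double them in the generated .sql file (the table and column below are made up):

-- a ' inside a standard SQL string literal is written as ''
insert into people (last_name) values ('D''Adario');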

If your data is in some format that is not easy to massage into
something that copy wants to eat, then use an ODBC driver for text files
and move all the data with insert/select.

Expect no miracles. GI/GO.

Re: importing db as text files

From
Jason Godden
Date:
Hi expect,

Best way in my opinion is to use the copy table command.  This way Pg will
actually 'massage' the data (string escapes and all) for you.

If you have complex files best to use a combo of sed/awk to pre-massage the
field delimiters and import the data to a temporary table.  In most instances
it is best to pipe the data to the psql command using copy table from stdin
as from file requires that you are the process owner.

Always use a temporary import table and perform validation/further conversion
in that table (IMO).

eg (from one of my earlier posts)

cat unprocesseddata.txt | sed "s/\",\"/$TAB/g" | sed "s/\"//g" | sed "s/.$//g"
| psql -dmydatabase -c "copy importtable from stdin delimiter '\t';"

The first sed will replace all instances of "," with a tab character
(I achieve tab in my shell scripts with a line like TAB=$(printf "\t"))

The second sed will remove the start and end " because the first only checks
for ",".

The third sed will remove the Windows carriage returns (if any - depends on
the platform the file was generated from).

Note here that my file format never contains any " in a field so I can safely
run the second sed.  If your import file contains these then you will need to
change the regex or use awk.
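Once the data is sitting in the temporary import table, the conversion into the
real table is just an insert/select where you can supply whatever defaults or
checks you like.  A rough sketch (every table and column name here is invented):

-- everything loaded as plain text first, no constraints
create table importtable (acct text, name text, months text);

-- validate/convert on the way into the production table,
-- supplying a default where the field came in empty or null
insert into realtable (acct, name, months)
select acct, name,
       case when months is null or months = '' then 0 else months::numeric end
from importtable;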

Regards,

Jason

On Thu, 14 Aug 2003 07:14 am, expect wrote:
> What's the big deal with importing text files?  I have a 70 MB file to
> import and it's been one problem after another.  I used the copy command
> and it appears that it's just not possible. I finally massaged the file
> into a .sql file and ran that using \i db.sql but that failed too because I
> overlooked ' in names like D'Adario.  The other problem I encountered was
> that a numeric field had to have data in it, pg would not default to the
> default value.  So instead of massaging all the data again I decided to
> change the data type for that column. This is my first experience with
> postgresql and I'm wondering if I should expect to encounter similar pain
> as I go further into this?  So far it's been very painful trying to do what
> I thought would be easy and what I think should be easy.
>
> PostgreSQL 7.3.4 on linux redhat 9
>
>


Re: importing db as text files

From
"Gregory S. Williamson"
Date:
I tend to use perl to preprocess dumps (in our case from Informix). It tends to work much faster than shell scripts
(although your mileage may vary). I have to fix missing numeric values (if the column allows nulls why can't the copy
command accept an empty field, I wonder?), missing dates. In other cases it massages date formats, spatial data, etc.
For example, a crude program below to clean DOQQ metadata:

Greg W.
==================================
firenze% more infxunl2psql
#!/usr/dist/bin/perl -w

$FILE = $ARGV[0];
$OUTPUT = $ARGV[1];
$MODE = $ARGV[2];

open (INFILE,"$FILE");
open (OUTFILE,">$OUTPUT");

foreach $line (<INFILE>)
{
chop($line);
chop($line);
if (($MODE cmp "DOQ") == 0) {
        ($t_source_filename, $t_quadrangle_name, $t_west_longitude, $t_east_longitude,
         $t_north_latitude, $t_south_latitude, $t_production_date, $t_raster_order,
         $t_band_organization, $t_band_content, $t_bits_per_pixel, $t_samples_and_lines,
         $t_horizontal_datum, $t_horizontal_coordinate_system, $t_coordinate_zone,
         $t_horizontal_units, $t_horizontal_resolution, $t_secondary_horizontal_datum,
         $t_xy_origin, $t_secondary_xy_origin, $t_nw_quad_corner_xy, $t_ne_quad_corner_xy,
         $t_se_quad_corner_xy, $t_sw_quad_corner_xy, $t_secondary_nw_quad_xy,
         $t_secondary_ne_quad_xy, $tsecondary_se_quad_xy, $t_secondary_sw_quad_xy,
         $t_rmse_xy, $t_image_source, $t_source_dem_date, $t_agency, $t_producer,
         $t_production_system, $t_standard_version, $t_metadata_date, $t_data_file_size,
         $byte_count) = split(/\|/,$line);
        if (length($t_production_date) == 0) {
                $t_production_date = "\\N";    # psql seems to dump a blank data with this nomenclature
                }
        if (length($t_coordinate_zone) == 0) {  # an integer
                $t_coordinate_zone = 0;
                }
        if (length($t_band_content) == 0) {
                $t_band_content = 0;    # also an int
                }
        if (length($t_bits_per_pixel) == 0) {
                $t_bits_per_pixel = 0;  # reasonable default for an int ?
                }
        if (length($t_horizontal_resolution) == 0) {
                $t_horizontal_resolution = 0.0;
                }
        if (length($t_secondary_horizontal_datum) == 0) {
                $t_secondary_horizontal_datum = "\'\'";
                }
        if (length($t_rmse_xy) == 0) {
                $t_rmse_xy = 0.0;
                }
        if (length($t_metadata_date) == 0) {
                $t_metadata_date = "\\N";
                }
        if (length($t_data_file_size) == 0) {
                $t_data_file_size = 0;  # a big int
                }
        if (length($byte_count) == 0) {
                $byte_count = 0;        # reasonable default ? for an int
                }
        $out_line = $t_source_filename . "|" . $t_quadrangle_name . "|" . $t_west_longitude . "|" .
            $t_east_longitude . "|" . $t_north_latitude . "|" . $t_south_latitude . "|" .
            $t_production_date . "|" . $t_raster_order . "|" . $t_band_organization . "|" .
            $t_band_content . "|" . $t_bits_per_pixel . "|" . $t_samples_and_lines . "|" .
            $t_horizontal_datum . "|" . $t_horizontal_coordinate_system . "|" . $t_coordinate_zone . "|" .
            $t_horizontal_units . "|" . $t_horizontal_resolution . "|" . $t_secondary_horizontal_datum . "|" .
            $t_xy_origin . "|" . $t_secondary_xy_origin . "|" . $t_nw_quad_corner_xy . "|" .
            $t_ne_quad_corner_xy . "|" . $t_se_quad_corner_xy . "|" . $t_sw_quad_corner_xy . "|" .
            $t_secondary_nw_quad_xy . "|" . $t_secondary_ne_quad_xy . "|" . $tsecondary_se_quad_xy . "|" .
            $t_secondary_sw_quad_xy . "|" . $t_rmse_xy . "|" . $t_image_source . "|" . $t_source_dem_date . "|" .
            $t_agency . "|" . $t_producer . "|" . $t_production_system . "|" . $t_standard_version . "|" .
            $t_metadata_date . "|" . $t_data_file_size . "|" . $byte_count;
        print OUTFILE "$out_line\n";
        }
else {
        print OUTFILE "$line\n";
        }
}
close INFILE;
close OUTFILE;

-----Original Message-----
From: Jason Godden [mailto:jasongodden@optushome.com.au]
Sent: Wednesday, August 13, 2003 2:35 PM
To: expect; pgsql-general@postgresql.org
Subject: Re: [GENERAL] importing db as text files


Hi expect,

Best way in my opinion is to use the copy table command.  This way Pg will
actually 'massage' the data (string escapes and all) for you.

If you have complex files best to use a combo of sed/awk to pre-massage the
field delimiters and import the data to a temporary table.  In most instances
it is best to pipe the data to the psql command using copy table from stdin
as from file requires that you are the process owner.

Always use a temporary import table and perform validation/further conversion
in that table (IMO).

eg (from one of my earlier posts)

cat unprocesseddata.txt | sed "s/\",\"/$TAB/g" | sed "s/\"//g" | sed "s/.$//g"
| psql -dmydatabase -c "copy importtable from stdin delimiter '\t';"

The first sed will replace all instances of "," with a tab character
(I achieve tab in my shell scripts with a line like TAB=$(printf "\t"))

The second sed will remove the start and end " because the first only checks
for ",".

> The third sed will remove the Windows carriage returns (if any - depends on
> the platform the file was generated from).

Note here that my file format never contains any " in a field so I can safely
run the second sed.  If your import file contains these then you will need to
change the regex or use awk.

Regards,

Jason

On Thu, 14 Aug 2003 07:14 am, expect wrote:
> What's the big deal with importing text files?  I have a 70 MB file to
> import and it's been one problem after another.  I used the copy command
> and it appears that it's just not possible. I finally massaged the file
> into a .sql file and ran that using \i db.sql but that failed too because I
> overlooked ' in names like D'Adario.  The other problem I encountered was
> that a numeric field had to have data in it, pg would not default to the
> default value.  So instead of massaging all the data again I decided to
> change the data type for that column. This is my first experience with
> postgresql and I'm wondering if I should expect to encounter similar pain
> as I go further into this?  So far it's been very painful trying to do what
> I thought would be easy and what I think should be easy.
>
> PostgreSQL 7.3.4 on linux redhat 9
>
>

Re: importing db as text files

From
expect
Date:
On Wed, 13 Aug 2003 16:30:04 -0500
Bruno Wolff III <bruno@wolff.to> wrote:

> On Wed, Aug 13, 2003 at 14:14:20 -0700,
>   expect <expect@ihubbell.com> wrote:
> >
> > What's the big deal with importing text files?  I have a 70 MB file to import
> > and it's been one problem after another.  I used the copy command and it appears
> > that it's just not possible. I finally massaged the file into a .sql file and
>
> That is unlikely, but without more details it is hard to say what you need
> to do.


What's unlikely?  That it didn't work?  But it didn't work.


>
> > ran that using \i db.sql but that failed too because I overlooked ' in names
> > like D'Adario.  The other problem I encountered was that a numeric field had
> > to have data in it, pg would not default to the default value.  So instead of
>
> You can use the keyword default in insert statements to get a default value.


From what I've read it should use the default value when there is no value.
Is that not true?  Is this a known issue?


>
> > massaging all the data again I decided to change the data type for that column.
> > This is my first experience with postgresql and I'm wondering if I should expect
> > to encounter similar pain as I go further into this?  So far it's been very
> > painful trying to do what I thought would be easy and what I think should be
> > easy.
>
> The impression I get is that you expect postgres to make a best guess when

No you've come to the wrong impression.  I believe that the problem lies in the
fact that pg will not default to the default value when no value is present.


> presented with ambiguous data. That is a very dangerous thing to do. I would
> much prefer ambiguous data be rejected so that I can make sure what I think
> the value is, is the same as what the database thinks the value is.
>
>

Re: importing db as text files

From
expect
Date:
On Thu, 14 Aug 2003 07:34:55 +1000
Jason Godden <jasongodden@optushome.com.au> wrote:

> Hi expect,
>
> Best way in my opinion is to use the copy table command.  This way Pg will
> actually 'massage' the data (string escapes and all) for you.

I guess we're of the same opinion.  I did use the copy table command.
I believe the problem is that pg is unable to use the default value when a
value is not present.


>
> If you have complex files best to use a combo of sed/awk to pre-massage the
> field delimiters and import the data to a temporary table.  In most instances
> it is best to pipe the data to the psql command using copy table from stdin
> as from file requires that you are the process owner.
>
> Always use a temporary import table and perform validation/further conversion
> in that table (IMO).
>
> eg (from one of my earlier posts)
>
> cat unprocesseddata.txt | sed "s/\",\"/$TAB/g" | sed "s/\"//g" | sed "s/.$//g"
> | psql -dmydatabase -c "copy importtable from stdin delimiter '\t';"
>
> The first sed will replace all instances of "," with a tab character
> (I achieve tab in my shell scripts with a line like TAB=$(printf "\t"))
>
> The second sed will remove the start and end " because the first only checks
> for ",".
>
> The third sed will remove the Windows carriage returns (if any - depends on
> the platform the file was generated from).
>
> Note here that my file format never contains any " in a field so I can safely
> run the second sed.  If your import file contains these then you will need to
> change the regex or use awk.
>
> Regards,
>
> Jason
>
> On Thu, 14 Aug 2003 07:14 am, expect wrote:
> > What's the big deal with importing text files?  I have a 70 MB file to
> > import and it's been one problem after another.  I used the copy command
> > and it appears that it's just not possible. I finally massaged the file
> > into a .sql file and ran that using \i db.sql but that failed too because I
> > overlooked ' in names like D'Adario.  The other problem I encountered was
> > that a numeric field had to have data in it, pg would not default to the
> > default value.  So instead of massaging all the data again I decided to
> > change the data type for that column. This is my first experience with
> > postgresql and I'm wondering if I should expect to encounter similar pain
> > as I go further into this?  So far it's been very painful trying to do what
> > I thought would be easy and what I think should be easy.
> >
> > PostgreSQL 7.3.4 on linux redhat 9
> >
> >

Re: importing db as text files

From
expect
Date:
On Wed, 13 Aug 2003 14:24:30 -0700
"Dann Corbit" <DCorbit@connx.com> wrote:

> > -----Original Message-----
> > From: expect [mailto:expect@ihubbell.com]
> > Sent: Wednesday, August 13, 2003 2:14 PM
> > To: pgsql-general@postgresql.org
> > Subject: [GENERAL] importing db as text files
> >
> > What's the big deal with importing text files?  I have a 70
> > MB file to import and it's been one problem after another.  I
> > used the copy command and it appears that it's just not
> > possible. I finally massaged the file into a .sql file and
> > ran that using \i db.sql but that failed too because I
> > overlooked ' in names like D'Adario.  The other problem I
> > encountered was that a numeric field had to have data in it,
> > pg would not default to the default value.  So instead of
> > massaging all the data again I decided to change the data
> > type for that column. This is my first experience with
> > postgresql and I'm wondering if I should expect to encounter
> > similar pain as I go further into this?  So far it's been
> > very painful trying to do what I thought would be easy and
> > what I think should be easy.
>
> I use the copy command all the time without problems.  If the data
> follows the format you describe for copy, there will be no problems.  If
> the data is some other format, then you will have to use perl or sed or
> something to reformat it.


I believe the problem is that pg is unable to use the default value
when a value is not present.

>
> If your data is in some format that is not easy to massage into
> something that copy wants to eat, then use an ODBC driver for text files
> and move all the data with insert/select.
>
> Expect no miracles. GI/GO.


Importing a text file is a miracle?  You're scaring me. ;^)


Re: importing db as text files

From
expect
Date:
On Wed, 13 Aug 2003 14:59:29 -0700
"Gregory S. Williamson" <gsw@globexplorer.com> wrote:

> I tend to use perl to preprocess dumps (in our case from Informix). It tends to work much faster than shell scripts
> (although your mileage may vary). I have to fix missing numeric values (if the column allows nulls why can't the copy
> command accept an empty field, I wonder?), missing dates. In other cases it massages date formats, spatial data, etc.
> For example, a crude program below to clean DOQQ metadata:

I'm not having any problems massaging the data.  I believe the problem is that
pg is unable to use the default value when a value is not present.



BTW FWIW I took a few liberties with your perl code below for ease of reading.

For $out_line you can use the little append operator like:

$out_line  = $t_source_filename . "|";
$out_line .= $t_quadrangle_name . "|";
etc., etc.



>
> Greg W.
> ==================================
> firenze% more infxunl2psql
> #!/usr/dist/bin/perl -w
>
> $FILE = $ARGV[0];
> $OUTPUT = $ARGV[1];
> $MODE = $ARGV[2];
>
> open (INFILE,"$FILE");
> open (OUTFILE,">$OUTPUT");
>
> foreach $line (<INFILE>)
> {
> chop($line);
> chop($line);
> if (($MODE cmp "DOQ") == 0)
  {
        ($t_source_filename, $t_quadrangle_name, $t_west_longitude,
    $t_east_longitude, $t_north_latitude, $t_south_latitude,
    $t_production_date, $t_raster_order, $t_band_organization,
    $t_band_content, $t_bits_per_pixel, $t_samples_and_lines,
    $t_horizontal_datum, $t_horizontal_coordinate_system,
    $t_coordinate_zone, $t_horizontal_units, $t_horizontal_resolution,
    $t_secondary_horizontal_datum, $t_xy_origin, $t_secondary_xy_origin,
    $t_nw_quad_corner_xy, $t_ne_quad_corner_xy, $t_se_quad_corner_xy,
    $t_sw_quad_corner_xy, $t_secondary_nw_quad_xy, $t_secondary_ne_quad_xy,
    $tsecondary_se_quad_xy, $t_secondary_sw_quad_xy, $t_rmse_xy, $t_image_source,
    $t_source_dem_date, $t_agency, $t_producer, $t_production_system,
    $t_standard_version, $t_metadata_date, $t_data_file_size, $byte_count)
    = split(/\|/,$line);

>         if (length($t_production_date) == 0) {
>                 $t_production_date = "\\N";    # psql seems to dump a blank data with this nomenclature
>                 }
>         if (length($t_coordinate_zone) == 0) {  # an integer
>                 $t_coordinate_zone = 0;
>                 }
>         if (length($t_band_content) == 0) {
>                 $t_band_content = 0;    # also an int
>                 }
>         if (length($t_bits_per_pixel) == 0) {
>                 $t_bits_per_pixel = 0;  # reasonable default for an int ?
>                 }
>         if (length($t_horizontal_resolution) == 0) {
>                 $t_horizontal_resolution = 0.0;
>                 }
>         if (length($t_secondary_horizontal_datum) == 0) {
>                 $t_secondary_horizontal_datum = "\'\'";
>                 }
>         if (length($t_rmse_xy) == 0) {
>                 $t_rmse_xy = 0.0;
>                 }
>         if (length($t_metadata_date) == 0) {
>                 $t_metadata_date = "\\N";
>                 }
>         if (length($t_data_file_size) == 0) {
>                 $t_data_file_size = 0;  # a big int
>                 }
>         if (length($byte_count) == 0) {
>                 $byte_count = 0;        # reasonable default ? for an int
>                 }
>         $out_line = $t_source_filename . "|" . $t_quadrangle_name . "|" . $t_west_longitude . "|" .
>             $t_east_longitude . "|" . $t_north_latitude . "|" . $t_south_latitude . "|" .
>             $t_production_date . "|" . $t_raster_order . "|" . $t_band_organization . "|" .
>             $t_band_content . "|" . $t_bits_per_pixel . "|" . $t_samples_and_lines . "|" .
>             $t_horizontal_datum . "|" . $t_horizontal_coordinate_system . "|" . $t_coordinate_zone . "|" .
>             $t_horizontal_units . "|" . $t_horizontal_resolution . "|" . $t_secondary_horizontal_datum . "|" .
>             $t_xy_origin . "|" . $t_secondary_xy_origin . "|" . $t_nw_quad_corner_xy . "|" .
>             $t_ne_quad_corner_xy . "|" . $t_se_quad_corner_xy . "|" . $t_sw_quad_corner_xy . "|" .
>             $t_secondary_nw_quad_xy . "|" . $t_secondary_ne_quad_xy . "|" . $tsecondary_se_quad_xy . "|" .
>             $t_secondary_sw_quad_xy . "|" . $t_rmse_xy . "|" . $t_image_source . "|" . $t_source_dem_date . "|" .
>             $t_agency . "|" . $t_producer . "|" . $t_production_system . "|" . $t_standard_version . "|" .
>             $t_metadata_date . "|" . $t_data_file_size . "|" . $byte_count;
>         print OUTFILE "$out_line\n";
>         }
> else {
>         print OUTFILE "$line\n";
>         }
> }
> close INFILE;
> close OUTFILE;
>
> -----Original Message-----
> From: Jason Godden [mailto:jasongodden@optushome.com.au]
> Sent: Wednesday, August 13, 2003 2:35 PM
> To: expect; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] importing db as text files
>
>
> Hi expect,
>
> Best way in my opinion is to use the copy table command.  This way Pg will
> actually 'massage' the data (string escapes and all) for you.
>
> If you have complex files best to use a combo of sed/awk to pre-massage the
> field delimiters and import the data to a temporary table.  In most instances
> it is best to pipe the data to the psql command using copy table from stdin
> as from file requires that you are the process owner.
>
> Always use a temporary import table and perform validation/further conversion
> in that table (IMO).
>
> eg (from one of my earlier posts)
>
> cat unprocesseddata.txt | sed "s/\",\"/$TAB/g" | sed "s/\"//g" | sed "s/.$//g"
> | psql -dmydatabase -c "copy importtable from stdin delimiter '\t';"
>
> The first sed will replace all instances of "," with a tab character
> (I achieve tab in my shell scripts with a line like TAB=$(printf "\t"))
>
> The second sed will remove the start and end " because the first only checks
> for ",".
>
> The third sed will remove the Windows carriage returns (if any - depends on
> the platform the file was generated from).
>
> Note here that my file format never contains any " in a field so I can safely
> run the second sed.  If your import file contains these then you will need to
> change the regex or use awk.
>
> Regards,
>
> Jason
>
> On Thu, 14 Aug 2003 07:14 am, expect wrote:
> > What's the big deal with importing text files?  I have a 70 MB file to
> > import and it's been one problem after another.  I used the copy command
> > and it appears that it's just not possible. I finally massaged the file
> > into a .sql file and ran that using \i db.sql but that failed too because I
> > overlooked ' in names like D'Adario.  The other problem I encountered was
> > that a numeric field had to have data in it, pg would not default to the
> > default value.  So instead of massaging all the data again I decided to
> > change the data type for that column. This is my first experience with
> > postgresql and I'm wondering if I should expect to encounter similar pain
> > as I go further into this?  So far it's been very painful trying to do what
> > I thought would be easy and what I think should be easy.
> >
> > PostgreSQL 7.3.4 on linux redhat 9
> >
> >

Re: importing db as text files

From
Martijn van Oosterhout
Date:
On Wed, Aug 13, 2003 at 02:59:29PM -0700, Gregory S. Williamson wrote:
> I tend to use perl to preprocess dumps (in our case from Informix). It tends to work much faster than shell scripts
> (although your mileage may vary). I have to fix missing numeric values (if the column allows nulls why can't the copy
> command accept an empty field, I wonder?), missing dates. In other cases it massages date formats, spatial data, etc.
> For example, a crude program below to clean DOQQ metadata:

[snip]

>                 $t_production_date = "\\N";    # psql seems to dump a blank data with this nomenclature

Umm, \N represents NULL. NULL is not a blank field, it's null. You could
also tell copy that a blank field represents a null but that might have
unexpected effects on text fields which are supposed to be blank.
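For instance (the table here is made up), with the null string set to '':

create table t (code text, note text);

copy t from stdin with delimiter ',' null '';
A1,
\.

-- the empty note field came in as NULL, not as an empty string
select code, note is null as note_is_null from t;

so if some of those text fields really are meant to be empty strings, that
information is lost.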

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "All that is needed for the forces of evil to triumph is for enough good
> men to do nothing." - Edmond Burke
> "The penalty good people pay for not being interested in politics is to be
> governed by people worse than themselves." - Plato


Re: importing db as text files

From
expect
Date:
On Wed, 13 Aug 2003 14:59:29 -0700
"Gregory S. Williamson" <gsw@globexplorer.com> wrote:

> I tend to use perl to preprocess dumps (in our case from Informix). It tends
> to work much faster than shell scripts (although your mileage may vary). I




> have to fix missing numeric values (if the column allows nulls why can't the
> copy command accept an empty field, I wonder?), missing dates. In other cases

This seems to be the question I need answered as well.  Does anyone know the
answer?



> it massages date formats, spatial data, etc. For example, a crude program
> below to clean DOQQ metadata:
>
> Greg W.
> ==================================
> firenze% more infxunl2psql
> #!/usr/dist/bin/perl -w
>
> $FILE = $ARGV[0];
> $OUTPUT = $ARGV[1];
> $MODE = $ARGV[2];
>
> open (INFILE,"$FILE");
> open (OUTFILE,">$OUTPUT");
>
> foreach $line (<INFILE>)
> {
> chop($line);
> chop($line);
> if (($MODE cmp "DOQ") == 0) {
>         ($t_source_filename, $t_quadrangle_name, $t_west_longitude,
>         $t_east_longitude, $t_north_latitude, $t_south_latitude,
>         $t_production_date, $t_raster_order, $t_band_organization,
>         $t_band_content, $t_bits_per_pixel, $t_samples_and_lines,
>         $t_horizontal_datum, $t_horizontal_coordinate_system,
>         $t_coordinate_zone, $t_horizontal_units, $t_horizontal_resolution,
>         $t_secondary_horizontal_datum, $t_xy_origin, $t_secondary_xy_origin,
>         $t_nw_quad_corner_xy, $t_ne_quad_corner_xy, $t_se_quad_corner_xy,
>         $t_sw_quad_corner_xy, $t_secondary_nw_quad_xy,
>         $t_secondary_ne_quad_xy, $tsecondary_se_quad_xy,
>         $t_secondary_sw_quad_xy, $t_rmse_xy, $t_image_source,
>         $t_source_dem_date, $t_agency, $t_producer, $t_production_system,
>         $t_standard_version, $t_metadata_date, $t_data_file_size, $byte_count)
>         = split(/\|/,$line);
>         if (length($t_production_date) == 0) {
>                 $t_production_date = "\\N";    # psql seems to dump a blank data with this nomenclature
>                 }
>         if (length($t_coordinate_zone) == 0) {  # an integer
>                 $t_coordinate_zone = 0;
>                 }
>         if (length($t_band_content) == 0) {
>                 $t_band_content = 0;    # also an int
>                 }
>         if (length($t_bits_per_pixel) == 0) {
>                 $t_bits_per_pixel = 0;  # reasonable default for an int ?
>                 }
>         if (length($t_horizontal_resolution) == 0) {
>                 $t_horizontal_resolution = 0.0;
>                 }
>         if (length($t_secondary_horizontal_datum) == 0) {
>                 $t_secondary_horizontal_datum = "\'\'";
>                 }
>         if (length($t_rmse_xy) == 0) {
>                 $t_rmse_xy = 0.0;
>                 }
>         if (length($t_metadata_date) == 0) {
>                 $t_metadata_date = "\\N";
>                 }
>         if (length($t_data_file_size) == 0) {
>                 $t_data_file_size = 0;  # a big int
>                 }
>         if (length($byte_count) == 0) {
>                 $byte_count = 0;        # reasonable default ? for an int
>                 }
>         $out_line = $t_source_filename . "|" . $t_quadrangle_name . "|" .
>         $t_west_longitude . "|" . $t_east_longitude . "|" . $t_north_latitude
>         . "|" . $t_south_latitude . "|" . $t_production_date . "|" .
>         $t_raster_order . "|" . $t_band_organization . "|" .
> $t_band_content . "|" . $t_bits_per_pixel . "|" .
> $t_samples_and_lines . "|" . $t_horizontal_datum . "|" .
> $t_horizontal_coordinate_system . "|" . $t_coordinate_zone . "|" .
> $t_horizontal_units . "|" . $t_horizontal_resolution . "|" .
> $t_secondary_horizontal_datum . "|" . $t_xy_origin . "|" .
> $t_secondary_xy_origin . "|" .  $t_nw_quad_corner_xy . "|" .
> $t_ne_quad_corner_xy . "|" . $t_se_quad_corner_xy . "|" . $t_sw_quad_corner_xy
> . "|" . $t_secondary_nw_quad_xy . "|" . $t_secondary_ne_quad_xy . "|" .
> $tsecondary_se_quad_xy . "|" . $t_secondary_sw_quad_xy . "|" . $t_rmse_xy .
> "|" . $t_image_source . "|" . $t_source_dem_date . "|" . $t_agency . "|" .
> $t_producer . "|" . $t_production_system . "|" . $t_standard_version . "|" .
> $t_metadata_date . "|" . $t_data_file_size . "|" . $byte_count;
>         print OUTFILE "$out_line\n";
>         }
> else {
>         print OUTFILE "$line\n";
>         }
> }
> close INFILE;
> close OUTFILE;
>
> -----Original Message-----
> From: Jason Godden [mailto:jasongodden@optushome.com.au]
> Sent: Wednesday, August 13, 2003 2:35 PM
> To: expect; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] importing db as text files
>
>
> Hi expect,
>
> Best way in my opinion is to use the copy table command.  This way Pg will
> actually 'massage' the data (string escapes and all) for you.
>
> If you have complex files best to use a combo of sed/awk to pre-massage the
> field delimiters and import the data to a temporary table.  In most instances
> it is best to pipe the data to the psql command using copy table from stdin
> as from file requires that you are the process owner.
>
> Always use a temporary import table and perform validation/further conversion
> in that table (IMO).
>
> eg (from one of my earlier posts)
>
> cat unprocesseddata.txt | sed "s/\",\"/$TAB/g" | sed "s/\"//g" | sed "s/.$//g"
>
> | psql -dmydatabase -c "copy importtable from stdin delimiter '\t';"
>
> The first sed will replace all instances of "," with a tab character
> (I achieve tab in my shell scripts with a line like TAB=$(printf "\t"))
>
> The second sed will remove the start and end " because the first only checks
> for ",".
>
> The third sed will remove the Windows carriage returns (if any - depends on
> the platform the file was generated from).
>
> Note here that my file format never contains any " in a field so I can safely
> run the second sed.  If your import file contains these then you will need to
> change the regex or use awk.
>
> Regards,
>
> Jason
>
> On Thu, 14 Aug 2003 07:14 am, expect wrote:
> > What's the big deal with importing text files?  I have a 70 MB file to
> > import and it's been one problem after another.  I used the copy command
> > and it appears that it's just not possible. I finally massaged the file
> > into a .sql file and ran that using \i db.sql but that failed too because I
> > overlooked ' in names like D'Adario.  The other problem I encountered was
> > that a numeric field had to have data in it, pg would not default to the
> > default value.  So instead of massaging all the data again I decided to
> > change the data type for that column. This is my first experience with
> > postgresql and I'm wondering if I should expect to encounter similar pain
> > as I go further into this?  So far it's been very painful trying to do what
> > I thought would be easy and what I think should be easy.
> >
> > PostgreSQL 7.3.4 on linux redhat 9
> >
> >

Re: importing db as text files

From
Jason Godden
Date:
On Thu, 14 Aug 2003 02:17 pm, expect wrote:
> On Wed, 13 Aug 2003 14:59:29 -0700
>
> "Gregory S. Williamson" <gsw@globexplorer.com> wrote:
> > have to fix missing numeric values (if the column allows nulls why can't
> > the copy command accept an empty field, I wonder?), missing dates. In
> > other cases
>
> This seems to be the question I need answered as well.  Does anyone know
> the answer?

Copy can accept an empty field (representing null):

copy datatable from stdin delimiter '\t' null '';

by default pg expects '\N' to mean null but you can override it:

Description: copy data between files and tables
Syntax:
COPY table [ ( column [, ...] ) ]
    FROM { 'filename' | stdin }
    [ [ WITH ]
          [ BINARY ]
          [ OIDS ]
          [ DELIMITER [ AS ] 'delimiter' ]
          [ NULL [ AS ] 'null string' ] ]
COPY table [ ( column [, ...] ) ]
    TO { 'filename' | stdout }
    [ [ WITH ]
          [ BINARY ]
          [ OIDS ]
          [ DELIMITER [ AS ] 'delimiter' ]
          [ NULL [ AS ] 'null string' ] ]

Rgds,

Jason

Re: importing db as text files

From
"Gregory S. Williamson"
Date:
Two issues raised (other than my atrocious coding)

 a) if defaults for a column are defined and copy is given a value of <> (implicit NULL) why not use the default ?
[does an explicit \n work -- didn't try, I have to confess]

 b) more generally, if copy finds a column that allows null and the data is a null (not explicitly defined as such,
just no data) isn't that value valid, e.g. a NULL value ?

I kludged a simple way to make my data load, without really understanding why -- other than how "C" converts strings to
integers ("atoi") -- a simple NULL can't be a NULL entry or at least a '0' if that is the default.

Not meant in ANY way as criticism, really, just trying to get a handle on this cool tool, and how it differs from ones
I do know.

Apologies for any double postings I may have caused.

Greg W.

-----Original Message-----
From:    Martijn van Oosterhout [mailto:kleptog@svana.org]
Sent:    Wed 8/13/2003 9:24 PM
To:    Gregory S. Williamson
Cc:    expect; pgsql-general@postgresql.org
Subject:    Re: [GENERAL] importing db as text files







Re: importing db as text files

From
Jason Godden
Date:
On Thu, 14 Aug 2003 01:52 pm, you wrote:

> On Thu, 14 Aug 2003 07:34:55 +1000
>
> Jason Godden <jasongodden@optushome.com.au> wrote:
> > Hi expect,
> >
> > Best way in my opinion is to use the copy table command.  This way Pg
> > will actually 'massage' the data (string escapes and all) for you.
>
> I guess we're of the same opinion.  I did use the copy table command.
> I believe the problem is that pg is unable to use the default value when a
> value is not present.

PG is behaving correctly IMO:

create table data (data1 int4 not null, data2 int4, data3 int4 not null default 10);

insert into data values (2,null,default) - OK
insert into data values (null,2,default) - Fail not null data1
insert into data values (2,null) - Fail? missing field?
insert into data (data1,data2) values (2,null) - OK data3 = default with
explicit field nomination ^

copy from... (essentially becomes - although it does something a bit different
behind the scenes):

insert into data (data1,data2,data3) values (x,y,z)

if data3 is specified not null default 10 and you have a line in your import
file which is translated thus:

2 2 \N - \N is the default null marker but you can nominate what that is
insert into data (data1,data2,data3) values (2,2,null);

this will fail because you are explicitly saying put null in a not null field.

So use an intermediary table without not null constraints with copy from...
then use a query:

insert into realtable (data1,data2,data3) select data1,
coalesce(data2, 0),   -- substitute whatever default you want where data2 is null
data3 from data where data1 is not null; <-because data1 has no option to be
null or a default value etc...

Problem solved... I'd be curious as to how many ppl actually import their data
STRAIGHT into their production tables without integrity checking.  Of course
if you massage and manage it externally such as the method Greg uses then
you're in business too - either way I believe the pg copy syntax is correct
and makes sense.  PG Copy CAN'T make a guess that you intend null or to skip
that field so the default pops in there - null and default are two very
different things.  In fact null is oo (infinity?) different things...

Unless someone changes the insert behaviour in a later release then you will
have to come up with a 'massaged' way (Greg/Perl, intermediary tables and
pl/pgsql functions).  But then think about this:

insert into data values (1,2,null)

which by the proposed new null behaviour suddenly becomes:

insert into data values (1,2,default) (say default = 10);

but in reality the user simply mucked up on data entry, didn't actually mean
default and meant to insert 70 instead - they don't get a warning about it
and your data integrity is screwed.  Not only that the db isn't paying strict
attention to the intended SQL syntax and the constraint management is moved
to the client - PG (and any other decent database - ie Oracle) is far above
the MySQL
cram-it-in-even-if-it-breaks-code-all-your-business-rules-in-your-client way
of doing things!

Default should be used in an explicit sense IMHO:

insert into data (data1,data2) values (1,2) - now data3 becomes default and
all is good

VERY different to: insert into data values (1,2,null);

query tuning

From
Dave Cramer
Date:
I have a query which definitely runs faster when sequential scans are
turned off. Which parameters do I tune? and which way?

After quickly perusing the docs, and google, I think it is the
random_page_cost?

Any help would be appreciated.

Dave
--
Dave Cramer <dave@fastcrypt.com>
fastcrypt
--
Dave Cramer <Dave@micro-automation.net>


Re: importing db as text files

From
Bruno Wolff III
Date:
On Wed, Aug 13, 2003 at 20:45:55 -0700,
  expect <expect@ihubbell.com> wrote:
> On Wed, 13 Aug 2003 16:30:04 -0500
> Bruno Wolff III <bruno@wolff.to> wrote:
>
> > On Wed, Aug 13, 2003 at 14:14:20 -0700,
> >   expect <expect@ihubbell.com> wrote:
> > >
> > > What's the big deal with importing text files?  I have a 70 MB file to import
> > > and it's been one problem after another.  I used the copy command and it appears
> > > that it's just not possible. I finally massaged the file into a .sql file and
> >
> > That is unlikely, but without more details it is hard to say what you need
> > to do.
>
>
> What's unlikely?  That it didn't work?  But it didn't work.

That it isn't possible to load your data using the copy command.
You may need to do some transformation before sending it to copy.
Using copy is probably going to give you the fastest load time.

> From what I've read it should use the default value when there is no value.
> Is that not true?  Is this a known issue?

Where did you read that? What do you mean by 'no value'?

The syntax for the insert command is given in the manual:
INSERT INTO table [ ( column [, ...] ) ]
    { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) | query }

>
> No you've come to the wrong impression.  I believe that the problem lies in the
> fact that pg will not default to the default value when no value is present.

And how do you think 'no value' is represented? If you use default to
represent 'no value' then things will work as you expect for insert
statements. When you are using copy, defaults can't be used on a row
by row basis (but unlisted columns will get default values) and you will
need to include the value that you want in each row.
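For instance, if one of the columns is a serial (or anything else with a
default), just leave it out of the copy column list (names below are
illustrative):

create table people (id serial, acct text, name text);

-- id is not listed, so every loaded row gets the next sequence value
copy people (acct, name) from stdin with delimiter ',';
191922C,Bob Cobb
339111C,Elma Thelma
\.

The columns that do appear in the file still need an explicit value on every row.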

Re: importing db as text files

From
Bruno Wolff III
Date:
On Thu, Aug 14, 2003 at 14:50:41 +1000,
  Jason Godden <jasongodden@optushome.com.au> wrote:
>
> Copy can accept an empty field (representing null):

That assumes that an empty field is what he means by 'no value'.
For some data types an empty string is a perfectly valid value.

> copy datatable from stdin delimiter '\t' null '';
>
> by default pg expects '\N' to mean null but you can override it:

He wants the default value, not null though. Copy doesn't have a way
to specify a string to be replaced by the default value. It probably
wouldn't be too hard to add this option, but it is too late for 7.4.

If the default value is a constant preprocessing should be an easy way
to handle the issue. Another option may be to change null's to the
default after importing the data (assuming there aren't actual nulls
in the data). If he is trying to use nextval for all rows on one column,
then not entering that column would be the way to go.

We haven't seen any example of the data he is trying to import so it is
hard to give him specific advice.

Re: query tuning

From
Bruno Wolff III
Date:
Please don't reply to messages to start a new thread.

On Thu, Aug 14, 2003 at 08:09:23 -0400,
  Dave Cramer <Dave@micro-automation.net> wrote:
> I have a query which definitely runs faster when sequential scans are
> turned off. Which parameters do I tune? and which way?
>
> After quickly perusing the docs, and google, I think it is the
> random_page_cost?

Have you run vacuum analyze?

Can you supply a copy of the query and explain analyze output?
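For example (substitute the real query for the one below, which is only a
placeholder):

vacuum analyze;

explain analyze select * from orders where customer_id = 42;

set enable_seqscan = off;   -- session-local, for comparison only
explain analyze select * from orders where customer_id = 42;
set enable_seqscan = on;

-- if the index plan wins, lowering random_page_cost (default 4) is usually a
-- better knob than leaving sequential scans disabled
set random_page_cost = 2;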

Re: importing db as text files

From
Bruno Wolff III
Date:
On Thu, Aug 14, 2003 at 00:50:27 -0700,
  "Gregory S. Williamson" <gsw@globexplorer.com> wrote:
>
> Two issues raised (other than my atrocious coding)
>
>  a) if defaults for a column are defined and copy is given a value of <> (implicit NULL) why not use the default ?
> [does an explicit \n work -- didn't try, I have to confess]

Because empty strings are valid data and you can't go assuming they are
null, since you would then need a different string to represent the empty
string.

>  b) more generally, if copy finds a column that allows null and the data is a null (not explicitly defined as such,
> just no data) isn't that value valid, e.g. a NULL value ?

It might be bad data. The copy statement provides a way to define a string that
represents the null string.

It would probably be reasonable to add a way to specify a string to be
replaced by the default value.

Re: importing db as text files

From
expect
Date:
On Thu, 14 Aug 2003 08:45:20 -0500
Bruno Wolff III <bruno@wolff.to> wrote:

> On Wed, Aug 13, 2003 at 20:45:55 -0700,
>   expect <expect@ihubbell.com> wrote:
> > On Wed, 13 Aug 2003 16:30:04 -0500
> > Bruno Wolff III <bruno@wolff.to> wrote:
> >
> > > On Wed, Aug 13, 2003 at 14:14:20 -0700,
> > >   expect <expect@ihubbell.com> wrote:
> > > >
> > > > What's the big deal with importing text files?  I have a 70 MB file to import
> > > > and it's been one problem after another.  I used the copy command and it appears
> > > > that it's just not possible. I finally massaged the file into a .sql file and
> > >
> > > That is unlikely, but without more details it is hard to say what you need
> > > to do.
> >
> >
> > What's unlikely?  That it didn't work?  But it didn't work.
>
> That it isn't possible to load your data using the copy command.
> You may need to do some transformation before sending it to copy.
> Using copy is probably going to give you the fastest load time.
>
> > From what I've read it should use the default value when there is no value.
> > Is that not true?  Is this a known issue?
>
> Where did you read that? What do you mean by 'no value'?

The data is really very simple.  Here's a pseudo-example of 2 rows of data:

191922C,Bob Cobb,D'Obbalina Sr.,312 Elm Street,Yountville,CA,94599,5,062001,082009
339111C,Elma Thelma,Velma,98 Oak Lane,St. Louis,MO,63119-2065,,,



>
> The syntax for the insert command is given in the manual:
> INSERT INTO table [ ( column [, ...] ) ]
>     { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) | query }
>
> >
> > No you've come to the wrong impression.  I believe that the problem lies in the
> > fact that pg will not default to the default value when no value is present.
>
> And how do you think 'no value' is represented? If you use default to
> respresent 'no value' then things will work as you expect for insert
> statements. When you are using copy, defaults can't be used on a row
> by row basis (but unlisted columns will get default values) and you will
> need to inlcude the value that you want in each row.
>
>

Re: importing db as text files

From
Bruno Wolff III
Date:
On Thu, Aug 14, 2003 at 09:08:16 -0700,
  expect <expect@ihubbell.com> wrote:
>
> 191922C,Bob Cobb,D'Obbalina Sr.,312 Elm Street,Yountville,CA,94599,5,062001,082009
> 339111C,Elma Thelma,Velma,98 Oak Lane,St. Louis,MO,63119-2065,,,

That should be handled by copy pretty easily. If the columns after what
appears to be the zip code are some type of number rather than a string,
then the empty string is going to be an invalid value.

If you want the empty string to be interpreted as the default and you don't
have any NULL values in your input, then the simplest thing to do is set
the code for NULLs to be the empty string. You can then do queries after
the data is in the database to change the NULLs to the appropriate default.
If you are running the 7.4 beta you can use the DEFAULT key in the update,
otherwise you will need to duplicate the default expression in the update
commands. You will want to run update once for each column that can have
NULL values using IS NULL in the WHERE clause.
If these columns have a NOT NULL constraint, you may want to use a temporary
table to load the data and then copy it over (with a single insert statement)
after it has been cleaned up.
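A rough sketch of that two-step load (the staging table, column names, and
default values here are all placeholders):

-- staging copy of the real table's columns, but without constraints or defaults
create table staging as select * from realtable limit 0;

copy staging from '/usr/tmp/db' with delimiter ',' null '';

-- one update per defaultable column, repeating the default expression by hand (pre-7.4)
update staging set months = 0 where months is null;

-- then move everything across in a single statement
insert into realtable select * from staging;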

Note that it isn't obvious what empty strings should map to for numbers.
NULL and 0 make about as much sense as using the default value.

Re: importing db as text files

From
Murthy Kambhampaty
Date:
On Wednesday, August 13, 2003 17:59, Gregory S. Williamson
[mailto:gsw@globexplorer.com] wrote:

>
>I tend to use perl to preprocess dumps (in our case from
>Informix). It tends to work much faster than shell scripts
>(although your mileage may vary). I have to fix missing
>numeric values (if the column allows nulls why can't the copy
>command accept an empty field, I wonder?)

Have you tried "copy <tbl> from stdin with null as '';" or am I missing
something?

See
http://techdocs.postgresql.org/techdocs/usingcopy.php
or the postgresql documentation for more info.

Cheers,
    Murthy


Re: importing db as text files

From
expect
Date:
On Thu, 14 Aug 2003 12:46:07 -0500
Bruno Wolff III <bruno@wolff.to> wrote:

> On Thu, Aug 14, 2003 at 09:08:16 -0700,
>   expect <expect@ihubbell.com> wrote:
> >
> > 191922C,Bob Cobb,D'Obbalina Sr.,312 Elm Street,Yountville,CA,94599,5,062001,082009
> > 339111C,Elma Thelma,Velma,98 Oak Lane,St. Louis,MO,63119-2065,,,
>
> That should be handled by copy pretty easily. If the columns after what
> appears to be the zip code are some type of number rather than a string,
> then the empty string is going to be an invalid value.
>
> If you want the empty string to be interpreted as the default and you don't
> have any NULL values in your input, then the simplest thing to do is set
> the code for NULLs to be the empty string. You can then do queries after


I'm new to this so everything has to be explicit, I don't want to make any
leaps with what anyone is saying here.


copy db from '/usr/tmp/db' with delimiter ',' null '';

Is this the command that you're recommending?


> the data is in the database to change the NULLs to the appropriate default.
> If you are running the 7.4 beta you can use the DEFAULT key in the update,
> otherwise you will need to duplicate the default expression in the update
> commands. You will want to run update once for each column that can have
> NULL values using IS NULL in the WHERE clause.
> If these columns have a NOT NULL constraint, you may want to use a temporary
> table to load the data and then copy it over (with a single insert statement)
> after it has been cleaned up.
>
> Note that it isn't obvious what empty strings should map to for numbers.
> NULL and 0 make about as much sense as using the default value.

Well I'm new here but it seems to me they should map to the default value
for that column.   Why wouldn't they?




>
>

Re: importing db as text files

From
Stephan Szabo
Date:
On Thu, 14 Aug 2003, expect wrote:

> On Thu, 14 Aug 2003 12:46:07 -0500
> Bruno Wolff III <bruno@wolff.to> wrote:
> > Note that it isn't obvious what empty strings should map to for numbers.
> > NULL and 0 make about as much sense as using the default value.
>
> Well I'm new here but it seems to me they should map to the default value
> for that column.   Why wouldn't they?

One problem with doing that is that it's inconsistent.

Given
create table test(
 a text default 'abc',
 b int default 5
);

copy test from stdin with delimiter ',';
,
\.

What would you expect the values of the row in test
to be?


Re: importing db as text files

From
expect
Date:
On Fri, 15 Aug 2003 00:32:01 -0700 (PDT)
Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:

>
> On Thu, 14 Aug 2003, expect wrote:
>
> > On Thu, 14 Aug 2003 12:46:07 -0500
> > Bruno Wolff III <bruno@wolff.to> wrote:
> > > Note that it isn't obvious what empty strings should map to for numbers.
> > > NULL and 0 make about as much sense as using the default value.
> >
> > Well I'm new here but it seems to me they should map to the default value
> > for that column.   Why wouldn't they?
>
> One problem with doing that is that it's inconsistent.

Please elaborate.  How is it inconsistent, exactly?


>
> Given
> create table test(
>  a text default 'abc',
>  b int default 5
> );
>
> copy test from stdin with delimiter ',';
> ,
> \.
>
> What would you expect the values of the row in test
> to be?


Oh a test....

Does the \. end the STDIN input?

Where's the null option?  Don't you mean:

copy test from stdin with delimiter ',' null '';


In this case I would expect the row to have:

  a  |  b
----------
     |
 abc |  5


Is this too much to expect?


>
>
>

Re: importing db as text files

From
Jason Godden
Date:
Whilst I agree with what Bruno said regarding adding a default option to the
copy syntax, I think that the basic principles Stephan and I outlined in how
copy treats your import are correct and the developers did the right thing.

Now if the devs, you or I want to add a default option to copy in the future
that's all good, but for the moment you will have to use some pre or post
process to get the absolute effect you require.  It's not that hard...
literally one line of piped commands on the command line where you specify
the defaults, or a post process that does the final import into your
production tables from your data import table.

Remember NULL != Default.  These are VERY different concepts and I believe
that enough examples of how this behaviour can lead to confusion have been
shown.  With copy it is also important to remember that it is not part of the
SQL standard and PG doesn't use SQL statements to do the import but rather a
low-level C operation.  If you need the behaviour you're referring to, sed/awk
the data up and generate BEGIN... INSERT... COMMIT... statements and pipe
that to PG.  It's not that difficult and I'll give you some suggestions with
it if you want.
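The generated file only has to look something like this (the table name is
invented; the values are from the sample rows posted earlier in the thread):

begin;
insert into subscribers values
 ('191922C','Bob Cobb','D''Obbalina Sr.','312 Elm Street','Yountville','CA','94599',5,'062001','082009');
insert into subscribers values
 ('339111C','Elma Thelma','Velma','98 Oak Lane','St. Louis','MO','63119-2065',default,default,default);
commit;

which also takes care of the D'Adario-style quoting (doubled '') and lets the
missing fields fall back to the column defaults via the DEFAULT keyword.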

I'm not trying to be difficult with my view of this but there are always other
ways (sometimes one liners) that can achieve the behaviour you're after.

Rgds,

Jason


On Fri, 15 Aug 2003 05:53 pm, you wrote:
> On Fri, 15 Aug 2003 00:32:01 -0700 (PDT)
>
> Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> > On Thu, 14 Aug 2003, expect wrote:
> > > On Thu, 14 Aug 2003 12:46:07 -0500
> > >
> > > Bruno Wolff III <bruno@wolff.to> wrote:
> > > > Note that it isn't obvious what empty strings should map to for
> > > > numbers. NULL and 0 make about as much sense as using the default
> > > > value.
> > >
> > > Well I'm new here but it seems to me they should map to the default
> > > value for that column.   Why wouldn't they?
> >
> > One problem with doing that is that it's inconsistent.
>
> Please elaborate.  How is it inconsistent, exactly?
>
> > Given
> > create table test(
> >  a text default 'abc',
> >  b int default 5
> > );
> >
> > copy test from stdin with delimiter ',';
> > ,
> > \.
> >
> > What would you expect the values of the row in test
> > to be?
>
> Oh a test....
>
> Does the \. end the STDIN input?
>
> Where's the null option?  Don't you mean:
>
> copy test from stdin with delimiter ',' null '';
>
>
> In this case I would expect the row to have:
>
>   a  |  b
> ----------
>
>  abc |  5
>
>
> Is this too much to expect?
>
>
>
>
>
>


Re: importing db as text files

From
Stephan Szabo
Date:
On Fri, 15 Aug 2003, expect wrote:

> On Fri, 15 Aug 2003 00:32:01 -0700 (PDT)
> Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
>
> >
> > On Thu, 14 Aug 2003, expect wrote:
> >
> > > On Thu, 14 Aug 2003 12:46:07 -0500
> > > Bruno Wolff III <bruno@wolff.to> wrote:
> > > > Note that it isn't obvious what empty strings should map to for numbers.
> > > > NULL and 0 make about as much sense as using the default value.
> > >
> > > Well I'm new here but it seems to me they should map to the default value
> > > for that column.   Why wouldn't they?
> >
> > One problem with doing that is that it's inconsistent.
>
> Please elaborate.  How is it inconsistent, exactly?

See my comments below.

> >
> > Given
> > create table test(
> >  a text default 'abc',
> >  b int default 5
> > );
> >
> > copy test from stdin with delimiter ',';
> > ,
> > \.
> >
> > What would you expect the values of the row in test
> > to be?
>
>
> Oh a test....
>
> Does the \. end the STDIN input?
>
> Where's the null option?  Don't you mean:
>
> copy test from stdin with delimiter ',' null '';

No, I'm saying without a null specifier.

> In this case I would expect the row to have:
>
>   a  |  b
> ----------
>      |
>  abc |  5
>
>
> Is this too much to expect?

Without a null specifier of '', empty string is a valid value for a.
Why would it get the default, and how would you insert an empty string?
Should it treat b differently because '' isn't valid for that type? That'd
be inconsistent.

With a null specifier of '', empty string is valid for both and means
NULL.  Otherwise, how would you specify a null when you have that
null specifier?

What you probably really want is another specifier that inserts the
default for a column, so you could say something like:
copy test from stdin with delimiter ',' default '';
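
The "default" specifier above is hypothetical (it doesn't exist in 7.3).  For
comparison, here is a quick, untested sketch of what the existing null
specifier actually does with the sample table:

create table test(
 a text default 'abc',
 b int default 5
);

copy test from stdin with delimiter ',' null '';
,
\.

select * from test;
-- both a and b come back NULL: each empty field matches the null string,
-- so the column defaults are never consulted.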



Re: importing db as text files

From
Dennis Gearon
Date:
AFAICT,
    NULL == NULL
    '' == empty string
    default == column default

Seems pretty clear, and about as intuitive as it gets. Trying to use an empty string for anything else just means it's
not available to represent itself. Any other DB using anything else is complicating things.
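
With the thread's sample table, the three spellings behave like this in plain
SQL (a small sketch):

create table test(
 a text default 'abc',
 b int default 5
);

insert into test values (null, null);        -- explicit NULLs
insert into test values ('', 5);             -- empty string is a real text value
insert into test values (default, default);  -- column defaults: ('abc', 5)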

Now, for exports/dumps that don't use the ' char to delineate the start and end of a column, e.g.

    454wetoit,Four score and seven years ago,default,,

what would the last three values be? Empty strings would be my guess.

    '454wetoit','Four score and seven years ago','default','',''

seems like quoted column values would be necessary to distinguish the constants NULL and default from strings that
contain those two text sequences.


Stephan Szabo wrote:
> On Fri, 15 Aug 2003, expect wrote:
>
>
>>On Fri, 15 Aug 2003 00:32:01 -0700 (PDT)
>>Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
>>
>>
>>>On Thu, 14 Aug 2003, expect wrote:
>>>
>>>
>>>>On Thu, 14 Aug 2003 12:46:07 -0500
>>>>Bruno Wolff III <bruno@wolff.to> wrote:
>>>>
>>>>>Note that it isn't obvious what empty strings should map to for numbers.
>>>>>NULL and 0 make about as much sense as using the default value.
>>>>
>>>>Well I'm new here but it seems to me they should map to the default value
>>>>for that column.   Why wouldn't they?
>>>
>>>One problem with doing that is that it's inconsistent.
>>
>>Please elaborate.  How is it inconsistent, exactly?
>
>
> See my comments below.
>
>
>>>Given
>>>create table test(
>>> a text default 'abc',
>>> b int default 5
>>>);
>>>
>>>copy test from stdin with delimiter ',';
>>>,
>>>\.
>>>
>>>What would you expect the values of the row in test
>>>to be?
>>
>>
>>Oh a test....
>>
>>Does the \. end the STDIN input?
>>
>>Where's the null option?  Don't you mean:
>>
>>copy test from stdin with delimiter ',' null '';
>
>
> No, I'm saying without a null specifier.
>
>
>>In this case I would expect the row to have:
>>
>>  a  |  b
>>----------
>>     |
>> abc |  5
>>
>>
>>Is this too much to expect?
>
>
> Without a null specifier of '', empty string is a valid value for a.
> Why would it get the default, and how would you insert an empty string?
> Should it treat b differently because '' isn't valid for that type, that'd
> be inconsistent.
>
> With a null specifier of '', empty string is valid for both and means
> NULL.  Otherwise, how would you specify a null when you have that
> null specifier?
>
> What you probably really want is another specifier that inserts the
> default for a column, so you could say something like:
> copy test from stdin with delimiter ',' default '';
>
>
>
>


Re: importing db as text files

From
expect
Date:
On Fri, 15 Aug 2003 11:32:20 -0400
Murthy Kambhampaty <murthy.kambhampaty@goeci.com> wrote:

> On Friday, August 15, 2003 03:53, expect [mailto:expect@ihubbell.com] wrote:
> >On Fri, 15 Aug 2003 00:32:01 -0700 (PDT)
> >Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> >
> ...
> >> Given
> >> create table test(
> >>  a text default 'abc',
> >>  b int default 5
> >> );
> >>
> >> copy test from stdin with delimiter ',';
> >> ,
> >> \.
> >>
> >> What would you expect the values of the row in test
> >> to be?
> >
> >
> ...
> >In this case I would expect the row to have:
> >
> >  a  |  b
> >----------
> >     |
> > abc |  5
> >
> >
> >Is this too much to expect?
>
> Your expectations don't seem consistent with the table definition, which
> permits Null values (it's ironic you would add "null ''" to
> the copy statement, which says you want to treat empty strings as default
> values, not null!).
>


Why is it ironic? I don't think you've followed the thread correctly.
It's what I wanted to have happen.








Re: importing db as text files

From
Murthy Kambhampaty
Date:
>-----Original Message-----
>From: expect [mailto:expect@ihubbell.com]
>Sent: Friday, August 15, 2003 11:56
>To: pgsql-general@postgresql.org
>Subject: Re: [GENERAL] importing db as text files
>
>
>On Fri, 15 Aug 2003 11:32:20 -0400
>Murthy Kambhampaty <murthy.kambhampaty@goeci.com> wrote:
>
>> On Friday, August 15, 2003 03:53, expect [mailto:expect@ihubbell.com] wrote:
>> >On Fri, 15 Aug 2003 00:32:01 -0700 (PDT)
>> >Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
>> >
>> ...
>> >> Given
>> >> create table test(
>> >>  a text default 'abc',
>> >>  b int default 5
>> >> );
>> >>
>> >> copy test from stdin with delimiter ',';
>> >> ,
>> >> \.
>> >>
>> >> What would you expect the values of the row in test
>> >> to be?
>> >
>> >
>> ...
>> >In this case I would expect the row to have:
>> >
>> >  a  |  b
>> >----------
>> >     |
>> > abc |  5
>> >
>> >
>> >Is this too much to expect?
>>
>> Your expectations don't seem consistent with the table definition, which
>> permits Null values (it's ironic you would add "null ''" to
>> the copy statement, which says you want to treat empty strings as default
>> values, not null!).
>>
>
>
>Why is it ironic? I don't think you've followed the thread correctly.
>It's what I wanted to have happen.
>
Pardon my dyslexic typing; that was to read:
"it's ironic you would add "null ''" to the copy statement, which says you
want to treat empty strings as nulls, not default values!"


Re: importing db as text files

From
Murthy Kambhampaty
Date:
On Friday, August 15, 2003 03:53, expect [mailto:expect@ihubbell.com] wrote:
>On Fri, 15 Aug 2003 00:32:01 -0700 (PDT)
>Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
>
...
>> Given
>> create table test(
>>  a text default 'abc',
>>  b int default 5
>> );
>>
>> copy test from stdin with delimiter ',';
>> ,
>> \.
>>
>> What would you expect the values of the row in test
>> to be?
>
>
...
>In this case I would expect the row to have:
>
>  a  |  b
>----------
>     |
> abc |  5
>
>
>Is this too much to expect?

Your expectations don't seem consistent with the table definition, which
permits Null values (it's ironic you would add "null ''" to
the copy statement, which says you want to treat empty strings as default
values, not null!).

It can be frustrating to deal with postgresql, but I've come to appreciate
the requirement that you clean your data before you COPY it.

Cheers,
    Murthy

PS: Sorry "expect", I meant to send to the list, not directly to you. (I
guess I'd better just hit "reply to all", and not edit the "To:" line.)

Re: importing db as text files

From
expect
Date:
On Fri, 15 Aug 2003 08:03:04 -0700 (PDT)
Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:

> On Fri, 15 Aug 2003, expect wrote:
>
> > On Fri, 15 Aug 2003 00:32:01 -0700 (PDT)
> > Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> >
> > >
> > > On Thu, 14 Aug 2003, expect wrote:
> > >
> > > > On Thu, 14 Aug 2003 12:46:07 -0500
> > > > Bruno Wolff III <bruno@wolff.to> wrote:
> > > > > Note that it isn't obvious what empty strings should map to for numbers.
> > > > > NULL and 0 make about as much sense as using the default value.
> > > >
> > > > Well I'm new here but it seems to me they should map to the default value
> > > > for that column.   Why wouldn't they?
> > >
> > > One problem with doing that is that it's inconsistent.
> >
> > Please elaborate.  How is it inconsistent, exactly?
>
> See my comments below.
>
> > >
> > > Given
> > > create table test(
> > >  a text default 'abc',
> > >  b int default 5
> > > );
> > >
> > > copy test from stdin with delimiter ',';
> > > ,
> > > \.
> > >
> > > What would you expect the values of the row in test
> > > to be?
> >
> >
> > Oh a test....
> >
> > Does the \. end the STDIN input?
> >
> > Where's the null option?  Don't you mean:
> >
> > copy test from stdin with delimiter ',' null '';
>
> No, I'm saying without a null specifier.
>
> > In this case I would expect the row to have:
> >
> >   a  |  b
> > ----------
> >      |
> >  abc |  5
> >
> >
> > Is this too much to expect?
>
> Without a null specifier of '', empty string is a valid value for a.
> Why would it get the default, and how would you insert an empty string?
> Should it treat b differently because '' isn't valid for that type, that'd
> be inconsistent.
>
> With a null specifier of '', empty string is valid for both and means
> NULL.  Otherwise, how would you specify a null when you have that
> null specifier?

Don't want to specify a null.

>
> What you probably really want is another specifier that inserts the
> default for a column, so you could say something like:
> copy test from stdin with delimiter ',' default '';

No, I don't want anyone else to do any more work.  I'd much rather see
improvements than new features, unless the new feature is a big
improvement.


>
>
>
>

Re: query tuning

From
Franco Bruno Borghesi
Date:
I usually reduce cpu_index_tuple_cost in postgresql.conf, from 0.001 to 0.000001; this way the optimizer is "more willing" to use indexes.

Of course, you should try different values until you find the ones that suit your system configuration.
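
For example (the specific value and the query are only placeholders; compare
the plans before and after):

set cpu_index_tuple_cost = 0.000001;
explain analyze select * from some_table where some_indexed_column = 42;
-- if the index plan wins consistently, consider making the change in postgresql.conf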

On Thu, 2003-08-14 at 09:00, Dave Cramer wrote:
I have a query which definitely runs faster when sequential scans are
turned off. Which parameters do I tune? and which way?

After quickly perusing the docs, and google, I think it is the
random_page_cost?

Any help would be appreciated.

Dave

Re: importing db as text files

From
Stephan Szabo
Date:
On Fri, 15 Aug 2003, expect wrote:

> On Fri, 15 Aug 2003 08:03:04 -0700 (PDT)
> Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
>
> > On Fri, 15 Aug 2003, expect wrote:
> > > Oh a test....
> > >
> > > Does the \. end the STDIN input?
> > >
> > > Where's the null option?  Don't you mean:
> > >
> > > copy test from stdin with delimiter ',' null '';
> >
> > No, I'm saying without a null specifier.
> >
> > > In this case I would expect the row to have:
> > >
> > >   a  |  b
> > > ----------
> > >      |
> > >  abc |  5
> > >
> > >
> > > Is this too much to expect?
> >
> > Without a null specifier of '', empty string is a valid value for a.
> > Why would it get the default, and how would you insert an empty string?
> > Should it treat b differently because '' isn't valid for that type, that'd
> > be inconsistent.
> >
> > With a null specifier of '', empty string is valid for both and means
> > NULL.  Otherwise, how would you specify a null when you have that
> > null specifier?
>
> Don't want to specify a null.

That doesn't answer the question. If you say that the NULL specifier is
some string and then that string were to give you something other than a
NULL, that wouldn't be reasonable behavior for people who do want a NULL.


Re: query tuning

From
Fernando Nasser
Date:
SET enable_seqscan TO FALSE;
<your query>
SET enable_seqscan TO TRUE;

Is this what you were looking for?

Regards,
Fernando


On Thu, 2003-08-14 at 08:00, Dave Cramer wrote:
> I have a query which definitely runs faster when sequential scans are
> turned off. Which parameters do I tune? and which way?
>
> After quickly perusing the docs, and google, I think it is the
> random_page_cost?
>
> Any help would be appreciated.
>
> Dave


Re: importing db as text files

From
"Gregory S. Williamson"
Date:
I understand why NULL is not the same as a default value. I do not understand why an integer column:
  some_val INT,

will not accept a null value (note that there is no default).

NULL is NULL ... why do I have to massage the load data to substitute a '0' when what I really want is NULL (no data).
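
What I mean is something like this (made-up names; this is the behaviour I'd want):

create table t (id int, some_val int);

copy t from stdin with delimiter ',' null '';
1,
\.

select * from t;
-- expected: id = 1, some_val is NULL (no zero substituted)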

Just one of those things I'll never get, I guess.

Thanks for the information ...

Greg W.

-----Original Message-----
From:    Jason Godden [mailto:jasongodden@optushome.com.au]
Sent:    Fri 8/15/2003 3:11 AM
To:    pgsql-general@postgresql.org
Cc:
Subject:    Re: [GENERAL] importing db as text files

Whilst I agree with what Bruno said regarding adding a default option to the
copy syntax, I think that the basic principles Stephan and I outlined in how
copy treats your import are correct and the developers did the right thing.

Now if the devs, you or I want to add a default option to copy in the future,
that's all good, but for the moment you will have to use some pre or post
process to get the absolute effect you require.  It's not that hard:
literally one line of piped commands on the command line where you specify
the defaults, or a post process that does the final import into your
production tables from your data import table.

Remember NULL != Default.  These are VERY different concepts and I believe
that enough examples of how this behaviour can lead to confusion have been
shown.  With copy it is also important to remember that it is not part of the
SQL standard and PG doesn't use SQL statements to do the import, but rather a
low-level C operation.  If you need the behaviour you're referring to, sed/awk
the data up and generate BEGIN... INSERT... COMMIT... statements and pipe
that to PG.  It's not that difficult and I'll give you some suggestions with
it if you want.

I'm not trying to be difficult with my view of this but there are always other
ways (sometimes one liners) that can achieve the behaviour you're after.

Rgds,

Jason


On Fri, 15 Aug 2003 05:53 pm, you wrote:
> On Fri, 15 Aug 2003 00:32:01 -0700 (PDT)
>
> Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> > On Thu, 14 Aug 2003, expect wrote:
> > > On Thu, 14 Aug 2003 12:46:07 -0500
> > >
> > > Bruno Wolff III <bruno@wolff.to> wrote:
> > > > Note that it isn't obvious what empty strings should map to for
> > > > numbers. NULL and 0 make about as much sense as using the default
> > > > value.
> > >
> > > Well I'm new here but it seems to me they should map to the default
> > > value for that column.   Why wouldn't they?
> >
> > One problem with doing that is that it's inconsistent.
>
> Please elaborate.  How is it inconsistent, exactly?
>
> > Given
> > create table test(
> >  a text default 'abc',
> >  b int default 5
> > );
> >
> > copy test from stdin with delimiter ',';
> > ,
> > \.
> >
> > What would you expect the values of the row in test
> > to be?
>
> Oh a test....
>
> Does the \. end the STDIN input?
>
> Where's the null option?  Don't you mean:
>
> copy test from stdin with delimiter ',' null '';
>
>
> In this case I would expect the row to have:
>
>   a  |  b
> ----------
>
>  abc |  5
>
>
> Is this too much to expect?
>
>
>
>
>
>






Re: query tuning

From
"scott.marlowe"
Date:
On 14 Aug 2003, Dave Cramer wrote:

> I have a query which definitely runs faster when sequential scans are
> turned off. Which parameters do I tune? and which way?
>
> After quickly perusing the docs, and google, I think it is the
> random_page_cost?

Set your effective_cache_size correctly first, then lower random page cost
until you hit the sweet spot.

As someone else pointed out, you can also lower the cpu_***_cost
parameters, and often this is preferable.  On boxes with fast memory /
CPUs, it's often good to get the planner in the state of mind of using the
CPU a little more, so it will pick a faster join method than it otherwise
would.
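
For example, per-session experimentation might look like this (the numbers and
the query are placeholders, not recommendations):

set effective_cache_size = 65536;   -- in 8kB disk pages, so roughly 512MB; match your OS cache
set random_page_cost = 2;           -- default is 4; lower it when most data is cached
set cpu_index_tuple_cost = 0.0005;  -- default is 0.001; lowering it favours index plans a bit
explain analyze select * from some_table where some_indexed_column = 42;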