Thread: ERROR: relation . . . does not exist

ERROR: relation . . . does not exist

From
"Albretch Mueller"
Date:
 Hi,
~
 I created a number of csv files which I need to import into PG
tables. One of them looks like this:
~
sh-3.1# head -n 3
/media/hda3/prjx/JPack/REF/LANDMARKS/PROFILES/20080828104627_j2sdk-1_4_2_18-linux-i586.binfls.md5sum.txt
"md5sum","fl"
"d41d8cd98f00b204e9800998ecf8427e",".systemPrefs/.system.lock"
"d41d8cd98f00b204e9800998ecf8427e",".systemPrefs/.systemRootModFile"
~
 and it has a total of 565 lines
~
sh-3.1# wc -l /media/hda3/prjx/JPack/REF/LANDMARKS/PROFILES/20080828104627_j2sdk-1_4_2_18-linux-i586.binfls.md5sum.txt
565 /media/hda3/prjx/JPack/REF/LANDMARKS/PROFILES/20080828104627_j2sdk-1_4_2_18-linux-i586.binfls.md5sum.txt
~
 However, after installing pg from source (using ./configure
--without-readline --prefix=/media/hda4/pgsql/PGDBMS) and creating the
postgres user just fine (as you can see from a knoppix box):
~
root@Knoppix:/media/hda4/pgsql/inst/postgresql-8.3.3# adduser postgres
root@Knoppix:/media/hda4/pgsql/inst/postgresql-8.3.3# chown postgres
/media/hda4/pgsql/DATA
root@Knoppix:/media/hda4/pgsql/inst/postgresql-8.3.3# su - postgres
No directory, logging in with HOME=/
postgres@Knoppix:/$ cd /media/hda4/pgsql/PGDBMS/bin
postgres@Knoppix:/media/hda4/pgsql/PGDBMS/bin$ initdb -D /media/hda4/pgsql/DATA
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale C.
The default database encoding has accordingly been set to SQL_ASCII.
The default text search configuration will be set to "english".

fixing permissions on existing directory /media/hda4/pgsql/DATA ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers/max_fsm_pages ... 24MB/153600
creating configuration files ... ok
creating template1 database in /media/hda4/pgsql/DATA/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.

Success. You can now start the database server using:

    postgres -D /media/hda4/pgsql/DATA
or
    pg_ctl -D /media/hda4/pgsql/DATA -l logfile start

// __ then starting PG
postgres@Knoppix:/media/hda4/pgsql/PGDBMS/bin$ postgres -D
/media/hda4/pgsql/DATA >logfile 2>&1 &
postgres@Knoppix:/media/hda4/pgsql/PGDBMS/bin$ chown postgres
/media/hda4/pgsql/LOGS
postgres@Knoppix:/media/hda4/pgsql/PGDBMS/bin$ postgres -D
/media/hda4/pgsql/DATA > /media/hda4/pgsql/LOGS/logfile 2>&1
[1] 16712
postgres@Knoppix:/media/hda4/pgsql/PGDBMS/bin$ createdb jpk;
postgres@Knoppix:/media/hda4/pgsql/PGDBMS/bin$ psql jpk;
Welcome to psql 8.3.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

jpk=# COPY j2sdk_1_4_2_18_binfls_md5sum FROM
'/media/hda3/prjx/JPack/REF/LANDMARKS/PROFILES/20080828104627_j2sdk-1_4_2_18-linux-i586.binfls.md5sum.txt'
CSV HEADER;
ERROR:  relation "j2sdk_1_4_2_18_binfls_md5sum" does not exist
~
 What is going on here?
~
 How can I fix that error, effectively creating and populating the
tables from csv files?
~
 Thanks
 lbrtchx

Re: ERROR: relation . . . does not exist

From
Raymond O'Donnell
Date:
On 28/08/2008 22:29, Albretch Mueller wrote:

> jpk=# COPY j2sdk_1_4_2_18_binfls_md5sum FROM
> '/media/hda3/prjx/JPack/REF/LANDMARKS/PROFILES/20080828104627_j2sdk-1_4_2_18-linux-i586.binfls.md5sum.txt'
> CSV HEADER;
> ERROR:  relation "j2sdk_1_4_2_18_binfls_md5sum" does not exist
> ~
>  What is going on here?

Just what it says - the relation doesn't exist 'cos you haven't created
it yet. You have to create the table first before you can COPY into it.

>  How can I fix that error, effectively creating and populating the
> tables from csv files?

Just include a CREATE TABLE statement in the same script before the COPY.
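For the file shown above, a minimal sketch of that could look like this (assuming both columns are plain text, with the column names taken from the CSV header line "md5sum","fl"):

```sql
-- Create the table first; COPY never creates it for you.
CREATE TABLE j2sdk_1_4_2_18_binfls_md5sum (
    md5sum text,
    fl     text
);

-- Server-side COPY: the file must be readable by the server process.
-- From psql, \copy takes the same options but reads a client-side file.
COPY j2sdk_1_4_2_18_binfls_md5sum
FROM '/media/hda3/prjx/JPack/REF/LANDMARKS/PROFILES/20080828104627_j2sdk-1_4_2_18-linux-i586.binfls.md5sum.txt'
CSV HEADER;
```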

Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

Re: ERROR: relation . . . does not exist

From
"Albretch Mueller"
Date:
On Thu, Aug 28, 2008 at 5:40 PM, Raymond O'Donnell <rod@iol.ie> wrote:
> You have to create the table first before you can COPY into it.
~
 Well, based on how the statement reads, I made the wrong assumption
that PG would be smart enough to make ends meet with the
data that it gets once you indicate:
~
 1) the header names (to be mapped to column names)
~
 2) the data from which PG could easily figure out the types for each
column, since it is CSV data
~
 That may/should become an RFE
~
 Thanks
 lbrtchx
~

Re: ERROR: relation . . . does not exist

From
Adrian Klaver
Date:
On Thursday 28 August 2008 3:14:01 pm Albretch Mueller wrote:
> On Thu, Aug 28, 2008 at 5:40 PM, Raymond O'Donnell <rod@iol.ie> wrote:
> > You have to create the table first before you can COPY into it.
>
> ~
>  Well, based on how the statement reads, I made the wrong assumption
> that PG would be smart enough to make ends meet with the
> data that it gets once you indicate:
> ~
>  1) the header names (to be mapped to column names)
> ~
>  2) the data from which PG could easily figure out the types for each
> column, since it is CSV data

Define easily. A CSV column has fields with 1's and 0's: should that be an
integer field or a boolean field? A CSV column has a series of text entries,
none of which is greater than 25 characters long. Does that mean a datatype
of varchar(25), or is it just an artifact of the data to date? You have a CSV
column with text strings of format 'YYYY/MM/DD'. Do you want them stored as
strings or as dates? I could go on, but the point is that table data types
require some thought on the part of the DBA.
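The ambiguity is easy to demonstrate in psql: the very same literal casts cleanly to several types, and nothing in a CSV file says which one was intended (a sketch; the values are illustrative):

```sql
-- One literal, several defensible interpretations.
SELECT '1'::boolean       AS as_bool,   -- t
       '1'::integer       AS as_int,    -- 1
       '2008/08/28'::date AS as_date,   -- parsed per the DateStyle setting
       '2008/08/28'::text AS as_text;   -- kept verbatim
```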

> ~
>  That may/should become a RFE
> ~
>  Thanks
>  lbrtchx
> ~



--
Adrian Klaver
aklaver@comcast.net

Re: ERROR: relation . . . does not exist

From
"Albretch Mueller"
Date:
On Thu, Aug 28, 2008 at 7:50 PM, Adrian Klaver <aklaver@comcast.net> wrote:
> Define easily.
~
 OK, let me try to outline the approach I would go for:
~
 I think "COPY FROM CSV" should have three options, namely:
~
 1) the way we have used it in which you create the table first
~
 2) another way in which defaults are declared, generally as:
~
 2.1) aggressive: data type, value and formatting analysis is done; if
only 1s or 0s are found, declare it a BOOLEAN; if repeated data is
found (say state codes) and the stratification nodes cover the rest of
the data, stratify the data out into extra tables (they have a name
I can't recall now), index it ...; if data is kind of numeric with
forward slashes and/or hyphens, could they possibly be dates? if they are
definitely dates, convert them to bigint (and do the formatting in the
presentation code (also this is a win-win situation with i18n code)) ...
~
 2.2) conservative: data type and value, but no formatting analysis is
done and the widest encompassing data type is selected; say for 1-or-0
data use bytes [0, 255], for bytes use int; if something could be
encoded as char(2), use varchar instead, . . .
~
 2.3) dumb: just use the coarsest data type possible; bigint for
anything that looks like a number and varchar for the rest
~
 the "dumb" option should suggest to the DBA the option they are
using, the quantified consequences of their decisions (larger DBs for no
reason, approx. reduction in speed, . .) and how not to be "dumb"
~
 3) or you could define "import templates" declaring which specific
data types to use for data in a certain way, which could be declared
per column using regexps
~
> I could go on, but the point is that table data types require some thought on the part of the DBA.
~
 Well, it still requires their minds and input, but they will have
jobs even if they get some help, don't you think so ;-)
~
 lbrtchx

Re: ERROR: relation . . . does not exist

From
Martijn van Oosterhout
Date:
On Sat, Aug 30, 2008 at 08:23:25AM -0400, Albretch Mueller wrote:
>  OK, let me try to outline the approach I would go for:
> ~
>  I think "COPY FROM CSV" should have three options, namely:

I think you're confusing postgresql with a spreadsheet program. A
database is designed to take care of your data and ensure its
integrity. As such it requires a little more thought.

There are client programs which will do this for you; perhaps you want
one of those?

What's so hard about:

create table foo (a text, b text);

After which your COPY will complete fine.

>  2.1) aggressive: data type, value and formatting analysis is done; if
> only 1 or 0 are found declare then a BOOLEAN, if repeated data is
> found (say state codes) and the stratification nodes cover the rest of
> the data, stratify the data out to other extra table (they have a name
> I can't recall now), index it ..., if data is kind of numeric with
> front slashes and/or hyphen could they possibly be dates? if they are
> definitelly dates convert them to bigint (and do the formatting in the
> presentation code (also this a win-win situation with i18n code)) ...

PostgreSQL has 60+ types and many look like each other. How do you
propose to differentiate?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.


Re: ERROR: relation . . . does not exist

From
"Albretch Mueller"
Date:
> I think you're confusing postgresql with a spreadsheet program.
~
 I wonder what makes you think so
~
> There are client programs which will do this for you, perhaps you wan one of those?
~
 Well, then obviously there is the need for it and you were not
successful enough at convincing these developers that they were
"confusing postgresql with a spreadsheet program"
~
> PostgreSQL has 60+ types and many look like each other. How do you propose to differentiate?
~
 Data types are basically about value ranges (how many bits do you
need to hold the value) and formatting. IMHO, finding an optimal
[im|ex]port reasoning among 60+ types should not be that big of a
deal. In fact, as a data analyst I have exported and imported CSV data
a whole lot, and on many occasions it required some extra custom
coding. I may as well consolidate my code as a whole jcsvport library
in Java and start an OS project when I find the time to do so
 lbrtchx

Re: ERROR: relation . . . does not exist

From
Christophe
Date:
On Aug 30, 2008, at 6:26 AM, Albretch Mueller wrote:
>  Well, then obviously there is the need for it and you were not
> successful enough at convincing these developers that they were
> "confusing postgresql with a spreadsheet program"

The behavior you are looking for is typical of a spreadsheet, because
spreadsheet programs (generally; I'm sure there are exceptions) don't
have the notion of a schema; each cell can hold its own particular
type.  That being said, the automatic type-guessing that Excel, say,
provides is far from foolproof; I've probably spent more time
cleaning up Excel's bad guesses than would have been saved by my just
specifying a type for each column.

As has been noted, text representations of values are extremely
ambiguous as to which Postgres type they mean... and, of course, you
could have user-defined domains and types as well.  It's true that it
could take a wild guess, but that's not a traditional part of a
database engine.

That being said, it would not be too hard to write a client that
accepted a CSV or tab-delimited file, parsed the header into column
names, and then scanned the values of the columns to take a
reasonable guess as to the column type from a highly limited set of
possibilities.  This is probably one of those classic "twenty lines
of Perl" problems.

It doesn't seem as though COPY INTO is the right place for that,
since the particular guesses and set of types that one would make
strike me as very closely tied to your particular application domain.

Re: ERROR: relation . . . does not exist

From
Adrian Klaver
Date:
On Saturday 30 August 2008 5:23:25 am Albretch Mueller wrote:
> On Thu, Aug 28, 2008 at 7:50 PM, Adrian Klaver <aklaver@comcast.net> wrote:
> > Define easily.
>
> ~
>  OK, let me try to outline the approach I would go for:
> ~
>  I think "COPY FROM CSV" should have three options, namely:
> ~
>  1) the way we have used it in which you create the table first
> ~
>  2) another way in which defaults are declared, generally as:
> ~
>  2.1) aggressive: data type, value and formatting analysis is done; if
> only 1s or 0s are found, declare it a BOOLEAN; if repeated data is
> found (say state codes) and the stratification nodes cover the rest of
> the data, stratify the data out into extra tables (they have a name
> I can't recall now), index it ...; if data is kind of numeric with
> forward slashes and/or hyphens, could they possibly be dates? if they are
> definitely dates, convert them to bigint (and do the formatting in the
> presentation code (also this is a win-win situation with i18n code)) ...
> ~
>  2.2) conservative: data type and value, but no formatting analysis is
> done and the widest encompassing data type is selected; say for 1-or-0
> data use bytes [0, 255], for bytes use int; if something could be
> encoded as char(2), use varchar instead, . . .
> ~
>  2.3) dumb: just use the coarsest data type possible; bigint for
> anything that looks like a number and varchar for the rest
> ~
>  the "dumb" option should suggest to the DBA the option they are
> using, the quantified consequences of their decisions (larger DBs for no
> reason, approx. reduction in speed, . .) and how not to be "dumb"
> ~
>  3) or you could define "import templates" declaring which specific
> data types to use for data in a certain way, which could be declared
> per column using regexps
> ~
>
> > I could go on, but the point is that table data types require some
> > thought on the part of the DBA.
>
> ~
>  Well, it still requires their minds and input, but they will have
> jobs even if they get some help, don't you think so ;-)
> ~
>  lbrtchx

This is a combination of more work than necessary and putting the cart after
the horse. All I can see happening is delaying the point of decision to a
later time and/or dumping the decision process on someone else. There is
already a "dumb" solution that has been brought up many times on this list. It
involves creating a holding table that has text-only fields, copying the
data into it, and then moving the data from there to a final table. As far as
import templates go, I suggest looking at:
http://pgloader.projects.postgresql.org/
It also addresses some of your other suggestions. It does not automatically
create a table, though.
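That holding-table approach can be sketched as follows (the table and column names, the path, and the final column types are illustrative; choosing those types is exactly the decision being discussed):

```sql
-- 1. A text-only staging table: every row loads, nothing fails to parse.
CREATE TABLE md5sum_staging (md5sum text, fl text);
COPY md5sum_staging FROM '/path/to/file.csv' CSV HEADER;

-- 2. Move the data into the real table, casting and validating on the way.
CREATE TABLE md5sum_final (md5sum char(32), fl text);
INSERT INTO md5sum_final (md5sum, fl)
SELECT md5sum, fl
FROM md5sum_staging
WHERE md5sum ~ '^[0-9a-f]{32}$';  -- reject malformed checksums here

DROP TABLE md5sum_staging;
```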



--
Adrian Klaver
aklaver@comcast.net

Re: ERROR: relation . . . does not exist

From
Steve Atkins
Date:
On Aug 30, 2008, at 9:19 AM, Christophe wrote:

>
> On Aug 30, 2008, at 6:26 AM, Albretch Mueller wrote:
>> Well, then obviously there is the need for it and you were not
>> successful enough at convincing these developers that they were
>> "confusing postgresql with a spreadsheet program"
>
> The behavior you are looking for is typical of a spreadsheet,
> because spreadsheet programs (generally; I'm sure there are
> exceptions) don't have the notion of a schema; each cell can hold
> its own particular type.  That being said, the automatic type-
> guessing that Excel, say, provides is far from foolproof; I've
> probably spent more time cleaning up Excel's bad guesses than would
> have been saved by my just specifying a type for each column.
>
> As has been noted, text representations of values are extremely
> ambiguous as to which Postgres type they mean... and, of course, you
> could have user-defined domains and types as well.  It's true that
> it could take a wild guess, but that's not a traditional part of a
> database engine.
>
> That being said, it would not be too hard to write a client that
> accepted a CSV or tab-delimited file, parsed the header into column
> names, and then scanned the values of the columns to take a
> reasonable guess as to the column type from a highly limited set of
> possibilities.  This is probably one of those classic "twenty lines
> of Perl" problems.

About 150 lines of Perl[1]. It can actually work quite well, but it is
entirely a client-side problem. None of that sort of heuristics should
go anywhere near COPY.

> It doesn't seem as though COPY INTO is the right place for that,
> since the particular guesses and set of types that one would make
> strike me as very closely tied to your particular application domain.

Cheers,
   Steve

[1] A validator (regex) for each data type; then, for each column, track
which data types it may be as you scan through the file. Use the
relative priorities of the different data types to assign something
appropriate to each column, then do a second pass translating the
format into something PostgreSQL is comfortable with and feed it into
pg_putcopydata.
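The same validator idea can also be run server-side once the data sits in a text-only staging table (say a hypothetical md5sum_staging(md5sum text, fl text)), by probing each column with regexes. A sketch; the patterns are illustrative, not exhaustive:

```sql
-- Which candidate types survive across every row of each column?
SELECT bool_and(md5sum ~ '^[0-9a-f]{32}$')           AS md5_is_checksum,
       bool_and(fl ~ '^-?[0-9]+$')                   AS fl_all_integers,
       bool_and(fl ~ '^[0-9]{4}/[0-9]{2}/[0-9]{2}$') AS fl_all_dates,
       max(length(fl))                               AS fl_max_length
FROM md5sum_staging;
```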



Re: ERROR: relation . . . does not exist

From
"Albretch Mueller"
Date:
> spreadsheet programs (generally; I'm sure there are exceptions) don't have the notion of a schema; each cell can hold
> its own particular type.
~
 Oh, now I see what Martijn meant!
~
> that's not a traditional part of a database engine.
~
 well, yeah! I would totally agree with you, but since I doubt very
much "COPY FROM CSV" is part of the SQL standard to begin with, why
not spice it up a little more?
~
> This is probably one of those classic "twenty lines of Perl" problems.
~
 Java since 1.4 comes with a full-blown, Perl-like regexp engine
~
> I suggest looking at: http://pgloader.projects.postgresql.org/
> [1] A validator (regex) for each data type . . .
~
 that sort of thing was what I was talking about, but I would go
quite a bit further
~
 Thanks
 lbrtchx

Re: ERROR: relation . . . does not exist

From
Adrian Klaver
Date:
On Saturday 30 August 2008 9:42:19 am Adrian Klaver wrote:
> On Saturday 30 August 2008 5:23:25 am Albretch Mueller wrote:
> > On Thu, Aug 28, 2008 at 7:50 PM, Adrian Klaver <aklaver@comcast.net>
wrote:
> > > Define easily.
> >
> > ~
> >  OK, let me try to outline the approach I would go for:
> > ~
> >  I think "COPY FROM CSV" should have three options, namely:
> > ~
> >  1) the way we have used it in which you create the table first
> > ~
> >  2) another way in which defaults are declared, generally as:
> > ~
> >  2.1) aggressive: data type, value and formatting analysis is done; if
> > only 1s or 0s are found, declare it a BOOLEAN; if repeated data is
> > found (say state codes) and the stratification nodes cover the rest of
> > the data, stratify the data out into extra tables (they have a name
> > I can't recall now), index it ...; if data is kind of numeric with
> > forward slashes and/or hyphens, could they possibly be dates? if they are
> > definitely dates, convert them to bigint (and do the formatting in the
> > presentation code (also this is a win-win situation with i18n code)) ...
> > ~
> >  2.2) conservative: data type and value, but no formatting analysis is
> > done and the widest encompassing data type is selected; say for 1-or-0
> > data use bytes [0, 255], for bytes use int; if something could be
> > encoded as char(2), use varchar instead, . . .
> > ~
> >  2.3) dumb: just use the coarsest data type possible; bigint for
> > anything that looks like a number and varchar for the rest
> > ~
> >  the "dumb" option should suggest to the DBA the option they are
> > using, the quantified consequences of their decisions (larger DBs for no
> > reason, approx. reduction in speed, . .) and how not to be "dumb"
> > ~
> >  3) or you could define "import templates" declaring which specific
> > data types to use for data in a certain way, which could be declared
> > per column using regexps
> > ~
> >
> > > I could go on, but the point is that table data types require some
> > > thought on the part of the DBA.
> >
> > ~
> >  Well, it still requires their minds and input, but they will have
> > jobs even if they get some help, don't you think so ;-)
> > ~
> >  lbrtchx
>
> This is a combination of more work than necessary and putting the cart
> after the horse.  All I can see happening is delaying the point of decision

Let's try this again. The cart before the horse.
Memo to self:
1) Drink sufficient coffee.
2) Answer email.

> to a later time and/or dumping the decision process on someone else. There
> is already a "dumb" solution that has been brought up many times on this list.
> It involves creating a holding table that has text-only fields, copying
> the data into it, and then moving the data from there to a final table. As far
> as import templates go, I suggest looking at:
> http://pgloader.projects.postgresql.org/
> It also addresses some of your other suggestions. It does not automatically
> create a table, though.
>
>
>
> --
> Adrian Klaver
> aklaver@comcast.net



--
Adrian Klaver
aklaver@comcast.net

Re: ERROR: relation . . . does not exist

From
Christophe
Date:
On Aug 30, 2008, at 10:33 AM, Albretch Mueller wrote:
>  well, yeah! I would totally agree with you, but since I doubt very
> much "COPY FROM CSV" is part of the SQL standard to begin with, why
> not spice it up a little more?

I'd guess that coming up with a general algorithm to guess the type
from a column of CSV text would satisfy no one, since we'd always
miss a particular case that is important to someone (are times local
or UTC? Should we assume integer or float? Varchar or text?), and the
alternative is a forest of switches that would be extremely complex and
error-prone.

This sounds very much like an application-domain problem, best solved
in the application domain.

Re: ERROR: relation . . . does not exist

From
"Albretch Mueller"
Date:
> ... are times local or UTC
~
 this is a semantic rather than a syntactic issue, one that code could
NOT decide based on the data it reads
~
> Should we assume integer or float?
~
 Is there a dot anywhere in the data you read in for that particular column? ...
~
> Varchar or text?
~
 Is the length of the data read in always less than 255 bytes (or
characters)? ...
~
 You have made it clear to me why my attempt at an RFE for COPY FROM CSV
has found some technical resistance/disagreement, but I still think my
idea, even if not so popular for concrete and cultural reasons, makes
sense to at least some people
~
 DBAs usually have a mental map of the data they have in each table,
etc.; whereas as a data analyst you find yourself constantly reading
in, cleansing and marshaling data of which you have no prior
knowledge
~
 lbrtchx

Re: ERROR: relation . . . does not exist

From
Christophe
Date:
> You have made it clear to me why my attempt at an RFE for COPY FROM CSV
> has found some technical resistance/disagreement, but I still think my
> idea, even if not so popular for concrete and cultural reasons, makes
> sense to at least some people

It's a perfectly reasonable problem to want to solve; the question is
whether COPY is the right place to solve it. I would think that a tool
that reads the CSV data and produces a proposed schema definition for
the table would be a more generally-useful approach.

Re: ERROR: relation . . . does not exist

From
Lew
Date:
Albretch Mueller wrote:
>> Varchar or text?
> ~
>  Is the length of the data read in always less than 255 bytes (or
> characters)? ...

It may be more limited than that by application-domain-specific constraints -
e.g., a license plate might be statutorily limited to eight characters.

It might be coincidence that the input happens to fit within 255 characters
when future inputs might not.  One cannot make certain statements about
whether "data read in always [being] less than 255 ... characters" based on a
limited sample set, only probabilistic ones, absent reasoning about the
application domain.

Additionally, one could use TEXT for shorter columns if one wanted:
> There are no performance differences between these three types,
  [character varying(n), character(n) and text]
> apart from increased storage size when using the blank-padded
> type, and a few extra cycles to check the length when storing
> into a length-constrained column. While character(n) has
> performance advantages in some other database systems, it has
> no such advantages in PostgreSQL. In most situations text or
> character varying should be used instead.
<http://www.postgresql.org/docs/8.3/static/datatype-character.html>

DBMSes are about schemata and planning the data structures, not loosey-goosey
typing.  (This reminds me of the debate between the loosely-typed language
(e.g., PHP) camp versus the strongly-typed language (C#, Java) camp.)  Schemas
are based on analysis of and reasoning about the application domain, not lucky
guesses about a limited sample of inputs.

--
Lew

Re: ERROR: relation . . . does not exist

From
Peter Eisentraut
Date:
Albretch Mueller wrote:
>> PostgreSQL has 60+ types and many look like each other. How do you propose to differentiate?
> ~
>  Data Types are basically about value ranges (how many bits do you
> need to hold the value) and formatting.

That is exactly wrong, at least in the PostgreSQL approach to the type
system.  Data types are about functions and operators associated with
them.  So it is not about what the data looks like, but what kind of
processing you want to do with it.  Guessing that information from a
dump of data seems pretty much impossible.