Thread: URGENT: temporary table not recognized?
Hi. I'm using PostgreSQL 9.0.5, and the connection is made via pgbouncer. I create a temporary table, and then want to import data into this table via a COPY command. Yet this just-created table is not being recognized. What's up?

From my terminal:

mydb=# create temporary table vl (
    alias varchar(35)
    ,dates timestamp without time zone
    ,referers text null
);
CREATE TABLE
Time: 1.871 ms
mydb=# copy vl from '/backup/data.txt' WITH DELIMITER AS '|';
ERROR:  relation "vl" does not exist
mydb=#

Weird! Welcome any thoughts. Thanks
Phoenix Kiula <phoenix.kiula@gmail.com> writes: > Hi. I'm using Postgresql 9.0.5, and the connection is made via pgbouncer. Perhaps pgbouncer is redirecting the second command to a different session? regards, tom lane
Tom Lane wrote: > Phoenix Kiula <phoenix.kiula@gmail.com> writes: >> Hi. I'm using Postgresql 9.0.5, and the connection is made via pgbouncer. > > Perhaps pgbouncer is redirecting the second command to a different > session? This may be OT, but are temp tables also removed when setting a new session authorization? Bosco.
On Fri, Jan 6, 2012 at 11:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Phoenix Kiula <phoenix.kiula@gmail.com> writes: >> Hi. I'm using Postgresql 9.0.5, and the connection is made via pgbouncer. > > Perhaps pgbouncer is redirecting the second command to a different > session? > Thanks Tom. I'm in the exact same session in my terminal, and the commands are entered within 2 seconds of each other. With copy/paste, maybe split microseconds of each other. How can I make sure pgbouncer takes it all in the same session? I also tried the two commands within a transaction.
On 01/06/2012 01:11 PM, Phoenix Kiula wrote:
> On Fri, Jan 6, 2012 at 11:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Phoenix Kiula <phoenix.kiula@gmail.com> writes:
>>> Hi. I'm using Postgresql 9.0.5, and the connection is made via pgbouncer.
>> Perhaps pgbouncer is redirecting the second command to a different
>> session?
>
> Thanks Tom. I'm in the exact same session in my terminal, and the
> commands are entered within 2 seconds of each other. With copy/paste,
> maybe split microseconds of each other.
>
> How can I make sure pgbouncer takes it all in the same session? I also
> tried the two commands within a transaction.

Sounds like you are using statement pooling - every statement can be assigned to a different server connection. You may need transaction pooling or session pooling:

http://pgbouncer.projects.postgresql.org/doc/usage.html

Cheers,
Steve
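[For reference, the pool mode is set in pgbouncer.ini. A minimal sketch follows; the paths, names and credentials are assumptions for illustration. Note that a temporary table is only guaranteed to remain visible across separate top-level statements under session pooling; under transaction pooling, the CREATE and the COPY must share one transaction.]

```ini
; pgbouncer.ini -- sketch only; adapt to your installation
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
; pool_mode is one of: session | transaction | statement
pool_mode = transaction
```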
On Fri, Jan 6, 2012 at 11:24 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote: > On 01/06/2012 01:11 PM, Phoenix Kiula wrote: >> >> On Fri, Jan 6, 2012 at 11:46 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> >>> Phoenix Kiula<phoenix.kiula@gmail.com> writes: >>>> >>>> Hi. I'm using Postgresql 9.0.5, and the connection is made via >>>> pgbouncer. >>> >>> Perhaps pgbouncer is redirecting the second command to a different >>> session? >>> >> >> >> Thanks Tom. I'm in the exact same session in my terminal, and the >> commands are entered within 2 seconds of each other. With copy/paste, >> maybe split microseconds of each other. >> >> How can I make sure pgbouncer takes it all in the same session? I also >> tried the two commands within a transaction. >> > > Sounds like you are using statement pooling - every statement can be > assigned to a different server connection. You may need transaction pooling > or session pooling: > > http://pgbouncer.projects.postgresql.org/doc/usage.html Statement pooling throws error on open transaction. -- marko
Marko Kreen <markokr@gmail.com> writes: > On Fri, Jan 6, 2012 at 11:24 PM, Steve Crawford > <scrawford@pinpointresearch.com> wrote: >> On 01/06/2012 01:11 PM, Phoenix Kiula wrote: >>> How can I make sure pgbouncer takes it all in the same session? I also >>> tried the two commands within a transaction. >> Sounds like you are using statement pooling - every statement can be >> assigned to a different server connection. You may need transaction pooling >> or session pooling: >> http://pgbouncer.projects.postgresql.org/doc/usage.html > Statement pooling throws error on open transaction. Yeah, if it still fails within a single transaction, it gets harder to blame pgbouncer. But there are not very many other candidates. I wondered about a funny setting of search_path, but ISTM that could at worst result in copying into the wrong table (ie some other table named "vl"), not failure to find any "vl" at all. It might be worth turning on statement logging and ensuring that log_line_prefix includes the process PID. Then the postmaster log would provide indisputable evidence whether the CREATE and the COPY are executed in the same session or not. regards, tom lane
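[Concretely, Tom's suggestion amounts to something like this in postgresql.conf (a sketch; %p is the backend process ID escape and %m a timestamp with milliseconds):]

```ini
# postgresql.conf -- log every statement, prefixed with the backend PID
log_statement = 'all'
log_line_prefix = '%m [%p] '
```

[If the CREATE and the COPY then show different PIDs in the log, the pooler handed them to different server connections.]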
On Fri, Jan 6, 2012 at 4:24 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
> On 01/06/2012 01:11 PM, Phoenix Kiula wrote:
>> How can I make sure pgbouncer takes it all in the same session? I also
>> tried the two commands within a transaction.
>
> Sounds like you are using statement pooling - every statement can be
> assigned to a different server connection. You may need transaction pooling
> or session pooling:
>
> http://pgbouncer.projects.postgresql.org/doc/usage.html

Thanks Steve. YES! I changed it to transaction pooling and now it works.

Another problem, though.

I need to COPY a huge text file into a table, with about 350 million lines in the file (i.e., 350 million rows in the table).

While copying, some lines do not have data. They are empty values.

How can I specify in the COPY command that if data is not found, it should be ignored? In my temp table definition, I set this column as "NULL" anyway, so it should be ok if this column was left empty!

What can I do in my COPY command to circumvent this?

Thanks.
On 01/06/2012 03:12 PM, Phoenix Kiula wrote:
> Thanks Steve. YES! I changed it to transaction pooling and now it works.
>
> While copying, some lines do not have data. They are empty values.
>
> How can I specify in COPY command that if data is not found, it should
> be ignored? In my temp table definition, I set this column as "NULL"
> anyway, so it should be ok if this column was left empty!
>
> What can I do in my COPY command to circumvent this?

http://www.postgresql.org/docs/9.0/interactive/sql-copy.html

Search for NULL

--
Adrian Klaver
adrian.klaver@gmail.com
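[For reference, the relevant option from that page looks like the sketch below. In 9.0's text format the default null string is \N, so an empty string must be declared explicitly; note this only handles empty-but-present fields, and every row still needs the full column count.]

```sql
-- empty fields become NULL; rows must still have all columns
COPY vl FROM '/backup/data.txt' WITH (FORMAT text, DELIMITER '|', NULL '');
```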
On Fri, Jan 6, 2012 at 6:20 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> http://www.postgresql.org/docs/9.0/interactive/sql-copy.html
>
> Search for NULL

Thanks Adrian.

Without examples, it's hard to predict syntax. If the value after a pipe is missing altogether, I suppose the missing value is "\n" (newline). But this doesn't work:

copy vl from 'data.txt' WITH DELIMITER AS '|' NULL '\n';

None of these work either:

copy vl from 'data.txt' WITH DELIMITER AS '|' NULL \n;
copy vl from 'data.txt' WITH DELIMITER AS '|' NULL \\n;
copy vl from 'data.txt' WITH DELIMITER AS '|' NULL '';

The first two give errors, the third one throws the same missing-value-for-column error.

The data is stored like this:

123|big string here|189209209|US|2001-01-01
123|big string here|189209209|US|2001-01-01
123|big string here|189209209|US|2001-01-01
123|big string here|189209209|US|2001-01-01

But sometimes, the strings are:

|big string here|189209209|US|2001-01-01
|big string here|189209209|US|2001-01-01

Or

123|big string here|189209209|US
123|big string here|189209209|US|

So you see either the first column, which is the ID in a way, is missing, so the "missing character" is probably a blank (''?). In this case I want COPY to just ignore this line.

Or the last column is missing, where the missing character can be a newline I suppose?

So how do I specify this in the COPY command so that it doesn't croak? If a line's ID is missing, it should ignore the line and go on instead of not doing anything by throwing an error for EVERYTHING!

Thanks.
On 01/06/2012 03:42 PM, Phoenix Kiula wrote: > On Fri, Jan 6, 2012 at 6:20 PM, Adrian Klaver<adrian.klaver@gmail.com> wrote: > >> http://www.postgresql.org/docs/9.0/interactive/sql-copy.html >> >> Search for >> NULL > > > Thanks Adrian. > > Without examples, it's hard to predict syntax. If the value after a > pipe is missing altogether, I suppose the missing value is "\n" > (newline). But this doesn't work: > > copy vl from 'data.txt' WITH DELIMITER AS '|' NULL '\n'; > > None of these work either: > > copy vl from 'data.txt' WITH DELIMITER AS '|' NULL \n; > copy vl from 'data.txt' WITH DELIMITER AS '|' NULL \\n; > copy vl from 'data.txt' WITH DELIMITER AS '|' NULL ''; > > The first two give errors, the third one throws the same missing value > for column error. > > The data is stored like this: > > 123|big string here|189209209|US|2001-01-01 > 123|big string here|189209209|US|2001-01-01 > 123|big string here|189209209|US|2001-01-01 > 123|big string here|189209209|US|2001-01-01 > > But sometimes, the strings are: > > |big string here|189209209|US|2001-01-01 > |big string here|189209209|US|2001-01-01 > > Or > > 123|big string here|189209209|US > 123|big string here|189209209|US| > > So you see either the first column, which is the ID in a way, is > missing so the "missing character" is probably a blank (''?). In this > case I want COPY to just ignore this line. > > Or the last column is missing, where the missing character can be a > newline I suppose? > > So how do I specify this in the COPY command so that it doesn't croak? > If a line's ID is missing, it should ignore the line and go on instead > of not doing anything by throwing an error for EVERYTHING! > > Thanks. > > Missing data is one thing, missing delimiters is another. Try doing a small copy of data with just a few lines to see which variants are actually causing the error. My money is on the one that has a mismatch between the table column count and the data column count. 
I.e., the row with three delimiters instead of four:

123|big string here|189209209|US

When you say "ignore", do you mean that you want PostgreSQL to assume a null value for the missing column or to not import that row at all?

In general, when you have data scrubbing issues like this, grep/sed/awk/... are your friends. Clean it up then import it.

I suppose you could import all rows into a big text field and process it in PostgreSQL but I doubt you will find that to be an optimal solution.

Cheers,
Steve
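[A minimal sketch of that pre-cleaning step in shell, assuming the five-field layout from the samples in this thread; the file names are illustrative.]

```shell
# Keep only rows that start with a numeric ID and have exactly five
# pipe-delimited fields; everything else is dropped before import.
grep -E '^[0-9]+(\|[^|]*){4}$' data.txt > clean.txt
```

[Rows with an empty-but-present non-ID field, e.g. `123||189209209|US|2001-01-01`, still pass the filter, and COPY can load those empties as NULL.]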
On 01/06/2012 03:42 PM, Phoenix Kiula wrote: > On Fri, Jan 6, 2012 at 6:20 PM, Adrian Klaver<adrian.klaver@gmail.com> wrote: > >> http://www.postgresql.org/docs/9.0/interactive/sql-copy.html >> >> Search for >> NULL > > > > Thanks Adrian. > > Without examples, it's hard to predict syntax. If the value after a > pipe is missing altogether, I suppose the missing value is "\n" > (newline). But this doesn't work: > > copy vl from 'data.txt' WITH DELIMITER AS '|' NULL '\n'; > > None of these work either: > > copy vl from 'data.txt' WITH DELIMITER AS '|' NULL \n; > copy vl from 'data.txt' WITH DELIMITER AS '|' NULL \\n; > copy vl from 'data.txt' WITH DELIMITER AS '|' NULL ''; Try: copy vl from 'data.txt' WITH CSV DELIMITER '|'; If that doesn't work take a look at pgloader: http://pgfoundry.org/projects/pgloader/ -- Adrian Klaver adrian.klaver@gmail.com
On Fri, Jan 6, 2012 at 6:53 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote: > On 01/06/2012 03:42 PM, Phoenix Kiula wrote: >> >> On Fri, Jan 6, 2012 at 6:20 PM, Adrian Klaver<adrian.klaver@gmail.com> >> wrote: >> >>> http://www.postgresql.org/docs/9.0/interactive/sql-copy.html >>> >>> Search for >>> NULL >> >> >> >> Thanks Adrian. >> >> Without examples, it's hard to predict syntax. If the value after a >> pipe is missing altogether, I suppose the missing value is "\n" >> (newline). But this doesn't work: >> >> copy vl from 'data.txt' WITH DELIMITER AS '|' NULL '\n'; >> >> None of these work either: >> >> copy vl from 'data.txt' WITH DELIMITER AS '|' NULL \n; >> copy vl from 'data.txt' WITH DELIMITER AS '|' NULL \\n; >> copy vl from 'data.txt' WITH DELIMITER AS '|' NULL ''; >> >> The first two give errors, the third one throws the same missing value >> for column error. >> >> The data is stored like this: >> >> 123|big string here|189209209|US|2001-01-01 >> 123|big string here|189209209|US|2001-01-01 >> 123|big string here|189209209|US|2001-01-01 >> 123|big string here|189209209|US|2001-01-01 >> >> But sometimes, the strings are: >> >> |big string here|189209209|US|2001-01-01 >> |big string here|189209209|US|2001-01-01 >> >> Or >> >> 123|big string here|189209209|US >> 123|big string here|189209209|US| >> >> So you see either the first column, which is the ID in a way, is >> missing so the "missing character" is probably a blank (''?). In this >> case I want COPY to just ignore this line. >> >> Or the last column is missing, where the missing character can be a >> newline I suppose? >> >> So how do I specify this in the COPY command so that it doesn't croak? >> If a line's ID is missing, it should ignore the line and go on instead >> of not doing anything by throwing an error for EVERYTHING! >> >> Thanks. >> >> > Missing data is one thing, missing delimiters is another. 
> Try doing a small
> copy of data with just a few lines to see which variants are actually
> causing the error. My money is on the one that has a mismatch between the
> table column count and the data column count. I.e., the row with three
> delimiters instead of four:
>
> 23|big string here|189209209|US
>
> When you say "ignore", do you mean that you want PostgreSQL to assume a null
> value for the missing column or to not import that row at all?
>
> In general, when you have data scrubbing issues like this, grep/sed/awk/...
> are your friends. Clean it up then import it.
>
> I suppose you could import all rows into a big text field and process it in
> PostgreSQL but I doubt you will find that to be an optimal solution.

Thanks Steve.

The file has 350 million lines. Sed, Awk etc are a little painful when the file is 18GB with that many lines.

I'd want Postgresql to ignore the line altogether when something is missing. Is this an option we can use, or are rules hoisted on us?
On Fri, Jan 6, 2012 at 6:54 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> Try:
> copy vl from 'data.txt' WITH CSV DELIMITER '|';

Doesn't work. Can't see what the difference in CSV is from a text file. Same errors are thrown.

> If that doesn't work take a look at pgloader:
> http://pgfoundry.org/projects/pgloader/

Wow, another geeky tool. Hard to find documentation. Archaic presentation, no simple steps to install and get using. Anyway it doesn't seem to provide the options I need (http://pgloader.projects.postgresql.org/) --

a) Ability to assign more than one NULL value
b) Ability to ignore lines altogether that have any problem

Really, other databases have mechanisms to ignore "problem lines" while copying. Does Postgresql allow me to *ignore* lines while COPYING?
On 01/06/2012 03:12 PM, Phoenix Kiula wrote:
> ...
>> Sounds like you are using statement pooling - every statement can be
>> assigned to a different server connection. You may need transaction pooling
>> or session pooling:
>>
>> http://pgbouncer.projects.postgresql.org/doc/usage.html
>
> Thanks Steve. YES! I changed it to transaction pooling and now it works.

But Marko is correct. If you were using:

begin;
create temporary...
\copy ...
commit;

and if your pooler was set to statement then you should have seen an error. Are you by any chance routing stderr to /dev/null or otherwise hiding messages? If you are using "psql.....2>/dev/null" then everything would *look* the same with pooling set to statement or transaction but you would be blissfully ignorant of the errors.

Cheers,
Steve
On 01/06/2012 03:55 PM, Phoenix Kiula wrote:
> ...
>> In general, when you have data scrubbing issues like this, grep/sed/awk/...
>> are your friends. Clean it up then import it.
>
> Thanks Steve.
>
> The file has 350 million lines. Sed, Awk etc are a little painful when
> the file is 18GB with that many lines.
>
> I'd want Postgresql to ignore the line altogether when something is
> missing. Is this an option we can use, or are rules hoisted on us?

I've found grep, sed and friends to be quite effective and proper pre-cleaning to have a relatively minor impact on performance. Done properly, you will just be piping the data through a very simple grep/sed/awk/... into psql. No extra disk-reads, minimal memory use and a bit of CPU. And you will be in charge of deciding how suspect data is handled.

If by "rules hoisted" you mean "will PostgreSQL make arbitrary and possibly incorrect assumptions to attempt to force bad data into a table" then the answer is "no". In fact, it has become more and more picky over time. Trust me, at some point you will thank it for doing so. For example the following used to "work":

select current_date < 2020-01-01;

But it returned "false" which was probably not what the user wanted. (2020-01-01 is the integer 2018, which PostgreSQL interpreted as being less than current_date.) But it sure looks OK at first glance when you really meant '2020-01-01'::date. In current versions, that statement will throw an error just as PostgreSQL does with dates like February 31 and a variety of other things that certain other DBMS' deem good.

Cheers,
Steve
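[That pipeline could be sketched like this; the table, database and file names are assumptions carried over from earlier in the thread, and the awk filter mirrors the five-field samples.]

```shell
# Stream the file through a filter straight into COPY FROM STDIN:
# no edited copy on disk, and rows with a missing ID or a missing
# field are simply dropped before they reach the database.
awk -F'|' 'NF == 5 && $1 != ""' data.txt \
  | psql mydb -c "COPY vl FROM STDIN WITH DELIMITER '|'"
```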
On Friday, January 06, 2012 4:16:09 pm Phoenix Kiula wrote:
> On Fri, Jan 6, 2012 at 6:54 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>> Try:
>> copy vl from 'data.txt' WITH CSV DELIMITER '|';
>
> Doesn't work. Can't see what the different in CSV is from a text file.
> Same errors are thrown.
>
>> If that doesn't work take a look at pgloader:
>> http://pgfoundry.org/projects/pgloader/
>
> Wow, another geeky tool. Hard to find documentation. Archaic
> presentation, no simple steps to install and get using. Anyway doesn't
> seem to provide the options I need
> (http://pgloader.projects.postgresql.org/) --

Install:

Download
tar -xzvf pgloader-2.3.2.tar.gz
cd pgloader-2.3.2/
sudo make

Very difficult.

http://pgloader.projects.postgresql.org/

> a) Ability to assign more than one NULL value

null
  You can configure here how null value is represented into your flat data file. This parameter is optional and defaults to '' (that is empty string). If defined on a table level, this local value will overwrite the global one.

empty_string
  You can configure here how empty values are represented into your flat data file. This parameter is optional and defaults to \ (that is backslash followed by space). If defined on a table level, this local value will overwrite the global one.

reformat
  Use this option when you need to preprocess some column data with pgloader reformatting modules, or your own. The value of this option is a comma separated list of columns to rewrite, which are a colon separated list of column name, reformat module name, reformat function name. Here's an example to reformat column dt_cx with the mysql.timestamp() reformatting function:

  reformat = dt_cx:mysql:timestamp

  See global setting option reformat_path for configuring where pgloader will look for reformat packages and modules.
If you want to write a new formatting function, provide a python package called reformat (a directory of this name containing an empty __init__.py file will do) and place in there arbitrarily named modules (foo.py files) containing functions with the following signature:

def bar(reject, input)

The reject object has a log(self, messages, data = None) method for you to log errors into section.rej.log and section.rej files.

> b) Ability to ignore lines altogether that have any problem

reject_log
  In case of errors processing input data, a human-readable log per rejected input data line is produced into the reject_log file.

reject_data
  In case of errors processing input data, the rejected input line is appended to the reject_data file.

> Really, other databases have mechanisms to ignore "problem lines"
> while copying. Does Postgresql allow me to *ignore* lines while
> COPYING?

No.

I await with bated breath your most excellent patch to COPY.

--
Adrian Klaver
adrian.klaver@gmail.com
On Fri, Jan 6, 2012 at 8:19 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> Install:
> Download
> tar -xzvf pgloader-2.3.2.tar.gz
> cd pgloader-2.3.2/
> sudo make
>
> Very difficult.
>
> http://pgloader.projects.postgresql.org/

Sorry. That I already did.

But where's the config file? How to configure the config file? Where's the simple doc (not on that ugly PGFoundry website, I mean in English that people can understand what to do, with a starting sample)?

>> a) Ability to assign more than one NULL value

...

So NULL, or Empty Value? (I will ignore the PhD I need for all that geeky "reformat" jazz.)

My request is simple. If a value is missing, ignore the line and move on. This is an option that's perfectly alright to expect/need. I know PG is being super-disciplined and strict, which is great if that's the default, but to remove any option of flexibility on this pretext is a little silly. And sounds like an excuse.

>> b) Ability to ignore lines altogether that have any problem
>
> reject_log
> In case of errors processing input data, a human readable log per rejected input
> data line is produced into the reject_log file.
> reject_data
> In case of errors processing input data, the rejected input line is appended to
> the reject_data file.
I don't need a reject log if the main command will croak and be utterly useless, as COPY is already. If the full command will go but some lines will be ignored then a reject log is useful so I can identify the ignored lines. Which is it? My sense is it's the former, which means this entire software is just a more complicated way of (non-)achieving the same thing I do with COPY. > I await with bated breath your most excellent patch to COPY. Thanks for your help on this list. I'll be moving to more modern couchDB type databases eventually anyway, but for now I'm stuck with PG.
On 01/06/12 5:33 PM, Phoenix Kiula wrote:
>> http://pgloader.projects.postgresql.org/
>
> Sorry. That I already did.
>
> But where's the config file? How to configure the config file?
> Where's the simple doc (not on that ugly PGFoundry website, I mean in
> English that people can understand what to do, with a starting
> sample?)

you should check your attitude at the door. this isn't Microsoft Pay per Incident Tech Support.

what you're asking for is right on that very URL you so conveniently quoted, both the configuration file format ("Global Configuration Section"), and where the sample pgloader.conf can be found...

> Please see the given configuration example which should be distributed
> in /usr/share/doc/pgloader/examples/pgloader.conf.
>
> The example configuration file comes with example data and can be used
> as a unit test of pgloader.

so, yes, you need to download the package and un-tar it to see the sample .conf....

$ wget http://pgfoundry.org/frs/download.php/2294/pgloader-2.3.2.tar.gz
$ tar xzf pgloader-2.3.2.tar.gz
$ cd pgloader-2.3.2
$ more examples/pgloader.conf
On Fri, Jan 6, 2012 at 10:38 PM, John R Pierce <pierce@hogranch.com> wrote: > you should check your attitude at the door. this isn't Microsoft Pay per > Incident Tech Support. I saw the door. Found some other attitudes that were allowed to be let in. Like asking me to write my own patch. You see, attitudes come in different shapes. > what you're asking for is right on that very URL you so conveniently quoted, > both the configuration file format ("Global Configuration Section"), and > where the sample pgloader.conf can be found... No it isn't. The config file is not clear. And it's available in three different forms in three different places, which I found only thanks to Google. It's a community project, I get it. Moving on. I'll do the awk/sed thing. COPY from other databases already has several such convenient features. Postgresql is not interested in implementing them. Sure.
On 2012-01-06, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> On Fri, Jan 6, 2012 at 6:53 PM, Steve Crawford
> <scrawford@pinpointresearch.com> wrote:
>
> Thanks Steve.
>
> The file has 350 million lines. Sed, Awk etc are a little painful when
> the file is 18GB with that many lines.

On files of that size they're a lot nicer than an interactive editor. It's not like you need to find space on the disk for an edited copy:

( echo "copy table_name from stdin other_parameters;"
  cat bigfile | sed script_or_scriptfile
  echo '\.'
) | psql connection_parameters

> I'd want Postgresql to ignore the line altogether when something is
> missing. Is this an option we can use, or are rules hoisted on us?

The copy command is optimised and intended for use with data that is known to be good, or at least acceptable to the database.

--
⚂⚃ 100% natural