Re: COPY FROM - Mailing list pgsql-odbc
From | Kevin Puetz
---|---
Subject | Re: COPY FROM
Date |
Msg-id | dt3fuq$3q3$1@sea.gmane.org
In response to | COPY FROM ("Bart Degryse" <Bart.Degryse@indicator.be>)
List | pgsql-odbc
Ludek Finstrle wrote:

>> In the driver logging I can see that it's waiting for the data now,
>> but I can't really figure out how to deliver it. Since the source
>> data (as a text file with fixed length fields) is only available
>> on client side and needs some processing before being ready to import
>
> I think ODBC doesn't support something like standard input ...
> Or maybe it does? Do you try print the prepared data to the standard
> output?

I don't know about ODBC at the low level, but I've certainly never found a way to do this with DAO or ADO. I don't *think* it's possible.

>> So basically my question is : how do I deliver the prepared data
>> to the driver? Any help (tips, working code, example, ...)
>> would be appreciated.
>
> If you can't find the way you can try to use insert statements.
> You may collect multiple insert statements into one transaction
> to speed up the data loading.

The fastest approach I've found that *is* VB-friendly is to ship the data over as a bound parameter to a stored procedure that unpacks it into rows on the server side (writing a suitable function returning SETOF RECORD to split the string or blob apart). I don't actually know whether there's an upper limit on the size of such arguments, but I've pushed 30k rows or so (~1MiB of total data) without difficulty using psqlodbc and ADO.Command objects. Since that's past 64kiB, and computer programmers are not terribly creative folk when it comes to maximum sizes, I suspect this means it'll be safe up to at least 2GiB, if you have the memory to prepare and receive such a behemoth.

I have a trivial split-on-delimiter-char routine for simple stuff, and a from_csv written in PL/Perl using Text::CSV for multi-column data or anything that might need to quote the delimiter appearing in the contained data. I'm not actually sure the 'simple' version is even faster in any meaningful sense, but I wrote it first so it's still around.
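The client-side half of this technique is just packing the rows into one big string before binding it as a parameter. A minimal sketch (in Python for illustration; the poster's environment is VB/ADO, and `pack_rows` plus the delimiter choices are my own assumptions, not code from the post):

```python
# Pack tabular data into one delimited string, suitable for binding as a
# single text parameter that a server-side SETOF RECORD function splits
# back into rows. Hypothetical helper, not the poster's actual routine.
def pack_rows(rows, field_sep=",", row_sep="\n"):
    # Stringify every value; the AS clause on the server side coerces
    # the text back into the declared column types.
    return row_sep.join(field_sep.join(str(v) for v in row) for row in rows)

packed = pack_rows([("foo1", 1, "2006-02-10"), ("foo2", 2, "2006-02-11")])
# packed == "foo1,1,2006-02-10\nfoo2,2,2006-02-11"
```

The whole string then travels to the server as one bound parameter, so the round-trip cost is paid once rather than per row.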
If there's interest I should be able to post them, but it's a work project so I don't have it handy tonight. For the CSV version I implemented everything except multi-line values, which I had no need for; the concept would certainly allow it with a little more work in the parser function. It streams through the input, consuming lines and using return_next to build the result set, so if the query is able to consume rows as they come out (no sort or similar) it shouldn't ever need to hold both the argument and the full recordset in memory. But with my strings only in the megabyte range I haven't ever really tested carefully to prove that it doesn't.

Usage looks roughly like the following:

    INSERT INTO tbl_foo (foo, bar, baz)
    SELECT * FROM from_csv(?) AS t(foo int4, bar text, baz date)

with the argument bound to a string like

    'foo,bar,baz
    foo1,bar1,baz1
    foo2,bar2,baz2'

IIRC, this took the time to load the data from ~45 seconds (using a series of INSERTs within a single txn) to about 500ms (seemingly pretty much network-constrained), so the difference is pretty dang dramatic. You have to be careful to stringify non-text values in a way that postgres is happy with when you pack the argument, and then the type declarations in the AS clause will coerce things so the rest of the query can ignore the fact that they were delivered packed as text. YMMV.

One could also use something other than CSV as the serialization, and the type-safety and serialization/parsing efficiency might be better if you did, but this was good enough for me and is nicely generic. Someday I might also do one for QDataStream's packing shipped over in a blob, since that would let my other C++/Qt apps have a more type-strict container, but that hasn't happened yet.

This technique also allows you to materialize tabular 'immediate' data for a query in its FROM clause, which can be quite handy.
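The streaming behaviour described above can be sketched as a generator (the real implementation is PL/Perl with Text::CSV; this Python stand-in only illustrates consuming lines and emitting one row at a time, the way return_next builds the result set incrementally):

```python
import csv
import io

def from_csv(packed):
    # Yield rows one at a time rather than materializing the whole
    # result set, mirroring return_next in the PL/Perl version.
    # Illustrative stand-in, not the function from the post.
    for row in csv.reader(io.StringIO(packed)):
        yield tuple(row)

rows = list(from_csv("foo1,bar1,baz1\nfoo2,bar2,baz2"))
# rows == [("foo1", "bar1", "baz1"), ("foo2", "bar2", "baz2")]
```

Because the consumer pulls rows as they are produced, only the packed argument needs to stay in memory, matching the post's point about queries that can consume rows as they come out.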
I've actually replaced quite a few instances where I used to prepare a temp table, or loop a prepared SELECT varying the arguments, by doing so. It avoids the repeated network latency (important in my case, though obviously not if the DB is local) and lets me feed in all the search data in one shot, so the db can often switch techniques from a series of hash or index lookups to something that shows better locality. Depending on the structure of the query it can be an enormous win (one important operation went from 90s to ~300ms).

> Regards,
>
> Luf
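The lookup-batching variant of the same idea can be sketched like this (again Python for illustration; `pack_keys` and the SQL shown in the comment are hypothetical, merely mirroring the pattern described above):

```python
def pack_keys(keys, sep="\n"):
    # Pack all lookup keys into one text parameter so a single
    # set-based query replaces N prepared-SELECT round trips.
    # Hypothetical helper, not code from the post.
    return sep.join(str(k) for k in keys)

# One round trip instead of len(keys), e.g. (illustrative SQL):
#   SELECT t.id, d.val
#   FROM from_csv(?) AS t(id int4)
#   JOIN tbl_data d ON d.id = t.id
param = pack_keys([101, 102, 103])
# param == "101\n102\n103"
```

With all the keys visible at once, the planner can pick a join strategy with better locality than a series of independent index probes, which is the win the post describes.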