Thread: making postgresql "source" its data feed from external csv file ...

making postgresql "source" its data feed from external csv file ...

From

Albretch Mueller

Date:

03 January 2011, 18:55:11

 or keeping data processed in a DBMS in synch with local csv file
~
 I kind of like to use a fast in memory DB for coding cycles (either
HSQLDB or H2) and then when I have the need to stress test I use
postgresql, so I keep all my data as csv files which I then "source"
to tables, after running the DDL statements to create them, simply by
going:
~
 SET TABLE "<table_name>" SOURCE "<file_name.csv>;ignore_first=false;"
~
 As I understand postgresql "COPY FROM" does in effect copy the data
to the DBMS, but what I need is for the DBMS to "source" its data feed
with the outer file and synchronize it as it changes without having to
run "COPY TO". Is this possible? This way I can keep the data in a way
that I can try on any DBMS
~
 Is there an equivalent statement in postgresql or some other way to
achieve the same result?
~
 Thanks
 lbrtchx

Re: making postgresql "source" its data feed from external csv file ...

From

Craig Ringer

Date:

03 January 2011, 20:14:57

On 01/04/2011 09:55 AM, Albretch Mueller wrote:

>   As I understand postgresql "COPY FROM" does in effect copy the data
> to the DBMS, but what I need is for the DBMS to "source" its data feed
> with the outer file and synchronize it as it changes without having to
> run "COPY TO". Is this possible? This way I can keep the data in a way
> that I can try on any DBMS

No, that's not currently possible. What you describe is covered by
SQL/MED. There's ongoing work to implement SQL/MED for 9.1 / 9.2, but
there is no support for it in any current release version of PostgreSQL.
SQL/MED is intended for a lot more than just accessing flat files, but
should be capable of that.

There's a reason relational databases don't store their backing data as
CSV files - because they're *SLOW* to work with, in particular to
update. You can expect horribly slow write performance with any scheme
like this even if you do get it working.

--
Craig Ringer

Re: making postgresql "source" its data feed from external csv file ...

From

Maciek Sakrejda

Date:

03 January 2011, 20:38:37

If your tables are read-only (which it kind of sounds like), you could
abstract out the reload mechanism, use the current implementation for
the in-memory databases, and use a combination of watching the CSV
files (e.g., through inotify on Linux) and TRUNCATE / COPY in
PostgreSQL. I think how well this will work will really depend on what
you're doing with the system, but it could be worth a shot.
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: making postgresql "source" its data feed from external csv file ...

From

Albretch Mueller

Date:

03 January 2011, 21:09:22

On Tue, Jan 4, 2011 at 12:14 AM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> On 01/04/2011 09:55 AM, Albretch Mueller wrote:
>
>>  As I understand postgresql "COPY FROM" does in effect copy the data
>> to the DBMS, but what I need is for the DBMS to "source" its data feed
>> with the outer file and synchronize it as it changes without having to
>> run "COPY TO". Is this possible? This way I can keep the data in a way
>> that I can try on any DBMS
>
> No, that's not currently possible. What you describe is covered by SQL/MED.
> There's ongoing work to implement SQL/MED for 9.1 / 9.2, but there is no
> support for it in any current release version of PostgreSQL. SQL/MED is
> intended for a lot more than just accessing flat files, but should be
> capable of that.
>
> There's a reason relational databases don't store their backing data as CSV
> files - because they're *SLOW* to work with, in particular to update. You
> can expect horribly slow write performance with any scheme like this even if
> you do get it working.
~
 Well, yes, writing to an inmem db while coding should be fast enough
I guess ;-)
~
 I thought of something else a few moments ago. Is there a way to
register to postgresql events like you do with java.awt., javax.swing.
and j2ee management events and listen to them?
~
 If you could somehow do that with jdbc, you could register to "COPY
FROM" events to certain files and regularly synchronize the data.
Certain schedule based on a measurable degree of dirtiness, time
stamps (back up incremental deltas every Tuesdays night ...) and data
semantics (if the increment or the total amount of monies reach some
value …) would help
~
 I think I will have to do with "COPY [FROM|TO]" for now until SQL/MED
~
 http://wiki.postgresql.org/wiki/SQL/MED
~
 http://wiki.postgresql.org/images/4/4c/SQLMED-FOSDEM2009.pdf
~
 becomes a viable technology
~
 Thanks
 lbrtchx