Re: any solution for doing a data file import spawning it on multiple processes - Mailing list pgsql-general
From | hb@101-factory.eu |
---|---|
Subject | Re: any solution for doing a data file import spawning it on multiple processes |
Date | |
Msg-id | 9E7215C2-C9C6-4B5A-A4D9-825ED610FF84@101-factory.eu |
In response to | Re: any solution for doing a data file import spawning it on multiple processes (Edson Richter <edsonrichter@hotmail.com>) |
List | pgsql-general |
Thanks all, I will be looking into it.

Kind regards,
Henk

On 16 Jun 2012, at 18:23, Edson Richter <edsonrichter@hotmail.com> wrote:

> On 16/06/2012 12:59, hb@101-factory.eu wrote:
>> Thanks. I thought about splitting the file, but that did not work out well.
>>
>> We receive 2 files every 30 seconds and need to import them as fast as possible.
>>
>> We do not currently run Java, but maybe it's an option.
>> Are you willing to share your code?
>>
>> I was also thinking of using Perl for it.
>>
>>
>> Henk
>>
>> On 16 Jun 2012, at 17:37, Edson Richter <edsonrichter@hotmail.com> wrote:
>>
>>> On 16/06/2012 12:04, hb@101-factory.eu wrote:
>>>> Hi there,
>>>>
>>>> I am trying to import large data files into pg.
>>>> For now I have used the Linux xargs command to spawn the file line by line, using the maximum number of available connections.
>>>>
>>>> We use pgpool as the connection pool to the database, to maximize the concurrency of the file import.
>>>>
>>>> The problem is that, although it seems to work well, we miss a line once in a while, and that is not acceptable. It also creates zombie processes ;(.
>>>>
>>>> Does anybody have any other tricks that will do the job?
>>>>
>>>> Thanks,
>>>>
>>>> Henk
>>> I've used a custom Java application with connection pooling (limited to 1000 connections, meaning 1000 concurrent file imports).
>>>
>>> I'm able to import more than 64000 XML files (about 13 KB each) in 5 minutes, without memory leaks or zombies, and (of course) without missing records.
>>>
>>> Besides the case where each thread imports a separate file, I have another situation where separate threads import different lines of the same file. No problems at all. Do not forget to check your OS "open files" limit (it was a big issue for me in the past, due to the Lucene indexes generated during import).
>>>
>>> Server: 8-core Xeon, 16 GB, SAS 15000 rpm disks, PgSQL 9.1.3, Linux CentOS 5, Sun Java 1.6.27.
>>>
>>> Regards,
>>>
>>> Edson Richter
>>>
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
> I'm not allowed to publish my company's code, but the logic is very easy to understand (you will have to "invent" your own solution; the code below is bare-bones):
>
> import java.io.File;
> import java.io.FileFilter;
>
> class MainThread implements Runnable {
>     // volatile so that stopWorker(), called from another thread, is seen by run()
>     private volatile boolean keepRunning = true;
>
>     public void run() {
>         while (keepRunning) {
>             try {
>                 executeFiles();
>                 Thread.sleep(30000); // sleep 30 seconds
>             } catch (InterruptedException ex) {
>                 Thread.currentThread().interrupt(); // restore the interrupt flag and stop
>                 return;
>             } catch (Exception ex) {
>                 ex.printStackTrace();
>             }
>         }
>     }
>
>     private void executeFiles() {
>         File monitorDir = new File("/var/mydatafolder/");
>         File processingDir = new File("/var/myprocessingfolder/");
>
>         // I'll import only files with names like "data20120621.csv":
>         FileFilter fileFilter = new FileFilter() {
>             public boolean accept(File file) {
>                 if (!file.isFile() || file.isHidden()) return false;
>                 String fname = file.getName();
>                 return fname.startsWith("data") && fname.endsWith(".csv");
>             }
>         };
>
>         // listFiles() returns an array (or null if the directory cannot be read), not a List
>         File[] forProcessing = monitorDir.listFiles(fileFilter);
>         if (forProcessing == null) return;
>
>         for (File fileFound : forProcessing) {
>             // FileUtil is a utility class; you will have to create your own. Your move method will vary according to your operating system
>             FileUtil.move(fileFound, processingDir);
>             // ProcessFile is a class that implements Runnable and does your stuff there...
>             Thread t = new Thread(new ProcessFile(processingDir, fileFound.getName()));
>             t.start();
>         }
>     }
>
>     /** Use this method to stop the thread from another place in your complex system! */
>     public void stopWorker() {
>         keepRunning = false;
>     }
>
>     public static void main(String[] args) {
>         Thread t = new Thread(new MainThread());
>         t.start();
>     }
> }
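Edson mentions having separate threads import different lines of the same file, and Henk's original complaint was that a line occasionally went missing. The following is a minimal sketch of that idea, not Edson's code: it splits the lines into fixed-size batches, runs them on a bounded thread pool (standing in for a limited connection pool), and adds up what every batch reports so that a failed batch cannot pass unnoticed. `ParallelLineImport`, `BatchSink` and `importLines` are hypothetical names. In a real importer the sink is assumed to write each batch to PostgreSQL (for example as a JDBC batch insert); that part is omitted so the example runs without a database. It also assumes Java 8 or later for lambdas, which is newer than the Java 1.6 in Edson's setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelLineImport {

    /** Hypothetical stand-in for the per-batch database write (e.g. a JDBC batch insert). */
    public interface BatchSink {
        void write(List<String> batch) throws Exception;
    }

    /**
     * Splits the lines into batches, runs each batch on a bounded thread pool,
     * waits for all of them and returns the total number of lines written.
     * A failure in any batch is rethrown, wrapped in ExecutionException.
     */
    public static long importLines(List<String> lines, int batchSize, int threads, BatchSink sink)
            throws InterruptedException, ExecutionException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads); // at most `threads` concurrent writers
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += batchSize) {
                List<String> batch = lines.subList(start, Math.min(start + batchSize, lines.size()));
                results.add(pool.submit(() -> {
                    sink.write(batch);
                    return batch.size(); // report how many lines this batch handled
                }));
            }
            long written = 0;
            for (Future<Integer> f : results) {
                written += f.get(); // blocks until the batch finishes; rethrows its failure
            }
            return written;
        } finally {
            pool.shutdown(); // accept no new tasks; already-submitted batches still run
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            lines.add("row-" + i);
        }

        AtomicLong received = new AtomicLong(); // what the fake sink actually saw
        long written = importLines(lines, 500, 8, batch -> received.addAndGet(batch.size()));

        // prints: input=10000 written=10000 received=10000
        System.out.println("input=" + lines.size() + " written=" + written + " received=" + received.get());
    }
}
```

Because `Future.get()` rethrows a worker's exception, a failed batch makes `importLines` throw instead of silently dropping lines, and comparing the input count with the written count gives a cheap check that every line was handed to the database.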