JOIN performance + exporting/importing text files (how to do bett er than with \copy?) - Mailing list pgsql-general
From | Paulo Sergio Fernandes Parola (Keyword) |
---|---|
Subject | JOIN performance + exporting/importing text files (how to do bett er than with \copy?) |
Date | |
Msg-id | 743CB750A13DD511A5930001023AA34106328407@ipcsrvmsr10.sp.vespersa.com.br Whole thread Raw |
List | pgsql-general |
Hi, I am running PostgreSQL over RedHat. I have two tables with no primary key (not well behaved data), so that I create an index on both tables on the fields I want to do an outer join. With PostgreSQL I can deal very well with the large amounts of data I am working with (GigaBytes of data with tables with millions of registers). My problems are basically: 1) performance while executing the queries upon these not well behaved data ('wrong' datatypes, non normalized tables, duplicated data which *has* to be imported, duplicates which sometimes *cannot be eliminated* by some business rules thus avoiding the correct setup of primary/unique keys, etc...) 2) The time necessary to build these queries by hand (I have to import from 1 up to 'n' files into some tables, then I perform some transforms into the columns, adjusting data types or format, and at last I perform the joins I need to generate the desired output tables). 3) Lack of flexibility to export/import data just by using command \COPY. My questions: 1) I create a third table as the result of the join between the two tables, but this process takes too long (one table has aprox. 20,000 records and the other one aprox. 600,000 records) - I have not waited for it to finish, but it was already a couple of hours. I performed some manipulation I had to do in the tables inside PostgreSQL and then I exported the data to text delimited files an imported into MS-Access. By performing a query by example I rebuilt the exact join between the tables and..... the results are generated very quickly. Why such joins in Access generate results almost instantly and take so long in PostgreSQL? 2) It would be much nicer if I had instead of static tables as the result of the joins (thus wasting lots of HD space), it would be preferrable to create the join as a view (as a bonus I can always know exactly how the join was performed by looking at the definition of the view). But then every time I try to select some data from the view the query will take endless time to finish as well (as opposed to what is experienced with Access). 3) Do you know if any of the existing front-ends (for either Linux or NT, such as PgAccess or PgAdmin II) would allow me to perform my *OUTER JOINS* in a more visual or easy way? 4) Do you know of any tool which will allow me to substitute \COPY ? E.g. some more powerful 'loader' as there is for Oracle (with which I could do already some data formatting for example)? As an example I need to be able to *import* not only text delimited files, but also 'fixed length' files (each line of the file has the same length, with every field occupying its precise size). Unfortunately this is the format I receive the files, so that I had to take advantage of a little Perl script to insert the delimiters in the files, and then use \COPY. This is unfortunately a time consuming and error-prone procedure (each file will have different configuration of field lengths). I also would like to be able to *export* files in this format. TIA, Paulo
pgsql-general by date: