Parallel Query - Mailing list pgsql-hackers
From:    postgres-001@os10000.net
Subject: Parallel Query
Date:
Msg-id:  20070226184303.24E9B7D59D5C@bces-1600.de
List:    pgsql-hackers
Hello,

I have a single-CPU Athlon at 2.3 GHz. Then I bought a Netfinity 7600 with four 700 MHz Xeons (for EUR 151 on eBay, 2 GB RAM included). I reworked my application to run queries in parallel and got a speedup from 13:37 min (on the AMD) to 4:01 min (on the IBM).

With this email I am suggesting the following change: you may wish to try breaking up "union" operators into separate processes that are executed in parallel. In my application, I have a union of 12 queries. I run the first and simplest one as "create table ... as ..." and then run the remaining ones as "insert into ... select ..." in parallel (a sketch is in the postscript below). I realise that this is harder for you than it is for me, but perhaps you may wish to give it some thought.

I have written a summary of my application for the advocacy group, but will insert it here for your convenience.

Bye,

Oliver

Thank you!

I wish to thank the PostgreSQL developers for their contribution. PostgreSQL is an incredibly powerful system. Please accept the usage report below as a modest contribution by a thankful user. I hope you can use it in your advocacy efforts as a testament to PostgreSQL's prowess.

I am currently using PostgreSQL as the basis of a financial expert system, which I have built over the last 17 years. In 1989 I started a stock market program in AmigaBASIC. In 1990 it was transferred to WordPerfect Corporation's "PlanPerfect" spreadsheet. In 1992 I built an implementation of Ted Codd's "A Relational Model of Data for Large Shared Data Banks", without the ACID properties: basically an in-memory, single-user, single-threaded C++ library for set operations. In 1996 I ported the whole thing to PostgreSQL and make.

The system, called mny-cnv and not publicly released, uses "gnumake" to build several distinct targets:

    verify.txt -- check integrity of financial data
    cae.txt    -- check accounts in EUR currency
    fns.txt    -- compute net worth by virtually liquidating all assets
    irt.txt    -- compute investment rate of return

In the Makefile, each of the above filenames constitutes a target. The file "verify.txt" is a list of "psql" output tables, while "cae", "fns" and "irt" are generated through Perl with a reports library. Each of the above targets has a set of dependencies: files named exactly after tables (with the filename extension ".tab") or after views (with the extension ".vie"). I generate 63 views and tables with a total of 318982 records that way. Clearly, the calls form a hierarchy, the bottom of which consists of base tables. These base tables are generated and populated by a C program whose core is a flex/bison parser that reads text files with many brackets and a bespoke record structure.

Over the last three years, the time to generate "cae.txt" has decreased from 21 to 13 minutes of wall-clock time, simply through improvements in the optimiser. There is hardly any disc activity, since the working set of data is quite small. The C++ program, by comparison, took 34 minutes, was not as sophisticated in its portfolio cost analysis, and computed only a single point in time.

Fundamentally, I use PostgreSQL as a declarative programming language and execution system for set operations. With "fns.txt", I compute, for a set of dates (the 1st of each month over a 20-year period), the net worth of our assets after liquidation, with estimates for current pricing, fees and depreciation for low-liquidity positions like real estate.
I could have reached this further level of abstraction -- computing the figure for different dates -- by running the original C++ program repeatedly (12*20 = 240 times, at 34 minutes each, for a total of 5.6 days). The fact that I wrote a library to implement Codd's operators should tell you something about the complexity of the issue, and about the view I took on how best to reduce that complexity by using higher-level operators. This is not strictly a PostgreSQL benefit, but the optimiser definitely is.

The other results are:

With "verify.txt", I generate a list of money transactions that don't add up. Consider the system a generalised double-entry book-keeping system. If I use the credit card on a business trip to pay for the limousine service, a negative entry is made in the credit-card account, a positive entry is made in the business-expenses account, and a positive entry is made in the credit-organisation account for the foreign-currency fee. These three items are filed together and must sum to zero. Also, I have accounts (about 150 in total) for different purposes: there are DAX index certificates and other types of derivatives, the portfolio at the bank, cash-in and cash-out accounts, life insurance, etc. Most of these will add up to zero after liquidation. Some of them must add up to zero at all times, even without any automatically generated transfers. Think of this file as a report from unit testing, except that the conditions being verified apply to a system of financial transfers; a sketch of such a check is in the postscript below.

"cae.txt" is a set of account statements in one long text file (a 5 MB ASCII report). My complete bank statements are part of this, but so are all manner of other accounts, as hinted above.

"irt.txt" computes the percentage return on all accounts, taking into consideration varying timescales and amounts. The "expert system" component implements methods of liquidation for each account type separately. There are liquidation methods for shares, funds, banking accounts, outstanding bills on credit cards, life insurance and about three other local specialty forms of retirement planning. On the assumption that any investment performing above or below average will return to the centre, bail-out (for massive under-performers) or profit-taking (for over-performers) can be done. To be fair, the actual percentage is worked out with a bisection algorithm written in about 40 lines of Python, but everything else is done in SQL.

Let me stress again that this endeavour would not be feasible without the abstraction of SQL, nor would it finish in any sensible time without the absolutely excellent optimiser of PostgreSQL.

Let me just make the final remark that the euro conversion used strange rounding rules and a very specific accuracy, which would have been a pain to try to do in C++. In PostgreSQL, I use the "numeric" datatype, which effortlessly recreates a perfect match of half a dozen banks' conversion efforts (a sketch is in the postscript). I did not have to perform a single corrective booking to "make things work". In Perl, I must use the "Currency.pm" package, itself based on "BigFloat", to perform reporting and formatting in a satisfactory way.

Thank you very much to the PostgreSQL developers! Keep up the good work!

Oliver
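P.S. To make the union-splitting workaround concrete, here is a minimal sketch. The table and branch queries are invented for the example; each statement after the first would be issued from its own client connection, so the server works on them concurrently. Note that this reproduces "union all" semantics; a plain "union" would need a final duplicate-elimination step.

    -- the first and simplest branch creates the target table
    create table report as
      select acct, amount from branch_01;

    -- remaining branches, each run from a separate psql session in parallel
    insert into report select acct, amount from branch_02;
    insert into report select acct, amount from branch_03;
    -- ... and so on, up to branch_12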
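A similar sketch for the "verify.txt" balance check, assuming a hypothetical "entries" table in which all rows belonging to one transfer share a "transfer_id":

    -- list transfers whose entries do not sum to zero
    select transfer_id, sum(amount) as imbalance
      from entries
     group by transfer_id
    having sum(amount) <> 0;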
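And one for the euro conversion with the "numeric" datatype, assuming a hypothetical table of DEM amounts; the fixed rate of 1.95583 DEM per EUR is the official conversion rate, and the result is rounded to cents:

    -- exact decimal arithmetic; no binary floating-point rounding drift
    select amount_dem,
           round(amount_dem / 1.95583, 2) as amount_eur
      from amounts_dem;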