Implementing and Experimenting with a Full Disjunctions Operator. - Mailing list pgsql-hackers
From | Tzahi Fadida |
---|---|
Subject | Implementing and Experimenting with a Full Disjunctions Operator. |
Date | |
Msg-id | 002101c4f1d1$b0d27d20$0b00a8c0@llord Whole thread Raw |
List | pgsql-hackers |
Hi All, As part of my thesis I need to implement a new algorithm for Full-Disjunctions and 2 older ones. A short explanation of what Full-Disjunction is, is that it comes to solve what A natural outer join usually can't do when you have more than 2 relations. The main goal is to retrieve maximal answers from a set of relations. The natural representation of the operator that is usually depicted in the literature is FD(r1,...,rN). (r=relation/table) Ullman's algorithm uses several natural outer joins so there is no problem there but our algorithm must be run internally at the server since it uses no existing operators but it is also not limited to the gamma-acyclic restriction of Ullman's algorithm. I have already read most of the development documentations, faqs, presentations, listened on this mailing list, etc... I already compiled a dynamically loaded library with a function and ran it successfully. The research part of implementing the algorithm is theoretical and experimentation. After looking around in the code and seeing how the SPI generally works I have several concerns (the first one is the most crucial to me): 1) As part of the experimentation I need to know exactly how many blocks have been read when the algorithm ran. I need complete control over the process to run my simulations. I see that there are functions like heap heap_getnext() heap_fetch() SearchSysCache(). Our algorithm usually read sequentially the relations and I don't see how to read complete blocks and count these blocks. In addition temporary queues that must be held in memory will be needed to be dumped to disk at various times (because of their size) and fetched. Is there a way to control this process with accuracy and calculate the exact disk writes? 2) As part of the theoretical work and experimentation we want to load blocks of relation rows to the main memory and cache them using our techniques. Is there a way to control the memory blocks so they won't be swapped. In addition, is there a way to get a specific size of memory so we can plan our operator running path. I see that palloc return's to me a chunck of memory but I don't know in advance how much is available to me (aside from polling for it). 3) When outputting the results as a set of records, I cannot know in advance the type of temporary table that will come out just like a subquery like (select * from relationA,relationB); Is there a problem outputting this kind of table? 4) When inputting the various tables to the operator I understand that the function is limited to a fixed number of arguments. Is there a way to circumvent that or would I need to use an ugly ARRAY construct. Obviously there are better ways than a dynamically loaded library function, so after we study the algorithm I don't think there should be any problem integrating it to postgreSQL, of course if it will be good enough :) Thank you. Regards,tzahi. * - * - * Itzhak Fadida MSc Student Information System Engineering Area Faculty of Industrial Engineering & Management Technion - Israel Institute of Technology Technion City, Haifa, Israel 32000 Technion Email: Tzahi@TX.Technion.ac.il Alternative Email: TzahiFadida@MyRealBox.com * - * - * - * - * - * - * - * - * - * - * - * - * - * - * WARNING TO SPAMMERS: see at http://members.lycos.co.uk/my2nis/spamwarning.html
pgsql-hackers by date: