Thread: New Driver and Unique Indexes
Hello, I have posted a new version of the ODBC driver at our web site. (http://www.insightdist.com/psqlodbc). We are also now including a version number (this one is 06.30.0010). You can click on this link and see what changes this version includes. Also, you can look with the ODBC Administrator under "ODBC Drivers" and get the current version for correspondence and so forth. 1. This new version fixes problems with execution time parameters (SQLParamData, SQLPutData) for text fields where parameters were being dropped and a '?' was appearing in the query. 2. Also, functionality has been added to return information about UNIQUE INDEXES. This was never implemented in the old driver (it assumed Postgres couldn't have any). This should allow Access 2.0 users to be allowed to update records. Also, it should allow Visual Basic to do updates. --------- HACKERS INVITED TO PLEASE READ THIS SECTION --------- One downside about UNIQUE INDEXES however, is how Microsoft Access handles them when you open the table in datasheet view. Whether you specify the unique index at link time, or the driver provides the info, Access will try to use queries which show up a problem with the backend: Here is an example of an Access query with a unique index on a single field: SELECT balance_id,company_id, balance_date, is_audited,comment, balance_type, balance_filename FROM balance WHERE balance_id = 1 OR balance_id = 2 OR balance_id = 3 OR balance_id = 4 OR balance_id = 5 OR balance_id = 6 OR balance_id = 7 OR balance_id = 8 OR balance_id = 9 OR balance_id = 10 The more keyparts you have, the worse the problem is (2 keyparts): SELECT balance_id,company_id, balance_date, is_audited,comment, balance_type, balance_filename FROM balance WHERE balance_id = 1 AND company_id=1 OR balance_id = 1 AND company_id=2 OR balance_id = 1 AND company_id=3 OR balance_id = 2 AND company_id=1 OR balance_id = 2 AND company_id=2 OR balance_id = 2 AND company_id=3 OR balance_id = 3 AND company_id=1 OR balance_id = 3 AND company_id=2 OR balance_id = 3 AND company_id=3 OR balance_id = 4 AND company_id=1 Any more than 2 keyparts, results in crashing the backend with the message "palloc failure: memory exhausted". Even at 2 keyparts, performance suffers greatly. In both of the above examples, Access is trying to retrieve 10 records using a "Prepared" statement (prepared statementents are "faked" in the driver, since they are not implemented in the backend) with the unique index of the table. We have known about this problem and have discussed it with the hackers list in the past. It is on the todo list under "Performance" and it appears as "Allow indexes to be used with OR clauses(Vadim) ". I am not sure of the priority of this fix, however, or how difficult it would be to implement it. The reason we are mentioning this with renewed vigor, is that in the past, with the old driver, Access 7.0 and Access 97, would ask the user what they wanted the unique index to be. You could tell it whatever you wanted, and even, not specify any unique index. Now, with this new unique index fix, you will not have a choice as to whether you want to use unique indexes or not, which, depending on how many fields are being indexed on, may crash the backend. Of course, if you are not using "unique" indexes on your table, Access 7.0 and 97 will ask you at link time, as before. Does anyone have any knowledge of the above problem and/or the priority of the fix that Vadim is mentioned on? Sorry for the long length of this letter. Regards, Byron
Byron Nikolaidis wrote: > > SELECT balance_id,company_id, balance_date, is_audited,comment, > balance_type, balance_filename FROM balance WHERE balance_id = 1 AND > company_id=1 OR balance_id = 1 AND company_id=2 OR balance_id = 1 AND > company_id=3 OR balance_id = 2 AND company_id=1 OR balance_id = 2 AND > company_id=2 OR balance_id = 2 AND company_id=3 OR balance_id = 3 AND > company_id=1 OR balance_id = 3 AND company_id=2 OR balance_id = 3 AND > company_id=3 OR balance_id = 4 AND company_id=1 > > Any more than 2 keyparts, results in crashing the backend with the > message > "palloc failure: memory exhausted". Even at 2 keyparts, performance > suffers greatly. THis is known problem of canonificator in optimizer. This query will crash backend without any indices too. We told about this in the 6.3-beta period. No fix currently. I don't know when it will be at all. Vadim
Sorry for koi-8 charset :) Vadim B. Mikheev wrote: > > Byron Nikolaidis wrote: > > > > SELECT balance_id,company_id, balance_date, is_audited,comment, > > balance_type, balance_filename FROM balance WHERE balance_id = 1 AND > > company_id=1 OR balance_id = 1 AND company_id=2 OR balance_id = 1 AND > > company_id=3 OR balance_id = 2 AND company_id=1 OR balance_id = 2 AND > > company_id=2 OR balance_id = 2 AND company_id=3 OR balance_id = 3 AND > > company_id=1 OR balance_id = 3 AND company_id=2 OR balance_id = 3 AND > > company_id=3 OR balance_id = 4 AND company_id=1 > > > > Any more than 2 keyparts, results in crashing the backend with the > > message > > "palloc failure: memory exhausted". Even at 2 keyparts, performance > > suffers greatly. > > THis is known problem of canonificator in optimizer. This query will > crash backend without any indices too. > We told about this in the 6.3-beta period. > No fix currently. I don't know when it will be at all. > > Vadim
> One downside about UNIQUE INDEXES however, is how Microsoft Access > handles them when you open the table in datasheet view. Whether you > specify the unique index at link time, or the driver provides the info, > Access will try to use queries which show up a problem with the backend: > > Here is an example of an Access query with a unique index on a single > field: > > SELECT balance_id,company_id, balance_date, is_audited,comment, > balance_type, balance_filename FROM balance WHERE balance_id = 1 OR > balance_id = 2 OR balance_id = 3 OR balance_id = 4 OR balance_id = 5 OR > balance_id = 6 OR balance_id = 7 OR balance_id = 8 OR balance_id = 9 OR > balance_id = 10 > > The more keyparts you have, the worse the problem is (2 keyparts): > > SELECT balance_id,company_id, balance_date, is_audited,comment, > balance_type, balance_filename FROM balance WHERE balance_id = 1 AND > company_id=1 OR balance_id = 1 AND company_id=2 OR balance_id = 1 AND > company_id=3 OR balance_id = 2 AND company_id=1 OR balance_id = 2 AND > company_id=2 OR balance_id = 2 AND company_id=3 OR balance_id = 3 AND > company_id=1 OR balance_id = 3 AND company_id=2 OR balance_id = 3 AND > company_id=3 OR balance_id = 4 AND company_id=1 OK, I have the dope on this one. The palloc failure is not the OR indexing, but rather the item: * Fix memory exhaustion when using many OR's The bug report that prompted this is attached. As you can see, it was also prompted by MS-Access. The problem is that the backend uses the text-book method of processing OR's by converting the WHERE clause to Conjunctive-Normal-Form(CNF), and this exponentially explodes the number of tests where there are many OR clauses. We are not sure how to fix it yet. Vadim has improved the handling of this in 6.3.*, but it still is not perfect and needs a solution. Obviously other databases are not CNF'ifing the queries so there must be a solution. David? --------------------------------------------------------------------------- Date: Mon, 12 Jan 1998 15:53:18 -0500 From: David Hartwig <daveh@insightdist.com> To: Bruce Momjian <maillist@candle.pha.pa.us> Subject: Re: [BUGS] General Bug Report: palloc fails with lots of ANDs and ORs This is a multi-part message in MIME format. --------------20C7AC27E8BCA117B23354BE Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Bruce, I did some homework. Here is what I have. The default max data segment size on our (AIX 4.1.4) box is around 130000 kbytes. I put together a query which put me just past the threshold of the palloc "out of memory error". It is as follows: create table outlet ( number int, name varchar(30), ... } create unique index outlet_key on outlet using btree (number); select count(*) from outlet where (number = 1 and number = 1 and number = 1) or (number = 1 and number = 1 and number = 1) or (number = 1 and number = 1 and number = 1) or (number = 1 and number = 1 and number = 1) or (number = 1 and number = 1 and number = 1) or (number = 1 and number = 1 and number = 1) or (number = 1 and number = 1 and number = 1) or (number = 1 and number = 1 and number = 1) or (number = 1 and number = 1 and number = 1); Not pretty but it makes the point. Take out two OR clauses and the query works fine (but a bit slow). The above query is all it takes to use up all 130000 Kbytes of memory. And, since the query takes a long time to finallyfail, I was able to observe the memory consumption. I extended the max data segment to 300000. And tried again. I could observer the memory consumption up to about 280000when the system suddenly got sick. I was getting all kinds of messages like "cant fork"; bad stuff. The system did finally recover onits own. I am not sure happened there. I know that ulimit puts us right around the physical memory limits of out system. Using 300 meg for the above query seems like a bit of a problem. It is difficult to imagine where all that memory is beingused. I will research the problem further if you need more information. Bruce Momjian wrote: > Try changing your OS default memory size. Unsure how to do this under > AIX. > > > > > > > ============================================================================ > > POSTGRESQL BUG REPORT TEMPLATE > > ============================================================================ > > > > > > Your name : David Hartwig > > Your email address : daveh@insightdist.com > > > > Category : runtime: back-end: SQL > > Severity : serious > > > > Summary: palloc fails with lots of ANDs and ORs > > > > System Configuration > > -------------------- > > Operating System : AIX 4.1 > > > > PostgreSQL version : 6.2 > > > > Compiler used : native CC > > > > Hardware: > > --------- > > RS 6000 > > > > Versions of other tools: > > ------------------------ > > NA > > > > -------------------------------------------------------------------------- > > > > Problem Description: > > -------------------- > > The follow is a mail message describing the problem on the PostODBC mailing list: > > > > > > I have run across this also. We traced it down to a failure in the PostgreSQL server. This occurs under the followingconditions. > > > > 1. MS Access > > 2. Specify a multi-part key in the link time setup with postgresql > > 3. Click on table view. > > > > What happens is MS Access takes the following steps. First it selects all possible key values for the table being viewed. I > > suspect it maps the key values to the relative row position in the display. Then it uses the mapping to generate futurequeries based > > on the mapping and the rows showing on the screen. The queries take the following form: > > > > SELECT keypart1, keypart2, keypart3, col4, col5, col6 ... FROM example_table > > WHERE > > (keypart1 = row1keypartval1 AND keypart2 = row1keypartval2 AND keypart3 = row1keypartval3) OR > > (keypart1 = row2keypartval1 AND keypart2 = row2keypartval2 AND keypart3 = row2keypartval3) OR > > . > > . -- 28 lines of this stuff. Why 28... Why not 28 > > . > > (keypart1 = row27keypartval1 AND keypart2 = row27keypartval2 AND keypart3 = row27keypartval3) OR > > (keypart1 = row28keypartval1 AND keypart2 = row28keypartval2 AND keypart3 = row28keypartval3); > > > > > > The PostgreSQL sever chokes on this statement claiming it is out of memory. (palloc) In this example I used a threepart key. I > > do not recall if a three part key is enough to trash the backend. It has been a while. I have tried sending thesekinds of statements > > directly through the psql monitor and get the same result. > > > > > > -------------------------------------------------------------------------- > > > > Test Case: > > ---------- > > select c1, c1 c3, c4, c5 ... from example_table > > where > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something) or > > (c1 = something and c2 = something and c3 = something and c4 = something); > > > > > > -------------------------------------------------------------------------- > > > > Solution: > > --------- > > > > > > -------------------------------------------------------------------------- > > > > > > > > -- > Bruce Momjian > maillist@candle.pha.pa.us --------------20C7AC27E8BCA117B23354BE Content-Type: text/x-vcard; charset=us-ascii; name="vcard.vcf" Content-Transfer-Encoding: 7bit Content-Description: Card for David Hartwig Content-Disposition: attachment; filename="vcard.vcf" begin: vcard fn: David Hartwig n: Hartwig;David org: Insight Distribution Systems adr: 222 Shilling Circle;;;Hunt Valley ;MD;21030;USA email;internet: daveh@insightdist.com title: Manager Research & Development tel;work: (410)403-2308 x-mozilla-cpt: ;0 x-mozilla-html: TRUE version: 2.1 end: vcard --------------20C7AC27E8BCA117B23354BE-- -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
Byron Nikolaidis wrote: > > Hello, > > I have posted a new version of the ODBC driver at our web site. > (http://www.insightdist.com/psqlodbc). We are also now including a > version number (this one is 06.30.0010). You can click on this link and > see what changes this version includes. Also, you can look with the > ODBC Administrator under "ODBC Drivers" and get the current version for > correspondence and so forth. > > 1. This new version fixes problems with execution time parameters > (SQLParamData, SQLPutData) for text fields where parameters were being > dropped and a '?' was appearing in the query. Good! And thanks ;)! > > --------- HACKERS INVITED TO PLEASE READ THIS SECTION --------- > > One downside about UNIQUE INDEXES however, is how Microsoft Access > handles them when you open the table in datasheet view. Whether you > specify the unique index at link time, or the driver provides the info, > Access will try to use queries which show up a problem with the backend: > > Here is an example of an Access query with a unique index on a single > field: > > SELECT balance_id,company_id, balance_date, is_audited,comment, > balance_type, balance_filename FROM balance WHERE balance_id = 1 OR > balance_id = 2 OR balance_id = 3 OR balance_id = 4 OR balance_id = 5 OR > balance_id = 6 OR balance_id = 7 OR balance_id = 8 OR balance_id = 9 OR > balance_id = 10 > > The more keyparts you have, the worse the problem is (2 keyparts): > > SELECT balance_id,company_id, balance_date, is_audited,comment, > balance_type, balance_filename FROM balance WHERE balance_id = 1 AND > company_id=1 OR balance_id = 1 AND company_id=2 OR balance_id = 1 AND > company_id=3 OR balance_id = 2 AND company_id=1 OR balance_id = 2 AND > company_id=2 OR balance_id = 2 AND company_id=3 OR balance_id = 3 AND > company_id=1 OR balance_id = 3 AND company_id=2 OR balance_id = 3 AND > company_id=3 OR balance_id = 4 AND company_id=1 as a quick (?) fix, can't this kind of query be identified in the driver (now) or in the backend(later) and rewritten to a union query like this SELECT balance_id,company_id, balance_date, is_audited,comment, balance_type, balance_filename FROM balance WHERE balance_id = 1 AND company_id=1 union SELECT balance_id,company_id, balance_date, is_audited,comment, balance_type, balance_filename FROM balance WHERE balance_id = 1 AND company_id=2 union . . . union SELECT balance_id,company_id, balance_date, is_audited,comment, balance_type, balance_filename FROM balance WHERE balance_id = 4 AND company_id=1 ; Or is the optimiser too smart and rewrites it back to the original form ? once the identification phase is done in the backend, it should be easy to check that all the fields ORed together are from an unique index and do an index scan instead of a rewrite to union. > Any more than 2 keyparts, results in crashing the backend with the > message "palloc failure: memory exhausted". Even at 2 keyparts, performance > suffers greatly. Actually it did not crash on me even on 3 keyparts, the backend just grew to 97MB and stayed so until I closed access ;(. Once had to kill both access and backend, but then I had been careless and viewed two tables with a primary key of more than 1 field ;) > In both of the above examples, Access is trying to retrieve 10 records > using a "Prepared" statement (prepared statementents are "faked" in the > driver, since they are not implemented in the backend) with the unique > index of the table. perhaps the rewriting of ORs to UNION could be done while "Preparing". The heuristics would be just to check if the where clause has altenating ANDs and ORs and then split and rewrite it to union at each OR. This of course can hit the infamous 8k limitation of query size (is it still there ?) > > The reason we are mentioning this with renewed vigor, is that in the > past, with the old driver, Access 7.0 and Access 97, would ask the user > what they wanted the unique index to be. You could tell it whatever you > wanted, and even, not specify any unique index. Now, with this new > unique index fix, you will not have a choice as to whether you want to > use unique indexes or not, which, depending on how many fields are being > indexed on, may crash the backend. As a temporary fix, you could just return the unique indexes of one field only. You cold easyly remove the check later when backend gets fixed. or the ones with specific naming, for example ending in *_mspkey ? > > Sorry for the long length of this letter. > Until this is fixed it should go in some readme with BIG BOLD LETTERS ;) Hannu