Thread: New Driver and Unique Indexes

New Driver and Unique Indexes

From
Byron Nikolaidis
Date:
Hello,

I have posted a new version of the ODBC driver at our web site.
(http://www.insightdist.com/psqlodbc).  We are also now including a
version number (this one is 06.30.0010).  You can click on this link and
see what changes this version includes.  Also, you can look with the
ODBC Administrator under "ODBC Drivers" and get the current version for
correspondence and so forth.

1. This new version fixes problems with execution time parameters
(SQLParamData, SQLPutData) for text fields where parameters were being
dropped and a '?' was appearing in the query.

2. Also, functionality has been added to return information about UNIQUE
INDEXES. This was never implemented in the old driver (it assumed
Postgres couldn't have any).    This should allow Access 2.0 users to be
allowed to update records.  Also, it should allow Visual Basic to do
updates.

---------  HACKERS INVITED TO PLEASE READ THIS SECTION ---------

One downside about UNIQUE INDEXES however, is how Microsoft Access
handles them when you open the table in datasheet view.  Whether you
specify the unique index at link time, or the driver provides the info,
Access will try to use queries which show up a problem with the backend:

Here is an example of an Access query with a unique index on a single
field:

SELECT balance_id,company_id, balance_date, is_audited,comment,
balance_type, balance_filename  FROM balance  WHERE balance_id = 1 OR
balance_id = 2 OR balance_id = 3 OR balance_id = 4 OR balance_id = 5 OR
balance_id = 6 OR balance_id = 7 OR balance_id = 8 OR balance_id = 9 OR
balance_id = 10

The more keyparts you have, the worse the problem is (2 keyparts):

SELECT balance_id,company_id, balance_date, is_audited,comment,
balance_type, balance_filename  FROM balance  WHERE balance_id = 1 AND
company_id=1 OR balance_id = 1 AND company_id=2 OR balance_id = 1 AND
company_id=3 OR balance_id = 2 AND company_id=1 OR balance_id = 2 AND
company_id=2 OR balance_id = 2 AND company_id=3 OR balance_id = 3 AND
company_id=1 OR balance_id = 3 AND company_id=2 OR balance_id = 3 AND
company_id=3 OR balance_id = 4 AND company_id=1

Any more than 2 keyparts, results in crashing the backend with the
message
"palloc failure:  memory exhausted".  Even at 2 keyparts, performance
suffers greatly.

In both of the above examples, Access is trying to retrieve 10 records
using a "Prepared" statement  (prepared statementents are "faked" in the
driver, since they are not implemented in the backend)  with the unique
index of the table.

We have known about this problem and have discussed it with the hackers
list in the past.  It is on the todo list under "Performance" and it
appears as
"Allow indexes to be used with OR clauses(Vadim) ".  I am not sure of
the priority of this fix, however, or how difficult it would be to
implement it.

The reason we are mentioning this with renewed vigor, is that in the
past, with the old driver, Access 7.0 and Access 97, would ask the user
what they wanted the unique index to be.  You could tell it whatever you
wanted, and even, not specify any unique index.  Now, with this new
unique index fix, you will not have a choice as to whether you want to
use unique indexes or not, which, depending on how many fields are being
indexed on, may crash the backend.

Of course, if you are not using "unique" indexes on your table, Access
7.0 and 97 will ask you at link time, as before.

Does anyone have any knowledge of the above problem and/or the priority
of the fix that Vadim is mentioned on?

Sorry for the long length of this letter.

Regards,

Byron


Re: [HACKERS] New Driver and Unique Indexes

From
"Vadim B. Mikheev"
Date:
Byron Nikolaidis wrote:
>
> SELECT balance_id,company_id, balance_date, is_audited,comment,
> balance_type, balance_filename  FROM balance  WHERE balance_id = 1 AND
> company_id=1 OR balance_id = 1 AND company_id=2 OR balance_id = 1 AND
> company_id=3 OR balance_id = 2 AND company_id=1 OR balance_id = 2 AND
> company_id=2 OR balance_id = 2 AND company_id=3 OR balance_id = 3 AND
> company_id=1 OR balance_id = 3 AND company_id=2 OR balance_id = 3 AND
> company_id=3 OR balance_id = 4 AND company_id=1
>
> Any more than 2 keyparts, results in crashing the backend with the
> message
> "palloc failure:  memory exhausted".  Even at 2 keyparts, performance
> suffers greatly.

THis is known problem of canonificator in optimizer. This query will
crash backend without any indices too.
We told about this in the 6.3-beta period.
No fix currently. I don't know when it will be at all.

Vadim

Re: [INTERFACES] Re: [HACKERS] New Driver and Unique Indexes

From
"Vadim B. Mikheev"
Date:
Sorry for koi-8 charset :)

Vadim B. Mikheev wrote:
>
> Byron Nikolaidis wrote:
> >
> > SELECT balance_id,company_id, balance_date, is_audited,comment,
> > balance_type, balance_filename  FROM balance  WHERE balance_id = 1 AND
> > company_id=1 OR balance_id = 1 AND company_id=2 OR balance_id = 1 AND
> > company_id=3 OR balance_id = 2 AND company_id=1 OR balance_id = 2 AND
> > company_id=2 OR balance_id = 2 AND company_id=3 OR balance_id = 3 AND
> > company_id=1 OR balance_id = 3 AND company_id=2 OR balance_id = 3 AND
> > company_id=3 OR balance_id = 4 AND company_id=1
> >
> > Any more than 2 keyparts, results in crashing the backend with the
> > message
> > "palloc failure:  memory exhausted".  Even at 2 keyparts, performance
> > suffers greatly.
>
> THis is known problem of canonificator in optimizer. This query will
> crash backend without any indices too.
> We told about this in the 6.3-beta period.
> No fix currently. I don't know when it will be at all.
>
> Vadim

Re: [HACKERS] New Driver and Unique Indexes

From
Bruce Momjian
Date:
> One downside about UNIQUE INDEXES however, is how Microsoft Access
> handles them when you open the table in datasheet view.  Whether you
> specify the unique index at link time, or the driver provides the info,
> Access will try to use queries which show up a problem with the backend:
>
> Here is an example of an Access query with a unique index on a single
> field:
>
> SELECT balance_id,company_id, balance_date, is_audited,comment,
> balance_type, balance_filename  FROM balance  WHERE balance_id = 1 OR
> balance_id = 2 OR balance_id = 3 OR balance_id = 4 OR balance_id = 5 OR
> balance_id = 6 OR balance_id = 7 OR balance_id = 8 OR balance_id = 9 OR
> balance_id = 10
>
> The more keyparts you have, the worse the problem is (2 keyparts):
>
> SELECT balance_id,company_id, balance_date, is_audited,comment,
> balance_type, balance_filename  FROM balance  WHERE balance_id = 1 AND
> company_id=1 OR balance_id = 1 AND company_id=2 OR balance_id = 1 AND
> company_id=3 OR balance_id = 2 AND company_id=1 OR balance_id = 2 AND
> company_id=2 OR balance_id = 2 AND company_id=3 OR balance_id = 3 AND
> company_id=1 OR balance_id = 3 AND company_id=2 OR balance_id = 3 AND
> company_id=3 OR balance_id = 4 AND company_id=1

OK, I have the dope on this one.  The palloc failure is not the OR
indexing, but rather the item:

    * Fix memory exhaustion when using many OR's

The bug report that prompted this is attached.  As you can see, it was
also prompted by MS-Access.  The problem is that the backend uses the
text-book method of processing OR's by converting the WHERE clause to
Conjunctive-Normal-Form(CNF), and this exponentially explodes the number
of tests where there are many OR clauses.

We are not sure how to fix it yet.  Vadim has improved the handling of
this in 6.3.*, but it still is not perfect and needs a solution.
Obviously other databases are not CNF'ifing the queries so there must be
a solution.  David?

---------------------------------------------------------------------------

Date: Mon, 12 Jan 1998 15:53:18 -0500
From: David Hartwig <daveh@insightdist.com>
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: [BUGS] General Bug Report: palloc fails with lots of ANDs and ORs

This is a multi-part message in MIME format.
--------------20C7AC27E8BCA117B23354BE
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Bruce,

I did some homework.   Here is what I have.   The default max data segment size on our (AIX 4.1.4) box is around 130000
kbytes.

I put together a query which put me just past the threshold of the palloc "out of memory error".  It is as follows:

create table outlet (
    number int,
    name varchar(30),
    ...
}

create unique index outlet_key on outlet using btree (number);

select count(*) from outlet
where
    (number = 1 and number = 1 and number = 1) or
    (number = 1 and number = 1 and number = 1) or
    (number = 1 and number = 1 and number = 1) or
    (number = 1 and number = 1 and number = 1) or
    (number = 1 and number = 1 and number = 1) or
    (number = 1 and number = 1 and number = 1) or
    (number = 1 and number = 1 and number = 1) or
    (number = 1 and number = 1 and number = 1) or
    (number = 1 and number = 1 and number = 1);

Not pretty but it makes the point.   Take out two OR clauses and the query works fine (but a bit slow).

The above query is all it takes to use up all 130000 Kbytes of memory.    And, since the query takes a long time to
finallyfail, I was able to 
observe the memory consumption.

I extended the max data segment to 300000.  And tried again.   I could observer the memory consumption up to about
280000when the system 
suddenly got sick.   I was getting all kinds of messages like "cant fork"; bad stuff.  The system did finally recover
onits own.   I am not 
sure happened there.   I know that ulimit puts us right around the physical memory limits of out system.

Using 300 meg for the above query seems like a bit of a problem.   It is difficult to imagine where all that memory is
beingused.   I will 
research the problem further if you need more information.

Bruce Momjian wrote:

> Try changing your OS default memory size.  Unsure how to do this under
> AIX.
>
> >
> >
> > ============================================================================
> >                         POSTGRESQL BUG REPORT TEMPLATE
> > ============================================================================
> >
> >
> > Your name             : David Hartwig
> > Your email address    : daveh@insightdist.com
> >
> > Category              : runtime: back-end: SQL
> > Severity              : serious
> >
> > Summary: palloc fails with lots of ANDs and ORs
> >
> > System Configuration
> > --------------------
> >   Operating System   : AIX 4.1
> >
> >   PostgreSQL version : 6.2
> >
> >   Compiler used      : native CC
> >
> > Hardware:
> > ---------
> > RS 6000
> >
> > Versions of other tools:
> > ------------------------
> > NA
> >
> > --------------------------------------------------------------------------
> >
> > Problem Description:
> > --------------------
> > The follow is a mail message describing the problem on the PostODBC mailing list:
> >
> >
> > I have run across this also.   We traced it down to a failure in the PostgreSQL server.   This occurs under the
followingconditions. 
> >
> >     1.  MS Access
> >     2.  Specify a multi-part key in the link time setup with postgresql
> >     3.  Click on table view.
> >
> > What happens is MS Access takes the following steps.   First it selects all possible key values for the table being
viewed.  I 
> > suspect it maps the key values to the relative row position in the display.   Then it uses the mapping to generate
futurequeries based 
> > on the mapping and the rows showing on the screen.   The queries take the following form:
> >
> >     SELECT  keypart1, keypart2, keypart3, col4, col5, col6 ... FROM example_table
> >     WHERE
> >         (keypart1 = row1keypartval1 AND  keypart2 = row1keypartval2 AND  keypart3 = row1keypartval3) OR
> >         (keypart1 = row2keypartval1 AND  keypart2 = row2keypartval2 AND  keypart3 = row2keypartval3) OR
> >             .
> >             .      --  28 lines of this stuff.   Why 28... Why not 28
> >             .
> >         (keypart1 = row27keypartval1 AND  keypart2 = row27keypartval2 AND  keypart3 = row27keypartval3) OR
> >         (keypart1 = row28keypartval1 AND  keypart2 = row28keypartval2 AND  keypart3 = row28keypartval3);
> >
> >
> > The PostgreSQL sever chokes on this statement claiming it is out of memory.  (palloc)  In this example I used a
threepart key.  I 
> > do not recall if a three part key is enough to trash the backend.  It has been a while.    I have tried sending
thesekinds of statements 
> > directly through the psql monitor and get the same result.
> >
> >
> > --------------------------------------------------------------------------
> >
> > Test Case:
> > ----------
> > select c1, c1 c3, c4, c5 ... from example_table
> > where
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something);
> >
> >
> > --------------------------------------------------------------------------
> >
> > Solution:
> > ---------
> >
> >
> > --------------------------------------------------------------------------
> >
> >
> >
>
> --
> Bruce Momjian
> maillist@candle.pha.pa.us



--------------20C7AC27E8BCA117B23354BE
Content-Type: text/x-vcard; charset=us-ascii; name="vcard.vcf"
Content-Transfer-Encoding: 7bit
Content-Description: Card for David Hartwig
Content-Disposition: attachment; filename="vcard.vcf"

begin:          vcard
fn:             David Hartwig
n:              Hartwig;David
org:            Insight Distribution Systems
adr:            222 Shilling Circle;;;Hunt Valley ;MD;21030;USA
email;internet: daveh@insightdist.com
title:          Manager Research & Development
tel;work:       (410)403-2308
x-mozilla-cpt:  ;0
x-mozilla-html: TRUE
version:        2.1
end:            vcard


--------------20C7AC27E8BCA117B23354BE--


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: New Driver and Unique Indexes

From
Hannu Krosing
Date:
Byron Nikolaidis wrote:
>
> Hello,
>
> I have posted a new version of the ODBC driver at our web site.
> (http://www.insightdist.com/psqlodbc).  We are also now including a
> version number (this one is 06.30.0010).  You can click on this link and
> see what changes this version includes.  Also, you can look with the
> ODBC Administrator under "ODBC Drivers" and get the current version for
> correspondence and so forth.
>
> 1. This new version fixes problems with execution time parameters
> (SQLParamData, SQLPutData) for text fields where parameters were being
> dropped and a '?' was appearing in the query.

Good! And thanks ;)!

>
> ---------  HACKERS INVITED TO PLEASE READ THIS SECTION ---------
>
> One downside about UNIQUE INDEXES however, is how Microsoft Access
> handles them when you open the table in datasheet view.  Whether you
> specify the unique index at link time, or the driver provides the info,
> Access will try to use queries which show up a problem with the backend:
>
> Here is an example of an Access query with a unique index on a single
> field:
>
> SELECT balance_id,company_id, balance_date, is_audited,comment,
> balance_type, balance_filename  FROM balance  WHERE balance_id = 1 OR
> balance_id = 2 OR balance_id = 3 OR balance_id = 4 OR balance_id = 5 OR
> balance_id = 6 OR balance_id = 7 OR balance_id = 8 OR balance_id = 9 OR
> balance_id = 10
>
> The more keyparts you have, the worse the problem is (2 keyparts):
>
> SELECT balance_id,company_id, balance_date, is_audited,comment,
> balance_type, balance_filename  FROM balance  WHERE balance_id = 1 AND
> company_id=1 OR balance_id = 1 AND company_id=2 OR balance_id = 1 AND
> company_id=3 OR balance_id = 2 AND company_id=1 OR balance_id = 2 AND
> company_id=2 OR balance_id = 2 AND company_id=3 OR balance_id = 3 AND
> company_id=1 OR balance_id = 3 AND company_id=2 OR balance_id = 3 AND
> company_id=3 OR balance_id = 4 AND company_id=1

as a quick (?) fix, can't this kind of query be identified in the driver
(now)
or in the backend(later) and rewritten to a union query like this

SELECT balance_id,company_id, balance_date, is_audited,comment,
balance_type, balance_filename  FROM balance
WHERE balance_id = 1 AND company_id=1
union
SELECT balance_id,company_id, balance_date, is_audited,comment,
balance_type, balance_filename  FROM balance
WHERE balance_id = 1 AND company_id=2
union
.
.
.
union
SELECT balance_id,company_id, balance_date, is_audited,comment,
balance_type, balance_filename  FROM balance
WHERE balance_id = 4 AND company_id=1
;

Or is the optimiser too smart and rewrites it back to the original form
?

once the identification phase is done in the backend, it should be
easy to check that all the fields ORed together are from an unique
index and do an index scan instead of a rewrite to union.

> Any more than 2 keyparts, results in crashing the backend with the
> message "palloc failure:  memory exhausted".  Even at 2 keyparts, performance
> suffers greatly.

Actually it did not crash on me even on 3 keyparts, the backend just
grew to 97MB and stayed so until I closed access ;(.

Once  had to kill both access and backend, but then I had been
careless and viewed two tables with a primary key of more
than 1 field ;)

> In both of the above examples, Access is trying to retrieve 10 records
> using a "Prepared" statement  (prepared statementents are "faked" in the
> driver, since they are not implemented in the backend)  with the unique
> index of the table.

perhaps the rewriting of ORs to UNION could be done while "Preparing".

The heuristics would be just to check if the where clause has
altenating ANDs and ORs and then split and rewrite it to union at each
OR.

This of course can hit the infamous 8k limitation of query size
(is it still there ?)

>
> The reason we are mentioning this with renewed vigor, is that in the
> past, with the old driver, Access 7.0 and Access 97, would ask the user
> what they wanted the unique index to be.  You could tell it whatever you
> wanted, and even, not specify any unique index.  Now, with this new
> unique index fix, you will not have a choice as to whether you want to
> use unique indexes or not, which, depending on how many fields are being
> indexed on, may crash the backend.

As a temporary fix, you could just return the unique indexes of one
field only. You cold easyly remove the check later when backend gets
fixed.

or the ones with specific naming, for example ending in *_mspkey ?

>
> Sorry for the long length of this letter.
>

Until this is fixed it should go in some readme with BIG BOLD LETTERS ;)

Hannu