Thread: index unique

index unique

From
Marc Millas
Date:
Hi,
postgres 12 with postgis.
on a table we need a primary key and to get a unique combinaison, we need 3 columns of that table:
1 of type integer,
1 of type text,
1 of type geometry

creating the PK constraint doesn work: (even with our current small data set)
ERROR:  index row size 6072 exceeds btree version 4 maximum 2704 for index "xxx_spkey"
DETAIL:  Index row references tuple (32,1) in relation "xxx".
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.


ok. we can do this.
but if so, we need to create a gist index on the geometry column to do any topology request.
so 2 indexes containing this single column.

if we install extension btree_gist, no pb to create an index on all 3 columns.
but as gist does not support unicity, this index cannot be used for the PK.

OK, we may try to use a function to get the bounding box around the geometry objects and use the result into a btree index........

Any idea (I mean: another idea !) to tackle this ?
Or any critic on the "solution" ??

thanks,


Marc MILLAS
Senior Architect
+33607850334

Re: index unique

From
Paul Ramsey
Date:
Primary key is going to be a BTREE index. I'm surprised you require the geometry in order to achieve uniqueness?
You can't put the geometry into a BTREE because it's too large.
You could add a column and stick the MD5 hash of the geometry there, and use that as the last piece of uniqueness?
If the bbox of the geometry is "good enough" for your key, that kind of speaks to the idea that maybe your geometry
doesn'tneed to be part of the PK? It's possible for different geometries to have the same bbox... 
ATB,
P

> On Jun 3, 2021, at 1:51 PM, Marc Millas <marc.millas@mokadb.com> wrote:
>
> Hi,
> postgres 12 with postgis.
> on a table we need a primary key and to get a unique combinaison, we need 3 columns of that table:
> 1 of type integer,
> 1 of type text,
> 1 of type geometry
>
> creating the PK constraint doesn work: (even with our current small data set)
> ERROR:  index row size 6072 exceeds btree version 4 maximum 2704 for index "xxx_spkey"
> DETAIL:  Index row references tuple (32,1) in relation "xxx".
> HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
> Consider a function index of an MD5 hash of the value, or use full text indexing.
>
> ok. we can do this.
> but if so, we need to create a gist index on the geometry column to do any topology request.
> so 2 indexes containing this single column.
>
> if we install extension btree_gist, no pb to create an index on all 3 columns.
> but as gist does not support unicity, this index cannot be used for the PK.
>
> OK, we may try to use a function to get the bounding box around the geometry objects and use the result into a btree
index........
>
> Any idea (I mean: another idea !) to tackle this ?
> Or any critic on the "solution" ??
>
> thanks,
>
>
> Marc MILLAS
> Senior Architect
> +33607850334
> www.mokadb.com
>




Re: index unique

From
Marc Millas
Date:
Hi Paul,

obviously the bounding box is not a perfect choice. we also think of one of the "centers" point, but quite similar non uniqueness.

so, if no "tech" solution, we continue to work with the business to try to find an appropriate PK

thanks

Marc MILLAS
Senior Architect
+33607850334



On Thu, Jun 3, 2021 at 10:57 PM Paul Ramsey <pramsey@cleverelephant.ca> wrote:
Primary key is going to be a BTREE index. I'm surprised you require the geometry in order to achieve uniqueness?
You can't put the geometry into a BTREE because it's too large.
You could add a column and stick the MD5 hash of the geometry there, and use that as the last piece of uniqueness?
If the bbox of the geometry is "good enough" for your key, that kind of speaks to the idea that maybe your geometry doesn't need to be part of the PK? It's possible for different geometries to have the same bbox...
ATB,
P

> On Jun 3, 2021, at 1:51 PM, Marc Millas <marc.millas@mokadb.com> wrote:
>
> Hi,
> postgres 12 with postgis.
> on a table we need a primary key and to get a unique combinaison, we need 3 columns of that table:
> 1 of type integer,
> 1 of type text,
> 1 of type geometry
>
> creating the PK constraint doesn work: (even with our current small data set)
> ERROR:  index row size 6072 exceeds btree version 4 maximum 2704 for index "xxx_spkey"
> DETAIL:  Index row references tuple (32,1) in relation "xxx".
> HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
> Consider a function index of an MD5 hash of the value, or use full text indexing.
>
> ok. we can do this.
> but if so, we need to create a gist index on the geometry column to do any topology request.
> so 2 indexes containing this single column.
>
> if we install extension btree_gist, no pb to create an index on all 3 columns.
> but as gist does not support unicity, this index cannot be used for the PK.
>
> OK, we may try to use a function to get the bounding box around the geometry objects and use the result into a btree index........
>
> Any idea (I mean: another idea !) to tackle this ?
> Or any critic on the "solution" ??
>
> thanks,
>
>
> Marc MILLAS
> Senior Architect
> +33607850334
> www.mokadb.com
>

Re: index unique

From
"Peter J. Holzer"
Date:
On 2021-06-03 22:51:55 +0200, Marc Millas wrote:
> postgres 12 with postgis.
> on a table we need a primary key and to get a unique combinaison, we need 3
> columns of that table:
> 1 of type integer,
> 1 of type text,
> 1 of type geometry
>
> creating the PK constraint doesn work:

I find that if a natural primary key candidate is so complex, it is
usually better to use a surrogate key.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

Re: index unique

From
"David G. Johnston"
Date:
On Sun, Jun 6, 2021 at 11:55 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2021-06-03 22:51:55 +0200, Marc Millas wrote:
> postgres 12 with postgis.
> on a table we need a primary key and to get a unique combinaison, we need 3
> columns of that table:
> 1 of type integer,
> 1 of type text,
> 1 of type geometry
>
> creating the PK constraint doesn work:

I find that if a natural primary key candidate is so complex, it is
usually better to use a surrogate key.

You make this sound like an either-or proposition, but personally it takes a very exceptional circumstance to forgo defining a unique natural key.  Whether I choose to supplement that with a surrogate key is a different matter altogether.

In this case identity would seem to make more sense using labels, not composition.  For instance, on a substance table I would have the identifier for water be "H20", and have the chemical composition of water be "H20" (not the best example...).  In this case ensuring uniqueness of the formula-to-label dependency would be trivial to implement and I probably would too - but in the case of geometry I'd just accept that using an index to do this would not be possible and, if I really needed reassurance of geometry uniqueness, I would do so in triggers.

David J.

Re: index unique

From
"Peter J. Holzer"
Date:
On 2021-06-07 10:20:22 -0700, David G. Johnston wrote:
> On Sun, Jun 6, 2021 at 11:55 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>     On 2021-06-03 22:51:55 +0200, Marc Millas wrote:
>     > postgres 12 with postgis.
>     > on a table we need a primary key and to get a unique combinaison, we need
>     > 3 columns of that table:
>     > 1 of type integer,
>     > 1 of type text,
>     > 1 of type geometry
>     >
>     > creating the PK constraint doesn work:
>
>     I find that if a natural primary key candidate is so complex, it is
>     usually better to use a surrogate key.
>
>
> You make this sound like an either-or proposition,

While he is talking about *a* primary key, it should be *the* primary
key. There can be only one (that's why it is the primary key).

There can be several unique keys, though.

> but personally it takes a very exceptional circumstance to forgo
> defining a unique natural key.

True, but not what he's talking about.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

Re: index unique

From
Marc Millas
Date:
quite funny to see how a tech question seems to end into an english grammar thing :-)

quote
> You make this sound like an either-or proposition,

While he is talking about *a* primary key, it should be *the* primary
key. There can be only one (that's why it is the primary key).

There can be several unique keys, though.

> but personally it takes a very exceptional circumstance to forgo
> defining a unique natural key.

True, but not what he's talking about.
end quote


when I did write the question, I write: "on a table we need a primary key"
its quite obvious that there is only one PK, but, at that stage, I dont need one for functional reasons  
so, I need "a" primary key,  whatever it is, just because  postgres logical replication needs one to accept updates.

then, to build "the" needed PK, we need to decide which columns or set of will be appropriate
as it has to be able to identify something unique. and this is how we get to this list of 3 columns.

So, ok, this set doesn't work asis.
all solutions we thought of are just tech workaround, providing no functionnal meaning.

still, its somehow surprising: (at least to me !)
postgres has no problem creating a btree for a geometry column, and, so, no problem for putting a geom column in a PK.
(very different from Oracle...)


the only pb observed is the size of the object accepted. if the geom is a bit "big" then the index errors.about btree size of index object.
but if I create a table test_l with a text column blabla as a PK, and insert a 100 000 character long string , no pb.
if I do an explain analyze select blabla from test_l order by blabla, postgres tells me that it did an index only scan.
so, I wonder why Postgres is able to put a 100 000 long text in a btree index and NOT a geom column which wkt is 10 000 bytes long.

there, sure, is a reason.

please, let me know :-)





Marc MILLAS
Senior Architect
+33607850334



On Tue, Jun 8, 2021 at 12:09 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2021-06-07 10:20:22 -0700, David G. Johnston wrote:
> On Sun, Jun 6, 2021 at 11:55 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>     On 2021-06-03 22:51:55 +0200, Marc Millas wrote:
>     > postgres 12 with postgis.
>     > on a table we need a primary key and to get a unique combinaison, we need
>     > 3 columns of that table:
>     > 1 of type integer,
>     > 1 of type text,
>     > 1 of type geometry
>     >
>     > creating the PK constraint doesn work:
>
>     I find that if a natural primary key candidate is so complex, it is
>     usually better to use a surrogate key.
>
>
> You make this sound like an either-or proposition,

While he is talking about *a* primary key, it should be *the* primary
key. There can be only one (that's why it is the primary key).

There can be several unique keys, though.

> but personally it takes a very exceptional circumstance to forgo
> defining a unique natural key.

True, but not what he's talking about.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Re: index unique

From
"Peter J. Holzer"
Date:
On 2021-06-08 18:30:16 +0200, Marc Millas wrote:
> the only pb observed is the size of the object accepted. if the geom is a bit
> "big" then the index errors.about btree size of index object.
> but if I create a table test_l with a text column blabla as a PK, and insert a
> 100 000 character long string , no pb.
> if I do an explain analyze select blabla from test_l order by blabla, postgres
> tells me that it did an index only scan.
> so, I wonder why Postgres is able to put a 100 000 long text in a btree index
> and NOT a geom column which wkt is 10 000 bytes long.

Good question. Maybe your texts compresses better than your geometries:

hjp=> insert into t(t) values(repeat('a', 235327) || '1');
INSERT 0 1
Time: 60.057 ms
hjp=> insert into t(t) values(repeat('a', 235328) || '1');
ERROR:  index row size 2720 exceeds maximum 2712 for index "t_pkey1"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
Time: 58.751 ms

Note the difference between the length of the string I was trying to
insert and the length of the row it complains about.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

Re: index unique

From
Marc Millas
Date:
Cristal clear !
and it have to be the case as my test was done with some not so random data.

but this mean that we cannot put a bunch of datatypes in a PK, as soon as it may be longer than 2701.
I know, its clearly stated in postgres doc about btree," The only limitation is that an index entry cannot exceed approximately one-third of a page (after TOAST compression, if applicable)."
but as this looks quite hard coded, it means that for  long utf8 things the data length is not so long before hitting the limit.

is there any plan to adress this ?
kind of index toast ?? :-)

thanks


Marc MILLAS
Senior Architect
+33607850334



On Tue, Jun 8, 2021 at 8:42 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2021-06-08 18:30:16 +0200, Marc Millas wrote:
> the only pb observed is the size of the object accepted. if the geom is a bit
> "big" then the index errors.about btree size of index object.
> but if I create a table test_l with a text column blabla as a PK, and insert a
> 100 000 character long string , no pb.
> if I do an explain analyze select blabla from test_l order by blabla, postgres
> tells me that it did an index only scan.
> so, I wonder why Postgres is able to put a 100 000 long text in a btree index
> and NOT a geom column which wkt is 10 000 bytes long.

Good question. Maybe your texts compresses better than your geometries:

hjp=> insert into t(t) values(repeat('a', 235327) || '1');
INSERT 0 1
Time: 60.057 ms
hjp=> insert into t(t) values(repeat('a', 235328) || '1');
ERROR:  index row size 2720 exceeds maximum 2712 for index "t_pkey1"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
Time: 58.751 ms

Note the difference between the length of the string I was trying to
insert and the length of the row it complains about.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Re: index unique

From
"David G. Johnston"
Date:
On Tuesday, June 8, 2021, Marc Millas <marc.millas@mokadb.com> wrote:

but as this looks quite hard coded, it means that for  long utf8 things the data length is not so long before hitting the limit.

is there any plan to adress this ?

None that I’ve seen, and I don’t expect to see one either.  Mainly because I’ve yet to see or think of a use case that would warrant even considering doing such, ignoring the fact that actually changing such a value is likely to be a non-starter from a compatability perspective.

David J.

Re: index unique

From
Tom Lane
Date:
Marc Millas <marc.millas@mokadb.com> writes:
> I know, its clearly stated in postgres doc about btree," The only
> limitation is that an index entry cannot exceed approximately one-third of
> a page (after TOAST compression, if applicable)."

Yup.

> is there any plan to adress this ?

No.  The cost/benefit ratio seems completely untenable.

The usual recommendation is that you shorten long values by hashing
them, eg create a unique index on md5(my_long_column).

            regards, tom lane



Re: index unique

From
Thomas Kellerer
Date:
Marc Millas schrieb am 03.06.2021 um 22:51:
> on a table we need a primary key and to get a unique combinaison, we need 3 columns of that table:
> 1 of type integer,
> 1 of type text,
> 1 of type geometry
>
> creating the PK constraint doesn work: (even with our current small data set)
> ERROR:  index row size 6072 exceeds btree version 4 maximum 2704 for index "xxx_spkey"
> DETAIL:  Index row references tuple (32,1) in relation "xxx".
> HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
> Consider a function index of an MD5 hash of the value, or use full text indexing.
>
> ok. we can do this.
> but if so, we need to create a gist index on the geometry column to do any topology request.
> so 2 indexes containing this single column.
>
> if we install extension btree_gist, no pb to create an index on all 3 columns.
> but as gist does not support unicity, this index cannot be used for the PK.
>
> OK, we may try to use a function to get the bounding box around the geometry objects and use the result into a btree
index........
>
> Any idea (I mean: another idea !) to tackle this ?
> Or any critic on the "solution" ??

How do you define the "uniqueness" of the geometry?

GIST can support "uniqueness" through exclusion constraints.
It's not a primary key, so you can't create foreign keys referencing that table,
but it does ensure uniqueness (In fact the "normal" unique indexes are
essentially a special case of exclusion constraints)

    create index on the_table using gist (int_column with =, text_col with =, geometry_col with &&);

Replace the && operator with whatever is appropriate for your use case.

Thomas



Re: index unique

From
Marc Millas
Date:
Thanks Thomas,

but, as stated after the first post, the need was for a PK as asked by postgres (ie. for tech needs, not for functionnal needs)
up to now, looks like we must create a PK (and so, the associated index) just to answer logical replication needs.(and qgis which also needs a PK)
that index  (some kind of hash on the geom column + the other 2).  have no meaning on a functionnal point of view, and there are chances that it will never be used by postgres  in normal use..
as the hash will not help on any topological request...

that was the reason of the first mail: as we must create a PK, is there any way to make something useful and not this unuseful "thing" ?




Marc MILLAS
Senior Architect
+33607850334



On Tue, Jun 8, 2021 at 10:51 PM Thomas Kellerer <shammat@gmx.net> wrote:
Marc Millas schrieb am 03.06.2021 um 22:51:
> on a table we need a primary key and to get a unique combinaison, we need 3 columns of that table:
> 1 of type integer,
> 1 of type text,
> 1 of type geometry
>
> creating the PK constraint doesn work: (even with our current small data set)
> ERROR:  index row size 6072 exceeds btree version 4 maximum 2704 for index "xxx_spkey"
> DETAIL:  Index row references tuple (32,1) in relation "xxx".
> HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
> Consider a function index of an MD5 hash of the value, or use full text indexing.
>
> ok. we can do this.
> but if so, we need to create a gist index on the geometry column to do any topology request.
> so 2 indexes containing this single column.
>
> if we install extension btree_gist, no pb to create an index on all 3 columns.
> but as gist does not support unicity, this index cannot be used for the PK.
>
> OK, we may try to use a function to get the bounding box around the geometry objects and use the result into a btree index........
>
> Any idea (I mean: another idea !) to tackle this ?
> Or any critic on the "solution" ??

How do you define the "uniqueness" of the geometry?

GIST can support "uniqueness" through exclusion constraints.
It's not a primary key, so you can't create foreign keys referencing that table,
but it does ensure uniqueness (In fact the "normal" unique indexes are
essentially a special case of exclusion constraints)

    create index on the_table using gist (int_column with =, text_col with =, geometry_col with &&);

Replace the && operator with whatever is appropriate for your use case.

Thomas


Re: index unique

From
Alban Hertroys
Date:
> On 8 Jun 2021, at 22:50, Thomas Kellerer <shammat@gmx.net> wrote:
>
> Marc Millas schrieb am 03.06.2021 um 22:51:
>> on a table we need a primary key and to get a unique combinaison, we need 3 columns of that table:
>> 1 of type integer,
>> 1 of type text,
>> 1 of type geometry
>>
>
> How do you define the "uniqueness" of the geometry?

That is actually the big question here. Multiple “unique” geometries can specify the same geometry!

A geom as simple as a line from (0,0) - (1,0) can just as easily be specified as (1,0) - (0,0). That’s the simplest
case,and one could argue that the point of origin is different, but the next example would be a triangle starting at
thesame origin but traversed in different directions. It gets harder the more vertices a polygon has. 

I would argue that a geometry type is ill-suited as a primary key column candidate.

Now, of course, the OP could have a case where their geometries are guaranteed to be unique regardless, but they’d
bettermake sure before adding them to the PK. 

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: index unique

From
Marc Millas
Date:
Hi Alban,

I plainly agree on the uniqueness thing. and on the fact that a PK with only a geometry column can be considered somehow "ill suited".
That said, the PK we finally use contains, as said, 3 columns:
--an id (integer column)
--a topology describer (the text column) 
--and the geometry column.

this set IS unique in our dataset. and helps provide some quality checks (wrong id, non coherent tolology,. and some geometry errors..)

anyway, my original post was about the fact that we must create a PK based on some kind of work around the limited btree length behaviour thus creating an index that will never be of any use, just for technical reasons.
People in this list have been quite clear that the btree limitation will NOT change. Which ends the point.

regards

Marc MILLAS
Senior Architect
+33607850334



On Thu, Jun 10, 2021 at 10:33 PM Alban Hertroys <haramrae@gmail.com> wrote:

> On 8 Jun 2021, at 22:50, Thomas Kellerer <shammat@gmx.net> wrote:
>
> Marc Millas schrieb am 03.06.2021 um 22:51:
>> on a table we need a primary key and to get a unique combinaison, we need 3 columns of that table:
>> 1 of type integer,
>> 1 of type text,
>> 1 of type geometry
>>
>
> How do you define the "uniqueness" of the geometry?

That is actually the big question here. Multiple “unique” geometries can specify the same geometry!

A geom as simple as a line from (0,0) - (1,0) can just as easily be specified as (1,0) - (0,0). That’s the simplest case, and one could argue that the point of origin is different, but the next example would be a triangle starting at the same origin but traversed in different directions. It gets harder the more vertices a polygon has.

I would argue that a geometry type is ill-suited as a primary key column candidate.

Now, of course, the OP could have a case where their geometries are guaranteed to be unique regardless, but they’d better make sure before adding them to the PK.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: index unique

From
"Peter J. Holzer"
Date:
On 2021-06-11 14:37:57 +0200, Marc Millas wrote:
> anyway, my original post was about the fact that we must create a PK based on
> some kind of work around the limited btree length behaviour thus creating an
> index that will never be of any use, just for technical reasons.

Actually, that index will be used, and it will be used every time you
update, insert, or delete a row - by the logical replication system
which has to identify the affected row(s) on the secondary system.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment