Thread: Collation-aware comparisons in GIN opclasses
Hackers,
some GIN opclasses uses collation-aware comparisons while they don't need to do especially collation-aware comparison. Examples are text[] and hstore opclasses. Depending on collation this may make them a much slower.
See example.
# show lc_collate ;
lc_collate
─────────────
ru_RU.UTF-8
(1 row)
# create table test as (select array_agg(i::text) from generate_series(1,1000000) i group by (i-1)/10);
SELECT 100000
# create index test_idx on test using gin(array_agg);
CREATE INDEX
Time: 26930,423 ms
# create index test_idx2 on test using gin(array_agg collate "C");
CREATE INDEX
Time: 5143,682 ms
Index creation with collation "ru_RU.UTF-8" is 5 times slower while collation has absolutely no effect on index functionality.
However, we can just replace comparison function for those opclasses because it would break binary compatibility for pg_upgrade. I see following solution:
- Rename such opclasses and make them not default.
- Create new default opclasses with bitwise comparison functions.
- Write recommendation to re-create indexes with default opclasses into documentation.
------
With best regards,
Alexander Korotkov.
Alexander Korotkov.
On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote: > some GIN opclasses uses collation-aware comparisons while they don't need to > do especially collation-aware comparison. Examples are text[] and hstore > opclasses. Depending on collation this may make them a much slower. I'm glad that I saw how pointless this was with the jsonb GIN default opclass during development. > Rename such opclasses and make them not default. > Create new default opclasses with bitwise comparison functions. > Write recommendation to re-create indexes with default opclasses into > documentation. I certainly think this should be fixed if at all possible, but I'm not sure about this plan. Can we really rename an opclass without consequence, including having that respected across pg_upgrade? -- Peter Geoghegan
On Mon, Sep 15, 2014 at 10:51 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> some GIN opclasses uses collation-aware comparisons while they don't need to
> do especially collation-aware comparison. Examples are text[] and hstore
> opclasses. Depending on collation this may make them a much slower.
I'm glad that I saw how pointless this was with the jsonb GIN default
opclass during development.
> Rename such opclasses and make them not default.
> Create new default opclasses with bitwise comparison functions.
> Write recommendation to re-create indexes with default opclasses into
> documentation.
I certainly think this should be fixed if at all possible, but I'm not
sure about this plan. Can we really rename an opclass without
consequence, including having that respected across pg_upgrade?
Just rename doesn't seem to be safe. Since pg_upgrade uses pg_dump, all indexes are linked to opclasses using their names. Therefore existed indexes will be linked to new opclasses. It's likely we need to execute SQL script renaming opclasses before pg_upgrade. Another option is to don't rename old opclasses, just create new default opclasses with new names. Bruce, what is your opinion about pg_upgrade?
Contrib opclasses would be safe to rename using migration script.
------
With best regards,
Alexander Korotkov.
Peter Geoghegan <pg@heroku.com> writes: > On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov > <aekorotkov@gmail.com> wrote: >> Rename such opclasses and make them not default. >> Create new default opclasses with bitwise comparison functions. >> Write recommendation to re-create indexes with default opclasses into >> documentation. > I certainly think this should be fixed if at all possible, but I'm not > sure about this plan. Can we really rename an opclass without > consequence, including having that respected across pg_upgrade? No. And we don't know how to change the default opclass without breaking things, either. See previous discussions about how we might fix the totally-broken default gist opclass that btree_gist creates for the inet type [1]. The motivation for getting rid of that is *way* stronger than "it might be slow", but there's no apparent way to make something else be the default without creating havoc. regards, tom lane [1] http://www.postgresql.org/message-id/flat/CAE2gYzyUESd188j0b290Gf16502H9B-LwNRS3rFi1SwDb9Qcgw@mail.gmail.com
On Mon, Sep 15, 2014 at 12:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > No. And we don't know how to change the default opclass without > breaking things, either. Is there a page on the Wiki along the lines of "things that we would like to change if ever there is a substantial change in on-disk format that will break pg_upgrade"? ISTM that we should be intelligently saving those some place, just as Redhat presumably save up ABI-breakage over many years for the next major release of RHEL. Alexander's complaint is a good example of such a change, IMV. Isn't it more or less expected that the day will come when we'll make a clean break? -- Peter Geoghegan
On Mon, Sep 15, 2014 at 11:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
------
With best regards,
Alexander Korotkov.
Peter Geoghegan <pg@heroku.com> writes:
> On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov
> <aekorotkov@gmail.com> wrote:
>> Rename such opclasses and make them not default.
>> Create new default opclasses with bitwise comparison functions.
>> Write recommendation to re-create indexes with default opclasses into
>> documentation.
> I certainly think this should be fixed if at all possible, but I'm not
> sure about this plan. Can we really rename an opclass without
> consequence, including having that respected across pg_upgrade?
No. And we don't know how to change the default opclass without
breaking things, either. See previous discussions about how we
might fix the totally-broken default gist opclass that btree_gist
creates for the inet type [1]. The motivation for getting rid of that
is *way* stronger than "it might be slow", but there's no apparent
way to make something else be the default without creating havoc.
I've read thread about gist opclass for inet type. But that case is more difficult because conflict is between builtin opclass and contrib opclass. This case seems to be much simpler: we need to change builtin opclass to builtin opclass and contrib opclass to contrib opclass. I realized that it's problematic to rename builtin opclass due to pg_upgrade. However, it seems still possible to create new opclass and make it default.
With best regards,
Alexander Korotkov.
> No. And we don't know how to change the default opclass without > breaking things, either. See previous discussions about how we > might fix the totally-broken default gist opclass that btree_gist > creates for the inet type [1]. The motivation for getting rid of that > is *way* stronger than "it might be slow", but there's no apparent > way to make something else be the default without creating havoc. Inet case was not the same. We tried to replace the default opclass in contrib with another one in core. It did not work because pg_dump --binary-upgrade dumps the objects of the extension which cannot be restored when there is a default opclass for the same data type. Changing the default opclasses should work if we make pg_dump --binary-upgrade dump the default opclasses with indexes and exclusion constraints. I think it makes sense to do so in --binary-upgrade mode. I can try to come with a patch for this. I cannot see a way to rename opclasses in core. I think we can live with default opclasses which are not named as type_ops.
On Tue, Sep 16, 2014 at 11:29 AM, Emre Hasegeli <emre@hasegeli.com> wrote:
------
With best regards,
Alexander Korotkov.
Changing the default opclasses should work if we make
pg_dump --binary-upgrade dump the default opclasses with indexes
and exclusion constraints. I think it makes sense to do so in
--binary-upgrade mode. I can try to come with a patch for this.
Can you explain it a bit more detail? I didn't get it.
------
With best regards,
Alexander Korotkov.
> > Changing the default opclasses should work if we make > > pg_dump --binary-upgrade dump the default opclasses with indexes > > and exclusion constraints. I think it makes sense to do so in > > --binary-upgrade mode. I can try to come with a patch for this. > > Can you explain it a bit more detail? I didn't get it. pg_upgrade uses pg_dump --binary-upgrade to dump the schema of the old database. Now, it generates CREATE INDEX statements without explicit opclass if opclass is the default. We can change pg_dump to generate the statements with opclass even if opclass is the default in --binary-upgrade mode.
On Tue, Sep 16, 2014 at 12:14 PM, Emre Hasegeli <emre@hasegeli.com> wrote:
> > Changing the default opclasses should work if we make
> > pg_dump --binary-upgrade dump the default opclasses with indexes
> > and exclusion constraints. I think it makes sense to do so in
> > --binary-upgrade mode. I can try to come with a patch for this.
>
> Can you explain it a bit more detail? I didn't get it.
pg_upgrade uses pg_dump --binary-upgrade to dump the schema of
the old database. Now, it generates CREATE INDEX statements without
explicit opclass if opclass is the default. We can change pg_dump
to generate the statements with opclass even if opclass is the default
in --binary-upgrade mode.
Thanks, I get it. I checked pg_dump implementation. It appears to be not as easy as it could be. pg_dump doesn't form index definition by itself. It calls pg_get_indexdef function. This function have no option to dump names of default opclasses. Since we can't change behaviour of old postgres version, we have to make pg_dump form index definition by itself.
------
With best regards,
Alexander Korotkov.
On Tue, Sep 16, 2014 at 06:56:24PM +0400, Alexander Korotkov wrote: > On Tue, Sep 16, 2014 at 12:14 PM, Emre Hasegeli <emre@hasegeli.com> wrote: > > > > Changing the default opclasses should work if we make > > > pg_dump --binary-upgrade dump the default opclasses with indexes > > > and exclusion constraints. I think it makes sense to do so in > > > --binary-upgrade mode. I can try to come with a patch for this. > > > > Can you explain it a bit more detail? I didn't get it. > > pg_upgrade uses pg_dump --binary-upgrade to dump the schema of > the old database. Now, it generates CREATE INDEX statements without > explicit opclass if opclass is the default. We can change pg_dump > to generate the statements with opclass even if opclass is the default > in --binary-upgrade mode. > > > Thanks, I get it. I checked pg_dump implementation. It appears to be not as > easy as it could be. pg_dump doesn't form index definition by itself. It calls > pg_get_indexdef function. This function have no option to dump names of default > opclasses. Since we can't change behaviour of old postgres version, we have to > make pg_dump form index definition by itself. Well, the server is also operating in binary-upgrade mode, so you could have the server-side function pg_get_indexdef() behave differently for pg_upgrade. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Mon, Sep 15, 2014 at 03:42:20PM -0700, Peter Geoghegan wrote: > On Mon, Sep 15, 2014 at 12:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > No. And we don't know how to change the default opclass without > > breaking things, either. > > Is there a page on the Wiki along the lines of "things that we would > like to change if ever there is a substantial change in on-disk format > that will break pg_upgrade"? ISTM that we should be intelligently > saving those some place, just as Redhat presumably save up > ABI-breakage over many years for the next major release of RHEL. > Alexander's complaint is a good example of such a change, IMV. Isn't > it more or less expected that the day will come when we'll make a > clean break? It is on the TODO page under pg_upgrade: Desired changes that would prevent upgrades with pg_upgrade -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 09/15/2014 06:28 PM, Alexander Korotkov wrote: > Hackers, > > some GIN opclasses uses collation-aware comparisons while they don't need > to do especially collation-aware comparison. Examples are text[] and hstore > opclasses. Hmm. It would be nice to use the index for inequality searches, at least on text[]. We don't support that currently, but it would require collation-awareness. > Depending on collation this may make them a much slower. > > See example. > > # show lc_collate ; > lc_collate > ───────────── > ru_RU.UTF-8 > (1 row) > > # create table test as (select array_agg(i::text) from > generate_series(1,1000000) i group by (i-1)/10); > SELECT 100000 > > # create index test_idx on test using gin(array_agg); > CREATE INDEX > Time: *26930,423 ms* > > # create index test_idx2 on test using gin(array_agg collate "C"); > CREATE INDEX > Time: *5143,682 ms* > > Index creation with collation "ru_RU.UTF-8" is 5 times slower while > collation has absolutely no effect on index functionality. It occurs to me that practically all of those comparisons happen when we populate the red-black Tree, during the index build. The purpose of the red-black tree is to collect identical keys together, but there is actually no requirement that the order of the red-black tree matches the order of the index. It also isn't strictly required that it recognizes equal keys as equal. The only requirement is that it doesn't incorrectly put two keys that are equal according to the compare-function, into two different nodes. We could therefore use plain memcmp() to compare the Datums while building the red-black tree. Keys that are bit-wise equal are surely considered as equal by the compare-function. That makes the index build a lot faster. With the attached quick patch: postgres=# create index test_idx on test using gin(array_agg ); CREATE INDEX Time: 880.620 ms This is on my laptop. Without the patch, that takes about 4.7 seconds with the C locale, so this is much faster than even using the C locale. - Heikki
Attachment
On Mon, Sep 29, 2014 at 11:48 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 09/15/2014 06:28 PM, Alexander Korotkov wrote:Hackers,
some GIN opclasses uses collation-aware comparisons while they don't need
to do especially collation-aware comparison. Examples are text[] and hstore
opclasses.
Hmm. It would be nice to use the index for inequality searches, at least on text[]. We don't support that currently, but it would require collation-awareness.Depending on collation this may make them a much slower.
See example.
# show lc_collate ;
lc_collate
─────────────
ru_RU.UTF-8
(1 row)
# create table test as (select array_agg(i::text) from
generate_series(1,1000000) i group by (i-1)/10);
SELECT 100000
# create index test_idx on test using gin(array_agg);
CREATE INDEX
Time: *26930,423 ms*
# create index test_idx2 on test using gin(array_agg collate "C");
CREATE INDEX
Time: *5143,682 ms*
Index creation with collation "ru_RU.UTF-8" is 5 times slower while
collation has absolutely no effect on index functionality.
It occurs to me that practically all of those comparisons happen when we populate the red-black Tree, during the index build. The purpose of the red-black tree is to collect identical keys together, but there is actually no requirement that the order of the red-black tree matches the order of the index. It also isn't strictly required that it recognizes equal keys as equal. The only requirement is that it doesn't incorrectly put two keys that are equal according to the compare-function, into two different nodes.
Good point, Heikki. I experienced several times this problem, fixed it with C-locale and forgot again. Now, it's time to fix !
We could therefore use plain memcmp() to compare the Datums while building the red-black tree. Keys that are bit-wise equal are surely considered as equal by the compare-function. That makes the index build a lot faster. With the attached quick patch:
postgres=# create index test_idx on test using gin(array_agg );
CREATE INDEX
Time: 880.620 ms
This is on my laptop. Without the patch, that takes about 4.7 seconds with the C locale, so this is much faster than even using the C locale.
Hmm, on my MBA I got
17277.734 (patch) vs 39151.562 for ru_RU.UTF-8 and
17277.734 (patch) vs 39151.562 for ru_RU.UTF-8 and
6131.929 (patch) vs 6131.929 for C
Not much :(
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 28, 2014 at 10:33:33PM -0400, Bruce Momjian wrote: > On Mon, Sep 15, 2014 at 03:42:20PM -0700, Peter Geoghegan wrote: > > On Mon, Sep 15, 2014 at 12:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > No. And we don't know how to change the default opclass without > > > breaking things, either. > > > > Is there a page on the Wiki along the lines of "things that we would > > like to change if ever there is a substantial change in on-disk format > > that will break pg_upgrade"? ISTM that we should be intelligently > > saving those some place, just as Redhat presumably save up > > ABI-breakage over many years for the next major release of RHEL. > > Alexander's complaint is a good example of such a change, IMV. Isn't > > it more or less expected that the day will come when we'll make a > > clean break? > > It is on the TODO page under pg_upgrade: > > Desired changes that would prevent upgrades with pg_upgrade Item added to TODO list. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +