texteq/byteaeq: avoid detoast [REVIEW] - Mailing list pgsql-hackers

From: Andy Colson
Subject: texteq/byteaeq: avoid detoast [REVIEW]
Msg-id: 4D334167.7060108@squeakycode.net
In response to: texteq/byteaeq: avoid detoast (Noah Misch <noah@leadboat.com>)
Responses: Re: texteq/byteaeq: avoid detoast [REVIEW]
           Re: texteq/byteaeq: avoid detoast [REVIEW]
           Re: texteq/byteaeq: avoid detoast [REVIEW]
List: pgsql-hackers
This is a review of:
https://commitfest.postgresql.org/action/patch_view?id=468

Purpose:
========
Equal and not-equal _may_ be quickly determined if the two values' lengths are different.  This _may_ be a huge speed-up if we don't have to detoast.
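To make that concrete, here is a minimal sketch (my own demo table, not part of the patch) of the kind of comparison that can be short-circuited:

create temp table toast_demo(c text);
-- repeat() manufactures a value large enough to be compressed and/or
-- stored out of line (toasted):
insert into toast_demo values (repeat('abc', 100000));

-- The stored size is far smaller than the logical length:
select pg_column_size(c) as stored_bytes, length(c) as logical_chars
from toast_demo;

-- With the patch, this can return false from the length check alone,
-- without detoasting c:
select c = 'short constant' as eq from toast_demo;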


The Patch:
==========
I was able to read and understand the patch; it's a simple change and looked correct to me (a non-PG hacker).
It applies cleanly to git head, and compiles and runs fine with debug enabled.

make check passes


Usability:
==========
I used _may_ above.  The benchmark included with the patch, showing huge speedups, is really contrived.  It uses a where clause with a thousand-character constant: (where c = 'long...long...long...long...ConstantText...etc').  In my opinion this is very uncommon (the author does note this is a "best case").  If you have a field large enough to be toasted, you are not going to be using it to search on; you are going to have an ID field that is indexed (select c where id = 7).

This also only touches = and <>; the operators >, <, and LIKE won't be touched.  So I think the scope of this is limited.
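As a sketch of that scope (using the junk table created under Performance below):

-- Covered by the patch: = and <> can bail out early when the stored
-- lengths differ.
select count(*) from junk where c = 'c15';
select count(*) from junk where c <> 'c15';
-- Not covered: ordering and pattern matching detoast as before.
select count(*) from junk where c > 'c15';
select count(*) from junk where c like 'c15%';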

That being said, the patch is simple, and if you do happen to hit the code, it will speed things up.  As a user of PG I'd like to have this included.  It's a corner case, but a big corner, and it's a small, simple change that won't slow anything else down.


Performance:
============
I created myself a more real-world test, with a table with indexes and IDs and a large toasted field.

create table junk(id serial primary key, xgroup integer, c text);
create index junk_group on junk(xgroup);


I filled it full of junk:

do $$
declare i integer;
declare j integer;
begin
  for i in 1..100 loop
    for j in 1..500 loop
      insert into junk(xgroup, c) values (j, 'c'||i);
      insert into junk(xgroup, c)
        select j, repeat('abc', 2000) || n from generate_series(1, 5) n;
    end loop;
  end loop;
end$$;


This will make about 600 records within the same xgroup, as well as a simple 'c15'-style value in c that we can search for.  My thinking is you may not know the exact unique id, but you do know what group it's in, so that'll cut out 90% of the records, and then you'll have to add " and c = 'c15'" to get the exact one you want.
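A minimal sketch of that query pattern (the constants are just examples):

-- Narrow by the indexed xgroup first, then compare against the toastable
-- column; with the patch, rows whose stored length differs can skip the
-- detoast entirely.
select count(*) from junk where xgroup = 15 and c = 'c15';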

I still saw a nice performance boost.

Old PG:
$ psql < bench3.sql
Timing is on.
DO
Time: 2010.412 ms

Patched:
$ psql < bench3.sql
Timing is on.
DO
Time: 184.602 ms


bench3.sql:
do $$
declare i integer;
begin
  for i in 1..400 loop
    perform count(*) from junk where xgroup = i and c like 'c' || i;
  end loop;
end$$;



Summary:
========
Performance speed-up:  Oh yeah!  If you just happen to hit it, and if you do hit it, you might want to re-think your layout a little bit.

Do I want it?  Yes please.



