From: Andy Colson
Subject: texteq/byteaeq: avoid detoast [REVIEW]
Msg-id: 4D334167.7060108@squeakycode.net
In response to: texteq/byteaeq: avoid detoast (Noah Misch <noah@leadboat.com>)
Responses: Re: texteq/byteaeq: avoid detoast [REVIEW]
           Re: texteq/byteaeq: avoid detoast [REVIEW]
           Re: texteq/byteaeq: avoid detoast [REVIEW]
List: pgsql-hackers
This is a review of: https://commitfest.postgresql.org/action/patch_view?id=468

Purpose:
========
Equal and not-equal _may_ be quickly determined if the two values'
lengths are different. This _may_ be a huge speed-up if we don't have to
detoast.

The Patch:
==========
I was able to read and understand the patch; it's a simple change and
looked correct to me (a non-PG hacker). It applies cleanly to git head,
compiles and runs fine with debug enabled, and make check passes.

Usability:
==========
I used _may_ above. The benchmark included with the patch, showing huge
speedups, is really contrived. It uses a where clause with a
thousand-character constant: (where c =
'long...long...long...long...ConstantText...etc'). In my opinion this is
very uncommon (the author does note this is a "best case"). If you have
a field large enough to be toasted, you are not going to be searching on
it; you are going to have an ID field that is indexed.
(select c where id = 7)

This also only touches = and <>; >, < and LIKE won't be touched. So I
think the scope of this is limited.

That being said, the patch is simple, and if you do happen to hit the
code, it will speed things up. As a user of PG I'd like to have this
included. It's a corner case, but a big corner, and it's a small, simple
change that won't slow anything else down.

Performance:
============
I created myself a more real-world test, with a table with indexes and
ids and a large toasted field:

create table junk(id serial primary key, xgroup integer, c text);
create index junk_group on junk(xgroup);

I filled it full of junk:

do $$
declare i integer;
declare j integer;
begin
  for i in 1..100 loop
    for j in 1..500 loop
      insert into junk(xgroup, c) values (j, 'c'||i);
      insert into junk(xgroup, c)
        select j, repeat('abc', 2000) || n from generate_series(1, 5) n;
    end loop;
  end loop;
end$$;

This will make about 600 records within the same xgroup, as well as a
simple 'c15'-type value in c that we can search for. My thinking is you
may not know the exact unique id, but you do know what group it's in, so
that'll cut out 90% of the records, and then you'll have to add
" and c = 'c15'" to get the exact one you want.

I still saw a nice performance boost.

Old PG:
$ psql < bench3.sql
Timing is on.
DO
Time: 2010.412 ms

Patched:
$ psql < bench3.sql
Timing is on.
DO
Time: 184.602 ms

bench3.sql:
do $$
declare i integer;
begin
  for i in 1..400 loop
    perform count(*) from junk where xgroup = i and c like 'c' || i;
  end loop;
end$$;

Summary:
========
Performance speed-up: Oh yeah! If you just happen to hit it. And if you
do hit it, you might want to re-think your layout a little bit.

Do I want it? Yes please.
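P.S. For anyone skimming the thread, the heart of the change is just a
cheap length comparison before any detoasting happens. Below is a rough
sketch of the idea, modeled on texteq() in varlena.c -- this is my own
illustration of the approach, not the patch text itself, so read the
commitfest entry above for the real thing:

/*
 * Sketch only: toast_raw_datum_size() reads the full datum length from
 * the varlena header / TOAST pointer, so unequal lengths let us answer
 * "not equal" without ever detoasting either argument.
 */
Datum
texteq(PG_FUNCTION_ARGS)
{
    Size    len1 = toast_raw_datum_size(PG_GETARG_DATUM(0));
    Size    len2 = toast_raw_datum_size(PG_GETARG_DATUM(1));
    bool    result;

    if (len1 != len2)
        result = false;     /* different lengths can never be equal */
    else
    {
        /* lengths match, so detoast and compare the payload bytes */
        text   *targ1 = PG_GETARG_TEXT_PP(0);
        text   *targ2 = PG_GETARG_TEXT_PP(1);

        result = (memcmp(VARDATA_ANY(targ1), VARDATA_ANY(targ2),
                         len1 - VARHDRSZ) == 0);

        PG_FREE_IF_COPY(targ1, 0);
        PG_FREE_IF_COPY(targ2, 1);
    }

    PG_RETURN_BOOL(result);
}

Note that a shortcut like this only works for = and <> (and the bytea
equivalents); ordering operators and LIKE still need the actual bytes,
which matches the limited scope I described above.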