Re: sortsupport for text - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: sortsupport for text
Date
Msg-id CAEYLb_Wud3m49qKh_4Ki+7AAkabYqmVTM0Zw18bM9jZkT-nprg@mail.gmail.com
Whole thread Raw
In response to Re: sortsupport for text  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: sortsupport for text
List pgsql-hackers
On 18 March 2012 15:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> However, it occurred to me that we could pretty easily jury-rig
> something that would give us an idea about the actual benefit available
> here.  To wit: make a C function that wraps strxfrm, basically
> strxfrm(text) returns bytea.  Then compare the performance of
> ORDER BY text_col to ORDER BY strxfrm(text_col).
>
> (You would need to have either both or neither of text and bytea
> using the sortsupport code paths for this to be a fair comparison.)

I thought this was an interesting idea, so decided to try it out for
myself. I tried this out against master (not Robert's patch, per Tom's
direction). The results were interesting:

[peter@peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_strxfrm.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2795
tps = 46.563970 (including connections establishing)
tps = 46.568234 (excluding connections establishing)
[peter@peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_reg.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2079
tps = 34.638838 (including connections establishing)
tps = 34.640665 (excluding connections establishing)

The first test executed the following query against the dellstore database:

select * from products order by strxfrm_test(actor) offset 10001;

The second:

select * from products order by actor offset 10001;

So, this was pretty good - an improvement that is completely
independent of Robert's. Bear in mind, this simple demonstration adds
additional fmgr overhead, which we have plenty of reason to believe
could hurt things, besides which each call must allocate memory that
could perhaps be avoided. In addition, I don't know enough about
locale-aware sorting and related algorithms to have devised a test
that would stress strxfrm()/ strcoll() - these were all strings that
could be represented as ASCII.

In light of this, I think there is a pretty strong case to be made for
pre-processing text via strxfrm() as part of this patch.

Thoughts?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: s/UNSPECIFIED/SIMPLE/ in foreign key code?
Next
From: Jeff Janes
Date:
Subject: Re: measuring spinning