Packed varlena patch update - Mailing list pgsql-patches

From Gregory Stark
Subject Packed varlena patch update
Date
Msg-id 871wjuz7fl.fsf@stark.xeocode.com
Responses Re: Packed varlena patch update  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-patches


I implemented Tom's suggestion of packing the external toast pointers
unaligned and copying them to a local struct to access the fields. This saves
3-6 bytes on every external toast pointer. Presumably this isn't interesting
if you're accessing the toasted values since they would be quite large
anyways, but it could be interesting if you're doing lots of queries that
don't access the toasted values. It does uglify the code a bit but it's all
contained in tuptoaster.c.
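
The copy-to-a-local-struct approach described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: the struct name, the helper names, and the field widths are assumptions, and only the field names (rawsize, extsize, valueid, relid) come from this message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the external toast pointer. Four 4-byte
 * fields are assumed here, so the struct has no internal padding.
 */
typedef struct ToastPointerSketch
{
    int32_t  rawsize;   /* size of the original datum */
    int32_t  extsize;   /* size as stored externally */
    uint32_t valueid;   /* identifies the value in the toast table */
    uint32_t relid;     /* identifies the toast table */
} ToastPointerSketch;

/* Store the pointer at an arbitrary, possibly unaligned, address. */
static void
toast_pointer_pack(unsigned char *dst, const ToastPointerSketch *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(*src));
}

/*
 * Copy the packed bytes into an aligned local struct. Fields are only
 * ever read from the local copy, never through a cast of the packed
 * address, which could be an unaligned access on some platforms.
 */
static ToastPointerSketch
toast_pointer_unpack(const unsigned char *src)
{
    ToastPointerSketch local;

    assert(src != NULL);
    memcpy(&local, src, sizeof(local));
    return local;
}
```

Because the packed bytes no longer need alignment padding in front of them, the saving comes from the tuple layout rather than from the struct itself.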

 http://community.enterprisedb.com/varlena/patch-varvarlena-19.patch.gz

I think this is the last of the todo suggestions that were mentioned on the
list previously, at least that I recall.

Some implications:

1) There's no longer any macro to determine if an external attribute is
   compressed. I could provide a function to do it if we need it but in all
   the cases where I see it being used we already needed to extract the fields
   of the toast pointer anyways, so it wouldn't make sense.

2) We check if a toasted value is replaced with a new one by memcmp'ing the
   entire toast pointer. We used to just compare valueid and relid, so this
   means we're now comparing extsize and rawsize as well. I can't see how they
   could vary for the same valueid though so I don't think that's actually
   changing anything.
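
To make the two implications concrete, here is a minimal sketch under stated assumptions. The struct layout and all function names are hypothetical, and the "extsize smaller than rawsize" test for compression is an assumption made for illustration, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: four 4-byte fields, hence no padding bytes. */
typedef struct ToastPointerSketch
{
    int32_t  rawsize;
    int32_t  extsize;
    uint32_t valueid;
    uint32_t relid;
} ToastPointerSketch;

/* Old-style check: unpack both pointers and compare only the ids. */
static int
toast_same_value_by_ids(const unsigned char *a, const unsigned char *b)
{
    ToastPointerSketch pa, pb;

    assert(a != NULL && b != NULL);
    memcpy(&pa, a, sizeof(pa));
    memcpy(&pb, b, sizeof(pb));
    return pa.valueid == pb.valueid && pa.relid == pb.relid;
}

/*
 * New-style check: compare the packed bytes directly. This also compares
 * rawsize and extsize, and is only safe because the assumed layout has
 * no padding bytes with indeterminate contents.
 */
static int
toast_same_value_by_memcmp(const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, sizeof(ToastPointerSketch)) == 0;
}

/*
 * A function in place of the removed macro. Assumption: an external
 * value counts as compressed when it is stored in fewer bytes than its
 * raw size.
 */
static int
toast_pointer_is_compressed(const unsigned char *p)
{
    ToastPointerSketch local;

    assert(p != NULL);
    memcpy(&local, p, sizeof(local));
    return local.extsize < local.rawsize;
}
```

The two comparisons agree as long as a given valueid always carries the same extsize and rawsize, which is the assumption stated in point 2.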


Remaining things in the back of my mind though:

. I'm not 100% confident of the GiST changes. I think I may have changed a few
  too many lines of code there. It does pass all the contrib regressions
  though.

. I think there may be some opportunities for optimizing heaptuple.c. In
  particular the places where it uses VARSIZE_ANY where it knows some of the
  cases are impossible. It may not make a difference due to branch prediction
  though.

. I've left the htonl/ntohl calls in place in the BIG_ENDIAN #if-branch.
  They're easy enough to remove, and keeping them leaves us the option of
  removing the #if entirely and just using network byte order everywhere.

  I'm a bit worried about modules building against postgres.h without
  including the config.h include file. Is that possible? Is it worth worrying
  about? I think I could make it fail to build rather than crash randomly.

. One of the regression tests I wrote makes a datum of half a megabyte. Can
  all the buildfarm machines handle that? I don't really need anything so large for
  the regression but it was the most convenient way to make something which
  after compressing was large enough to toast externally. It might be better
  to find some less compressible data.
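
On the byte-order point above, the following is a rough sketch of the two ideas: always storing a length word in network byte order so that no endianness #if is needed, and making the build fail when the configuration header was not included. The names are hypothetical; in particular SKETCH_CONFIG_H_INCLUDED stands in for whatever symbol the real config header defines, and is defined inline here only so the sketch compiles on its own.

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the symbol a real configuration header would define. */
#define SKETCH_CONFIG_H_INCLUDED 1

/* Fail at compile time rather than misbehave at run time. */
#ifndef SKETCH_CONFIG_H_INCLUDED
#error "configuration header not included before this file"
#endif

/*
 * Store a 32-bit length in network (big-endian) byte order. The same
 * code is correct on big- and little-endian hosts, so no #if is needed.
 */
static void
store_length_network_order(unsigned char *dst, uint32_t len)
{
    uint32_t wire = htonl(len);

    assert(dst != NULL);
    memcpy(dst, &wire, sizeof(wire));
}

/* Read the length back, converting to host byte order. */
static uint32_t
load_length_network_order(const unsigned char *src)
{
    uint32_t wire;

    assert(src != NULL);
    memcpy(&wire, src, sizeof(wire));
    return ntohl(wire);
}
```

On a big-endian host htonl and ntohl are no-ops, which is why they are easy to remove from a big-endian-only branch; the cost of the unified version falls on little-endian hosts, which have to swap bytes.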

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
