Re: MD5 aggregate - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: MD5 aggregate
Date
Msg-id CAEZATCUzc6V1t+rmQtyhssGAPJ6g=s6YwsROTVgq+Hrqi90TYQ@mail.gmail.com
Whole thread Raw
In response to Re: MD5 aggregate  (Hannu Krosing <hannu@2ndQuadrant.com>)
List pgsql-hackers
On 14 June 2013 16:09, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> What skytools/pgq/londiste uses for comparing tables on master
> and slave is query like this
>
> select sum(hashtext(t.*::text)) from <yourtable> t;
>
> This is non-modulo sum and does not use md5 but relies on
> whatever the hashtext() du jour is :)
>
> So it is not comparable to anything external (like the md5sum
> compatible idea above) but is usually good enough for fast
> checks of compatible tables.
>
> As tables are unordered by definition anyway, this should be
> good enough for most SQL.
>
> The speed comes from both fast(er) hashtext() function and
> avoiding the sort.
>

That sounds like a pretty good approach. We could do that if we had a
version of md5() that returned numeric. My impression is that numeric
computations are pretty fast compared to the sorting overhead.

On the other hand, if there is a usable index, select md5_agg(..) from
(sub-query) will do and index scan rather than a sort, making it much
faster than using an ORDER BY in the aggregate.

Regards,
Dean



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: pg_restore -l with a directory archive
Next
From: Tom Lane
Date:
Subject: Re: pg_filedump 9.3: checksums (and a few other fixes)