Hi Colin,
On 01 Feb 2008, at 15:22, Colin Wetherbee wrote:
>
> I'm not sure about the internals of PostgreSQL (eg. the Datum
> object(?) you mention), but if you're just scaling vectors,
> consecutive memory addresses shouldn't be absolutely necessary. Add
> and multiply operations within a linked list (which is how I'm
> naively assuming Datum storage for arrays in memory is implemented)
> will be "roughly" just as fast.
>
I'm not an expert, but the SSE instruction family should make the
difference for this kind of workload, and those instructions operate
on consecutive memory cells.
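For illustration, here is a minimal sketch of what such an SSE loop
could look like in C++, using the compiler intrinsics from
xmmintrin.h. The function name and the alignment/size assumptions are
mine, not taken from any existing code:

#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical sketch: scale a float array in place with SSE.
// Assumes n is a multiple of 4 and data is 16-byte aligned,
// i.e. the elements sit in consecutive memory cells.
void scale_sse(float *data, int n, float factor)
{
    __m128 f = _mm_set1_ps(factor);           // broadcast the scalar
    for (int i = 0; i < n; i += 4) {
        __m128 v = _mm_load_ps(data + i);     // load 4 consecutive floats
        _mm_store_ps(data + i, _mm_mul_ps(v, f));
    }
}

A linked-list layout would force scalar loads and defeat this kind of
vectorization, which is why contiguous storage matters here.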
> How many scaling operations are you planning to execute per second,
> and how many elements do you scale per operation?
Typically, an array contains 1000 elements, and an operation is either
multiplying it by a scalar or multiplying it element-by-element with
another array. Rescaling 1000 arrays, multiplying each by another
array, and finally summing all 1000 resulting arrays should be fast
enough for an interactive application (let's say 0.5s). This is for
the case when no disk access is required. Disk access will obviously
degrade performance a bit at the beginning, but the workload is mostly
read-only, so after a while the whole table will be cached anyway.

The table containing the arrays would be truncated and repopulated
every day, and the number of arrays is expected to be more or less
150000 (at least, this is what we have now). Currently, we have a C++
middleware between the calculations and an aggressive caching of the
table contents (and we don't use arrays, just one row per element),
but the application could be refactored (and simplified a lot) if we
had a smart way to save the data into the DB.
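To make the workload concrete, here is a hypothetical scalar C++
sketch of the computation described above (rescale each array,
multiply it element-wise by another array, sum the results). The
function name and parameter names are illustrative only:

#include <vector>

// Sketch of the workload: for each of ~1000 input arrays of 1000
// elements, scale by a common factor, multiply element-wise by a
// second array, and accumulate everything into one result array.
std::vector<double> rescale_and_sum(
    const std::vector<std::vector<double>>& arrays,  // ~1000 x 1000 elements
    const std::vector<double>& weights,              // element-wise multiplier
    double factor)                                   // common rescaling factor
{
    std::vector<double> acc(weights.size(), 0.0);
    for (const auto& a : arrays)
        for (std::size_t i = 0; i < a.size(); ++i)
            acc[i] += a[i] * factor * weights[i];    // scale, multiply, sum
    return acc;
}

This is roughly what the middleware does today against its cache; the
question is whether the same thing can be done efficiently directly on
arrays stored in the database.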
Bye,
e.