Thread: PG index architecture

PG index architecture

From
Andy Colson
Date:
I was thinking about indexes, and am kinda curious about sequential access.

I know nothing of PG guts, so this might even be a dumb question.

As I understand indexes, they are key-value pairs that contain a value
and a position.  You look up the value, then use the position to seek into
the database to load the record.

Do we, or could we, load all the matching index records, then sort
them by position?  (maybe not all, maybe large batches)

When loading from the database, if access was slightly more sequential
(vs very random), would it increase performance?

Said another way:

I think of table scanning as sequential, and fast.  That would be
loading db record 1,2,3, etc.

Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4

-Andy
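
To put a number on the question above, here is a minimal sketch that treats the five example values as disk addresses and adds up how far a read head would have to travel in each order. The function name and the starting position of 0 are assumptions made for illustration; real storage also has caches, read-ahead and a filesystem in the way, so this is only a rough model.

```python
def total_seek_distance(addresses, start=0):
    """Sum of the absolute jumps between consecutive addresses, beginning at `start`."""
    distance = 0
    position = start
    for addr in addresses:
        distance += abs(addr - position)
        position = addr
    return distance


random_order = [7, 3, 10, 1, 4]        # the "random" order from the question
sorted_order = sorted(random_order)    # the "mostly sequential" order: 1, 3, 4, 7, 10

print(total_seek_distance(random_order))   # 30
print(total_seek_distance(sorted_order))   # 10
```

In this toy model the sorted order covers a third of the distance, which is the intuition behind sorting index matches by position before touching the table.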


Re: PG index architecture

From
Igor Neyman
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Andy Colson
> Sent: Tuesday, July 15, 2014 4:27 PM
> To: pgsql
> Subject: [GENERAL] PG index architecture
>
> I was thinking about indexes, and am kinda curious about sequential access.
>
> I know nothing of PG guts, so this might even be a dumb question.
>
> As I understand indexes, they are a key value pair, that contain a value and a
> position.  You lookup the value then use the position to seek into the
> database to load the record.
>
> Do we, or could we, load all the matching index records, then sort them
> by position?  (maybe not all, maybe large batches)
>
> When loading from the database, if access was slightly more sequential (vs
> very random), would it increase performance?
>
> Said another way:
>
> I think of table scanning as sequential, and fast.  That would be loading db
> record 1,2,3, etc.
>
> Would it be faster to load db records "mostly sequential": 1,3,4,7,10
> compared to randomly: 7,3,10,1,4
>
> -Andy
>

It is called CLUSTER:

http://www.postgresql.org/docs/9.2/static/sql-cluster.html

Regards,
Igor Neyman


Re: PG index architecture

From
Tom Lane
Date:
Andy Colson <andy@squeakycode.net> writes:
> As I understand indexes, they are a key value pair, that contain a value
> and a position.  You lookup the value then use the position to seek into
> the database to load the record.

> Do we, or could we, load all the matching index records, then sort
> them by position?  (maybe not all, maybe large batches)

This is more or less what a "bitmap index scan" does.

            regards, tom lane
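
As a rough illustration of the idea behind a bitmap index scan, the sketch below collects every matching pointer from an index first, groups the pointers by table block, and only then visits the blocks in ascending order. It is a simplified model, not PostgreSQL's implementation: the function name and the (block, offset) tuples are hypothetical, and the real executor builds a per-page bitmap rather than Python lists.

```python
from collections import defaultdict


def heap_visit_order(index_matches):
    """Group (block, offset) pointers by block and return the blocks in ascending order.

    `index_matches` arrives in index order, which is effectively random
    with respect to where the rows sit in the table.
    """
    by_block = defaultdict(list)
    for block, offset in index_matches:
        by_block[block].append(offset)
    # Visit each block once, in physical order, fetching all of its matching rows.
    return [(block, sorted(by_block[block])) for block in sorted(by_block)]


# Hypothetical index matches, in the order the index returned them.
matches = [(7, 2), (3, 1), (10, 4), (3, 5), (1, 1), (7, 1)]
plan = heap_visit_order(matches)
print(plan)   # [(1, [1]), (3, [1, 5]), (7, [1, 2]), (10, [4])]
```

Each block is read only once even when several matching rows live on it, and the blocks are touched in ascending order, which is the "sort them by position" step asked about.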


Re: PG index architecture

From
John R Pierce
Date:
On 7/15/2014 1:26 PM, Andy Colson wrote:
> As I understand indexes, they are a key value pair, that contain a
> value and a position.  You lookup the value then use the position to
> seek into the database to load the record.

Indexes are stored as a B-tree.  Each leaf entry holds a pointer (block
number plus item offset) to the target record.

>
> Do we, or could we, load all the matching index records, then sort
> them by position?  (maybe not all, maybe large batches)
>

B-trees are inherently ordered.  Data records, however, are not.


> When loading from the database, if access was slightly more sequential
> (vs very random), would it increase performance?
>
> Said another way:
>
> I think of table scanning as sequential, and fast.  That would be
> loading db record 1,2,3, etc.

Database tables are unordered sets; there is no record 1,2,3.

>
> Would it be faster to load db records "mostly sequential": 1,3,4,7,10
> compared to randomly: 7,3,10,1,4

It's unclear to me what you mean here.


--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: PG index architecture

From
Andy Colson
Date:
On 7/15/2014 3:54 PM, John R Pierce wrote:
> On 7/15/2014 1:26 PM, Andy Colson wrote:
>> As I understand indexes, they are a key value pair, that contain a
>> value and a position.  You lookup the value then use the position to
>> seek into the database to load the record.
>
> indexes are stored as a B-tree.  each terminal node has a block number
> for the target record.
>
>>
>> Do we, or could we, load all the matching index records, then sort
>> them by position?  (maybe not all, maybe large batches)
>>
>
> b-trees are inherently ordered.   data records, however are not.
>
>
>> When loading from the database, if access was slightly more sequential
>> (vs very random), would it increase performance?
>>
>> Said another way:
>>
>> I think of table scanning as sequential, and fast.  That would be
>> loading db record 1,2,3, etc.
>
> database tables are unordered sets, there is no record 1,2,3.
>
>>
>> Would it be faster to load db records "mostly sequential": 1,3,4,7,10
>> compared to randomly: 7,3,10,1,4
>
> It's unclear to me what you mean here.
>
>

Ah, yeah, sorry, I don't really mean record #1, I mean a hard drive seek to
address 1.  (where the address came from the index lookup)

Know what I mean?

-Andy


Re: PG index architecture

From
John R Pierce
Date:
On 7/15/2014 2:54 PM, Andy Colson wrote:
>
> Ah, yea, sorry, I don't really mean record #1, I mean hard drive seek
> to address 1.  (where address came from the index lookup)

Of course, it's a file+offset, so there's at least 1-2 more levels of
indirection before you get to physical addresses on the hard disk.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
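
The first level of that indirection can be sketched as follows: a table block number is turned into a segment file and a byte offset within it. The constants assume a default PostgreSQL build (8 kB blocks, 1 GB segment files); both are compile-time settings and may differ on a given installation. The function name is hypothetical, and the filesystem and the drive add their own mapping below this level.

```python
BLOCK_SIZE = 8192                                  # assumed default block size, in bytes
SEGMENT_SIZE = 1 << 30                             # assumed default segment file size (1 GB)
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE    # 131072 blocks per segment file


def locate_block(block_number):
    """Map a table block number to (segment file number, byte offset in that file)."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    return segment, block_in_segment * BLOCK_SIZE


print(locate_block(0))        # (0, 0)
print(locate_block(131073))   # (1, 8192)
```

Even after this step the result is still only a position inside a file, so consecutive block numbers are not guaranteed to be physically adjacent on disk.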