Thread: Efficient map looking table

Efficient map looking table

From

Yoann MOREAU

Date:

10 January 2012, 20:44:50

Hello, I store in a database some XML-tag data. I have one table for
the tag names with an int PK for each unique name (tag_names) and one
table for the tags (tags) where each row references a tag name. The
tag_names table is not big but the tags table is huge. When I insert a
row in tags I have to check if its tag name already exists to get its
id_tag_name and if not insert it in tag_names.

My question : is there an efficient way to check/get the tag_name,
it's doing the same job as a hashmap. I need it to be up to date in
the database when inserting new tag names, because different inserts
in tags table could occur at the same time, all checking for a tag
name. I then would need the whole tag_names table to be loaded in
memory and in the most "map" way as possible, is there anything
special to do ?

Thanks !
Yoann Moreau

Re: Efficient map looking table

From

"David Johnston"

Date:

10 January 2012, 21:05:58

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Yoann MOREAU
Sent: Tuesday, January 10, 2012 4:45 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Efficient map looking table

Hello, I store in a database some XML-tag data. I have one table for the tag
names with an int PK for each unique name (tag_names) and one table for the
tags (tags) where each row references a tag name. The tag_names table is not
big but the tags table is huge. When I insert a row in tags I have to check
if its tag name already exists to get its id_tag_name and if not insert it
in tag_names.

My question : is there an efficient way to check/get the tag_name, it's
doing the same job as a hashmap. I need it to be up to date in the database
when inserting new tag names, because different inserts in tags table could
occur at the same time, all checking for a tag name. I then would need the
whole tag_names table to be loaded in memory and in the most "map" way as
possible, is there anything special to do ?

Thanks !
Yoann Moreau

------------------------------------------------------

Cache the tag_names table in your language of choice; whenever you get a
miss simply execute an INSERT and then REFRESH THE CACHE.  If the INSERT
fails due to a duplicate key exception that is OK since once you refresh the
cache your re-lookup attempt should succeed.

You can do this directly in PostgreSQL via an Insert Trigger with a
SavePoint if/when you attempt to insert the new tag.  Catch and ignore any
unique key violation exception done within the SavePoint.

With a small table and lots of memory you probably won't save a huge amount
with the out-of-DB cache.

You can also issue the SELECT and INSERTs manually, without using the
trigger, if you so desire.  However, the Trigger should have excellent
performance compared to doing it manually but maybe not as much as caching
out-of-DB.

You should probably test both the cache and the Trigger and see which one
performs better.  You could also deploy both and just have a run-time flag
to disable the cache (and surrounding code) in your application(s) and you
can toggle the Trigger on-the-fly as well.

David J.