Thread: HELP all women were raped during the May riots in Jakarta

HELP all women were raped during the May riots in Jakarta

From
adm@pu.go.id
Date:
We promise that we will throw a nuclear bomb at Indonesia to
destroy this bastard Islamic country, if they keep persecuting our Chinese.

B J Habibie, we will kill you if you don't settle this case.

see what they do!    http://lateline.muzi.net/topics/Indonesian_atrocity/
http://muzi.net

Indonesian government email: dkiweb@indo.net.id
Indonesian Ministry of Foreign Affairs home page: http://www.deplu.go.id
Web pages on this incident: http://members.spree.com/expose/
http://dailynews.sinanet.com/focus/1998071402/index.html

FORWARDED FROM MISSIONNET'S URGENT PRAYER NETWORK
    ___________________________________________________________
Source: Christian Leaders Association

21 June, 1998. Jakarta, Indonesia

Dear friends,

Here I submit a victim's account of being raped during the
May riots here in Jakarta. Reference to Huaran Bulletin
Board June 12, 1998.

The purpose is to request your prayers for hundreds of
similar victims.

"My name is Vivian, and I am 18 years old. I have a little
sister and brother. As a family we live in what is supposed
to be a "secure" apartment.
("§Ú¥sVivian,¤µ¦~18·³.§Ú¦³¤@­Ó©f©f©M§Ì§Ì,¦í¦b¤@´É³Q»{
¬°¬O¦w¥þªº¤½´J)

At 9.15 am, May 14th, 1998 a huge crowd had gathered around
our apartment. They screamed, "Let's butcher the Chinese!",
"Let's eat pigs!", "Let's have a party!" We live of the 7th
floor and we  got a call from a family on the 3rd floor
saying that the crowd had reached the 2nd floor. They even
chased some occupants upstairs. We were all very frightened.
In our fright we prayed and left everything in God's hands.

Afterward we left our room and went upstairs to the top
floor, as it was impossible to go downstairs and escape. We
got to the 15th floor and stayed with some friends. Not long
afterwards we were surprised because some of the crowd
came out of the elevators right before we entered the
room. We hurried into the room and locked the door tightly.
At that time we heard them knock at the other rooms loudly
and there were some screams from women and girls. Our room
was filled with fear.

We realized that they would come to us, so we spread
throughout the room, hiding in the corners. We could hear
girls of 10 to 12 years old screaming, "Mommy,
..mommy...mom...mom...it hurts." At that time I didn't know
that these little girls were being raped. After about half
an hour the noise diminished and we had the guts to go out
and check. It was indescribable. A lot of people, some of them young
girls, were lying on the floor. "Oh my God, what has
happened?" Seeing all of this we screamed, and my little
sister Fenny screamed hysterically and hugged her father.

Uncle Dodi kept trying to stop them by offering money. His
efforts were fruitless, and in the end 5 people raped Fenny.
Before beginning the raping they always said "Allahu
Akbar" (an Islamic phrase in Arabic meaning "God is great").
They were ferocious and brutal.

Not long afterward, around 9 men came to the room and
dragged me. I also saw them forcing and dragging my Aunt
Vera. But at that time I passed out and everything went
blank. I became conscious at around 5 or 6 pm. My head
hurted and I realized I had no clothing on my body. I cried
and realized my family was still there. My father was
hugging my mother and my little brother Doni. I also saw Uncle
Dodi lying on the floor and Aunt Vera was crying over his
body. I felt so weak and fainted again.

The next day I was in the Pluit hospital. My father and
mother were beside me. With all the pain in my body I
asked, "Mom, why Fenny, Mom?" I felt a stinging pain as I
said these words. My cheeks were swollen. My mother cried
again and couldn't speak a word, while my father, holding
back his tears, managed to smile at me. After 4 days of
treatment, my condition had improved. With a sad look, my
father told me then what had happened. After I fainted, 7
people raped me. At that time my father still couldn't see
well after being hit with a piece of wood. They raped me
repeatedly. Then my father said, "Vivian, Fenny is gone..."

I was confused and cried out, "Why Dad?" My father couldn't
answer. He told me to rest and went out of the room. I cried
over and over again, feeling that my life had no meaning any
more. A week ago, after I was released from the hospital, I
was told everything that had happened.

When Fenny was raped she kept on fighting, and so she was
repeatedly slapped by her rapists. The last time she fought,
Fenny spat on one of them. Offended, the man grabbed a
knife and stabbed Fenny's stomach over and over again.
Finally she died with blood all over her body.

My father told me that Uncle Dodi met the same fate, watched
by Aunt Vera, who was also raped. "God...why should all of
this happen? Where are you God? Are you still alive?" My
aunt Vera now stays with her parents. She is in shock. Her
face is blank and refuses to eat. Almost every hour my
mother and I cry over all these happenings. I can never
forget. These mobs of people are uncivilized monsters."

Additional comments from Bill Hekman:

This is one of many victims. Hundreds of women and children
were raped, mutilated and killed by Muslim mobs. Some had
their vaginas ripped apart, their bodies cut into pieces.

Over 5,000 Chinese Indonesians' shops were looted and
burned down. A few days ago another 63 shops were burned in
Tegal, Central Java. The city of Solo has been burned down. There
is no protection and no justice in this country any more.

Yesterday I was in the Kelapa Gading area and that area was
spared from destruction. The police and military had guarded
all the entry roads. The people there had collected large
sums of money from door to door and paid for their protection.

A similar situation took place in the Pondok Indah area. For the people
who cannot pay millions to the armed forces there is no protection. Right
now hundreds of thousands of thugs, robbers, rapists, and killers live
all around us. They are our neighbors. There is no punishment for the
criminals and no justice for the victims. Yet all Indonesians call themselves
believers in God almighty. What hypocrisy. Shouting "God is great" while
raping women and children is a blasphemy against a Holy God.

Pray that God will anoint His preachers and missionaries
throughout this nation with the power of the Holy Spirit to
preach the message of repentance. God's word in 2 Chronicles
7:14 needs to be proclaimed boldly. There is no room for
preachers filled with fear who think of evacuation and other
selfish plans. Pray for Revival in all our churches.


Some Christians are putting signs on their shops "Owned by
Muslim". May God forgive them. Healing of this nation filled
with crime and unjustice is bringing God's judgement and
punishment. Healing and Salvation can only come with a
nasional repentance at all levels in the government, armed
forces and society. Then we need to share the Gospel of the
Lord Jesus Christ. Christ is the One and Only Savior. No one
will ever receive forgiveness and see heaven except through
God's appointed Savior, the Lord Jesus Christ. Thank you for
standing with us now. God bless you.

Please pass this yellow ribbon to your friends, to show our
sympathy for the victims of the chaos in Indonesia last May 13-15.
Many Chinese Indonesian citizens were abused, tortured and killed.
Their houses and stores were looted and burnt. Hundreds of Chinese
Indonesian girls and women (aged 10-55) were sexually harassed and
brutally gang raped. Some victims were even raped in front of their
family members or in front of an inhuman, cheering crowd.

Some of them were even thrown into the fire and burnt to death
after being raped. Yet, not many actions seem to have been taken
to investigate all this or to help the victims. And not very many
people seem to know or care about what happened. Please help to
spread the news and let the world know.

We need help to get more international attention to help Chinese
Indonesians, who are now living in fear in Indonesia. Please pass this
ribbon around as a symbol of the campaign against human rights
violations, injustice, and racism towards Chinese Indonesians.

Show that we care and may God help us!

Re: Bitmap index

From
"Pawel Niewiadomski"
Date:
On Wed, 24 Nov 2004 19:26:41 +1100, Neil Conway <neilc@samurai.com> wrote :

> On Mon, 2004-11-22 at 07:57 +0000, Paweł Niewiadomski wrote:
> > I saw discussion about bitmap indexes few weeks ago. I wonder if
> > any of you is working on it (in secret)?
> 
> For what it's worth, I don't know of anyone working on them.
> 
> > I will be choosing the subject
> > of my master's thesis and thought about implementing bitmap indexes.
> 
> No objection here :)

My promoter accepted the subject. I'm waiting for an official acceptance
of the subject. Until then I would want to get familiar with PostgreSQL
internals and bitmap index implementations. I will appreciate 
any links to online papers, books that could help me.

-- 
**Pawel Niewiadomski**, new()foo-baz.com, http://new.foo-baz.com/
Virtual Qmail (http://v-q.foo-baz.com), qmail-patches (http://q-p.foo-baz.com)



Re: Bitmap index

From
Yann Michel
Date:
Hi,

On Fri, Nov 26, 2004 at 10:25:41AM -0000, Pawel Niewiadomski wrote:
> 
> My promoter accepted the subject. I'm waiting for an official acceptance
> of the subject. Until then I would want to get familiar with PostgreSQL
> internals and bitmap index implementations. I will appreciate 
> any links to online papers, books that could help me.

That sounds nice! I thought of implementing it as part of my master's
thesis next year, so I have already started reading about bitmap indexing
and so on. A nice start is possibly Hector Garcia-Molina's "Database
Implementation".

Regards,
Yann


Re: Implementing Bitmap Indexes

From
"Pawel Niewiadomski"
Date:
On Sat, 29 Jan 2005 18:46:44 +0000, Mike Rylander <mrylander@gmail.com> wrote :

> For on-disk bitmap indexes, yes.  I don't see any reason this couldn't
> be done with GiST, perhaps even as a generalization of the index stuff
> in the int_array contrib module. 

I was thinking about playing with the core
source of PostgreSQL and creating patches; GiST, as I read,
has some limitations that I want to overcome. Speaking about
my idea - I was thinking about implementing on-disk indexes, not
in-memory ones. I think having both of them would be great :-)
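
(For context: PostgreSQL 8.1 and later already build bitmaps in memory at
query time to combine several ordinary btree indexes, visible as the Bitmap
Index Scan, BitmapAnd and Bitmap Heap Scan plan nodes; the on-disk indexes
discussed above would persist the bitmaps themselves. A rough, untested
sketch of the query shape both variants target, with hypothetical database
and table names:

psql -d test <<'SQL'
CREATE TABLE t (id serial, colour text, size text);
INSERT INTO t (colour, size)
    SELECT 'c' || (i % 50), 's' || (i % 40)
    FROM generate_series(1, 100000) AS i;
CREATE INDEX t_colour_idx ON t (colour);
CREATE INDEX t_size_idx ON t (size);
ANALYZE t;
-- The plan will typically show Bitmap Index Scans, possibly combined
-- by a BitmapAnd, feeding a Bitmap Heap Scan.
EXPLAIN SELECT count(*) FROM t WHERE colour = 'c7' AND size = 's3';
SQL

A persistent bitmap index would aim to skip the build-the-bitmap step for
exactly this kind of low-cardinality, multi-predicate query.)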

-- 
**Pawel Niewiadomski**, new()foo-baz.com, http://new.foo-baz.com/
Virtual Qmail (http://v-q.foo-baz.com), qmail-patches (http://q-p.foo-baz.com)



Interesting COPY edge case...

From
Chris Browne
Date:
Just ran into a fascinating edge case.  One of our folks was building
a stored function, and ran into an odd error when trying to COPY to
stdout.

Here's a characteristic example:

create or replace function build_table (integer) returns integer as '
begin
  execute ''copy foo to stdout;'';
  return 1;
end' language plpgsql;

The result:

/* cbbrowne@[local]/dba2 in=*/ select try_copy(1);
ERROR:  unexpected error -2 in EXECUTE of query "copy dups to stdout;"
CONTEXT:  PL/pgSQL function "try_copy" line 2 at execute statement

If I instead create

create or replace function build_table (integer) returns integer as ' copy foo to stdout; return 1;
' language sql;

The latter works fine.

There is evidently Something Strange about the state of stdout when it
is referenced inside a stored procedure.

We can work around this reasonably, but the "unexpected error -2"
elicits some curiosity I'd like to satisfy.  (I bounced it off one of
the others that aren't off on vacation, and his reaction was exactly
the same as mine, namely "Hmmm...  I'm just not sure what to expect
from that...")

Is there a good reason/excuse to give as to why the pl/pgsql version
of the COPY *shouldn't* work?
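
(For what it's worth, a rough, untested sketch of the kind of workaround
alluded to above, with hypothetical object names: either let the client
drive the COPY so that stdout belongs to psql, or have the function COPY
to a server-side file, which avoids the client protocol entirely but
requires superuser rights:

# Client side: psql owns stdout; nothing runs inside a function.
psql -d dba2 -c 'COPY foo TO STDOUT;' > foo.out

# Server side: the function writes to a file on the server instead of
# to the client connection.
psql -d dba2 <<'PSQL'
create or replace function dump_foo() returns integer as '
begin
  execute ''COPY foo TO ''''/tmp/foo.dat'''';'';
  return 1;
end' language plpgsql;
select dump_foo();
PSQL

The dba2 database name is just lifted from the prompt above.)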
-- 
(format nil "~S@~S" "cbbrowne" "acm.org")
http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78.  "I will not tell my Legions of Terror
"And he must  be taken alive!" The command will be:  ``And try to take
him alive if it is reasonably practical.''"
<http://www.eviloverlord.com/>


Re: Interesting COPY edge case...

From
Bruno Wolff III
Date:
On Tue, Jul 26, 2005 at 17:56:42 -0400, Chris Browne <cbbrowne@acm.org> wrote:
> 
> There is evidently Something Strange about the state of stdout when it
> is referenced inside a stored procedure.

I suspect this is related to trusted PLs not being able to write files.
It does seem like a problem that COPY has access to stdout in one case
and not the other.


Re: Interesting COPY edge case...

From
Tom Lane
Date:
Chris Browne <cbbrowne@acm.org> writes:
> create or replace function build_table (integer) returns integer as '
> begin
>   execute ''copy foo to stdout;'';
>   return 1;
> end' language plpgsql;

> The result:

> /* cbbrowne@[local]/dba2 in=*/ select try_copy(1);
> ERROR:  unexpected error -2 in EXECUTE of query "copy dups to stdout;"
> CONTEXT:  PL/pgSQL function "try_copy" line 2 at execute statement

In CVS tip I get

ERROR:  cannot COPY to/from client in PL/pgSQL
CONTEXT:  PL/pgSQL function "build_table" line 2 at execute statement

There are some protocol-level reasons for not supporting this, IIRC.
You're already inside a SELECT and the client library would probably
get terribly confused.
        regards, tom lane


request for feature: psql "DSN" option

From
Christopher Browne
Date:
Per tonight's dinner discussion, it Sure Would Be Nice if psql had an
additional option that allowed passing in a conninfo string.

The notion:  Plenty of libraries out there like Pg, DBI::Pg, and such
make you specify connections in the form:  "host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie"

Since those strings need to be kept around for the programs that are
directly using conninfo, it Sure Would Be Nice if those strings could
also be used as arguments to psql.

Peter Eisentraut observed that "-D" is not yet used; that's certainly
an option.

Thus, supposing we kept conninfo strings in a series of files in
/opt/scripts/conninfo, we might run a script via...

#!/bin/sh
CONNINFO=`cat /opt/scripts/conninfo/dumb_db.conninfo`
PERIOD='90 days'
QUERY="delete from my_table where trans_on < now() - '${PERIOD}'::interval;"
QUERY2="vacuum analyze my_table;"
psql -D "${CONNINFO}" -c "${QUERY}"
psql --pqconninfo="${CONNINFO}" -c "${QUERY2}"

I'm not sure those are necessarily the best names for options, but
they're a thought...
-- 
select 'cbbrowne' || '@' || 'gmail.com';
http://linuxdatabases.info/info/
Rules of the Evil Overlord #10.  "I will not interrogate my enemies in
the inner sanctum  -- a small hotel well outside  my borders will work
just as well." <http://www.eviloverlord.com/>


Re: request for feature: psql 'DSN' option

From
"Andrew Dunstan"
Date:
Christopher Browne said:
> Per tonight's dinner discussion, it Sure Would Be Nice if psql had an
> additional option that allowed passing in a conninfo string.
>
> The notion:  Plenty of libraries out there like Pg, DBI::Pg, and such
> make you specify connections in the form:
>   "host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie"
>
> Since those strings need to be kept around for the programs that are
> directly using conninfo, it Sure Would Be Nice if those strings could
> also be used as arguments to psql.
>
> Peter Eisentraut observed that "-D" is not yet used; that's certainly
> an option.
>
> Thus, supposing we kept conninfo strings in a series of files in
> /opt/scripts/conninfo, we might run a script via...
>
> #!/bin/sh
> CONNINFO=`cat /opt/scripts/conninfo/dumb_db.conninfo`
> PERIOD='90 days'
> QUERY="delete from my_table where trans_on < now() -
> '${PERIOD}'::interval;" QUERY2="vacuum analyze my_table;"
> psql -D "${CONNINFO}" -c "${QUERY}"
> psql --pqconninfo="${CONNINFO}" -c "${QUERY2}"
>
> I'm not sure those are necessarily the best names for options, but
> they're a thought...



Why is this better than using the service file?

cheers

andrew




Re: request for feature: psql 'DSN' option

From
David Fetter
Date:
On Fri, Jul 07, 2006 at 05:33:14AM -0500, Andrew Dunstan wrote:
> Christopher Browne said:
> >
> > The notion:  Plenty of libraries out there like Pg, DBI::Pg, and such
> > make you specify connections in the form:
> >   "host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie"
> 
> Why is this better than using the service file?

What Andrew asked.

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!


Re: request for feature: psql 'DSN' option

From
Chris Browne
Date:
david@fetter.org (David Fetter) writes:
> On Fri, Jul 07, 2006 at 05:33:14AM -0500, Andrew Dunstan wrote:
>> Christopher Browne said:
>> >
>> > The notion:  Plenty of libraries out there like Pg, DBI::Pg, and such
>> > make you specify connections in the form:
>> >   "host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie"
>> 
>> Why is this better than using the service file?
>
> What Andrew asked.

It's not *better* than PGSERVICE; it's just that this does reflect the
connection representation widely used by every sort of library out
there that uses libpq.

Let's do a quick "guess the language"...

Which language is this?

  conn = PGconn.open("host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie");

How about this?

  $conn = Pg::connectdb("host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie");

Hmm...  This one should be easy to guess...

  import psycopg2
  dbcon = psycopg2.connect('host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie')

This one might conceivably be easy to mistake for another...

  $dbconn = pg_connect("host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie")
      or die('Could not connect: ' . pg_last_error());

How about this?

  set conres [catch {set newdbc [pg_connect -conninfo \
      "host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie"]} msg]

This isn't as obvious as it may seem...

  (setq P (pq-connectdb "host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie"))

But this probably is...

  (setf conn (postgresql::PQconnectdb "host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie"))

Gavin would be extremely disappointed if I had left this one out :-)

  let main () =
    let c = new connection ~conninfo:"host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie" () in
    let _ = c#exec ~expect:[Copy_in] ("copy " ^ Sys.argv.(1) ^ " from stdin") in
    c#copy_in_channel stdin;
    c#finish
 

The common thing across all of these is that there's a single string
passed in to request the connection.  A string that libpq in effect
requires, but which psql, somewhat oddly, does not allow you to
directly specify.

We've had cases where we needed to store pgconn "connection
signatures", for reports, and then discovered we wanted to use them to
describe psql connections.

A "--pgconn" or "-D" option would allow doing this.

I agree that pg_service.conf is probably cleaner, but it seems
somewhat odd for The Poster Child libpq application, psql, *not* to
have a way of passing in a conninfo value.
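
(In the meantime a crude stopgap is possible entirely outside psql: the
basic conninfo keywords map one-for-one onto psql switches, so a small
wrapper can translate the string. A rough sketch, hypothetical script name,
no quoting support, only the usual four keywords:

#!/bin/sh
# conninfo-psql.sh -- turn "host=... port=... dbname=... user=..." into
# the equivalent psql invocation; extra arguments are passed through.
CONNINFO=$1; shift
for kv in $CONNINFO; do
    key=${kv%%=*}; val=${kv#*=}
    case $key in
        host)   HOSTOPT="-h $val" ;;
        port)   PORTOPT="-p $val" ;;
        user)   USEROPT="-U $val" ;;
        dbname) DB=$val ;;
    esac
done
exec psql $HOSTOPT $PORTOPT $USEROPT "$@" $DB

Invoked as, say:
  conninfo-psql.sh "host=my.db.host.example.org port=5678 dbname=dumb_db user=chewie" -c "select 1;"

The proposed -D/--pqconninfo option would of course make the wrapper
unnecessary.)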
-- 
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/oses.html
If a mute swears, does his mother wash his hands with soap? 


Re: request for feature: psql 'DSN' option

From
"hubert depesz lubaczewski"
Date:
On 7/7/06, Andrew Dunstan <andrew@dunslane.net> wrote:
Why is this better than using the service file?


service file is not well described in manual. quick grep showed that we have PGSERVICE variable, and pg_service.conf.sample, which doesn't even include all possible parameter names.
and - comments in this file suggest it is to be used globally only: "Copy this to your sysconf directory (typically /usr/local/pgsql/etc) and...", and not "per user".
i would really love to see pg_service.conf extended in such a way that i could:
1. store it in $HOME/.pg_service.conf
2. use name of service on psql invocation (not as env variable). preferably using psql service_name. this will conflict with current behaviour (treating argument as dbname), but with some basic logic, i believe it can be done.

just my $0.02.

depesz

Re: request for feature: psql 'DSN' option

From
Andrew Dunstan
Date:
hubert depesz lubaczewski wrote:
> On 7/7/06, *Andrew Dunstan* <andrew@dunslane.net 
> <mailto:andrew@dunslane.net>> wrote:
>
>     Why is this better than using the service file?
>
>
>
> service file is not well described in manual. quick grep showed that 
> we have PGSERVICE variable, and pg_service.conf.sample, which doesn't 
> even include all possible parameter names.
> and - comments in this file suggest it is to be used globally only: 
> "Copy this to your sysconf directory (typically /usr/local/pgsql/etc) 
> and...", and not "per user".
> i would really love to see pg_service.conf extended in such a way that 
> i could:
> 1. store it in $HOME/.pg_service.conf
> 2. use name of service on psql invocation (not as env variable). 
> preferably using psql service_name. this will conflict with current 
> behaviour (treating argument as dbname), but with some basic logic, i 
> belive it can be done.
>


The service file location is runtime configurable. Just set PGSYSCONFDIR.
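
(For anyone else who went hunting: the file libpq reads is pg_service.conf
in that directory, INI-style, one section per service. A small sketch,
reusing the hypothetical connection details from the earlier thread:

# /some/dir/pg_service.conf holds one section per named service
cat > /some/dir/pg_service.conf <<'EOF'
[dumb_db]
host=my.db.host.example.org
port=5678
dbname=dumb_db
user=chewie
EOF

# Tell libpq where to look, then pick the service by name.
export PGSYSCONFDIR=/some/dir
PGSERVICE=dumb_db psql -c 'select 1;'

Any connection parameter libpq understands can go in the section.)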

cheers

andrew


gBorg misbehaviour

From
Chris Browne
Date:
Looks like gBorg has gone down...

The Slony-I project does plan to move to pgFoundry, once 1.2 is released...
<http://slony-wiki.dbitech.ca/index.php/Move_to_PgFoundry_Checklist>

But we need to get to that point (1.2) first.  Alas, gBorg being down
today doesn't help :-(.
-- 
(format nil "~S@~S" "cbbrowne" "ntlug.org")
http://cbbrowne.com/info/sap.html
"Success is something  I  will dress for   when I get  there,  and not
until."  -- Unknown


Re: gBorg misbehaviour

From
"Marc G. Fournier"
Date:
I mentioned this on -general, I believe, but gborg is on one of our last 
FreeBSD 4.x servers right now ... I have to schedule moving it to the
FreeBSD 6.x machines ... neptune had some problems early this morning, and 
the 4.x fsck took 9 @@#$@$@# hours to run ;(

On Tue, 18 Jul 2006, Chris Browne wrote:

> Looks like gBorg has gone down...
>
> The Slony-I project does plan to move to pgFoundry, once 1.2 is released...
> <http://slony-wiki.dbitech.ca/index.php/Move_to_PgFoundry_Checklist>
>
> But we need to get to that point (1.2) first.  Alas, gBorg being down
> today doesn't help :-(.
> -- 
> (format nil "~S@~S" "cbbrowne" "ntlug.org")
> http://cbbrowne.com/info/sap.html
> "Success is something  I  will dress for   when I get  there,  and not
> until."  -- Unknown
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664


Help! - Slony-I - saving/setting/restoring GUC

From
Chris Browne
Date:
In support of PG 8.2, we need to have the log trigger function do the
following:
  - Save value of standards_conforming_string
  - Set value of standards_conforming_string to FALSE
  - Proceed with saving data to sl_log_?
  - Recover value of standards_conforming_string
 

The variable, standards_conforming_string, does not appear to be
widely exported, but rather seems pretty localized to guc.c

This is the prime thing holding us from RC2.

Tom, Peter, you have touched guc.c in the context of
standards_conforming_string; perhaps you can suggest something?  Darcy
and I aren't seeing what to do...
-- 
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/
Why do we drive on parkways and park on driveways?


Re: Help! - Slony-I - saving/setting/restoring GUC

From
Peter Eisentraut
Date:
Chris Browne wrote:
> In support of PG 8.2, we need to have the log trigger function do the
> following:
>   - Save value of standards_conforming_string
>   - Set value of standards_conforming_string to FALSE
>   - proceed with saving data to sl_log_?
>   - Recover value of standards_conforming_string

Would SET LOCAL help you?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Help! - Slony-I - saving/setting/restoring GUC

From
Tom Lane
Date:
Chris Browne <cbbrowne@acm.org> writes:
> In support of PG 8.2, we need to have the log trigger function do the
> following:
>   - Save value of standards_conforming_string
>   - Set value of standards_conforming_string to FALSE

This is a good idea why exactly?  What are you even expecting it to do?
The backend doesn't use standard_conforming_strings to determine any
output, other than in ruleutils.c which I didn't think Slony uses.
        regards, tom lane


Re: Help! - Slony-I - saving/setting/restoring GUC

From
Chris Browne
Date:
peter_e@gmx.net (Peter Eisentraut) writes:
> Chris Browne wrote:
>> In support of PG 8.2, we need to have the log trigger function do the
>> following:
>>   - Save value of standards_conforming_string
>>   - Set value of standards_conforming_string to FALSE
>>   - proceed with saving data to sl_log_?
>>   - Recover value of standards_conforming_string
>
> Would SET LOCAL help you?

Not really.  The log trigger function is an SPI function, and I don't
think I want to be invoking an extra SQL request every time a tuple is
updated.

Consider our present handling of date localizations...  Slony-I
prefers to operate using ISO dates.  So the log trigger function must
force the datestyle correspondingly.

So our code looks like...  (eliding irrelevant code)

int OldDateStyle = DateStyle;
DateStyle = USE_ISO_DATES;

/* code that generates data to stow in sl_log_n */

DateStyle = OldDateStyle;  /* Restore user's local settings */

-------------------------------

At one point, I thought that we'd modify this to:
int OldDateStyle = DateStyle;
int OldSCS = standards_conforming_strings;
DateStyle = USE_ISO_DATES;
standards_conforming_strings = FALSE; 

/* code that generates data to stow in sl_log_n */

DateStyle = OldDateStyle;  /* Restore user's local settings */
standards_conforming_strings = OldSCS;

Unfortunately (perhaps) standards_conforming_strings does not appear
to be exported, so I'm not sure how to do this otherwise.
-- 
(reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/wp.html
Editing is a rewording activity.
-- Alan J. Perlis
[And EMACS a rewording editor.  Ed.]


Re: Help! - Slony-I - saving/setting/restoring GUC

From
Tom Lane
Date:
Chris Browne <cbbrowne@acm.org> writes:
> Unfortunately (perhaps) standards_conforming_strings does not appear
> to be exported, so I'm not sure how to do this otherwise.

Perhaps your problem is one of spelling?  It's
"standard_conforming_strings", and it's certainly a global variable.

You still haven't explained why you need this, though.  There are
no datatype output functions that examine this variable.
        regards, tom lane


Re: Help! - Slony-I - saving/setting/restoring GUC

From
Chris Browne
Date:
tgl@sss.pgh.pa.us (Tom Lane) writes:
> Chris Browne <cbbrowne@acm.org> writes:
>> Unfortunately (perhaps) standards_conforming_strings does not appear
>> to be exported, so I'm not sure how to do this otherwise.
>
> Perhaps your problem is one of spelling?  It's
> "standard_conforming_strings", and it's certainly a global variable.
>
> You still haven't explained why you need this, though.  There are
> no datatype output functions that examine this variable.

Apologies.  I was reading too much into the problem report.

After some discussion on IRC, I think we've "simplified" things back
to what the problem is.

Our logtrigger() function is already applying a "non-SCS" policy to
quote backslashes.  No need to declare anything there, as had been the
case with a recent DateStyle issue.

With DateStyle, we had to control the style in two places:
  1.  In logtrigger(), when PostgreSQL is asked to generate the
      string version of a datestamp.  We do so by temporarily
      setting DateStyle.
  2.  When loading data, we need to make sure the connection
      uses a consistent DateStyle.  We do so by setting the GUC
      variable using SET.
 

I was under the misapprehension that we needed something akin to that
step 1; apparently we need only worry about step 2, for which Peter's
suggestion is entirely sufficient.
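
(For the record, step 2 with Peter's suggestion looks roughly like this,
assuming the loader wraps each batch in a transaction; the database name
is hypothetical, and SET LOCAL lasts only until COMMIT or ROLLBACK, so the
user's own setting is untouched:

psql -d replica <<'SQL'
BEGIN;
SET LOCAL DateStyle = 'ISO';
-- ... apply the rows captured in sl_log_? here ...
COMMIT;
SQL

Inside the log trigger itself nothing needs to change.)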

Sorry about the confusion.
-- 
let name="cbbrowne" and tld="ntlug.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/internet.html
"Who is General Failure and why is he reading my hard disk?" 
-- <leitner@inf.fu-berlin.de>, Felix von Leitner


Re: [PATCHES] Replication Documentation

From
Peter Eisentraut
Date:
Chris Browne wrote:
> Here's a patch to add in the material on replication recently
> discussed on pgsql.docs.  I'm not thrilled that there were only a few
> comments made; I'd be happy to see "slicing and dicing" to see this
> made more useful.

The agreed-to process was

1. post information on pgsql-general
1.a. solicit comments
2. put information page on web site
3. link from documentation to web site

You seem to have short-circuited all that.

I don't think this sort of material belongs directly into the PostgreSQL
documentation.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: [PATCHES] Replication Documentation

From
"Joshua D. Drake"
Date:
>
> 1. post information on pgsql-general
> 1.a. solicit comments
> 2. put information page on web site
> 3. link from documentation to web site
>
> You seem to have short-circuited all that.
>
> I don't think this sort of material belongs directly into the PostgreSQL
> documentation.

It might be interesting to have some links in the external projects area
for replication, but a section of its own doesn't seem relevant.

Joshua D. Drake




--

    === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
    Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/



Re: [PATCHES] Replication Documentation

From
Alvaro Herrera
Date:
Joshua D. Drake wrote:

> >I don't think this sort of material belongs directly into the PostgreSQL
> >documentation.

Why not?

> It might be interesting to have some links in the external projects area
> for replication, but a section of its own doesn't seem relevant.

I disagree about "having some links".  Maybe we should consider adding
this as a section in the external projects chapter, instead of having a
chapter of its own, but "some links" seems a little short on actual
contents.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [PATCHES] Replication Documentation

From
"Joshua D. Drake"
Date:
Alvaro Herrera wrote:
> Joshua D. Drake wrote:
>
>>> I don't think this sort of material belongs directly into the PostgreSQL
>>> documentation.
>
> Why not?

Well Peter said that, not me :)

>
>> It might be interesting to have some links in the external projects area
>> for replication, but a section of its own doesn't seem relevant.
>
> I disagree about "having some links".  Maybe we should consider adding
> this as a section in the external projects chapter, instead of having a
> chapter of its own, but "some links" seems a little short on actual
> contents.

O.k. more specifically, I think that the content (even if it is a
section) probably deserves discussion in the external projects section.

Joshua D. Drake


>


--

    === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
    Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/



Re: [PATCHES] Replication Documentation

From
Alvaro Herrera
Date:
Joshua D. Drake wrote:
> Alvaro Herrera wrote:
> >Joshua D. Drake wrote:
> >
> >>>I don't think this sort of material belongs directly into the PostgreSQL
> >>>documentation.
> >
> >Why not?
>
> Well Peter said that, not me :)

I know, but I thought I'd post one message instead of two.  (In fact I
didn't even think about it -- I just assumed it was clear.)

> >>It might be interesting to have some links in the external projects area
> >>for replication, but a section of its own doesn't seem relevant.
> >
> >I disagree about "having some links".  Maybe we should consider adding
> >this as a section in the external projects chapter, instead of having a
> >chapter of its own, but "some links" seems a little short on actual
> >contents.
>
> O.k. more specifically, I think that the content (even if it is a
> section) probably deserves discussion in the external projects section.

Sure, see my suggestion above.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [PATCHES] Replication Documentation

From
Peter Eisentraut
Date:
Alvaro Herrera wrote:
> > >I don't think this sort of material belongs directly into the
> > > PostgreSQL documentation.
>
> Why not?

PostgreSQL documentation (or any product documentation) should be 
factual: describe what the software does and give advice on its use.  
This should be mostly independent of the external circumstances, 
because people will still read that documentation three or four years 
from now.

The proposed text is, at least partially, journalistic: it evaluates 
competing ideas, gives historical and anecdotal information, reports on 
current events, and makes speculations about the future.  That is the 
sort of material that is published in periodicals or other volatile 
media.

At the summit, we resolved, for precisely these reasons, to keep the 
journalistic parts on the web site, for clear separation from the 
shipped product and for easier updates (and for easier reference as 
well, because the PostgreSQL documentation is not the single obvious 
place to look for it) and refer to it from the documentation.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: [PATCHES] Replication Documentation

From
Bruce Momjian
Date:
I was thinking of something similar to our encryption section:
http://www.postgresql.org/docs/8.1/static/encryption-options.html

The idea being to define issues like multi/single master, async vs.
sync, and mention the projects which are in each category.

---------------------------------------------------------------------------

Peter Eisentraut wrote:
> Alvaro Herrera wrote:
> > > >I don't think this sort of material belongs directly into the
> > > > PostgreSQL documentation.
> >
> > Why not?
> 
> PostgreSQL documentation (or any product documentation) should be 
> factual: describe what the software does and give advice on its use.  
> This should be mostly independent of the external circumstances, 
> because people will still read that documentation three or four years 
> from now.
> 
> The proposed text is, at least partially, journalistic: it evaluates 
> competing ideas, gives historical and anecdotal information, reports on 
> current events, and makes speculations about the future.  That is the 
> sort of material that is published in periodicals or other volatile 
> media.
> 
> At the summit, we resolved, for precisely these reasons, to keep the 
> journalistic parts on the web site, for clear separation from the 
> shipped product and for easier updates (and for easier reference as 
> well, because the PostgreSQL documentation is not the single obvious 
> place to look for it) and refer to it from the documentation.
> 
> -- 
> Peter Eisentraut
> http://developer.postgresql.org/~petere/
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org

-- 
  Bruce Momjian                 bruce@momjian.us
  EnterpriseDB                  http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] Replication Documentation

From
Markus Schiltknecht
Date:
Hi,

Bruce Momjian wrote:
> The idea being to define issues like multi/single master, async vs.
> sync, and mention the projects which are in each category.

You could even add shared-nothing vs. shared-disk nodes.

Generally I'd say it makes sense to 'educate' people, but does it really 
make sense to explain all that if there is no replication solution for 
most of these combinations?

I'd vote for an external (not in the documentation) information site 
about replication solutions. There we can put all the information we see 
fit (even 'journalistic' ones).

I might change my mind once we have multiple replication solutions 
covering most situations. ;-)

I like what and how Chris wrote [1] - an overview of existing and
upcoming replication solutions.

Regards

Markus

[1]: I can't find Chris' original message. My answer to it is in the 
archives, but not the original message. Why is that? (Thread view says 
'message not available'). My answer contains Chris' text, though:
http://archives.postgresql.org/pgsql-docs/2006-07/msg00019.php


Re: Replication Documentation

From
"Andrew Hammond"
Date:
Peter Eisentraut wrote:
> Alvaro Herrera wrote:
> > > >I don't think this sort of material belongs directly into the
> > > > PostgreSQL documentation.
> >
> > Why not?
>
> PostgreSQL documentation (or any product documentation) should be
> factual: describe what the software does and give advice on its use.
> This should be mostly independent of the external circumstances,
> because people will still read that documentation three or four years
> from now.
>
> The proposed text is, at least partially, journalistic: it evaluates
> competing ideas, gives historical and anecdotal information, reports on
> current events, and makes speculations about the future.  That is the
> sort of material that is published in periodicals or other volatile
> media.

I can see value in documenting what replication systems are known to
work (for some definition of work) with a given release in the
documentation for that release. Five years down the road when I'm
trying to implement replication for a client who's somehow locked into
postgres 8.2 (for whatever reason), it would be very helpful to know
that slony1.2 is an option. I don't know if this is sufficient
justification.

Including a separate page on the history of postgres replication to
date also makes some sense, at least to me. It should be relatively
easy to maintain.

If we do talk about replication, then including a probably separate and
presumably quite static page on the taxonomy of replication seems
necessary. As Chris notes, the term replication by itself can mean
quite a number of things.

> At the summit, we resolved, for precisely these reasons, to keep the
> journalistic parts on the web site, for clear separation from the
> shipped product and for easier updates (and for easier reference as
> well, because the PostgreSQL documentation is not the single obvious
> place to look for it) and refer to it from the documentation.
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org



Re: Replication Documentation

From
Markus Schiltknecht
Date:
Hi,

Andrew Hammond wrote:
> I can see value in documenting what replication systems are known to
> work (for some definition of work) with a given release in the
> documentation for that release. Five years down the road when I'm
> trying to implement replication for a client who's somehow locked into
> postgres 8.2 (for whatever reason), it would be very helpful to know
> that slony1.2 is an option. I don't know if this is sufficient
> justification.

Please keep in mind, that most replication solutions (that I know of) 
are quite independent from the PostgreSQL version used. Thus, 
documenting which version of PostgreSQL can be used with which version 
of a replication system should better be covered in the documentation of 
the replication system. Otherwise you would have to update the 
PostgreSQL documentation for new releases of your favorite replication 
system - which seems to lead to confusion.

> Including a separate page on the history of postgres replication to
> date also makes some sense, at least to me. It should be relatively
> easy to maintain.

I agree that having such a 'replication guide for users of PostgreSQL' 
is a good thing to have. But I think not much of that should be part of 
the official PostgreSQL documentation - mainly because the replication 
solutions are not part of PostgreSQL.

Regards

Markus


Re: Replication Documentation

From
"Andrew Hammond"
Date:
Markus Schiltknecht wrote:
> Hi,
>
> Andrew Hammond wrote:
>   > I can see value in documenting what replication systems are known to
> > work (for some definition of work) with a given release in the
> > documentation for that release. Five years down the road when I'm
> > trying to implement replication for a client who's somehow locked into
> > postgres 8.2 (for whatever reason), it would be very helpful to know
> > that slony1.2 is an option. I don't know if this is sufficient
> > justification.
>
> Please keep in mind, that most replication solutions (that I know of)
> are quite independent from the PostgreSQL version used. Thus,
> documenting which version of PostgreSQL can be used with which version
> of a replication system should better be covered in the documentation of
> the replication system.

I would agree to this with the caveat that there needs to be something
in the postgres documentation that points people to the various
replication systems available.

> Otherwise you would have to update the
> PostgreSQL documentation for new releases of your favorite replication
> system - which seems to lead to confusion.

Yeah, updating the docs based on other software releases would suck.
How about "what works with a given release at the time of the release"?
Perhaps this could be limited to a pointer to the docs for such
replication systems, and maybe a very brief description (based on
Chris' taxonomy)?

> > Including a separate page on the history of postgres replication to
> > date also makes some sense, at least to me. It should be relatively
> > easy to maintain.
>
> I agree that having such a 'replication guide for users of PostgreSQL'
> is a good thing to have. But I think not much of that should be part of
> the official PostgreSQL documentation - mainly because the replication
> solutions are not part of PostgreSQL.

Arguably, neither are most of the procedural languages in the Server
Programming section of the documentation, and yet they're included. I
agree that it's improtant to keep the documentation from getting
cluttered up with stuff that's "not part of PostgreSQL". However, I
think the very fact so many people assume that there's no replication
for PostgreSQL simply because it's not mentioned in the documentation
shows that for many people replication is precieved as "part of" the
dbms. Even a single page in the documentation wich consists of
something along the lines of the following would help these folks find
what they're looking for.

"There are a number of different approaches to solving the problem of
replication, each with strengths and weaknesses. As a result, there are
a number of different replication solutions available for PostgreSQL.
To find out more, please refer to the website."



Re: Replication Documentation

From
Peter Eisentraut
Date:
Andrew Hammond wrote:
> How about "what works with a given release at the time of the
> release"?

We just threw that idea out in the context of the procedural language 
discussion because we do not have the resources to check what works.

> Arguably, neither are most of the procedural languages in the Server
> Programming section of the documentation, and yet they're included.

That is false.  The documentation documents exactly those pieces of code 
that we distribute.

> "There are a number of different approaches to solving the problem of
> replication, each with strengths and weaknesses. As a result, there
> are a number of different replication solutions available for
> PostgreSQL. To find out more, please refer to the website."

Well, that's what I've been talking about all along, and it has also 
been the resolution at the Toronto meeting.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Replication Documentation

From
"Andrew Hammond"
Date:
> > "There are a number of different approaches to solving the problem of
> > replication, each with strengths and weaknesses. As a result, there
> > are a number of different replication solutions available for
> > PostgreSQL. To find out more, please refer to the website."
>
> Well, that's what I've been talking about all along, and it has also
> been the resolution at the Toronto meeting.

Great. Is the above text sufficient for the documentation then, or does
anyone have a suggestion on how to say this better?

Drew



gBorg status?

From
Chris Browne
Date:
What's up there?  It has been down all week.

We're trying to get the Slony-I 1.2 release out, so we can then
migrate over to pgFoundry.  But that doesn't working terribly well
when gBorg's down...
-- 
let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/emacs.html
"...Yet terrible as Unix addiction  is, there are worse fates. If Unix
is the heroin of operating systems, then VMS is barbiturate addiction,
the Mac is MDMA, and MS-DOS is sniffing glue. (Windows is filling your
sinuses  with  lucite and  letting  it set.)   You  owe  the Oracle  a
twelve-step program."  --The Usenet Oracle


Re: gBorg status?

From
elein
Date:
Also people trying to download slony have to do some
hunting to find things.  The source only tar is not
available on pgfoundry.

one of them,

elein

On Thu, Aug 31, 2006 at 10:33:36AM -0400, Chris Browne wrote:
> What's up there?  It has been down all week.
> 
> We're trying to get the Slony-I 1.2 release out, so we can then
> migrate over to pgFoundry.  But that doesn't working terribly well
> when gBorg's down...
> -- 
> let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
> http://www.ntlug.org/~cbbrowne/emacs.html
> "...Yet terrible as Unix addiction  is, there are worse fates. If Unix
> is the heroin of operating systems, then VMS is barbiturate addiction,
> the Mac is MDMA, and MS-DOS is sniffing glue. (Windows is filling your
> sinuses  with  lucite and  letting  it set.)   You  owe  the Oracle  a
> twelve-step program."  --The Usenet Oracle
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
> 


Slony-I 1.1.5 binaries at pgFoundry.org

From
Chris Browne
Date:
elein@varlena.com (elein) writes:
> Also people trying to download slony have to do some
> hunting to find things.  The source only tar is not
> available on pgfoundry.

The source tarball for version 1.1.5 is now in place:
<http://pgfoundry.org/frs/download.php/1063/slony1-1.1.5.tar.bz2>

We may as well have at least that bit of backup.

I didn't bother putting up the documentation tarball; it is better to
grab a newer version of the docs.
-- 
"cbbrowne","@","ntlug.org"
http://cbbrowne.com/info/slony.html
You can lead a horse to water, but if you can get him to swim on his
back, you've got something.


Re: gBorg status?

From
"Magnus Hagander"
Date:
> Also people trying to download slony have to do some hunting
> to find things.  The source only tar is not available on pgfoundry.

All gborg *downloads* are available on:
http://www.postgresql.org/ftp/projects/gborg/

Seems Slony hasn't released files using the gborg file release system,
perhaps? Because for some reason Slony stuff isn't there. But I figured
it'd be a good idea t oget that pointer in for people looking for
anything else off gborg that didn't know we mirrored those.

//Magnus




Re: gBorg status?

From
Chris Browne
Date:
mha@sollentuna.net ("Magnus Hagander") writes:
>> Also people trying to download slony have to do some hunting 
>> to find things.  The source only tar is not available on pgfoundry.
>
> All gborg *downloads* are available on:
> http://www.postgresql.org/ftp/projects/gborg/
>
> Seems Slony hasn't released files using the gborg file release system,
> perhaps? Because for some reason Slony stuff isn't there. But I figured
> it'd be a good idea t oget that pointer in for people looking for
> anything else off gborg that didn't know we mirrored those.

In the past, binaries got hosted on Jan Wieck's "downloads area" which
probably lives somewhere nearby there.  Apparently it wasn't terribly
convenient to add/drop files from the gBorg downloads area.

For 1.2, I have been putting release candidate files over at
pgFoundry, which is certainly still accessible.
-- 
(format nil "~S@~S" "cbbrowne" "ntlug.org")
http://www3.sympatico.ca/cbbrowne/finances.html
"Recursion is the root of computation since it trades description for time."
-- Alan J. Perlis


Re: gBorg status?

From
"Jim C. Nasby"
Date:
On Thu, Aug 31, 2006 at 10:33:36AM -0400, Chris Browne wrote:
> What's up there?  It has been down all week.
> 
> We're trying to get the Slony-I 1.2 release out, so we can then
> migrate over to pgFoundry.  But that doesn't working terribly well
> when gBorg's down...

Speaking of which, what's the status of the migration tools? Is that
ball still in Larry's court?
-- 
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


Re: gBorg status?

From
"Larry Rosenman"
Date:
No, since my time is up in the air at the moment, I've bowed out for now.

Once I get settled at Surgient, I might take it up again, but not right now.


-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Jim C. Nasby
Sent: Saturday, September 02, 2006 5:42 PM
To: Chris Browne
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] gBorg status?

On Thu, Aug 31, 2006 at 10:33:36AM -0400, Chris Browne wrote:
> What's up there?  It has been down all week.
> 
> We're trying to get the Slony-I 1.2 release out, so we can then 
> migrate over to pgFoundry.  But that doesn't working terribly well 
> when gBorg's down...

Speaking of which, what's the status of the migration tools? Is that ball
still in Larry's court?


-- 
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend



Re: gBorg status?

From
Christopher Browne
Date:
Oops! jim@nasby.net ("Jim C. Nasby") was seen spray-painting on a wall:
> On Thu, Aug 31, 2006 at 10:33:36AM -0400, Chris Browne wrote:
>> What's up there?  It has been down all week.
>> 
>> We're trying to get the Slony-I 1.2 release out, so we can then
>> migrate over to pgFoundry.  But that doesn't working terribly well
>> when gBorg's down...
>
> Speaking of which, what's the status of the migration tools? Is that
> ball still in Larry's court?

Which migration tools?  Were there migration tools specific to
gBorg->pgFoundry?  Or something else?
-- 
let name="cbbrowne" and tld="linuxfinances.info" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/internet.html
REALITY is an illusion that stays put.


Re: gBorg status?

From
"Joshua D. Drake"
Date:
Christopher Browne wrote:
> Oops! jim@nasby.net ("Jim C. Nasby") was seen spray-painting on a wall:
>> On Thu, Aug 31, 2006 at 10:33:36AM -0400, Chris Browne wrote:
>>> What's up there?  It has been down all week.
>>>
>>> We're trying to get the Slony-I 1.2 release out, so we can then
>>> migrate over to pgFoundry.  But that doesn't working terribly well
>>> when gBorg's down...
>> Speaking of which, what's the status of the migration tools? Is that
>> ball still in Larry's court?
> 
> Which migration tools?  Were there migration tools specific to
> gBorg->pgFoundry?  Or something else?

There was, kind of, sort of. AFAIK they are dead and gone, and the
current thinking on gBorg is here:

http://archives.postgresql.org/pgsql-general/2006-08/msg01167.php
-- 
   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
    Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/
 




Re: gBorg status?

From
Christopher Browne
Date:
The world rejoiced as jd@commandprompt.com ("Joshua D. Drake") wrote:
> Christopher Browne wrote:
>> Oops! jim@nasby.net ("Jim C. Nasby") was seen spray-painting on a wall:
>>> On Thu, Aug 31, 2006 at 10:33:36AM -0400, Chris Browne wrote:
>>>> What's up there?  It has been down all week.
>>>>
>>>> We're trying to get the Slony-I 1.2 release out, so we can then
>>>> migrate over to pgFoundry.  But that doesn't work terribly well
>>>> when gBorg's down...
>>> Speaking of which, what's the status of the migration tools? Is that
>>> ball still in Larry's court?
>> Which migration tools?  Were there migration tools specific to
>> gBorg->pgFoundry?  Or something else?
>
> There was, kind of, sort of. AFAIK they are dead and gone and the
> current method of thought on Gborg is here:
>
> http://archives.postgresql.org/pgsql-general/2006-08/msg01167.php

That seems fine to me.

We already had a plan set up for Slony-I that was not too dissimilar.

To wit...
- There already is a project set up at pgFoundry, and fairly much
  all committers at gBorg are members of the project at pgFoundry
- Downloads are being handled from pgFoundry
- We figured we'd migrate outstanding bugs to pgFoundry by hand
- We figured we'd migrate copies of old mailing list archives
- We figured we'd announce the deaths of old lists, and suggest
  subscribing to the new ones
- We figured there would be some ability to copy CVS over fairly
  losslessly

The *big* steps are moving email and CVS.

I really haven't heard a peep yet that hasn't been third-hand
about the gBorg outage, after it has been out for over a week now.  To
say that's displeasing is something of an understatement.  It
certainly leaves me in no mood to keep any services I actually care
about hosted on gBorg.

The fact that it has been out for a week, without any public comment
being made, certainly gives me pause.  I *HOPE* that we can still
recover CVS and email.

Question about email: Do we need to migrate the old mailing list
archives before setting up new lists on pgFoundry?  Or could I set up
a "slony1-discuss" at pgFoundry, fairly immediately, and migrate in
old archives later?  [Possibly helpful factor:  It's now September,
and the last gBorg traffic was dated in August, so we may not need to
mix months...]
-- 
(format nil "~S@~S" "cbbrowne" "gmail.com")
http://linuxdatabases.info/info/x.html
Who  needs fault-tolerant  computers when  there's obviously  an ample
market of fault-tolerant users?


Re: gBorg status?

From
"Joshua D. Drake"
Date:
> The *big* steps are moving email and CVS.

*big* but quite easy :) As long as we have an mbox file from the Slony
lists, we can import it easily into pgFoundry.

CVS is just an rsync ;)

Joshua D. Drake



-- 
   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/
 




Re: gBorg status?

From
"Dave Page"
Date:

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of
> Christopher Browne
> Sent: 04 September 2006 03:55
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] gBorg status?
>
> The fact that it has been out for a week, without any public comment
> being made, certainly gives me pause.  I *HOPE* that we can still
> recover CVS and email.

My understanding is that Gborg is being recovered from backup as I type.
I also understand that the delay was not caused by lack of backups or
anything similarly scary, but simply by other priorities.

Regards, Dave.


New Linux Filesystem: NILFS

From
Chris Browne
Date:
Recently seen in ACM Operating Systems Review (this is the first time
I've found as many as 1 interesting article in it in a while, and
there were 3 things I found worthwhile...):

NTT (of the recent "NTT Power Hour") have created a new filesystem: <http://www.nilfs.org/en/>

NILFS is a log-structured file system developed for Linux.  

In effect, it provides the "moral equivalent" to MVCC for filesystems;
overwrites are equivalent to delete/insert, and it requires a "Cleaner"
process in order to clean out formerly-used space.

It ought to have two merits over journalling filesystems:
1.  It doesn't need to write data twice, which should improve
    performance.
2.  It doesn't need to repetitively overwrite metadata, which should
    improve crash safety.

On the latter, per the paper:

"... These journaling filesystems enable fast and consistent recovery
of the file system after unexpected system freezes or power
failures. However, they still allow the fatal destruction of the file
system due to the characteristic that recovery is realized by
overwriting meta data with their copies saved in a journal file.  This
recovery is guaranteed to work properly only if the write order of the
on-disk data blocks and meta data blocks is physically conserved on
the disk platters. Unfortunately, this constraint is often violated by
the write optimizations performed by the block I/O subsystem and disk
controllers."

It's still at a somewhat early stage, as they haven't completed coding
the Cleaner.  (Probably should call it the Reaper... :-))

By the way, the Google SOC 2005 also produced one: <http://logfs.sourceforge.net/>

NetBSD used to have a LFS; has that gone anywhere?  Or been
essentially dropped?
-- 
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://linuxdatabases.info/info/emacs.html
"I develop for  Linux for a living, I used to  develop for DOS.  Going
from DOS to Linux is like trading a glider for an F117."
-- <entropy@world.std.com> Lawrence Foard


Re: New Linux Filesystem: NILFS

From
Douglas McNaught
Date:
Chris Browne <cbbrowne@acm.org> writes:

> NetBSD used to have a LFS; has that gone anywhere?  Or been
> essentially dropped?

My reading over the last few years has indicated that LFSs tend to
suffer bad performance degradation as data and metadata for a given
file get scattered all over the disk.  This tends to cancel out the
performance gain from being able to cluster writes in a single area.
For a heavily write-intensive workload, it might be a win, but no one
seems to have demonstrated an advantage for "normal", mostly
read-heavy usage.

-Doug


Re: New Linux Filesystem: NILFS

From
Jeff Davis
Date:
On Tue, 2006-09-05 at 18:24 -0400, Chris Browne wrote:
> Recently seen in ACM Operating Systems Review (this is the first time
> I've found as many as 1 interesting article in it in a while, and
> there were 3 things I found worthwhile...):
> 
> NTT (of the recent "NTT Power Hour") have created a new filesystem:
>   <http://www.nilfs.org/en/>
> 
> NILFS is a log-structured file system developed for Linux.  
> 

As I understand LFSs, they are not ideal for a database system. An LFS
is optimized so that it writes sequentially. However, PostgreSQL already
writes transactions sequentially in the WAL, and tries to optimize the
cleaning of dirty data pages with the background writer. So I don't see
the advantage of an LFS for a database.

Also, LFSs assume very effective read cache. Databases often hold much
more than can fit in read cache, and so frequently require disk access
for reads. An LFS scatters the data all over the disk, which destroys
the sequential access that PostgreSQL depends on for efficient index and
table scans.

Do you see an advantage in using LFS for PostgreSQL?

Did the quotation refer to people leaving write cache enabled on a
journaling filesystem?

Regards,
Jeff Davis



Re: New Linux Filesystem: NILFS

From
mark@mark.mielke.cc
Date:
On Tue, Sep 05, 2006 at 05:54:50PM -0700, Jeff Davis wrote:
> On Tue, 2006-09-05 at 18:24 -0400, Chris Browne wrote:
> > Recently seen in ACM Operating Systems Review (this is the first time
> > I've found as many as 1 interesting article in it in a while, and
> > there were 3 things I found worthwhile...):
> > ...
> > NILFS is a log-structured file system developed for Linux.  
> As I understand LFSs, they are not ideal for a database system. An LFS
> is optimized so that it writes sequentially. However, PostgreSQL already
> ...
> Do you see an advantage in using LFS for PostgreSQL?

Hey guys - I think the original poster only meant to suggest that it
was *interesting*... :-)

To me, applying database concepts to file systems is interesting, and
practical. It's not a perfected science by any means, but the idea that
a file system is a hierarchical database isn't new. :-)

Applying any database on top of another database seems inefficient to me.
That's one reason why I argue the opposite - PostgreSQL *should* have its
own on-disk layout, and not be laid out on top of another generic
system designed for purposes other than database storage. The reason it
isn't pursued at present, and perhaps should not be pursued at present,
is that PostgreSQL has other more important priorities in the short term.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all
                        and in the darkness bind them...
                          http://mark.mielke.cc/



Re: New Linux Filesystem: NILFS

From
Jeff Davis
Date:
On Tue, 2006-09-05 at 23:28 -0400, mark@mark.mielke.cc wrote:
> On Tue, Sep 05, 2006 at 05:54:50PM -0700, Jeff Davis wrote:
> > On Tue, 2006-09-05 at 18:24 -0400, Chris Browne wrote:
> > > Recently seen in ACM Operating Systems Review (this is the first time
> > > I've found as many as 1 interesting article in it in a while, and
> > > there were 3 things I found worthwhile...):
> > > ...
> > > NILFS is a log-structured file system developed for Linux.  
> > As I understand LFSs, they are not ideal for a database system. An LFS
> > is optimized so that it writes sequentially. However, PostgreSQL already
> > ...
> > Do you see an advantage in using LFS for PostgreSQL?
> 
> Hey guys - I think the original poster only meant to suggest that it
> was *interesting*... :-)
> 

I see, my mistake.


> Applying any database on top of another database seems inefficient to me.
> That's one reason why I argue the opposite - PostgreSQL *should* have its
> own on disk layout, and not being laid out on top of another generic
> system designed for purposes other than database storage. The reason it
> isn't pursued at present, and perhaps should not be pursued at present,
> is that PostgreSQL has other more important priorities in the short term.
> 

I think that it would be a higher priority if someone showed a
substantial performance improvement. Some filesystems don't really add
much overhead beyond what PostgreSQL needs anyway.

If someone did show a substantial improvement, I would be interested to
see it.

And if there is an improvement, shouldn't that be a project for
something like Linux, where other databases could also benefit? It could
just be implemented as a database-specific filesystem.

Regards,
Jeff Davis





Re: New Linux Filesystem: NILFS

From
Chris Browne
Date:
pgsql@j-davis.com (Jeff Davis) writes:
> On Tue, 2006-09-05 at 23:28 -0400, mark@mark.mielke.cc wrote:
>> On Tue, Sep 05, 2006 at 05:54:50PM -0700, Jeff Davis wrote:
>> > On Tue, 2006-09-05 at 18:24 -0400, Chris Browne wrote:
>> > > Recently seen in ACM Operating Systems Review (this is the first time
>> > > I've found as many as 1 interesting article in it in a while, and
>> > > there were 3 things I found worthwhile...):
>> > > ...
>> > > NILFS is a log-structured file system developed for Linux.  
>> > As I understand LFSs, they are not ideal for a database system. An LFS
>> > is optimized so that it writes sequentially. However, PostgreSQL already
>> > ...
>> > Do you see an advantage in using LFS for PostgreSQL?
>> 
>> Hey guys - I think the original poster only meant to suggest that it
>> was *interesting*... :-)
>> 
>
> I see, my mistake.

From a reliability perspective, I can see some value to it...  

I have seen far too many databases corrupted by journalling gone bad
in the past year...  :-(

>> Applying any database on top of another database seems inefficient
>> to me.  That's one reason why I argue the opposite - PostgreSQL
>> *should* have its own on disk layout, and not being laid out on top
>> of another generic system designed for purposes other than database
>> storage. The reason it isn't pursued at present, and perhaps should
>> not be pursued at present, is that PostgreSQL has other more
>> important priorities in the short term.
>
> I think that it would be a higher priority if someone showed a
> substantial performance improvement. Some filesystems don't really
> cause much overhead that isn't needed by PostgreSQL.
>
> If someone did show a substantial improvement, I would be interested
> to see it.
>
> And if there is an improvement, shouldn't that be a project for
> something like Linux, where other databases could also benefit? It
> could just be implemented as a database-specific filesystem.

The classic problem with log structured filesystems is that sequential
reads tend to be less efficient than in overwriting systems; perhaps
if they can get "vacuuming" to be done frequently enough, that might
change the shape of things.

That would be a relevant lesson that _we_ have discovered that is
potentially applicable to filesystem implementors.

And I don't consider this purely of academic interest; the ability to:
 a) Avoid the double writing of journalling, and
 b) Avoid the risks of failures due to misordered writes
are both genuinely valuable.
-- 
output = reverse("ofni.sesabatadxunil" "@" "enworbbc")
http://cbbrowne.com/info/lisp.html
All ITS machines now have hardware for a new machine instruction --
PFLT    Prove Fermat's Last Theorem.
Please update your programs.


Re: New Linux Filesystem: NILFS

From
Jeff Davis
Date:
On Wed, 2006-09-06 at 18:55 -0400, Chris Browne wrote:
> pgsql@j-davis.com (Jeff Davis) writes:
> >> > Do you see an advantage in using LFS for PostgreSQL?
> >> 
> >> Hey guys - I think the original poster only meant to suggest that it
> >> was *interesting*... :-)
> >> 
> >
> > I see, my mistake.
> 
> From a reliability perspective, I can see some value to it...  
> 
> I have seen far too many databases corrupted by journalling gone bad
> in the past year...  :-(
> 

Can you elaborate a little? Which filesystems have been problematic?
Which filesystems are you more confident in?

> >
> > And if there is an improvement, shouldn't that be a project for
> > something like Linux, where other databases could also benefit? It
> > could just be implemented as a database-specific filesystem.
> 
> The classic problem with log structured filesystems is that sequential
> reads tend to be less efficient than in overwriting systems; perhaps
> if they can get "vacuuming" to be done frequently enough, that might
> change the shape of things.
> 
> That would be a relevant lesson that _we_ have discovered that is
> potentially applicable to filesystem implementors.
> 
> And I don't consider this purely of academic interest; the ability to:
>  a) Avoid the double writing of journalling, and
>  b) Avoid the risks of failures due to misordered writes
> are both genuinely valuable.

Right, LFS is promising in a number of ways. I've read about it in the
past, and it would be nice if this NILFS implementation sparks some new
research in the area.

Regards,
Jeff Davis



Re: gBorg status?

From
"Marc G. Fournier"
Date:
On Mon, 4 Sep 2006, Dave Page wrote:

> My understanding is that Gborg is being recovered from backup as I type. 
> I also understand that the delay was not caused by lack of backups or 
> anything similarly scary, but simply by other priorities.

Yes, I have the backup uploaded right now, and doing 'recover procedures' 
as I type this ... it will be up before I go to bed tonight ...

As for the 'other priorities' ... the delay was due to an insufficient 
Internet connection to upload the backup ... we were waiting for our 
Internet to be installed in our new location, and were using an 
underpowered temporary connection in the interim ... that one was going to 
tie up the connection for >40 hours ... once we were able to get the 
backup server onto its 'permanent connection', the upload took ~5hrs ...

As I mentioned, am working on it right now ... will post a follow up once 
she's back up live and needing to be tested ... she's also moving to one 
of our 64bit servers, so should be a wee bit better performance wise ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664


Re: New Linux Filesystem: NILFS

From
Christopher Browne
Date:
pgsql@j-davis.com (Jeff Davis) wrote:
> On Wed, 2006-09-06 at 18:55 -0400, Chris Browne wrote:
>> pgsql@j-davis.com (Jeff Davis) writes:
>> >> > Do you see an advantage in using LFS for PostgreSQL?
>> >> 
>> >> Hey guys - I think the original poster only meant to suggest that it
>> >> was *interesting*... :-)
>> >> 
>> >
>> > I see, my mistake.
>> 
>> From a reliability perspective, I can see some value to it...  
>> 
>> I have seen far too many databases corrupted by journalling gone bad
>> in the past year...  :-(
>
> Can you elaborate a little? Which filesystems have been problematic?
> Which filesystems are you more confident in?

Well, more or less *all* of them, on AMD-64/Linux.

The "pulling the fibrechannel cable" test blew them all.  XFS, ext3,
JFS.  ReiserFS was, if I recall correctly, marginally better, but only
marginally.

On AIX, we have seen JFS2 falling over when there were enough levels
of buffering in the way on disk arrays.

>> > And if there is an improvement, shouldn't that be a project for
>> > something like Linux, where other databases could also benefit? 
>> > It could just be implemented as a database-specific filesystem.
>> 
>> The classic problem with log structured filesystems is that
>> sequential reads tend to be less efficient than in overwriting
>> systems; perhaps if they can get "vacuuming" to be done frequently
>> enough, that might change the shape of things.
>> 
>> That would be a relevant lesson that _we_ have discovered that is
>> potentially applicable to filesystem implementors.
>> 
>> And I don't consider this purely of academic interest; the ability to:
>>  a) Avoid the double writing of journalling, and
>>  b) Avoid the risks of failures due to misordered writes
>> are both genuinely valuable.
>
> Right, LFS is promising in a number of ways. I've read about it in
> the past, and it would be nice if this NILFS implementation sparks
> some new research in the area.

Indeed.

I don't see it being a "production-ready" answer yet, but yeah, I'd
certainly like to see the research continue.  A vital problem is in
the area of vacuuming; there may be things to be learned in both
directions.
-- 
output = reverse("moc.liamg" "@" "enworbbc")
http://linuxdatabases.info/info/fs.html
Health is merely the slowest possible rate at which one can die.


Re: gBorg status?

From
"Marc G. Fournier"
Date:
everything should be back up and running ... vServer is now running on our 
newest, 64bit HP server with FreeBSD 6.x ...

On Wed, 6 Sep 2006, Marc G. Fournier wrote:

> On Mon, 4 Sep 2006, Dave Page wrote:
>
>> My understanding is that Gborg is being recovered from backup as I type. I 
>> also understand that the delay was not caused by lack of backups or 
>> anything similarly scary, but simply by other priorities.
>
> Yes, I have the backup uploaded right now, and doing 'recover procedures' as 
> I type this ... it will be up before I go to bed tonight ...
>
> As for the 'other priorities' ... the delay was due to an insufficient 
> Internet connection to upload the backup ... we were waiting for our Internet 
> to be installed in our new location, and were using an underpowered temporary 
> connection in the interim ... that one was going to tie up the connection for 
> >40 hours ... once we were able to get the backup server onto its 'permanent 
> connection', the upload took ~5hrs ...
>
> As I mentioned, am working on it right now ... will post a follow up once 
> she's back up live and needing to be tested ... she's also moving to one of 
> our 64bit servers, so should be a wee bit better performance wise ...
>
> ----
> Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy@hub.org                              MSN . scrappy@hub.org
> Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>              http://archives.postgresql.org
>

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664


Re: New Linux Filesystem: NILFS

From
Jeff Davis
Date:
On Wed, 2006-09-06 at 22:12 -0400, Christopher Browne wrote:

> > Can you elaborate a little? Which filesystems have been problematic?
> > Which filesystems are you more confident in?
> 
> Well, more or less *all* of them, on AMD-64/Linux.
> 
> The "pulling the fibrechannel cable" test blew them all.  XFS, ext3,
> JFS.  ReiserFS was, if I recall correctly, marginally better, but only
> marginally.
> 
> On AIX, we have seen JFS2 falling over when there were enough levels
> of buffering in the way on disk arrays.
> 

Well, that's interesting. I suppose I can't count on the filesystem as
much as I thought. Are you implying that the filesystems aren't ready on
64-bit? Is it more of a hardware issue (a controller lying about the
security of the write)? Any comments on FreeBSD/UFS+SU? I would expect
UFS+SU to have similar issues, since it depends on write ordering also.

What do you do for better data security (aside from the obvious "don't
pull cables")?

Regards,
Jeff Davis







Re: New Linux Filesystem: NILFS

From
Chris Browne
Date:
pgsql@j-davis.com (Jeff Davis) writes:
> On Wed, 2006-09-06 at 22:12 -0400, Christopher Browne wrote:
>
>> > Can you elaborate a little? Which filesystems have been problematic?
>> > Which filesystems are you more confident in?
>> 
>> Well, more or less *all* of them, on AMD-64/Linux.
>> 
>> The "pulling the fibrechannel cable" test blew them all.  XFS, ext3,
>> JFS.  ReiserFS was, if I recall correctly, marginally better, but only
>> marginally.
>> 
>> On AIX, we have seen JFS2 falling over when there were enough levels
>> of buffering in the way on disk arrays.
>
> Well, that's interesting. I suppose I can't count on the filesystem
> as much as I thought. Are you implying that the filesystems aren't
> ready on 64-bit?

I don't think this necessarily is a 64 bit issue; it's more that with
the more esoteric, expensive disk array hardware, there are fewer with
the ability to test it, because you need $200K worth of hardware
around to do the testing.

> Is it more of a hardware issue (a controller lying about the
> security of the write)? Any comments on FreeBSD/UFS+SU? I would
> expect UFS+SU to have similar issues, since it depends on write
> ordering also.
>
> What do you do for better data security (aside from the obvious
> "don't pull cables")?

The last time we looked, FreeBSD wasn't an option at all, because
there wasn't any suitable FibreChannel support.  That may have
changed; haven't researched lately.

The problem the NILFS people pointed out seems a troublesome one,
namely that the more levels of caching (even if battery-backed), the
less certain you can be that the hardware isn't lying about write
ordering.

I haven't got an answer...
-- 
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://linuxdatabases.info/info/multiplexor.html
Jury  -- Twelve  people  who  determine which  client  has the  better
lawyer.


Stats Collector Oddity

From
Chris Browne
Date:
We're getting a bit of an anomaly relating to pg_stat_activity...

oxrstld=# SELECT * from pg_stat_activity where current_query <> '<IDLE>';
   datid    | datname | procpid | usesysid | usename | current_query |          query_start
------------+---------+---------+----------+---------+---------------+-------------------------------
 1347729970 | oxrstld |  893094 |      122 | tldepp  | commit        | 2006-12-16 19:34:08.583978+00
(1 row)

oxrstld=# select version();
                                   version
------------------------------------------------------------------------------
 PostgreSQL 7.4.12 on powerpc-ibm-aix5.3.0.0, compiled by GCC gcc (GCC) 4.0.1
(1 row)

That PID has been dead for several days, but this connection is marked
as being open, still, after lo many days.

This has *traditionally* been a sign that the stats collector has been
clobbered, at which point pg_stat_activity becomes useless.  That's
not the case here; a less restricted query on pg_stat_activity shows
other more recent data that keeps changing, seemingly consistent with
system activity.

I can track most of the lifecycle of that PID; the connection was
established at 2006-12-16 18:46:38, and I see a few errors associated
with the PID (violations of unique constraints; usual business stuff).
I see nothing in the logs at around 19:34 which would suggest a reason
for any strange behaviour.  There are no records about that PID after
19:34...

I'd like to get rid of this entry, if I can; it's blowing up tests
that warn us about elderly transactions, causing a false positive...
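
For reference, the check is roughly of this shape (the one-day cutoff
here is just an illustrative threshold, not necessarily the exact one
our tests use):

    SELECT procpid, usename, query_start, current_query
      FROM pg_stat_activity
     WHERE current_query <> '<IDLE>'
       AND query_start < now() - interval '1 day';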
-- 
output = reverse("moc.enworbbc" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/
Rules  of the  Evil Overlord  #212. "I  will not  send  out battalions
composed wholly of robots or  skeletons against heroes who have qualms
about killing living beings.  <http://www.eviloverlord.com/>


Re: Stats Collector Oddity

From
Tom Lane
Date:
Chris Browne <cbbrowne@acm.org> writes:
> We're getting a bit of an anomaly relating to pg_stat_activity...
> ...
> That PID has been dead for several days, but this connection is marked
> as being open, still, after lo many days.

This probably just means that the "backend termination" stats message
got dropped due to heavy load.  That's expected behavior in all pre-8.2
releases: the stats system was never intended to provide
guaranteed-exactly-correct status.  PG 8.2 has reimplemented the
pg_stat_activity view to make it more trustworthy.  (The other stuff is
still probabilistic, but being just event counters, message loss isn't
so obvious.)
        regards, tom lane


Re: Stats Collector Oddity

From
Chris Browne
Date:
tgl@sss.pgh.pa.us (Tom Lane) writes:
> Chris Browne <cbbrowne@acm.org> writes:
>> We're getting a bit of an anomaly relating to pg_stat_activity...
>> ...
>> That PID has been dead for several days, but this connection is marked
>> as being open, still, after lo many days.
>
> This probably just means that the "backend termination" stats message
> got dropped due to heavy load.  That's expected behavior in all pre-8.2
> releases: the stats system was never intended to provide
> guaranteed-exactly-correct status.  PG 8.2 has reimplemented the
> pg_stat_activity view to make it more trustworthy.  (The other stuff is
> still probabilistic, but being just event counters, message loss isn't
> so obvious.)

That seems a *bit* surprising; the system wasn't expected to be under
particularly heavy load during the period in question; I would have
expected "particularly light load."  No matter; there may have been
some brief heavy load to cause this.

There isn't any way, short of restarting the postmaster, to get rid of
that PID, is there?
-- 
"cbbrowne","@","linuxdatabases.info"
http://cbbrowne.com/info/linuxdistributions.html
"High-level languages are a pretty good indicator that all else is
seldom equal." - Tim Bradshaw, comp.lang.lisp


Re: Stats Collector Oddity

From
Tom Lane
Date:
Chris Browne <cbbrowne@acm.org> writes:
> There isn't any way, short of restarting the postmaster, to get rid of
> that PID, is there?

The entry will get overwritten when that BackendId slot gets re-used,
so just starting enough concurrent backends should do it.  (Since
incoming backends always take the lowest free slot, the fact that the
dead entry has persisted awhile means that it must have a number higher
than your normal number of concurrent sessions ... which is evidence
in favor of the idea that it happened during a load spike ...)
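
(As an aside, the slots can be inspected directly through the accessor
functions the view is built on; a rough sketch:

    SELECT s.backendid,
           pg_stat_get_backend_pid(s.backendid)            AS pid,
           pg_stat_get_backend_activity_start(s.backendid) AS activity_start
      FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s
     ORDER BY s.backendid;

The dead entry should keep showing its stale PID there until the slot
gets re-used.)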
        regards, tom lane


Re: Stats Collector Oddity

From
Christopher Browne
Date:
A long time ago, in a galaxy far, far away, tgl@sss.pgh.pa.us (Tom Lane) wrote:
> Chris Browne <cbbrowne@acm.org> writes:
>> There isn't any way, short of restarting the postmaster, to get rid of
>> that PID, is there?
>
> The entry will get overwritten when that BackendId slot gets re-used,
> so just starting enough concurrent backends should do it.  (Since
> incoming backends always take the lowest free slot, the fact that the
> dead entry has persisted awhile means that it must have a number higher
> than your normal number of concurrent sessions ... which is evidence
> in favor of the idea that it happened during a load spike ...)

Cool.  I started up a nice little bunch of psql sessions in the
background, and then once they were all up, shut down my shell
session, thereby eliminating them.  And that did, indeed, clear out
that pg_stat_activity entry.

... And five minutes later, Nagios sent me a message indicating that the
node had recovered from having an "ancient" open connection.

I'll re-add a few gratuitous details here in the hopes that that makes
this easily findable if anyone else should search for the issue...

The Problem:
- pg_stat_activity was reporting an elderly transaction in progress
- that backend process wasn't running anymore
- pg_stat_activity *was* reporting other legitimate activity; this
  was not the scenario where it had gotten deranged (normally due to
  excessive load)
- Per Tom's comments, there evidently *was* some load spike where
  the closing of this particular connection did not get logged by
  the stats collector

The Solution:
- We needed to roll the stats collector through a bunch of its slots
  in order to clean the apparently-still-populated entry out.
- Ran, in a shell:
    for i in `seq 100`; do
        psql &
    done
  That left 100 psql sessions in the background, all connected to the
  database backend.
- Closed the shell.  That then HUPped the 100 psql sessions.

That got the offending pg_stat_activity entry cleared out.
-- 
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/finances.html
"Temporary tattoos  are a CRITICAL  ELEMENT  of our security strategy.
To suggest otherwise is sheer lunacy."  -- Reid Fleming, cDc


TOASTing smaller things

From
Chris Browne
Date:
In some of our applications, we have cases where it would be very nice
if we could activate TOAST at some sort of lower threshold than the
usual 2K that is true now.  Let me note the current code that controls
the threshold:

/*
 * These symbols control toaster activation.  If a tuple is larger than
 * TOAST_TUPLE_THRESHOLD, we will try to toast it down to no more than
 * TOAST_TUPLE_TARGET bytes.  Both numbers include all tuple header overhead
 * and between-fields alignment padding, but we do *not* consider any
 * end-of-tuple alignment padding; hence the values can be compared directly
 * to a tuple's t_len field.  We choose TOAST_TUPLE_THRESHOLD with the
 * knowledge that toast-table tuples will be exactly that size, and we'd
 * like to fit four of them per page with minimal space wastage.
 *
 * The numbers need not be the same, though they currently are.
 *
 * Note: sizeof(PageHeaderData) includes the first ItemId, but we have
 * to allow for 3 more, if we want to fit 4 tuples on a page.
 */
 
#define TOAST_TUPLE_THRESHOLD \
    MAXALIGN_DOWN((BLCKSZ - \
                   MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
                  / 4)

We have cases where we're storing XML message information which is
near the 0.5K mark; that being the case, tuples virtually never get
TOASTed.

somesystem=# select min(length(xml)), max(length(xml)), avg(length(xml)), stddev(length(xml)) from table_with_xml;
 min | max  |         avg          |      stddev
-----+------+----------------------+------------------
 244 | 2883 | 651.6900720788174376 | 191.602077911138
(1 row)

I can see four controls as being pretty plausible:

1.  Compile time...

#define TOAST_DENOMINATOR 17
    /* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */

#define TOAST_TUPLE_THRESHOLD \
    MAXALIGN_DOWN((BLCKSZ - \
                   MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
                  / TOAST_DENOMINATOR)

That's obviously cheapest to the DB engine.

I just made this change to a checkout of CVS HEAD, and it readily
survived a regression test.

2.  GUC value for TOAST_DENOMINATOR

Do the above, but with the added detail that TOAST_DENOMINATOR refers
to a GUC value.

I think I could probably make this change; the principle remains much
the same as with #1.

3.  GUC value for TOAST_TUPLE_THRESHOLD

This probably has to get modified to the nearest feasible value,
modulo alignment; it's not all that different from #1 or #2.

4.  A different mechanism would be to add a fifth storage column
strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
say, TOAST.

At present, the 4 values are essentially advisory; columns get TOASTed
if the column permits EXTENDED storage, but that only occurs if the
size is greater than TOAST_TUPLE_THRESHOLD.

If the new value was chosen, the column would *always* get stored as
TOAST.
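
For reference, the existing advisory knob is the per-column storage
mode, e.g., for the table from the earlier query (the compile-time
threshold still decides when toasting actually happens):

    ALTER TABLE table_with_xml ALTER COLUMN xml SET STORAGE EXTERNAL;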

Presumably #1 or #2 could readily get into 8.3 as they're pretty easy;
#3 is a bit trickier, whilst #4 is probably not "8.3-fittable".

Question:

Which of these sounds preferable?
-- 
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/finances.html
Where do you  *not* want to go today?  "Confutatis maledictis, flammis
acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html>


Re: TOASTing smaller things

From
Tom Lane
Date:
Chris Browne <cbbrowne@acm.org> writes:
> #define TOAST_DENOMINATOR 17  
>    /* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */

> #define TOAST_TUPLE_THRESHOLD \
>     MAXALIGN_DOWN((BLCKSZ - \
>                    MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
>                   / TOAST_DENOMINATOR)

Given that you are quoting code that was demonstrably broken since the
original coding of TOAST up till a month or two back, "it passes
regression" is not adequate proof of "it's right".  In fact I think
it's not right; you have not got the roundoff condition straight.

> 4.  A different mechanism would be to add a fifth storage column
> strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
> say, TOAST.

Anything along this line would require invoking the toaster on every
single tuple, since we'd always have to crawl through all the columns
to see if toasting was supposed to happen.  No thanks.

> Which of these sounds preferable?

It's a bit late in the cycle to be proposing any of these for 8.3.
        regards, tom lane


Re: TOASTing smaller things

From
Jan Wieck
Date:
On 3/21/2007 2:05 PM, Tom Lane wrote:
> Chris Browne <cbbrowne@acm.org> writes:
>> #define TOAST_DENOMINATOR 17  
>>    /* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */
> 
>> #define TOAST_TUPLE_THRESHOLD \
>>     MAXALIGN_DOWN((BLCKSZ - \
>>                    MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
>>                   / TOAST_DENOMINATOR)
> 
> Given that you are quoting code that was demonstrably broken since the
> original coding of TOAST up till a month or two back, "it passes
> regression" is not adequate proof of "it's right".  In fact I think
> it's not right; you have not got the roundoff condition straight.
> 
>> 4.  A different mechanism would be to add a fifth storage column
>> strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
>> say, TOAST.

FORCE_COMPRESSION, FORCE_EXTERNAL and FORCE_EXTERNAL_UNCOMPRESSED.

> 
> Anything along this line would require invoking the toaster on every
> single tuple, since we'd always have to crawl through all the columns
> to see if toasting was supposed to happen.  No thanks.

Not necessarily. A flag in Relation telling if the table has any column 
marked like that could be set while constructing the relcache entry.

> 
>> Which of these sounds preferable?
> 
> It's a bit late in the cycle to be proposing any of these for 8.3.

Certainly.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: TOASTing smaller things

From
Bruce Momjian
Date:
Is this a TODO?

---------------------------------------------------------------------------

Jan Wieck wrote:
> On 3/21/2007 2:05 PM, Tom Lane wrote:
> > Chris Browne <cbbrowne@acm.org> writes:
> >> #define TOAST_DENOMINATOR 17  
> >>    /* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */
> > 
> >> #define TOAST_TUPLE_THRESHOLD \
> >>     MAXALIGN_DOWN((BLCKSZ - \
> >>                    MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
> >>                   / TOAST_DENOMINATOR)
> > 
> > Given that you are quoting code that was demonstrably broken since the
> > original coding of TOAST up till a month or two back, "it passes
> > regression" is not adequate proof of "it's right".  In fact I think
> > it's not right; you have not got the roundoff condition straight.
> > 
> >> 4.  A different mechanism would be to add a fifth storage column
> >> strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
> >> say, TOAST.
> 
> FORCE_COMPRESSION, FORCE_EXTERNAL and FORCE_EXTERNAL_UNCOMPRESSED.
> 
> > 
> > Anything along this line would require invoking the toaster on every
> > single tuple, since we'd always have to crawl through all the columns
> > to see if toasting was supposed to happen.  No thanks.
> 
> Not necessarily. A flag in Relation telling if the table has any column 
> marked like that could be set while constructing the relcache entry.
> 
> > 
> >> Which of these sounds preferable?
> > 
> > It's a bit late in the cycle to be proposing any of these for 8.3.
> 
> Certainly.
> 
> 
> Jan
> 
> -- 
> #======================================================================#
> # It's easier to get forgiveness for being wrong than for being right. #
> # Let's break this rule - forgive me.                                  #
> #================================================== JanWieck@Yahoo.com #
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOASTing smaller things

From
Chris Browne
Date:
tgl@sss.pgh.pa.us (Tom Lane) writes:
> Chris Browne <cbbrowne@acm.org> writes:
>> #define TOAST_DENOMINATOR 17  
>>    /* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */
>
>> #define TOAST_TUPLE_THRESHOLD \
>>     MAXALIGN_DOWN((BLCKSZ - \
>>                    MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
>>                   / TOAST_DENOMINATOR)
>
> Given that you are quoting code that was demonstrably broken since
> the original coding of TOAST up till a month or two back, "it passes
> regression" is not adequate proof of "it's right".  In fact I think
> it's not right; you have not got the roundoff condition straight.

OK, then maybe some refinement was needed.  That seemed too easy.

Mind you, the problem seems to me to be that TOAST_TUPLE_THRESHOLD is
not solely a threshold to compare things to (as done in
heapam.c/toasting.c), but gets reused to calculate
TOAST_MAX_CHUNK_SIZE.  If the threshold were used solely as a threshold,
alignment wouldn't matter.

FYI, I took a sample table and loaded it into the resulting 8.3
backend based on the use of the totally naive TOAST_DENOMINATOR; there
may be something off in the sizing of the chunks, but that does not
appear to have injured fidelity of the tuples I stored.

Vacuum output:

--- Production system (no TOASTing)
INFO:  "xml_log_table": found 0 removable, 1731329 nonremovable row versions in 175870 pages
DETAIL:  0 dead row versions cannot be removed yet.

--- In the 8.3 instance that did toast things...
INFO:  "xml_log_table": found 0 removable, 1730737 nonremovable row versions in 41120 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
610 pages contain useful free space.
0 pages are entirely empty.
CPU 1.08s/0.36u sec elapsed 14.94 sec.
INFO:  vacuuming "pg_toast.pg_toast_49194"
INFO:  index "pg_toast_49194_index" now contains 2303864 row versions in 6319 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.09s/0.03u sec elapsed 1.71 sec.
INFO:  "pg_toast_49194": found 0 removable, 2303864 nonremovable row versions in 98191 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
514 pages contain useful free space.
0 pages are entirely empty.

Problem with alignment of TOAST_MAX_CHUNK_SIZE or not, I seem to be
getting the right results, and this nicely partitions the table into two
parts: one, with the non-XML data, occupying 41K pages, and the
TOAST section storing those less-frequently-accessed columns.  (There
is a size difference; the production instance has more empty space
since it sees active inserts + deletes.)

In all ways except for "strict hygienic correctness of code," this
accomplished what I was hoping.

If someone could make a round-off-safe calculation of
TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE that exposed the
denominator so it could be safely modified, that would be one step
ahead...  I generally try not to make changes to the core, so I'll try
to avoid that...

>> 4.  A different mechanism would be to add a fifth storage column
>> strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
>> say, TOAST.
>
> Anything along this line would require invoking the toaster on every
> single tuple, since we'd always have to crawl through all the columns
> to see if toasting was supposed to happen.  No thanks.

Ah, I see.  I infer from that that the code starts by checking to see
if the tuple size is > TOAST_TUPLE_THRESHOLD, and only starts
rummaging through TOAST infrastructure if the tuple is big enough.

In that case, "TOAST by default" becomes rather a nonstarter, I agree.
In the application context I'm thinking of, one table out of ~80 is a
"TOAST candidate;" making access to the other 79 slower would not be
of benefit.

(Aside: I'll link to Simon Riggs's related note, as well as to the item
on the TODO list...)
<http://archives.postgresql.org/pgsql-hackers/2007-02/msg00213.php>
<http://www.postgresql.org/docs/faqs.TODO.html>
-- 
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/finances.html
Where do you  *not* want to go today?  "Confutatis maledictis, flammis
acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html>


Re: TOASTing smaller things

From
"Jim C. Nasby"
Date:
On Wed, Mar 21, 2007 at 12:37:36PM -0400, Chris Browne wrote:
> 4.  A different mechanism would be to add a fifth storage column
> strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
> say, TOAST.
> 
> At present, the 4 values are essentially advisory; columns get TOASTed
> if the column permits EXTENDED storage, but that only occurs if the
> size is greater than TOAST_TUPLE_THRESHOLD.
> 
> If the new value was chosen, the column would *always* get stored as
> TOAST.

Rather than a hard and fast limit of 0, why not allow defining a size
threshold? And while we're at it, how about a size threshold for when to
try compressing, too?

> Presumably #1 or #2 could readily get into 8.3 as they're pretty easy;
> #3 is a bit trickier, whilst #4 is probably not "8.3-fittable".
> 
> Question:
> 
> Which of these sounds preferable?

1 and 2 (variations on how to set the denominator) sound completely
ugly. Trying to minimize wasted space in a toast table is great for a
default, but exposing something like that to the users via any kind of
setting seems way too obtuse.

#3 (GUC for number of bytes) may not make sense for performance reasons,
as Tom mentioned. I'm hoping that it would be easy to check either
pg_class or pg_attribute to see if a table/column has non-standard
toast/compression limits.
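
(Today, the per-column storage mode at least is visible in
pg_attribute.attstorage -- 'p' = plain, 'e' = external, 'm' = main,
'x' = extended; the relation name below is just an example:

    SELECT attname, attstorage
      FROM pg_attribute
     WHERE attrelid = 'table_with_xml'::regclass
       AND attnum > 0
       AND NOT attisdropped;

Per-column thresholds would presumably live somewhere similar.)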
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: TOASTing smaller things

From
"Luke Lonergan"
Date:
I advocate the following:

- Enable specification of TOAST policy on a per column basis

As a first step, then:

- Enable vertical partitioning of tables using per-column specification of
storage policy.

- Luke


On 3/21/07 1:12 PM, "Bruce Momjian" <bruce@momjian.us> wrote:

> 
> Is this a TODO?
> 
> ---------------------------------------------------------------------------
> 
> Jan Wieck wrote:
>> On 3/21/2007 2:05 PM, Tom Lane wrote:
>>> Chris Browne <cbbrowne@acm.org> writes:
>>>> #define TOAST_DENOMINATOR 17
>>>>    /* Use this as the divisor; current default behaviour falls from
>>>> TOAST_DENOMINATOR = 4 */
>>> 
>>>> #define TOAST_TUPLE_THRESHOLD \
>>>>     MAXALIGN_DOWN((BLCKSZ - \
>>>>                    MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
>>>>                   / TOAST_DENOMINATOR)
>>> 
>>> Given that you are quoting code that was demonstrably broken since the
>>> original coding of TOAST up till a month or two back, "it passes
>>> regression" is not adequate proof of "it's right".  In fact I think
>>> it's not right; you have not got the roundoff condition straight.
>>> 
>>>> 4.  A different mechanism would be to add a fifth storage column
>>>> strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
>>>> say, TOAST.
>> 
>> FORCE_COMPRESSION, FORCE_EXTERNAL and FORCE_EXTERNAL_UNCOMPRESSED.
>> 
>>> 
>>> Anything along this line would require invoking the toaster on every
>>> single tuple, since we'd always have to crawl through all the columns
>>> to see if toasting was supposed to happen.  No thanks.
>> 
>> Not necessarily. A flag in Relation telling if the table has any column
>> marked like that could be set while constructing the relcache entry.
>> 
>>> 
>>>> Which of these sounds preferable?
>>> 
>>> It's a bit late in the cycle to be proposing any of these for 8.3.
>> 
>> Certainly.
>> 
>> 
>> Jan
>> 
>> -- 
>> #======================================================================#
>> # It's easier to get forgiveness for being wrong than for being right. #
>> # Let's break this rule - forgive me.                                  #
>> #================================================== JanWieck@Yahoo.com #
>> 
>> ---------------------------(end of broadcast)---------------------------
>> TIP 2: Don't 'kill -9' the postmaster




Re: TOASTing smaller things

From
Andreas Pflug
Date:
Luke Lonergan wrote:
> I advocate the following:
>
> - Enable specification of TOAST policy on a per column basis
>
> As a first step, then:
>
> - Enable vertical partitioning of tables using per-column specification of
> storage policy.
>   
Wouldn't it be enough to enable having the toast table on a different
table space?

Regards,
Andreas


Re: TOASTing smaller things

From
"Luke Lonergan"
Date:
Andreas,

On 3/22/07 9:40 AM, "Andreas Pflug" <pgadmin@pse-consulting.de> wrote:

> Wouldn't it be enough to enable having the toast table on a different
> table space?

Yes, but the ultimate goal would allow the allocation of a storage mechanism
that is unique to each column.  The most frequently used mechanism for our
customers would likely be to have each column associated with its own
internal relation (like a separate TOAST table), which puts each column into
its own dense page storage.

Besides the advantages of separating out key columns from data columns,
compression and encryption approaches that use column-packed data are much
more effective.

As Tom points out there are complications WRT update, access, etc that need
to be worked out to support this, but it's an important capability to have
IMO.

- Luke       




Re: TOASTing smaller things

From
Chris Browne
Date:
llonergan@greenplum.com ("Luke Lonergan") writes:
> Andreas,
>
> On 3/22/07 9:40 AM, "Andreas Pflug" <pgadmin@pse-consulting.de> wrote:
>
>> Wouldn't it be enough to enable having the toast table on a different
>> table space?
>
> Yes, but the ultimate goal would allow the allocation of a storage mechanism
> that is unique to each column.  The most frequently used mechanism for our
> customers would likely be to have each column associated with it's own
> internal relation (like a separate TOAST table), which puts each column into
> it's own dense page storage.
>
> Beside the advantages of separating out keys columns from data columns,
> compression and encryption approaches that use column packed data are much
> more effective.
>
> As Tom points out there are complications WRT update, access, etc that need
> to be worked out to support this, but it's an important capability to have
> IMO.

Hmm.  Are you trying to do something sort of like CStore?

http://db.csail.mit.edu/projects/cstore/

That seems to have some relevant ideas...
-- 
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/finances.html
Where do you  *not* want to go today?  "Confutatis maledictis, flammis
acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html>


Re: TOASTing smaller things

From
"Luke Lonergan"
Date:
Chris,

> Hmm.  Are you trying to do something sort of like CStore?
> 
> http://db.csail.mit.edu/projects/cstore/
> 
> That seems to have some relevant ideas...

I think "something like" is a good way to put it.  As you know Cstore was a
prototype for Vertica and these are in the same class as SybaseIQ and
SandDB.

The huge drawback of the pure column approach is that update/insert while
querying is difficult if not impossible.  I think there are hybrid approaches that
yield most, if not all of the benefits of the column store approach without
the disadvantages.

For instance, a bitmap index with index only access in a row-store may
outperform the column store on queries.  Note the "index only access" part.
The next advantage of a column store is deep compression, preserved through
the executor access path - we can do this with selective vertical
partitioning using a page-segmented WAH compression similar to what we did
with bitmap index.  Lastly, vectorization of the operators in the executor
can be implemented with vertical partitioning and an access path that does
projection before feeding the columns into the executor - this can be done
in Postgres with a cache-bypass method.  Some of this requires working out
answers to the visibility challenges inherent to MVCC, but it's all possible
IMO.

So - under the guise of "TOASTing smaller things", it seems relevant to
think about vertical partitioning, perhaps making use of what's already in
Postgres as baby steps toward more advanced features.

- Luke   




Re: TOASTing smaller things

From
Hannu Krosing
Date:
On a fine day, Thu, 2007-03-22 at 10:19, Luke Lonergan wrote:
> Andreas,
> 
> On 3/22/07 9:40 AM, "Andreas Pflug" <pgadmin@pse-consulting.de> wrote:
> 
> > Wouldn't it be enough to enable having the toast table on a different
> > table space?
> 
> Yes, but the ultimate goal would allow the allocation of a storage mechanism
> that is unique to each column.  The most frequently used mechanism for our
> customers would likely be to have each column associated with it's own
> internal relation (like a separate TOAST table), which puts each column into
> it's own dense page storage.

Maybe monetdb ( http://monetdb.cwi.nl/ ) can be of some inspiration?

btw, it may be a good idea to have a version of bizgresMPP which has
monetdb as the partition db, if monetdb is as efficient as they say it is.

> 
> Beside the advantages of separating out keys columns from data columns,
> compression and encryption approaches that use column packed data are much
> more effective.
> 
> As Tom points out there are complications WRT update, access, etc that need
> to be worked out to support this, but it's an important capability to have
> IMO.
> 
> - Luke       
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
> 
>                 http://www.postgresql.org/about/donate
-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: TOASTing smaller things

From
"Luke Lonergan"
Date:
Hi Hannu,

On 3/22/07 3:21 PM, "Hannu Krosing" <hannu@skype.net> wrote:

> Maybe monetdb ( http://monetdb.cwi.nl/ ) can be of some inspiration ?
> 
> btw, it may be a good idea to have a verion of bizgresMPP which has
> monetdb as partition db, if monetdb is as efficient as they tell it is .

Yep - I've talked this over with the MonetDB folks in the past.  The major
benefits they observe are those same things we talk about here,
implementation of long loops for operators and de-abstraction of operations
like compare() when appropriate, say comparing two INT columns in a sort.

We can get many of those benefits by "vectorizing" the executor of
PostgreSQL even without the full column partitioning.  We're in the midst of
working on some of those changes as we speak.  Early indications are that we
see large performance gains from this approach.  Note that the actual
instruction counts per row don't change, but the more effective use of L2 I
and D cache and superscalar instruction units on the CPU creates the big
gains.  The MonetDB people present some effective literature on this, but
the important gains mostly come from the "vectorization", not the operator
de-abstraction IMO, which is good news for us all.

- Luke   




Re: TOASTing smaller things

From
Bruce Momjian
Date:
Luke Lonergan wrote:
> I advocate the following:
> 
> - Enable specification of TOAST policy on a per column basis
> 
> As a first step, then:
> 
> - Enable vertical partitioning of tables using per-column specification of
> storage policy.
> 

How are these different from ALTER TABLE SET STORAGE?  They need to be
more specific.

---------------------------------------------------------------------------


> - Luke
> 
> 
> On 3/21/07 1:12 PM, "Bruce Momjian" <bruce@momjian.us> wrote:
> 
> > 
> > Is this a TODO?
> > 
> > ---------------------------------------------------------------------------
> > 
> > Jan Wieck wrote:
> >> On 3/21/2007 2:05 PM, Tom Lane wrote:
> >>> Chris Browne <cbbrowne@acm.org> writes:
> >>>> #define TOAST_DENOMINATOR 17
> >>>>    /* Use this as the divisor; current default behaviour falls from
> >>>> TOAST_DENOMINATOR = 4 */
> >>> 
> >>>> #define TOAST_TUPLE_THRESHOLD \
> >>>>     MAXALIGN_DOWN((BLCKSZ - \
> >>>>                    MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
> >>>>                   / TOAST_DENOMINATOR)
> >>> 
> >>> Given that you are quoting code that was demonstrably broken since the
> >>> original coding of TOAST up till a month or two back, "it passes
> >>> regression" is not adequate proof of "it's right".  In fact I think
> >>> it's not right; you have not got the roundoff condition straight.
> >>> 
> >>>> 4.  A different mechanism would be to add a fifth storage column
> >>>> strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
> >>>> say, TOAST.
> >> 
> >> FORCE_COMPRESSION, FORCE_EXTERNAL and FORCE_EXTERNAL_UNCOMPRESSED.
> >> 
> >>> 
> >>> Anything along this line would require invoking the toaster on every
> >>> single tuple, since we'd always have to crawl through all the columns
> >>> to see if toasting was supposed to happen.  No thanks.
> >> 
> >> Not necessarily. A flag in Relation telling if the table has any column
> >> marked like that could be set while constructing the relcache entry.
> >> 
> >>> 
> >>>> Which of these sounds preferable?
> >>> 
> >>> It's a bit late in the cycle to be proposing any of these for 8.3.
> >> 
> >> Certainly.
> >> 
> >> 
> >> Jan
> >> 
> >> -- 
> >> #======================================================================#
> >> # It's easier to get forgiveness for being wrong than for being right. #
> >> # Let's break this rule - forgive me.                                  #
> >> #================================================== JanWieck@Yahoo.com #
> >> 
> >> ---------------------------(end of broadcast)---------------------------
> >> TIP 2: Don't 'kill -9' the postmaster

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: TOASTing smaller things

From
"Luke Lonergan"
Date:
Hi Bruce,

How about these:

- Allow specification of TOAST size threshold in bytes on a per column basis
- Enable storage of columns in separate TOAST tables
- Enable use of multi-row compression method(s) for TOAST tables
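
Purely as an illustration of the first two items (this syntax does not exist;
it just restates the TODOs in DDL form, with invented names):

    -- hypothetical, not implemented
    ALTER TABLE docs ALTER COLUMN body SET TOAST THRESHOLD 512;          -- per-column threshold in bytes
    ALTER TABLE docs ALTER COLUMN body SET TOAST TABLE docs_body_toast;  -- dedicated toast table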

- Luke


On 3/26/07 5:39 PM, "Bruce Momjian" <bruce@momjian.us> wrote:

> Luke Lonergan wrote:
>> I advocate the following:
>> 
>> - Enable specification of TOAST policy on a per column basis
>> 
>> As a first step, then:
>> 
>> - Enable vertical partitioning of tables using per-column specification of
>> storage policy.
>> 
> 
> How are these different from ALTER TABLE SET STORAGE?  They need to be
> more specific.




Re: TOASTing smaller things

From
Bruce Momjian
Date:
Luke Lonergan wrote:
> Hi Bruce,
> 
> How about these:
> 
> - Allow specification of TOAST size threshold in bytes on a per column basis
> - Enable storage of columns in separate TOAST tables
> - Enable use of multi-row compression method(s) for TOAST tables

At this point I would be happy just to set the TOAST threshold to a
value defined as optimal, rather than as the most minimal use of TOAST
possible.

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: TOASTing smaller things

From
"Luke Lonergan"
Date:
Hi Bruce,

On 4/12/07 9:24 AM, "Bruce Momjian" <bruce@momjian.us> wrote:

> Luke Lonergan wrote:
>> Hi Bruce,
>> 
>> How about these:
>> 
>> - Allow specification of TOAST size threshold in bytes on a per column basis
>> - Enable storage of columns in separate TOAST tables
>> - Enable use of multi-row compression method(s) for TOAST tables
> 
> At this point I would be happy just to set the TOAST threshold to a
> value defined as optimal, rather than as the most minimal use of TOAST
> possible.

I agree that's a good starting point, I guess I was thinking that was
already included in the work that Tom has been doing.  If not, we can add a
TODO like this as a precursor to the ones above:

- Allow specification of TOAST size threshold (in bytes) on a per table
basis

- Luke




Re: TOASTing smaller things

From
Tom Lane
Date:
"Luke Lonergan" <llonergan@greenplum.com> writes:
> On 4/12/07 9:24 AM, "Bruce Momjian" <bruce@momjian.us> wrote:
>> At this point I would be happy just to set the TOAST threshold to a
>> value defined as optimal, rather than as the most minimal use of TOAST
>> possible.

> I agree that's a good starting point, I guess I was thinking that was
> already included in the work that Tom has been doing.

No.  I put in the code needed to decouple toast tuple size from toasting
threshold, but I don't have the time or interest to run performance
tests to see whether there are better default values than the historical
quarter-page values.  Someone should do that before 8.3 beta ...

> If not, we can add a
> TODO like this as a precursor to the ones above:

> - Allow specification of TOAST size threshold (in bytes) on a per table
> basis

I would suggest that *all* of those TODOs are premature in the absence
of experimental evidence about the effect of varying the parameters.
If we end up finding out that the existing settings are about right
anyway across a range of test cases, who needs more knobs?  We've got
too many mostly-useless knobs already.
        regards, tom lane


Re: TOASTing smaller things

From
"Luke Lonergan"
Date:
Hi Tom,

On 4/12/07 1:40 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> I would suggest that *all* of those TODOs are premature in the absence
> of experimental evidence about the effect of varying the parameters.
> If we end up finding out that the existing settings are about right
> anyway across a range of test cases, who needs more knobs?  We've got
> too many mostly-useless knobs already.

This set of TODOs is really about a binary change in behavior that data
warehouse users will employ to shift given columns into a separate storage
mechanism while preserving their schema.  By contrast, the knob you describe
is about tuning the existing storage mechanism by offlining values that are
too large.

We can talk about these TODOs as a group using the name "vertical
partitioning" if that suits.

To demonstrate the effectiveness of vertical partitioning, we would write
queries that use the partitioned columns independently or as groups that
correlate with the storage mechanism.

The other demonstration applies to the use of compression techniques that
align with the column type(s) and operate across tuple boundaries within
pages.  Examples there include the segmentation of fixed width types and
variable width types into separate page storage classes, which allows for
the application of different compression and/or representations on pages.

- Luke 




Re: TOASTing smaller things

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> I would suggest that *all* of those TODOs are premature in the absence
> of experimental evidence about the effect of varying the parameters.
> If we end up finding out that the existing settings are about right
> anyway across a range of test cases, who needs more knobs?  We've got
> too many mostly-useless knobs already.

Isn't it obvious that the "right" value is going to depend extraordinarily
heavily on the precise usage pattern though? 

A typical table with 100-byte columns which are normally read with the rest of
the columns, sometimes in sequential scans or updates, will find TOASTing them
counter-productive as it turns those all into additional random access i/o
and extra inserts and deletes.

Another system with a dozen 100-byte columns that are virtually never accessed
and a handful of heavily-updated integer columns will benefit heavily from
TOAST as it changes the table from an awfully performing 5-6 tuple/page table
into a tremendously performing 100+ tuple/page table.
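
As a rough sketch of how one approximates that second case by hand today (all
names invented): either split the cold columns into a side table, or mark wide
columns for out-of-line storage.

    -- manual vertical partitioning
    CREATE TABLE events_hot  (id bigint PRIMARY KEY, hits int, score int);
    CREATE TABLE events_cold (id bigint PRIMARY KEY REFERENCES events_hot(id),
                              note1 text, note2 text);

    -- or, keeping one table, store wide columns out-of-line and uncompressed
    -- once a row exceeds the TOAST threshold
    ALTER TABLE events ALTER COLUMN note1 SET STORAGE EXTERNAL;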

We're not ever going to be able to predict when data is being stored what
future usage pattern to expect, at least not realistically. The best we can
hope for is to find the tipping point at which the cost if we guess wrong is
some tolerable level of pain, set that to be the default, and give the user
the tools to obtain the benefit in the cases where he knows it'll help.

Or perhaps TOAST is the wrong kind of vertical partitioning for this. Perhaps
we should be using TOAST to deal with the "large datum" problem and have a
different looking tool entirely for the "vertical partitioning rarely used
columns" problem.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: TOASTing smaller things

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> I would suggest that *all* of those TODOs are premature in the absence
>> of experimental evidence about the effect of varying the parameters.

> Isn't it obvious that the "right" value is going to depend extraordinarily
> heavily on the precise usage pattern though? 

It's not yet "obvious" that there's any win to be had at all.  AFAIK no
one has done any playing around with alternative TOAST settings.  It
could be that the mechanism is simply not very sensitive to those values.

> Or perhaps TOAST is the wrong kind of vertical partitioning for this.

Exactly my point --- quoting anecdotes about wins from someone else's
vertical partitioning implementation doesn't really prove a darn thing
about how TOAST will behave.  It's suggestive, but I'd like to see some
actual experimental evidence before we start constructing a lot of
infrastructure.  "Infrastructure first, learn how to use it later" is
the philosophy that's given us nigh-useless stuff like commit_delay.
        regards, tom lane


PG-MQ?

From
Chris Browne
Date:
I'm seeing some applications where it appears that there would be
value in introducing asynchronous messaging, ala "message queueing."
<http://en.wikipedia.org/wiki/Message_queue>

The "granddaddy" of message queuing systems is IBM's MQ-Series, and I
don't see particular value in replicating its functionality.

On the other side, the "big names" these days are:

a) The Java Messaging Service, which seems to implement *way* more
   options than I'm even vaguely interested in having (notably, lots
   that involve data stores or lack thereof that I do not care to use);

b) AMQP <http://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol>
   which again is a big, complex, protocol-oriented thing that seems
   mighty big and complex;

c) There are lesser names, like isectd <http://isectd.sf.net> and the
(infamous?) Spread Toolkit which both implement memory-based messaging
systems.

There's a simple + free system called MQS that is implemented in Perl
and uses XML-RPC <http://www.foo.be/mqs/>; it seems perhaps a little
*too* primitive.

FYI, here's its simple schema (filtered into PG):

create table mqname (
    name varchar(255) not null default '',
    id serial,
    priority integer not null default 10,
    primary key(id)
);

create table message (
    id serial,
    timestamp timestamptz NOT NULL,
    cid varchar(255),
    queue integer not null references mqname(id),
    body text not null,
    priority integer not null default 10,
    flag integer default 0, primary key(id)
);

My bias would be to have something that can basically run as a thin
set of stored procedures atop PostgreSQL :-).  It would be trivial to
extend that to support SOAP/XML-RPC, if desired.
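
As a very rough sketch of that idea (all names here are hypothetical, not an
existing project), the core could be little more than a table plus two
functions:

    create table mq (
        id       serial  primary key,
        queue    text    not null,
        body     text    not null,
        consumed boolean not null default false
    );

    create function mq_enqueue(text, text) returns integer as $$
        insert into mq (queue, body) values ($1, $2);
        select currval('mq_id_seq')::integer;
    $$ language sql;

    create function mq_dequeue(q text) returns text as $$
    declare
        m mq%rowtype;
    begin
        -- oldest unconsumed message; FOR UPDATE serializes concurrent consumers
        select * into m from mq
         where queue = q and not consumed
         order by id limit 1 for update;
        if not found then
            return null;
        end if;
        update mq set consumed = true where id = m.id;
        return m.body;
    end;
    $$ language plpgsql;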

It would be nice to achieve 'higher availability' by having queues
where you might replicate the contents (probably using the MQ system
itself ;-)) to other servers.

There tend to be varying semantics out there:

- Some queues may represent "subscriptions" where a whole bunch of
  listeners want to get all the messages;

- Sometimes you have the semantics where:
  - messages need to be delivered at least once
  - messages need to be delivered no more than once
  - messages need to be delivered exactly once
 

Is there any existing work out there on this?  Or should I maybe be
looking at prototyping something?
-- 
(format nil "~S@~S" "cbbrowne" "linuxfinances.info")
http://linuxfinances.info/info/lsf.html
Q: How many Newtons does it take to change a light bulb?
A: Faux!  There to eat lemons, axe gravy soup!


Re: PG-MQ?

From
Steve Atkins
Date:
On Jun 19, 2007, at 2:45 PM, Chris Browne wrote:

> I'm seeing some applications where it appears that there would be
> value in introducing asynchronous messaging, ala "message queueing."
> <http://en.wikipedia.org/wiki/Message_queue>

Me too.

> My bias would be to have something that can basically run as a thin
> set of stored procedures atop PostgreSQL :-).  It would be trivial to
> extend that to support SOAP/XML-RPC, if desired.
>
> It would be nice to achieve 'higher availability' by having queues
> where you might replicate the contents (probably using the MQ system
> itself ;-)) to other servers.
>
> There tend to be varying semantics out there:
>
> - Some queues may represent "subscriptions" where a whole bunch of
>   listeners want to get all the messages;
>
> - Sometimes you have the semantics where:
>   - messages need to be delivered at least once
>   - messages need to be delivered no more than once
>   - messages need to be delivered exactly once
>
> Is there any existing work out there on this?  Or should I maybe be
> looking at prototyping something?

The skype tools have some sort of decent-looking publish/subscribe
thing, PgQ, which they then layer their replication on top of. It's
multi-consumer and multi-producer, with "delivered at least once" semantics.

Looks nice.

Cheers,  Steve



Re: PG-MQ?

From
Chris Browne
Date:
steve@blighty.com (Steve Atkins) writes:
>> Is there any existing work out there on this?  Or should I maybe be
>> looking at prototyping something?
>
> The skype tools have some sort of decent-looking publish/subscribe
> thing, PgQ, then they layer their replication on top of. It's multi
> consumer and producer, with "delivered at least once" semantics.
>
> Looks nice.

I had not really noticed that - I need to take a look at their
connection pooler too, so I guess that puts more "skype" items on my
ToDo list ;-).  Thanks for pointing it out...
-- 
let name="cbbrowne" and tld="linuxdatabases.info" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/advocacy.html
Signs of a Klingon Programmer #1: "Our users will  know fear and cower
before our software. Ship it! Ship it and let  them flee like the dogs
they are!"


Re: PG-MQ?

From
"Jeroen T. Vermeulen"
Date:
On Wed, June 20, 2007 04:45, Chris Browne wrote:
> I'm seeing some applications where it appears that there would be
> value in introducing asynchronous messaging, ala "message queueing."
> <http://en.wikipedia.org/wiki/Message_queue>
>
> The "granddaddy" of message queuing systems is IBM's MQ-Series, and I
> don't see particular value in replicating its functionality.

I'm quite interested in this.  Maybe I'm thinking of something too
complex, but I do think there are some "oh it'll need to do that too"
pitfalls that are best considered up front.

The big thing about MQ is that it participates as a resource manager in
two-phase commits (and optionally a transaction manager as well).  That
means that you get atomic processing steps: application takes message off
a queue, processes it, commits its changes to the database, replies to
message.  The queue manager then does a second-phase commit for all of
those steps, and that's when the reply really goes out.  If the
application fails, none of this will have happened so you get ACID over
the complete cycle.  That's something we should have free software for.
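
To make the postgres end of such a cycle concrete, here is a minimal sketch
using the built-in two-phase commit support (the table names and the
transaction identifier are invented):

    BEGIN;
    DELETE FROM mq WHERE id = 42;                 -- take the message off the queue
    UPDATE accounts SET balance = balance - 10
     WHERE owner = 'alice';                       -- the application's own work
    PREPARE TRANSACTION 'msg-42';                 -- phase one: durable, not yet visible
    -- an external transaction manager later resolves it:
    COMMIT PREPARED 'msg-42';                     -- phase two (or ROLLBACK PREPARED 'msg-42')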

Perhaps the time is right for something new.  A lot of the complexity
inside MQ comes from data representation issues like encodings and
fixed-length strings, as I recall, and things have changed since MQ was
designed.  I agree it could be useful (and probably not hard either) to
have a transactional messaging system inside the database.  It saves you
from having to do two-phase commits.

But it does tie everything to postgres to some extent, and you lose the
interesting features—atomicity and assured, single delivery—as soon as
anything in the chain does anything persistent that does not participate
in the postgres transaction.  Perhaps what we really need is more mature
components, with a unified control layer on top.  That's how a lot of
successful free software grows.  See below.


> On the other side, the "big names" these days are:
>
> a) The Java Messaging Service, which seems to implement *way* more
>    options than I'm even vaguely interested in having (notably, lots
>    that involve data stores or lack thereof that I do not care to use);

Far as I know, JMS is an API, not a product.  You'd still slot some
messaging middleware underneath, such as MQ.  That is why MQSeries was
renamed: it fits into the WebSphere suite as the implementing engine
underneath the JMS API.  From what I understand MQ is one of the
"best-of-breed" products that JMS was designed around.  (Sun's term, bit
hypey for my taste).

In one way, Java is easy: the last thing you want to get into is yet
another marshaling standard.  There are plenty of "standards" to choose
from already, each married to one particular communications mechanism:
RPC, EDI, CORBA, D-Bus, XMLRPC, what have you.  Even postgres has its own.
I'd say the most successful mechanism is TCP itself, because it isolates
itself from content representation so effectively.

It's hard not to get into marshaling: someone has to do it, and it's often
a drag to do it in the application, but the way things stand now *any*
choice limits the usefulness of what you're building.  That's something
I'd like to see change.

Personally I'd love to see marshaling or low-level data representation
isolated into a mature component that speaks multiple programming
languages on the one hand and multiple data representation formats on the
other.  Something the implementers of some of these messaging standards
would want to use to compose their messages, isolating their format
definitions into plugins.  Something that would make application writers
stop composing messages in finicky ad-hoc code that fails with unexpected
locales or has trouble with different line breaks.

If we had a component like that, combining it with existing transactional
variants of TCP and [S]HTTP might even be enough to build a usable
messaging system.  I haven't looked at them enough to know.  Of course
we'd need implementations of those protocols; see
http://ttcplinux.sourceforge.net/ and http://www.csn.ul.ie/~heathclf/fyp/
for example.

Another box of important tools, and I have no idea where we stand with
this one, is transaction management.  We have 2-phase commit in postgres
now.  But do we have interoperability with existing transaction managers? 
Is there a decent free, portable, everything-agnostic transaction manager?
With those, the sphere of reliability of a database-driven messaging
package could extend much further.

A free XA-capable filesystem would be great too, but I guess I'm daydreaming.


> There tend to be varying semantics out there:
>
> - Some queues may represent "subscriptions" where a whole bunch of
>   listeners want to get all the messages;

The two simplest models that offer something more than TCP/UDP are 1:n
reliable publish-subscribe without persistence, and 1:1 request-reply with
persistent storage.  D-Bus does them both; IIRC MQ does 1:1 and has
add-ons on top for publish-subscribe.

I could imagine variations such as persistent publish-subscribe, where you
can come back once in a while and see if your subscriptions caught
anything since your last visit.  But such things probably get more complex
and less useful as you add more ideas.

On top of that goes communication model: symmetric or asymmetric,
synchronous or asynchronous.  Do you end up with a "remote procedure call"
model like RPC, D-Bus, CORBA?  Or do you stick with a pure message/event
view of communication?  Personally I think it's good not to intrude into
the application's event loop too much, but others seem to feel the central
event loop should not be part of application code.


> - Sometimes you have the semantics where:
>   - messages need to be delivered at least once
>   - messages need to be delivered no more than once
>   - messages need to be delivered exactly once

IMHO, if you're not doing "exactly once," or something very close to it,
you might as well stay with ad-hoc code.  You can ensure single delivery
by having the sender re-send when in doubt, and keeping track of
duplications in the recipient.
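
A minimal sketch of that recipient-side bookkeeping (names invented): record
each message id in the same transaction that applies its effects, so a
redelivered duplicate simply fails the insert and its effects are never
committed twice.

    create table processed_messages (msg_id bigint primary key);

    begin;
    insert into processed_messages values (12345);  -- unique violation here means "already done"
    -- ... apply the message's effects ...
    commit;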


> Is there any existing work out there on this?  Or should I maybe be
> looking at prototyping something?

I've looked around a bit (not much) and not found anything very generally
useful.  I think it's an exciting area that probably needs work, so
prototyping might be a good idea.  If nothing else, I hope I've given you
some examples of what you don't want to get yourself into.  :-)


Jeroen




Re: PG-MQ?

From
Markus Schiltknecht
Date:
Hi Chris,

Chris Browne wrote:
> I'm seeing some applications where it appears that there would be
> value in introducing asynchronous messaging, ala "message queueing."
> <http://en.wikipedia.org/wiki/Message_queue>

ISTM that 'message queue' is a way too general term. There are hundreds 
of different queues at different levels on a standard server. So I'm 
somewhat unsure about what problem you want to solve.

> c) There are lesser names, like isectd <http://isectd.sf.net> and the
> (infamous?) Spread Toolkit which both implement memory-based messaging
> systems.

If a GCS is about what you're looking for, then you also might want to 
consider these: ensemble, appia or jGroups. There's a Java layer called 
jGCS, which supports even more, similar systems.

Another commonly used term is 'reliable multicast', which guarantees 
that messages are delivered to a group of recipients. These algorithms 
often are the basis for a GCS.

> My bias would be to have something that can basically run as a thin
> set of stored procedures atop PostgreSQL :-).  It would be trivial to
> extend that to support SOAP/XML-RPC, if desired.

Hm.. in Postgres-R I currently have (partial) support for ensemble and 
spread. Exporting that interface via stored procedures could be done, 
but you would probably need a manager process, as you certainly want 
your connections to persist across transactions (or not?).

Together with that process, we already have half of what Postgres-R is: 
an additional process which connects to the GCS. Thus I'm questioning
whether there's value in exporting the interface. Can you think of use
cases other than database replication? Why do you want to do that via the
database, then, and not directly with the GCS?

> It would be nice to achieve 'higher availability' by having queues
> where you might replicate the contents (probably using the MQ system
> itself ;-)) to other servers.

Uhm.. sorry, but I fail to see the big news here. Which replication 
solution does *not* work that way?

Regards

Markus



Re: PG-MQ?

From
"Marko Kreen"
Date:
On 6/20/07, Jeroen T. Vermeulen <jtv@xs4all.nl> wrote:
> On Wed, June 20, 2007 04:45, Chris Browne wrote:
> > - Sometimes you have the semantics where:
> >   - messages need to be delivered at least once
> >   - messages need to be delivered no more than once
> >   - messages need to be delivered exactly once
>
> IMHO, if you're not doing "exactly once," or something very close to it,
> you might as well stay with ad-hoc code.  You can ensure single delivery
> by having the sender re-send when in doubt, and keeping track of
> duplications in the recipient.

In the case of PGQ, the "at least once" semantics is related to the
batch-based processing it does - in case of failure, the full batch is
delivered again, so if the consumer had managed to process some of the
items already, it gets them twice.

As it is responsible only for delivering events from the database,
it has no way of guaranteeing "exactly once" behaviour; that needs
to be built on top of PGQ.

The simplest case is when the events are processed in the same database
that the queue resides in.  Then you can just fetch, process, and close
the batch in one transaction, and immediately you get "exactly once"
behaviour.

To achieve "exactly once" behaviour with different databases, look
at the "pgq_ext" module for sample.  Basically it just requires
storing batch_id/event_id on remote db and committing there first.
Later it can be checked if the batch/event is already processed.
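
For the same-database case, the cycle looks roughly like this (function names
as I recall them from the Skytools PgQ API, so check the PgQ docs; read
batch_id below as the value returned by next_batch):

    -- one-time setup
    select pgq.create_queue('myqueue');
    select pgq.register_consumer('myqueue', 'myconsumer');

    -- per processing cycle, all in one transaction
    begin;
    select pgq.next_batch('myqueue', 'myconsumer');   -- returns batch_id, or NULL if nothing new
    select * from pgq.get_batch_events(batch_id);     -- read the events
    -- ... apply their effects here, in this same transaction ...
    select pgq.finish_batch(batch_id);
    commit;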

It's tricky only if you want to achieve full transactionality for
event processing.  As I understand, JMS does not have a concept
of transactions, probably also other solutions mentioned before,
so to use PgQ as backend for them should be much simpler...

To Chris: you should like PgQ, its just stored procs in database,
plus it's basically just generalized Slony-I, with some optimizations,
so should be familiar territory ;)

-- 
marko


Re: PG-MQ?

From
Heikki Linnakangas
Date:
Marko Kreen wrote:
> As I understand, JMS does not have a concept
> of transactions, probably also other solutions mentioned before,
> so to use PgQ as backend for them should be much simpler...

JMS certainly does have the concept of transactions. Both distributed 
ones through XA and two-phase commit, and local involving just one JMS 
provider. I don't know about others, but would be surprised if they didn't.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: PG-MQ?

From
"Marko Kreen"
Date:
On 6/20/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> Marko Kreen wrote:
> > As I understand, JMS does not have a concept
> > of transactions, probably also other solutions mentioned before,
> > so to use PgQ as backend for them should be much simpler...
>
> JMS certainly does have the concept of transactions. Both distributed
> ones through XA and two-phase commit, and local involving just one JMS
> provider. I don't know about others, but would be surprised if they didn't.

Ah, sorry, my mistake then.  Shouldn't trust hearsay :)

-- 
marko


Re: PG-MQ?

From
"Jeroen T. Vermeulen"
Date:
On Wed, June 20, 2007 18:18, Heikki Linnakangas wrote:
> Marko Kreen wrote:
>> As I understand, JMS does not have a concept
>> of transactions, probably also other solutions mentioned before,
>> so to use PgQ as backend for them should be much simpler...
>
> JMS certainly does have the concept of transactions. Both distributed
> ones through XA and two-phase commit, and local involving just one JMS
> provider. I don't know about others, but would be surprised if they
> didn't.

Wait...  I thought XA did two-phase commit, and then there was XA+ for
*distributed* two-phase commit, which is much harder?


Jeroen




Re: PG-MQ?

From
Heikki Linnakangas
Date:
Jeroen T. Vermeulen wrote:
> On Wed, June 20, 2007 18:18, Heikki Linnakangas wrote:
>> Marko Kreen wrote:
>>> As I understand, JMS does not have a concept
>>> of transactions, probably also other solutions mentioned before,
>>> so to use PgQ as backend for them should be much simpler...
>> JMS certainly does have the concept of transactions. Both distributed
>> ones through XA and two-phase commit, and local involving just one JMS
>> provider. I don't know about others, but would be surprised if they
>> didn't.
> 
> Wait...  I thought XA did two-phase commit, and then there was XA+ for
> *distributed* two-phase commit, which is much harder?

Well, I meant distributed as in one transaction manager, multiple 
resource managers, all participating in a single atomic transaction. I 
don't know what XA+ adds on top of that.

To be precise, being a Java-thing, JMS actually supports two-phase 
commit through JTA (Java Transaction API), not XA. It's the same design 
and interface, just defined as Java interfaces instead of at native 
library level.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: PG-MQ?

From
Chris Browne
Date:
markokr@gmail.com ("Marko Kreen") writes:
> To Chris: you should like PgQ, its just stored procs in database,
> plus it's basically just generalized Slony-I, with some optimizations,
> so should be familiar territory ;)

Looks interesting...

Random ideas
============
- insert_event in C (way to get rid of plpython)

Yeah, I'm with that...  Ever tried building [foo] on AIX, where foo in
('perl', 'python', ...)???  :-(

It seems rather excessive to add in a whole stored procedure language
simply for one function...
-- 
(format nil "~S@~S" "cbbrowne" "linuxdatabases.info")
http://www3.sympatico.ca/cbbrowne/sgml.html
I always try to do things in chronological order. 


Re: PG-MQ?

From
"Marko Kreen"
Date:
On 6/20/07, Chris Browne <cbbrowne@acm.org> wrote:
> markokr@gmail.com ("Marko Kreen") writes:
> > To Chris: you should like PgQ, its just stored procs in database,
> > plus it's basically just generalized Slony-I, with some optimizations,
> > so should be familiar territory ;)
>
> Looks interesting...

Thanks :)

> Random ideas
> ============
> - insert_event in C (way to get rid of plpython)
>
> Yeah, I'm with that...  Ever tried building [foo] on AIX, where foo in
> ('perl', 'python', ...)???  :-(
>
> It seems rather excessive to add in a whole stored procedure language
> simply for one function...

Well, it's standard in our installations as we use it for
other stuff too.  It's much easier to prototype in PL/Python
than in C...

As it has not been a performance problem I have not bothered
to rewrite it.  But now that the interface has been stable for some
time, it could be done.

-- 
marko


Index Tuple Compression Approach?

From
Chris Browne
Date:
I recently had a chat with someone who was pretty intimate with Adabas
for a number of years who's in the process of figuring things out
about PostgreSQL.  We poked at bits of the respective implementations,
seeing some similarities and differences.  He pointed out one aspect
of index handling that could (in principle) be an interesting
optimization.

Evidently, in Adabas, index leaf nodes were not simply tuples, but
lists where the index value would not be repeated.

In PostgreSQL, if you have the index value 'abc', and there are 10
tuples with that value, then you'll have a page full of tuples of the
following form:

|abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...

Now, the Adabas approach was rather different.  It would only have the
index value once, and then have the list of tuple pointers:

|abc|ptr[rec1],ptr[rec2],ptr[rec3],...[ptr[rec10]|

This could allow a fair bit of compression, for cases where the index
value is not unique.
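
To get a feel for how much space the repetition costs today, one quick
experiment (with made-up names) is to build a worst-case index, where every
key is identical, and compare sizes:

    create table t as
        select 'abc'::text as k, g as v from generate_series(1, 100000) g;
    create index t_k_idx on t (k);
    select pg_relation_size('t')       as heap_bytes,
           pg_relation_size('t_k_idx') as index_bytes;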

There is a concomitant downside, that concurrent updates may fight
over a page, and, since there would be a higher density, there would
be more need to fight over pages.

Does this seem pretty much like madness?  Or is it a plausible "some
day ToDo"?
-- 
"cbbrowne","@","acm.org"
http://linuxfinances.info/info/postgresql.html
"I don't do drugs anymore 'cause I  find I get the same effect just by
standing up really fast." -- Jonathan Katz


Re: Index Tuple Compression Approach?

From
Decibel!
Date:
Isn't this what Grouped Index Tuples is?

On Tue, Aug 14, 2007 at 05:21:16PM -0400, Chris Browne wrote:
> I recently had a chat with someone who was pretty intimate with Adabas
> for a number of years who's in the process of figuring things out
> about PostgreSQL.  We poked at bits of the respective implementations,
> seeing some similarities and differences.  He pointed out one aspect
> of index handling that could (in principle) be an interesting
> optimization.
>
> Evidently, in Adabas, index leaf nodes were not simply tuples, but
> lists where the index value would not be repeated.
>
> In PostgreSQL, if you have the index value 'abc', and there are 10
> tuples with that value, then you'll have a page full of tuples of the
> following form:
>
> |abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...
>
> Now, the Adabas approach was rather different.  It would only have the
> index value once, and then have the list of tuple pointers:
>
> |abc|ptr[rec1],ptr[rec2],ptr[rec3],...[ptr[rec10]|
>
> This could allow a fair bit of compression, for cases where the index
> value is not unique.
>
> There is a concommitant downside, that concurrent updates may fight
> over a page, and, since there would be a higher density, there would
> be more need to fight over pages.
>
> Does this seem pretty much like madness?  Or is it a plausible "some
> day ToDo"?
> --
> "cbbrowne","@","acm.org"
> http://linuxfinances.info/info/postgresql.html
> "I don't do drugs anymore 'cause I  find I get the same effect just by
> standing up really fast." -- Jonathan Katz
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Index Tuple Compression Approach?

From
Jeff Davis
Date:
On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote:
> Isn't this what Grouped Index Tuples is?
> 

http://community.enterprisedb.com/git/git-readme.txt

It looks like GIT is a little different. 

GIT actually stores a lower-bound key of a contiguous* range of keys
that all point to the same page, and for each of those ranges stores a
bitmap of page offsets. A search searches first for an exact match in
the index, and failing that, looks to see if the previous index tuple
happens to be one of these ranges.

The algorithm Chris is talking about stores a set of tuple ids (which
include page and offset) for each distinct key.

Both could be helpful, although I don't think they can work together
very well.

GIT has the disadvantage that it's "lossy". It doesn't even store every
key in the index, so it can't be sure that the match actually is a
match. Thus, it returns candidate matches. That also has implications
for enforcing UNIQUE (although it's not impossible, according to the
readme). However, GIT can be used effectively on an index that happens
to be unique. GIT also assumes a tree structure, and makes no sense for
a hash index, and makes no sense for types without ordering. GIT's
space savings is dependent on the clustering of the table.

Chris's suggestion would work on a UNIQUE index, but would be no help at
all, because there would be no duplicate keys to collapse. However, it
could be used for non-tree indexes. The space savings for this strategy
is dependent on how repetitive the keys are.

I guess the ultimate deciding factor is which can save you more space.
If you have lots of duplicates, Chris's suggestion might work better,
because you don't have to try to maintain cluster order. If you have a
wider distribution of data, GIT is probably better, although you have to
keep some degree of clustering (HOT may help with that).

Heikki is the authority on GIT, so I'm including him in the CC so he can
correct me :)

Regards,Jeff Davis

*: I'm not 100% sure I'm using "contiguous" correctly, but the range of
keys can contain gaps or duplicates, so long as every key in the range
points to that same page. That is, if keys 1,1,2,3,5 all point to page
P, they can be grouped into just "1" so long as there doesn't exist a
key 4 that points to a page other than P.



Re: Index Tuple Compression Approach?

From
Heikki Linnakangas
Date:
Jeff Davis wrote:
> On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote:
>> Isn't this what Grouped Index Tuples is?
> 
> http://community.enterprisedb.com/git/git-readme.txt
> 
> It looks like GIT is a little different. 
> 
> GIT actually stores a lower-bound key of a contiguous* range of keys
> that all point to the same page, and for each of those ranges stores a
> bitmap of page offsets. A search searches first for an exact match in
> the index, and failing that, looks to see if the previous index tuple
> happens to be one of these ranges.
> 
> The algorithm Chris is talking about stores a set of tuple ids (which
> include page and offset) for each distinct key.

Right.

> Both could be helpful, although I don't think they can work together
> very well.

What Chris is suggesting is basically a special case of GIT, where all
the heap tuples represented by an index tuple have the same key. I was
actually thinking of adding a flag to index tuples to indicate that
special case in GIT. We could effectively do both.

> GIT has the disadvantage that it's "lossy". It doesn't even store every
> key in the index, so it can't be sure that the match actually is a
> match. Thus, it returns candidate matches. That also has implications
> for enforcing UNIQUE (although it's not impossible, according to the
> readme). However, GIT can be used effectively on an index that happens
> to be unique. GIT also assumes a tree structure, and makes no sense for
> a hash index, and makes no sense for a types without ordering. GIT's
> space savings is dependent on the clustering of the table.
> 
> Chris's suggestion would work on a UNIQUE index, but would be no help at
> all, because there would be no duplicate keys to collapse. However, it
> could be used for non-tree indexes. The space savings for this strategy
> is dependent on how repetitive the keys are.

Right. I wasn't concerned about the case of a lot of duplicate keys,
because the bitmap index is more efficient for that. And also because
HOT should reduce the number of index tuples with duplicate keys
pointing to different versions of the same row.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Index Tuple Compression Approach?

From
"Simon Riggs"
Date:
On Wed, 2007-08-15 at 06:51 +0100, Heikki Linnakangas wrote:
> Jeff Davis wrote:
> > On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote:
> >> Isn't this what Grouped Index Tuples is?
> > 
> > http://community.enterprisedb.com/git/git-readme.txt
> > 
> > It looks like GIT is a little different. 
> > 
> > GIT actually stores a lower-bound key of a contiguous* range of keys
> > that all point to the same page, and for each of those ranges stores a
> > bitmap of page offsets. A search searches first for an exact match in
> > the index, and failing that, looks to see if the previous index tuple
> > happens to be one of these ranges.
> > 
> > The algorithm Chris is talking about stores a set of tuple ids (which
> > include page and offset) for each distinct key.
> 
> Right.
> 
> > Both could be helpful, although I don't think they can work together
> > very well.
> 
> What Chris is suggesting is basically a special case of GIT, where all
> the heap tuples represented by an index tuple have the same key. I was
> actually thinking of adding a flag to index tuples to indicate that
> special case in GIT. We could effectively do both.

A few additional thoughts...

The approach suggested by Chris is also used by Teradata Non-Unique
Secondary Indexes, known as NUSIs (but not named by me!). The following
link is a public domain description that is detailed enough:
http://teradata.uark.edu/research/wang/indexes.html

Replicating this approach directly isn't that useful because it would
interact badly with the way we handle updates. Thinking about the basic
design pattern however, we can envisage a type of index that changes the
1:1 mapping between index and heap tuple into the concept of a "tuple
set index" where we have a 1:Many mapping between index and heap. 

In general, the tuple set index approach can significantly reduce index
size. This won't be of interest to anyone unless all of your data
overflows RAM and you need to do I/O to constantly page back in pieces
of your data. If your database does fit in RAM, reducing the number of
index blocks might just increase contention. This means that the tuple
set approach is only useful when you have very large databases, but when
you do it is very, very useful.

GIT is a tuple set index with two important extensions:

1. GIT allows a range of values to be indexed, not just a single value,
so this allows it to be useful for both Unique and Non-Unique cases.

2. GIT restricts the set of tuples to a small range of blocks within the
table. Making the range of blocks = 1 means that GIT is designed to work
well with HOT, which also stays on the same block. Keeping the range of
blocks small means GIT degrades as slowly as possible in the face of
"cold" UPDATEs. If the values are inserted in roughly ordered/clustered
sequence then this doesn't increase index size at all, so the most
common/highest volume use cases are covered.

So from my perspective, GIT is very close to the optimal design for a
tuple set index that addresses the need for high concurrency and
unique/near-uniqueness with PostgreSQL. There are certainly many other
options for a tuple set index, and bitmap indexes are simply another
version of a tuple set index but with different behaviours tuned to a
different use case. There may be other use cases that require more than
two kinds of tuple set index...and we have discussed implementing the
tuple set behaviour as part of the other main index types.

As an aside, it turns out, after further research that GIT is similar to
a clustered index in SQLServer, as described by Louis Davidson, "Pro SQL
Server 2005 Database Design and Optimization", p.405. SQLServer creates
a clustered index by default on each PK, so it says.

--  Simon Riggs EnterpriseDB  http://www.enterprisedb.com



Index Tuple Compression Approach?

From
"Dawid Kuroczko"
Date:
On 8/14/07, Chris Browne <cbbrowne@acm.org> wrote:
> I recently had a chat with someone who was pretty intimate with Adabas
> for a number of years who's in the process of figuring things out
> about PostgreSQL.  We poked at bits of the respective implementations,
> seeing some similarities and differences.  He pointed out one aspect
> of index handling that could (in principle) be an interesting
> optimization.
>
> Evidently, in Adabas, index leaf nodes were not simply tuples, but
> lists where the index value would not be repeated.
>
> In PostgreSQL, if you have the index value 'abc', and there are 10
> tuples with that value, then you'll have a page full of tuples of the
> following form:
>
> |abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...
>
> Now, the Adabas approach was rather different.  It would only have the
> index value once, and then have the list of tuple pointers:
>
> |abc|ptr[rec1],ptr[rec2],ptr[rec3],...[ptr[rec10]|
>
> This could allow a fair bit of compression, for cases where the index
> value is not unique.

Interesting.  Some time ago I played a little with quite a big table
which contained a path (file path) as a primary key.  It made sense
to have such a structure (SELECTs were mostly ORDER BY path WHERE path >
'/foo' LIMIT n).
The actual index was only a little bit smaller than the table (there were
maybe 4 or 5 columns there).

Some time ago I had an idea that it might be possible to compress
the index size, even if it is a unique index.  Take the path example.
My idea would be to split the indexed value into 8-byte chunks.
For example: /var/lib/postgresql/8.2/main would be split into:
  "/var/lib"
  "/postgre"
  "sql/8.2" -- these would be inserted into a tree as a "scaffold",
and only vacuum should remove them.
  "main" -- this would be a leaf node.  It could be repeated in non-unique
indexes.

[/etc/pas] -- scaffold-node
 |-"swd"    -- leaf node
[/etc/sha]
 |-"dow"
[/var/lib]   -- a problematic mixed scaffold/leaf node.
 [/postgre]
  |-"sql"
  |-"sql/8.2"
  [sql/8.2/]
   |-"main"
   |-"foobar"

The scaffold nodes would be there to guarantee that there is some
place to attach leafs to.  They should not be removed by DELETE
(unless we are sure no other node depends on them).

Advantages?  The repeated values (such as "/var/lib/postgresql/8.2")
are not repeated -- they are embedded into the tree as a "scaffold";
the nodes that are significant (files, not directories, in my
example) are stored as actual leaves.

I guess it would reduce the size of large indexes and at the same time
it could remove the limitation that a B-tree index cannot index values
larger than 1/3 of the database page.  8-byte chunks were given
as an example here; perhaps a larger value would be better.

(Of course redesigning the schema to put directories separate from
files would be useful, but it would not help with ORDER BY .. LIMIT
queries -- they would have to be JOIN-ed and re-sorted in memory
I'm afraid).

Regards,
   Dawid


Re: Index Tuple Compression Approach?

From
Jeff Davis
Date:
On Wed, 2007-08-15 at 06:51 +0100, Heikki Linnakangas wrote:
> What Chris is suggesting is basically a special case of GIT, where all
> the heap tuples represented by an index tuple have the same key. I was
> actually thinking of adding a flag to index tuples to indicate that
> special case in GIT. We could effectively do both.
> 

The bigger difference that I see is that GIT doesn't just group together
ranges of keys, it also groups by heap page number (or a small range of
page numbers, as Simon pointed out). 

Regards,Jeff Davis





Re: Index Tuple Compression Approach?

From
Heikki Linnakangas
Date:
Dawid Kuroczko wrote:
> Some time ago I've had an idea that it might be possible to compress
> th index size, even if it is a unique index.  Take the path example.
> My idea would be to to split indexed value to 8-byte chunks.
> For example: /var/lib/postgresql/8.2/main would be split into:
>   "/var/lib"
>   "/postgre"
>   "sql/8.2" -- these would be insertered into a tree as a "scaffold",
> and only vacuum should remove them..
>   "main" -- this would be a leaf node.  It could be repeated in non-unique
> indexes.

That general approach of storing a common leading part just once is
called prefix compression. Yeah, it helps a lot on long text fields.
Tree structures like file paths in particular.

It's been discussed before. One big problem is extracting the common
leading part. You could only do it for text, but it should be done in a
datatype neutral way.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Index Tuple Compression Approach?

From
Chris Browne
Date:
qnex42@gmail.com ("Dawid Kuroczko") writes:
> On 8/14/07, Chris Browne <cbbrowne@acm.org> wrote:
>> I recently had a chat with someone who was pretty intimate with Adabas
>> for a number of years who's in the process of figuring things out
>> about PostgreSQL.  We poked at bits of the respective implementations,
>> seeing some similarities and differences.  He pointed out one aspect
>> of index handling that could (in principle) be an interesting
>> optimization.
>>
>> Evidently, in Adabas, index leaf nodes were not simply tuples, but
>> lists where the index value would not be repeated.
>>
>> In PostgreSQL, if you have the index value 'abc', and there are 10
>> tuples with that value, then you'll have a page full of tuples of the
>> following form:
>>
>> |abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...
>>
>> Now, the Adabas approach was rather different.  It would only have the
>> index value once, and then have the list of tuple pointers:
>>
>> |abc|ptr[rec1],ptr[rec2],ptr[rec3],...[ptr[rec10]|
>>
>> This could allow a fair bit of compression, for cases where the index
>> value is not unique.
>
> Interesting.  Some time ago I've played a little with quite a big table
> which constained path (file path) as a primary key.  It did have sense
> to have a strucure (SELECTs were mostly ORDER BY path WHERE path >
> '/foo' LIMIT n).
> The actual index was only a little bit smaller than the table (there were
> maybe 4 or 5 columns there).
>
> Some time ago I've had an idea that it might be possible to compress
> th index size, even if it is a unique index.  Take the path example.
> My idea would be to to split indexed value to 8-byte chunks.
> For example: /var/lib/postgresql/8.2/main would be split into:
>   "/var/lib"
>   "/postgre"
>   "sql/8.2" -- these would be insertered into a tree as a "scaffold",
> and only vacuum should remove them..
>   "main" -- this would be a leaf node.  It could be repeated in non-unique
> indexes.
>
> [/etc/pas] -- scaffold-node
>  |-"swd"    -- leaf node
> [/etc/sha]
>  |-"dow"
> [/var/lib]   -- a problematic mixed scaffold/leaf node.
>  [/postgre]
>   |-"sql"
>   |-"sql/8.2"
>   [sql/8.2/]
>    |-"main"
>    |-"foobar"
>
> The scaffold nodes would be there to guarantee that there is some
> place to attach leafs to.  They should not be removed by DELETE
> (unless we are sure no other node depends on them).

Note that there is a well-understood name for this; it is variously
known as a "Radix tree" or a "Patricia trie".
 <http://en.wikipedia.org/wiki/Radix_tree>

As you observe, the tree/trie edges consist not of individual
characters, but rather of sequences of characters.

> Advantages?  The repeated values (as "/var/lib/postgresql/8.2")
> are not repeated -- they are embedded into tree, as a "scaffold",
> actual nodes that are significant (files, not directories, in my
> example) are put as actual leafs.

Certainly as compared to a traditional trie, this representation leads
to there being far fewer nodes and a whole lot less branching.

The Radix/Patricia tree compresses things two ways:
- As you observe, there can be fewer, larger components
- By combining common prefixes together, this makes cases of
  strings that are highly patterned much, much, much cheaper, as the
  tree branches at (and only at) the places where they tend to
  differ.

It could conceivably make it workable to have indexes on big, highly
repeating things such as blobs of XML.  (Although it *doesn't* get you
the ability to search on substrings, which is probably what you'd also
want...)

> I guess it would reduce large indexes size and at the same time it
> could remove limitation that B-tree index cannot index values larger
> than 1/3 of the database page.  8-byte chunks was given as an
> example here, perhaps larger value would be better.
>
> (Of course redesigning schema to put directories separate from files
> woul be useful, but it would not help with ORDER BY .. LIMIT queries
> -- they would have to be JOIN-ed and re-sorted in memory I'm
> afraid).

I'll gleefully ignore the nature of the example, as it's kind of
beside the point.  The point is to try to compress what's in the
column.  If it's being abused somewhat, and has more crud in it, that
gives a potential for a *bigger* win.
-- 
output = reverse("gro.mca" "@" "enworbbc")
http://linuxdatabases.info/info/spiritual.html
"Documentation wants to be obsolete." 
-- Bruce R. Lewis


Re: Index Tuple Compression Approach?

From
Gregory Stark
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

> That general approach of storing a common part leading part just once is
> called prefix compression. Yeah, it helps a lot on long text fields.
> Tree structures like file paths in particular.

You kind of want to avoid both the prefix and the suffix, no?

> It's been discussed before. One big problem is extracting the common
> leading part. You could only do it for text, 

Or for multi-column indexes

I could see this being especially useful if you have some columns in the index
key which are small and some that are quite large. So if you have an event
table with an index on <userid,timestamp> you wouldn't have to store lots of
timestamps on the upper level tree nodes. You would only store them for the
leaf nodes.
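
Concretely, the shape being described is something like this (names invented);
under such a scheme only the leaf level would need the full
<userid, timestamp> key:

    create index events_user_time_idx on events (userid, event_time);
    -- a typical query this index serves
    select * from events where userid = 42 order by event_time desc limit 10;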

> but it should be done in a datatype neutral way.

I wonder if there's an analogous operation for other data types though.
Numeric could store a value relative to the parent value. Arrays could
store only the elements needed. bytea of course works just as well as text (or
better in the face of i18n).

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Index Tuple Compression Approach?

From
Heikki Linnakangas
Date:
Gregory Stark wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> 
>> That general approach of storing a common part leading part just once is
>> called prefix compression. Yeah, it helps a lot on long text fields.
>> Tree structures like file paths in particular.
> 
> You kind of want to do avoid both the prefix and the suffix, no? 

You're much more likely to find common prefixes than suffixes in an
index page, because of the ordering. I suppose compressing the suffix
would be useful in some cases as well. You might be better off with some
generic compression algorithm at that point, though.

>> It's been discussed before. One big problem is extracting the common
>> leading part. You could only do it for text, 
> 
> Or for multi-column indexes

Oh yeah, that you could do more easily.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Index Tuple Compression Approach?

From
Gregory Stark
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

> Gregory Stark wrote:
>> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> 
>>> That general approach of storing a common part leading part just once is
>>> called prefix compression. Yeah, it helps a lot on long text fields.
>>> Tree structures like file paths in particular.
>> 
>> You kind of want to do avoid both the prefix and the suffix, no? 
>
> You're much more likely to find common prefixes than suffixes in an
> index page, because of the ordering. I suppose compressing the suffix
> would be useful in some cases as well. You might be better off with some
> generic compression algorithm at that point, though.

Sorry, by "suffix" I don't mean common sufixes, I mean the bits of the key
following the point which discriminates between the left and right side of the
tree.

So for example if you're indexing a text field and have a
tree structure like:
            Redhat Fedora Core 7
                /             \
 Debian Etch (Unstable)      Ubuntu hoary

We don't really need the whole of "Redhat Fedora Core 7" in the index node. We
could actually get by with just "R". Everything before "R" is on the left and
everything after "R" is on the right.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Index Tuple Compression Approach?

From
Chris Browne
Date:
stark@enterprisedb.com (Gregory Stark) writes:

> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>
>> Gregory Stark wrote:
>>> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>>> 
>>>> That general approach of storing a common part leading part just once is
>>>> called prefix compression. Yeah, it helps a lot on long text fields.
>>>> Tree structures like file paths in particular.
>>> 
>>> You kind of want to do avoid both the prefix and the suffix, no? 
>>
>> You're much more likely to find common prefixes than suffixes in an
>> index page, because of the ordering. I suppose compressing the suffix
>> would be useful in some cases as well. You might be better off with some
>> generic compression algorithm at that point, though.
>
> Sorry, by "suffix" I don't mean common sufixes, I mean the bits of the key
> following the point which discriminates between the left and right side of the
> tree.
>
> So for example if you're indexing a text field and have a
> tree structure like:
>
>              Redhat Fedora Core 7
>                  /             \
>   Debian Etch (Unstable)      Ubuntu hoary
>
> We don't really need the whole of "Redhat Fedora Core 7" in the index node. We
> could actually get by with just "R". Everything before "R" is on the left and
> everything after "R" is on the right.

Right.  The case where you get more characters than just "R" is when
you introduce extra entries that have the same prefix.  Say, "Redhat
Fedora Core 4", "Redhat Fedora Core 5", "Redhat Fedora Core 6".  And,
for good measure, let's throw in "Redhat RHAS 3", "Redhat RHAS 4", and
"Redhat RHAS 5".  

In that case, you'd have substrings: "R" "edhat " "Fedora Core " "RHAS "
as discriminators.
-- 
"cbbrowne","@","cbbrowne.com"
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
If a mute swears, does his mother wash his hands with soap? 


Problem with server/utils/snapmgr.h

From
Chris Browne
Date:
There's a new #include file that it turns out we need for Slony-I to
reference, namely include/server/utils/snapmgr.h

I tried adding an autoconf rule to Slony-I to check for its existence
(goal then is to do a suitable #define so that we can #ifdef the
#include, so that we #include this only with versions of PostgreSQL
that have the file).

If I use: AC_CHECK_HEADER(utils/snapmgr.h, HAVE_SNAPMGR=1)

this turns out to fail.  Apparently autoconf wants to compile the
#include file to validate that it's an OK #include file.

GCC barfs on it, thus:

cbbrowne@dba2:~/Slony-I/CMD/slony1-HEAD> gcc -I/opt/OXRS/dbs/pgsql84-beta/include/server
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h
In file included from /opt/OXRS/dbs/pgsql84-beta/include/server/storage/itemptr.h:17,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:17,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapshot.h:16,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:18,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:16:
/opt/OXRS/dbs/pgsql84-beta/include/server/storage/block.h:31: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'BlockNumber'
/opt/OXRS/dbs/pgsql84-beta/include/server/storage/block.h:55: error: expected specifier-qualifier-list before 'uint16'
In file included from /opt/OXRS/dbs/pgsql84-beta/include/server/storage/off.h:17,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/storage/itemptr.h:18,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:17,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapshot.h:16,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:18,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:16:
/opt/OXRS/dbs/pgsql84-beta/include/server/storage/itemid.h:46: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'ItemOffset'
/opt/OXRS/dbs/pgsql84-beta/include/server/storage/itemid.h:47: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'ItemLength'
In file included from /opt/OXRS/dbs/pgsql84-beta/include/server/storage/itemptr.h:18,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:17,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapshot.h:16,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:18,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:16:
/opt/OXRS/dbs/pgsql84-beta/include/server/storage/off.h:24: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'OffsetNumber'
In file included from /opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:17,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapshot.h:16,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:18,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:16:
/opt/OXRS/dbs/pgsql84-beta/include/server/storage/itemptr.h:40: error: expected specifier-qualifier-list before 'OffsetNumber'
/opt/OXRS/dbs/pgsql84-beta/include/server/storage/itemptr.h:143: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'ItemPointerEquals'
/opt/OXRS/dbs/pgsql84-beta/include/server/storage/itemptr.h:144: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'ItemPointerCompare'
In file included from /opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:18,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapshot.h:16,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:18,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:16:
/opt/OXRS/dbs/pgsql84-beta/include/server/storage/relfilenode.h:44: error: expected specifier-qualifier-list before 'Oid'
In file included from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapshot.h:16,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:18,
                 from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:16:
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:106: error: expected specifier-qualifier-list before 'TransactionId'
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:118: error: expected specifier-qualifier-list before 'int32'
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:142: error: expected specifier-qualifier-list before 'uint16'
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:442: error: expected specifier-qualifier-list before 'uint32'
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:498: error: expected specifier-qualifier-list before 'uint32'
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:623: error: expected specifier-qualifier-list before 'uint16'
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:671: error: expected specifier-qualifier-list before 'BlockNumber'
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:684: error: expected specifier-qualifier-list before 'BlockNumber'
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:694: error: expected specifier-qualifier-list before 'TransactionId'
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:714: error: expected specifier-qualifier-list before 'BlockNumber'
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:722: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'HeapTupleHeaderGetCmin'
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:723: error: expected '=', ',', ';', 'asm' or '__attribute__'
before'HeapTupleHeaderGetCmax'
 
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:725: error: expected declaration specifiers or '...' before
'CommandId'
/opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:726: error: expected declaration specifiers or '...' before
'bool'
In file included from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:18,                from
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:16:
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapshot.h:30: error: expected declaration specifiers or '...' before
'*'token
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapshot.h:31: error: 'bool' declared as function returning a function
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapshot.h:35: error: expected specifier-qualifier-list before
'SnapshotSatisfiesFunc'
In file included from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:16:
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:22: error: expected '=', ',', ';', 'asm' or '__attribute__'
before'SnapshotData'
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:23: error: expected '=', ',', ';', 'asm' or '__attribute__'
before'SnapshotData'
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:24: error: expected '=', ',', ';', 'asm' or '__attribute__'
before'SnapshotData'
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:25: error: expected '=', ',', ';', 'asm' or '__attribute__'
before'SnapshotData'
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:69: error: 'HeapTupleSatisfiesMVCC' declared as function
returninga function
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:71: error: 'HeapTupleSatisfiesNow' declared as function
returninga function
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:73: error: 'HeapTupleSatisfiesSelf' declared as function
returninga function
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:75: error: 'HeapTupleSatisfiesAny' declared as function
returninga function
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:77: error: 'HeapTupleSatisfiesToast' declared as function
returninga function
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:79: error: 'HeapTupleSatisfiesDirty' declared as function
returninga function
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:83: error: expected declaration specifiers or '...' before
'CommandId'
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:85: error: expected declaration specifiers or '...' before
'TransactionId'
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:88: error: expected declaration specifiers or '...' before
'uint16'
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:88: error: expected declaration specifiers or '...' before
'TransactionId'
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:20: error: expected '=', ',', ';', 'asm' or '__attribute__'
before'Snapshot'
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:21: error: expected '=', ',', ';', 'asm' or '__attribute__'
before'Snapshot'
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:22: error: expected '=', ',', ';', 'asm' or '__attribute__'
before'Snapshot'
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:24: error: expected '=', ',', ';', 'asm' or '__attribute__'
before'TransactionXmin'
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:25: error: expected '=', ',', ';', 'asm' or '__attribute__'
before'RecentXmin'
 
/opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:26: error: expected '=', ',', ';', 'asm' or '__attribute__'
before'RecentGlobalXmin'
 

Is there something in here that needs cleaning?

Note: I pulled CVS HEAD earlier today, so this is the state of the
code just a little while ago.
-- 
"cbbrowne","@","linuxfinances.info"
http://linuxfinances.info/info/advocacy.html
Humpty Dumpty sat on the wall,
Humpty Dumpty had a great fall!
All the king's horses,
And all the king's men,
Had scrambled eggs for breakfast again!


Re: Problem with server/utils/snapmgr.h

From
Alvaro Herrera
Date:
Chris Browne wrote:

> If I use:
>   AC_CHECK_HEADER(utils/snapmgr.h, HAVE_SNAPMGR=1)
> 
> this turns out to fail.  Apparently autoconf wants to compile the
> #include file to validate that it's an OK #include file.
> 
> GCC barfs on it, thus:
> 
> cbbrowne@dba2:~/Slony-I/CMD/slony1-HEAD> gcc -I/opt/OXRS/dbs/pgsql84-beta/include/server /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h
> In file included from /opt/OXRS/dbs/pgsql84-beta/include/server/storage/itemptr.h:17,
>                  from /opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:17,
>                  from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapshot.h:16,
>                  from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:18,
>                  from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:16:


Hmm.  It works for me if I forcefully include postgres.h:

gcc -I/pgsql/install/00head/include/server -include postgres.h /pgsql/install/00head/include/server/utils/snapmgr.h 

Our header file rule says that a header must include any header it needs
to compile, but never include postgres.h, which must be the first
include in all the .c files.
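
So any probe has to be shaped roughly like this before the header will go
through (just a sketch, assuming the server include directory is on the -I
path as in the command above):

#include "postgres.h"           /* must come first; supplies c.h types like uint16 and TransactionId */
#include "utils/snapmgr.h"      /* now compiles cleanly */

int main(void) { return 0; }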

So I'm not sure what the fix for this is.

What does Slony-I need snapmgr.h for, anyway?  This code is in a state
of flux right now -- there are pending patches which are likely to
change the horizon a bit.  Perhaps it does not make sense for Slony to
adjust to a state that's expected to be short-lived.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Problem with server/utils/snapmgr.h

From
Tom Lane
Date:
Chris Browne <cbbrowne@acm.org> writes:
> I tried adding an autoconf rule to Slony-I to check for its existence
> (goal then is to do a suitable #define so that we can #ifdef the
> #include, so that we #include this only with versions of PostgreSQL
> that have the file).

The customary way of handling Postgres version differences at
compilation time has been something like

#include "catalog/catversion.h"
#if CATALOG_VERSION_NO >= 200804201
... new code ...
#else
... old code ...
#endif

Seems to me that would do just fine and you don't need autoconf help.
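
For this particular header that might boil down to something like the sketch
below (the cutoff shown is just the example number above; the real value
would be the catversion bump that accompanied snapmgr.h, which would need
checking):

#include "catalog/catversion.h"

#if CATALOG_VERSION_NO >= 200804201    /* placeholder cutoff -- verify against the real commit */
#include "utils/snapmgr.h"             /* only servers new enough to have the header */
#endif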

I concur with Alvaro's comment that you're probably wasting your time
to do anything touching snapmgr.h right now, but if you must ...
        regards, tom lane


Re: Problem with server/utils/snapmgr.h

From
Chris Browne
Date:
alvherre@commandprompt.com (Alvaro Herrera) writes:

> Chris Browne wrote:
>
>> If I use:
>>   AC_CHECK_HEADER(utils/snapmgr.h, HAVE_SNAPMGR=1)
>> 
>> this turns out to fail.  Apparently autoconf wants to compile the
>> #include file to validate that it's an OK #include file.
>> 
>> GCC barfs on it, thus:
>> 
>> cbbrowne@dba2:~/Slony-I/CMD/slony1-HEAD> gcc -I/opt/OXRS/dbs/pgsql84-beta/include/server /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h
>> In file included from /opt/OXRS/dbs/pgsql84-beta/include/server/storage/itemptr.h:17,
>>                  from /opt/OXRS/dbs/pgsql84-beta/include/server/access/htup.h:17,
>>                  from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapshot.h:16,
>>                  from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/tqual.h:18,
>>                  from /opt/OXRS/dbs/pgsql84-beta/include/server/utils/snapmgr.h:16:
>
>
> Hmm.  It works for me if I forcefully include postgres.h:
>
> gcc -I/pgsql/install/00head/include/server -include postgres.h /pgsql/install/00head/include/server/utils/snapmgr.h 
>
> Our header file rule says that a header must include any header it needs
> to compile, but never include postgres.h, which must be the first
> include in all the .c files.
>
> So I'm not sure what the fix for this is.
>
> What does Slony-I need snapmgr.h for, anyway?  This code is in a state
> of flux right now -- there are pending patches which are likely to
> change the horizon a bit.  Perhaps it does not make sense for Slony to
> adjust to a state that's expected to be short-lived.

Well, one of the Sun guys observed this...
http://www.slony.info/bugzilla/show_bug.cgi?id=46

And I was trying to build against CVS HEAD (for both projects ;-)) and
observed it:

cbbrowne@dba2:CMD/slony1-HEAD/src/backend> make
gcc -g -O2 -Wall -Wmissing-prototypes -Wmissing-declarations -I../.. -fpic -I/opt/OXRS/dbs/pgsql84-beta/include/ -I/opt/OXRS/dbs/pgsql84-beta/include/server/ -c -o slony1_funcs.o slony1_funcs.c
slony1_funcs.c: In function '_Slony_I_createEvent':
slony1_funcs.c:142: error: 'SerializableSnapshot' undeclared (first use in this function)
slony1_funcs.c:142: error: (Each undeclared identifier is reported only once
slony1_funcs.c:142: error: for each function it appears in.)

The definition for SerializableSnapshot has moved from
server/utils/tqual.h to server/utils/snapmgr.h

I agree that the code seems in flux; it seems quite likely that there
will be further changes between now and release of 8.4.  That being
said, it's useful to observe these problems *early*, and have
workarounds, so that we can validate that Slony-I is generally
compatible with 8.4 throughout its development cycle.

As a workaround, for now, I'll see if Tom's counsel on this works out
well; I expect so.

The point of the exercise wasn't so much to ask "How do I work around
this?" as it was to point out that there's *something* up with the
header file, in that the autoconf AC_CHECK_HEADER function barfs on
it.

*THAT* (the fact that AC_CHECK_HEADER breaks) appears to be a bad
thing, irrespective of any Slony-I issues.
-- 
let name="cbbrowne" and tld="linuxdatabases.info" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/x.html
Signs of   a Klingon Programmer -   1.  "Defensive  programming? Never!
Klingon programs are always on the offense. Yes, offensive programming
is what we do best."


Re: Problem with server/utils/snapmgr.h

From
Alvaro Herrera
Date:
Chris Browne wrote:

> And I was trying to build against CVS HEAD (for both projects ;-)) and
> observed it:
> 
> cbbrowne@dba2:CMD/slony1-HEAD/src/backend> make
> gcc -g -O2 -Wall -Wmissing-prototypes -Wmissing-declarations -I../.. -fpic -I/opt/OXRS/dbs/pgsql84-beta/include/ -I/opt/OXRS/dbs/pgsql84-beta/include/server/ -c -o slony1_funcs.o slony1_funcs.c
> slony1_funcs.c: In function '_Slony_I_createEvent':
> slony1_funcs.c:142: error: 'SerializableSnapshot' undeclared (first use in this function)
> slony1_funcs.c:142: error: (Each undeclared identifier is reported only once
> slony1_funcs.c:142: error: for each function it appears in.)
> 
> The definition for SerializableSnapshot has moved from
> server/utils/tqual.h to server/utils/snapmgr.h

Well, considering that my next patch proposes removing
SerializableSnapshot altogether, I think Slony will need some
rejiggering.

What do you need SerializableSnapshot for?  Perhaps this is reason
enough not to remove it (but I doubt it).  Please point me to an online
copy of slony1_funcs.c.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Problem with server/utils/snapmgr.h

From
"Marko Kreen"
Date:
On 4/22/08, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Chris Browne wrote:
>
>  > And I was trying to build against CVS HEAD (for both projects ;-)) and
>  > observed it:
>  >
>  > cbbrowne@dba2:CMD/slony1-HEAD/src/backend> make
>  > gcc -g -O2 -Wall -Wmissing-prototypes -Wmissing-declarations -I../.. -fpic -I/opt/OXRS/dbs/pgsql84-beta/include/ -I/opt/OXRS/dbs/pgsql84-beta/include/server/ -c -o slony1_funcs.o slony1_funcs.c
>  > slony1_funcs.c: In function '_Slony_I_createEvent':
>  > slony1_funcs.c:142: error: 'SerializableSnapshot' undeclared (first use in this function)
>  > slony1_funcs.c:142: error: (Each undeclared identifier is reported only once
>  > slony1_funcs.c:142: error: for each function it appears in.)
>  >
>  > The definition for SerializableSnapshot has moved from
>  > server/utils/tqual.h to server/utils/snapmgr.h
>
>
> Well, considering that my next patch proposes removing
>  SerializableSnapshot altogether, I think Slony will need some
>  rejiggering.
>
>  What do you need SerializableSnapshot for?  Perhaps this is reason
>  enough not to remove it (but I doubt it.)  Please point me to an online
>  copy of slony1_funcs.c.

You probably simply need to use ActiveSnapshot there.

Same thing came up when txid landed in core - it had inherited
SerializableSnapshot from xxid code.

If you really need exactly the same semantics, you can simply
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE.  But if the whole
transaction is not SERIALIZABLE already, it's unlikely you need it.

-- 
marko


Re: Problem with server/utils/snapmgr.h

From
Alvaro Herrera
Date:
Marko Kreen wrote:

> You probably simply need to use ActiveSnapshot there.

Heh, ActiveSnapshot as a symbol is also gone in the new code ;-)  I
changed txid.c to use GetActiveSnapshot() instead.
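
Presumably the Slony-side lookup then ends up looking something like this
sketch, guarded the way Tom suggested earlier (the cutoff is a placeholder,
and it assumes the pre-8.4 globals are declared in utils/tqual.h, as
discussed upthread):

#include "postgres.h"
#include "catalog/catversion.h"

#if CATALOG_VERSION_NO >= 200804201    /* placeholder for the snapshot-manager catversion bump */
#include "utils/snapmgr.h"
#else
#include "utils/tqual.h"
#endif

static Snapshot
current_snapshot(void)
{
#if CATALOG_VERSION_NO >= 200804201
    return GetActiveSnapshot();        /* new accessor from snapmgr.h */
#else
    return ActiveSnapshot;             /* old global variable it replaces */
#endif
}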

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Buildfarm + Git tryouts

From
Chris Browne
Date:
I'm trying to start preparing buildfarm nodes for the upcoming Git
migration, and have run into a few issues.  I speculate that -hackers
is one of the better places for this to get discussed; if it should be
elsewhere, I'm sure Andrew Dunstan won't be shy to redirect this :-).

What I was hoping to do was to run nodes for a little while against
the repo at git://github.com/oicu/pg-cvs-mirror.git to validate that
it all works against git.

The best instructions available thus far...
<http://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto>

Unfortunately, I'm finding some anomalies: differences between my
environment and what those instructions suggest.

1.  git clone --mirror is apparently new in git version 1.6
    <http://kerneltrap.org/mailarchive/git/2008/8/2/2793244>
    My somewhat mouldy Ubuntu node is on git 1.5.4, which doesn't include this.
    There's a workaround, as --mirror is just "syntactic sugar":

      $ git clone --mirror $URL

    may be replaced by

      $ git clone --bare $URL
      $ (cd $(basename $URL) && git remote add --mirror origin $URL)

    I see options:
      a) Require git 1.6 or so (latest stable is 1.7.1.1)
      b) Offer the alternative mirror approach

2.  SCM.pl seems to expect the repository directory to be called
    "pgsql" (around line #373).  I wound up hacking up the code to
    head to a specific directory, which isn't a terribly good hack,
    but apparently works for now:

    -               chdir 'pgsql';
    +               chdir '/opt/build-farm/pgsql.git';

    It feels as though something's not quite right about how the
    nearby Git repository is referenced.

3.  Some problems checking status.

i) Status Line: 491 bad ts parameter - [timestamp omitted] is in the future

I know my clock's reasonable - ntp is reporting I'm within 0.25s of
some stratum 2 nodes.  Is it possible that the buildfarm server is
ill-synced?

ii) Status Line: 460 script version too low

I just pulled a buildfarm build last week, so I'm not sure what this
would relate to.

That's where I've gotten thus far.  Node caracara isn't reporting in
just now because of this testing.  I hope other people are doing
similar tryouts as well, so that the revision effort to shift to Git
doesn't wait to *start* until the development repo shifts.
-- 
let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];;
http://linuxfinances.info/info/linuxdistributions.html
Rules of the Evil Overlord #77. "If I have a fit of temporary insanity
and decide to give  the hero the chance to reject a  job as my trusted
lieutentant,  I will  retain enough  sanity to  wait until  my current
trusted  lieutenant  is  out  of  earshot before  making  the  offer."
<http://www.eviloverlord.com/>


Re: Buildfarm + Git tryouts

From
Alvaro Herrera
Date:
Excerpts from Chris Browne's message of Mon Jul 05 12:33:49 -0400 2010:

> 3.  Some problems checking status.
> 
> i) Status Line: 491 bad ts parameter - [timestamp omitted] is in the future
> 
> I know my clock's reasonable - ntp is reporting I'm within 0.25s of
> some stratum 2 nodes.  Is it possible that the buildfarm server is
> ill-synced?

I asked around and was told that the server is in sync currently.  It
has been known to be out of sync before; if this problem still comes up,
maybe there's a software bug or something.


Re: Buildfarm + Git tryouts

From
Andrew Dunstan
Date:

Alvaro Herrera wrote:
> Excerpts from Chris Browne's message of Mon Jul 05 12:33:49 -0400 2010:
>
>   
>> 3.  Some problems checking status.
>>
>> i) Status Line: 491 bad ts parameter - [timestamp omitted] is in the future
>>
>> I know my clock's reasonable - ntp is reporting I'm within 0.25s of
>> some stratum 2 nodes.  Is it possible that the buildfarm server is
>> ill-synced?
>>     
>
> I asked around and was told that the server is in sync currently.  It
> has been known to be out of sync before; if this problem still comes up,
> maybe there's a software bug or something.
>
>   


This discussion really belongs on pgbuildfarm-members, I think.

I have added a small fudge factor to the time test - let's see if that 
changes things.

cheers

andrew