Garbled diacritics and what to do about them

Subject: Re: [romconlang] Garbled diacritics and what to do about them (was Re: Ecce hoc et eccu hic!)
From: Benct Philip Jonsson 
Date: Sat, 31 Jan 2009 12:01:17 +0100
To: romconlang@yahoogroups.com

On 2009-01-31 Eric Christopherson wrote:

On Jan 30, 2009, at 4:57 PM, old_astrologer wrote:

PS What encoding are you using? My browser can't work it out, and I've tried half a dozen manually, with no luck.

The accented characters came out as gibberish here too (I'm using OS X Mail), but it looked like UTF-8. However, David, yours also look like gibberish.

I think the Yahoo server double-converts posts, i.e. it tries to convert what is already in UTF-8 to UTF-8 again, since when I look at my own post in Latin-15 cio' looks like ciò, which is what double-converted UTF-8 typically looks like. Unfortunately I know of no easy way to rectify this.
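To make the suspected double conversion concrete, here is a minimal Python sketch. It assumes the spurious step treats the UTF-8 bytes as Latin-1 (ISO 8859-1); what the Yahoo server actually does is not known, and `undo_double_utf8` is a hypothetical helper name, not an existing tool.

```python
# Simulate a server re-encoding text that is already UTF-8.
original = "ciò"

# Step 1: the correct UTF-8 bytes of the post.
utf8_bytes = original.encode("utf-8")        # b'ci\xc3\xb2'

# Step 2 (assumed): the bytes are misread as Latin-1 and encoded as UTF-8 again.
garbled = utf8_bytes.decode("latin-1")       # each byte becomes one character
double_encoded = garbled.encode("utf-8")


def undo_double_utf8(data: bytes) -> str:
    """Reverse one spurious Latin-1 -> UTF-8 conversion (hypothetical helper)."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


print(garbled)                           # what a reader sees
print(undo_double_utf8(double_encoded))  # the recovered text
```

This only helps someone who can get at the raw bytes of a message, and it fails if the stray conversion used a different code page, so it is a diagnostic rather than a fix for the list as a whole.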

In the old days some philological mailing lists used an ASCII transliteration in which various punctuation marks simulated diacritical marks. We seem to need something similar:

Diacritic         HTML   ASCII
acute             á      /a
grave             à      \a
trema/diaeresis   ä      :a
tilde             ñ      ~n
circumflex        â      <a
caron/hacek       č      >c
macron            ā      |a
breve             ă      )a
tie               a͡e     (ae
dot above         ċ      .c
dot below         ẹ      !e
cedilla           ç      ,c
ogonek/hook       ę      ;e
superscript       cⁱ     c^i
subscript         e₁     e_1

This should cover most of our needs in terms of what we meet in our sources.
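The table above can be read as a lookup from prefix character to diacritic. A minimal Python sketch of that lookup follows. The choice of Unicode combining code points and the names `COMBINING` and `decode_letter` are assumptions of this sketch, not part of the proposal. The tie, superscript and subscript rows are left out because they do not attach a mark to a single letter.

```python
import unicodedata

# Prefix character -> Unicode combining mark, following the table above.
# The code points chosen here are an assumption of this sketch.
COMBINING = {
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    ":": "\u0308",   # trema/diaeresis
    "~": "\u0303",   # tilde
    "<": "\u0302",   # circumflex
    ">": "\u030C",   # caron/hacek
    "|": "\u0304",   # macron
    ")": "\u0306",   # breve
    ".": "\u0307",   # dot above
    "!": "\u0323",   # dot below
    ",": "\u0327",   # cedilla
    ";": "\u0328",   # ogonek/hook
}


def decode_letter(token: str) -> str:
    """Turn one pseudo-diacritic plus one letter, e.g. '/a', into 'á'."""
    mark, letter = token[0], token[1]
    # NFC composes letter + combining mark into a single character if one exists.
    return unicodedata.normalize("NFC", letter + COMBINING[mark])


for token in ["/a", "\\a", "~n", ">c", ";e"]:
    print(token, "->", decode_letter(token))
```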

The benefit of putting the pseudo-diacritics in front of the letters is that punctuation characters seldom appear in that position in normal text. However, there is still potential ambiguity, with the characters /()_ in particular. I suggest enclosing words that contain pseudo-diacriticized letters in curly braces when there is a risk of ambiguity, e.g. {n/ee}.

An added benefit is that unlike with UTF-8 these can be freely combined with arbitrary letters and with each other without worrying about font and Unicode support. I've actually seen things like {/>g} and {/)|;e} in old Romanist publications!
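For anyone who wants to turn such text back into Unicode, here is a rough Python sketch that converts only the words in curly braces and allows several pseudo-diacritics on one letter. It makes several assumptions of its own: the combining code points, the names `MARKS` and `decode_braced`, and the rule that the mark written nearest the letter sits nearest the letter. The tie, superscript and subscript notations are not handled.

```python
import re
import unicodedata

# Prefix characters and their combining marks, in the same order (assumed mapping).
MARKS = dict(zip(
    "/\\:~<>|).!,;",
    "\u0301\u0300\u0308\u0303\u0302\u030C\u0304\u0306\u0307\u0323\u0327\u0328",
))

# One or more prefix marks followed by a single letter, e.g. '/e' or '/)|;e'.
_CLUSTER = re.compile(r"([/\\:~<>|).!,;]+)([^\W\d_])")


def _expand(word: str) -> str:
    def repl(m: re.Match) -> str:
        marks, letter = m.group(1), m.group(2)
        # Assumption: the mark written closest to the letter is applied first.
        return letter + "".join(MARKS[c] for c in reversed(marks))
    return unicodedata.normalize("NFC", _CLUSTER.sub(repl, word))


def decode_braced(text: str) -> str:
    """Convert only the words wrapped in curly braces; leave the rest alone."""
    return re.sub(r"\{([^{}]*)\}", lambda m: _expand(m.group(1)), text)


print(decode_braced("and/or {n/ee}"))   # the slash in 'and/or' is untouched
print(decode_braced("{/>g} {/)|;e}"))   # stacked marks stay as combining characters
```

Whether the stacked forms display properly still depends on the reader's font, which is the very problem the ASCII notation sidesteps.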

I've uploaded an HTML version of this post to the files section, and hope to put up some kind of converter form on my site.

/BP 8^)>
-- 
Benct Philip Jonsson -- melroch atte melroch dotte se
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 "C'est en vain que nos Josués littéraires crient
 à la langue de s'arrêter; les langues ni le soleil
 ne s'arrêtent plus. Le jour où elles se *fixent*,
 c'est qu'elles meurent."           (Victor Hugo)