Subject: Re: [romconlang] Garbled diacritics and what to do about them (was Re: Ecce hoc et eccu hic!)
From: Benct Philip Jonsson
Date: Sat, 31 Jan 2009 12:01:17 +0100
To: romconlang@yahoogroups.com
On 2009-01-31 Eric Christopherson wrote:
On Jan 30, 2009, at 4:57 PM, old_astrologer wrote:
PS What encoding are you using? My browser can't work it out, and I've tried half a dozen manually, with no luck.
The accented characters came out as gibberish here too (I'm using OS X Mail), but it looked like UTF-8. However, David, yours also look like gibberish.
I think the Yahoo server double-converts posts, i.e. tries to
convert what already is in UTF-8 to UTF-8, since when I look at my
own post in Latin-15 cio' looks like ciò, which is what
double-converted UTF-8 typically looks like. Unfortunately I know
of no easy way to rectify this.
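For anyone who wants to see the effect on their own machine, here is a minimal Python sketch of what I suspect happens. It assumes the server reads the UTF-8 bytes as Latin-1 and re-encodes them as UTF-8; that is only my guess at the server's behaviour, and the function names are made up for the illustration. The second function reverses the damage on the reader's side under the same assumption, which does nothing to fix the list archive itself.

```python
def double_encode(text: str) -> bytes:
    """Simulate the suspected fault: UTF-8 bytes misread as Latin-1, then re-encoded as UTF-8."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def undo_double_encoding(raw: bytes) -> str:
    """Reverse the fault, assuming the intermediate misreading really was Latin-1."""
    return raw.decode("utf-8").encode("latin-1").decode("utf-8")


garbled = double_encode("ciò")
print(garbled.decode("utf-8"))        # prints: ciÃ²
print(undo_double_encoding(garbled))  # prints: ciò
```

If the misreading was actually Windows-1252 or something else, the repair step would need that codec instead of "latin-1".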
In the old days some philological mailing lists used an ASCII transliteration in which various punctuation marks simulated diacritical marks. We clearly need something similar:
Diacritic | HTML | ASCII |
---|---|---|
acute | á | /a |
grave | à | \a |
trema/diaeresis | ä | :a |
tilde | ñ | ~n |
circumflex | â | <a |
caron/hacek | č | >c |
macron | ā | |a |
breve | ă | )a |
tie | a͡e | (ae |
dot above | ċ | .c |
dot below | ẹ | !e |
cedilla | ç | ,c |
ogonek/hook | ę | ;e |
superscript | cⁱ | c^i |
subscript | e₁ | e_1 |
This should cover most of our needs in terms of what we meet in our sources.
The benefit of putting the pseudo-diacritics in front of the letters is that the punctuation characters seldom appear in that position in normal text. However, there is still potential ambiguity, with the characters /()_ in particular. I suggest enclosing words containing pseudo-diacriticized letters in curly braces when there is a risk of ambiguity, e.g. {n/ee}.
An added benefit is that unlike with UTF-8 these can be freely combined with arbitrary letters and with each other without worrying about font and Unicode support. I've actually seen things like {/>g} and {/)|;e} in old Romanist publications!
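To make the scheme concrete, here is a rough Python sketch of a decoder that turns brace-enclosed words into Unicode with combining marks. It is not the converter I hope to put up, only an illustration, and it rests on several assumptions of mine: only text inside curly braces is converted, the marker nearest the letter is attached first when several are stacked, a tie joins exactly the two letters that follow it, and the superscript (^) and subscript (_) markers are left untouched. The names `COMBINING` and `decode_pseudo_diacritics` are invented for the example.

```python
import re
import unicodedata

# ASCII prefix -> Unicode combining mark, following the table above.
COMBINING = {
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    ":": "\u0308",   # trema/diaeresis
    "~": "\u0303",   # tilde
    "<": "\u0302",   # circumflex
    ">": "\u030C",   # caron/hacek
    "|": "\u0304",   # macron
    ")": "\u0306",   # breve
    ".": "\u0307",   # dot above
    "!": "\u0323",   # dot below
    ",": "\u0327",   # cedilla
    ";": "\u0328",   # ogonek/hook
}
TIE = "\u0361"  # combining double inverted breve, placed between the two letters

_PREFIXES = "".join(re.escape(c) for c in COMBINING)
_MARKED = re.compile(rf"([{_PREFIXES}]+)([^\W\d_])")  # one or more markers, then a letter
_TIED = re.compile(r"\(([^\W\d_])([^\W\d_])")         # "(" followed by two letters
_BRACED = re.compile(r"\{([^{}]*)\}")                 # a {word} without nested braces


def _convert_word(word: str) -> str:
    # Ties first: "(ae" becomes a + tie + e.
    word = _TIED.sub(lambda m: m.group(1) + TIE + m.group(2), word)

    # Stacked prefixes: attach the marker nearest the letter first (an assumption).
    def attach(m: re.Match) -> str:
        marks = "".join(COMBINING[c] for c in reversed(m.group(1)))
        return m.group(2) + marks

    word = _MARKED.sub(attach, word)
    # Compose to precomposed characters where Unicode has them.
    return unicodedata.normalize("NFC", word)


def decode_pseudo_diacritics(text: str) -> str:
    """Convert every {braced} word; leave everything outside braces alone."""
    return _BRACED.sub(lambda m: _convert_word(m.group(1)), text)


print(decode_pseudo_diacritics("{n/ee} and/or {,ca}"))  # prints: née and/or ça
```

Restricting the conversion to braced words keeps ordinary punctuation such as "and/or" safe, at the price of requiring braces everywhere a conversion is wanted. Stacked cases like {/>g} come out as a base letter plus whatever combining marks cannot be precomposed, so how they display still depends on the reader's font.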
I've uploaded an HTML version of this post to the files section, and hope to put up some kind of converter form on my site.
/BP 8^)>
--
Benct Philip Jonsson -- melroch atte melroch dotte se
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"C'est en vain que nos Josués littéraires crient
à la langue de s'arrêter; les langues ni le soleil
ne s'arrêtent plus. Le jour où elles se *fixent*,
c'est qu'elles meurent." (Victor Hugo)