
Re: [romconlang] Garbled diacritics (was Re: Ecce hoc et eccu hic!)



On 2009-01-31 Eric Christopherson wrote:

> On Jan 30, 2009, at 4:57 PM, old_astrologer wrote:
>
> > PS What encoding are you using? My browser can't work it
> > out, and I've tried half a dozen manually, with no luck.
>
> The accented characters came out as gibberish here too
> (I'm using OS  X Mail), but it looked like UTF-8. However,
> David, yours also look  like gibberish.

I think the Yahoo server double-converts posts, i.e.
tries to convert what is already UTF-8 to UTF-8 again,
since when I look at my own post in ISO-8859-15 (Latin-9) _cio'_
looks like ciÃ², which is what double-converted
UTF-8 typically looks like.  Unfortunately I know
of no easy way to rectify this.
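
To make the symptom concrete, here is a minimal Python sketch of the
double conversion and of undoing it on the reader's side.  It assumes
the second pass misread the UTF-8 bytes as ISO-8859-1; what the Yahoo
server actually does is not known, and the names garble and repair
are made up for this illustration.

```python
def garble(text: str) -> str:
    """Simulate the suspected double conversion.

    Assumption: the UTF-8 bytes are misread as ISO-8859-1 and then
    stored as text again.
    """
    return text.encode("utf-8").decode("iso-8859-1")


def repair(text: str) -> str:
    """Undo one round of the garbling above.

    If the text does not fit the pattern (it cannot be encoded as
    ISO-8859-1, or the result is not valid UTF-8), it is returned
    unchanged.
    """
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except UnicodeError:
        return text
```

With these, garble("ciò") gives the two-character mess in place of ò,
and repair() applied to that result gives back ciò.  This only helps
a reader who can post-process the text; it does not fix the archive.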

In the old days some philological mailing lists
used an ASCII transliteration in which various punctuation
marks simulated diacritical marks.  We seem to need
something similar:

Diacritic           HTML                ASCII
------------------  ------------------  -----
acute               á                   /a
grave               à                   \a
trema/diaeresis     ä                   :a
tilde               ñ                   ~n
circumflex          â                   <a
caron/hacek         č                   >c
macron              ā                   |a
breve               ă                   )a
tie                 a͡e                  (ae
dot above           ċ                   .c
dot below           ẹ                   !e
cedilla             ç                   ,c
ogonek/hook         ę                   ;e
superscript         c<sup>i</sup>       c^i
subscript           e<sub>1</sub>       e_1
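
As a rough illustration of how the table could be applied by a
program, here is a minimal Python sketch that turns one prefixed
letter into the corresponding Unicode character.  The pairing of each
punctuation mark with a Unicode combining character is my reading of
the table, not part of the proposal, and convert_letter is a
hypothetical name.  The tie, superscript and subscript rows are left
out because they do not follow the one-mark-one-letter pattern.

```python
import unicodedata

# Pseudo-diacritic -> Unicode combining mark (assumed pairing, following the table).
COMBINING = {
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    ":": "\u0308",   # trema/diaeresis
    "~": "\u0303",   # tilde
    "<": "\u0302",   # circumflex
    ">": "\u030C",   # caron/hacek
    "|": "\u0304",   # macron
    ")": "\u0306",   # breve
    ".": "\u0307",   # dot above
    "!": "\u0323",   # dot below
    ",": "\u0327",   # cedilla
    ";": "\u0328",   # ogonek/hook
}


def convert_letter(token: str) -> str:
    """Convert one prefixed letter such as '/a' or '>c'.

    The result is NFC-normalized, so a precomposed character is
    returned where Unicode has one.
    """
    if len(token) != 2 or token[0] not in COMBINING:
        raise ValueError(f"not a pseudo-diacriticized letter: {token!r}")
    mark, letter = token[0], token[1]
    return unicodedata.normalize("NFC", letter + COMBINING[mark])
```

For example, convert_letter("/a") returns á and convert_letter(">c")
returns č.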

This should cover most of our needs in terms of what we
meet in our sources.

The benefit of putting the pseudo-diacritics in front
of the letters is that the punctuation characters
seldom appear in that position in normal text.
However there is still potential ambiguity with
the characters /()_ in particular.  I suggest
enclosing words containing pseudo-diacriticized letters
in curly braces when there is a risk of ambiguity,
e.g. {n/ee} for née.

An added benefit is that unlike with UTF-8 these can
be freely combined with arbitrary letters and with
each other without worrying about font and Unicode support.
I've actually seen things like {/>g} and {/)|;e} in
old Romanist publications!
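
The brace convention and the stacking of marks could be handled along
the following lines.  This is only a sketch under stated assumptions:
conversion happens inside curly braces only, every mark in front of a
letter belongs to that letter, and the mark nearest the letter is
attached first.  The post does not say in which order stacked marks
should be read, so that last point is a guess; expand and
convert_word are hypothetical names, and tie, superscript and
subscript are again left out.

```python
import re
import unicodedata

# Pseudo-diacritic -> Unicode combining mark (assumed pairing, following the table).
MARKS = {
    "/": "\u0301", "\\": "\u0300", ":": "\u0308", "~": "\u0303",
    "<": "\u0302", ">": "\u030C", "|": "\u0304", ")": "\u0306",
    ".": "\u0307", "!": "\u0323", ",": "\u0327", ";": "\u0328",
}

# A braced word: anything between { and } that contains no further braces.
BRACED = re.compile(r"\{([^{}]*)\}")


def convert_word(word: str) -> str:
    """Convert the contents of one {...} group."""
    out = []
    pending = []  # prefix marks waiting for their base letter
    for ch in word:
        if ch in MARKS:
            pending.append(ch)
        else:
            # Assumption: the prefix nearest the letter is attached first.
            out.append(ch + "".join(MARKS[m] for m in reversed(pending)))
            pending.clear()
    # Marks with no letter after them are kept as plain punctuation.
    out.extend(pending)
    return unicodedata.normalize("NFC", "".join(out))


def expand(text: str) -> str:
    """Replace every {...} group in text; everything else is left untouched."""
    return BRACED.sub(lambda m: convert_word(m.group(1)), text)
```

So expand("and/or {n/ee}") leaves the slash in "and/or" alone and
turns the braced word into née.  Whether stacks like {/>g} come out
looking right still depends on the reader's font.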

I've uploaded an HTML version of this post to the files
section, and hope to put up some kind of converter form
on my site.

    /BP 8^)>
    --
    Benct Philip Jonsson -- melroch atte melroch dotte se
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     "C'est en vain que nos Josués littéraires crient
     à la langue de s'arrêter; les langues ni le soleil
     ne s'arrêtent plus. Le jour où elles se *fixent*,
     c'est qu'elles meurent."           (Victor Hugo)

