Re: Self-segmenting words & the treatment of names

From: Martín Baldán <martinob@hidden.email>
Date: Tue, 09 May 2006 01:29:37 -0000
Subject: Re: Self-segmenting words & the treatment of names
To: engelang@yahoogroups.com

Hi!
--- In engelang@yahoogroups.com, And Rosta <a.rosta@...> wrote:
>
> [I haven't had anything but spam off this list for years!]

That's surprising, given the catchy name of this group. Where do all
the people interested in engelangs meet? :)

BTW, I've read you are the originator of the "engineered language"
term. If so, congrats (and also to John Cowan for the shorter
"engelang"): I've found it very useful to describe my field of
interest, as opposed to "artistic languages" or the more general
"constructed languages", while "logical languages" was too restrictive.

[..]

> 
> Yes please. ("lisp-like syntax" means nothing to me, I might add; I
> know nowt about the syntax of LISP.)

I'm no expert, but I'd say Lisp is basically a programming language
based on lambda calculus, written in parenthesised prefix (Polish)
notation. The term "lisp-like syntax" has been used to describe some
software ontology languages, such as CycL and SUO-KIF, which I am
considering as starting points for my language design. I totally
agree with your preference for making a rigorous language more
user-friendly, instead of making a natural-seeming language more
rigorous (as you expressed in message #51).

Here are some links on SUO-KIF and CycL:

http://en.wikipedia.org/wiki/KIF
http://www.cs.umbc.edu/kse/kif/

http://en.wikipedia.org/wiki/CycL
http://www.opencyc.org/doc/#programming

As for the examples of my proposed parenthesis notation, please see my
reply to Jorge (message #128).

> I agree. IIRC, my conlang has a couple of ways of exiting foreign
> text (that are specified when entering it). The requirement is of
> course that the exit marker does not occur within the text itself. In
> one method, the exit marker contains a click phoneme. In the other
> method, a la Lojban, the exit marker is defined at the point of entry.

It seems that the most similar quotation style in Lojban is "zoi X
.text. X", as described in:
http://www.lojban.org/sv/publications/reference_grammar/chapter19.html

An advantage of my method is that it doesn't require the leading and
trailing pauses. Besides, my treatment of foreign pronunciation is
different. I'll take the Lojban example, "gyrations":

Quoted in Lojban:

correct in writing, incorrect in speech:
zoi jai. gyrations .jai

correct in speech, incorrect in writing:
zoi gy. gyrations .gy

correct in both speech and writing:
zoi ka .gyrations. ka

Quoted in my language:

sei liukayair'eixonsaka (gyrations)  

"sei" indicates that it is not a proper name, but an unspecified piece
of foreign speech. I'm considering eliminating it and reserving four
other words instead, but I haven't yet decided which ones to reserve.
I may change some details, but the basic concept is to write the
foreign word "as it sounds", and then, optionally, add its written
form with some indication that it should not be read aloud.

> [...]
> Like Jorge, I feel that this proposal does not score high with
> respect to the Concision desideratum... Your solution is pretty
> analogous to one of those my conlang uses for foreign text, but it
> strikes me as too cumbersome for ordinary names.
>
> I guess the solution will depend both on the phonology of your
> conlang and on its segmentation strategy.
> 

I agree, and that's why I've considered letting the speaker set a
persistent keyword. I prefer not to distort names for the sake of
conciseness. My design principles in this respect are:

1) First provide strong encapsulation, then try to make it concise.
2) I have no idea of which phonemes will be unlikely in the foreign
names the speaker will use.

One strategy for conciseness is to let the speaker assign nicknames,
so that foreign names only have to be pronounced once.
In my language all roots are monosyllabic, and their structure is
CV(V)(V). Two consecutive consonants are always part of a compound
word, but I still haven't decided whether a compound word may contain
a "VCV" group or not.
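
To make the root shape concrete, here is a minimal sketch of a check
for the CV(V)(V) pattern and for the consonant clusters that signal a
compound. The consonant and vowel inventories below are assumptions
(plain Latin letters), not the actual phonology of the language, and
the names is_root and has_consonant_cluster are made up for the
illustration.

```python
import re

# Assumed inventories: the real phoneme set is not specified here.
VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"

# A root is one consonant followed by one to three vowels: CV(V)(V).
ROOT = re.compile(rf"[{CONSONANTS}][{VOWELS}]{{1,3}}")
# Two consecutive consonants can only occur inside a compound word.
CLUSTER = re.compile(rf"[{CONSONANTS}]{{2}}")


def is_root(word: str) -> bool:
    """Return True if the whole word has the shape CV(V)(V)."""
    return ROOT.fullmatch(word) is not None


def has_consonant_cluster(word: str) -> bool:
    """Return True if the word contains two consecutive consonants."""
    return CLUSTER.search(word) is not None
```

Under these assumptions, is_root("sei") holds, while a string such as
"kantei" is not a root and is flagged as a compound by its "nt"
cluster.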

This has to do with the word composition scheme I'm trying to devise.
The idea is that you build compound words by joining two or more words
and inserting a letter in the middle. Which letter you insert depends
on the kind of composition ("coordinate", "subordinate" or "literal")
and on the compositional hierarchical level, which is one unit higher
than the highest level in the constituent words (much like the
hierarchy of parenthesised expressions I've described to Jorge). 

As for the kinds of composition, "coordinate" means that the relation
is symmetrical (for instance, an AND or an OR relation, as in the
English words "greenhouse", "houseboat",..), "subordinate" means that
the relation is asymmetrical (as in the English words "skyscraper",
"treekiller", "mousetrap",..). "Literal" means that the string of
constituent words should be read as if they were independent words in
a phrase, and a metaphoric meaning is then sought for the whole phrase
(as in the English words "wannabe", "look-alike").

Besides, in my language, word derivation is a particular case of
literal word composition. Continuing with the English examples, there
would be an independent "un" word, so that if I said "you are un" it
would mean "there's something you are not", and if I said "you are un
American" it would mean "you are not American", which is not the same
as saying "you are un-American". Here, "un-American" would be a
literal compound.

What I still haven't decided, as I said, is whether "VCV" is allowed
inside a compound word. In my first design it was not, because you
have to insert a letter between every two constituent words. This had
the advantage that only one letter is needed for every composition
type and level, and that long nicknames can be easily used, as long as
there's no "VCV" group in them. The drawback is a lack of concision
and euphony for compound words with many constituents of the same type
and level.
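
The first design (one letter per composition type and level, inserted
between every two constituents) can be sketched roughly as follows.
The LINKERS table is purely hypothetical: the actual letters have not
been chosen, and only two levels are shown. The sketch also assumes
that all constituents joined at the top level use the same kind of
composition.

```python
VOWELS = set("aeiou")  # assumed vowel inventory

# Hypothetical assignment of one linking consonant per (kind, level).
LINKERS = {
    ("coordinate", 1): "m", ("subordinate", 1): "n", ("literal", 1): "l",
    ("coordinate", 2): "p", ("subordinate", 2): "t", ("literal", 2): "k",
}
DECODE = {letter: key for key, letter in LINKERS.items()}


def linker_positions(word):
    """Yield (index, kind, level) for the first consonant of each cluster."""
    for i in range(len(word) - 1):
        if word[i] not in VOWELS and word[i + 1] not in VOWELS:
            kind, lvl = DECODE[word[i]]
            yield i, kind, lvl


def level(word):
    """Level of a word: 0 for a root, else its highest linker level."""
    return max((lvl for _, _, lvl in linker_positions(word)), default=0)


def compose(kind, parts):
    """Join parts with the linker one level above the highest part."""
    return LINKERS[(kind, 1 + max(level(p) for p in parts))].join(parts)


def decompose(word):
    """Recursively split a compound at its highest-level linkers."""
    top = level(word)
    if top == 0:
        return word  # a bare root
    parts, start, top_kind = [], 0, None
    for i, kind, lvl in linker_positions(word):
        if lvl == top:
            parts.append(word[start:i])
            start, top_kind = i + 1, kind
    parts.append(word[start:])
    return (top_kind, [decompose(p) for p in parts])
```

With this made-up table, compose("subordinate", ["ka", "tei"]) gives
"kantei", and joining that with "mua" as a coordinate compound gives
"kanteipmua", which decompose splits back into the same structure.
Since every consonant cluster begins with a linking letter, the
compound can be parsed without any "VCV" group being needed.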

Another approach is to assign two letters to each type and level. One
of them is for two-element compounding. The other one is for
several-element compounding, and its span is limited by other letters
of the same or greater level. A two-element compounding letter of the
same level means that the following element is the last one. A
several-element compounding letter of the same level means that the
element before the previous one is the last one. A letter of higher
level encloses all the elements before it as the first element for its
compounding.

Please excuse all this confusing description. I'll try to be clearer
in later posts, but the point is to achieve maximum conciseness (use
as few compound-marking letters as possible) while retaining
parseability of every compound into compounds of lower level
(reversibility of word compounding) and never having more than two
consecutive consonants. Quasi-encapsulation of constituents is also
achieved, since only the second, end-marking, highest-level
compound-marking letter (when there are more than two elements at the
top level) of each element is removed. Full encapsulation could be
easily achieved, but at the expense of conciseness.

I think I'll follow the second approach, but then it means I need a
beginning and an end keyword for nicknames, or something equivalent.

By the way, one important concept in my language is that supra-word
lexemes should be marked, and they can be compounded, just like words.
It means that multi-word terms such as "black ice" or phrases like
"kick the bucket", whose meaning is not equal to the meaning of their
constituents, should be marked by keywords as independent lexemes.

regards,

Martin Baldan
