Re: Self-segmenting words & the treatment of names

From: Martín Baldán <martinob@hidden.email>
Date: Mon, 15 May 2006 01:28:57 -0000
Subject: Re: Self-segmenting words & the treatment of names
To: engelang@yahoogroups.com
 
--- In engelang@yahoogroups.com, And Rosta <a.rosta@...> wrote:
>
>
> > I'm no expert, but I'd say Lisp is basically a programming language
> > based on lambda-calculus with polish notation.
> 
> If polish notation, then why does the syntax involve all these
> brackets? Polish notation, of course, is bracket-free.


Quote from http://en.wikipedia.org/wiki/Polish_notation :

"While the examples above use parentheses, one of the benefits of
Polish notation is that, assuming the arity of each operator is known,
parentheses are unnecessary: the order of operations is unique and
easy to determine, provided that the expression is well-formed."


That is, only if you assume that operators have a fixed arity (number
of arguments), which is rather inconvenient, especially for operators
such as AND, OR, and many others. SUO-KIF has variable arity in all
the operators, since it lets you omit as many arguments as you like.
CycL, IIRC, has two kinds of operators: fixed-arity and
variable-arity. The latter is reserved for AND, OR and the like.
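
To make the arity point concrete, here is a minimal sketch in Python (not
Lisp, SUO-KIF or CycL). The ARITY table and the function names are made up
for illustration. It reads a bracket-free prefix expression, which only
works because each operator is assumed to take a fixed number of arguments:

```python
# Hypothetical arity table: these operators and arities are assumptions
# chosen for illustration, not taken from any real language.
ARITY = {"AND": 2, "OR": 2, "NOT": 1}


def parse_prefix(tokens, pos=0):
    """Parse one bracket-free prefix expression starting at tokens[pos].

    Returns (tree, next_pos). The parser knows where each argument list
    ends only because every operator has a fixed, known arity.
    """
    if pos >= len(tokens):
        raise ValueError("unexpected end of expression")
    head = tokens[pos]
    pos += 1
    if head not in ARITY:
        # Anything that is not a known operator is treated as an atom.
        return head, pos
    args = []
    for _ in range(ARITY[head]):
        arg, pos = parse_prefix(tokens, pos)
        args.append(arg)
    return [head] + args, pos


def parse(text):
    """Parse a whole whitespace-separated expression, rejecting leftovers."""
    tokens = text.split()
    tree, pos = parse_prefix(tokens)
    if pos != len(tokens):
        raise ValueError("trailing tokens: " + " ".join(tokens[pos:]))
    return tree


nested = parse("AND p OR q NOT r")
```

With binary AND, the string "AND p q r" is rejected: nothing tells the
parser that the third argument belongs to AND. A variable-arity AND needs
brackets (or some terminator) to mark where its argument list ends.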

 



> > An advantage of my method is that it doesn't require the leading and
> > trailing pauses.
> 
> Better to think of Lojban /./ as a glottal stop. It is normal for
> languages to phonologize glottal stops, but unheard of for them to
> phonologize pauses.
> 

I think this is hiding the fact that you can't use a glottal stop to
separate two consonants in the same way you use it to separate two
vowels. Vowels at the beginning of a sentence tend to be preceded by a
glottal stop, whereas consonants (especially plosive ones) don't. So,
if you need to separate two vowels, you just pronounce the second one
as if it were at the beginning of a sentence, so to speak, but you
can't use this trick with consonants. For instance, in the sentence
"la meris. klama" I don't see how to separate "s" from "k" without
making a pause, because "k" has the same sound (as far as I can tell)
at the beginning of a sentence and in the middle of it. You can call
it a glottal stop, but I find no difference between it and a little pause.


> It's this nicknaming stage that I'm interested in. That is, Lojban
> has the la'o stage, where names are given in their original unaltered
> form. But this is very clunky, and names are normally in cmevla form,
> i.e. normalized to the conventions of Lojban cmevla, which requires
> that they end in /C./. So in my remarks that you quoted in beginning
> this thread, I was thinking about how best to adapt foreign names to
> the name-conventions of the conlang. On the whole, I think the
> nativized form of the foreign name should be no longer than the
> original and should have its shape as little distorted as possible.

I see. I had underestimated the value of having *both* conciseness and
fidelity in the same method. A major advantage of this is that I can
readily use proper names in compound/derived words (the equivalent of,
say, Whorfian, Darwinism and so on).

> 
> > As for the kinds of composition, "coordinate" means that the relation
> > is symmetrical (for instance, an AND or an OR relation, as in the
> > English words "greenhouse", "houseboat",..), 
> 
> Are there examples where the relation is OR?

I can't recall any actual examples in English, but I think it would be
a useful feature for a language to have a way of making them.
For instance, take the phrase: "the circumstances of his
suicide/murder are not clear". In police jargon it might be useful to
have a compound word meaning "apparent suicide with some hints that
point to murder" and its form could be something equivalent to
"suicide/murder". Notice that I'm not saying it should strictly mean
"suicide OR murder". It should be an independent word with a meaning
of its own. More on that below.


> 
> > "subordinate" means that
> > the relation is asymmetrical (as in the English words "skyscraper",
> > "treekiller", "mousetrap",..). "Literal" means that the string of
> > constituent words should be read as if they were independent words in
> > a phrase, and then search for a metaphoric meaning of the whole phrase
> > (as in the English words "wannabe" "look-alike").
> 
> This is nice. But couldn't "skyscraper", "treekiller", "mousetrap"
> be literals? I.e. "scrapes sky", "kills trees", "traps mice"? A
> clearer example of a subordinate might be "leafmold", say, = "mold
> that grows on leaves". I look forward to hearing more, anyway.

Yes, your example is much better. BTW, those are just examples I gave
off the top of my head, not anything actually implemented in my
language. I don't have any lexicon yet, because the semantic space for
short words in my language is small, and I want all roots to be
monosyllabic, so it's not easy to decide what goes in. My basic
lexicon consists of 536 to 558 monosyllables (there are 22 of them I
prefer not to use, because I find them too similar to others, but I
may change my mind), including grammatical words. The rest would be
compounds. I'll leave some "safety valves", reserved keywords to mark
the beginning of longer roots, but I won't use them, since the idea is
to build new words by means of composition and derivation.

Anyway, here are some more: caveman, necklace, eyebrow, milkman (and
other profession-related compounds), firewall, fireplace, cocktail,
cockpit, watermelon, earthquake, rainbow, keyboard.

Those are single words, but many others, such as "straw man", "atom
bomb", "contact lenses", "corn flakes", etc., are made of two or more
words, and others, as I said before, are phrases or even whole
sentences. For instance, I'd say that a sentence like "if we build it,
they will come" can be considered a single lexeme, that is, something
with a meaning on its own, closely related but not exactly equivalent
to the meaning of the sentence as such. I suspect that natural
language processing would be much easier if there existed a (concise,
usable) way to mark these extended lexemes. 


> 
> For Livagian I have no productive methods of stem-formation. If you
> want something whose meaning is unambiguously determinable from its
> parts, then you use a syntactic phrase, not a single word. If it is
> sufficient for the meaning to be vaguely determinable from its parts,
> then you don't need productive rules of stem formation. For many
> engelangers the rationale for having productive derivation is that
> words are shorter than phrases. But to me that just shows that in that
> engelang, phrases are too long, and the language needs to be conciser.
> 

Agreed. IMHO, those who treat compounds as a shorthand for syntactic
phrases are missing the whole point. You don't need compound/derived
words to speak faster, you need them to express new concepts. Of
course, you can simply build new roots with no morphological relation
to existing ones; computers won't care. The reason why I place so
much emphasis on word composition is, basically, mnemonics and other
needs of human speakers: I want people to have an easy way to remember
the meaning of compounds, and to tell them apart, once they are
familiar with their meanings. I want them to be able to build
compounds on-the-fly, and then add a particle to indicate that they
know the word is not in the official lexicon, and they are creating it
now, or maybe another particle to indicate that they know there is a
word for the concept they are trying to express, but don't remember
its exact form, and so on.


> > By the way, one important concept in my language is that supra-word
> > lexemes should be marked, and they can be compound, just as words. It
> > means that multi-word terms such as "black ice" or phrases like "kick
> > the bucket", whose meaning is not equal to the meaning of their
> > constituents, should be marked by keywords as independent lexemes.
> 
> Do these multi-word idioms have to form a contiguous sequence? Or do
> you allow things like "Tabs were kept on me", using the idiom "keep
> tabs on" = "monitor"?
> 
> --And.
>

There should be a way to use this kind of lexeme-phrase, but I have
to work on the implementation details. I'm thinking of something like
anonymous lambda expressions, but the result should be reasonably
concise and usable.

For instance (a bit oversimplified) :

keep-on: (keep-on x1 x2 x3) : x1 keeps x3 on x2

((lambda-phrase ?x?y (E and tab (keep-on ?x ?y ))) they I)

OR:

(assign (lambda-phrase ?x?y (E and tab (keep-on ?x ?y ))) ?x they ?y I )
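
As a rough illustration of how the two forms above might be evaluated, here
is a small Python sketch. It is only a sketch under assumptions: phrases
are modelled as nested lists, and the helper names substitute,
apply_lambda_phrase and assign are hypothetical, not part of the language
design. The positional form binds arguments by order; the assign form binds
them by name:

```python
def substitute(expr, bindings):
    """Replace every variable in a nested-list expression by its binding."""
    if isinstance(expr, list):
        return [substitute(part, bindings) for part in expr]
    # Atoms without a binding (keywords, constants) are left as they are.
    return bindings.get(expr, expr)


def apply_lambda_phrase(params, body, args):
    """Positional form: ((lambda-phrase params body) arg1 arg2 ...)."""
    if len(params) != len(args):
        raise ValueError("expected %d arguments, got %d" % (len(params), len(args)))
    return substitute(body, dict(zip(params, args)))


def assign(params, body, named_args):
    """Named form: (assign (lambda-phrase params body) var1 val1 ...)."""
    missing = [p for p in params if p not in named_args]
    if missing:
        raise ValueError("unbound variables: " + ", ".join(missing))
    return substitute(body, {p: named_args[p] for p in params})


# The body of the "keep tabs on" lexeme, written as a nested list.
keep_tabs_on = ["E", "and", "tab", ["keep-on", "?x", "?y"]]

positional = apply_lambda_phrase(["?x", "?y"], keep_tabs_on, ["they", "I"])
named = assign(["?x", "?y"], keep_tabs_on, {"?x": "they", "?y": "I"})
```

Both calls produce the same expanded phrase, with "they" and "I" in place
of ?x and ?y, and the original body is left unmodified so the lexeme can be
reused.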

There should be hierarchical parentheses for compound phrases, besides
those used for syntactic phrases, so that a lambda-phrase does not
affect the hierarchical markup of the sentence where it's used, just
as if it were a single word:

(1] (c1] lambda-phrase ?x?y (2]E and tab (1] keep-on ?x ?y )c1] they I )1]

(1]assign (c1] lambda-phrase ?x?y (2]E and tab (1] keep-on ?x ?y )c1]
?x they ?y I )1]

And I'm still not taking into account the tense system! I'm basing it
on event-token reification, so it would be more like the following:


(1]assign (c1] lambda-phrase ?x?y (2]E and
(1]unchanging-state-sentence tab |1]event-existential-sentence keep-on
subj ?x obj ?y theme )c1] ?x they ?y I )1]


Regards,

Martin Baldan