
Binary Trees by Tone



I worked on this a few days ago.  I am putting this in a new thread because I don't think it's going to be much help to Loccan3, though it may be interesting to some.  Skip to TL;DR at the very end if you're in a hurry.

On Wed, Aug 15, 2012 at 9:46 PM, And Rosta <and.rosta@hidden.email> wrote:
 
The thread beginning with <http://listserv.brown.edu/archives/cgi-bin/wa?A2=ind0905b&L=conlang&F=&S=&P=4004> discovered that given a binary tree, a four-way inflectional contrast on terminal nodes is sufficient to yield an unambiguous parse.

As I mentioned, this Fink-Miller system is interesting, but I don't think it's going to be possible for humans to choose correctly among the four tones (or vowels) on the fly.  It appears to require a highly abstract algorithm to determine where to mark the edges of internal groupings.  However, it did inspire me to sketch out a similar system that I believe is more straightforward.  Here is the gist:

o We start by observing that any grammar can be represented as a minimal binary tree (simply by adding as many internal nodes as you need and removing any that are redundant).

o A minimal binary tree with n terminals always has exactly n-1 internal nodes, so there is always one more terminal than there are nodes.

o Starting from the bottom, there is always a way to recursively "raise" a "head" terminal for each node from among the terminals under that node until all the nodes contain terminals.  For example, starting with ((t1 t2)(t3 t4)):


        /\
       /  \
      /    \
     /\    /\
    t1 t2 t3 t4

o o Step 1. We raise t1 to the node above it.  We could also have chosen t2:

        /\
       /  \
     t1    \
      \    /\
       t2 t3 t4

o o Step 2. We raise t3 to the node above it.  We could also have chosen t4:

        /\
       /  \
     t1    t3
      \     \
       t2    t4

o o Step 3. We raise t3 again.  We could NOT have chosen t1, because t1 already has a right branch, and no node is allowed more than one right branch:

        t3
       /  \
     t1    t4
      \
       t2  

o Voilà, we're done.  Now that we know which terminal governs which, we can give each terminal one of the following tones.

LL  \   low      (grave accent)  governs child on right
LH  \/  rising   (caron)         governs no child
HH  /   high     (acute accent)  governs child on left
HL  /\  falling  (circumflex)    governs children on left & right

Both t2 and t4 get rising tone, t1 gets low tone and t3 gets falling.  I paired each tone with a node type based on the shape of the corresponding diacritic; the shape of each diacritic encodes which children are governed, except the rising tone, which corresponds to having no children.

t1(è) t2(ě) t3(ê) t4(ě)
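
A minimal Python sketch of the raising steps and tone assignment above, for trying the idea on other trees.  The nested-tuple tree representation and the function names are illustrative assumptions, not part of the scheme.  The sketch always tries the left candidate first and only enforces the one-branch-per-side constraint, so it may give up on a tree where a different choice of heads would work.

```python
# Tone for each set of governed sides, following the tone table above.
TONES = {
    frozenset(): "rising",       # governs no child
    frozenset("R"): "low",       # governs child on right
    frozenset("L"): "high",      # governs child on left
    frozenset("LR"): "falling",  # governs children on left & right
}


def raise_heads(tree, sides):
    """Return the head terminal of `tree`, recording governed sides in `sides`.

    `tree` is a terminal name (str) or a pair (left, right).
    Assumption: terminal names are unique.
    """
    if isinstance(tree, str):
        sides[tree] = set()
        return tree
    left, right = tree
    left_head = raise_heads(left, sides)
    right_head = raise_heads(right, sides)
    # Raising the left head gives it a right branch; it may have only one.
    if "R" not in sides[left_head]:
        sides[left_head].add("R")
        return left_head
    # Otherwise raise the right head, which gains a left branch.
    if "L" not in sides[right_head]:
        sides[right_head].add("L")
        return right_head
    raise ValueError("neither candidate head is free at this node")


def assign_tones(tree):
    """Map every terminal of `tree` to its tone name."""
    sides = {}
    raise_heads(tree, sides)
    return {t: TONES[frozenset(s)] for t, s in sides.items()}


print(assign_tones((("t1", "t2"), ("t3", "t4"))))
# {'t1': 'low', 't2': 'rising', 't3': 'falling', 't4': 'rising'}
```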

Getting back the original tree ((t1 t2)(t3 t4)) requires an interpretation rule: A right branch takes precedence over a left branch.  Thus, t1 pairs with t2, and t3 with t4.  This rule follows precisely because we initially raised the left terminal of each lowest-level node (the nodes whose children are both terminals).  If we change the way we raise heads, we may have to change the interpretation rules, though I have not worked out every possible permutation.  The other interpretation rule - for this example - is that right branches group to the right.

What I did next is experiment a bit using a simplified Lojban-like language:

lò brà brě brì brô lò brù drà drě

= ((lò (brà brě)) ((brì brô) (lò (brù (drà drě)))))

In order to pull this off, we are forced to add a new raising operation and interpretation rule for left branching:  Terminals marked as having left branches do not signify that the terminal by itself governs the node to the left.  Rather, a left branch signifies that the entire right-branching structure of which the marked terminal is the final part governs that branch.  In other words,  (t1 ((t2 t3) t4)), shown here with t2 already raised...

        /\
       / /\
      / t2 \
     /    \ \
    t1    t3 t4

...can be re-written:
       
      (t2\t3)
       /   \
      t1   t4

Where (t2\t3) indicates that the structure containing t2 as head has, as a whole, itself been raised to head, first over t4, and then over t1.  Since t2 governs t3, it must be t3 that carries the tone indicating the left & right branch.  Of course, in order to get back to the original tree the other detail is that a double-branch is understood always as first branching to the right and then to the left.
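
One way to read the interpretation rules so far is as a small left-to-right decoder.  The Python sketch below is only one possible reading, not a worked-out algorithm.  It assumes that each run of low-tone terminals plus the terminal that ends the run forms one right-branching structure, that a falling tone takes the next constituent on its right before anything is attached on its left, and that anything else is rejected.  The function names are hypothetical, and tones are read off the diacritics via Unicode decomposition.

```python
import unicodedata

# Combining diacritic -> tone, per the tone table.
TONE_OF_MARK = {
    "\u0300": "low",      # grave
    "\u030c": "rising",   # caron
    "\u0301": "high",     # acute
    "\u0302": "falling",  # circumflex
}


def tone_of(word):
    """Return the tone of the first tone diacritic found in `word`."""
    for ch in unicodedata.normalize("NFD", word):
        if ch in TONE_OF_MARK:
            return TONE_OF_MARK[ch]
    raise ValueError(f"no tone mark on {word!r}")


def parse_unit(words, i, stack):
    """Parse one constituent starting at words[i]; return (tree, next_index)."""
    j = i
    while j < len(words) and tone_of(words[j]) == "low":
        j += 1
    if j == len(words):
        raise ValueError("ran out of words while looking for a constituent")
    tone = tone_of(words[j])
    # Right branches group to the right: (w1 (w2 (... wN))).
    node = words[j]
    for w in reversed(words[i:j]):
        node = (w, node)
    j += 1
    if tone == "falling":
        # Double branch: first the right child ...
        right, j = parse_unit(words, j, [])
        node = (node, right)
    if tone in ("high", "falling"):
        # ... then the whole structure governs the previous constituent.
        if not stack:
            raise ValueError("nothing on the left to govern")
        node = (stack.pop(), node)
    return node, j


def parse(sentence):
    """Decode a space-separated, tone-marked sentence into nested pairs."""
    words, stack, i = sentence.split(), [], 0
    while i < len(words):
        node, i = parse_unit(words, i, stack)
        stack.append(node)
    if len(stack) != 1:
        raise ValueError("words do not form a single tree")
    return stack[0]


def show(tree):
    """Render nested pairs in the bracket notation used in this post."""
    if isinstance(tree, str):
        return tree
    return f"({show(tree[0])} {show(tree[1])})"


print(show(parse("lò brà brě brì brô lò brù drà drě")))
```

Under these assumptions the Lojban-like sentence above comes back with the bracketing given for it, and a rising, low, falling, rising sequence comes back as (t1 ((t2 t3) t4)).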

Other than this left-branching rule, the language is mostly right-branching.  A modifying predicate (seltau) governs the predicate phrase (tanru) that it forms from another predicate phrase (tertau), and the article serves as the head of an argument.  It has to be this way because no terminal can serve as head for more than one child per side.  So the rightmost predicate cannot be the head of each subsequent predicate phrase formed each time an adjective is added.  Instead, the bare predicate is a leaf and everything that comes before is a head (for purposes of forming the tree, maybe not semantics, or maybe semantics too depending on how you look at it).

In practice, it does seem to get complicated.  Frankly, after all this work I see little advantage to these tree-encoding approaches; I am able to eliminate the [cu] selbri separator and probably [be], but I cannot get rid of "lo".  The reason is that although the tones are capable of determining the binary tree hierarchy, they encode no information about the syntactic category of the terminal, so articles are needed to distinguish arguments from predicates.  On the other hand, the surface valency of every selbri is now marked.  What follows is the language description in a nutshell.

Here are the basic syntactic types:

A = argument i.e. sumti
P = predicate i.e. selbri
S = sentence i.e. bridi

Basic types are combined into complex types as follows: If X, Y, and Z are types, then X/Y, Y\Z, and X/Y\Z are also types, where X is the left-child type, Z is the right-child type, and Y is the type resulting from the combination.

Some morphophonology:
CV + caron = type A (note that lǒ = Lojban zo'e)
CV + grave = type A\P or A\S (article i.e. gadri)
CCV + caron = type P or S (basic predicate i.e. brivla)
CCV + grave = type P\P or S\A or S\S
CCV + circumflex = type A/S\A
CCV + acute = type A/S
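
The type notation and the list above can be written down as a small lookup.  This Python sketch is only an illustration: the function names, the five-vowel inventory, and the restriction to the flat types listed here (no nested types) are all assumptions.

```python
import unicodedata

# Candidate types per (word shape, diacritic), copied from the list above.
TYPES = {
    ("CV", "caron"): ["A"],
    ("CV", "grave"): ["A\\P", "A\\S"],
    ("CCV", "caron"): ["P", "S"],
    ("CCV", "grave"): ["P\\P", "S\\A", "S\\S"],
    ("CCV", "circumflex"): ["A/S\\A"],
    ("CCV", "acute"): ["A/S"],
}

MARKS = {
    "\u0300": "grave",
    "\u0301": "acute",
    "\u0302": "circumflex",
    "\u030c": "caron",
}
VOWELS = "aeiou"  # assumption: a five-vowel inventory


def candidate_types(word):
    """Return the types a word may have, judged by its shape and diacritic."""
    letters, mark = [], None
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch in MARKS:
            mark = MARKS[ch]
        elif not unicodedata.combining(ch):
            letters.append(ch)
    shape = "".join("V" if ch in VOWELS else "C" for ch in letters)
    return TYPES.get((shape, mark), [])


def split_type(t):
    """Split 'X/Y\\Z' into (left-child type, result type, right-child type).

    Missing sides come back as None.  Only flat types are handled.
    """
    left, _, rest = t.rpartition("/")
    result, _, right = rest.partition("\\")
    return (left or None, result, right or None)


for w in "lò brà brě brì brô".split():
    print(w, [split_type(t) for t in candidate_types(w)])
```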
 
TL;DR: Encoding tree info using tones by itself doesn't seem to be worth it.  It probably makes more sense to encode _syntactic types_ than simply branching information.