Sylli as a Python Module
************************

Sylli can also be used as a Python module (see the :doc:`api`).
For example, Sylli was used to divide a corpus of Italian into syllables.
Note that you will need a corpus reader to access the data.

Processing Data
===============

In this example, a corpus reader integrated with NLTK provides
access to the data in the corpus.
First, import both modules: Sylli and the corpus reader.

::

 >>> import sylli
 >>> from nltk.corpus import clips
 >>>

Now it is possible to query the corpus and syllabify the output.
First, we define an object ``item`` which contains the id of a corpus unit,
in this case the unit at index 5 (the sixth one, since indexing starts at 0)
of a CLIPS sub-corpus (DG).

::

 >>> item = clips.utteranceids('DG')[5]
 >>>

Then, we create a ``SylModule`` object and use its ``syllabify()`` method
to syllabify the input string.

::

 >>> item = clips.utteranceids('DG')[5]
 >>> syl = sylli.SylModule()
 >>> syl.syllabify(''.join(clips.phonemes(item)))
 ['ak.kan.to.a.si.nis.tra']
 >>>

You can also syllabify each word separately.

::

 >>> item = clips.utteranceids('DG')[5]
 >>> syl = sylli.SylModule()
 >>> for word in clips.phonemes(item):
 ...     print(syl.syllabify(word))
 ...
 ['ak.kan.to']
 ['a']
 ['si.nis.tra']
 >>>

Or syllabify a single word.

::

 >>> item = clips.utteranceids('DG')[5]
 >>> syl = sylli.SylModule()
 >>> syl.syllabify(clips.phonemes(item)[0])
 ['ak.kan.to']
 >>>

You can also load another configuration file.

::

 >>> syl = sylli.SylModule()
 >>> syl.load_conf('/home/jako/sonority.txt')
 >>>

Or specify the configuration using the object's attributes.

::

 >>> syl = sylli.SylModule()
 >>> syl.sonority_file = '/home/jako/sonority.txt'
 >>> syl.output = 'cvcv'
 >>> syl.extra = 0
 >>> syl.syllabify('strada')
 CCCV.CV
 >>>
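
To make the ``cvcv`` output above more concrete, the following minimal sketch
maps an already syllabified string to its consonant/vowel skeleton.
It is only an illustration and not how Sylli computes the result:
the helper ``cv_skeleton`` is hypothetical, and the vowel set is an
assumption that ignores the other vowel symbols a transcription may use.

::

 # Hypothetical helper, not part of Sylli's API.
 # Assumption: only these five characters count as vowels.
 VOWELS = set('aeiou')

 def cv_skeleton(syllabified):
     """Replace vowels with V and other symbols with C, keeping the dots."""
     out = []
     for ch in syllabified:
         if ch == '.':
             out.append(ch)       # keep the syllable boundary
         elif ch in VOWELS:
             out.append('V')
         else:
             out.append('C')
     return ''.join(out)

 print(cv_skeleton('stra.da'))  # CCCV.CV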

Syllabify a Corpus
==================

Finally, it is possible to display the TIMIT time markers as well
as any other available information in the desired layout.
For example, this simple code uses the CLIPS corpus reader to display
the entire sentence, its syllabification, the phonological transcription
of each word, the orthographic transcription and its TIMIT time markers.

::

 # import the CLIPS corpus reader and Sylli
 from nltk.corpus import clips
 import sylli

 syl = sylli.SylModule()
 # ids of all the utterances in the corpus
 item = clips.utteranceids()

 # for every sentence
 for it in item:
     print(it + ":")
     # print the sentence with its TIMIT time markers
     print(clips.sent_times(it))

     # for every word in the sentence, print the phonemes with their
     # TIMIT time markers and the syllabified form
     for word, phone in zip(clips.word_times(it), clips.phoneme_times(it)):
         print('%s > %s' % (phone, syl.syllabify(word[0])))

Output:

::

 [('akk"anto% %asin"istra', 0, 37007)] > 'ak.kan.to.a.si.nis.tra'
 accanto% 8264 - 20419 : akk"anto% > ak.kan.to
 %a 20419 - 21789 : %a > a
 sinistra 21789 - 37007 : sin"istra > si.nis.tra
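
The word lines of the output above can be assembled with plain string
formatting. The sketch below is only an illustration under stated assumptions:
``format_word_line`` is a hypothetical helper, and it assumes that each word
and each phonological transcription comes as a ``(label, start, end)`` tuple,
as the sentence tuple in the first output line suggests.
The sample values are copied from the output above.

::

 # Hypothetical helper, not part of Sylli or of the corpus reader.
 def format_word_line(word, phones, syllables):
     """Build one output line from two (label, start, end) tuples."""
     label, start, end = word
     return '%s %d - %d : %s > %s' % (label, start, end, phones[0], syllables)

 # Sample values copied from the output above.
 word = ('accanto%', 8264, 20419)
 phones = ('akk"anto%', 8264, 20419)
 print(format_word_line(word, phones, 'ak.kan.to'))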