IPA – machine transcription

What with all the voice recognition software and text-to-speech software available for free, the idea of IPA as a working tool for practitioners is fading fast. However, there are still times when you have basic technology (photocopied worksheets) and you would like to do some detailed work on pronunciation. Trouble is, this often means transcribing some text into IPA-speak. Not an easy thing to do – especially for people like me who have low musical intelligence.

This site is a neat solution…just paste in your text, and it will transcribe into an IPA(ish) version: http://upodn.com/phun.asp.


Here is the opening paragraph of this post in IPA-speak:

wə́t wɪθ ɒ́l ðə vɔ̀js rɛ̀kəgnɪ́ʃən sɒ́ftwɛ̀r ǽnd tɛ́kst- tú- spítʃ sɒ́ftwɛ̀r əvéləbəl fɔ́r frí, ðə ajdíə ə́v ajpié ǽz ə wə́rkɪŋ túl fɔ́r præktɪ́ʃnərz ɪ́z fédɪŋ fǽst. hɑ̀wɛ́vər, ðɛ́r ɑ́r stɪ́l tájmz wɛ́n jú hǽv bésɪk tɛknɑ́lɪdʒi ( fótokɑ̀pid wə́rkʃìts) ǽnd jú wʊ́d lájk tú dú sə́m dətéld wə́rk ɑ́n pronə̀nsiéʃən. trə́bəl ɪ́z, ðɪ́s ɒ́fən mínz trænskrájbɪŋ sə́m tɛ́kst ɪ̀ntú ajpié- spík. nɑ́t ǽn ízi θɪ́ŋ tú dú– əspɛ́ʃli fɔ́r pípəl lájk mí hú hǽv ló mjúzɪkəl ɪ̀ntɛ́lədʒəns.

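Bear in mind that, like all machine translation, it may not always give the correct transcription in context. In the two examples below, "live" comes out as lajv every time, even where it is the verb ("live in Switzerland", "live-in nanny") and should be lɪv. Names it does not recognise, such as Eurosport, are left untranscribed.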


The Federer match is live on Eurosport. He must win to keep his hopes alive of winning a seventh Wimbledon title.

ðə fɛdərər mætʃ ɪz lajv ɑn eurosport. hi məst wɪn tu kip hɪz hops əlajv əv wɪnɪŋ ə sɛvənθ wɪmbəldən tajtəl.


Federer and Mirka live in Switzerland with a live-in nanny and the lively twins.

fɛdərər ænd mirka lajv ɪn swɪtsərlənd wɪθ ə lajv– ɪn næni ænd ðə lajvli twɪnz.
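
One plausible reason "live" comes out as lajv in every position is that the transcriber looks up each spelling in a pronouncing dictionary without checking whether the word is a verb or an adjective. How the site actually works is not stated here, so what follows is only a toy sketch under that assumption. The mini-dictionary `PRON` and the function `transcribe` are invented for illustration.

```python
import re

# Hypothetical mini pronouncing dictionary: one entry per spelling,
# so a homograph such as "live" can only ever get one pronunciation.
PRON = {
    "the": "ðə",
    "match": "mætʃ",
    "is": "ɪz",
    "live": "lajv",  # adjective reading; the verb would be "lɪv"
    "they": "ðe",
    "in": "ɪn",
}

def transcribe(text):
    """Transcribe word by word; unknown words pass through unchanged."""
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(PRON.get(word, word) for word in words)

print(transcribe("The match is live"))         # adjective: lajv is right
print(transcribe("They live in Switzerland"))  # verb: lajv is wrong
```

Because the lookup never sees the surrounding words, the only fix within this design is to check such words by hand after transcription.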

Speech to text software

As we move more into the world of corpora of written English, the next logical step is to consider a corpus-informed approach to teaching and learning spoken English.

Corpora of spoken English

There are some spoken corpora available online (http://quod.lib.umich.edu/m/micase/ is a good example), but the key problem is how to get recorded speech into a text form that can be processed.

Individual speech to text transcription

If we are thinking about the notion of the ‘i-corpus’, then individuals can easily transcribe their own voice. You can buy DRAGON (see http://www.nuance.com/naturallyspeaking/products/editions/default.asp). If you train it to your own voice, the makers claim 99% accuracy. I’ve got a friend who uses it and confirms that it does what it claims.

General speech to text software

However, if you want to transcribe a collection of recordings of different people, made at different levels of recording quality, you might get something vaguely usable, but it would have to be checked and edited. There is open source research being done in this area. One example is http://cmusphinx.sourceforge.net/, developed at Carnegie Mellon University. In fact, this has spawned a READING TUTOR, which will listen to a child reading a text and point out any errors in pronunciation and stress: http://www.cs.cmu.edu/~listen/.
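
One way to judge how much checking a machine transcript needs is to compare it with a hand-corrected version and count the word-level differences, a measure usually called word error rate. The sketch below is a minimal illustration of that idea, not the method used by any of the tools mentioned here. Whether a figure such as "99% accuracy" is calculated this way is an assumption, and the function name `word_error_rate` is my own.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the machine transcript
                curr[j - 1] + 1,     # extra word in the machine transcript
                prev[j - 1] + cost,  # match, or one word swapped for another
            ))
        prev = curr
    return prev[-1] / len(ref)

checked = "he must win to keep his hopes alive"
machine = "he must win to keep his hope alive"
print(word_error_rate(checked, machine))  # one wrong word out of eight
```

A rate of 0.125 here means one word in eight needed correcting. On this reading, 99% accuracy would correspond to roughly one error per hundred words.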

Archives of spoken English

Alongside this work, researchers are attempting to build archives of spoken corpora, which can then be used as a basis for testing speech to text software. One of these is http://www.voxforge.org/. Another interesting area of research is based on accents – read the Guardian article about this at http://www.guardian.co.uk/education/2010/jun/01/english-accents-research?&CMP=%20EMCEDUEML1088. If you want to contribute your own voice recording to the database of accents, just go here and record yourself: http://accent.gmu.edu/