Text to Speech Conversion: Terminate the Language Barrier

Feb 1, 2021

UNDERSTANDING THE PROBLEM

Technology is advancing at a rapid pace, yet its adoption across many sectors remains low. Many people still find it difficult to read text from paper and books.

Speech synthesis covers a wide range of problems. Text preprocessing must handle numerals, abbreviations, and acronyms. Deriving correct prosody and pronunciation from written text also remains a major challenge. Written text carries no explicit emotional cues, and the pronunciation of proper and foreign names is often irregular.
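To make the preprocessing problem concrete, here is a minimal sketch of text normalization before synthesis. It is only an illustration, not the project's implementation: the abbreviation table, the function names, and the choice to spell acronyms letter by letter are all assumptions, and it only handles whole numbers from 0 to 999 that stand alone as a token.

```python
import re

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Expand abbreviations, numerals, and acronyms token by token."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]{1,3}", token):
            out.append(number_to_words(int(token)))
        elif re.fullmatch(r"[A-Z]{2,}", token):
            # Assumption: all-caps tokens are acronyms read letter by letter.
            out.append(" ".join(token))
        else:
            out.append(token)
    return " ".join(out)
```

For example, `normalize("Dr. Rao read 42 pages using OCR")` gives `"Doctor Rao read forty-two pages using O C R"`. Tokens with attached punctuation, such as `42,`, pass through unchanged, which shows how quickly even this small part of the problem grows.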

OBJECTIVE

The general objective of the project is to develop an English-language text-to-speech synthesizer for individuals with physical or speech impairments. The specific objectives are:

• To enable people with hearing or speech impairments to communicate and contribute to the growth of an organization through a synthesized voice.

• To enable blind and elderly people to enjoy a user-friendly computer interface.

• To build appreciation and awareness of modern technology among computer operators.

• To implement an isolated whole-word speech synthesizer that is capable of converting text and responding with speech.

METHODOLOGY

The text-to-speech device consists of two main modules: the image processing module and the voice processing module. The image processing module captures an image with a camera and converts it into text. The voice processing module converts the text into sound and shapes it with specific physical characteristics so that the speech can be understood.

The figure shows the block diagram of the text-to-speech device. The first block is the image processing module, where OCR converts a .jpg image into a .txt file. The second block is the voice processing module, which converts the .txt file to speech. OCR is the key element of the image processing module. OCR, or Optical Character Recognition, is a technology that automatically recognizes characters through an optical mechanism. It imitates human sight: the camera replaces the eye, and image processing on the computer substitutes for the human brain.
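The two-module flow described above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the project's actual code: the article does not say which libraries were used, so Tesseract (through `pytesseract` and Pillow) stands in for the OCR step and `pyttsx3` for the speech step, and the function names are hypothetical.

```python
from typing import Callable


def ocr_image(image_path: str) -> str:
    """Image processing module: read the text in an image file."""
    # Assumption: Pillow, pytesseract, and the Tesseract binary are installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def speak(text: str) -> None:
    """Voice processing module: read the text aloud."""
    # Assumption: pyttsx3 is installed and a system speech engine is available.
    import pyttsx3

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


def clean_text(raw: str) -> str:
    """Collapse line breaks and repeated spaces left behind by OCR."""
    return " ".join(raw.split())


def image_to_speech(
    image_path: str,
    ocr: Callable[[str], str] = ocr_image,
    tts: Callable[[str], None] = speak,
) -> str:
    """Run both modules in order and return the recognized text."""
    text = clean_text(ocr(image_path))
    if text:  # Skip the speech step when no text was recognized.
        tts(text)
    return text
```

With the libraries installed, `image_to_speech("page.jpg")` would run the whole chain on a hypothetical image file. Passing the OCR and speech steps as parameters keeps the two modules separate, so either one can be swapped out or replaced with a stub for testing.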

THE SOFTWARE DESIGN

The software processes the input image and converts it into text format. The software implementation is shown in the figure.

• Text is extracted from the image and converted to audio.
• Both capital and small letters are recognized.
• Numbers are recognized as well.
• The reading distance ranged from 38 to 42 cm.
• The character font size should be at least 12 pt.
• The maximum tilt of the text line is 4–5 degrees from the vertical.
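
The operating limits listed above could be checked before an image is sent to OCR. The sketch below is illustrative only: the class and function names are hypothetical, it assumes the distance, font size, and tilt are already known or measured by some other means, and it takes 5 degrees as the upper tilt bound.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the list above.
MIN_DISTANCE_CM = 38.0
MAX_DISTANCE_CM = 42.0
MIN_FONT_PT = 12.0
MAX_TILT_DEG = 5.0  # Assumption: upper end of the stated 4-5 degree range.


@dataclass
class CaptureConditions:
    distance_cm: float   # Camera-to-page distance
    font_size_pt: float  # Size of the printed characters
    tilt_deg: float      # Tilt of the text line, sign ignored


def check_conditions(c: CaptureConditions) -> List[str]:
    """Return a list of problems; an empty list means all limits are met."""
    problems = []
    if not MIN_DISTANCE_CM <= c.distance_cm <= MAX_DISTANCE_CM:
        problems.append("reading distance outside 38-42 cm")
    if c.font_size_pt < MIN_FONT_PT:
        problems.append("font size below 12 pt")
    if abs(c.tilt_deg) > MAX_TILT_DEG:
        problems.append("text tilt above 5 degrees")
    return problems
```

Calling `check_conditions(CaptureConditions(40, 12, 3))` returns an empty list, while a capture that is too far away, too small, and too tilted returns one message per violated limit.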

Written by:

Sourav Tripathi

Hasti Shah

Rishabh Sharma
