A DIY Voice Controlled Smartphone using Raspberry Pi

Introduction

With the rapid development of artificial intelligence technologies, voice recognition has greatly improved over the years. Many applications have integrated this new functionality to enhance user experience. In this project, you will build a Raspberry Pi-based smartphone that functions like a typical phone but includes additional voice control abilities.

You will call your device a “Piphone,” as it uses a Raspberry Pi as the core computing component. The key functions you aim to achieve through voice commands include making phone calls and playing music. To realize the concept of a portable device, you’ll utilize off-the-shelf electronics and simple circuit designs to construct a compact body that minimizes wiring tangles. Overall, your DIY Piphone will provide a fun demonstration of integrating voice interactions into common smartphone tasks.

Hardware Design


Circuit Design

The first step is to design the necessary circuitry. A critical component is the FONA GSM module, which provides the cellular connection for calls. You’ll also use a 1200mAh lithium polymer battery for power. Since the Pi requires 5V but the battery that powers the FONA module outputs only 3.7V, you’ll add a DC boost converter to step up the voltage.
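As a rough sanity check on the single-battery design, the sketch below estimates how much current a boost converter pulls from the cell and how long a 1200mAh battery might last. The 0.5 A load and 90% converter efficiency are illustrative assumptions, not measurements from this build, and the FONA module's own draw is ignored.

```python
def battery_current(v_out, i_out, v_in, efficiency):
    """Current (A) drawn from the battery to deliver i_out amps at v_out volts."""
    return (v_out * i_out) / (efficiency * v_in)


def runtime_hours(capacity_mah, current_a):
    """Idealized runtime: capacity divided by a constant current draw."""
    return (capacity_mah / 1000.0) / current_a


# Assumed figures for illustration only.
PI_LOAD_A = 0.5      # assumed average Pi current at 5V
EFFICIENCY = 0.9     # assumed boost converter efficiency

draw = battery_current(5.0, PI_LOAD_A, 3.7, EFFICIENCY)
hours = runtime_hours(1200, draw)
print("battery draw: %.2f A, runtime: %.1f h" % (draw, hours))
```

Under these assumptions the cell supplies about 0.75 A, which gives roughly 1.6 hours of runtime. A real load varies with screen brightness, radio activity, and CPU use.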

To minimize size, all electronics will be assembled onto a compact PCB. Connectivity will be established through appropriate pin breakouts and headers. You’ll integrate the antenna, SIM slot, audio jack, and battery connector as well. Thanks to onboard battery management circuitry, only a single battery will be needed.

Hardware Implementation

The DC-DC boost converter steps the 3.7V battery voltage up to the 5V required by the Raspberry Pi, so a single portable battery can power the whole device. Header pins will facilitate quick prototyping and modifications during testing and debugging. SMD soldering skills will be needed for the final miniaturized board assembly.

Status indicator LEDs will help you gauge battery and connectivity levels and troubleshoot issues without console access. Additionally, positioning the antenna away from nearby metal structures will aid performance, since they can degrade wireless reception.

Circuit Wiring

You’ll create wiring diagrams to properly connect all the components. The FONA module will interface with the Pi via UART serial. An external antenna will be required along with a functioning SIM card inserted at the back. Debugging tools like a USB console cable will also be incorporated for initial testing.


Key aspects of the wiring are:
  • The Vio pin, which sets the logic level of the FONA’s serial pins so that they match the Pi’s GPIO.
  • The Key pin, tied to ground to keep the module permanently powered on.
  • A reset pin to enable software module resets.
  • Proper orientation of battery and charger connections.

Hardware Testing

You will conduct various tests to verify hardware functionality, including:
  • Battery voltage readings
  • SIM card identification
  • Network signal strength checks
  • Battery charging status monitoring
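Several of these checks come down to reading an AT command response. The sketch below parses two of them: the battery report returned by AT+CBC (the command used in the code appendix) and the signal report returned by AT+CSQ. The function names are hypothetical, and the sample strings in the usage note are made up for illustration.

```python
import re


def parse_cbc(line):
    """Parse '+CBC: <charge status>,<percent>,<millivolts>' into a dict."""
    match = re.match(r"\+CBC:\s*(\d+),(\d+),(\d+)", line.strip())
    if match is None:
        return None
    status, percent, millivolts = (int(group) for group in match.groups())
    return {
        "charge_status": status,
        "percent": percent,
        "volts": millivolts / 1000.0,
    }


def parse_csq(line):
    """Parse '+CSQ: <rssi>,<ber>' and return the signal strength in dBm.

    An rssi value of 99 means the strength is unknown, so None is returned.
    """
    match = re.match(r"\+CSQ:\s*(\d+),(\d+)", line.strip())
    if match is None:
        return None
    rssi = int(match.group(1))
    if rssi == 99:
        return None
    return -113 + 2 * rssi
```

For example, `parse_cbc("+CBC: 0,82,4012")` reports 82 percent at about 4.01 V, and `parse_csq("+CSQ: 18,0")` maps the raw value 18 to -77 dBm.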

Assembly

All components will be assembled onto a prototyping board with headers. If desired, you can 3D print or laser cut additional supporting brackets. The compact multi-layer design will allow your DIY phone prototype to function portably.

Voice Recognition


Online Speech API

You will first explore utilizing Google’s speech recognition API for voice control. Its cloud-based deep learning models provide high accuracy, but internet dependency may slow response times. To address this, you’ll implement a button trigger instead of continuous listening. The system will support basic commands for calling names, numbers, and simple phrases.
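Once the speech API returns a transcript, it still has to be turned into an action. The sketch below shows one way to do that for call commands. The function names, the spelled-out digit table, and the shape of the `contacts` dictionary are all assumptions made for illustration; the code appendix itself only extracts numerals from the transcript.

```python
import re

# Spoken digit words mapped to numerals (assumed vocabulary).
WORD_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def extract_digits(text):
    """Collect digits from a transcript, accepting numerals and digit words."""
    digits = []
    for token in re.findall(r"[a-z]+|\d", text.lower()):
        if token.isdigit():
            digits.append(token)
        elif token in WORD_DIGITS:
            digits.append(WORD_DIGITS[token])
    return "".join(digits)


def parse_command(text, contacts):
    """Return ('call', number) for 'call <name or digits>', else None."""
    words = text.lower().split()
    if not words or words[0] != "call":
        return None
    target = " ".join(words[1:])
    if target in contacts:            # contact names take priority
        return ("call", contacts[target])
    digits = extract_digits(target)
    return ("call", digits) if digits else None
```

With `contacts = {"alice": "5550100"}`, the phrase "Call Alice" resolves to Alice's number, while "call 555 1234" falls through to digit extraction.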

Offline Implementation

For offline capability, you will develop your own speech recognition in Python. Raw audio will be recorded and compared against stored samples using cross-correlation in the frequency domain. Because distinguishing a large vocabulary is difficult with this approach, you will rely primarily on the online API.

  • The offline cross-correlation analysis involves raw audio sampling, normalization, FFT conversion, and smoothing before the frequency-domain matching calculation.
  • Vocabulary is limited, partly because of the small set of training samples and partly because a correlation coefficient on its own has no classification layers.
  • Machine learning techniques such as Hidden Markov Models, LSTM RNNs, or CNNs could boost modeling capacity if implemented on-device.
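The offline matching pipeline can be sketched in plain Python as follows: normalize the samples, take an FFT, smooth the magnitude spectrum, and score each stored sample with a correlation coefficient. This is one possible reading of the approach, not the project's actual code. It assumes every clip has the same power-of-two length, and the `tone` helper only produces synthetic test signals in place of recorded words.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrum(samples, smooth=3):
    """Normalize, transform, and smooth the positive-frequency magnitudes."""
    peak = max(abs(s) for s in samples) or 1.0
    normalized = [s / peak for s in samples]
    mags = [abs(c) for c in fft(normalized)[: len(samples) // 2]]
    half = smooth // 2
    smoothed = []
    for i in range(len(mags)):
        window = mags[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_match(recording, templates):
    """Return the best-scoring template name and all scores."""
    rec = spectrum(recording)
    scores = {name: correlation(rec, spectrum(t)) for name, t in templates.items()}
    return max(scores, key=scores.get), scores


def tone(cycles, n=256, amp=1.0):
    """Synthetic sine wave standing in for a recorded word (test signal only)."""
    return [amp * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]
```

Calling `best_match(tone(8, amp=0.5), {"low": tone(8), "high": tone(40)})` picks "low", because normalization removes the amplitude difference and the spectral peaks line up. Real speech is far less tidy, which is consistent with the limited vocabulary noted above.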

Phone Application

Phone App Functionality

Your phone app will feature additional menus for dialing voicemail, viewing call logs, and configuring advanced settings for the FONA module. Monitoring the serial port and an interrupt pin will provide incoming call alerts and status queries. A database of simulated contacts will make storage and retrieval simpler than a linear list, which tends to accumulate duplicate entries.
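To illustrate how a contacts database avoids duplicate entries, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The table layout and function names are assumptions; the article does not specify which database or schema the contacts use.

```python
import sqlite3


def open_contacts(path=":memory:"):
    """Open (or create) a contacts table keyed by lower-cased name."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS contacts ("
        "name TEXT PRIMARY KEY, number TEXT NOT NULL)"
    )
    return db


def add_contact(db, name, number):
    """Insert a contact; a repeated name overwrites instead of duplicating."""
    db.execute(
        "INSERT OR REPLACE INTO contacts (name, number) VALUES (?, ?)",
        (name.lower(), number),
    )
    db.commit()


def lookup(db, name):
    """Return the stored number for a name, or None if it is unknown."""
    row = db.execute(
        "SELECT number FROM contacts WHERE name = ?", (name.lower(),)
    ).fetchone()
    return row[0] if row else None
```

Because the name is the primary key, adding "Alice" twice leaves a single row holding the latest number, which is the property a plain list lacks.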

Hardware Setup

After inserting the SIM card and ensuring adequate network registration, you’ll configure the FONA GSM module. UART communication to send AT commands will be verified using the serial monitor tool Minicom.
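The same check you perform by hand in Minicom can be scripted. The sketch below sends one AT command and gathers the reply lines until the module answers OK or ERROR. `send_at` and `FakePort` are hypothetical names: `FakePort` only imitates the `write` and `readline` methods of a pyserial port so the example runs without hardware, and on the device you would pass a real `serial.Serial` object instead.

```python
def send_at(port, command, max_lines=20):
    """Write one AT command and collect reply lines until OK or ERROR."""
    port.write((command + "\r").encode("ascii"))
    lines = []
    for _ in range(max_lines):
        raw = port.readline()
        if not raw:                      # read timed out, nothing more to read
            break
        text = raw.decode("ascii", errors="replace").strip()
        if not text:                     # skip blank separator lines
            continue
        lines.append(text)
        if text in ("OK", "ERROR"):
            break
    return lines


class FakePort:
    """Test double that replays canned replies (not part of pyserial)."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.written = []

    def write(self, data):
        self.written.append(data)

    def readline(self):
        return self.replies.pop(0) if self.replies else b""
```

A healthy module answers a bare `AT` with `OK`, so `send_at(port, "AT")` returning `["OK"]` is a quick confirmation that the UART link works.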

Design Overview

Your phone app will feature basic call functionality through various GUI screens for number dialing, connecting states, and incoming calls. Information like call times and phone numbers will be dynamically displayed.

Interface Implementation

You will use Pygame to code GUI elements and interactions. Touch buttons for dial-pad keys, additional controls, and updated status displays will be created on a resized surface clipped to the PiTFT size. Scrolling will allow navigation through longer contact lists.
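Touch handling for the dial pad amounts to checking which rectangle contains the touch point. The sketch below expresses that as a lookup table rather than a chain of comparisons. The coordinates mirror the touch bounds used in the code appendix, while the names `KEY_ORIGINS` and `key_at` are introduced here for illustration.

```python
KEY_SIZE = 50  # each key's touch area is 50 x 50 pixels

# Top-left corner of each key's touch area on the 240 x 320 screen.
KEY_ORIGINS = {
    "1": (30, 65), "2": (100, 65), "3": (170, 65),
    "4": (30, 125), "5": (100, 125), "6": (170, 125),
    "7": (30, 185), "8": (100, 185), "9": (170, 185),
    "0": (100, 245),
}


def key_at(x, y):
    """Return the dial-pad key under a touch at (x, y), or None."""
    for label, (left, top) in KEY_ORIGINS.items():
        if left < x < left + KEY_SIZE and top < y < top + KEY_SIZE:
            return label
    return None
```

In an event loop you would call `key_at(*pygame.mouse.get_pos())` on each MOUSEBUTTONDOWN event and append the result to the number being dialed when it is not None.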

Voice Integration

Voice commands can start the dialing process: your trigger function converts speech to text, complementing touch input. Pressing the “call” button then dials out through AT commands sent over serial to the FONA module.
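Whether the number comes from speech or from the dial pad, it ends up in an ATD command. The sketch below builds that command string; the trailing semicolon is what marks the call as a voice call. The helper name `dial_command` and its clean-up rules are assumptions for illustration.

```python
def dial_command(number):
    """Build the AT command that places a voice call to the given number."""
    text = number.strip()
    digits = "".join(ch for ch in text if ch.isdigit())
    if not digits:
        raise ValueError("no digits to dial")
    prefix = "+" if text.startswith("+") else ""   # keep an international prefix
    return "ATD" + prefix + digits + ";\r"
```

For example, `dial_command(" 555-1234")` yields `"ATD5551234;\r"`, which can then be written to the serial port.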

Call Management

You will monitor active calls by sending AT+CLCC over the serial port and reading the response strings. Status codes will distinguish between dialing, connecting, and call end states, allowing screen transitions at appropriate times. A hang-up command (ATH) will end the call.
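A CLCC response packs the call direction, state, and number into one comma-separated line. The sketch below parses that line into named fields, using the state codes defined for this command in the standard GSM AT command set. The function name, the dictionary keys, and the sample lines in the usage note are illustrative assumptions.

```python
import re

# Call state codes reported in the third field of a +CLCC line.
CALL_STATES = {
    0: "active", 1: "held", 2: "dialing",
    3: "alerting", 4: "incoming", 5: "waiting",
}

CLCC_PATTERN = re.compile(
    r'\+CLCC:\s*(\d+),(\d+),(\d+),(\d+),(\d+)(?:,"([^"]*)")?'
)


def parse_clcc(line):
    """Parse one '+CLCC: ...' line into a dict, or return None."""
    match = CLCC_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "index": int(match.group(1)),
        "outgoing": match.group(2) == "0",   # 0 = call placed from this phone
        "state": CALL_STATES.get(int(match.group(3)), "unknown"),
        "number": match.group(6),
    }
```

A line such as `+CLCC: 1,0,0,0,0,"5551234",129,""` parses to an outgoing call in the "active" state, which is the point at which the GUI can switch from the "Calling..." screen to the call timer.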

Music Player App

Music Player Features

Your music player will store song collections locally on the SD card instead of streaming external media. Additional database fields will track play counts, ratings, etc., while SQL aggregation functions will compile statistics. Sound effects will enhance the app interface experience beyond the core voice and GUI implementation.

Database Setup

A MySQL database will organize the music library and playlists more efficiently than raw file scanning. Tables will store song metadata and playlists as customizable entities.
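A possible table layout is sketched below. To keep the example runnable without a database server it uses Python's built-in sqlite3 module as a stand-in for MySQL, and every table and column name is an assumption rather than the project's actual schema. The `plays_by_artist` function shows the kind of SQL aggregation that can compile play statistics.

```python
import sqlite3

# Hypothetical schema: songs, playlists, and a join table linking them.
SCHEMA = """
CREATE TABLE songs (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    artist TEXT,
    genre TEXT,
    path TEXT NOT NULL,
    play_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE playlists (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE playlist_songs (
    playlist_id INTEGER REFERENCES playlists(id),
    song_id INTEGER REFERENCES songs(id),
    position INTEGER,
    PRIMARY KEY (playlist_id, song_id)
);
"""


def create_library():
    """Create an empty in-memory music library."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def plays_by_artist(db):
    """Total play count per artist, most played first."""
    rows = db.execute(
        "SELECT artist, SUM(play_count) FROM songs "
        "GROUP BY artist ORDER BY SUM(play_count) DESC, artist"
    )
    return rows.fetchall()
```

The join table lets one song appear in several playlists without copying its metadata, and the GROUP BY query would carry over to MySQL largely unchanged.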

Python Connector

The MySQL-python module (imported as MySQLdb) will facilitate database queries and management directly from your code. Specialized functions will perform SQL operations on the tables for various app needs.

GUI Design

You will implement a scrolling surface approach to handle overflow lists since PiTFT screen space is limited. Progress bars will track playback position updates. Volume control will utilize PWM signals sent over GPIO, and menus will switch between lists, playlists, and playback views.
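Two of these GUI calculations are simple enough to isolate from the drawing code: how wide the filled part of the progress bar should be, and which slice of a long list is visible at a given scroll offset. The function names and parameters below are hypothetical.

```python
def progress_width(position_s, duration_s, bar_width_px):
    """Pixel width of the filled part of a playback progress bar."""
    if duration_s <= 0:
        return 0
    fraction = min(max(float(position_s) / duration_s, 0.0), 1.0)
    return int(round(fraction * bar_width_px))


def visible_rows(items, offset, rows_on_screen):
    """Slice of a list shown at a scroll offset, clamped to valid bounds."""
    max_offset = max(0, len(items) - rows_on_screen)
    offset = min(max(offset, 0), max_offset)
    return items[offset: offset + rows_on_screen]
```

Thirty seconds into a two-minute track, `progress_width(30, 120, 200)` fills 50 of 200 pixels. Clamping the offset in `visible_rows` keeps the last screen full instead of scrolling past the end of the list.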

Voice Commands

Voice commands will allow common music queries like genre and artist searches to extract relevant song IDs from the database for instant playlist generation. Playback commands will also use voice triggers for contact-free control.
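One way to turn a spoken request into a database lookup is to match the transcript against a few phrase patterns and emit a parameterized query. In the sketch below, the phrase patterns, the `songs` table, and its `artist` and `genre` columns are all assumptions. The `%s` placeholder follows the parameter style used by MySQLdb, and the returned pair is meant to be passed to a cursor's `execute` method.

```python
import re

# Phrase patterns mapped to the column they filter on (assumed phrasing).
FILTERS = [
    (re.compile(r"^play (?:songs |music )?by (?P<value>.+)$"), "artist"),
    (re.compile(r"^play (?:some )?(?P<value>.+?) (?:songs|music)$"), "genre"),
]


def build_song_query(transcript):
    """Return (sql, params) selecting matching song IDs, or None."""
    text = transcript.lower().strip()
    for pattern, column in FILTERS:
        match = pattern.match(text)
        if match:
            # The column name comes from the fixed table above, never from speech.
            sql = "SELECT id FROM songs WHERE LOWER(%s) = %%s" % column
            return sql, (match.group("value"),)
    return None
```

Only the spoken value travels as a query parameter, so an unusual transcript cannot alter the SQL itself. The resulting song IDs can then be loaded into a playlist for immediate playback.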

Results and Conclusion

Your Piphone will successfully demonstrate functional voice-controlled apps in action. The online speech API will provide sufficient accuracy while avoiding training costs. The device’s portability will be realized through its compact design and single-cell operation.

Potential improvements could include expanded vocabulary support with data-driven speech models, cloud database syncing, contact list integration, and additional voice commands. Overall, you will be pleased with your DIY smartphone prototype and gain valuable experience with embedded Linux, circuit design, and cross-discipline project work.

Code Appendix

# Note: this listing targets Python 2 (plain str is written to the serial port).
import serial
import pygame
from pygame.locals import *
import RPi.GPIO as GPIO
import time
import subprocess
import os
import speech_recognition as sr
from os import path

os.putenv('SDL_VIDEODRIVER', 'fbcon')  # Display on piTFT
os.putenv('SDL_FBDEV', '/dev/fb1')
os.putenv('SDL_MOUSEDRV', 'TSLIB')  # Track mouse clicks on piTFT
os.putenv('SDL_MOUSEDEV', '/dev/input/touchscreen')

GPIO.setmode(GPIO.BCM)
GPIO.setup(27, GPIO.IN, pull_up_down=GPIO.PUD_UP)

# quit when the button on GPIO 27 is pressed
def GPIO27_callback(channel):
    exit(0)

GPIO.add_event_detect(27, GPIO.FALLING, callback=GPIO27_callback)

# keep only the digits of a string
def returnnumber(st):
    n = ""
    for char in st:
        if char.isdigit():
            n = n + char
    return n

# get incoming number (the text between the first pair of double quotes)
def get_incoming(st):
    n = ""
    in_flag = 0
    for char in st:
        if in_flag < 2 and in_flag > 0:
            n = n + char
        if char == '"':
            in_flag = in_flag + 1
    return n[:-1]  # drop the closing quote

# get connecting status (the third digit in the CLCC response)
def get_status(st):
    n = 0
    for char in st:
        if char.isdigit():
            n = n + 1
            if n == 3:
                return char

def check_over(st):
    return st[0]

# get battery status (the field between the first and second comma)
def get_battery(st):
    n = ""
    in_flag = 0
    for char in st:
        if in_flag < 2 and in_flag > 0:
            n = n + char
        if char == ',':
            in_flag = in_flag + 1
    return n[:-1]  # drop the trailing comma

# translate time to minute:second mode
def get_progress(t):
    minutes = int(t / 60)
    second = int(t - minutes * 60)
    if second < 10:
        time_string = str(minutes) + " : 0" + str(second)
    else:
        time_string = str(minutes) + " : " + str(second)
    return time_string

class Background(pygame.sprite.Sprite):
    def __init__(self, image_file, location):
        pygame.sprite.Sprite.__init__(self)  # call Sprite initializer
        self.image = pygame.image.load("/home/pi/project/" + image_file)
        self.rect = self.image.get_rect()
        self.rect.left, self.rect.top = location

pygame.init()
pygame.mouse.set_visible(False)
clock = pygame.time.Clock()
size = width, height = 240, 320
black = 0, 0, 0
WHITE = 255, 255, 255
screen = pygame.display.set_mode(size)
background = Background("background.png", [0, 0])
call = Background("call.png", [10, 10])
music = Background("music.png", [80, 10])
one = Background("1.png", [30, 70])
two = Background("2.png", [100, 70])
three = Background("3.png", [170, 70])
four = Background("4.png", [30, 130])
five = Background("5.png", [100, 130])
six = Background("6.png", [170, 130])
seven = Background("7.png", [30, 190])
eight = Background("8.png", [100, 190])
nine = Background("9.png", [170, 190])
zero = Background("0.png", [100, 250])
micro = Background("microphone.png", [40, 25])
micro2 = Background("microphone2.png", [40, 25])
back = Background("back.png", [0, 10])
call2 = Background("call2.png", [40, 260])
delete = Background("delete.png", [180, 260])
hang_up = Background("hang_up.png", [90, 220])
hang_up2 = Background("hang_up.png", [130, 220])
answer = Background("answer.png", [50, 220])
battery = Background("battery.png", [200, 10])

calling = {'Calling...': (100, 70)}
ring = {'Ring...': (100, 70)}
x1 = [20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220]
y1 = 40
my_font = pygame.font.Font(None, 15)
my_font2 = pygame.font.Font(None, 25)
menu = 0           # 0 = main, 1 = dial pad, 2 = calling, 3 = incoming call
number = " "
press = 0
voice_recog = 0
old_time = 0
status = '1'
previous_status = 0
s = ""             # last line read from the FONA, defined before first use

serialport = serial.Serial("/dev/ttyS0", 9600, timeout=0.5)
serialport.write("AT\r")
response = serialport.readlines(None)
serialport.write("ATE0\r")    # turn off command echo
response = serialport.readlines(None)
serialport.write("AT\r")
response = serialport.readlines(None)
serialport.write("AT+CBC\r")  # query battery charge
response = serialport.readline()
while len(response) < 5:
    response = serialport.readline()
battery_n = get_battery(response)

while 1:
    # main menu
    if menu == 0:
        screen.blit(background.image, background.rect)
        screen.blit(call.image, call.rect)
        screen.blit(music.image, music.rect)
        screen.blit(battery.image, battery.rect)
        screen.blit(my_font.render(battery_n, True, (0, 0, 0)), (208, 35))
        pygame.display.flip()

        for event in pygame.event.get():
            if event.type == MOUSEBUTTONDOWN:
                pos = pygame.mouse.get_pos()
                x, y = pos
                if y > 10 and y < 70 and x > 10 and x < 70:
                    menu = 1
                if y > 10 and y < 70 and x > 80 and x < 145:
                    GPIO.cleanup()
                    import m_player2  # hand over to the music player module

    # phone call menu
    if menu == 1:
        screen.fill(WHITE)
        screen.blit(one.image, one.rect)
        screen.blit(two.image, two.rect)
        screen.blit(three.image, three.rect)
        screen.blit(four.image, four.rect)
        screen.blit(five.image, five.rect)
        screen.blit(six.image, six.rect)
        screen.blit(seven.image, seven.rect)
        screen.blit(eight.image, eight.rect)
        screen.blit(nine.image, nine.rect)
        screen.blit(zero.image, zero.rect)
        screen.blit(back.image, back.rect)
        screen.blit(call2.image, call2.rect)
        screen.blit(delete.image, delete.rect)
        if voice_recog == 0:
            screen.blit(micro.image, micro.rect)
        screen.blit(my_font2.render(number, True, (0, 0, 0)), (80, 30))
        pygame.display.flip()
        # voice recognition to dial
        if voice_recog == 1:
            screen.blit(micro2.image, micro2.rect)
            pygame.display.flip()
            # record 8 seconds to a fixed path (forward slashes, not '\home\pi\project')
            AUDIO_FILE = path.join('/home/pi/project', 'output.wav')
            cmd = 'arecord -d 8 -D plughw:1,0 ' + AUDIO_FILE
            print(subprocess.check_output(cmd, shell=True))
            voice_recog = 0
            r = sr.Recognizer()
            with sr.AudioFile(AUDIO_FILE) as source:
                audio = r.record(source)  # read the entire audio file
            try:
                number = number + returnnumber(r.recognize_google(audio))
            except sr.UnknownValueError:
                pass  # speech was unintelligible, keep the number unchanged
            except sr.RequestError as e:
                pass  # API unreachable, keep the number unchanged

        for event in pygame.event.get():
            if event.type == MOUSEBUTTONDOWN:
                pos = pygame.mouse.get_pos()
                x, y = pos
                if x > 0 and x < 20 and y > 0 and y < 30:
                    menu = 0
                if x > 30 and x < 80 and y > 65 and y < 115:
                    number = number + '1'
                if x > 100 and x < 150 and y > 65 and y < 115:
                    number = number + '2'
                if x > 170 and x < 220 and y > 65 and y < 115:
                    number = number + '3'
                if x > 30 and x < 80 and y > 125 and y < 175:
                    number = number + '4'
                if x > 100 and x < 150 and y > 125 and y < 175:
                    number = number + '5'
                if x > 170 and x < 220 and y > 125 and y < 175:
                    number = number + '6'
                if x > 30 and x < 80 and y > 185 and y < 235:
                    number = number + '7'
                if x > 100 and x < 150 and y > 185 and y < 235:
                    number = number + '8'
                if x > 170 and x < 220 and y > 185 and y < 235:
                    number = number + '9'
                if x > 100 and x < 150 and y > 245 and y < 295:
                    number = number + '0'
                if x > 170 and x < 220 and y > 245 and y < 295:
                    number = number[:-1]
                # make call
                if x > 30 and x < 80 and y > 245 and y < 295:
                    menu = 2
                    voice_recog = 0
                    print("Calling " + number)
                    serialport.write("AT\r")
                    response = serialport.readlines(None)
                    serialport.write("ATD " + number + ';\r')
                    response = serialport.readlines(None)
                    print(response)

                if x > 40 and x < 70 and y > 20 and y < 60:
                    voice_recog = 1

    # calling menu
    if menu == 2:

        # poll the call list until a full CLCC line arrives
        while len(s) < 5:
            serialport.write("AT+CLCC\r")
            s = serialport.readline()
        status = get_status(s)

        if status == '0':  # call is active: show the call timer
            previous_status = 1
            if old_time == 0:
                old_time = time.time()
            else:
                current_time = time.time() - old_time
                screen.fill(WHITE)
                screen.blit(my_font2.render(number, True, (0, 0, 0)), (80, 30))
                screen.blit(my_font2.render(get_progress(current_time), True, (0, 0, 0)), (100, 60))
                screen.blit(hang_up.image, hang_up.rect)
                pygame.display.flip()
        else:
            if previous_status == 1:  # call was active and has now ended
                menu = 1
                previous_status = 0
            else:  # still dialing
                old_time = 0
                screen.fill(WHITE)
                screen.blit(my_font2.render(number, True, (0, 0, 0)), (80, 30))
                for my_text, text_pos in calling.items():
                    text_surface = my_font2.render(my_text, True, black)
                    rect = text_surface.get_rect(center=text_pos)
                    screen.blit(text_surface, rect)
                screen.blit(hang_up.image, hang_up.rect)
                pygame.display.flip()

        # hang up
        for event in pygame.event.get():
            if event.type == MOUSEBUTTONDOWN:
                pos = pygame.mouse.get_pos()
                x, y = pos
                print((x, y))
                if x > 100 and x < 160 and y > 220 and y < 270:
                    old_time = 0
                    previous_status = 0
                    menu = 1
                    n = 0
                    while n < 2000:  # repeat the hang-up command
                        serialport.write("ATH\r")
                        n = n + 1

    # incoming call detection
    s = serialport.readline()
    if s == 'RING\r\n':
        menu = 3
        serialport.write("AT+CLCC\r")
        s = serialport.readline()
        while len(s) < 10:
            s = serialport.readline()
        incoming_number = get_incoming(s)
        number = incoming_number

    # incoming menu
    if menu == 3:
        screen.fill(WHITE)
        screen.blit(my_font2.render(incoming_number, True, (0, 0, 0)), (60, 90))
        for my_text, text_pos in ring.items():
            text_surface = my_font2.render(my_text, True, black)
            rect = text_surface.get_rect(center=text_pos)
            screen.blit(text_surface, rect)
        screen.blit(hang_up2.image, hang_up2.rect)
        screen.blit(answer.image, answer.rect)

        pygame.display.flip()
        for event in pygame.event.get():
            if event.type == MOUSEBUTTONDOWN:
                pos = pygame.mouse.get_pos()
                x, y = pos
                # answer
                if x > 50 and x < 110 and y > 220 and y < 270:
                    menu = 2
                    n = 0
                    while n < 800:  # repeat the answer command
                        serialport.write("ATA\r")
                        n = n + 1
                    status = '0'
                # reject
                if x > 130 and x < 190 and y > 220 and y < 270:
                    menu = 0
                    n = 0
                    while n < 2000:  # repeat the hang-up command
                        serialport.write("ATH\r")
                        n = n + 1

 


About The Author

Ibrar Ayyub

I am an experienced technical writer holding a Master's degree in computer science from BZU Multan, Pakistan University. With a background spanning various industries, particularly in home automation and engineering, I have honed my skills in crafting clear and concise content. Proficient in leveraging infographics and diagrams, I strive to simplify complex concepts for readers. My strength lies in thorough research and presenting information in a structured and logical format.
