Introduction
This project, RoboGPT, is a compact mobile robot platform built around cloud AI services and a Raspberry Pi 4, featuring four continuous-rotation servos mounted on a laser-cut acrylic sheet. For sensors, it uses an accelerometer and gyroscope, two ultrasonic sensors on the front and back to avoid running into obstacles, a microphone to collect audio input, and a Bluetooth speaker to play sounds from the robot. The robot software uses Azure Speech Services for speech-to-text transcription of user requests and text-to-speech for the robot to respond to the user. It also uses the OpenAI ChatGPT API with a custom prompt that processes user requests and returns a response describing what the robot should say or do. ChatGPT's role was to take a high-level command like "draw a square with sides of 1 meter" and produce lower-level movement commands for the robot to carry out, giving it broad functionality and responsiveness to most commands. The Raspberry Pi collects data locally, sends it to the cloud AI services, uses sensor input to avoid obstacles while moving, and processes the ChatGPT response to move and speak to the user.
RoboGPT Project Objectives:
- The goal was to create an AI-powered robot capable of communication and following commands.
- It aimed to enable conversation with the robot using speech-to-text and text-to-speech conversion for input and output.
- It also aimed to sense any obstructions in front of and behind the robot to prevent collisions.
- Another objective was to display the robot’s actions on the piTFT screen.
The project's objective was to develop a mobile AI-powered robot that can understand verbal instructions, move according to those instructions, and respond to the user. The robot accepts user input through speech recognition, sends the transcribed text to ChatGPT, processes the returned response for movement commands, and then uses text-to-speech to reply to the user. It utilizes two ultrasonic sensors and an accelerometer/gyroscope as part of its safety controller and sends the sensor data to ChatGPT. A piTFT screen on the robot informs the user when it is ready for commands and shows its current status.
Design
Software Design of RoboGPT
The design process began by assessing the feasibility of using all the cloud services on the Raspberry Pi. Small demonstrations of speech-to-text, text-to-speech, and the OpenAI ChatGPT API were created, built from each service's API documentation (links are included in the references). After building each demo, an integrated full-scale demo combined all the services to test a basic AI assistant's conversational abilities. Once the cloud services were integrated into a demo, a second ChatGPT prompt and instance, trainerGPT, was created to emulate a user interacting with the instance running the robot's prompt, robotGPT. This allowed the prompt to be tested quickly and uncovered unusual cases, such as how commands were sent. trainerGPT also provided a feedback summary used to analyze robotGPT's prompt for improvements. Additionally, trainerGPT used a third ChatGPT prompt and instance, dataGPT, to synthesize data sent to robotGPT for further testing of how it understood incoming information.
Data Flow Diagram
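To illustrate the robotGPT setup, the sketch below shows one way the custom prompt could be sent to the OpenAI chat completion API. The prompt text, the [[...]] command wrapper, and the model name are illustrative placeholders rather than the project's actual values (the real robot prompt is over 200 words), and the legacy openai<1.0 ChatCompletion interface is assumed.

```python
# Minimal sketch of a robotGPT-style chat completion call (illustrative prompt,
# not the project's actual prompt). Assumes the legacy openai<1.0
# ChatCompletion interface and an OPENAI_API_KEY environment variable.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

ROBOT_PROMPT = (
    "You are robotGPT, the brain of a small four-wheeled robot. "
    "Reply with text to speak aloud, and wrap movement commands in "
    "[[...]], e.g. [[forward 1.0]] or [[turn 90]]."
)

def ask_robot_gpt(user_text, history=None):
    """Send the user's transcribed speech to ChatGPT and return its reply."""
    messages = [{"role": "system", "content": ROBOT_PROMPT}]
    messages += history or []
    messages.append({"role": "user", "content": user_text})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    return response["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_robot_gpt("Draw a square with sides of 1 meter."))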
Alongside trainerGPT testing, command parsing was added to robotGPT’s output by extracting strings wrapped in a designated format. This enabled easy parsing of commands from non-command text, with the latter spoken using speech synthesis. Using trainerGPT to test each code or prompt change alongside manual testing helped speed up testing and uncover edge cases.
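A minimal sketch of that parsing step is shown below, assuming commands are wrapped in the same [[...]] delimiters used in the earlier sketch; the actual designated format may differ.

```python
# Sketch of separating wrapped commands from speakable text. The [[...]]
# delimiter is illustrative; the real designated format may differ.
import re

COMMAND_PATTERN = re.compile(r"\[\[(.*?)\]\]")

def split_response(reply):
    """Return (commands, speech) extracted from a robotGPT reply."""
    commands = COMMAND_PATTERN.findall(reply)          # e.g. ["forward 1.0", "turn 90"]
    speech = COMMAND_PATTERN.sub("", reply).strip()    # everything else is spoken
    return commands, speech

commands, speech = split_response(
    "Sure, drawing a square. [[forward 1.0]] [[turn 90]] [[forward 1.0]]"
)
```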
The next software development step was integrating the services on the Raspberry Pi. The Azure Speech library required a 64-bit OS, but the Pi OS was 32-bit, so both the speech-to-text and text-to-speech modules required modification. Speech was captured with a USB microphone and sent to Azure through the SpeechRecognition module, while text-to-speech used the Azure REST API to obtain audio files.
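The sketch below shows one way to fetch synthesized audio from the Azure text-to-speech REST API with the requests library. The region, voice name, and output format are assumptions; the exact request details should be checked against the Azure references at the end of this report.

```python
# Sketch of fetching synthesized speech from the Azure TTS REST API and saving
# it to a WAV file. Region, voice name, and output format are placeholders.
import os
import requests

AZURE_KEY = os.environ["AZURE_SPEECH_KEY"]
REGION = "eastus"  # assumed region
URL = f"https://{REGION}.tts.speech.microsoft.com/cognitiveservices/v1"

def synthesize(text, out_path="reply.wav"):
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        "<voice name='en-US-JennyNeural'>" + text + "</voice></speak>"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": AZURE_KEY,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "riff-16khz-16bit-mono-pcm",
    }
    resp = requests.post(URL, data=ssml.encode("utf-8"), headers=headers)
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path
```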
Audio playback initially used Pygame but was inconsistent, so it was replaced with the VLC Python bindings. Audio is played through a Bluetooth speaker to minimize space and weight on the robot.
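A minimal playback sketch using the python-vlc bindings might look like the following, playing the file saved by the text-to-speech sketch above.

```python
# Sketch of playing a downloaded audio file with the python-vlc bindings
# through the default audio output (the paired Bluetooth speaker here).
import time
import vlc

def play_audio(path):
    player = vlc.MediaPlayer(path)
    player.play()
    time.sleep(0.2)                      # give VLC a moment to start
    while player.is_playing():           # block until playback finishes
        time.sleep(0.1)

play_audio("reply.wav")
```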
Sensor code was added once the cloud services were working on the Pi. The ultrasonic sensors and accelerometer/gyroscope were interfaced using Adafruit libraries that provide CircuitPython support, following the documentation referenced below. The motors also received their own code, using hardware PWM instead of software PWM or CircuitPython to reduce delays that affected movement.
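A rough sketch of reading the sensors through Adafruit's CircuitPython libraries (via the Blinka compatibility layer) is shown below; the GPIO pin assignments and the specific accelerometer/gyroscope breakout (an MPU-6050 here) are assumptions, not the documented wiring.

```python
# Sketch of reading the sensors with Adafruit's CircuitPython libraries on the
# Pi. Pin numbers and the MPU-6050 breakout are assumptions.
import board
import busio
import adafruit_hcsr04
import adafruit_mpu6050

front_sonar = adafruit_hcsr04.HCSR04(trigger_pin=board.D23, echo_pin=board.D24)
rear_sonar = adafruit_hcsr04.HCSR04(trigger_pin=board.D5, echo_pin=board.D6)

i2c = busio.I2C(board.SCL, board.SDA)
imu = adafruit_mpu6050.MPU6050(i2c)

print("front (cm):", front_sonar.distance)
print("rear  (cm):", rear_sonar.distance)
print("accel (m/s^2):", imu.acceleration)
print("gyro  (rad/s):", imu.gyro)
```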
The manager.py class tied everything together, managing sensor refreshing, speech collection, ChatGPT communication via robotGPT, and motor control. A state machine governed data processing and robot control. Movements were packaged into actionUnit classes that tracked durations for timing-based control. The state machine checked for completed or missing actions to stop the motors, processed further actions from the queue, or obtained new input from ChatGPT. The sensors checked distances so the robot could stop for obstacles before resuming.
State Machine for Implementation
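The sketch below gives a rough idea of the actionUnit approach: each queued movement stores its motor settings and a duration, and the manager loop uses elapsed time to decide when an action is complete. Class and field names are illustrative, not the project's exact code.

```python
# Rough sketch of the actionUnit idea: each queued movement carries motor
# settings and a duration, and the manager loop stops or advances based on
# elapsed time.
import time
from collections import deque

class ActionUnit:
    def __init__(self, left_duty, right_duty, duration):
        self.left_duty = left_duty
        self.right_duty = right_duty
        self.duration = duration
        self.started_at = None

    def start(self):
        self.started_at = time.time()

    def done(self):
        return (self.started_at is not None
                and time.time() - self.started_at >= self.duration)

action_queue = deque()
action_queue.append(ActionUnit(8.5, 6.5, 2.0))   # e.g. drive forward for 2 s
```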
The final software component was the piTFT display, used to let the user resume robot movement, switch between speech and text commands, and see the robot's status and readiness. Pygame displayed text on the piTFT, but issues arose running it alongside manager.py: display.py required sudo to draw on the piTFT from an SSH session, yet running under sudo broke modules used by manager.py.
The solution used a FIFO file: manager.py sent display text to the FIFO, and display.py continuously read it, updating the piTFT only when the text differed from the previous message. This made it easy for any code to send text to the piTFT, both for testing and for informing the user. It proved especially important when collecting speech, ensuring the user knew when the robot was listening so commands did not have to be repeated.
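A simplified sketch of this FIFO handoff is shown below; the FIFO path and function names are assumptions.

```python
# Sketch of the FIFO handoff: manager.py (and any other code) writes one-line
# status messages, and display.py reads them and redraws the piTFT only when
# the text changes. The path is an assumption.
import os

FIFO_PATH = "/tmp/pitft_fifo"
if not os.path.exists(FIFO_PATH):
    os.mkfifo(FIFO_PATH)

# --- writer side (manager.py) ---
def send_status(text):
    with open(FIFO_PATH, "w") as fifo:    # blocks until display.py is reading
        fifo.write(text + "\n")

# --- reader side (display.py, run with sudo) ---
def read_statuses():
    last = None
    while True:                           # reopen after each writer closes (EOF)
        with open(FIFO_PATH) as fifo:
            for line in fifo:
                text = line.strip()
                if text and text != last:
                    last = text
                    yield text            # caller redraws the piTFT with this text
```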
If the emergency stop was triggered by an obstacle, manager.py stopped the robot, saved the command state, and sent "Resume" to the FIFO for display on the piTFT. This directed the user to a physical resume button handled by manager.py. Buttons also toggled input between speech and text; physical buttons were chosen over the touchscreen, which degraded with heat.
Extensive combined testing aimed to improve the user experience and reduce robot errors, as further described in the testing section.
Hardware Design
The figure below shows the circuit schematic of the robot. The Raspberry Pi acts as the central controller and is powered by an external power bank. The continuous servo motors receive power from a 6V battery pack, while the two ultrasonic sensors and the accelerometer/gyroscope receive power from the Pi. A switch was added to the servo power circuit so code could be tested without the robot driving.
Two ultrasonic sensors measure distance in front and behind to prevent collisions. Each takes a trigger input and outputs an echo signal. The accelerometer/gyroscope also tracks speed and orientation. All sensors connect directly to the Pi GPIO pins, receiving 3.3V and GND power. These signals break out on a robot protoboard with header pins, allowing easy wiring and replacement of damaged sensors/wires.
Hardware Diagram of Robot
The servos' power line connects to the 6V external battery pack, with the battery pack's GND shared with the Pi GND. Each motor's signal pin connects to a PWM signal from a Pi GPIO pin. Since hardware PWM controlled the servos, only GPIO 12 and 13 were available, with the left motors on GPIO 13 and the right motors on GPIO 12.
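A sketch of driving the servos over hardware PWM might look like the following. The pigpio library is used here as one way to reach the Pi's hardware PWM channels from Python (the report does not name the exact library), and the pulse widths assume a typical continuous-rotation servo where a 1.5 ms pulse stops the motor at a 50 Hz (20 ms) period.

```python
# Sketch of driving the continuous servos with hardware PWM on GPIO 12 and 13
# using pigpio (an assumption). Duty values assume a typical continuous-
# rotation servo: 1.5 ms pulse = stop, ~1.3/1.7 ms = full speed either way.
import pigpio

pi = pigpio.pi()           # requires the pigpiod daemon to be running

LEFT_GPIO, RIGHT_GPIO = 13, 12
FREQ = 50                  # Hz (20 ms period)

def set_servo(gpio, pulse_ms):
    duty = int(pulse_ms / 20.0 * 1_000_000)   # duty as a fraction of the period
    pi.hardware_PWM(gpio, FREQ, duty)

def drive_forward():
    set_servo(LEFT_GPIO, 1.7)   # left wheels spin one way...
    set_servo(RIGHT_GPIO, 1.3)  # ...right wheels mirror it

def stop():
    set_servo(LEFT_GPIO, 1.5)
    set_servo(RIGHT_GPIO, 1.5)
```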
Initial planning considered sensor placement and the organization of the solder boards. Solder boards reduced connectivity issues from loose wiring, and headers connected the sensors so they could be replaced easily if damaged. An acrylic body held the components and mounted the four 3D-printed motor braces.
The sensors were soldered and tested individually with their respective test code. CircuitPython was installed from Adafruit; it provides Python-based support for microcontroller peripherals, and Adafruit's libraries make it easy to control the sensors from CircuitPython/Python. Installation instructions are referenced in the GitHub README.
The hardware was then tested as described once CircuitPython and the libraries were installed.
Drawings and Photos
Front of Robot
Top view of Robot
Robot Sensor Diagram
Testing
Sensor Testing
Testing started with individual hardware validation, using test scripts to confirm expected outputs. Motor control was tested by checking that direction and movement matched the controls and by measuring the PWM outputs with an oscilloscope. The duty cycle, calculated from the datasheet formula at a 50 Hz frequency, was initially scaled incorrectly, producing jittery output until the scaling was fixed.
Ultrasonic and gyroscope sensor data were separately tested in scripts with print statements.
The ultrasonic sensors initially returned noisy values, so taking the median of five measurements within 10 ms heavily reduced noise and improved accuracy. This also minimized false-positive emergency stops caused by small noisy readings. Cross-interference between the two sensors was also ruled out.
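A sketch of that median filter, simplified from the actual sensor code, is shown below.

```python
# Sketch of the median-of-five filter used to suppress noisy ultrasonic
# readings; timing and error handling are simplified.
import time
from statistics import median

def read_distance(sonar, samples=5, interval=0.002):
    """Take several quick readings and return their median (in cm)."""
    readings = []
    for _ in range(samples):
        try:
            readings.append(sonar.distance)
        except RuntimeError:        # occasional timeout from the HC-SR04
            pass
        time.sleep(interval)        # roughly 10 ms total across five samples
    return median(readings) if readings else None
```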
The accelerometer and gyroscope, connected via I2C, required no modifications and worked well with the other sensors during testing.
RoboGPT Software Testing
As mentioned previously, another ChatGPT prompt and instance tested the robot's ChatGPT prompt by providing feedback on each change. This significantly simplified testing and iterating on the prompts, especially since the robot prompt contained over 200 words sent to the model. Using an additional ChatGPT instance to emulate a user assisted evaluation and improved the robot's ability to understand commands.
Robot System Testing
Once hardware testing concluded, the full robot system underwent testing. Sample commands tested speech recognition and ChatGPT responses; for example, saying "move forward 1 meter" verified correct movement over the right distance and playback of a completion message. Similar tests explored edge cases. Error handling was added so that failed cloud service calls retried gracefully instead of crashing.
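A minimal sketch of that retry-instead-of-crash pattern is shown below; the retry count and delay are illustrative.

```python
# Sketch of the retry approach for cloud calls: transient network or service
# errors are retried a few times before giving up.
import time

def with_retries(fn, attempts=3, delay=1.0):
    """Call fn(), retrying on exceptions so a flaky request doesn't crash."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1:
                raise
            print(f"Cloud call failed ({exc}); retrying...")
            time.sleep(delay)

# e.g. reply = with_retries(lambda: ask_robot_gpt("move forward 1 meter"))
```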
A public demo gathered feedback, which led to improvements in status updates: the piTFT display informed the user when the robot was listening or performing tasks, enhancing usability.
Validation confirmed that commands executed as intended while identifying areas to refine the user experience, such as better communication from the robot during operation. Iterative testing and user input helped produce a more polished final system.
Results and Conclusions
The robot successfully communicated through spoken conversation, following commands and responding to confirm its actions. It navigated shapes like squares, octagons, and pentagons from natural-language requests that ChatGPT parsed into discrete movements. The gyroscope/accelerometer was originally intended for feedback control, but timing-based movement proved accurate enough that it was not needed during operation; its data still informed ChatGPT of the robot's speed.
The ultrasonic sensors prevented collisions through object detection, yielding a reliable, responsive system, and the robot navigated complex scenarios involving multiple neighboring obstacles. Testing validated the conversational capabilities, reactive behavior, and navigation driven by parsing high-level requests, and iterative prototyping tuned each subsystem for its intended function.
Future Work
Potential future work involves several avenues of exploration. Gyroscope and accelerometer data could detect varied terrain beneath the robot. Connecting a camera would enable straight-line navigation through fine motor adjustments based on visual input, and implementing object detection would introduce additional complexity for the AI program.
A camera also presents opportunities to expand capabilities such as terrain sensing and autonomous piloting tied to real-time visual analysis. Continued refinement would give the robot greater autonomy, perception, and environmental adaptation, and integrating more advanced sensing would broaden its operational scope through richer awareness and reasoning over multimodal inputs, building on the framework established here.
Work Distribution
Team
Raphael ([email protected]): Focused on software design of the project, incorporating AI and sensor outputs together.
Rabail ([email protected]): Focused on mechanical assembly and hardware design of the robot.
Parts List
Provided in Lab:
- Raspberry Pi 4
- Breakout cable and pinout
- piTFT
- 4 robot wheels
- 4 3D printed braces
- Laser-cut base
- 3 solder boards
- Zip Ties
- Header Pins
- Jumper Wires
Extra Parts:
- 2x Ultrasonic Distance Sensors – $7.90
- 1x Accelerometer and Gyroscope – $20.00
- 12V Battery Pack – $8.00
- 4x AA batteries – $5.00
- Power Bank – $25.00
- 4 Continuous Motor Servos – $40.00
- OpenAI API costs – $2.00
- Microphone – $5.00
Total: $112.90
References
GPIO
Speech-to-text
- Azure Speech SDK samples and quickstarts
- Recognizing speech-to-text with Azure
- Using the Python SpeechRecognition Library
Text-to-speech
- Text-to-speech with Azure
- Text-to-speech with Azure Speech REST API
- Text-to-speech example with Azure Speech REST API
OpenAI API
- OpenAI chat completion
- Prompting the chat completion
- OpenAI chat completion quickstart
CircuitPython
- Starting with CircuitPython
- Raspberry Pi 4B CircuitPython installation instructions
- I2C address, enabling I2C, and pins
- I2C Protocol Overview
- I2C list of addresses
Sensors and Servos
- Accelerometer shop page
- Accelerometer Python library from Adafruit
- Ultrasonic shop page
- Ultrasonic sensors with CircuitPython
- Ultrasonic Python library from Adafruit
- Student project with ultrasonic example
- Continuous Servo Datasheet