RoboGPT: An AI-Powered Robot

Introduction

This project, RoboGPT, is a compact mobile robot platform built around cloud AI services and a Raspberry Pi 4, featuring four continuously rotating servos mounted on a laser-cut acrylic sheet. For sensors, it uses an accelerometer and gyroscope, two ultrasonic sensors on the front and back to avoid running into obstacles, a microphone to collect audio input, and a Bluetooth speaker to play sounds from the robot. The robot software uses Azure Speech Services for speech-to-text transcription of user requests and text-to-speech for the robot to respond to the user. It also uses the OpenAI ChatGPT API with a custom prompt that processes user requests and returns a response including what the robot should say or do. ChatGPT’s role was to take a high-level command like “draw a square with sides of 1 meter” and create lower-level movement commands for the robot to carry out, allowing it to have wide functionality and responsiveness to most commands. The Raspberry Pi collects data locally, sends it to the cloud AI services, uses sensor input to avoid obstacles while moving, and processes the ChatGPT response to move and speak to the user.

RoboGPT Project Objective:

  • The goal was to create an AI-powered robot capable of communicating and following commands.
  • It aimed to enable conversation with the robot using speech-to-text and text-to-speech conversion for input and output.
  • It also aimed to sense any obstructions in front of and behind the robot to prevent collisions.
  • Another objective was to display the robot’s actions on the piTFT screen.

The project’s objective was to develop a mobile AI-powered robot that can understand verbal instructions, move according to those instructions, and respond to the user. The robot accepts user input through speech recognition, sends the converted text to ChatGPT, processes the returned response for movement commands, and then employs text-to-speech to reply to the user. It utilizes two ultrasonic sensors and an accelerometer/gyroscope as part of its safety controller, sending sensor data to ChatGPT. A piTFT screen on the robot informs the user when it’s ready for commands and its current status.

Design

Software Design of RoboGPT

The design process began by assessing the feasibility of utilizing all the cloud services on the Raspberry Pi. Small demonstrations of speech-to-text, text-to-speech, and the OpenAI ChatGPT API were created. The individual API documentation from each service was reviewed to build out the mini-demos, with links included in the references. After building each demo, an integrated full-scale demo combined all the services to test a basic AI assistant for conversational abilities. Once the cloud services were integrated into a demo, another ChatGPT prompt and instance, trainerGPT, was created to emulate a user interacting with the ChatGPT instance that used the prompt the robot would employ, robotGPT. This made it possible to test the prompt quickly and uncover unusual cases, such as how commands were sent. trainerGPT also provided a feedback summary for analyzing robotGPT's prompt for improvements. Additionally, trainerGPT used another ChatGPT prompt and instance, dataGPT, to synthesize data sent to robotGPT for further testing of how incoming information was understood.
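
As an illustration of the kind of call involved, the sketch below shows a minimal robotGPT request using the pre-1.0 OpenAI Python library's chat-completion interface; the system prompt and the <<...>> command wrapper are simplified placeholders, not the project's actual prompt.

import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder key

# Illustrative stand-in for the robot's much longer system prompt.
ROBOT_PROMPT = (
    "You are robotGPT, the controller of a small wheeled robot. "
    "Reply conversationally, and wrap any movement commands in the form "
    "<<move,direction,distance_m>> so they can be parsed out of the text."
)

def ask_robot_gpt(user_text, history):
    """Send the transcribed user speech plus prior turns to ChatGPT."""
    messages = [{"role": "system", "content": ROBOT_PROMPT}] + history
    messages.append({"role": "user", "content": user_text})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    return response["choices"][0]["message"]["content"]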

Data Flow Diagram

Alongside trainerGPT testing, command parsing was added to robotGPT’s output by extracting strings wrapped in a designated format. This enabled easy parsing of commands from non-command text, with the latter spoken using speech synthesis. Using trainerGPT to test each code or prompt change alongside manual testing helped speed up testing and uncover edge cases.
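
A minimal version of that parsing step is sketched below; the <<...>> delimiter is an assumed stand-in for the project's designated command format.

import re

COMMAND_RE = re.compile(r"<<(.*?)>>")

def parse_response(reply):
    """Split a robotGPT reply into spoken text and movement commands."""
    commands = COMMAND_RE.findall(reply)        # e.g. ["move,forward,1.0"]
    speech = COMMAND_RE.sub("", reply).strip()  # everything else is spoken
    return speech, [cmd.split(",") for cmd in commands]

For example, parse_response("Sure! <<move,forward,1.0>> Heading out.") returns the text to speak and a single command list ["move", "forward", "1.0"].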

The next software development step integrated the services on the Raspberry Pi. The Azure Speech library required a 64-bit OS, but the Pi was running a 32-bit OS, so both the speech-to-text and text-to-speech modules required modification. Speech was captured with a USB microphone and sent to Azure through the SpeechRecognition module, while text-to-speech used the Azure REST API to obtain audio files.
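
The sketch below shows roughly how a WAV reply can be fetched from the Azure text-to-speech REST API; the region, voice name, and output format are assumptions rather than the project's exact settings.

import requests

REGION = "eastus"                      # assumed Azure region
KEY = "YOUR_AZURE_SPEECH_KEY"          # placeholder subscription key

def synthesize(text, out_path="reply.wav"):
    """Request TTS audio from Azure and save it to a WAV file."""
    token = requests.post(
        f"https://{REGION}.api.cognitive.microsoft.com/sts/v1.0/issueToken",
        headers={"Ocp-Apim-Subscription-Key": KEY},
    ).text
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        "<voice name='en-US-JennyNeural'>" + text + "</voice></speak>"
    )
    audio = requests.post(
        f"https://{REGION}.tts.speech.microsoft.com/cognitiveservices/v1",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": "riff-16khz-16bit-mono-pcm",
            "User-Agent": "RoboGPT",
        },
        data=ssml.encode("utf-8"),
    )
    with open(out_path, "wb") as f:
        f.write(audio.content)
    return out_path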

Audio playback initially used Pygame but was inconsistent, so it was replaced with the VLC Python bindings. Audio is played through a Bluetooth speaker to minimize the space and weight taken up on the robot.
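
Playback through the VLC Python bindings then reduces to a few lines, as in this sketch (the file path comes from the text-to-speech step):

import time
import vlc

def play_audio(path):
    """Play an audio file and block until playback finishes."""
    player = vlc.MediaPlayer(path)
    player.play()
    time.sleep(0.2)  # give VLC a moment to start
    while player.get_state() not in (vlc.State.Ended, vlc.State.Error):
        time.sleep(0.1)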

Sensor code was added once the cloud services were integrated on the Pi. The ultrasonic sensors and accelerometer/gyroscope were interfaced through Adafruit libraries that provide CircuitPython support, following the documentation for each part. The motors were driven with hardware PWM instead of software PWM or CircuitPython output to reduce delays that would affect movement.

The manager.py class tied everything together, managing sensor refreshing, speech collection, ChatGPT communication through robotGPT, and motor control. A state machine governed data processing and robot control. Movements were packaged into actionUnit objects that tracked their durations for timing-based control. The state machine stopped the motors when an action completed or nothing was queued, pulled further actions from the queue, or obtained new input from ChatGPT. Sensor distances were checked continuously so the robot could stop for obstacles before resuming.
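
The sketch below illustrates the timing-based action queue idea; the field names and the motors interface are assumptions and do not reproduce the project's actual actionUnit class.

import time
from collections import deque

class ActionUnit:
    """One timed movement: wheel speeds plus how long to hold them."""
    def __init__(self, left_speed, right_speed, duration_s):
        self.left_speed = left_speed
        self.right_speed = right_speed
        self.duration_s = duration_s
        self.started_at = None

    def start(self, motors):
        motors.set_speeds(self.left_speed, self.right_speed)
        self.started_at = time.monotonic()

    def done(self):
        return time.monotonic() - self.started_at >= self.duration_s

def run_queue(actions, motors):
    """Drain a queue of ActionUnits, stopping the motors between actions."""
    queue = deque(actions)
    current = None
    while queue or current:
        if current is None:
            current = queue.popleft()
            current.start(motors)
        elif current.done():
            motors.stop()
            current = None
        time.sleep(0.01)  # avoid spinning the CPU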

State Machine for Implementation

The final software component was the piTFT display, which lets the user resume robot movement, switch between speech and text commands, and see the robot's status and readiness. Pygame displayed text on the piTFT, but issues arose running it alongside manager.py: display.py required sudo to draw on the piTFT from an SSH session, while running under sudo broke modules used by manager.py.

The solution used a FIFO file: manager.py wrote display text to it and display.py continuously read it, updating the piTFT only when the text differed from the previous message. This allowed any part of the code to send text to the piTFT, both for testing and for informing the user. It proved especially important when collecting speech, since the user needed to know when the robot was listening to avoid repeating themselves.
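
A minimal version of this FIFO link is sketched below; the FIFO path is an arbitrary placeholder.

import os

FIFO_PATH = "/tmp/robot_display_fifo"  # assumed location

# manager.py side: send a status line to the display.
def send_status(text):
    with open(FIFO_PATH, "w") as fifo:
        fifo.write(text + "\n")

# display.py side: block on the FIFO and redraw only when the text changes.
def watch_fifo(draw):
    if not os.path.exists(FIFO_PATH):
        os.mkfifo(FIFO_PATH)
    last = None
    while True:
        with open(FIFO_PATH, "r") as fifo:
            for line in fifo:
                text = line.strip()
                if text and text != last:
                    draw(text)  # e.g. render via pygame on the piTFT
                    last = text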

If the emergency stop was triggered by an obstacle, manager.py stopped the robot, saved the command state, and sent "Resume" to the FIFO for display on the piTFT. This prompt pointed the user to a physical resume button handled by manager.py. Buttons also toggled the input mode between speech and text; physical buttons were chosen over the touchscreen, which degraded as the Pi heated up.

Extensive combined testing aimed to improve the user experience and reduce robot errors, as further described in the testing section.

Hardware Design

The figure below shows the circuit schematic of the robot. The Raspberry Pi acts as the central controller and is powered by an external power bank. The continuous servo motors receive power from a 6V battery pack, while the two ultrasonic sensors and the accelerometer/gyroscope are powered from the Pi. A switch was added to the servo power circuit so the electronics could be tested without the robot driving.

Two ultrasonic sensors measure the distance in front of and behind the robot to prevent collisions. Each takes a trigger input and returns an echo signal. The accelerometer/gyroscope additionally tracks speed and orientation. All sensors connect directly to the Pi's GPIO pins and receive 3.3V and GND from the Pi. These signals are broken out on a protoboard with header pins mounted on the robot, allowing easy wiring and replacement of damaged sensors or wires.

Hardware Diagram of Robot

Each servo's power pin connects to the 6V external battery pack, whose ground is shared with the Pi's GND. Each motor's signal pin connects to a PWM output from the Pi's GPIO. Since hardware PWM controlled the servos, only GPIO 12 and 13 were available, with the left motors on GPIO 13 and the right motors on GPIO 12.
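
The sketch below shows one way to drive the servos with hardware PWM on GPIO 12 and 13 through the pigpio library; pigpio itself is an assumption (the report does not name the PWM library), and the pulse widths are nominal continuous-servo values rather than the project's measured calibration.

import pigpio

FREQ_HZ = 50                     # 20 ms servo period
LEFT_PIN, RIGHT_PIN = 13, 12     # hardware-PWM-capable pins used on the robot

def pulse_to_duty(pulse_ms):
    """Convert a pulse width in ms to pigpio's 0..1,000,000 duty scale."""
    return int(pulse_ms / 20.0 * 1_000_000)  # e.g. 1.5 ms -> 75,000 (7.5%)

pi = pigpio.pi()  # requires the pigpio daemon to be running
# Roughly: 1.5 ms = stop, 2.0 ms = full speed one way, 1.0 ms = the other way.
pi.hardware_PWM(LEFT_PIN, FREQ_HZ, pulse_to_duty(2.0))   # left side forward
pi.hardware_PWM(RIGHT_PIN, FREQ_HZ, pulse_to_duty(1.0))  # right side mirrored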

Initial planning considered sensor placement and how the wiring would be organized on solder boards. The solder boards reduced connectivity issues from loose wiring, and header pins connected the sensors so they could be replaced easily if damaged. An acrylic body held the components and mounted the four 3D-printed motor braces.

The sensors were soldered and tested individually with their respective test scripts. CircuitPython was installed from Adafruit, providing a Python-based environment for microcontroller-style peripherals, and the Adafruit libraries built on it make the sensors easy to control from Python. Installation instructions are referenced in the GitHub README.
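
Reading the sensors through these libraries is then straightforward, as in the sketch below; the GPIO pins are placeholders and the MPU-6050 breakout is an assumption, since the report does not name the exact accelerometer/gyroscope part.

import board
import adafruit_hcsr04
import adafruit_mpu6050

# Front ultrasonic sensor on placeholder pins; the rear sensor is wired the same way.
front = adafruit_hcsr04.HCSR04(trigger_pin=board.D23, echo_pin=board.D24)
imu = adafruit_mpu6050.MPU6050(board.I2C())  # accelerometer/gyro over I2C

print("front distance (cm):", front.distance)
print("acceleration (m/s^2):", imu.acceleration)
print("gyro (rad/s):", imu.gyro)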

Once CircuitPython and the libraries were installed, the hardware was tested as described in the testing section.

Drawings and Photos

Front of Robot

Top view of Robot

Robot Sensor Diagram

Testing

Sensor Testing

Testing started with individual hardware validation, using test scripts to confirm expected outputs. Motor control was tested by checking that direction and movement matched the commands and by measuring the PWM outputs with an oscilloscope. The duty cycle, calculated from the datasheet formula at a 50 Hz frequency, was initially scaled incorrectly and produced jittery output until the scaling was fixed.

Ultrasonic and gyroscope sensor data were separately tested in scripts with print statements.

The ultrasonic sensors initially returned noisy values, so taking the median of five measurements within 10 ms greatly reduced the noise and improved accuracy. This also minimized false-positive emergency stops caused by small noisy readings. Cross-interference between the two sensors was ruled out as a cause.
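
A median-of-five filter of this kind can be written as below; the sensor interface follows the Adafruit HCSR04 library shown earlier, and the sample spacing is illustrative.

import time
from statistics import median

def filtered_distance(sensor, samples=5, spacing_s=0.002):
    """Return the median of several quick ultrasonic readings, in cm."""
    readings = []
    for _ in range(samples):
        try:
            readings.append(sensor.distance)
        except RuntimeError:
            pass  # ignore a dropped echo
        time.sleep(spacing_s)  # keeps the whole burst around 10 ms
    return median(readings) if readings else None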

The accelerometer and gyroscope, connected via I2C, required no modifications and worked well alongside the other sensors during testing.

RoboGPT Software Testing

As mentioned previously, another ChatGPT prompt and instance tested the robot's ChatGPT prompt by providing feedback on each change. This significantly simplified testing and iterating on the prompts, especially given that the robot prompt contained over 200 words sent to the model. Using an additional ChatGPT instance to emulate a user assisted in evaluating and improving the robot's ability to understand commands.

Robot System Testing

Once hardware testing concluded, the full robot system was tested. Sample commands exercised speech recognition and the ChatGPT responses; for example, saying "move forward 1 meter" verified that the robot moved the correct distance and played back a completion message. Similar tests explored edge cases. Error handling was added so that cloud-service failures triggered graceful retries rather than crashes.
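
A simple retry wrapper of the kind described might look like the sketch below; the retry count and delay are illustrative values, not the project's exact settings.

import time

def with_retries(call, attempts=3, delay_s=1.0):
    """Run a cloud-service call, retrying instead of crashing on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as err:
            print(f"cloud call failed ({err}); attempt {attempt}/{attempts}")
            time.sleep(delay_s)
    return None  # caller treats None as the service being unavailable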

A public demo gathered feedback that led to improvements in status updates: the piTFT display now shows when the robot is listening or performing a task, which enhanced usability.

Validation aimed to confirm that commands executed as intended while identifying areas where the user experience could be refined, such as better communication from the robot during operation. Iterative testing and user input helped produce a more polished final system.

Results and Conclusions

The robot successfully communicated through spoken conversation, following commands and responding to confirm its actions. It traced shapes such as squares, octagons, and pentagons from natural-language requests that ChatGPT parsed into discrete movements. The gyroscope/accelerometer was originally intended for feedback control, but timing-based movement proved accurate enough that it was not needed during operation; its data was instead used to inform ChatGPT of the robot's speed.

The ultrasonic sensors prevented collisions through object detection, yielding a reliable, responsive system, and the robot handled scenarios involving multiple nearby obstacles. Testing validated the conversational capabilities, the reactive safety behavior, and navigation driven by parsed high-level requests. Iterative prototyping tuned each subsystem for its intended function.

Future Work

Potential future work involves several avenues of exploration. Gyroscope and accelerometer data could be used to detect varied terrain beneath the robot. Connecting a camera would enable straight-line navigation through fine motor adjustments based on visual input, and implementing object detection would add further capability to the AI program.

A camera would also open up capabilities such as terrain sensing and autonomous piloting based on real-time visual analysis. Continued refinement along these lines would give the robot greater autonomy, perception, and adaptation to its environment, building on the framework already established and broadening its operational scope through richer multimodal sensing and reasoning.

Work Distribution

Team

 

Raphael ([email protected]): Focused on software design of the project, incorporating AI and sensor outputs together.

Rabail ([email protected]): Focused on mechanical assembly and hardware design of the robot.

Parts List

Provided in Lab:

  • Raspberry Pi 4
  • Breakout cable and pinout
  • piTFT
  • 4 robot wheels
  • 4 3D printed braces
  • Laser-cut base
  • 3 solder boards
  • Zip Ties
  • Header Pins
  • Jumper Wires

Extra Parts:

  • 2x Ultrasonic Distance Sensors – $7.90
  • 1x Accelerometer and Gyroscope – $20.00
  • 12V Battery Pack – $8.00
  • 4x AA battery – $5.00
  • Power Bank – $25.00
  • 4 Continuous Motor Servos – $40.00
  • OpenAI API costs – $2.00
  • Microphone – $5.00

Total: $112.90

References

GPIO

Speech-to-text

Text-to-speech

OpenAI API

CircuitPython

Sensors and Servos

Other references used


