Realtime Drowsiness and Yawn Detector Using Raspberry Pi or Any Other PC

Drowsy-driver detection is one of the most effective ways to prevent road accidents. This project builds a system that detects when the user is getting drowsy and alerts them before an accident can happen. The system also detects when the user yawns and raises an alert for that as well. It can run on any computer, but since nobody is going to put a full PC inside a car, I have also implemented it on a Raspberry Pi 3 so that the whole setup can easily be mounted in front of the driver. All the algorithms used here are well optimized, so the system runs in real time, which is essential for this application.

You can find a demo of this project in the video below:

You can see the full build video here:

Supplies

Hardware Requirements:

You can use any computer with a webcam for this project, or you can implement it on a Raspberry Pi 3/4. If you want a compact solution, you need the following hardware:

1. Raspberry Pi 3/4

2. Raspberry Pi Camera

3. Raspberry Pi power supply or a power bank

Software Requirements:

1. Python 3

2. OpenCV

3. dlib

4. imutils

5. scipy

6. numpy

7. argparse

Step 1: Detecting the Face of the User

Before we can tell whether the user is drowsy, we need to detect the user's face. There are many algorithms available, but here we need one that requires little processing power (so that it runs on a Raspberry Pi), is fast, and is still reasonably accurate. So I decided to use the famous “Viola-Jones” algorithm for face detection. There are deep-learning-based detectors that provide better accuracy, but they will not run in real time on a Raspberry Pi (or even on a PC, if you don't have a powerful GPU). To learn more about the Viola-Jones algorithm, you can visit: https://towardsdatascience.com/the-intuition-behi…
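
As a quick standalone sanity check, here is a minimal sketch of the same Haar-cascade detection on a single image, assuming the haarcascade_frontalface_default.xml file from Step 5 and a hypothetical test image test.jpg are in the working directory:

import cv2

# Load the pre-trained Haar cascade and run it on one grayscale image
detector = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
img = cv2.imread("test.jpg")  # test.jpg is any image with a face in it
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
print("Faces found:", len(faces))  # each face is an (x, y, w, h) box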

Step 2: Detecting If the Eyes Are Opened or Closed

To detect whether the user is sleeping, we have to find out whether the user's eyes are open. For that, we use the Eye Aspect Ratio (EAR). The average EAR is about 0.339 when the eyes are open and about 0.141 when they are closed. So whenever our system detects a face, it calculates the EAR, and if the value falls below a threshold (set by the user), the system alerts the user continuously until the eyes open again.

Now, to calculate the EAR, we need to find the eye landmarks on the face (as you can see in the figure). To find these landmarks, we will use dlib's 68-point facial landmark model, a pre-trained model that can easily be used from Python. It estimates the locations of 68 (x, y) coordinates that map the key points on a person's face, as in the figure above.

Finally, after obtaining the points, we can compute the EAR using the formula EAR = (|P2 − P6| + |P3 − P5|) / (2 × |P1 − P4|), where |·| denotes the Euclidean distance between two landmark points. We then check whether this EAR value is below the threshold, and based on that the system alerts the user.
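
To make the formula concrete, here is a tiny worked example with made-up landmark coordinates (the values are purely hypothetical, chosen only to illustrate the arithmetic):

from scipy.spatial import distance as dist

# Hypothetical (x, y) positions of the six landmarks P1..P6 of one eye
p1, p2, p3 = (0, 3), (3, 5), (7, 5)
p4, p5, p6 = (10, 3), (7, 1), (3, 1)

# EAR = (|P2 - P6| + |P3 - P5|) / (2 * |P1 - P4|)
ear = (dist.euclidean(p2, p6) + dist.euclidean(p3, p5)) / (2.0 * dist.euclidean(p1, p4))
print(ear)  # (4 + 4) / (2 * 10) = 0.4, i.e. the eye reads as open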

Step 3: Detecting the Yawn

To detect a yawn, we measure the distance between the user's upper lip and lower lip. When a person is talking, this distance stays within a limit, but during a yawn it becomes much larger than the threshold. To measure it, we again use dlib's facial landmark model, this time to find the lip landmarks, and then simply calculate the distance between the midpoint of the upper lip and the midpoint of the lower lip. If this distance exceeds the threshold, the system gives a yawn alert to the user.

Step 4: Hardware Setup

If you are using a computer to run this setup, you just have to connect a webcam to it. If your PC has a built-in camera, that will work too.

If you are using a Raspberry Pi for this task, you can also use a webcam with it, but for better results it is recommended to use the Raspberry Pi camera.
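
Note that if you go with the Pi camera and the imutils VideoStream(usePiCamera=True) path shown in the Step 6 code, the picamera Python package has to be installed as well; on Raspberry Pi OS that would typically be:

pip3 install "picamera[array]"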

Step 5: Software Setup

If you are using a PC, you can use an IDE like PyCharm, where you can easily install all the required Python libraries. On a Raspberry Pi, however, you may have issues installing dlib and OpenCV, so for that you can follow the videos below:

You should not have any issues installing the remaining libraries, as they are pretty straightforward. You need to install the following libraries for Python 3 (a sample install command follows the list):

  • imutils
  • scipy
  • numpy
  • argparse
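
These are normally available straight from PyPI; a typical install command would be the following (argparse ships with Python 3 itself, so it usually needs no separate install):

pip3 install imutils scipy numpy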

If you somehow run into an issue installing SciPy, you can use the standard library's math.dist() (available in Python 3.8+) instead of SciPy's distance function.
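
For example, the eye_aspect_ratio function from Step 6 could be rewritten with only the standard library, as in this sketch (math.dist is available from Python 3.8 onward):

import math

def eye_aspect_ratio(eye):
    # Same EAR computation as in Step 6, without SciPy
    A = math.dist(eye[1], eye[5])
    B = math.dist(eye[2], eye[4])
    C = math.dist(eye[0], eye[3])
    return (A + B) / (2.0 * C)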

After installing all the libraries, you also need to download two files: haarcascade_frontalface_default.xml, which will be used for face detection, and shape_predictor_68_face_landmarks.dat, which will be used to find the facial landmarks. You will get both files in the following GitHub repository:

https://github.com/Arijit1080/Drowsiness-and-Yawn-…

Step 6: Programming

Now, after all the setup, it's finally time to write the code. Here I will explain the important parts of the code, plus a few parts you may need to modify for your own use. You can get all the code and files at: https://github.com/Arijit1080/Drowsiness-and-Yawn…

First, we need to import all the required libraries:

from imutils.video import VideoStream
from imutils import face_utils
from threading import Thread
import numpy as np
import argparse
import imutils
import time
import dlib
import cv2
import os
from scipy.spatial import distance as dist

Next, we have a function to calculate the EAR. It will take the landmarks of a single eye as input.

def eye_aspect_ratio(eye):
    A = dist.euclidean(eye[1], eye[5])  # vertical distance |P2 - P6|
    B = dist.euclidean(eye[2], eye[4])  # vertical distance |P3 - P5|
    C = dist.euclidean(eye[0], eye[3])  # horizontal distance |P1 - P4|
    ear = (A + B) / (2.0 * C)
    return ear

The next function calculates the average EAR for both eyes, using the previous function eye_aspect_ratio. It takes the full landmarks list (as returned by dlib's shape predictor) as input. It also returns the left-eye and right-eye coordinates so that we can draw outlines around the eyes in the output video feed:

def final_ear(shape):
    # Index ranges of the left and right eye inside the 68-landmark array
    (lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
    (rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]

    leftEye = shape[lStart:lEnd]
    rightEye = shape[rStart:rEnd]

    leftEAR = eye_aspect_ratio(leftEye)
    rightEAR = eye_aspect_ratio(rightEye)

    # Average the two eyes to get a single EAR value
    ear = (leftEAR + rightEAR) / 2.0
    return (ear, leftEye, rightEye)

Next, we have a function named “lip_distance” to calculate the distance between the lips. It also takes the shape predictor's output as input:

def lip_distance(shape):
    # Landmarks 50-52 (outer) and 61-63 (inner) lie on the top lip
    top_lip = shape[50:53]
    top_lip = np.concatenate((top_lip, shape[61:64]))

    # Landmarks 56-58 (outer) and 65-67 (inner) lie on the bottom lip
    low_lip = shape[56:59]
    low_lip = np.concatenate((low_lip, shape[65:68]))

    # Vertical gap between the mean points of the two lips
    top_mean = np.mean(top_lip, axis=0)
    low_mean = np.mean(low_lip, axis=0)

    distance = abs(top_mean[1] - low_mean[1])
    return distance

After detecting that the user is drowsy, we need to alert them, so we have an alarm function. Here I have used espeak so that our system can say “Wake Up” to the user in a robotic voice. I have used a few global variables here because I was in a bit of a hurry; this piece of code can definitely be improved, and that responsibility I am handing over to you 🙂

def alarm(msg):
    global alarm_status
    global alarm_status2
    global saying

    # Drowsiness alarm: keep speaking until the eyes open again
    while alarm_status:
        print('call')
        s = 'espeak "' + msg + '"'
        os.system(s)

    # Yawn alarm: speak the message once
    if alarm_status2:
        print('call')
        saying = True
        s = 'espeak "' + msg + '"'
        os.system(s)
        saying = False
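
Note that espeak is a separate command-line program rather than a Python package, so it must be installed on the system itself; on Raspberry Pi OS or any other Debian-based distribution, that would typically be:

sudo apt-get install espeak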

Finally, we have the main code where we have a lot of things, so I will only explain the important parts:

There is a small part where we use argparse so that a user with a secondary webcam can select it simply by passing the webcam index as a command-line argument.

ap = argparse.ArgumentParser()
ap.add_argument("-w", "--webcam", type=int, default=0,
                help="index of webcam on system")
args = vars(ap.parse_args())

Next, we have some global threshold values, which you may need to change depending on the user's distance from the camera (especially YAWN_THRESH, the threshold for detecting a yawn). EYE_AR_CONSEC_FRAMES sets how many consecutive frames the eyes must stay closed before the system alerts the user; choose its value based on your camera's FPS and how long you want to wait before giving the alert. For example, at 30 FPS the default of 30 frames corresponds to roughly one second of closed eyes.

EYE_AR_THRESH = 0.3
EYE_AR_CONSEC_FRAMES = 30
YAWN_THRESH = 20
alarm_status = False
alarm_status2 = False
saying = False
COUNTER = 0

Next, we will load the models for face detection and landmarks prediction on the face.

detector = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")    # Faster but less accurate
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
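
As an aside, the commented-out rects = detector(gray, 0) line you will see in the loop below corresponds to dlib's HOG-based face detector, which is generally more accurate but slower on the Pi. A minimal sketch of that swap, if you want to try it:

detector = dlib.get_frontal_face_detector()

# ...and inside the loop, dlib returns rectangle objects directly,
# so no (x, y, w, h) conversion is needed:
rects = detector(gray, 0)
for rect in rects:
    shape = predictor(gray, rect)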

Finally, we start capturing video from the webcam or Pi camera. In each frame we try to find faces, and if we find one, we pass it to the shape predictor, which gives us the facial landmarks. Using those landmarks, we call the functions defined earlier to calculate the EAR and the lip distance. If the lip distance exceeds the threshold, the system calls the alarm function immediately to alert the user. If the EAR is below the limit, the system waits for the pre-defined number of frames, and if the EAR is still below the limit after that, it calls the alarm function and alerts the user.

For demo purposes, the system also shows the live feed, with outlines around the detected eyes and lips, along with the calculated EAR value and lip distance.

print("-> Starting Video Stream")<br>vs = VideoStream(src=args["webcam"]).start()
#vs= VideoStream(usePiCamera=True).start()       //For Raspberry Pi
time.sleep(1.0)


while True:
    frame = vs.read()
    frame = imutils.resize(frame, width=450)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    #rects = detector(gray, 0)

    rects = detector.detectMultiScale(gray, scaleFactor=1.1, 
        minNeighbors=5, minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE)

    #for rect in rects:
    for (x, y, w, h) in rects:
        rect = dlib.rectangle(int(x), int(y), int(x + w),int(y + h))
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        eye = final_ear(shape)
        ear = eye[0]
        leftEye = eye[1]
        rightEye = eye[2]

        distance = lip_distance(shape)
        leftEyeHull = cv2.convexHull(leftEye)
        rightEyeHull = cv2.convexHull(rightEye)

        cv2.drawContours(frame, [leftEyeHull], -1, (0, 255, 0), 1)
        cv2.drawContours(frame, [rightEyeHull], -1, (0, 255, 0), 1)
        lip = shape[48:60]
        cv2.drawContours(frame, [lip], -1, (0, 255, 0), 1)

        if ear < EYE_AR_THRESH:
            COUNTER += 1
            if COUNTER >= EYE_AR_CONSEC_FRAMES:
                if alarm_status == False:
                    alarm_status = True
                    t = Thread(target=alarm, args=('wake up sir',))
                    t.daemon = True
                    t.start()
                cv2.putText(frame, "DROWSINESS ALERT!", (10, 30),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        else:
            COUNTER = 0
            alarm_status = False

        if distance > YAWN_THRESH:
            cv2.putText(frame, "Yawn Alert", (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
            if alarm_status2 == False and saying == False:
                alarm_status2 = True
                t = Thread(target=alarm, args=('take some fresh air sir',))
                t.daemon = True
                t.start()
        else:
            alarm_status2 = False
        cv2.putText(frame, "EAR: {:.2f}".format(ear), (300, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        cv2.putText(frame, "YAWN: {:.2f}".format(distance), (300, 60),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        break

cv2.destroyAllWindows()
vs.stop()

Step 7: Running the System

Finally, after writing the full code, you should have three files in the same folder: the Python script and the two model files. To run the system, use the command:

python3 drowsiness_yawn.py --webcam 0    # for an external webcam, change the index accordingly
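
The built-in camera is usually index 0, and a USB webcam attached to a laptop typically shows up as index 1, so for an external webcam you would normally run:

python3 drowsiness_yawn.py --webcam 1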

After running this, the system should work as shown in the demo video.

Step 8: Conclusion and Improvements

So that is how I built this drowsiness and yawn detection system. There are several improvements one could make to this project. The face detection part could be improved by using other algorithms, but in that case the implementation must be well optimized so that it still runs in real time on systems like the Raspberry Pi.
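
For instance, OpenCV's DNN module can run the lightweight res10 SSD face detector. Here is a rough sketch of what that detection step could look like, assuming you have separately downloaded the standard deploy.prototxt and res10_300x300_ssd_iter_140000.caffemodel files (they are not part of this project's repository):

import cv2
import numpy as np

# Load the SSD face detector and run it on one frame
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")
frame = cv2.imread("test.jpg")  # test.jpg is any image with a face
(h, w) = frame.shape[:2]
blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
                             (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:  # keep only confident detections
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (x1, y1, x2, y2) = box.astype("int")
        print("Face at", (x1, y1, x2, y2), "with confidence", confidence)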
