Eye and Eyebrow Movement Recognition Model

📚 Description

The Eye and Eyebrow Movement Recognition model is a real-time system that detects and classifies subtle facial movements of the eyes and eyebrows. It is currently trained to recognize three movements:

  • Yes: Characterized by the raising of eyebrows.
  • No: Indicated by the lowering of eyebrows.
  • Normal: Representing a neutral facial expression without significant eye or eyebrow movements.

The model uses a CNN-LSTM (Convolutional Neural Network - Long Short-Term Memory) architecture: the CNN extracts spatial features from individual frames, and the LSTM models how those features change across a sequence of frames. Classifying a whole sequence rather than a single frame is what lets the model distinguish a movement from a static expression.

🔍 Features

  • Real-Time Detection: Continuously processes live webcam feeds to detect eye and eyebrow movements without noticeable lag.
  • GPU Acceleration: Optimized for GPU usage via TensorFlow-Metal on macOS, ensuring efficient computations.
  • Extensible Design: While currently supporting "Yes," "No," and "Normal" movements, the system is designed to be easily extended to accommodate additional facial gestures or movements.
  • User-Friendly Interface: Overlays the predicted movement and its confidence directly onto the live video feed.
  • Accuracy: Reaches 85% accuracy on the held-out test set when distinguishing between the three supported movements (see Evaluation).

🎯 Intended Use

The model can serve as a building block for applications such as:

  • Human-Computer Interaction (HCI): Enhancing user interfaces with gesture-based controls.
  • Assistive Technologies: Providing non-verbal communication tools for individuals with speech impairments.
  • Behavioral Analysis: Monitoring and analyzing facial expressions for psychological or market research.
  • Gaming: Creating more immersive and responsive gaming experiences through facial gesture controls.

Note: The model is intended for research and educational purposes. Ensure compliance with privacy and ethical guidelines when deploying in real-world applications.

🧠 Model Architecture

The model employs a CNN-LSTM architecture to capture both spatial and temporal features:

  1. TimeDistributed CNN Layers:

    • Conv2D: Extracts spatial features from each frame independently.
    • MaxPooling2D: Reduces spatial dimensions.
    • BatchNormalization: Stabilizes and accelerates training.
  2. Flatten Layer:

    • Flattens the output from CNN layers to prepare for LSTM processing.
  3. LSTM Layer:

    • Captures temporal dependencies across the sequence of frames.
  4. Dense Layers:

    • Fully connected layers that perform the final classification based on combined spatial-temporal features.
  5. Output Layer:

    • Softmax Activation: Provides probability distribution over the three classes ("Yes," "No," "Normal").
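
To make the data flow concrete, the sketch below traces tensor shapes through a stack like the one listed above. The input shape (30 frames of 64×256 single-channel pixels) is taken from the usage example in this README. The number of convolution blocks, the filter counts, the 'same' padding, and the LSTM and Dense sizes are illustrative assumptions, not the trained model's actual hyperparameters.

```python
def trace_shapes(seq_len=30, height=64, width=256, channels=1,
                 conv_filters=(32, 64), lstm_units=64, dense_units=64,
                 num_classes=3):
    """Return (layer name, output shape) pairs, batch dimension omitted.

    Filter counts and unit sizes are hypothetical defaults for illustration.
    Conv2D is assumed to use 'same' padding (spatial size unchanged) and
    MaxPooling2D a 2x2 window (spatial size halved, rounded down).
    """
    shapes = [("Input", (seq_len, height, width, channels))]
    h, w, c = height, width, channels
    for i, filters in enumerate(conv_filters, start=1):
        c = filters
        shapes.append((f"TimeDistributed(Conv2D) #{i}", (seq_len, h, w, c)))
        h, w = h // 2, w // 2
        shapes.append((f"TimeDistributed(MaxPooling2D) #{i}", (seq_len, h, w, c)))
        shapes.append((f"TimeDistributed(BatchNormalization) #{i}", (seq_len, h, w, c)))
    # Flatten is applied per frame, so the time axis is preserved for the LSTM
    shapes.append(("TimeDistributed(Flatten)", (seq_len, h * w * c)))
    # The LSTM consumes the whole sequence and returns its final hidden state
    shapes.append(("LSTM", (lstm_units,)))
    shapes.append(("Dense", (dense_units,)))
    shapes.append(("Dense(softmax)", (num_classes,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:40s} {shape}")
```

Note that the time axis (30) survives every TimeDistributed layer and is only collapsed by the LSTM, which is what allows the final Dense layers to classify the sequence as a whole.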

📋 Training Data

The model was trained on a curated dataset consisting of short video clips (1-2 seconds) capturing the three target movements:

  • Yes: 50 samples
  • No: 50 samples
  • Normal: 50 samples

Each video was recorded using a standard webcam under varied lighting conditions and backgrounds to ensure robustness. The videos were manually labeled and organized into respective directories for preprocessing.
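
A directory-per-class layout like this can be indexed with a few lines of standard-library Python. The sketch below assumes a hypothetical layout such as data/yes/*.mp4, data/no/*.mp4, and data/normal/*.mp4; the directory names, the file extension, and the label order are assumptions, not the repository's documented layout. Because clips of 1-2 seconds contain different numbers of frames, it also shows one simple way to force each clip to a fixed length.

```python
from pathlib import Path

# Assumed class order; the index of each name becomes its integer label
CLASS_NAMES = ("yes", "no", "normal")


def index_clips(root, extension=".mp4"):
    """Return a list of (clip path, integer label) pairs, sorted within each class."""
    root = Path(root)
    samples = []
    for label, class_name in enumerate(CLASS_NAMES):
        class_dir = root / class_name
        if not class_dir.is_dir():
            continue  # Tolerate a missing class directory
        for clip in sorted(class_dir.glob(f"*{extension}")):
            samples.append((clip, label))
    return samples


def pad_or_truncate(frames, length, pad_frame):
    """Cut a frame list to `length`, or extend it with copies of `pad_frame`."""
    frames = list(frames)[:length]
    return frames + [pad_frame] * (length - len(frames))
```

With a sequence length of 30, as used in the prediction example, a 25-frame clip would receive five padding frames and a 40-frame clip would lose its last ten.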

📈 Evaluation

The model was evaluated on a separate test set comprising 60 samples for each class. The evaluation metrics are as follows:

  • Accuracy: 85%
  • Precision: 84%
  • Recall: 86%
  • F1-Score: 85%
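
For reference, this is how the four metrics relate to a confusion matrix. The sketch uses macro averaging, which is an assumption, since this README does not say how per-class scores were averaged. The matrix values are invented solely to exercise the code; they are not this model's results.

```python
def macro_metrics(confusion):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] is the number of samples of true class i predicted as j.
    """
    n = len(confusion)
    if n == 0:
        raise ValueError("confusion matrix must not be empty")
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        true_pos = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Invented counts (rows: true Yes/No/Normal, columns: predicted), for illustration only
example = [[8, 1, 1], [2, 7, 1], [0, 1, 9]]
for name, value in macro_metrics(example).items():
    print(f"{name}: {value:.3f}")
```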

💻 Usage

Prerequisites

  • Hardware: Mac with Apple Silicon (M1, M1 Pro, M1 Max, M2, etc.) for Metal GPU support.
  • Operating System: macOS 12.3 (Monterey) or newer.
  • Python: Version 3.9 or higher.

Installation

  1. Clone the Repository

    git clone https://huggingface.co/shayan5422/eye-eyebrow-movement-recognition
    cd eye-eyebrow-movement-recognition
    
  2. Install Homebrew (if not already installed)

    Homebrew is a package manager for macOS that simplifies the installation of software.

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    
  3. Install Micromamba

    Micromamba is a lightweight package manager compatible with Conda environments.

    brew install micromamba
    
  4. Create and Activate a Virtual Environment

    We'll use Micromamba to create an isolated environment for our project.

    # Create a new environment named 'eye_movement' with Python 3.9
    micromamba create -n eye_movement python=3.9
    
    # Activate the environment
    micromamba activate eye_movement
    
  5. Install Required Libraries

    We'll install TensorFlow with Metal support (tensorflow-macos and tensorflow-metal) along with other necessary libraries.

    # Install TensorFlow for macOS
    pip install tensorflow-macos
    
    # Install TensorFlow Metal plugin for GPU acceleration
    pip install tensorflow-metal
    
    # Install other dependencies
    pip install opencv-python dlib imutils tqdm scikit-learn matplotlib seaborn h5py
    

    Note: Installing dlib can sometimes be challenging on macOS. If you encounter issues, consider installing it via Conda or refer to dlib's official installation instructions.

  6. Download Dlib's Pre-trained Shape Predictor

    This model is essential for facial landmark detection.

    # Navigate to your project directory
    cd /path/to/your/project/eye-eyebrow-movement-recognition/
    
    # Download the shape predictor
    curl -LO http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
    
    # Decompress the file
    bunzip2 shape_predictor_68_face_landmarks.dat.bz2
    

    Ensure that the shape_predictor_68_face_landmarks.dat file is in the same directory as your scripts.
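
Before running the scripts, it can help to confirm that TensorFlow sees the Metal GPU and that the landmark file is where the code expects it. The helper names below are illustrative and not part of the repository; tf.config.list_physical_devices is the standard TensorFlow call for listing devices.

```python
from pathlib import Path


def visible_gpus():
    """Return the names of GPUs TensorFlow can see (empty if none, or if TF is missing)."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def has_shape_predictor(directory="."):
    """Check that the decompressed dlib landmark file is present in `directory`."""
    return (Path(directory) / "shape_predictor_68_face_landmarks.dat").is_file()


if __name__ == "__main__":
    print("GPUs visible to TensorFlow:", visible_gpus() or "none")
    print("Shape predictor found:", has_shape_predictor())
```

If no GPU is listed, TensorFlow will still run on the CPU, only more slowly.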

Loading the Model

import tensorflow as tf

# Load the trained model
model = tf.keras.models.load_model('final_model_sequences.keras')

Making Predictions

import cv2
import numpy as np
import dlib
from imutils import face_utils
from collections import deque
import queue
import threading

# Initialize dlib's face detector and landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

# Initialize queues for threading
input_queue = queue.Queue()
output_queue = queue.Queue()

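# Define sequence length
max_seq_length = 30

# Map class indices to labels.
# NOTE: this ordering is an assumption; use the label order from your training script.
index_to_text = {0: "Yes", 1: "No", 2: "Normal"}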

def prediction_worker(model, input_q, output_q):
    """Run inference off the main thread so the video loop stays responsive."""
    while True:
        sequence = input_q.get()
        if sequence is None:  # Sentinel value: shut the worker down
            break
        # sequence has shape (1, max_seq_length, height, width, channels)
        prediction = model.predict(sequence, verbose=0)[0]
        class_idx = int(np.argmax(prediction))
        confidence = float(np.max(prediction))
        output_q.put((class_idx, confidence))

# Start prediction thread
thread = threading.Thread(target=prediction_worker, args=(model, input_queue, output_queue))
thread.start()

# Start video capture
cap = cv2.VideoCapture(0)
frame_buffer = deque(maxlen=max_seq_length)
last_text = ""  # Most recent prediction; redrawn on every frame

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Preprocess frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)
    if len(rects) > 0:
        rect = rects[0]
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)  # (68, 2) array of landmark coordinates
        # Extract ROIs and preprocess
        # NOTE: preprocess_frame is not defined in this snippet. Supply your own
        # function that returns a (64, 256, 1) float32 array for each frame.
        preprocessed_frame = preprocess_frame(frame, detector, predictor)
        frame_buffer.append(preprocessed_frame)
    else:
        # No face detected: append a blank frame so the sequence length stays fixed
        frame_buffer.append(np.zeros((64, 256, 1), dtype='float32'))

    # If buffer is full, send to prediction
    if len(frame_buffer) == max_seq_length:
        sequence = np.array(frame_buffer)
        input_queue.put(np.expand_dims(sequence, axis=0))
        frame_buffer.clear()

    # Check for prediction results and keep the most recent one
    try:
        while True:
            class_idx, confidence = output_queue.get_nowait()
            movement = index_to_text.get(class_idx, "Unknown")
            last_text = f"{movement} ({confidence*100:.2f}%)"
    except queue.Empty:
        pass

    # Overlay the latest prediction so it stays visible between predictions
    if last_text:
        cv2.putText(frame, last_text, (30, 30), cv2.FONT_HERSHEY_SIMPLEX,
                    0.8, (0, 255, 0), 2, cv2.LINE_AA)

    # Display the frame
    cv2.imshow('Real-time Movement Prediction', frame)

    # Exit on 'q' key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Cleanup
cap.release()
cv2.destroyAllWindows()
input_queue.put(None)
thread.join()

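Note: The snippet assumes that model has already been loaded as shown under Loading the Model. preprocess_frame is a placeholder: replace it, and the other placeholder comments, with the preprocessing and prediction logic from your own scripts.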
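
One possible starting point for the ROI step is to take the bounding box of the eyebrow and eye landmarks. In dlib's 68-point layout, indices 17 to 26 are the eyebrows and 36 to 47 are the eyes. The function name, the padding value, and the choice of a single box covering both eyes are assumptions for illustration; the repository's own preprocess_frame may work differently.

```python
# Landmark indices in dlib's 68-point model
EYEBROW_POINTS = range(17, 27)  # both eyebrows
EYE_POINTS = range(36, 48)      # both eyes


def eye_region_box(landmarks, frame_width, frame_height, pad=10):
    """Return (left, top, right, bottom) around the eyes and eyebrows.

    `landmarks` is a sequence of 68 (x, y) pairs, e.g. the array returned by
    imutils.face_utils.shape_to_np. The box is padded by `pad` pixels and
    clamped to the frame so it can be used directly for slicing.
    """
    if len(landmarks) != 68:
        raise ValueError("expected 68 landmarks")
    points = [landmarks[i] for i in list(EYEBROW_POINTS) + list(EYE_POINTS)]
    xs = [int(x) for x, _ in points]
    ys = [int(y) for _, y in points]
    left = max(min(xs) - pad, 0)
    top = max(min(ys) - pad, 0)
    right = min(max(xs) + pad, frame_width)
    bottom = min(max(ys) + pad, frame_height)
    return left, top, right, bottom
```

The crop gray[top:bottom, left:right] could then be resized with cv2.resize to 256 pixels wide by 64 pixels high, scaled to the range [0, 1], and given a trailing channel axis to match the (64, 256, 1) frames used in the prediction loop.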

🔧 Limitations

  • Movement Scope: Currently, the model is limited to recognizing "Yes," "No," and "Normal" movements. Extending to additional movements would require further data collection and training.
  • Environmental Constraints: The model performs best under good lighting conditions and with a clear, frontal view of the face. Variations in lighting, occlusions, or extreme angles may affect accuracy.
  • Single Face Assumption: The system is designed to handle a single face in the frame. Multiple faces may lead to unpredictable behavior.
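
One common mitigation for the single-face assumption, sketched here and not part of the current scripts (which take the first detection), is to keep only the largest detected face, on the assumption that it belongs to the primary user. Boxes are plain (left, top, right, bottom) tuples in this sketch; with dlib you would read the same values from each rectangle's left(), top(), right(), and bottom() methods.

```python
def largest_face(boxes):
    """Return the box with the largest area, or None if `boxes` is empty."""
    def area(box):
        left, top, right, bottom = box
        # Guard against degenerate boxes with negative width or height
        return max(right - left, 0) * max(bottom - top, 0)

    return max(boxes, key=area, default=None)
```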

⚖️ Ethical Considerations

  • Privacy: Ensure that users are aware of and consent to the use of their facial data. Handle all captured data responsibly and in compliance with relevant privacy laws and regulations.
  • Bias: The model's performance may vary across different demographics. It's essential to train the model on a diverse dataset to minimize biases related to age, gender, ethnicity, and other factors.
  • Misuse: Like other technologies that analyze faces, this model could be misused, for example to monitor people without their knowledge. Implement safeguards to prevent unauthorized or unethical applications of the model.

📜 License

This project is licensed under the MIT License.

🙏 Acknowledgements


Feel free to reach out or contribute to enhance the capabilities of this model!

