Hand-Gesture Recognition with PyTorch: A Comprehensive Tutorial

1. Introduction to Hand-Gesture Recognition

Hand-gesture recognition is a branch of computer vision that focuses on recognizing meaningful hand shapes and movements. It is widely used in interactive systems for games, virtual reality, sign-language recognition, robotics, and more. With the rise of deep learning, hand-gesture recognition has become considerably more accurate.

2. About PyTorch

PyTorch is an open-source machine learning library for Python, based on Torch. It offers two high-level features: tensor computation (like NumPy) with strong GPU acceleration and deep neural networks built on a tape-based autograd system. PyTorch is favored for its simplicity, ease of use, dynamic computational graph and active community support.
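
The two features mentioned above can be illustrated with a minimal sketch. The tensors and values here are made up for illustration only; they are not part of the gesture pipeline.

```python
import torch

# Tensor computation, similar in spirit to NumPy arrays.
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.ones(2, 2)
c = a @ b  # matrix product: [[3, 3], [7, 7]]

# Autograd: record operations on x, then differentiate y = sum(x^2).
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()  # fills x.grad with dy/dx = 2 * x

print(c)
print(x.grad)  # tensor([2., 4., 6.])
```

The graph that `backward()` walks is built while the code runs, which is what "dynamic computational graph" refers to.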

3. Dataset Selection

For this tutorial, we will use the LeapGesture dataset, which contains 20,000 frames of hand gestures captured by a Leap Motion device. It covers 10 gesture types, so it offers both a varied set of gestures and a large number of samples per class.

Let’s start coding. We assume that you have already installed PyTorch. If not, please visit the official website for installation instructions.

Import necessary packages

Python
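import os
import numpy as np
from PIL import Image  # used to open the image files
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
import torch
import torch.nn as nn
import torch.nn.functional as F  # functional ops such as relu and max_pool2d
import torch.optim as optim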

4. Data Preprocessing

Our data preprocessing involves loading the images, resizing them and converting them to tensors, and splitting the data into training and testing sets.

Load the data

Python
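class LeapGestRecog(Dataset):
    def __init__(self, root, transform=None):
        self.root = root
        self.transform = transform
        self.x = []  # image file paths
        self.y = []  # integer class indices

        names = []  # gesture folder name for each image
        for dirpath, dirnames, filenames in os.walk(root):
            for filename in sorted(filenames):
                if not filename.lower().endswith(('.png', '.jpg', '.jpeg')):
                    continue  # skip anything that is not an image
                self.x.append(os.path.join(dirpath, filename))
                # Assumption: the folder directly containing an image is
                # named after its gesture class (for example '01_palm').
                names.append(os.path.basename(dirpath))

        # nn.CrossEntropyLoss needs integer targets, so map names to indices.
        self.classes = sorted(set(names))
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}
        self.y = [self.class_to_idx[name] for name in names]
        self.len = len(self.x)

    def __len__(self):
        return self.len

    def __getitem__(self, index):
        img = Image.open(self.x[index]).convert('L')  # load as grayscale
        y = self.y[index]
        if self.transform:
            img = self.transform(img)
        return img, y

root = './data/leapGestRecog/'
transform = transforms.Compose([transforms.Resize((128, 128)), transforms.ToTensor()])
dataset = LeapGestRecog(root, transform=transform)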

Splitting data into train and test set

Python
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(dataset, [train_size, test_size])
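
Because `random_split` shuffles the indices randomly, each run produces a different split unless a seeded generator is passed in. The sketch below shows this on a small stand-in dataset so it runs on its own; `toy_dataset`, the seed value 42, and the 8x8 image size are illustrative assumptions, not part of the tutorial's pipeline.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset: 100 fake 1x8x8 "images" with dummy integer labels.
toy_dataset = TensorDataset(torch.zeros(100, 1, 8, 8),
                            torch.zeros(100, dtype=torch.long))

toy_train_size = int(0.8 * len(toy_dataset))
toy_test_size = len(toy_dataset) - toy_train_size

# A seeded generator makes the random split repeatable across runs.
split_generator = torch.Generator().manual_seed(42)
toy_train, toy_test = random_split(
    toy_dataset, [toy_train_size, toy_test_size], generator=split_generator)

print(len(toy_train), len(toy_test))  # 80 20
```

The same `generator=` argument could be passed to the split of the real dataset if reproducible experiments matter.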

5. Building the Model

Let’s build a Convolutional Neural Network (CNN) model for this task. We’ll use three convolutional layers, followed by two fully connected layers.

Python
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Input: 1 x 128 x 128 grayscale image
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5)    # -> 32 x 124 x 124
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5)   # -> 64 x 58 x 58
        self.conv3 = nn.Conv2d(64, 128, kernel_size=5)  # -> 128 x 25 x 25
        self.fc1 = nn.Linear(128*12*12, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)     # 124 -> 62
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)     # 58 -> 29
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x, 2)     # 25 -> 12, matching fc1's 128*12*12 inputs
        x = x.view(x.size(0), -1)  # Flatten layer
        x = F.relu(self.fc1(x))
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself
        return self.fc2(x)

model = Net()
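
The input size of the first fully connected layer has to match the flattened output of the last convolutional stage, and it is easy to get wrong. As a rough check, the arithmetic can be sketched in plain Python, assuming 5x5 convolutions with no padding, 2x2 max pooling, and 128x128 inputs; `conv_out` and `flattened_features` are hypothetical helper names introduced here.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (the same formula covers pooling)."""
    return (size + 2 * padding - kernel) // stride + 1

def flattened_features(input_size, channels, pool_after):
    """Walk a stack of 5x5 convolutions, pooling 2x2 where pool_after is True."""
    size = input_size
    for pooled in pool_after:
        size = conv_out(size, kernel=5)
        if pooled:
            size = conv_out(size, kernel=2, stride=2)
    return channels * size * size, size

# Pooling after every convolution: 128 -> 124 -> 62 -> 58 -> 29 -> 25 -> 12
print(flattened_features(128, 128, [True, True, True]))   # (18432, 12)
# Pooling after only the first two: 128 -> 124 -> 62 -> 58 -> 29 -> 25
print(flattened_features(128, 128, [True, True, False]))  # (80000, 25)
```

A layer declared as `nn.Linear(128*12*12, 128)` therefore only fits the first case, where the third convolution is also followed by pooling.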

6. Training the Model

We will use the Adam optimizer and the CrossEntropy loss function. We’ll train the model for 10 epochs with a batch size of 64.
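
One detail worth knowing about `nn.CrossEntropyLoss`: it expects raw scores (logits) and integer class indices, and it applies log-softmax internally. The following sketch, using random made-up logits, shows that it matches `nn.NLLLoss` applied to log-softmax outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw scores: 4 samples, 10 gesture classes
targets = torch.tensor([0, 3, 7, 9])   # integer class indices (dtype int64)

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # True
```

In practice this means a model trained with `nn.CrossEntropyLoss` can return its final linear layer's output directly, and the labels must be integers rather than strings.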

Python
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)  # no need to shuffle for evaluation

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 50 == 0:  # log every 50 batches to keep the output readable
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

for epoch in range(1, 11):
    train(epoch)

7. Testing the Model

We can evaluate the model’s performance on the unseen test data.

Python
def test():
    model.eval()
    test_loss = 0.0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # loss_fn returns the batch mean, so weight it by the batch size
            test_loss += loss_fn(output, target).item() * data.size(0)
            pred = output.argmax(dim=1)  # predicted class for each sample
            correct += (pred == target).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

test()
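
To make the accuracy computation concrete, here is a small sketch on made-up scores for 4 samples and 3 classes (not real model output). The predicted class is the index of the highest score in each row.

```python
import torch

# Hypothetical scores for 4 samples over 3 classes.
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.3, 0.2, 0.9],
                       [1.2, 0.4, 0.8]])
targets = torch.tensor([0, 1, 2, 2])

predictions = logits.argmax(dim=1)  # index of the highest score per row
num_correct = (predictions == targets).sum().item()
accuracy = num_correct / targets.size(0)

print(predictions.tolist(), num_correct, accuracy)  # [0, 1, 2, 0] 3 0.75
```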

8. Model Improvement Tips

Here are a few tips to improve your model’s performance:

  • More Data: More data can help the model generalize better.
  • Data Augmentation: Applying techniques such as rotation and flipping can make the model more robust.
  • More Complex Model: If your model is underfitting, try increasing its complexity.
  • Regularization: If your model is overfitting, try regularization techniques such as dropout or L1/L2 regularization.
  • Hyperparameter Tuning: Tune parameters such as the learning rate, batch size, and number of epochs.

9. Summary

This tutorial has shown you how to develop a hand gesture recognition model using PyTorch. It’s just a start and the accuracy can be improved with more complex architectures, data augmentation techniques and by tuning hyperparameters.

We hope this tutorial helped you understand the process of hand-gesture recognition using PyTorch, and we encourage you to experiment with the code and the model structure to see how the performance can be improved!
