Human Activity Recognition using CNN-LSTM Modelling
1. Introduction
Human Activity Recognition (HAR) is a field of study aimed at developing techniques for identifying and categorizing human physical activities with machine learning models. These models use data collected by IMU-equipped devices such as smartwatches and smartphones, whose sensors include accelerometers, gyroscopes, and magnetometers [1]. Because the sensor data are collected over time, the task becomes a time series classification (TSC) problem [2].
In this project, we present our work on HAR, focusing on a hybrid deep learning model that combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. Our model is designed to recognize six distinct human activities:
* Downstairs
* Jogging
* Sitting
* Standing
* Upstairs
* Walking
1.1 Understanding the time series data
Ismail Fawaz et al. [3] lay the groundwork for reasoning about time series data. They define a time series as an ordered sequence of real values, where ‘T’ is the length of the sequence. A dataset ‘D’ then consists of pairs (Xi, yi), where ‘Xi’ is a time series and ‘yi’ is its label or category. The task is to train a classifier on ‘D’ that maps each input ‘Xi’ to a probability distribution over the possible categories. This is the standard formulation of a TSC problem. Since HAR falls into this category, researchers have proposed various techniques for it, including multi-class Support Vector Machines (SVM) [4], Long Short-Term Memory (LSTM) networks [5], and Convolutional Neural Networks (CNN) [6], among others.
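The definitions above can be written compactly as follows. The symbols N (number of series), K (number of classes), M (number of sensor channels), and f (the classifier) are notation introduced here for illustration; they are assumptions of this sketch rather than quotations from [3].

```latex
\begin{align*}
X &= [x_1, x_2, \dots, x_T], \quad x_t \in \mathbb{R}
  && \text{(time series of length } T\text{)} \\
D &= \{(X_1, y_1), (X_2, y_2), \dots, (X_N, y_N)\}
  && \text{(dataset of series-label pairs)} \\
f &: X_i \mapsto \hat{p}_i \in [0, 1]^K, \quad \sum_{k=1}^{K} \hat{p}_{i,k} = 1
  && \text{(classifier output: one probability per class)}
\end{align*}
```

For the accelerometer data used in this project, each series has three channels, so under this notation each input would be a matrix X_i in R^{T x M} with M = 3, and K = 6 activity classes.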
1.2 The CNN-LSTM Approach
In our project, we combine CNN and LSTM layers for activity recognition, an architecture that has been applied to HAR before [2]. The convolutional layers extract spatial (local) features from short stretches of the sensor signal, while the LSTM layer captures temporal dependencies across the window. Combining the two is intended to improve the overall accuracy of activity recognition.
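To make the division of labour between the layers more concrete, the sketch below traces how one input window changes shape as it passes through the network. The layer sizes are copied from the model definition in Section 4; the helper function names are hypothetical, and the arithmetic assumes the Keras defaults ('valid' padding, pooling stride equal to the pool size) and the standard LSTM formulation with one bias vector per gate.

```python
def conv1d_valid_length(length, kernel_size, stride=1):
    """Output length of a 1-D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def maxpool1d_valid_length(length, pool_size):
    """Output length of 1-D max pooling whose stride equals pool_size."""
    return (length - pool_size) // pool_size + 1


def lstm_param_count(input_dim, units):
    """Weights of one LSTM layer: four gates, each with input weights,
    recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


# Sizes taken from the model code in Section 4
window_size, n_channels = 50, 3
n_filters, kernel_size, pool_size = 16, 3, 2
lstm_units, n_classes = 32, 6

conv_steps = conv1d_valid_length(window_size, kernel_size)
pooled_steps = maxpool1d_valid_length(conv_steps, pool_size)

conv_params = kernel_size * n_channels * n_filters + n_filters
lstm_params = lstm_param_count(n_filters, lstm_units)
dense_params = lstm_units * n_classes + n_classes
total_params = conv_params + lstm_params + dense_params

print("Input window:        ", (window_size, n_channels))
print("After Conv1D:        ", (conv_steps, n_filters))
print("After MaxPooling1D:  ", (pooled_steps, n_filters))
print("After LSTM:          ", (lstm_units,))
print("After Dense/softmax: ", (n_classes,))
print("Trainable parameters:", total_params)
```

The convolution and pooling shorten the sequence the LSTM has to process from 50 to 24 steps while widening each step from 3 raw axes to 16 learned features.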
2. The Code Approach
The team decided to use Python to create the CNN-LSTM model. Python’s large ecosystem of libraries provides the functions, classes, and tools we needed to pre-process the data, build the model, and train, validate, and test it.
2.1 Importing Libraries
The first step was to import necessary libraries to perform various steps.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from scipy.signal import butter, filtfilt
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.metrics import confusion_matrix, f1_score, classification_report
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense, Flatten, Dropout
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.regularizers import l2
These imports cover the libraries used throughout the project: NumPy and Pandas for data manipulation, Matplotlib and Seaborn for visualization, SciPy for signal filtering, Scikit-learn for pre-processing, data splitting, and evaluation metrics, and TensorFlow (Keras) for building and training the neural network.
2.2 Dataset
We obtained the dataset from a Kaggle challenge. The data were collected from thirty-six users through devices such as smartphones and smartwatches as they performed daily activities such as walking, jogging, climbing stairs, sitting, and standing. Each record contains the user id, the labelled activity, a timestamp, and the accelerometer readings along the x, y, and z axes at that timestamp [7].
2.3 Data Pre-Processing
The code below prepares the dataset for analysis. It first defines the column names for the DataFrame and reads the CSV file containing the sensor data into Pandas. The cleaning steps are: removing rows with null entries, stripping the trailing semicolon characters from the ‘z-axis’ column, and excluding rows with a timestamp value of 0. The DataFrame is also sorted by user and timestamp so that each user’s readings are in chronological order. These steps make the data consistent before they are passed on to the later stages.
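# Column names for the raw data file, which has no header row
columns = ['user', 'activity', 'timestamp', 'x-axis', 'y-axis', 'z-axis']
df = pd.read_csv(r"WISDMData.txt", header=None, names=columns)
# Strip the trailing semicolon from 'z-axis' and convert the column to numbers
df['z-axis'] = pd.to_numeric(df['z-axis'].astype(str).str.replace(';', '', regex=False), errors='coerce')
# Drop rows with missing or unparseable entries
df = df.dropna()
print("After removing null values:", df.shape)
# Discard rows whose timestamp is 0
df = df[df['timestamp'] != 0]
# Put each user's readings in chronological order
df = df.sort_values(by=['user', 'timestamp'], ignore_index=True)
print(df.shape)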
2.4 Label Encoding and Normalization
Label encoding converts the ‘activity’ column into integer codes, which are stored in a new column named ‘activityEncode’. Normalization then rescales the three acceleration columns with min-max scaling, so that each axis lies in the range [0, 1] and no axis dominates the others because of its scale.
if df['activity'].isnull().sum() > 0:
    print("Warning: 'activity' column contains null values. Handle them before encoding.")
else:
    label_encode = LabelEncoder()
    df['activityEncode'] = label_encode.fit_transform(df['activity'].values.ravel())
columns_to_normalize = ['x-axis', 'y-axis', 'z-axis']
scaler = MinMaxScaler()
df[columns_to_normalize] = scaler.fit_transform(df[columns_to_normalize])
print(df)
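As a small illustration of what these two steps do, the sketch below applies them to a handful of made-up readings. The toy arrays are hypothetical and are not taken from the dataset. The scaler's output is compared against the min-max formula (x - min) / (max - min) computed by hand.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Hypothetical toy data: four samples, two sensor columns
activities = np.array(["Walking", "Jogging", "Sitting", "Walking"])
readings = np.array([[-2.0, 0.0],
                     [0.0, 5.0],
                     [2.0, 10.0],
                     [1.0, 2.5]])

# LabelEncoder assigns integer codes in sorted (alphabetical) class order
encoder = LabelEncoder()
codes = encoder.fit_transform(activities)
print(dict(zip(encoder.classes_, range(len(encoder.classes_)))))
print(codes)

# MinMaxScaler rescales every column independently to the range [0, 1]
scaler = MinMaxScaler()
scaled = scaler.fit_transform(readings)

# The same result from the formula (x - min) / (max - min)
col_min = readings.min(axis=0)
col_max = readings.max(axis=0)
manual = (readings - col_min) / (col_max - col_min)
print(scaled)
```

Because the encoder sorts the class names, the integer codes follow alphabetical order rather than the order in which activities appear in the file; `encoder.classes_` gives the mapping back to names.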
2.5 Data Splitting
To facilitate the training, validation, and testing of the neural network, it is imperative to partition the data frame into two distinct sets. The first set, constituting 80% of the data, is designated for training and validation purposes, while the remaining 20% is specifically allocated for testing. The code presented in the subsequent snippet accomplishes this division, resulting in two separate data frames: ‘har_df’ and ‘test_df.’ The underlying objective of this data split operation is to shield the test data from influencing the training and validation phases of the model, ensuring their independence and integrity.
har_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
har_df = har_df.sort_values(by = ['user', 'timestamp'], ignore_index=True)
test_df = test_df.sort_values(by = ['user', 'timestamp'], ignore_index=True)
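The split above is made at the level of individual rows. An alternative that is sometimes used for sensor data is to hold out whole users, so that no person contributes readings to both sets. The sketch below illustrates that idea on made-up user ids; it is not what the project code does, and `split_by_user` and its parameters are hypothetical names.

```python
import numpy as np


def split_by_user(user_ids, test_fraction=0.2, seed=42):
    """Return boolean masks (train_mask, test_mask) that hold out whole users."""
    user_ids = np.asarray(user_ids)
    unique_users = np.unique(user_ids)
    rng = np.random.default_rng(seed)
    shuffled_users = rng.permutation(unique_users)
    # Hold out at least one user, roughly test_fraction of all users
    n_test_users = max(1, int(round(test_fraction * len(unique_users))))
    test_users = shuffled_users[:n_test_users]
    test_mask = np.isin(user_ids, test_users)
    return ~test_mask, test_mask


# Hypothetical example: 10 users with 5 rows each
user_ids = np.repeat(np.arange(1, 11), 5)
train_mask, test_mask = split_by_user(user_ids)

print("Training rows:", int(train_mask.sum()))
print("Test rows:    ", int(test_mask.sum()))
print("Held-out users:", sorted(set(user_ids[test_mask].tolist())))
```

With a DataFrame, the masks could be applied as `df[train_mask]` and `df[test_mask]`. Whether a row-level or a user-level split is more appropriate depends on whether the model is expected to generalize to people it has never seen.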
2.6 Data Visualization
The first two graphs show the distribution of activities in the training-validation and test portions of our HAR dataset. A color-coded bar chart of the activity counts gives a quick view of class balance, which matters both for model development and for interpreting the results.
activity_counts = har_df['activity'].value_counts()
n_colors = len(activity_counts)
colors = plt.cm.viridis(np.linspace(0, 1, n_colors))
plt.figure(figsize=(10, 6))
activity_counts.plot(kind='bar', color=colors)
plt.title('Count vs Activity (Training-validation dataset)')
plt.xlabel('Activity')
plt.ylabel('Count')
plt.tight_layout()
plt.show()
activity_counts = test_df['activity'].value_counts()
n_colors = len(activity_counts)
colors = plt.cm.viridis(np.linspace(0, 1, n_colors))
plt.figure(figsize=(10, 6))
activity_counts.plot(kind='bar', color=colors)
plt.title('Count vs Activity (Test dataset)')
plt.xlabel('Activity')
plt.ylabel('Count')
plt.tight_layout()
plt.show()
The next graph shows how the different users contribute to each activity. The data are grouped by ‘user’ and ‘activity’, and the occurrences of each combination are counted. The resulting chart shows how much data each user recorded for each activity, which can reveal user-specific differences that a HAR system may need to account for.
grouped_data = har_df.groupby(['user', 'activity']).size().reset_index(name='count')
palette = sns.color_palette('Set1', n_colors=len(har_df['activity'].unique()))
plt.figure(figsize=(12, 6))
sns.barplot(data=grouped_data, x='user', y='count', hue='activity', palette=palette)
plt.title('Activity Count Contribution by User (Training-Validation Data)')
plt.xlabel('User')
plt.ylabel('Count of Activities')
plt.legend(title='Activity', loc='upper right', bbox_to_anchor=(1.2, 1))
plt.show()
Next, we plot how the acceleration along the three axes behaves for each activity. Understanding these patterns is important for recognizing the different human activities effectively.
for activity in har_df['activity'].unique():
    # Limit each plot to the first 180 data points to keep it readable
    activity_data = har_df[har_df['activity'] == activity][:180]
    plt.figure(figsize=(15, 5))
    plt.title(f'Behavior of X, Y, and Z Axes for {activity}')
    plt.plot(activity_data['timestamp'], activity_data['x-axis'], label='X-axis', color='r')
    plt.plot(activity_data['timestamp'], activity_data['y-axis'], label='Y-axis', color='g')
    plt.plot(activity_data['timestamp'], activity_data['z-axis'], label='Z-axis', color='b')
    plt.xlabel('Timestamp')
    plt.ylabel('Acceleration')
    plt.legend(loc='upper right')
    plt.show()
3. Windowing
To prepare time-series data for machine learning training, we employ a technique called “windowing.” This involves defining a window size, typically 50 records representing 2.5 seconds of data, to create sequences and labels for training. These sequences represent segments of sensor data, while the corresponding labels indicate the specific activity being undertaken. This fundamental process is essential for training time-series-based machine learning models, enabling them to discern patterns and make predictions based on the sequence of sensor readings.
window_size = 50
sequences = []
labels = []
# Build the windows per user so that no window mixes data from two users
for user_id, user_data in har_df.groupby('user'):
    user_sequences = []
    user_labels = []
    for i in range(0, len(user_data) - window_size + 1):
        window = user_data.iloc[i:i + window_size]
        sequence = window[['x-axis', 'y-axis', 'z-axis']].values
        # The label of a window is the activity of its last record
        label = window['activityEncode'].values[-1]
        user_sequences.append(sequence)
        user_labels.append(label)
    sequences.extend(user_sequences)
    labels.extend(user_labels)
sequences = np.array(sequences)
labels = np.array(labels)
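The same windows can also be built without an explicit Python loop. The following is a minimal sketch, assuming NumPy 1.20 or later (for `sliding_window_view`); the function `make_windows` and its `step` parameter are hypothetical and not part of the project code. As in the loop above, each window takes the label of its last record. In the project it would have to be called once per user.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(values, label_codes, window_size=50, step=1):
    """Cut overlapping windows from one user's readings.

    values:      array of shape (n_records, n_channels)
    label_codes: array of shape (n_records,)
    Returns (windows, window_labels) with windows of shape
    (n_windows, window_size, n_channels).
    """
    values = np.asarray(values)
    label_codes = np.asarray(label_codes)
    if len(values) < window_size:
        empty = np.empty((0, window_size, values.shape[1]))
        return empty, label_codes[:0]
    # sliding_window_view puts the window axis last: (n_windows, channels, window)
    windows = sliding_window_view(values, window_size, axis=0)
    windows = windows.transpose(0, 2, 1)[::step]
    # Window i covers records i .. i + window_size - 1; use the last one as label
    window_labels = label_codes[window_size - 1::step]
    return windows, window_labels


# Hypothetical data: 100 records with 3 channels, activity changes halfway
values = np.arange(300, dtype=float).reshape(100, 3)
codes = np.repeat([0, 1], 50)

windows, window_labels = make_windows(values, codes, window_size=50)
strided_windows, strided_labels = make_windows(values, codes, window_size=50, step=25)

print(windows.shape, window_labels.shape)
print(strided_windows.shape, strided_labels.tolist())
```

A step of 1 reproduces the fully overlapping windows of the loop above, while a larger step produces fewer, less redundant windows. If 50 records span 2.5 seconds, the implied sampling rate is 50 / 2.5 = 20 records per second.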
4. Model
To evaluate the model, we use Stratified K-Fold Cross-Validation, which gives a more reliable assessment than a single split, especially for imbalanced datasets. We initialize it with five splits, so the windowed dataset is divided into five folds; in each iteration one fold is held out for validation and the remaining four are used for training. The stratified split keeps the class distribution of the target variable consistent across all folds, and the ‘shuffle’ parameter randomizes the order of the data to prevent order-related biases.
The code also sets up lists to store the evaluation metrics, the accumulated confusion matrix, and the accuracy and loss recorded during training. The core of the code is the cross-validation loop, which splits the dataset into training and held-out sets for each fold and trains a fresh model on each. Cross-validation does not by itself prevent overfitting, but the collected metrics show how well the model generalizes and reveal overfitting when it occurs.
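The effect of stratification can be checked on a small imbalanced example. The sketch below uses made-up labels (80 samples of one class and 20 of another, both hypothetical) and prints the class counts in each held-out fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1
toy_labels = np.array([0] * 80 + [1] * 20)
# Placeholder features; only the labels drive the stratification
toy_features = np.zeros((len(toy_labels), 1))

skf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_class_counts = []
held_out_indices = []
for fold, (train_index, test_index) in enumerate(skf_demo.split(toy_features, toy_labels)):
    counts = np.bincount(toy_labels[test_index], minlength=2)
    fold_class_counts.append(counts.tolist())
    held_out_indices.extend(test_index.tolist())
    print(f"Fold {fold}: held-out class counts = {counts.tolist()}")
```

Every held-out fold keeps the 4:1 class ratio of the full label array, and each sample is held out exactly once across the five folds.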
n_splits = 5
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
# Per-fold metrics
f1_scores = []
test_losses = []
test_accuracies = []
fold_train_accuracies = []
n_activities = har_df['activityEncode'].nunique()
total_confusion = np.zeros((n_activities, n_activities))
# Per-epoch learning curves, concatenated over all folds
train_accuracies = []
train_losses = []
val_accuracies = []
val_losses = []
for train_index, test_index in skf.split(sequences, labels):
    X_train, X_test = sequences[train_index], sequences[test_index]
    y_train, y_test = labels[train_index], labels[test_index]
    # Build a fresh CNN-LSTM model for every fold
    model = Sequential()
    model.add(Conv1D(filters=16, kernel_size=3, activation='relu', input_shape=(window_size, 3)))
    model.add(MaxPooling1D(pool_size=2))
    model.add(LSTM(32, dropout=0.2, recurrent_dropout=0.2, kernel_regularizer=l2(0.01)))
    model.add(Dense(n_activities, activation='softmax'))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # Stop when the validation loss has not improved for 5 epochs and keep the best weights
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
    history = model.fit(X_train, y_train, epochs=100, batch_size=64, validation_data=(X_test, y_test), callbacks=[early_stopping], verbose=2)
The code above covers the first steps of building the HAR model. The architecture combines a one-dimensional convolutional layer (Conv1D) with max pooling and a Long Short-Term Memory (LSTM) layer, which together capture local and temporal patterns in the sensor data. Dropout and L2 regularization are applied to the LSTM layer to reduce overfitting. The model is trained to minimize the sparse categorical cross-entropy loss, with early stopping halting training once the validation loss has stopped improving for five epochs and restoring the best weights. The next lines, still inside the cross-validation loop, record the learning curves and predict activity labels for the held-out fold.
    # Record the per-epoch learning curves of this fold
    train_accuracies.extend(history.history['accuracy'])
    val_accuracies.extend(history.history['val_accuracy'])
    train_losses.extend(history.history['loss'])
    val_losses.extend(history.history['val_loss'])
    # Predict class probabilities for the held-out fold and pick the most likely class
    y_pred = model.predict(X_test)
    y_pred_labels = np.argmax(y_pred, axis=1)
5. Results
The results of the CNN-LSTM model on the test data indicate high accuracy in detecting the true activity label. The model reaches an accuracy of 0.9048 on the test data, and the F1 score of the predictions is 0.9014, which is a high score for activity prediction. The confusion matrix shows that most predicted labels agree with the true activity labels. The following lines complete the body of the cross-validation loop by computing the metrics of each fold, and then summarize them across folds.
    # Accumulate the confusion matrix and the metrics of this fold
    confusion = confusion_matrix(y_test, y_pred_labels, labels=np.arange(n_activities))
    total_confusion += confusion
    f1 = f1_score(y_test, y_pred_labels, average='weighted')
    test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
    train_loss, train_accuracy = model.evaluate(X_train, y_train, verbose=0)
    f1_scores.append(f1)
    test_losses.append(test_loss)
    test_accuracies.append(test_accuracy)
    fold_train_accuracies.append(train_accuracy)
# Summarize the per-fold metrics as mean and standard deviation
mean_f1 = np.mean(f1_scores)
std_f1 = np.std(f1_scores)
mean_test_loss = np.mean(test_losses)
std_test_loss = np.std(test_losses)
mean_test_accuracy = np.mean(test_accuracies)
std_test_accuracy = np.std(test_accuracies)
mean_train_accuracy = np.mean(fold_train_accuracies)
std_train_accuracy = np.std(fold_train_accuracies)
print(f"Mean F1 Score: {mean_f1:.4f} (±{std_f1:.4f})")
print(f"Mean Validation Loss: {mean_test_loss:.4f} (±{std_test_loss:.4f})")
print(f"Mean Validation Accuracy: {mean_test_accuracy:.4f} (±{std_test_accuracy:.4f})")
print(f"Mean Training Accuracy: {mean_train_accuracy:.4f} (±{std_train_accuracy:.4f})")
# Learning curves: accuracy (left) and loss (right), concatenated over all folds
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(train_accuracies, label='Training Accuracy')
plt.plot(val_accuracies, label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(train_losses, label='Training Loss')
plt.plot(val_losses, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.tight_layout()
plt.figure(figsize=(8, 6))
# Tick labels follow the encoder's class order, which matches the rows and columns of the matrix
sns.heatmap(total_confusion.astype(int), annot=True, fmt='d', cmap='Blues',
            xticklabels=label_encode.classes_,
            yticklabels=label_encode.classes_)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
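The weighted F1 score reported above averages the per-class F1 scores, weighting each class by its number of true samples. As a hedged illustration of that definition, the sketch below computes it directly from a confusion matrix whose rows are true labels and whose columns are predictions. The function name and the 2x2 matrix are hypothetical and unrelated to the project's results.

```python
import numpy as np


def weighted_f1_from_confusion(confusion):
    """Weighted F1 from a confusion matrix (rows: true, columns: predicted)."""
    confusion = np.asarray(confusion, dtype=float)
    true_positives = np.diag(confusion)
    support = confusion.sum(axis=1)      # TP + FN for each class
    predicted = confusion.sum(axis=0)    # TP + FP for each class
    denominator = support + predicted    # 2*TP + FP + FN
    per_class_f1 = np.divide(2 * true_positives, denominator,
                             out=np.zeros_like(true_positives),
                             where=denominator > 0)
    # Weight each class by its share of the true samples
    return float(np.sum(per_class_f1 * support) / support.sum())


# Hypothetical two-class confusion matrix
toy_confusion = np.array([[8, 2],
                          [1, 9]])

toy_weighted_f1 = weighted_f1_from_confusion(toy_confusion)
toy_accuracy = float(np.trace(toy_confusion) / toy_confusion.sum())

print(f"Weighted F1: {toy_weighted_f1:.4f}")
print(f"Accuracy:    {toy_accuracy:.4f}")
```

For class 0 the F1 score is 16/19 and for class 1 it is 18/21; with equal support the weighted score is their plain average. On imbalanced data the weighting lets the frequent classes dominate, which is worth keeping in mind when reading a single summary number.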
The confusion matrix shows that the amount of data available affects the accuracy, but it is not the sole factor that influences the accuracy of our model. The activity being performed is itself a significant factor. The activities “standing” and “sitting” have the least data available, yet they achieve the highest scores in the confusion matrix. This is because, unlike the other activities, sitting and standing do not exhibit periodic behavior. Furthermore, because the device is oriented differently with respect to the earth in the two postures [7], they are easy to tell apart: “standing” shows a constant but higher magnitude on the y axis than “sitting”. The relatively small window size also increases the accuracy, with the caveat that the larger number of data points lengthens training and testing time.
Finally, the choice of the number of epochs influences the performance of the neural network. Setting a large upper limit on the number of epochs, combining it with early stopping, and analyzing the learning curves helped strike a balance between overfitting and underfitting and achieve high accuracy with low loss.
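The interaction between a large epoch limit and early stopping can be illustrated with a simplified, framework-free sketch. The function below is a hypothetical model of patience-based stopping, not the Keras implementation: it walks through a list of made-up validation losses and reports the epoch at which training would stop and the epoch whose weights would be kept.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch) for patience-based early stopping.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; best_epoch is the epoch with the lowest
    loss seen so far. Epochs are counted from 0.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # The epoch limit was reached before the patience ran out
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses for nine epochs
toy_val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74, 0.75, 0.6]

stop_short = early_stopping_epoch(toy_val_losses, patience=5)
stop_long = early_stopping_epoch(toy_val_losses, patience=10)

print("patience=5: ", stop_short)
print("patience=10:", stop_long)
```

With a patience of 5 the run stops at epoch 7 and keeps the weights from epoch 2, missing the later improvement at epoch 8 that a longer patience would have reached. The patience value therefore trades training time against the risk of stopping on a temporary plateau. The Keras `EarlyStopping` callback has further options, such as `min_delta`, that this sketch ignores.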
# Build windows from the held-out test data in the same way as for training
window_size = 50
test_sequences = []
test_labels = []
for user_id, user_data in test_df.groupby('user'):
    user_test_sequences = []
    user_test_labels = []
    for i in range(0, len(user_data) - window_size + 1):
        window = user_data.iloc[i:i + window_size]
        sequence = window[['x-axis', 'y-axis', 'z-axis']].values
        label = window['activityEncode'].values[-1]
        user_test_sequences.append(sequence)
        user_test_labels.append(label)
    test_sequences.extend(user_test_sequences)
    test_labels.extend(user_test_labels)
test_sequences = np.array(test_sequences)
test_labels = np.array(test_labels)
# Evaluate the model trained in the last cross-validation fold on the test windows
test_loss, test_accuracy = model.evaluate(test_sequences, test_labels, verbose=0)
y_test_pred = model.predict(test_sequences)
y_test_pred_labels = np.argmax(y_test_pred, axis=1)
f1_test = f1_score(test_labels, y_test_pred_labels, average='weighted')
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")
print(f"Test F1 Score: {f1_test:.4f}")
test_confusion = confusion_matrix(test_labels, y_test_pred_labels)
plt.figure(figsize=(8, 6))
# Tick labels follow the encoder's class order, which matches the rows and columns of the matrix
sns.heatmap(test_confusion.astype(int), annot=True, fmt='d', cmap='Blues',
            xticklabels=label_encode.classes_,
            yticklabels=label_encode.classes_)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix (Test Data)')
plt.show()
Possibilities to improve
The project uses a deep learning framework, specifically a CNN-LSTM model, to recognize human activities. The results show that the model performs well. However, as with most machine learning models, there is still substantial room for improvement.
One avenue for improvement is a more complex model, obtained by tuning the hyperparameters of the CNN-LSTM architecture, although this requires more computation. Furthermore, as with any machine learning model, additional data would be advantageous: more data for training and validation should help the model perform better on unseen test data.
Future work
As global emphasis on physical fitness intensifies, we perceive this domain as an increasingly pivotal area for scholarly investigation. Consequently, we extend an open invitation to future technology enthusiasts to embark upon exploration within this intriguing realm, with the overarching objective of enhancing the quality of data-driven outcomes.
Our contention is that the model in question possesses the potential to encompass a broader spectrum of recognizable activities. Furthermore, we posit that the pursuit of novel modeling approaches holds promise for further advancement in this field.
Github Repository
You can find the Github Repository here
Bibliography
[1] R. Mutegeki and D. S. Han, “Feature-Representation Transfer Learning for Human Activity Recognition,” The 10th International Conference on ICT Convergence, unpublished.
[2] R. Mutegeki and D. S. Han, “A CNN-LSTM Approach to Human Activity Recognition,” 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 362–366, IEEE, 2020.
[3] H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller, “Transfer learning for time series classification,” IEEE Int. Conf. on Big Data, pp 1367–1376, 2018.
[4] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, “A Public Domain Dataset for Human Activity Recognition using Smartphones,” Proc. of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 24–26 April, 2013.
[5] S. O. Eyobu and D. S. Han, “Feature Representation and Data Augmentation for Human Activity Classification Based on Wearable IMU Sensor Data Using a Deep LSTM Neural Network,” Sensors, 2018, 18, 2892.
[6] F. M. Rueda, R. Grzeszick, G. A. Fink, S. Feldhorst and M. Hompel, “Convolutional Neural Networks for Human Activity Recognition Using Body-Worn Sensors,” Informatics, 5(2), 26, May 2018.
[7] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, “Activity Recognition Using Cell Phone Accelerometers,” ACM SIGKDD Explorations Newsletter, 12(2), pp. 74–82, 2011.