Gait based Age Estimation using LeNet-50 inspired GaitNet

Gait is a behavioural biometric that does not require the subject's cooperation, as it can be captured at a distance. Gait-based age estimation has extensive applications in surveillance, customer age estimation in shopping centres and malls for business intelligence, and age-restricted access control at places such as liquor shops. In this paper, we propose GaitNet, a LeNet-50 inspired Convolutional Neural Network (CNN) for gait-based age estimation. We apply a heatmap filter to each Gait Energy Image (GEI) to enhance its age-differentiating features, and then pass the result through sequential age group and age estimation CNN models. We address the inherent class imbalance caused by the scarcity of data for elderly subjects using image augmentation. We evaluated our model on the OU-ISIR Large Population Gait Database, and the results confirmed its effectiveness.


I. INTRODUCTION
Gait-based biometric identification has been studied extensively in recent years because it is more practical in certain environments than physiological biometrics such as fingerprints, facial recognition and iris recognition [1][2][3][4][5]. A gait capture setup can work even with a non-cooperative, distant subject and a low-resolution camera. Beyond individual identification, numerous recent studies have explored gait-based estimation of attributes such as gender, age group, ethnicity and age [6][7][8]. Age estimation has received the most attention because of its many applications in visual surveillance, access control, forensics and criminal investigation.
Earlier studies focussed on establishing a relationship between gait attributes, such as arm swing, leg stride and stride frequency, and the age of the subject. Davis [9] established a relationship between age and gait attributes to differentiate between adult and child subjects. Abreu et al. [10] partly established the feasibility of gait-based age estimation using gait analysis. They produced representations of the cyclic movements of the limbs, called cyclograms, and used them as the input to the feature extraction phase. Their model could differentiate between young and elderly subjects but could not differentiate between the two genders. Ince et al. [11] studied body shape as an age-determining construct, differentiating a child from an adult by their head-to-body proportion. Callisaya et al. [7] found a substantial relationship between gender and various gait attributes, showing that a person's gender alters the relationship between their age and their gait.
In conventional image processing or computer vision-based models for gait-based age estimation, a gait descriptor serves as the input. The common gait descriptors used in the literature are the Gait Image Contour, the Gait Image Silhouette and the Gait Energy Image (GEI), as shown in Figure 1. Its better feature representation ability and effectiveness make the GEI the most widely used feature descriptor. It is a combined representation of the silhouettes over a gait cycle.
In this paper we put forth gait-based age estimation using a LeNet-50 inspired Convolutional Neural Network. The depth of a deep learning model trained on a GEI-based gait image dataset is restricted by the intrinsic scarcity of features in the GEI: exceeding a certain number of layers in the CNN results in overfitting. To overcome this shortcoming, we propose heatmap filtering of the GEIs for feature enhancement, as shown in Figure 2. The heatmap representation allows us to train a deeper CNN on the dataset without overfitting the training set. The proposed LeNet-50 inspired CNN also addresses the problem of getting stuck in a local minimum near the global minimum, which a conventional CNN trained with a uniform learning rate faces. Iteratively decreasing the learning rate near the global minimum facilitates convergence to it.
A common shortcoming of the preceding studies is low prediction accuracy for elderly and child subjects, a bias induced by the scarcity of data for those age groups. To overcome this limitation, we used image augmentation to reduce the class imbalance in the dataset, thus improving age prediction accuracy for those age groups. We evaluated our model on the OU-ISIR Large Population Gait Database [12], the largest gait database with age information developed to date, comprising 63,846 images of subjects aged between 2 and 90 years.
The contributions of this paper are: (1) a LeNet-50 inspired sequential algorithm for gender prediction and sequential age prediction, and (2) an age estimation model with improved age group prediction accuracy for children and elders.

II. RELATED WORK
Earlier computer vision-based studies mostly required manual extraction of features from the gait descriptor. Zhang et al. [13] proposed a Hidden Markov Model based age group classification model. In the feature extraction phase, they generated a Frame to Exemplar distance vector containing the distances from multiple contour points to the centroid of the contour. They achieved an accuracy of 83.33% in two-class classification of young and old subjects on a self-developed 14-subject dataset. Mansouri Nabila et al. [14] proposed a novel gait descriptor that captured both Spatiotemporal Transverse and Spatiotemporal Longitudinal projections of the silhouette. They employed a Support Vector Machine (SVM) on the 4000-subject OU-ISIR database, reaching a precision of about 74%. Xiang Li et al. [15] performed age group classification of the subjects. They used a directed acyclic graph (DAG) for the age group representation and an SVM with a Gaussian kernel for the classification task. They achieved an average age group classification accuracy of 72.23% and an age estimation Mean Absolute Error (MAE) of 6.78 years on the OULP-Age dataset. Jiwen Lu et al. [16] fused Gabor features such as the gait sequence phase and the Gabor magnitude for feature enhancement. They used the USF gait database for model evaluation and achieved an MAE of 5.42 years. Makihara et al. [17], in one of the first studies to use the Gait Energy Image descriptor, used the Gaussian Regression technique to predict the age. For model training and evaluation, they used a self-created gait database with 1728 subjects aged from 2 to over 90 years. The best MAE they reached was 8.2 years.
M. Hu et al. [18] presented a mutual information maximization technique, using the Gabor filter for feature extraction and a Bayes rule based on the Hidden Markov Model (HMM) for classification. It performed both gender and age classification (young and old). The gender classification results were evaluated on the CASIA(B) and IRIP datasets, and the age classification on the database used by Zhang et al. [13]. A recent study by A. Sakata et al. [19] put forth a deep learning-based model that employs multiple Convolutional Neural Networks sequentially for age estimation. The GEI first goes through a CNN that predicts gender, and then passes through two further CNNs that predict the age group and the age, achieving an MAE of 5.84 years. T. Islam et al. [23] presented a comprehensive analysis of the related studies in this area, which turn out to be few. They compared various gait-based age estimation techniques on various evaluation metrics and found that the deep learning-based studies had achieved the best results.
The effectiveness of deep learning-based techniques motivated us to pursue our research in this direction.

III. PROPOSED METHOD
We propose separate models for age and gender estimation, as depicted in the flowcharts in Figures 3 and 4. In the gender estimation process, a heatmap filter is applied to the GEI, which is then fed into a CNN that predicts the binary gender label. For age estimation, a sequential CNN setup first predicts the age group: Toddler (2-5), Child (6-11), Adolescent (12-18), Adult (19-60) or Old (61-90). The age classification CNN then predicts the age of the subject.
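The age group labelling used by the first stage can be illustrated with a small helper. This is only a sketch: the function name `age_to_group` is hypothetical, and the Adolescent range is read as 12-18 years.

```python
# Age-group boundaries as listed in the text (inclusive, in years).
# The Adolescent range is taken to be 12-18 (an assumption about the text).
AGE_GROUPS = [
    ("Toddler", 2, 5),
    ("Child", 6, 11),
    ("Adolescent", 12, 18),
    ("Adult", 19, 60),
    ("Old", 61, 90),
]


def age_to_group(age):
    """Return the age-group name for an integer age in [2, 90]."""
    for name, low, high in AGE_GROUPS:
        if low <= age <= high:
            return name
    raise ValueError(f"age {age} is outside the 2-90 range covered by the groups")
```

Such a mapping would supply the training labels for the age group CNN, while the age CNN of each group is trained only on the ages inside that group's range.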

Model Architecture
The proposed CNN architecture is depicted in Figure 5. The input GEI of 128×88 pixels is converted into a heatmap of the same 128×88 size. The proposed model consists of two pairs of consecutive convolution and max-pooling layers with a dropout probability of 0.5. The first pair has 81 filters of size (5,5) followed by (3,3) max-pooling, and the second pair has 45 filters of size (7,7) followed by (3,3) max-pooling. A dropout with probability 0.3 is applied before the flattening operation. We used three fully connected layers with 1024, 256 and 32 nodes and a dropout rate of 0.3. The dense layers are initialized with he_normal [20] as the kernel_initializer and with biases initialized to zeros. The relu [21] activation function is used in the three dense layers and sigmoid in the recognition layer. The architecture and hyperparameters were selected by training and testing multiple architecture-hyperparameter combinations and choosing the combination with the best evaluation metrics.
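To make the layer sizes concrete, the following sketch traces the feature map dimensions through the two convolution and max-pooling pairs. The padding and stride settings are not stated in the text, so the sketch assumes stride-1 convolutions without padding and non-overlapping pooling (stride equal to the pool size); the helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output length of a stride-1 convolution without padding (assumed)."""
    return size - kernel + 1


def pool_out(size, pool):
    """Output length of non-overlapping max-pooling (stride = pool size, assumed)."""
    return size // pool


def feature_map_shapes(height=128, width=88):
    """Trace (height, width, filters) after each layer of the two conv/pool pairs."""
    shapes = []
    # (name, number of filters, kernel size, pool size) as given in the text.
    for name, filters, kernel, pool in [("pair1", 81, 5, 3), ("pair2", 45, 7, 3)]:
        height, width = conv_out(height, kernel), conv_out(width, kernel)
        shapes.append((name + "_conv", (height, width, filters)))
        height, width = pool_out(height, pool), pool_out(width, pool)
        shapes.append((name + "_pool", (height, width, filters)))
    return shapes


shapes = feature_map_shapes()
final_h, final_w, final_c = shapes[-1][1]
flattened = final_h * final_w * final_c  # input size of the first dense layer
```

Under these assumptions the second pooling layer outputs an 11×7×45 volume, so 3465 features are flattened and fed into the 1024-node dense layer. Different padding or stride choices would change these numbers.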

Model Training
We propose a two-phase training setup for the CNN. In the first phase, the model is compiled using the Adam optimizer with the default learning rate of 0.01 and categorical_crossentropy as the loss function. It is trained for twenty epochs on the training set with a batch size of 32. Model checkpointing on val_accuracy ensures that only the model with the best validation accuracy is saved. In the second phase, the saved model is trained further in multiple sub-phases with an iteratively decreasing learning rate. Each sub-phase runs for three epochs, with the learning rate decreased by factors of 10^-3, 0.8, 0.2 and 0.08 respectively.
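The two-phase schedule can be written out as a list of (phase, epochs, learning rate) entries. This is a sketch under two assumptions: the first factor is read as 10^-3, and each factor multiplies the learning rate of the previous sub-phase, which is the reading under which the rate decreases at every step. The function name is hypothetical.

```python
def two_phase_schedule(base_lr=0.01, phase1_epochs=20,
                       factors=(1e-3, 0.8, 0.2, 0.08), subphase_epochs=3):
    """Return [(phase name, epochs, learning rate), ...] for both training phases."""
    schedule = [("phase1", phase1_epochs, base_lr)]
    lr = base_lr
    for index, factor in enumerate(factors, start=1):
        # ASSUMPTION: factors are applied cumulatively to the previous rate.
        lr = lr * factor
        schedule.append((f"phase2.{index}", subphase_epochs, lr))
    return schedule


schedule = two_phase_schedule()
for name, epochs, lr in schedule:
    print(f"{name}: {epochs} epochs at learning rate {lr:.3g}")
```

In a training script, each entry would correspond to one call of the fitting routine on the checkpointed model, with the optimizer's learning rate set to the listed value beforehand.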

Dataset
The model was evaluated on the OU-ISIR Gait Database, Large Population Dataset with Age (OULP-Age), which is the largest gait database developed to date. The OU-ISIR biometric database is a repository of various gait databases such as the Large Population Dataset with Age, the Population Dataset with Bag and the Inertial Sensor Dataset. They were developed by capturing a side-view video of the gait sequence of each subject, followed by a three-step GEI extraction: segmentation, normalization and averaging. The Large Population Dataset with Age is a collection of 63846 GEI images of male and female subjects aged between 2 and 90 years. Table 2 gives the gender-wise breakup of the database. The training set/test set split is done at 50% (15961 and 15962 subjects respectively). The split remains the same for all three classification processes.
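The averaging step of the GEI extraction can be sketched as a pixel-wise mean over the silhouette frames of a gait sequence. The sketch assumes the frames are already segmented and size-normalized binary images stored as nested lists; the function name is hypothetical and this is not the database authors' implementation.

```python
def gait_energy_image(silhouettes):
    """Average aligned binary silhouettes (lists of rows of 0/1) pixel by pixel."""
    if not silhouettes:
        raise ValueError("at least one silhouette frame is required")
    count = len(silhouettes)
    rows, cols = len(silhouettes[0]), len(silhouettes[0][0])
    return [
        [sum(frame[r][c] for frame in silhouettes) / count for c in range(cols)]
        for r in range(rows)
    ]


# Two tiny 2x3 "silhouettes": pixels set in both frames get energy 1.0,
# pixels set in only one frame get energy 0.5.
frames = [
    [[0, 1, 1],
     [0, 1, 0]],
    [[0, 1, 0],
     [1, 1, 0]],
]
gei = gait_energy_image(frames)
```

Pixels that belong to the body in every frame (the torso, for instance) end up bright, while pixels covered only during part of the cycle (swinging limbs) take intermediate values.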

Performance Evaluation
Since gender estimation is a two-class classification problem, a simple evaluation metric such as accuracy suffices. It is given by $\text{Accuracy} = N_c / N$, where $N_c$ is the number of samples that the model classified correctly and $N$ is the total sample size. In the age estimation problem (a regression problem), the frequently used performance evaluation metrics are the Mean Absolute Error (MAE) and the Standard Deviation (SD). The MAE is given by $\text{MAE} = \frac{1}{N} \sum_{x=1}^{N} |t_x - p_x|$, where $N$ is the total sample size of the test set, and $t_x$ and $p_x$ are the actual and predicted ages of the $x$-th sample.
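The evaluation metrics of this subsection can be sketched in a few lines. Accuracy and MAE follow their standard definitions; for the SD, the sketch assumes the sample standard deviation (N - 1 denominator) of the absolute errors, which is one common convention and may differ from the exact formula used in the experiments. All function names are hypothetical.

```python
import math


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def mean_absolute_error(true_ages, predicted_ages):
    """Mean of |t_x - p_x| over all samples."""
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


def abs_error_std(true_ages, predicted_ages):
    """ASSUMPTION: sample standard deviation of the absolute errors."""
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    mean = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean) ** 2 for e in errors) / (len(errors) - 1))


true_ages = [10, 20, 30, 40]
predicted_ages = [12, 18, 30, 44]          # absolute errors: 2, 2, 0, 4
mae = mean_absolute_error(true_ages, predicted_ages)
sd = abs_error_std(true_ages, predicted_ages)
```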
The following formula is used to compute the standard deviation:
The same CNN architecture, when employed for age group estimation, yielded the classification results shown in Tables 7 and 8.

GaitNet with heatmap filtered GEI:
With the proposed GaitNet, we improved on the gender and age group classification accuracy of the conventional single dense layer CNN. The gender classification accuracy reached 99.03% on the training set and 96.96% on the test set. Tables 9 and 10 give the gender classification results of GaitNet with heatmap filtered GEIs on the training and test sets respectively. Tables 11 and 12 present the confusion matrices for age group prediction on the training and test sets respectively.

GaitNet with image augmentation:
The inherent class imbalance in the OULP dataset leads to lower classification accuracy for the toddler, child and old age groups. We applied image augmentation to these age groups to alleviate the problem. To increase the number of subjects in the scarce age groups, we cloned the GEIs of the toddler, child and old age groups in the training set. We then applied the width_shift and height_shift transformations to the enlarged dataset. The augmented dataset grew to 76,667 subjects. Adding low levels of curated noise could also be used for data augmentation, so that the generated images preserve the discriminating features while still yielding new subjects.
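The cloning and shifting steps can be sketched as follows. The shift magnitudes, the number of clones per image and the zero padding of uncovered pixels are all assumptions made for illustration, and the function names are hypothetical; the sketch does not reproduce the exact augmentation settings of the experiments.

```python
import random


def shift_image(image, dx, dy, fill=0):
    """Shift a 2-D image (list of rows) right by dx and down by dy, padding with fill."""
    rows, cols = len(image), len(image[0])
    shifted = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            src_r, src_c = r - dy, c - dx
            if 0 <= src_r < rows and 0 <= src_c < cols:
                shifted[r][c] = image[src_r][src_c]
    return shifted


def oversample_with_shifts(images, copies, max_shift=2, seed=0):
    """Keep the originals and add `copies` randomly shifted clones of each image."""
    rng = random.Random(seed)  # fixed seed so the augmentation is repeatable
    augmented = list(images)
    for image in images:
        for _ in range(copies):
            dx = rng.randint(-max_shift, max_shift)  # width shift in pixels
            dy = rng.randint(-max_shift, max_shift)  # height shift in pixels
            augmented.append(shift_image(image, dx, dy))
    return augmented


tiny = [[1, 2, 3],
        [4, 5, 6]]
shifted_right = shift_image(tiny, dx=1, dy=0)
shifted_down = shift_image(tiny, dx=0, dy=1)
minority_augmented = oversample_with_shifts([tiny], copies=3)
```

Only the images of the scarce age groups would be passed to such a routine, so the majority classes keep their original size while the minority classes grow.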

Age estimation results using the GaitNet:
GaitNet employs a sequential process for age estimation. First the age group of the subject is predicted; the age is then estimated by a CNN trained on the predicted age group. Instead of treating age estimation as a regression problem, we treat it as a classification problem. The number of nodes in the final layer of GaitNet is therefore equal to the number of distinct age values in the respective age group; for example, the age prediction CNN for the toddler age group has 4 nodes, as it covers the ages 2-5 years. Table 14 compares our method with the existing methods in the literature using the MAE and SD evaluation metrics. Figure 7 plots the true age values against the predicted values.
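The sequential prediction and the per-group output sizes can be illustrated with a short sketch. The classifiers are represented by plain callables standing in for the trained CNNs, the Adolescent range is read as 12-18 years, and all names are hypothetical.

```python
# Inclusive age ranges per group (Adolescent taken as 12-18, an assumption).
AGE_RANGES = {
    "Toddler": (2, 5),
    "Child": (6, 11),
    "Adolescent": (12, 18),
    "Adult": (19, 60),
    "Old": (61, 90),
}


def num_age_classes(group):
    """Number of output nodes of the age CNN for a group: one per distinct age."""
    low, high = AGE_RANGES[group]
    return high - low + 1


def decode_age(group, class_scores):
    """Map the highest-scoring output node back to an age in years."""
    if len(class_scores) != num_age_classes(group):
        raise ValueError("score vector length does not match the age group")
    best = max(range(len(class_scores)), key=class_scores.__getitem__)
    return AGE_RANGES[group][0] + best


def estimate_age(gei, group_classifier, age_classifiers):
    """Stage 1 predicts the age group; stage 2 uses that group's age classifier."""
    group = group_classifier(gei)
    scores = age_classifiers[group](gei)
    return group, decode_age(group, scores)


# Stand-in classifiers (placeholders for the trained CNNs).
stub_group = lambda gei: "Toddler"
stub_age = {"Toddler": lambda gei: [0.1, 0.2, 0.6, 0.1]}
result = estimate_age(None, stub_group, stub_age)
```

The sketch also makes the error propagation visible: if the first stage returns the wrong group, the second stage can only output an age inside that wrong group's range.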

V. CONCLUSION AND FUTURE SCOPE
In this paper we proposed GaitNet to improve gait-based age estimation accuracy, and it outperformed all other existing gait-based age estimation models. The heatmap filter made it possible to train a deeper CNN by increasing the number of distinguishing features in the GEI. The image augmentation technique alleviated the effects of data scarcity for elderly and child subjects. Although we addressed most of the shortcomings of the existing models, some still remain: the sequential nature of the age prediction process means that a wrong age group prediction for a subject subsequently degrades its age prediction.
In future, the work can be extended by further feature enhancement of the GEI gait descriptor, eventually making it possible to employ a deeper CNN. Extending the OULP gait database with more subjects in the elderly and child age groups would further enhance the age group prediction accuracy, ultimately improving the age estimation accuracy.