feat: talked about resnet
All checks were successful
continuous-integration/drone/push Build is passing

This commit is contained in:
Andre Henriques 2024-01-24 19:51:41 +00:00
parent 4c5320b9e8
commit 2223b3e302
2 changed files with 29 additions and 6 deletions

View File

@ -235,3 +235,19 @@ year = 1998
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{going-deeper-with-convolutions,
title={Going Deeper with Convolutions},
author={Christian Szegedy and Wei Liu and Yangqing Jia and Pierre Sermanet and Scott Reed and Dragomir Anguelov and Dumitru Erhan and Vincent Vanhoucke and Andrew Rabinovich},
year={2014},
eprint={1409.4842},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{very-deep-convolution-networks-for-large-scale-image-recognition,
title={Very Deep Convolutional Networks for Large-Scale Image Recognition},
author={Karen Simonyan and Andrew Zisserman},
year={2015},
eprint={1409.1556},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

View File

@ -90,7 +90,7 @@
Amazon provides an image classification service called ``Rekognition'' \cite{amazon-rekognition}. This service provides multiple services from face recognition, celebrity recognition, object recognition and others. One of these services is called custom labels \cite{amazon-rekognition-custom-labels} that provides the most similar service, to the one this project is about. The custom labels service allows the users to provide custom datasets and labels and using AutoML the Rekognition service would generate a model that allows the users to classify images according to the generated model.
The models generated using Amazon's Rekognition do not provide ways to update the number of labels that were created, without generating a new project. This will involve retraining a large part of the model, which would involve large downtime between being able to add new classes. Training models also could take 30 minutes to 24 hours \cite{amazon-rekognition-custom-labels-training}, which could result in up to 24 hours of lag between the need of creating a new label and being able to classify that label. A problem also arises when the uses need to add more than one label at the same time. For example, the user sees the need to create a new label and starts a new model training, but while the model is training a new label is also needed. The user now either stops the training of the new model and retrains a new one or waits until the one currently running stops and trains a new one. If new classification classes are required with frequency, this might not be the best platform to choose.
The models generated using Amazon's Rekognition do not provide ways to update the number of labels that were created, without generating a new project. This will involve retraining a large part of the model, which would involve large downtime between being able to add new classes. Training models also could take 30 minutes to 24 hours \cite{amazon-rekognition-custom-labels-training}, which could result in up to 24 hours of lag between the need of creating a new label and being able to classify that label. A problem also arises when the uses need to add more than one label at the same time. For example, the user sees the need to create a new label and starts a new model training, but while the model is training a new label is also needed. The user now either stops the training of the new model and retrains a new one, or waits until the one currently running stops and trains a new one. If new classification classes are required with frequency, this might not be the best platform to choose.
%https://aws.amazon.com/machine-learning/ml-use-cases/
@ -107,22 +107,24 @@
This section will analyse possible models that would obtain the best results. The models for this project have to be the most efficient as possible while resulting in the best accuracy as possible.
A classical example is the MNIST Dataset \cite{mnist}. Models for the classification of the MNIST dataset can be both simple or extremely complex and achieve different levels of complexity.
For example, in \cite{mist-high-accuracy} an accuracy $99.91\%$, by combining 3 Convolutional Neural Networks (CNNs), with different kernel sizes and by changing hyperparameters, augmenting the data, and in \cite{lecun-98} an accuracy of $95\%$ was achieved using a 2 layer neural network with 300 hidden nodes. Both these models achieve the accuracy that is required for this project but \cite{mist-high-accuracy} is more expensive to run. There when deciding when to choose what models they create, the system should choose to create the model that can achieve the required accuracy while taking the leas amount of effort to train.
For example, in \cite{mist-high-accuracy} an accuracy $99.91\%$, by combining 3 Convolutional Neural Networks (CNNs), with different kernel sizes and by changing hyperparameters, augmenting the data, and in \cite{lecun-98} an accuracy of $95\%$ was achieved using a 2 layer neural network with 300 hidden nodes. Both these models achieve the accuracy that is required for this project, but \cite{mist-high-accuracy} is more expensive to run. When deciding when to choose what models they create, the system should choose to create the model that can achieve the required accuracy while taking the leas amount of effort to train.
% TODO fix the inglish in these sentance
The models for this system to work as indented should be as small as possible while obtaining the required accuracy required to achieve the task of classification of the classes.
As the service might need to handle a large number of requests, it needs to be able to handle as many requests as possible. This would require that the models are easy to run, and smaller models are easier to run, therefore the system requires a balance between size and accuracy.
\subsection{Method of Image Classification Models}
There are all multiple ways of creating of achieving image classification, the requirements of the system are that the system should return the class that an image that belongs to. Which means that we will be using supervised classification methods, as these are the ones that meet the requirements of the system.
There are all multiple ways of achieving image classification, the requirements of the system are that the system should return the class that an image that belongs to. Which means that we will be using supervised classification methods, as these are the ones that meet the requirements of the system.
% TODO find some papers to proff this
The system will use supervised models to classify images, using a combination of different types of models, using neural networks, convolution neural networks, deed neural networks and deep convolution neural networks.
These types were decided as they have had a large success in the past in other image classification chanlleges, for example in the ImageNet chanlleges \cite{imagenet}, which has ranked different models in classifying a 14 million images. The contest has been running since 2010 to 2017.
These types were decided as they have had a large success in the past in other image classification challenges, for example in the ImageNet challenges \cite{imagenet}, which has ranked different models in classifying a 14 million images. The contest has been running since 2010 to 2017.
The models that participated in the contest tended to use more and more Deep convolution neural networks, out of various the models that were generated there are a few landmark models that were able to achieve high accuracies, including AlexNet \cite{krizhevsky2012imagenet}, ResNet-152 \cite{resnet-152}, EfficientNet \cite{efficientnet}.
The models that participated in the contest tended to use more and more Deep convolution neural networks, out of various models that were generated there are a few landmark models that were able to achieve high accuracies, including AlexNet \cite{krizhevsky2012imagenet}, ResNet-152 \cite{resnet-152}, EfficientNet \cite{efficientnet}.
% TODO find vgg to cite
These models can be used in two ways in the system, they can be used to generate the models via transfer learning and by using the model structure as a basis to generate a complete new model.
@ -136,7 +138,12 @@
While using AlexNet would probably yield desired results, it would complicate the other parts of the service. As a platform as a service, the system needs to manage the number of resources available, and requiring to use 2 GPUs to train a model would limit the number of resources available to the system by 2-fold.
% TODO talk more about this
ResNet \cite{resnet} is a deep convolution neural network that participated in the ImageNet ILSVRC-2015 contest, it achieved a top-1 error rate of $21.43\%$ and a top-5 error rate of $5.71\%$.
ResNet \cite{resnet} is a deep convolution neural network that participated in the ImageNet ILSVRC-2015 contest, it achieved a top-1 error rate of $21.43\%$ and a top-5 error rate of $5.71\%$. ResNet was created to solve a problem, the problem of degradation of training accuracy when using deeper models. Close to the release of the ResNet paper, there was evidence that deeper networks result in higher accuracy results \cite{going-deeper-with-convolutions, very-deep-convolution-networks-for-large-scale-image-recognition}. But the increasing the depth of the network resulted in training accuracy degradation.
% This needs some work in terms of gramar
ResNet works by creating shortcuts between sets of layers, the shortcuts allow residual values from previous layers to be used on the upper layers. The hypothesis being that it is easier to optimize the residual mappings than the linear mappings.
The results proved that the using the residual values improved training of the model, as the results of the challenge prove.
It's important to note that using residual networks tends to give better, the deeper the model is. While this could have a negative impact in performance, the number of parameters per layer does not grow that steeply in ResNet when comparing it with other architectures as it uses other optimizations such as $1x1$ kernel sizes, which are more space efficient. Even with these optimizations, it can still achieve incredible results. Which might make it a good contender to be used in the service as one of the predefined models to use to try to create the machine learning models.
% RestNet-152