From 4547229dfc90d906c22fbb048686a619c2a3f48f Mon Sep 17 00:00:00 2001
From: Andre Henriques
Date: Thu, 4 Jan 2024 11:05:25 +0000
Subject: [PATCH] fix: some fixes to the report

---
 main.bib | 8 ++++++++
 report/report.tex | 39 ++++++++++++++++++++-------------------
 2 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/main.bib b/main.bib
index 65d071c..82257d5 100644
--- a/main.bib
+++ b/main.bib
@@ -227,3 +227,11 @@ year = 1998
 biburl = {https://dblp.org/rec/journals/corr/abs-1905-11946.bib},
 bibsource = {dblp computer science bibliography, https://dblp.org}
 }
+@misc{resnet,
+ title={Deep Residual Learning for Image Recognition},
+ author={Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
+ year={2015},
+ eprint={1512.03385},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV}
+}
diff --git a/report/report.tex b/report/report.tex
index 1ab21ea..1d84dd3 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -81,29 +81,28 @@ \end{itemize}
 \section{Literature and Technical Review}
- \subsection{Introduction}
- This section reviews current existing technologies in the market that do image classification. It also reviews current image classification technologies, and which meats the requirements for the project. This review also analysis methods that are used to distribute the learning between various machines, and how to spread the load so minimum reloading of the models is required when running the model.
+ This section reviews existing technologies in the market that do image classification. It also reviews current image classification technologies, which meet the requirements for the project. This review also analyses methods that are used to distribute the learning between various physical machines, and how to spread the load so minimum reloading of the models is required when running the model.
- \subsection{Current existing classification platforms}
+ \subsection{Existing classification platforms}
 There are currently some existing software as a service (SaaS) platforms that do provide similar services to the ones this will project will be providing.
 %Amazon provides bespoque machine learning services that if were contacted would be able to provide image classification services.
 Amazon provides general machine learning services \cite{amazon-machine-learning}.
- Amazon provides an image classification service called ''Rekognition`` \cite{amazon-rekognition}. This services provides multiple services from face recognition, celebrity recognition, object recognition and others. One of these services is called custom labels \cite{amazon-rekognition-custom-labels} which provides the most similar service, to the one this project is about. The custom labels service allows the users to provide custom datasets and labels and using AutoML the Rekognition service would generate a model that allows the users to classify images according to the generated model.
+ Amazon provides an image classification service called ``Rekognition'' \cite{amazon-rekognition}. This service provides multiple features, including face recognition, celebrity recognition and object recognition, among others. One of these features, custom labels \cite{amazon-rekognition-custom-labels}, provides the service most similar to the one this project is about. The custom labels service allows users to provide custom datasets and labels and, using AutoML, the Rekognition service generates a model that allows the users to classify images according to those labels.
- The models generated using Amazon's Rekognition don't provide ways to update the number of labels that were created without generating a new project which will involve retraining a large part of the model which would involve large downtime between being able to add new classes.
Training models also could take 30 minutes to 24 hours \cite{amazon-rekognition-custom-labels-training} which cloud result in up to 24 hours of lag between the need of creating a new label and being able to classify that label. A problem also arrises when the uses need to add more than one label at the same time, for example, the user sees the need to create a new label and starts a new model training, but while the model is training a new label is also needed the user now either stops the training of the new model and retrains a new one or waits until the one currently running stops and trains a new one. If new classification classes are required with frequency, this might not be the best platform to choose.
+ The models generated using Amazon's Rekognition do not provide ways to update the number of labels that were created without generating a new project, which involves retraining a large part of the model and therefore a long downtime before new classes can be added. Training a model can take from 30 minutes to 24 hours \cite{amazon-rekognition-custom-labels-training}, which could result in up to 24 hours of lag between the need for a new label and being able to classify that label. A problem also arises when the user needs to add more than one label at the same time. For example, the user sees the need to create a new label and starts training a new model, but while that model is training another new label is needed; the user must now either stop the current training and start again, or wait until the current training finishes and then train a new model. If new classification classes are required frequently, this might not be the best platform to choose.
 %https://aws.amazon.com/machine-learning/ml-use-cases/
 %https://aws.amazon.com/rekognition/image-features/
- Similarly, Google also has ''Cloud Vision API`` \cite{google-vision-api} which provides similar services to Amazon's Rekognition.
But Google's Vision API appears to be more targeted at videos than images, as indicated by their price sheet \cite{google-vision-price-sheet}. They have tag and product identifiers, where every image only has one tag or product. The product identifier system seams to work differently than the Amazon's Rekognition and worked based on K neighbouring giving the user similar products on not classification labels \cite{google-vision-product-recognizer-guide}.
+ Similarly, Google also has ``Cloud Vision API'' \cite{google-vision-api}, which provides similar services to Amazon's Rekognition. However, Google's Vision API appears to be more targeted at videos than images, as indicated by their price sheet \cite{google-vision-price-sheet}. They have tag and product identifiers, where every image only has one tag or product. The product identifier system seems to work differently from Amazon's Rekognition: it appears to be based on K-nearest neighbours, giving the user similar products rather than classification labels \cite{google-vision-product-recognizer-guide}.
- This method is more effective at allowing users to add new types of products, but as it does not give defined classes as the output the system does not give the target functionality that this project is hoping to achieve.
+ This method is more effective at allowing users to add new types of products, but as it does not give defined classes as the output, the system does not give the target functionality that this project is aiming to achieve.
 \subsection{Requirements of the Image Classification Models}
- The of the main objectives of this project are to be able to create models that can give a class given an image for any dataset. Which means that there will be no ''one solution fits all to the problem``. While the most complex way to solve a problem would most likely result in success, it might not be the most efficient way to achieve the problem this porject is trying to achieve.
+ One of the main objectives of this project is to be able to create models that can assign a class to an image for any dataset, which means that there will be no ``one solution fits all'' approach to the problem. While the most complex way to solve a problem would most likely result in success, it might not be the most efficient way to achieve what this project is trying to achieve.
 This section will analyse possible models that would obtain the best results. The models for this project have to be the most efficient as possible while resulting in the best accuracy as possible.
@@ -133,26 +132,28 @@ This section will compare the different models that did well in the image net challenge.
- AlexNet \cite{krizhevsky2012imagenet} is a deep convolution neural network that participated in the ImageNet LSVRC-2010 contest, it achieved a top-1 error rate of $37.5\%$, and a top-5 error rate of $37.5\%$, and a variant of this model participated in the ImageNet LSVRC-2012 contest and achieved top-5 error rate of $15.3\%$. The architecture of AlexNet consists of 5 convolution layers that are run separately followed by 3 dense layers, some layers are followed by Max pooling. The training the that was done using multiple GPUs, one GPU would run the part of each layer, and some layers are connected between GPUs. The model during training also contained data argumentation techniques such as label preserving data augmentation and dropout.
- While using AlexNet would probably yield desired results, it would complicate the other parts of the service. As a platform as a service, the system needs to manage the amount of resources available, and requiring to use 2 GPUs to train a model would limit the amount of resources available to the system by 2-fold.
+ AlexNet \cite{krizhevsky2012imagenet} is a deep convolutional neural network that participated in the ImageNet ILSVRC-2010 contest, where it achieved a top-1 error rate of $37.5\%$ and a top-5 error rate of $17.0\%$.
A variant of this model participated in the ImageNet ILSVRC-2012 contest and achieved a top-5 error rate of $15.3\%$. The architecture of AlexNet consists of 5 convolutional layers followed by 3 dense layers, with some layers followed by max pooling. Training was done using multiple GPUs: each GPU ran part of each layer, and only some layers were connected between GPUs. Training also used techniques to reduce overfitting, such as label-preserving data augmentation and dropout.
+ While using AlexNet would probably yield the desired results, it would complicate the other parts of the service. As a platform as a service, the system needs to manage the number of resources available, and requiring 2 GPUs to train a model would reduce the resources available to the system by a factor of two.
 % TODO talk more about this
+ ResNet \cite{resnet} is a deep convolutional neural network that participated in the ImageNet ILSVRC-2015 contest, where it achieved a top-1 error rate of $21.43\%$ and a top-5 error rate of $5.71\%$.
+ % RestNet-152 % EddicientNet
- \subsection{Efficiency of transfer learning}
+ % \subsection{Efficiency of transfer learning}
- \subsection{Creation Models}
- The models that I will be creating will be Convolutional Neural Network(CNN) \cite{lecun1989handwritten,fukushima1980neocognitron}.
- The system will be creating two types of models that cannot be expanded and models that can be expanded. For the models that can be expanded, see the section about expandable models.
- The models that cannot be expanded will use a simple convolution blocks, with a similar structure as the AlexNet \cite{krizhevsky2012imagenet} ones, as the basis for the model. The size of the model will be controlled by the size of the input image, where bigger images will generate more deep and complex models.
- The models will be created using TensorFlow \cite{tensorflow2015-whitepaper} and Keras \cite{chollet2015keras}.
These theologies are chosen since they are both robust and used in industry.
+ % \subsection{Creation Models}
+ % The models that I will be creating will be Convolutional Neural Networks (CNNs) \cite{lecun1989handwritten,fukushima1980neocognitron}.
+ % The system will be creating two types of models: models that cannot be expanded and models that can be expanded. For the models that can be expanded, see the section about expandable models.
+ % The models that cannot be expanded will use simple convolution blocks, with a similar structure to the AlexNet \cite{krizhevsky2012imagenet} ones, as the basis for the model. The size of the model will be controlled by the size of the input image, where bigger images will generate deeper and more complex models.
+ % The models will be created using TensorFlow \cite{tensorflow2015-whitepaper} and Keras \cite{chollet2015keras}. These technologies are chosen since they are both robust and used in industry.
- \subsection{Expandable Models}
- The current most used approach for expanding a CNN model is to retrain the model. This is done by, recreating an entire new model that does the new task, using the older model as a base for the new model \cite{amazon-rekognition}, or using a pretrained model as a base and training the last few layers.
+ % \subsection{Expandable Models}
+ % The current most used approach for expanding a CNN model is to retrain the model. This is done by recreating an entirely new model that does the new task, using the older model as a base for the new model \cite{amazon-rekognition}, or by using a pretrained model as a base and training the last few layers.
- There are also unsupervised learning methods that do not have a fixed number of classes. While this method would work as an expandable model method, it would not work for the purpose of this project. This project requires that the model has a specific set of labels which does not work with unsupervised learning which has unlabelled data.
Some technics that are used for unsupervised learning might be useful in the process of creating expandable models.
+ % There are also unsupervised learning methods that do not have a fixed number of classes. While this method would work as an expandable model method, it would not work for the purpose of this project. This project requires that the model has a specific set of labels, which does not work with unsupervised learning, which uses unlabelled data. Some techniques that are used for unsupervised learning might be useful in the process of creating expandable models.
 \section{Problem analysis \& design choices}