chore: more work on report

This commit is contained in:
Andre Henriques 2024-02-27 10:05:17 +00:00
parent d65dfcda69
commit 8d6ffce44d
2 changed files with 47 additions and 12 deletions

View File

@ -270,3 +270,12 @@ year = 1998
keywords={Manifolds;Neural networks;Computer architecture;Standards;Computational modeling;Task analysis},
doi={10.1109/CVPR.2018.00474}
}
@article{json-api-usage-stats,
author = {Hnatyuk, Kolya},
title = {{130+ API Statistics: Usage, Growth {\&} Security}},
journal = {MarketSplash},
year = {2023},
month = oct,
publisher = {MarketSplash},
url = {https://marketsplash.com/api-statistics}
}

View File

@ -89,9 +89,9 @@
%Amazon provides bespoque machine learning services that if were contacted would be able to provide image classification services. Amazon provides general machine learning services \cite{amazon-machine-learning}.
Amazon provides an image classification service called ``Rekognition'' \cite{amazon-rekognition}. This service provides multiple services from face recognition, celebrity recognition, object recognition and others. One of these services is called custom labels \cite{amazon-rekognition-custom-labels} that provides the most similar service, to the one this project is about. The custom labels service allows the users to provide custom datasets and labels and using AutoML the Rekognition service would generate a model that allows the users to classify images according to the generated model.
Amazon provides an image classification service called ``Rekognition'' \cite{amazon-rekognition}. This service provides multiple services from face recognition, celebrity recognition, object recognition and others. One of these services is called custom labels \cite{amazon-rekognition-custom-labels} that provide the most similar service, to the one this project is about. The custom labels service allows the users to provide custom datasets and labels and using AutoML the Rekognition service would generate a model that allows the users to classify images according to the generated model.
The models generated using Amazon's Rekognition do not provide ways to update the number of labels that were created, without generating a new project. This will involve retraining a large part of the model, which would involve large downtime between being able to add new classes. Training models also could take 30 minutes to 24 hours \cite{amazon-rekognition-custom-labels-training}, which could result in up to 24 hours of lag between the need of creating a new label and being able to classify that label. A problem also arises when the uses need to add more than one label at the same time. For example, the user sees the need to create a new label and starts a new model training, but while the model is training a new label is also needed. The user now either stops the training of the new model and retrains a new one, or waits until the one currently running stops and trains a new one. If new classification classes are required with frequency, this might not be the best platform to choose.
The models generated using Amazon's Rekognition do not provide ways to update the number of labels that were created, without generating a new project. This will involve retraining a large part of the model, which would involve large downtime between being able to add new classes. Training models also could take 30 minutes to 24 hours, \cite{amazon-rekognition-custom-labels-training}, which could result in up to 24 hours of lag between the need of creating a new label and being able to classify that label. A problem also arises when the uses need to add more than one label at the same time. For example, the user sees the need to create a new label and starts a new model training, but while the model is training a new label is also needed. The user now either stops the training of the new model and retrains a new one, or waits until the one currently running stops and trains a new one. If new classification classes are required with frequency, this might not be the best platform to choose.
%https://aws.amazon.com/machine-learning/ml-use-cases/
@ -103,12 +103,12 @@
\subsection{Requirements of Image Classification Models}
The of the main objectives of this project are to be able to create models that can give a class given an image for any dataset. Which means that there will be no ``one solution fits all to the problem''. While the most complex way to solve a problem would most likely result in success, it might not be the most efficient way to achieve the problem this project is trying to achieve.
The of the main objectives of this project are to be able to create models that can give a class given an image for any dataset. Which means that there will be no ``one solution fits all to the problem''. While the most complex way to solve a problem would most likely result in success, it might not be the most efficient way to achieve the results.
This section will analyse possible models that would obtain the best results. The models for this project have to be the most efficient as possible while resulting in the best accuracy as possible.
A classical example is the MNIST Dataset \cite{mnist}. Models for the classification of the MNIST dataset can be both simple or extremely complex and achieve different levels of complexity.
For example, in \cite{mist-high-accuracy} an accuracy $99.91\%$, by combining 3 Convolutional Neural Networks (CNNs), with different kernel sizes and by changing hyperparameters, augmenting the data, and in \cite{lecun-98} an accuracy of $95\%$ was achieved using a 2 layer neural network with 300 hidden nodes. Both these models achieve the accuracy that is required for this project, but \cite{mist-high-accuracy} is more expensive to run. When deciding when to choose what models they create, the system should choose to create the model that can achieve the required accuracy while taking the leas amount of effort to train.
For example, in \cite{mist-high-accuracy} an accuracy $99.91\%$, by combining 3 Convolutional Neural Networks (CNNs), with different kernel sizes and by changing hyperparameters, augmenting the data, and in \cite{lecun-98} an accuracy of $95\%$ was achieved using a 2 layer neural network with 300 hidden nodes. Both these models achieve the accuracy that is required for this project, but \cite{mist-high-accuracy} are more computational intensive to run. When deciding when to choose what models they create, the system should choose to create the model that can achieve the required accuracy while taking the leas amount of effort to train.
% TODO fix the inglish in these sentance
The models for this system to work as indented should be as small as possible while obtaining the required accuracy required to achieve the task of classification of the classes.
@ -141,7 +141,7 @@
While using AlexNet would probably yield desired results, it would complicate the other parts of the service. As a platform as a service, the system needs to manage the number of resources available, and requiring to use 2 GPUs to train a model would limit the number of resources available to the system by 2-fold.
% TODO talk more about this
ResNet \cite{resnet} is a deep convolution neural network that participated in the ImageNet ILSVRC-2015 contest, it achieved a top-1 error rate of $21.43\%$ and a top-5 error rate of $5.71\%$. ResNet was created to solve a problem, the problem of degradation of training accuracy when using deeper models. Close to the release of the ResNet paper, there was evidence that deeper networks result in higher accuracy results \cite{going-deeper-with-convolutions, very-deep-convolution-networks-for-large-scale-image-recognition}. But the increasing the depth of the network resulted in training accuracy degradation.
ResNet \cite{resnet} is a deep convolution neural network that participated in the ImageNet ILSVRC-2015 contest, it achieved a top-1 error rate of $21.43\%$ and a top-5 error rate of $5.71\%$. ResNet was created to solve a problem, the problem of degradation of training accuracy when using deeper models. Close to the release of the ResNet paper, there was evidence that deeper networks result in higher accuracy results, \cite{going-deeper-with-convolutions, very-deep-convolution-networks-for-large-scale-image-recognition}. but the increasing the depth of the network resulted in training accuracy degradation.
% This needs some work in terms of gramar
ResNet works by creating shortcuts between sets of layers, the shortcuts allow residual values from previous layers to be used on the upper layers. The hypothesis being that it is easier to optimize the residual mappings than the linear mappings.
The results proved that the using the residual values improved training of the model, as the results of the challenge prove.
@ -150,10 +150,10 @@
% MobileNet
% EfficientNet
EfficientNet \cite{efficient-net} is deep convulution neural network that was able to achieve $84.3\%$ top-1 accuracy while ``$8.4$x smaller and $6.1$x faster on inference thatn the best existing ConvNet''. EfficientNets\footnote{the family of models that use the thecniques that described in \cite{efficient-net}} are models that instead of the of just increasing the depth or the width of the model, we increase all the parameters at the same time by a constant value. By not scaling only depth EfficientNets are able to aquire more information about the images specialy the image size is taken into acount.
EfficientNet \cite{efficient-net} is a deep convolution neural network that was able to achieve $84.3\%$ top-1 accuracy while ``$8.4x$ smaller and $6.1x$ faster on inference than the best existing ConvNet''. EfficientNets\footnote{the family of models that use the thecniques that described in \cite{efficient-net}} are models that instead of the of just increasing the depth or the width of the model, we increase all the parameters at the same time by a constant value. By not scaling only depth EfficientNets can acquire more information about the images specially the image size is considered.
To test their results the EfficientNet team created a baseline model which as a building block used the mobile inverted bottleneck MBConv \cite{inverted-bottleneck-mobilenet}. The baseline model was then scaled using the compound method which resulted in better top-1 and top-5 accuracy.
While EfficientNets are smaller than their non-EfficientNet conterparts they are more computational intencive, a ResNet-50 scaled using the EfficientNet compound scaling method is $3\%$ more computational intencive then a ResNet-50 scaled using only depth while only improving the top-1 accuracy by $0.7\%$, and as the model will be trained and run multiple times decreasing the computational cost might be a better overall target for sustainability then being able to offer higher accuracies.
Eventhough scaling using the EfficientNet compound method might not yield the best results using some of the EfficientNets what were obtimized by the team to would be optimal, for example, EfficientNet-B1 is both small and efficient while still obtaining $79.1\%$ top-1 accuracy in ImageNet, and realistically the datasets that this system will process will be smaller and more scope specific then ImageNet.
While EfficientNets are smaller than their non-EfficientNet counterparts they are more computational intensive, a ResNet-50 scaled using the EfficientNet compound scaling method is $3\%$ more computational intensive than a ResNet-50 scaled using only depth while only improving the top-1 accuracy by $0.7\%$, and as the model will be trained and run multiple times decreasing the computational cost might be a better overall target for sustainability then being able to offer higher accuracies.
Even though scaling using the EfficientNet compound method might not yield the best results using some EfficientNets what were optimized by the team to would be optimal, for example, EfficientNet-B1 is both small and efficient while still obtaining $79.1\%$ top-1 accuracy in ImageNet, and realistically the datasets that this system will process will be smaller and more scope specific than ImageNet.
% \subsection{Efficiency of transfer learning}
@ -169,10 +169,38 @@
% There are also unsupervised learning methods that do not have a fixed number of classes. While this method would work as an expandable model method, it would not work for the purpose of this project. This project requires that the model has a specific set of labels which does not work with unsupervised learning which has unlabelled data. Some technics that are used for unsupervised learning might be useful in the process of creating expandable models.
\pagebreak
\section{Problem Analysis}
\section{System Analysis}
\subsection{Introduction}
Understanding the project that is being built is critical in the software deployment process, this section will look into the required parts for the project to work.
As a SaaS project, there are some required parts that the project needs to have:
\begin{itemize}
\item{Web platform}
\item{JSON API}
\item{Server Management}
\item{Dataset Management}
\item{Model Management}
\end{itemize}
\subsection{Web platform}
The web app is where users manage models, and data. The user will access the web app and configure the model, and manage that data set.
\subsection{JSON API}
A big part of a software as a service is the ability to communicate with other services, nowadays, the way that systems communicate with each other is using mostly JSON and Rest APIs \cite{json-api-usage-stats}. Since the system will need to be communicated with other services to work as intended.
\subsection{Server Management}
Since AI training is notoriously expensive the system cannot run on one server alone, as this would put too much strain in that server.
The system needs to be able to distribute the load between the multiple servers.
For that reason, the service needs to both be able to send training and prediction jobs to servers that have the resources to train models or predict classes from images.
The system has to be able to choose the servers to run the models in an optimized way.
For example, when training, send training jobs to the same server to prevent the server from having to reload the data again.
\pagebreak
\section{Design Choices}
\subsection{Structure of the Service}
The system needs to manage:
The system has to manage:
\begin{itemize}
\item{User data}
@ -219,8 +247,6 @@
The TP when training the model decides when the training is finished, this could be when the training time has finished or if the model accuracy is not substantially increasing within the last training rounds.
During the training process the TP needs to also cache the dataset being used, this is because to create one model, the system might try to generate more than on model and match the best of the generated models with
\pagebreak
\section{Design Choices}