ABSTRACT
Because the exact nature of a time series is difficult to assess, it is often difficult to forecast it accurately. Many forecasting models have been developed in the literature, but they have achieved only limited accuracy in forecasting the Bitcoin price. This study forecasts the Bitcoin price time series with improved efficiency using long short-term memory (LSTM) networks and compares its predictability with the traditional ARIMA method. Machine learning models have been shown to perform better on FPGA than on a GPU. The model could be implemented in a practical or real-time setting to predict the future, as opposed to learning what has already happened. In addition, the ability to predict using streaming data should improve the model. Sliding window validation is an approach not implemented here, but it could be explored in future work.
A comparative analysis of the proposed algorithm and benchmark algorithms is carried out based on experiments.
INTRODUCTION:
To predict the future has always been a goal or dream of humanity; however, humans have always been terrible at it1. Predicting the price movements of a cryptocurrency such as Bitcoin is relatively similar to predicting stocks or whether the price of USD will go up or down. However, unlike a company with physical buildings and people working in it, Bitcoin is purely digital, and it is therefore very difficult to understand when and why its price will change. With no physical entity, and given the way Bitcoin works, most of the effect on the price is based on how the world feels about it. For this reason, understanding people is one way to help predict Bitcoin's future. A sentiment analysis of different kinds of social media can give a better understanding of how people feel about Bitcoin. However, humans are terrible at predicting, even with information at hand. For this reason, a machine learning algorithm is used to try to predict the future price based on the information given to it.
Prediction of mature financial markets such as the stock market has been researched at length. Bitcoin presents an interesting parallel to this, as it is a time series prediction problem in a market still in its transient stage. Traditional time series prediction methods such as Holt-Winters exponential smoothing models rely on linear assumptions and require data that can be broken down into trend, seasonal and noise components to be effective. This type of methodology is more suitable for a task such as forecasting sales, where seasonal effects are present. Due to the lack of seasonality in the Bitcoin market and its high volatility, these methods are not very effective for this task. Given the complexity of the task, deep learning makes for an interesting technological solution based on its performance in similar areas. The recurrent neural network (RNN) and the long short-term memory (LSTM) network are favoured over the traditional multilayer perceptron (MLP) because they can model the temporal structure of the data.
AIM & OBJECTIVES:
The aim of this project is to investigate with what accuracy the price of Bitcoin can be predicted
using machine learning and compare parallelisation methods executed on multi-core and GPU
environments.
To carry out a comparative analysis, based on experiments, of the proposed algorithm against benchmark algorithms.
CHALLENGES
The overall history of the Bitcoin price has been a strong uptrend, so most algorithms tend to …
The goal is to be able to predict whether the Bitcoin price will increase or decrease, and by how much; however, this goal can be accomplished in many ways. How often does the model need to predict? Whether it predicts every minute, every 10 minutes or hourly determines the time frame over which the data set is collected. If predictions are made every 10 minutes, then all social media posts in a 10-minute time frame will be used to predict; if predictions are hourly, then one hour's worth of data will be collected, and so on. In all, the prediction frequency depends on the amount of data gathered in a day: the more data gathered, the smaller the time frame can be. However, one study concluded that the shorter the time frame, the more influence the sentiment value had, which should also be taken into consideration. Then again, news and social network messages take time to spread. A longer time frame would help with that, since messages and news have to be written, and when they reach people, they also have to be opened and read.
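To make the time-frame choice concrete, the following is a minimal Python sketch that buckets timestamped posts into fixed windows (for example 10 or 60 minutes) and averages a sentiment score per window. The toy lexicon, the scorer and the function names are hypothetical illustrations, not the sentiment method used in this project.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical toy lexicon for illustration only; a real system would use a
# proper sentiment model or lexicon.
LEXICON = {"moon": 1, "bullish": 1, "buy": 1, "crash": -1, "bearish": -1, "sell": -1}
EPOCH = datetime(1970, 1, 1)


def score(text):
    """Toy sentiment score: sum of the lexicon weights of the words in a post."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())


def window_start(ts, minutes):
    """Floor a naive timestamp to the start of its fixed-length window."""
    elapsed = int((ts - EPOCH).total_seconds() // 60)
    return EPOCH + timedelta(minutes=elapsed - elapsed % minutes)


def aggregate(posts, minutes):
    """Mean sentiment per window; posts is an iterable of (datetime, text) pairs."""
    buckets = defaultdict(list)
    for ts, text in posts:
        buckets[window_start(ts, minutes)].append(score(text))
    return {start: sum(s) / len(s) for start, s in sorted(buckets.items())}
```

With a 10-minute window, two posts at 12:03 and 12:08 fall into the same bucket, while a post at 12:15 starts a new one; with a 60-minute window all three are averaged together, which smooths out short-lived reactions.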
CHAPTER 2
SYSTEM ANALYSIS
EXISTING SYSTEM
There are many approaches people have taken to tackle intra-day and high-speed trading. Just as in stock, forex, commodity and options trading, these are coupled with highly efficient bots and systems that perform the trades automatically. Humans are a limiting factor in this situation. Bitcoin presents an interesting parallel to this, as it is a time series prediction problem in a market still in its transient stage. Traditional time series prediction methods such as Holt-Winters exponential smoothing models rely on linear assumptions and require data that can be broken down into trend, seasonal and noise components to be effective. This type of methodology is more suitable for a task such as forecasting sales, where seasonal effects are present. Due to the lack of seasonality in the Bitcoin market and its high volatility, these methods are not very effective for this task.
If the offline wallet is lost, the coins are lost forever.
Coins held online can be hacked; once hacked, the coins are lost for good.
EXISTING ALGORITHM
The existing algorithm focuses only on the closing price of Bitcoin to develop the predictive model. It does not take into account other economic factors such as news about Bitcoin, government policies and market sentiment, which could be the future scope of the project to predict the price with much more accuracy. The prediction is limited to past data; the ability to predict on streaming data would improve the performance and predictability of the model. The study involves only a comparison between ARIMA and LSTM, and comparing more machine learning models would help confirm the result. The model developed using LSTM is more accurate than the traditional models, which shows that a deep learning model, in our case LSTM (Long Short-Term Memory), learns the training data more effectively than ARIMA, with the LSTM being more capable of recognizing longer-term dependencies. The study uses the daily price fluctuations of Bitcoin, which motivates further investigation in future of the …
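As a hedged illustration of a closing-price-only baseline, the sketch below fits an AR(1) model, p[t] = c + phi * p[t-1] (equivalent to ARIMA(1,0,0) with a constant), by ordinary least squares and scores forecasts with the root mean squared error. This simplified stand-in is not the ARIMA configuration of the original study; a full ARIMA model would normally be fitted with a statistics library. The function names are hypothetical.

```python
def fit_ar1(prices):
    """Least-squares fit of p[t] = c + phi * p[t-1] using closing prices only."""
    x, y = prices[:-1], prices[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    return c, phi


def forecast_next(prices, c, phi):
    """One-step-ahead forecast from the last observed closing price."""
    return c + phi * prices[-1]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5
```

The same `rmse` function can score an LSTM's forecasts on the same test days, so the two models are compared on one metric.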
Advantages
number of transactions
closeness centrality
The transactions are decentralized. Unlike the centralized transactions associated with cash and card payments, Bitcoin transactions are carried out via decentralized private …
Proposed Algorithm
BITCOIN CRYPTOCURRENCY
The first decentralized cryptocurrency was created in 2009 and is called Bitcoin2. In 2010 the price of one bitcoin never rose above one dollar3, but in May 2018 Bitcoin was the cryptocurrency with the highest market cap, around 149 billion dollars, with a little more than 17 million bitcoins in circulation4. This makes one bitcoin worth around 8700 dollars (market cap / circulation).
Bitcoin is a digital currency that can be used to buy or trade things electronically; what makes it different from normal currency is the decentralization of the system through blockchain technology5. Being decentralized, it has no single entity controlling the network; instead, it is maintained by a group of volunteers running it on different computers around the world. A physical currency such as USD can have an unlimited supply if the government decides to print more, which can change the value of the currency relative to others. In contrast, the growth of the Bitcoin supply is strictly governed by an algorithm, which allows only a few bitcoins to be created every hour by miners until a total cap of 21 million bitcoins is reached6.
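The 21 million cap follows from the issuance schedule rather than from a stored constant. A minimal Python sketch, assuming the commonly documented rules: an initial block subsidy of 50 BTC, halving every 210,000 blocks, amounts counted in integer satoshis, and roughly six blocks per hour. The function names are illustrative.

```python
SATOSHI_PER_BTC = 100_000_000
HALVING_INTERVAL = 210_000  # blocks between subsidy halvings


def total_supply_satoshi():
    """Sum the block subsidy over every halving era until it reaches zero."""
    subsidy = 50 * SATOSHI_PER_BTC
    total = 0
    while subsidy > 0:
        total += HALVING_INTERVAL * subsidy
        subsidy //= 2  # integer halving: amounts are whole satoshis
    return total


def coins_per_hour(subsidy_btc, blocks_per_hour=6):
    """Approximate new coins per hour; blocks arrive about every 10 minutes."""
    return subsidy_btc * blocks_per_hour
```

Summing the schedule gives 2,099,999,997,690,000 satoshis, slightly under 21 million BTC, because each halving rounds down to a whole satoshi.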
In theory, the decentralization of the system allows for anonymity, since there is no bank or centralized entity to confirm and validate transactions. In practice, however, each user takes part in a transaction through an identifier known as an address. These addresses are not associated with a name, but because transactions must be transparent for the decentralized system to work, anyone can see them. A disadvantage of the system is that, in case of losing the address, forgetting it or …
Web sites – To a degree, any website that has information related to Bitcoin can be included here, but the most important types of sites would be major media outlets and the government sites of different …
Internet forums – Bitcointalk.org was created by one of the founders of Bitcoin and is one of the biggest forums on Bitcoin; however, many smaller ones exist as well, which dedicate themselves to news and information on Bitcoin. These sites are almost certain to have users who are interested in and understand Bitcoin, which means most posts will be made by educated …
Price collection – To obtain Bitcoin prices, there are sites such as Bitcoin exchanges where it is possible to buy or sell bitcoins. Since every site has a different price for Bitcoin, it is not possible to guarantee that the biggest site is the one with the cheapest price. According to article27, the top 3 most recommended are Coinbase, for being the biggest, Gemini Exchange, for low fees, and Changelly, since it offers lesser-known cryptocurrencies.
These social platforms and other data sources will be examined and analyzed to see which fit best for gathering data to predict Bitcoin prices. Although each platform has its own demographic, many of them overlap, which could be a problem if multiple platforms were used, giving duplicate data rather than more data. Facebook has the potential to give the most data; however, since its main demographic probably does not know about or use Bitcoin, much of that data may not influence the Bitcoin price. Twitter and Reddit, on the other hand, might offer less data, but the more specialized knowledge of their users might lead to some changes in the Bitcoin price. Telegram is the most extreme of them all, with the least …
SYSTEM STUDY
FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, a feasibility study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential. Three key considerations involved in the feasibility analysis are:
ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY
ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on the organization. The amount of funding that the company can put into the research and development of the system is limited, so the expenditures must be justified. The developed system is well within the budget, and this was achieved because most of the technologies used …
TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.
SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users depends solely on the methods employed to educate users about the system and to make them familiar with it. Their level of confidence must be raised so that they are also able to make some …
Deep learning, a class of machine learning techniques used to extract features from data, and the CNN (Convolutional Neural Network), a type of artificial neural network that is extended across space using shared weights, have been found suitable for computer vision tasks. At first, researchers experimented with small datasets. With the falling cost of processing hardware, increasing chip processing capabilities and the growing amount of data available online, it became possible to train deep neural networks on larger data sets and on real-life data sets as well. Deep learning (also known as deep structured learning) architectures such as deep neural networks, deep belief networks and recurrent neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design and board game programs, where they have produced results comparable to, and in some cases superior to, human experts. Deep learning models are vaguely inspired by information processing and communication patterns in biological nervous systems, yet they have various differences from the structural and functional properties of biological brains (especially the human brain), which make them incompatible with neuroscience evidence.
In the past, deep learning did not grow much because of hardware constraints. Nowadays, however, there is a great deal of data to learn from, GPU parallel processing can handle large amounts of information in real time, and hardware performance is better than before, so deep learning can be applied to various fields such as computer vision and image recognition. The deep learning models that are mainly studied are the CNN (Convolutional Neural Network) and the RNN (Recurrent Neural Network). The CNN is an excellent way to extract high-level abstract features from images or to process texture information, and it proved its strength in object recognition in 2012. CNNs have been applied to various fields, such as video and speech recognition, in an effort to represent and learn large amounts of data in a meaningful form. The RNN is a kind of artificial neural network in which the value of the hidden layer is stored and fed back into the network together with the next input, which makes it well suited to modeling time series information. Traditional RNNs have difficulty retaining information over long periods, and Long Short-Term Memory (LSTM) was introduced to address this long-term memory problem. The LSTM RNN is therefore a deep learning model that addresses the vanishing gradient problem of the plain RNN model. A support vector machine (SVM) finds an optimal hyperplane that separates two classes of data in a feature space. A CRF (Conditional Random Field) determines the current state by referring not just to a single position in the sequence but to the previous and subsequent states. Related sequence models include the HMM (Hidden Markov Model) and the MEMM (Maximum Entropy Markov Model). The HMM is built on the strong independence assumption of the Markov property. This makes it easy to model real problems, but it assumes that each observation depends only on the current hidden state. The model that addresses this limitation is the maximum entropy Markov model (MEMM).
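The recurrence that makes the RNN suited to time series, with the hidden state fed back alongside the next input, can be sketched in a few lines of pure Python. This is the Elman update h_t = tanh(W_x x_t + W_h h_(t-1) + b); the weight names and shapes are illustrative assumptions, and in a real model the weights would be learned by backpropagation through time.

```python
import math


def rnn_step(x, h_prev, W_x, W_h, b):
    """One Elman RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_x[i], x))
            + sum(w * hj for w, hj in zip(W_h[i], h_prev))
            + b[i]
        )
        for i in range(len(b))
    ]


def run_rnn(sequence, W_x, W_h, b):
    """Feed a sequence through the RNN, carrying the hidden state forward."""
    h = [0.0] * len(b)
    states = []
    for x in sequence:
        h = rnn_step(x, h, W_x, W_h, b)
        states.append(h)
    return states
```

Even when the second input is zero, the second hidden state is non-zero, because part of the previous state is carried forward through `W_h`.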
Decision trees use branches to represent observations and leaves to represent target values. Decision trees have been used in past studies of vulnerability testing. They are capable of modeling non-linear data, which is the case in our current study, and they have also proven to work well with data that has outliers. Experts, however, state that decision trees are prone to overfitting. While there are techniques to prevent this, a better tree-based algorithm can be used as a replacement. One such algorithm is the random forest, an ensemble technique that generates a series of decision trees and then averages their predictions. The final output is strengthened by combining the generated trees to make a better prediction or classification. Random forest has been proven to be a successful vulnerability prediction model in …
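Because random forests are also used for forecasting in this report, a compact from-scratch sketch of a random forest regressor may clarify the idea. Each tree is grown on a bootstrap sample, each split considers a random subset of features, and the forest averages the trees' predictions. The function names, tree depth and number of trees are illustrative assumptions; in practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used.

```python
import random
from statistics import mean


def _build(X, y, depth, max_depth, n_feats, rng):
    # Leaf: stop at max depth or when all targets are equal; predict the mean.
    if depth >= max_depth or len(set(y)) == 1:
        return mean(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):  # random feature subset
        for t in sorted({row[f] for row in X})[:-1]:  # keeps both sides non-empty
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:  # no usable split among the sampled features
        return mean(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            _build([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_feats, rng),
            _build([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_feats, rng))


def _predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right); leaves are numbers.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, n_feats=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(_build([X[i] for i in idx], [y[i] for i in idx],
                             0, max_depth, n_feats, rng))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return mean(_predict_tree(tree, x) for tree in forest)
```

Averaging many trees, each trained on different resampled data, reduces the variance that makes a single deep tree overfit.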
Deep learning is one of the recent innovations in artificial intelligence. It extends the power of neural models by using more hidden layers. Typical neural networks use shallow learning algorithms, which are less complicated. Deep learning, however, strengthens the neural network by adding multiple layers between the input and the output. The deep feed-forward network, one of the deep learning techniques, is the technique chosen for our study. In a deep feed-forward network, the goal is to approximate some function, for example one that maps an input to a category. Information flows from the input, through the intermediate computations that define the function, and finally to the output, with no feedback connections. The feed-forward network forms the basis of other deep learning networks such as convolutional networks, which are considered a particular type of feed-forward network. While deep learning is said to be data-intensive, it can also be fine-tuned to work on smaller datasets, as we have done in our study. We acknowledge, however, that we still have not fully exploited the capabilities of deep learning. Due to the number of hyperparameters deep learning has, plus the presence of other deep learning techniques, the research …
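The one-directional flow of a feed-forward network can be shown with a minimal forward pass through two hidden layers. The layer sizes, weights and ReLU activation below are illustrative assumptions, and training by backpropagation is omitted.

```python
def dense(x, W, b, activation):
    """Fully connected layer: activation(W x + b), computed row by row."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def feed_forward(x, layers):
    """Information flows one way: input -> hidden layers -> output, no feedback."""
    for W, b, activation in layers:
        x = dense(x, W, b, activation)
    return x
```

For example, with two ReLU hidden layers and a linear output layer, each layer only ever reads the output of the layer before it, which is what distinguishes this network from the RNN.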
One of the most difficult tasks during this research is to understand which information is just noise (doing more harm to the prediction than if it were left out) and which information has a large impact on the Bitcoin price. With the research focused on sentiment data, the first part concerns human psychology: why a sentiment score might be valuable information rather than noise, and where to gather such data. This information can be gathered from different social network platforms, but which platform to use depends on the people the sentiment score targets. Sentiment data can also be gathered from global news, which does not necessarily affect the whole world but has a big enough impact for people to notice. Lastly, there is the technical data, such as the price of Bitcoin or how many were sold.
In economics, it is believed that psychology affects investors' decisions, a phenomenon also known as animal spirits36. This means that the negative and positive emotions of investors can influence the price. This has been confirmed: happy investors are more likely to take more risk, making the Bitcoin price increase because they are more likely to buy it. Related, but not quite the same, is media coverage of Bitcoin: with positive coverage, investors can be more comfortable holding their bitcoins, which results in the price increasing; the opposite is also true, as with bad publicity they might sell their bitcoins for fear of a drop, causing the price to drop. Positive and negative media news and personal emotion are related but not the same, as the data are collected in different ways (read more in 3.4). Another way to look at it is that a person has their own emotions, but these can be affected or changed by good or bad news. With the internet today, news of people's feelings can spread extremely quickly through social networks (read more in 3.4) such as Twitter, Reddit and Facebook. This means that not only does news on a national or global scale have a big effect on people, but friends or …
SYSTEM SPECIFICATION
HARDWARE REQUIREMENTS:
RAM : 2 GB.
SOFTWARE REQUIREMENTS:
Tools : Simulink
SOFTWARE REQUIREMENT SPECIFICATION
MATLAB
MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and
fourth-generation programming language. A proprietary programming language developed by
MathWorks, MATLAB allows matrix manipulations, plotting of functions and data,
implementation of algorithms, creation of user interfaces, and interfacing with programs written
in other languages, including C, C++, C#, Java, Fortran and Python.
Although MATLAB is intended primarily for numerical computing, an optional toolbox
uses the MuPAD symbolic engine, allowing access to symbolic computing abilities. An
additional package, Simulink, adds graphical multi-domain simulation and model-based design
for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia.
MATLAB users come from various backgrounds of engineering, science, and economics.
HISTORY
Cleve Moler, the chairman of the computer science department at the University of New
Mexico, started developing MATLAB in the late 1970s. He designed it to give his students
access to LINPACK and EISPACK without their having to learn Fortran. It soon spread to other
universities and found a strong audience within the applied mathematics community. Jack Little,
an engineer, was exposed to it during a visit Moler made to Stanford University in 1983.
Recognizing its commercial potential, he joined with Moler and Steve Bangert. They rewrote
MATLAB in C and founded MathWorks in 1984 to continue its development. These rewritten
libraries were known as JACKPAC. In 2000, MATLAB was rewritten to use a newer set of
libraries for matrix manipulation, LAPACK.
Anonymous Functions
An anonymous function is like an inline function in traditional programming languages,
defined within a single MATLAB statement. It consists of a single MATLAB expression
and any number of input and output arguments.
You can define an anonymous function right at the MATLAB command line or within a
function or script.
This way you can create simple functions without having to create a file for them.
Private Functions
A private function is a primary function that is visible only to a limited group of other
functions. If you do not want to expose the implementation of a function(s), you can
create them as private functions.
Private functions reside in subfolders with the special name private.
They are visible only to functions in the parent folder.
Global Variables
Global variables can be shared by more than one function. For this, you need to declare
the variable as global in all the functions.
If you want to access that variable from the base workspace, then declare the variable at
the command line.
The global declaration must occur before the variable is actually used in a function. It is a
good practice to use capital letters for the names of global variables to distinguish them
from other variables.
Features of MATLAB
It is a high-level language for numerical computation, visualization and application
development.
It also provides an interactive environment for iterative exploration, design and problem
solving.
It provides a vast library of mathematical functions for linear algebra, statistics, Fourier
analysis, filtering, optimization, numerical integration and solving ordinary differential
equations.
It provides built-in graphics for visualizing data and tools for creating custom plots.
MATLAB's programming interface provides development tools for improving code quality and
maintainability and maximizing performance.
It provides tools for building applications with custom graphical interfaces.
It provides functions for integrating MATLAB based algorithms with external
applications and languages such as C, Java, .NET and Microsoft Excel.
Chapter 5
System Design
This chapter describes the overall and the detailed architectural design. It …
2. Use the naïve forecasting method to predict the next value, T2, by shifting T1 forward as the predicted value.
4. Transform the log data from T1 – T24 (2-hour data collected) into mean, min and max values.
5. Create a sliding window based on the T12 threshold value (a window size of 12 was used to convert 5 …
7. Train and test the random forest on TransData to forecast the predicted mean, min and max values.
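The naïve forecast in step 2 and the sliding-window summary in steps 4 and 5 can be sketched in Python as follows. The window size of 12 follows step 5; the function names and the choice of pairing each window with the next reading as its target are illustrative assumptions, and the sketch does not reconstruct the other steps.

```python
from statistics import mean


def naive_forecast(series):
    """Naïve method: the forecast for t+1 is the observed value at t.
    Returns the predictions for series[1:]."""
    return series[:-1]


def window_features(series, size=12):
    """Summarize each sliding window of `size` readings as (mean, min, max)."""
    return [
        (mean(series[i:i + size]), min(series[i:i + size]), max(series[i:i + size]))
        for i in range(len(series) - size + 1)
    ]


def supervised_pairs(series, size=12):
    """Pair each window's (mean, min, max) with the reading that follows it."""
    feats = window_features(series, size)
    return [(feats[i], series[i + size]) for i in range(len(series) - size)]
```

Pairs of this form could be passed to a random forest, as in step 7, with the naïve forecast serving as the baseline the forest has to beat.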
Algorithm Implementation
Fig.No.4.1.1 System Architecture Diagram
UML provides comprehensive notation for the full life cycle of object-oriented development.
ADVANTAGES
To take into account the scaling factors that are inherent to complex and critical systems.
SYSTEM
Use case diagrams give an overview of the usage requirements for a system. They are … development, you will find that use cases provide significantly more value because they describe “the meat” of the actual requirements. A use case describes a …
Fig.No.4.2.2 Sequence diagram for user
A class diagram is a type of static structure diagram that describes the structure of a system by showing the system’s classes, their attributes, operations (or methods) and the relationships among objects. The classes in a class diagram represent both the main elements, interactions in the … activities and actions with support for choice, iteration and concurrency. The activity diagram can be used to describe the business and operational step-by-step …
SYSTEM IMPLEMENTATION
Modules Description
Dataset
Before we build the model, we need to obtain some data for it. There is a dataset on the UCI Machine Learning Repository that details minute-by-minute Bitcoin prices (plus some other factors) for the last few years. Over this timescale, noise could overwhelm the signal, so we opt for daily prices. The issue here is that we may not have sufficient data (we will have hundreds of rows rather than thousands or millions), and in deep learning, no model can overcome a severe lack of data. We also do not want to rely on static files, as that would complicate the process of updating the model in the future with new data. Instead, we aim to pull data from websites and APIs.
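Moving from minute-level to daily prices is a simple aggregation. A minimal sketch, assuming the minute data arrive as time-ordered (timestamp, price) pairs; the function name and input format are illustrative, and fetching the data from any particular website or API is not shown.

```python
from datetime import datetime


def to_daily(ticks):
    """Aggregate time-ordered (datetime, price) ticks into daily OHLC bars."""
    days = {}  # insertion-ordered, so days stay in chronological order
    for ts, price in ticks:
        day = ts.date()
        if day not in days:
            days[day] = {"open": price, "high": price, "low": price, "close": price}
        else:
            bar = days[day]
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last tick of the day so far
    return days
```

The daily `close` values are the series a closing-price model would use; `high` and `low` keep some of the intra-day volatility that would otherwise be discarded.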
Rule-based features
Human experts with years of experience created many rules to detect whether a user is fraudulent or not. An example of such a rule is “bitcoin”, i.e. whether the user has previously been detected or reported in connection with coin prediction. Each rule can be regarded as a binary feature that indicates the likelihood of coin price prediction.
Selective labeling
If the coin price score is above a certain threshold, the case enters a queue for further investigation by human experts. Once it is reviewed, the final result is labeled as a Boolean, i.e. coin or clean. Cases with higher scores have higher priority in the review queue. Cases whose price score is below the threshold are determined to be clean by the system without any human judgment. Once a case is labeled as fraud by human experts, it is very likely that the seller is not trustworthy and may also be selling other items; hence all the items submitted by the same seller are labeled as fraud too. The Bitcoin seller, along with his or her cases, is removed from the website immediately once detected.
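The threshold-and-queue procedure described above can be sketched with a priority queue. In this minimal Python sketch the tuple layout, the label strings and the function names are hypothetical, and the human review itself is represented only by its outcome: a set of sellers confirmed as fraudulent.

```python
import heapq


def triage(cases, threshold):
    """Split scored cases. cases: list of (case_id, seller, score).
    Below the threshold -> labeled 'clean' automatically; otherwise -> review queue,
    where the highest score is reviewed first."""
    labels, queue = {}, []
    for case_id, seller, score in cases:
        if score < threshold:
            labels[case_id] = "clean"
        else:
            heapq.heappush(queue, (-score, case_id, seller))  # max-heap via negation
    return labels, queue


def apply_review(cases, labels, fraud_sellers):
    """Propagate a confirmed finding: every case from a flagged seller is labeled fraud."""
    for case_id, seller, _ in cases:
        if seller in fraud_sellers:
            labels[case_id] = "fraud"
    return labels
```

Popping from the queue yields the highest-scoring case first, and confirming one seller relabels all of that seller's cases, including ones the threshold had passed as clean.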
Before getting started, let us establish a couple of things. First, it is nearly impossible to accurately predict the value of a stock (or a cryptocurrency) with a simple computer algorithm, because so many factors that we cannot account for affect the price. For almost no apparent reason, the price of Bitcoin can rise and fall sharply, and there is no mathematical variable or equation that we can use to predict these rises and falls. There are some very advanced computer models for stocks that take many long-term factors into account, but nothing is going to give 100% accuracy.
Appropriate design of deep learning models in terms of network parameters is imperative to their success. The three main options for selecting parameters for deep learning models are random search, grid search, and heuristic search methods such as genetic algorithms. Manual grid search and Bayesian optimization were used in this study. Grid search, implemented for the Elman RNN, is the process of selecting two hyperparameters with a minimum and maximum for each and then searching that space for the best-performing parameters. This approach was taken for parameters that were unsuitable for Bayesian optimization. This model was built using Visual Studio in the Python programming language. Similar to the RNN, Bayesian optimization was chosen for selecting LSTM parameters where possible. This is a heuristic search method which works by assuming the function was sampled from a Gaussian process and maintaining a posterior distribution for this function as the results of different hyperparameter selections are observed.
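A two-parameter grid search of the kind described above can be sketched as follows. The two hyperparameters might be, for example, the window length and the number of hidden nodes; `evaluate` is a hypothetical stand-in for training a model with those values and returning its validation loss, and the number of grid steps is an illustrative assumption.

```python
from itertools import product


def grid_search(evaluate, param_a, param_b, steps=5):
    """Try evenly spaced integer values between each (min, max) pair.
    evaluate(a, b) returns a validation loss; lower is better.
    Returns (best_loss, best_a, best_b)."""
    def grid(lo, hi):
        return sorted({round(lo + k * (hi - lo) / (steps - 1)) for k in range(steps)})

    best = None
    for a, b in product(grid(*param_a), grid(*param_b)):
        loss = evaluate(a, b)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best
```

Every combination is trained once, so the cost grows with the product of the grid sizes; this is why the study reserves grid search for parameters that Bayesian optimisation could not handle.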
LSTM
In terms of temporal length, the LSTM is considerably better at learning long-term dependencies. As a result, picking a long window was less detrimental for the LSTM. This process followed a similar process to the RNN, in which autocorrelation lag was used as a guideline. The LSTM performed poorly on smaller window sizes. Its most effective length found was 100 days, and two hidden LSTM layers were chosen; for a time series task, two layers are enough to find nonlinear relationships in the data. Twenty hidden nodes were also chosen for both layers, as in the RNN model. The Hyperas library2 was used to implement the Bayesian optimisation of the network parameters. The optimiser searched for the optimal model in terms of the amount of dropout per layer and the choice of optimizer.
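To show what each LSTM unit computes, here is a minimal pure-Python sketch of a single LSTM step in the standard formulation with input, forget and output gates and a candidate cell value. The weight shapes and the `params` layout are illustrative assumptions; it does not reproduce the trained two-layer, 20-node network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """W x + U h + b for one gate, computed row by row."""
    return [sum(w * xi for w, xi in zip(Wr, x)) + sum(u * hj for u, hj in zip(Ur, h)) + bi
            for Wr, Ur, bi in zip(W, U, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step. params maps gate names 'i', 'f', 'o', 'g' to (W, U, b)."""
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]    # forget gate
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], x, h_prev)]  # candidate values
    # The cell state is updated additively rather than overwritten, which is
    # what lets information (and gradients) survive over many time steps.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

With all weights at zero every gate outputs 0.5, so half of the previous cell state is kept; a large positive forget-gate bias would keep almost all of it.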
System Structure
Feature Engineering
Feature Evaluation
RNN
LSTM
MODULE DESCRIPTION:
Feature Engineering
Feature engineering is the art of extracting useful patterns from data to make it easier for machine learning models to perform their predictions. It can be considered one of the most important skills for achieving good results in prediction tasks. A study of consistent top performers in Kaggle data mining competitions found that feature engineering is often the most important part. It is quite a subjective process that requires domain knowledge to be effective, and it is also considered an art. Engineered features should represent what one is trying to teach the network.
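Moving averages of the closing price, such as the 5-day and 10-day averages evaluated in this study, are typical engineered features. A minimal sketch of computing them with a running sum; the function names and the feature dictionary layout are hypothetical.

```python
def moving_average(series, window):
    """Simple moving average; the first window-1 positions have no value (None)."""
    if len(series) < window:
        return [None] * len(series)
    out = [None] * (window - 1)
    running = sum(series[:window])
    out.append(running / window)
    for t in range(window, len(series)):
        running += series[t] - series[t - window]  # slide the window by one day
        out.append(running / window)
    return out


def engineer_features(closes):
    """Hypothetical feature set: 5- and 10-day moving averages of the close."""
    return {"ma5": moving_average(closes, 5), "ma10": moving_average(closes, 10)}
```

The leading `None` values mark days without enough history; these rows are usually dropped before training.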
Feature Evaluation
Features must be evaluated once selected, because dealing with too large a feature set will considerably increase training time. In addition, machine learning algorithms can suffer from decreased accuracy if the number of variables is significantly higher than the optimal number. Several methods of feature evaluation exist, including filter-based selection and wrapper-based selection. Filter-based selectors filter features based on a particular statistical property of the feature, e.g. correlation. Wrapper-based methods perform a heuristic search over feature subsets, using a classifier to evaluate each candidate.
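A filter-based selector is easy to sketch. The example below keeps features whose absolute Pearson correlation with the target reaches a threshold; the threshold and function names are illustrative assumptions. Wrapper methods such as Boruta instead retrain a model repeatedly and are not shown here.

```python
def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def filter_select(features, target, min_abs_corr=0.5):
    """Keep feature names whose |correlation| with the target meets the threshold.
    features maps name -> list of values aligned with target."""
    return sorted(name for name, values in features.items()
                  if abs(pearson(values, target)) >= min_abs_corr)
```

Because each feature is scored on its own, a filter is cheap, but it can miss features that are only useful in combination, which is what wrapper methods try to capture.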
The Boruta algorithm in R is one such wrapper-based method. It is built around the random forest classification algorithm, an ensemble classification method in which classification is performed by the voting of multiple classifiers. The algorithm works on a similar principle to the random forest classifier: it adds randomness to the model and collects results from the ensemble of randomized samples to evaluate attributes. This extra randomness provides a clear view of which attributes are important. All features were deemed important to the model based on the random forest, with the 5-day and 10-day averages having the highest importance among the tested averages. The de-noised closing price was also one of the most important variables.
The dimensionality reduction technique of principal component analysis (PCA) was also explored. The result was four principal components to which all attributes belonged. The results of the PCA were not included in the final model, as computation was not an issue and the original data performed reasonably well.
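For reference, the first principal component is the leading eigenvector of the covariance matrix of the features. A minimal sketch using power iteration, which is an illustrative choice (libraries typically use a full eigen- or singular value decomposition); features on very different scales would normally be standardized first.

```python
def covariance_matrix(rows):
    """Sample covariance of the columns; rows is a list of observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def first_component(cov, iterations=200):
    """Power iteration: repeatedly multiply by the covariance matrix and normalize.
    Converges to the leading eigenvector, i.e. the first principal component."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```

For two strongly correlated features, the first component points roughly along the diagonal, showing that both attributes belong to the same principal group.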
The recurrent neural network (RNN) was first developed by Elman. The RNN is structured similarly to
the MLP, except that signals can also flow backwards: the hidden state from the previous time step
is fed back into the network at each iteration.
Appropriate design of deep learning models in terms of network parameters is imperative to their
success. The three main options available for selecting parameters for deep learning models are
random search, grid search and heuristic search methods such as genetic algorithms. As mentioned in
the related work section, manual grid search and Bayesian optimization are used in this study. Grid
search, implemented for the Elman RNN, involves choosing two hyperparameters and a minimum and
maximum value for each, then evaluating every combination in that search space to find the
best-performing parameters. This approach was taken for parameters that were unsuitable for
Bayesian optimization.
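The grid search described above can be sketched as follows. The two hyperparameters (called hidden units and window length here), their ranges, and the `evaluate` callback are hypothetical; in the study, `evaluate` would correspond to training the Elman RNN with those settings and returning its validation loss.

```python
from itertools import product


def int_grid(minimum, maximum, step):
    """All values from minimum to maximum inclusive, spaced by step."""
    return list(range(minimum, maximum + 1, step))


def grid_search(evaluate, grid_a, grid_b):
    """Evaluate every (a, b) pair and return the pair with the lowest loss, and that loss."""
    best_params, best_loss = None, float("inf")
    for a, b in product(grid_a, grid_b):
        loss = evaluate(a, b)  # stand-in for "train the model, return validation loss"
        if loss < best_loss:
            best_params, best_loss = (a, b), loss
    return best_params, best_loss


# Toy loss surface with its minimum at 32 hidden units and a 30-step window.
units_grid = int_grid(16, 64, 16)    # [16, 32, 48, 64]
window_grid = int_grid(10, 50, 10)   # [10, 20, 30, 40, 50]
print(grid_search(lambda u, w: (u - 32) ** 2 + (w - 30) ** 2, units_grid, window_grid))
```

Every combination is evaluated, so the cost grows with the product of the grid sizes (4 x 5 = 20 training runs in this example), which is why the study reserves grid search for parameters that Bayesian optimization could not handle.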
As with the RNN, Bayesian optimization was used to select parameters for the LSTM model where
possible. This is a heuristic search method that assumes the objective function was sampled from a
Gaussian process and maintains a posterior distribution over this function as the results of
different hyperparameter selections are observed. One can then maximise the expected improvement
over the best result so far to pick the hyperparameters for the next experiment. The performance of
both the RNN and the LSTM network is evaluated on validation data, with several measures against
overfitting in place. Dropout is implemented in both layers. In addition, early stopping is used:
training stops if the validation loss does not improve for 5 epochs.
Regarding window length, the LSTM is considerably better at learning long-term dependencies, so
picking a long window was less detrimental for the LSTM than for the RNN. Window selection followed
the same process as for the RNN, with the autocorrelation lag used as a guideline. The LSTM
performed poorly on smaller window sizes; the most effective window length found was 100 days.
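The expected-improvement step of the Bayesian optimization described above can be written down directly. For a loss being minimised, with best observed loss f*, posterior mean mu and standard deviation sigma at a candidate, EI = (f* - mu - xi) * Phi(z) + sigma * phi(z), where z = (f* - mu - xi) / sigma and Phi and phi are the standard normal CDF and density. The sketch assumes `mu` and `sigma` are already available from a fitted Gaussian process (fitting it is not shown); `xi` is an optional exploration margin not mentioned in the study, and the candidate names are hypothetical.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` for a loss that is being minimised.

    mu, sigma: Gaussian-process posterior mean and standard deviation of the
    validation loss at one candidate hyperparameter setting.
    """
    improvement = best - mu - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)  # no uncertainty: improvement is known exactly
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    return improvement * cdf + sigma * pdf


def pick_next(candidates, best):
    """Return the candidate name whose (mu, sigma) posterior gives the largest EI."""
    return max(candidates, key=lambda name: expected_improvement(*candidates[name], best))


# Candidate "a" is predicted to beat the current best loss of 0.5; "b" is not.
print(pick_next({"a": (0.4, 0.01), "b": (0.6, 0.01)}, best=0.5))
```

The `sigma * pdf` term rewards uncertain candidates, so the search balances exploiting settings predicted to do well against exploring settings it knows little about.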
Long short-term memory (LSTM) units address two weaknesses of the Elman RNN: its state is
overwritten at each step, and its gradients vanish over long sequences. Developed by Hochreiter et
al., LSTM units maintain an internal cell state that is carried forward from step to step and
through which errors can be back-propagated over many steps. This is in contrast to the Elman RNN,
in which the state is overwritten at each step. By maintaining a more constant error, LSTMs can
continue to learn over many time steps, which allows the network to learn long-term dependencies.
An LSTM cell contains forget and input (“remember”) gates, together with an output gate, which allow
the cell to decide what information to block or pass based on its strength and importance. As a
result, weak signals can be blocked, which helps mitigate the vanishing gradient problem.
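The contrast between the Elman update and the LSTM cell is visible in a single time step. The sketch below uses one hidden unit and scalar weights to stay short; the weight layout (a dictionary of (w_x, w_h, bias) triples per gate) is an illustrative assumption, and a real network would use weight matrices. It follows the standard LSTM formulation with forget, input and output gates.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def elman_step(x, h_prev, w):
    """Elman RNN: the new hidden state replaces the old one entirely."""
    return math.tanh(w["x"] * x + w["h"] * h_prev + w["b"])


def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit LSTM cell; `w` maps gate name -> (w_x, w_h, bias)."""
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))        # fraction of the old cell state to keep
    i = sigmoid(pre("input"))         # fraction of the candidate to write ("remember")
    o = sigmoid(pre("output"))        # fraction of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value for the cell state
    c = f * c_prev + i * g            # additive update: the state is carried forward
    h = o * math.tanh(c)
    return h, c


zeros = {gate: (0.0, 0.0, 0.0) for gate in ("forget", "input", "output", "candidate")}
print(lstm_step(1.0, 0.0, 1.0, zeros))  # all gates at 0.5, so half the old state is kept
```

Because the cell state is updated additively rather than recomputed through a squashing function, a forget gate close to 1 lets information and error signals pass through many steps largely unchanged.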
CHAPTER-7
SYSTEM TESTING
TESTING METHODOLOGIES
o Unit Testing.
o Integration Testing.
o User Acceptance Testing.
o Output Testing.
o Validation Testing.
Unit Testing
Unit testing focuses verification effort on the smallest unit of software design, that is, the
module. Unit testing exercises specific paths in a module’s control structure to ensure complete
coverage and maximum error detection. This test focuses on each module individually, ensuring that
it functions properly as a unit; hence the name unit testing.
During this testing, each module is tested individually and the module interfaces are verified for
consistency with the design specification. All important processing paths are tested for the
expected results. All error-handling paths are also tested.
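A unit test in the sense described above exercises both the normal processing path and the error-handling path of one module. The sketch below uses Python's `unittest` module against a hypothetical `moving_average` function (chosen because 5-day and 10-day averages appear among this study's features); neither the function nor the tests are taken from the project.

```python
import unittest


def moving_average(prices, window):
    """Hypothetical module under test: simple moving average over `window` days."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [sum(prices[i - window:i]) / window for i in range(window, len(prices) + 1)]


class MovingAverageTest(unittest.TestCase):
    def test_normal_path(self):
        self.assertEqual(moving_average([1, 2, 3, 4, 5], 2), [1.5, 2.5, 3.5, 4.5])

    def test_window_equal_to_length(self):
        # Boundary case: a single average over the whole series.
        self.assertEqual(moving_average([2, 4, 6], 3), [4.0])

    def test_error_path(self):
        # Error-handling paths are exercised as well as the normal path.
        with self.assertRaises(ValueError):
            moving_average([1, 2, 3], 0)
        with self.assertRaises(ValueError):
            moving_average([1, 2, 3], 4)
```

The tests can be run with `python -m unittest` from the directory containing the file.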
Integration Testing
Integration testing addresses the issues associated with the dual problems of verification and
program construction. After the software has been integrated, a set of high-order tests is
conducted. The main objective of this testing process is to take unit-tested modules and build the
program structure dictated by the design.
2. Bottom-up Integration
This method begins construction and testing with the modules at the lowest level in the program
structure. Since the modules are integrated from the bottom up, the processing required for modules
subordinate to a given level is always available, and the need for stubs is eliminated. The
bottom-up integration strategy may be implemented with the following steps:
o The low-level modules are combined into clusters that perform a specific software sub-function.
o A driver (i.e., a control program for testing) is written to coordinate test case input and output.
o The cluster is tested.
o Drivers are removed and clusters are combined, moving upward in the program structure.
The bottom-up approach tests each module individually; each module is then integrated with a main
module and tested for functionality.
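The bottom-up steps above can be sketched with two hypothetical low-level modules (one parsing prices, one computing daily returns), a cluster that combines them, and a driver that feeds test inputs to the cluster and checks its outputs. The module names and test cases are illustrative assumptions, not part of the project.

```python
import math


# Low-level module 1 (hypothetical): parse one closing price per line, skipping blanks.
def parse_prices(lines):
    return [float(line.strip()) for line in lines if line.strip()]


# Low-level module 2 (hypothetical): simple daily returns between consecutive prices.
def daily_returns(prices):
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


# Cluster: the two modules combined into one software sub-function.
def returns_from_text(lines):
    return daily_returns(parse_prices(lines))


def driver(cluster, cases):
    """Test driver: feed each input to the cluster and compare with the expected output.

    Returns the indices of failing cases (an empty list means the cluster passed).
    """
    failures = []
    for index, (given, expected) in enumerate(cases):
        actual = cluster(given)
        ok = len(actual) == len(expected) and all(
            math.isclose(a, e, rel_tol=1e-9, abs_tol=1e-12) for a, e in zip(actual, expected)
        )
        if not ok:
            failures.append(index)
    return failures


CASES = [
    (["100", "110", "99"], [0.1, -0.1]),
    (["50", "", "75"], [0.5]),   # blank lines are skipped
    (["42"], []),                # a single price yields no returns
]
print(driver(returns_from_text, CASES))
```

Once the cluster passes, the driver is discarded and `returns_from_text` is called directly by the next module up, as in the final step of the list.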
User Acceptance Testing
User acceptance of a system is the key factor for the success of any system. The system under
consideration is tested for user acceptance by constantly keeping in touch with the prospective
system users during development and making changes wherever required. The system developed
provides a friendly user interface that can easily be understood even by a person who is new to the
system.
Output Testing
After performing validation testing, the next step is output testing of the proposed system, since
no system can be useful if it does not produce the required output in the specified format. The
outputs generated or displayed by the system under consideration are tested by asking users about
the format they require. Hence the output format is considered in two ways: on screen and in
printed format.
7.1.5 Validation Checking
Text Field:
The text field can contain only a number of characters less than or equal to its size. The text
fields are alphanumeric in some tables and alphabetic in others. An incorrect entry always flashes
an error message.
Numeric Field:
The numeric field can contain only the digits 0 to 9. Entering any other character flashes an
error message.
The individual modules are checked for accuracy against what they have to perform. Each module is
subjected to a test run with sample data. The individually tested modules are then integrated into
a single system. Testing involves executing the program with real data; the existence of any
program defect is inferred from the output. The testing should be planned so that all the
requirements are individually tested.
A successful test is one that exposes defects when given inappropriate data and produces output
revealing the errors in the system.
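The text-field and numeric-field rules from the validation checking section can be expressed as two small checks. This is a sketch under stated assumptions: "alphanumeric" is taken to mean ASCII letters and digits only (so spaces are rejected), and an empty numeric entry is treated as invalid; the function names and error messages are hypothetical.

```python
import string

LETTERS = set(string.ascii_letters)
ALPHANUMERIC = LETTERS | set(string.digits)


def validate_text_field(value, size, alphabetic_only=False):
    """Return an error message for an invalid text entry, or None if it is valid."""
    if len(value) > size:
        return f"At most {size} characters are allowed."
    allowed = LETTERS if alphabetic_only else ALPHANUMERIC
    if any(ch not in allowed for ch in value):
        return "Only letters are allowed." if alphabetic_only else "Only letters and digits are allowed."
    return None


def validate_numeric_field(value):
    """Numeric fields accept only the characters 0-9 (an empty entry is also rejected here)."""
    if not value or any(ch not in string.digits for ch in value):
        return "Only the digits 0-9 are allowed."
    return None


print(validate_text_field("abc123", 5))   # too long for a 5-character field
print(validate_numeric_field("12a"))      # contains a non-digit
```

Returning a message rather than raising an exception matches the behaviour described in the text, where an invalid entry flashes an error instead of stopping the program.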
Live test data are those that are actually extracted from organization files. After a system is
partially constructed, programmers or analysts often ask users to key in a set of data from their
normal activities. The systems person then uses this data to partially test the system. In other
instances, programmers or analysts extract a set of live data from the files and enter it
themselves.
It is difficult to obtain live data in sufficient amounts to conduct extensive testing. Although
live data are realistic and show how the system will perform for typical processing requirements
(assuming the live data entered are in fact typical), such data generally will not test all
combinations or formats that can enter the system. This bias toward typical values means that live
data do not provide a true system test and in fact ignore the cases most likely to cause system
failure.
The most effective test programs use artificial test data generated by persons other than
those who wrote the programs. Often, an independent team of testers formulates a testing plan,
using the systems specifications.
The package “Virtual Private Network” has satisfied all the requirements specified in the software
requirement specification and was accepted.
Whenever a new system is developed, user training is required to educate users about the working
of the system, so that it can be put to efficient use by those for whom it has been primarily
designed. For this purpose, the normal working of the project was demonstrated to the prospective
users. Its working is easily understandable, and since the expected users have good knowledge of
computers, the system is very easy to use.
7.3 MAINTENANCE
This covers a wide range of activities, including correcting code and design errors. To reduce the
need for maintenance in the long run, the user’s requirements were defined more accurately during
system development, and the system has been developed to satisfy those requirements to the largest
possible extent. With developments in technology, more features may be added in future as
requirements evolve. The code and design are simple and easy to understand, which will make
maintenance easier.
TESTING STRATEGY:
A strategy for system testing integrates test case design techniques into a well-planned series of
steps that results in the successful construction of software. The testing strategy must
incorporate test planning, test case design, test execution, and the resultant data collection and
evaluation. A strategy for software testing must accommodate low-level tests that are necessary to
verify that a small source code segment has been correctly implemented, as well as high-level tests
that validate major system functions against user requirements.
Software testing is a critical element of software quality assurance and represents the ultimate
review of specification, design and coding. Testing presents an interesting anomaly for the
software engineer: whereas the earlier activities build the software up, testing deliberately tries
to break it. Thus, a series of tests is performed on the proposed system before it is ready for
user acceptance testing.
SYSTEM TESTING:
Software, once validated, must be combined with other system elements (e.g., hardware, people,
databases). System testing verifies that all the elements mesh properly and that overall system
function and performance are achieved. It also tests to find discrepancies between the system and
its original objective, current specifications and system documentation.
UNIT TESTING:
In unit testing, the different modules are tested against the specifications produced during their
design. Unit testing is essential for verifying the code produced during the coding phase; its goal
is to test the internal logic of the modules. Using the detailed design description as a guide,
important control paths are tested to uncover errors within the boundary of the modules. This
testing is carried out during the programming stage itself. In this testing step, each module was
found to work satisfactorily with regard to its expected output.
In due course, the latest technology advancements will be taken into consideration. As part of the
technical build-up, many components of the networking system will be generic in nature, so that
future projects can either use or interact with them. The future holds a lot to offer for the
development and refinement of this project.
CHAPTER-8
8. SAMPLE CODE
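using System;
using System.Collections.Generic;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Mail;
using System.Security.Cryptography;
using System.Text;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;

// Code-behind for the request-list page. The class name is not given in the original
// listing; "RequestList" is a placeholder.
public partial class RequestList : Page
{
    // Connection string named "con" in Web.config.
    private readonly string strConnString =
        ConfigurationManager.ConnectionStrings["con"].ConnectionString;

    protected void Page_Load(object sender, EventArgs e)
    {
        Label1.Text = Session["ac"].ToString();
        if (!IsPostBack)
        {
            BindGrid();
        }
    }

    // Loads the requests for the current account (Session["ac"]) into GridView1.
    private void BindGrid()
    {
        // Parameterised query: the original concatenated Label1.Text into the SQL string,
        // which is open to SQL injection.
        string strQuery = "Select * from req Where cno = @cno order by rid";
        using (SqlConnection con = new SqlConnection(strConnString))
        using (SqlCommand cmd = new SqlCommand(strQuery, con))
        using (SqlDataAdapter sda = new SqlDataAdapter(cmd))
        using (DataTable dt = new DataTable())
        {
            cmd.CommandType = CommandType.Text;
            cmd.Parameters.AddWithValue("@cno", Label1.Text);
            try
            {
                con.Open();
                sda.Fill(dt);
                GridView1.DataSource = dt;
                GridView1.DataBind();
                //GridView2.DataSource = dt;
                //GridView2.DataBind();
            }
            catch (Exception ex)
            {
                Response.Write(ex.Message);
            }
        }   // the using blocks close and dispose con, cmd, sda and dt
    }

    // Sends a one-time password (OTP) for the selected request by e-mail.
    protected void GridView1_SelectedIndexChanged(object sender, EventArgs e)
    {
        string id = GridView1.SelectedRow.Cells[1].Text;
        string pwd = string.Empty;   // must be initialised before "pwd += temp"
        string mail = null;

        using (SqlConnection con = new SqlConnection(strConnString))
        using (SqlCommand cmd = new SqlCommand())
        {
            cmd.Connection = con;
            // ... the query that looks up the selected request is elided in the original listing
            //cmd.Parameters.AddWithValue("@appid", Label7.Text);
            con.Open();
            using (SqlDataReader dr = cmd.ExecuteReader())
            {
                if (dr.Read())   // Read() returns false when there are no rows
                {
                    // ... reading the recipient address into 'mail' and the loop that builds
                    // the OTP are elided in the original listing; each character is appended with:
                    // pwd += temp;
                }
            }
        }

        Session["otp"] = pwd;

        using (MailMessage mm = new MailMessage("dotnetjava.projects@gmail.com", mail))
        {
            // ... subject and body are elided in the original listing
            mm.IsBodyHtml = false;
            using (SmtpClient smtp = new SmtpClient())
            {
                // ... SMTP host is elided in the original listing
                smtp.EnableSsl = true;
                // The original hard-coded the account password here; it is read from
                // configuration instead ("SmtpPassword" is a placeholder appSettings key).
                NetworkCredential NetworkCred = new NetworkCredential(
                    "dotnetjava.projects@gmail.com",
                    ConfigurationManager.AppSettings["SmtpPassword"]);
                smtp.UseDefaultCredentials = false;   // explicit credentials are supplied below
                smtp.Credentials = NetworkCred;
                smtp.Port = 587;
                smtp.Send(mm);
            }
        }
        // ... a start-up script alert ending in: sent.');", true);  (call elided in the original listing)
    }

    protected void GridView1_PageIndexChanging(object sender, GridViewPageEventArgs e)
    {
        GridView1.PageIndex = e.NewPageIndex;
        BindGrid();   // the data source is not kept across postbacks, so re-query before binding
    }

    // Columns 5 and 6 are stored encrypted and are decrypted for display.
    protected void GridView1_RowDataBound(object sender, GridViewRowEventArgs e)
    {
        if (e.Row.RowType == DataControlRowType.DataRow)
        {
            e.Row.Cells[5].Text = Decrypt(e.Row.Cells[5].Text);
            e.Row.Cells[6].Text = Decrypt(e.Row.Cells[6].Text);
        }
    }

    // Decrypts Base64 cipher text. The key and IV are derived from a pass phrase with
    // Rfc2898DeriveBytes; the salt bytes spell "Ivan Medvedev" in ASCII.
    private string Decrypt(string cipherText)
    {
        // The pass phrase is elided in the original listing; "EncryptionKey" is a placeholder
        // appSettings key.
        string encryptionKey = ConfigurationManager.AppSettings["EncryptionKey"];
        byte[] cipherBytes = Convert.FromBase64String(cipherText);
        // AES is assumed (consistent with the 32-byte key and 16-byte IV); the line creating
        // the algorithm object is missing from the original listing.
        using (Aes encryptor = Aes.Create())
        using (Rfc2898DeriveBytes pdb = new Rfc2898DeriveBytes(encryptionKey, new byte[] {
            0x49, 0x76, 0x61, 0x6e, 0x20, 0x4d, 0x65, 0x64, 0x76, 0x65, 0x64, 0x65, 0x76 }))
        {
            encryptor.Key = pdb.GetBytes(32);
            encryptor.IV = pdb.GetBytes(16);
            using (MemoryStream ms = new MemoryStream())
            {
                using (CryptoStream cs = new CryptoStream(ms, encryptor.CreateDecryptor(),
                    CryptoStreamMode.Write))
                {
                    cs.Write(cipherBytes, 0, cipherBytes.Length);
                }   // disposing the CryptoStream flushes the final block
                cipherText = Encoding.Unicode.GetString(ms.ToArray());
            }
        }
        return cipherText;
    }
}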
CONCLUSION:
Deep learning models such as the RNN and LSTM are evidently effective for Bitcoin prediction, with
the LSTM more capable of recognising longer-term dependencies. However, the high variance of a task
of this nature makes it difficult to translate this into impressive validation results, so it
remains a difficult task. There is a fine line between overfitting a model and preventing it from
learning sufficiently. Dropout is a valuable feature to assist with this. However, even with
Bayesian optimisation used to select the dropout rate, good validation results could not be
guaranteed. Although the sensitivity, specificity and precision metrics indicated good
performance, the ARIMA forecast’s actual error-based performance was significantly worse than that
of the neural network models. The LSTM outperformed the RNN marginally, but not significantly;
however, the LSTM takes considerably longer to train. The performance benefits gained from
parallelizing machine learning algorithms on a GPU are evident, with a 70.7% performance
improvement when training the LSTM model.
FUTURE WORK:
Looking at the task purely from a classification perspective, it may be possible to achieve better
results. One limitation is that the model has not been implemented in a practical or real-time
setting for predicting into the future, as opposed to learning what has already happened. In
addition, the ability to predict using streaming data should improve the model. Sliding window
validation is an approach not implemented here, but it may be explored as future work.
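Sliding window validation, proposed above as future work, could be organised as in the sketch below: a fixed-length training window is followed immediately by a test window, and both slide forward through the series by a fixed step. The window sizes and step are hypothetical parameters; this is not code from the study.

```python
def sliding_window_splits(n, train_size, test_size, step):
    """Yield (train_indices, test_indices) range pairs over a series of length n."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = 0
    while start + train_size + test_size <= n:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step  # slide both windows forward together


# 10 observations, 4-step training windows, 2-step test windows, moving 2 steps at a time.
for train, test in sliding_window_splits(10, 4, 2, 2):
    print(list(train), list(test))
```

Because each test window lies strictly after its training window, the model is never evaluated on data that precedes what it learned from, which mirrors how a deployed model would predict into the future.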
REFERENCES:
[3] I. Kaastra and M. Boyd, “Designing a neural network for forecasting financial and
economic time series,” Neurocomputing, vol. 10, no. 3, pp. 215–236, 1996.
[4] H. White, “Economic prediction using neural networks: The case of ibm daily stock
returns,” in Neural Networks, 1988., IEEE International Conference on. IEEE, 1988, pp.
451–458.
[5] C. Chatfield and M. Yar, “Holt-winters forecasting: some practical issues,” The
Control, and Computing (Allerton), 2014 52nd Annual Allerton Conference on. IEEE,
[9] G. H. Chen, S. Nikolov, and D. Shah, “A latent source model for nonparametric time
1088–1096.
“Using time-series and sentiment analysis to detect the determinants of bitcoin prices,”
[11] M. Matta, I. Lunesu, and M. Marchesi, “Bitcoin spread prediction using social and
[12] ——, “The predictor impact of web search media on bitcoin trading volumes.”
stock message boards and its implications for stock market efficiency,” in Workshop on
[14] A. Greaves and B. Au, “Using the bitcoin transaction graph to predict the price of
bitcoin,” 2015.
[15] I. Madan, S. Saluja, and A. Zhao, “Automated bitcoin trading via machine learning
algorithms,” 2015.