IAETSD-JARAS-Development of Data Masking Solution For Proprietary Databases

IAETSD JOURNAL FOR ADVANCED RESEARCH IN APPLIED SCIENCES, VOLUME 4, ISSUE 1, JAN-JUNE /2017
ISSN (ONLINE): 2394-8442
DEVELOPMENT OF DATA MASKING SOLUTION FOR PROPRIETARY DATABASES
BRUNDA T S [1], PROF. MERIN MELEET [2]
Dept. of Information Science and Engineering, Rashtreeya Vidyalaya College of Engineering, Mysuru Road,
R. V. Vidyanikethan Post, Bengaluru, Karnataka 560059
[2] Assistant Professor, R. V. Vidyanikethan Post, Bengaluru, Karnataka 560059
[1] Brunda1310@gmail.com, [2] merinmeleet@rvce.edu.in
ABSTRACT.
In today's data era, data is a key resource for any organization. Every organization has a security
policy for protecting the data in its databases, yet when it outsources that data to a third party for
analysis, no security measure is taken to keep the data from being misused. Data security plays a
crucial part in business, and one way to achieve it is data masking. The main goal of data masking is
to hide sensitive data from the outside world. To ensure that complex functionality and customer use
cases work well when a new version of a database is released, it is important to have real-time data.
Such data is available in customers' production databases. For security reasons, engineering teams do
not have access to the production databases. This forces the engineering team to work with dummy data
and schemas, which may not reproduce or test many critical and real-time scenarios.
Keywords: Data Masking, Flat files, Proprietary Database
I. INTRODUCTION
The most serious risk that both organizations and their customers face is that copies of live production data are still used in most testing
environments, which becomes a critical security weak spot. The primary issue at hand is the handling of confidential and sensitive data by any
service provider that may need to use services from a third-party organization. Typically, customers outsource their data to a third party,
where the data is processed or used for statistics. This data sharing is becoming common practice as high-performance computing, pervasive
computing, and cloud computing reach their peak.
Many applications benefit from this. However, because of privacy concerns, people hesitate to outsource their data to a third party, since
the latter is not a trusted party and can easily steal, manipulate, or leak a customer's private or confidential data without the data owner's
permission. Encryption could be a better way to protect data privacy, but encrypted data usually has lower usability, which conflicts with the
purpose of data sharing. In today's corporate world, many organizations depend entirely on the data they store for R&D. They also use this
data for efficient customization and to improve the quality of their service to customers [1]. The web has made the automatic collection and
storage of data in databases easier. On the other hand, it has heightened ethical concerns by exposing customers to privacy risks.
Data masking can be simply described as the process of replacing real sensitive data in test or development databases with data that is
realistic but not real. Data masking techniques obscure specific data within a database table, ensuring that data security is maintained [2].
Masking is applied across applications and environments in order to maintain business integrity. The masked data looks very similar to
real-time data and can be used in test and development environments. Masking should be effective enough that the original data cannot be
reconstructed from the masked data unless the masking pattern is known.
To Cite This Article: BRUNDA T S AND PROF.MERIN MELEET, DEVELOPMENT OF DATA MASKING
SOLUTION FOR PROPRIETARY DATABASES. Advances in Natural and Applied Sciences ;Pages: 214-219
The need for data masking is especially important for real-time use. For example, consider two banks, A and B. Suppose the banks decide to
share their customer details with each other to improve their quality of service. To protect their customers, the banks cannot exchange the
customer details as they are. Data masking is therefore required, so that the original data can be altered without changing its purpose [3].
II. METHODOLOGY
Figure 1.1 Data masking methodology
The above diagram depicts the 5 phases of data masking:
Analysis
Set up
Mask
Load
Validate
Analysis Phase:
The analysis phase is the first phase of the data masking methodology. During this phase the sensitive data to be masked is analyzed, which
involves identifying the sensitive fields and their types. The privacy regulation rules required for safe data masking are also identified in
this phase.
Set up Phase:
Following the analysis phase, the set up phase involves applying the in-built masking rules. The application-level constraints required for
data masking are identified and defined in this phase.
Masking Phase:
During the masking phase, the sensitive data identified in the analysis phase is given as input to the data masking tool/solution. This data
is masked using the selected algorithms, and the masked data is generated.
Load Phase:
The masked data is then loaded into the respective databases in the load phase.
Validate Phase:
This is the final phase of data masking where the masked data that is loaded to the database is validated by executing the product workflows.
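The five phases can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function names, the in-memory list standing in for the database, and the simple validation check are all hypothetical and are not taken from the tool described in this paper.

```python
# Hypothetical five-phase pipeline; every name here is illustrative.

def analyze(rows, sensitive_fields):
    """Analysis: confirm that the sensitive fields exist in the extracted data."""
    missing = [f for f in sensitive_fields if rows and f not in rows[0]]
    if missing:
        raise KeyError(f"fields not found: {missing}")
    return list(sensitive_fields)

def set_up(fields, rule):
    """Set up: bind a masking rule to each sensitive field."""
    return {f: rule for f in fields}

def mask(rows, rules):
    """Mask: apply each field's rule and leave other fields untouched."""
    return [
        {k: rules[k](v) if k in rules else v for k, v in row.items()}
        for row in rows
    ]

def load(masked_rows, target):
    """Load: write masked rows to the target (a list stands in for a database)."""
    target.extend(masked_rows)
    return target

def validate(original, loaded, fields):
    """Validate: same row count, and no sensitive value is left unmasked."""
    if len(original) != len(loaded):
        return False
    return all(o[f] != m[f] for o, m in zip(original, loaded) for f in fields)
```

In a real deployment the validate step would execute the product workflows against the loaded database; the equality check above is only a stand-in for that.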
III. SYSTEM DESIGN
Figure 4.1 System Architecture

The above figure describes the system architecture of the data masking tool. Starting from the left, the real-time customer data goes through
ETL (Extract, Transform, and Load).
DB: Any proprietary database where the data is stored.

Flat file: The flat file is the representation of the sensitive data that has been extracted from the proprietary databases.
Data masking solution: This is the main part, where the sensitive data is masked by the masking technique specified by the customer.
The GUI is the front end, where the user logs in to the system, extracts the sensitive data from the database, selects the masking technique,
and selects the particular column to which the masking technique is applied. The command-line interface is the back end, where the data
masking solution is actually applied. The configuration file is an XML file that contains the extracted file and the user inputs from the GUI.
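The paper does not give the layout of the configuration file, so the following is only a sketch of how a back end might read one. The element and attribute names (masking-config, input, column, technique) and the file name customers.csv are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration layout; element and attribute names are assumed.
CONFIG_XML = """
<masking-config>
  <input file="customers.csv"/>
  <column name="card_number" technique="mask_out"/>
  <column name="salary" technique="variance"/>
</masking-config>
"""

def read_config(xml_text):
    """Return the extracted file name and a column -> technique mapping."""
    root = ET.fromstring(xml_text)
    input_file = root.find("input").get("file")
    columns = {c.get("name"): c.get("technique") for c in root.findall("column")}
    return input_file, columns
```

Keeping the user's choices in a file like this lets the GUI and the command-line back end stay decoupled: the GUI only writes the file, and the back end only reads it.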
IV. DATA MASKING TECHNIQUES

There are several data masking techniques that can be used as requirements dictate. This section gives a brief overview of them.
Random Substitution: The value to be masked is replaced with a random value. It is applicable to random numbers, dates, and alphanumeric
values that do not have application- or database-level constraints.
Algorithmic Substitution: Certain fields must follow specific algorithms. For example, a credit card number needs to satisfy the mod-10
algorithm, and an SSN should have only 9 digits in the format AAA-GG-RRRR.
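As an illustration of algorithmic substitution, the sketch below replaces a card number with random digits and then appends a check digit so that the result still passes the mod-10 (Luhn) test. The function names are hypothetical, and the input is assumed to be a string of digits without separators.

```python
import random

def luhn_check_digit(partial):
    """Return the digit that makes partial + digit pass the mod-10 check."""
    total = 0
    # Walk right to left; the rightmost digit of `partial` is doubled first
    # because the check digit will occupy the undoubled position after it.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def luhn_valid(number):
    """Return True if the full digit string passes the mod-10 check."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def substitute_card(original, rng=random):
    """Replace a card number with random digits of the same length that pass mod-10."""
    body = "".join(str(rng.randrange(10)) for _ in range(len(original) - 1))
    return body + luhn_check_digit(body)
```

Because only the length of the original is used, the masked number carries no information about the real one while still satisfying application-level validation.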
Intelligent Masking: Certain fields are more complicated to mask, as custom rules or expressions might be needed to satisfy their
requirements. For example, a bank might use the following rule for a customer account number: BBB-LLLLLL-AAAA.
Number and Date Variance or Blurring: Mostly used for numeric fields, to provide variations of the same data, for example producing values
between 80% and 120% of the current salary. This technique works well for numeric or date fields: it varies the original value within a
specific range. Its advantage is that the look of the data does not change, since the modified value stays within some percentage of the real
value. It also prevents records from being traced through their number and date fields. An example is increasing or decreasing values by 5%
in numeric fields, and by 5 or 10 days in date fields [4].
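A minimal sketch of number and date variance follows. The helper names and default ranges are assumptions chosen to match the 5% and 5-day examples in the text, not the tool's actual parameters.

```python
import random
from datetime import date, timedelta

def vary_number(value, pct=0.05, rng=random):
    """Scale a numeric value by a random factor within +/- pct of itself."""
    return value * (1 + rng.uniform(-pct, pct))

def vary_date(value, max_days=5, rng=random):
    """Shift a date by a random whole number of days within +/- max_days."""
    return value + timedelta(days=rng.randint(-max_days, max_days))
```

For instance, vary_number(50000, pct=0.20) yields a salary between 80% and 120% of the original, and vary_date(date(2017, 1, 15)) yields a date within five days of 15 January 2017.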
Shuffling: Data is randomly shuffled within the column. Shuffling can be reversed if the shuffling algorithm is deciphered. Shuffling is
similar to substitution, with the difference that the substituted values are taken from other rows of the same column. Shuffling continues
until no row still holds its own original value. The advantage of shuffling over substitution is that generating random unique values is not
necessary [5].
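Shuffling within a column can be sketched as follows. The function name and the retry limit are assumptions; the loop reshuffles until no row keeps its own original value, and gives up after a fixed number of attempts because duplicate values can make that condition impossible to meet.

```python
import random

def shuffle_column(rows, column, rng=random, max_tries=100):
    """Return copies of rows with the values of `column` permuted across rows."""
    original = [row[column] for row in rows]
    values = original[:]
    for _ in range(max_tries):
        rng.shuffle(values)
        # Stop once every row holds a value that came from a different row.
        if all(v != o for v, o in zip(values, original)):
            break
    return [{**row, column: v} for row, v in zip(rows, values)]
```

The set of values in the column is unchanged, so column-level statistics such as totals and averages survive masking, while the link between a value and its row is broken.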
Nulling: This technique simply deletes sensitive data and replaces the field with NULL values. It is not very useful for databases in test
and development environments and can only be used for databases outside those environments.
Selective Masking: Masking a selected portion of the data, for example altering only the domain name of an email ID.
Masking Out: Replacing certain parts of the data with specific characters (X or *). Care should be taken to mask out only the appropriate
data and not required information; if required information is masked, the entire field becomes useless. This technique is generally used in
credit card transactions and internet banking. For example, the credit card number 5289 7895 1236 4598 can be masked as
5289 XXXX XXXX 4598. Applying different patterns to the same field would be tedious.
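The credit card example and the email example can be reproduced with two small helpers. Both function names, the number of characters kept visible, and the replacement domain example.com are assumptions made for illustration.

```python
def mask_out(value, keep_first=4, keep_last=4, mask_char="X"):
    """Replace the middle letters and digits with mask_char, keeping separators."""
    total = sum(ch.isalnum() for ch in value)
    out, seen = [], 0
    for ch in value:
        if ch.isalnum():
            seen += 1
            visible = seen <= keep_first or seen > total - keep_last
            out.append(ch if visible else mask_char)
        else:
            out.append(ch)  # spaces and dashes are preserved as-is
    return "".join(out)

def mask_email_domain(email, new_domain="example.com"):
    """Selective masking: replace only the domain part of an email address."""
    local, sep, _ = email.rpartition("@")
    if not sep:
        return email  # not an email address; leave unchanged
    return f"{local}@{new_domain}"
```

With these defaults, mask_out("5289 7895 1236 4598") returns "5289 XXXX XXXX 4598", matching the example in the text.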
V. RESULTS AND INFERENCE

Step 1- The user selects the CSV file and picks, one at a time, the fields containing sensitive data to be masked.
Step 2- The user selects the masking algorithm to apply to the field.
Step 3- The masked data is compared with the input data.
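The three steps can be sketched end to end on CSV text held in memory. The function names and the sample column names are hypothetical; the paper's tool performs these steps through its GUI.

```python
import csv
import io

def mask_csv(csv_text, column, mask_fn):
    """Steps 1 and 2: read the CSV and apply mask_fn to the chosen column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    masked = [{**row, column: mask_fn(row[column])} for row in rows]
    return rows, masked

def compare(rows, masked, column):
    """Step 3: list (row number, original, masked) for every changed value."""
    return [
        (i, r[column], m[column])
        for i, (r, m) in enumerate(zip(rows, masked), start=1)
        if r[column] != m[column]
    ]
```

Any row missing from the comparison output was left unchanged by the masking function, which makes unmasked sensitive values easy to spot.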

REFERENCES
[1] Ravishankar et al., A Survey on Recent Trends, Process and Development in Data Masking for Testing, IJCSI, Vol. 8, Issue 2, March 2011.
[2] Xiao-Bai Li and Luvai Motiwalla, Protecting Patient Privacy with Data Masking, WISP, 2009.
[3] G. Sarada, N. Abitha, G. Manikandan, and N. Sairam, A Few New Approaches for Data Masking, International Conference on Circuit,
Power and Computing Technologies (ICCPCT), 2015.
[4] Aleksey Baranchikov, Aleksey Yu. Gromov, Viktor S. Gurov, Natalya N. Grinchenko, and Sergey Babaev, The Technique of Dynamic Data
Masking in Information Systems, 5th Mediterranean Conference on Embedded Computing, MECO 2016.
[5] B. Liver and K. Tice, Privacy Application Infrastructure: Confidential Data Masking, IEEE Conference on Commerce and Enterprise
Computing, 2009.
