Data Structure

A data structure is a specialised way of organising and storing data in memory,
so that operations can be performed on it efficiently.
For example:
Suppose we have a player's name, "Dhoni", and age, 35. Here "Dhoni" is of String
data type and 35 is of integer data type.
We can organise this data as a record, such as a Player record.
We can then collect and store players' records in a file or database as a data structure.
For example: "Dhoni" 35, "Rahul" 24, "Rahane" 28.
"Data Structures are structures programmed to store ordered data so that various
operations can be performed on it easily."
Data structure is all about:
 How to represent data element(s).

 What relationship data elements have among themselves.
 How to access data elements i.e., access methods
Types of Data Structure:

Primitive Data Structures: Integer, Float, Boolean, Char etc. are all primitive
data structures.
Abstract Data Structure: Used to store large and connected data. Examples are:
 Linked List
 Tree
 Graph
 Stack, Queue etc.
Operations on Data Structures: The operations involved in data structures are as
follows.
 Create: Used to allocate/reserve memory for the data element(s).
 Destroy: Deallocates the memory space assigned to the specified data
structure.
 Selection: Accessing a particular data item within a data structure.
 Update: Used to modify (insert or delete) data in the data structure.
 Searching: Used to find out whether the specified data item is present in
the list of data items.
 Sorting: Process of arranging all data items either in ascending or in
descending order.
 Merging: Process of combining data items of two different sorted lists of
data items into a single list.
Stack
A stack is an ordered collection of items into which new items may be inserted
and from which items may be deleted at one end, called the TOP of the stack. It
is a LIFO (Last In First Out) kind of data structure.
Operations on Stack:
 Push: Adds an item onto the stack. PUSH (s, i) adds the item i to the top
of stack s.
 Pop: Removes the most recently pushed item from the stack. POP (s)
removes the top element and returns it as a function value.
 size(): Returns the number of elements in the stack.
 isEmpty(): Returns true if the stack is empty.
Implementation of Stack: A stack can be implemented in two ways: using an array
or using a linked list.
Since the size of a statically declared array is fixed at compile time, it can't
grow dynamically. Therefore, an attempt to push an element onto an array-based
stack that is already full causes a stack overflow.
So, to avoid the above mentioned problem we can use a linked list to
implement a stack, because a linked list can grow and shrink dynamically at
runtime.
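The linked-list approach described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names StackNode, Stack, stack_push, stack_pop and so on are assumptions made for this example, and items are assumed to be of type int.

```c
#include <stdbool.h>
#include <stdlib.h>

/* One node of the stack; all names here are illustrative. */
struct StackNode {
    int item;
    struct StackNode *next;
};

/* The stack keeps a pointer to its top node; NULL means empty. */
struct Stack {
    struct StackNode *top;
    int count;
};

void stack_init(struct Stack *s) { s->top = NULL; s->count = 0; }

bool stack_is_empty(const struct Stack *s) { return s->top == NULL; }

int stack_size(const struct Stack *s) { return s->count; }

/* Push: returns false only if memory could not be allocated. */
bool stack_push(struct Stack *s, int item) {
    struct StackNode *node = malloc(sizeof *node);
    if (node == NULL) return false;
    node->item = item;
    node->next = s->top;   /* new node sits above the old top */
    s->top = node;
    s->count++;
    return true;
}

/* Pop: stores the removed item in *out; returns false on an empty stack. */
bool stack_pop(struct Stack *s, int *out) {
    if (s->top == NULL) return false;
    struct StackNode *node = s->top;
    *out = node->item;
    s->top = node->next;
    free(node);
    s->count--;
    return true;
}
```

Because each push allocates one node, the stack is limited only by available memory rather than by a fixed array size.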
Applications of Stack: Some of the important applications of a stack are given
below.
 Backtracking: A process in which you need to return to the most recent
data element in a series of elements.
 Depth first Search can be implemented using a stack.
 Function Calls: The compiler uses a stack for implementing normal as
well as recursive function calls.
 Simulation of Recursive calls: A recursive procedure can be simulated
iteratively by maintaining an explicit stack.
 Parsing: The syntax analysis phase of a compiler uses a stack while parsing
the program.
 Expression Evaluation: A stack can be used to evaluate an expression and
to check its syntax.
o Infix expression: It is the one where the binary operator comes
between the operands.
e.g., A + B * C
o Postfix expression: Here, the binary operator comes after the
operands.
e.g., A B C * +
o Prefix expression: Here, the binary operator precedes the
operands.
e.g., + A * B C
This prefix expression is equivalent to the infix expression A + (B * C). Prefix
notation is also known as Polish notation. Postfix notation is also known as
suffix or Reverse Polish notation.
 Reversing a List: First push all the elements of the list (or string) onto
the stack and then pop them; they come out in reverse order.
 Expression conversion: Infix to Postfix, Infix to Prefix, Postfix to Infix, and
Prefix to Infix
 Implementation of Towers of Hanoi
 Detection of a cycle in a graph
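As an illustration of expression evaluation with a stack, the sketch below evaluates a postfix expression. It is a simplified example under stated assumptions: operands are single decimal digits, the expression is well formed, and the function name eval_postfix and the fixed stack size of 64 are choices made for this example only.

```c
#include <ctype.h>

/* Evaluates a postfix string made of single-digit operands and + - * /.
   Assumes a well-formed expression that never needs more than 64 slots. */
int eval_postfix(const char *expr) {
    int stack[64];
    int top = -1;                          /* index of the top element */
    for (const char *p = expr; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p)) {
            stack[++top] = *p - '0';       /* push the operand */
        } else if (*p == '+' || *p == '-' || *p == '*' || *p == '/') {
            int right = stack[top--];      /* right operand is popped first */
            int left = stack[top--];
            int result;
            switch (*p) {
            case '+': result = left + right; break;
            case '-': result = left - right; break;
            case '*': result = left * right; break;
            default:  result = left / right; break;
            }
            stack[++top] = result;         /* push the intermediate result */
        }
        /* any other character, such as a space, is skipped */
    }
    return stack[top];
}
```

For the postfix form A B C * + with A = 2, B = 3 and C = 4, the call eval_postfix("234*+") multiplies 3 by 4 first and then adds 2, matching the infix expression A + (B * C).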
Queue
It is a non-primitive, linear data structure in which elements are added/inserted at
one end (called the REAR) and elements are removed/deleted from the other end
(called the FRONT). A queue is logically a FIFO (First in First Out) type of list.
Operations on Queue:
 Enqueue: Adds an item onto the end of the queue ENQUEUE(Q, i); Adds
the item i onto the end of queue.
 Dequeue: Removes the item from the front of the queue. DEQUEUE (Q);
Removes the first element and returns it as a function value.
Queue Implementation: A queue can be implemented in two ways.
 Static implementation (using arrays)
 Dynamic implementation (using pointers)
Circular Queue: In a circular queue, the first location logically comes just after
the last location. That is, a new element is inserted at the very first location of
the queue if the last location of the queue is occupied and the first location is
empty.
Note:- A circular queue overcomes the problem of unutilised space in
linear queues implemented as arrays.
We can make the following assumptions for a circular queue of N locations.
 Front always points to the first element (as in a linear queue).
 If Front = Rear, the queue is empty.
 Each time a new element is inserted into the queue, Rear is
incremented by 1, wrapping around to the first location after the last one.
Rear = (Rear + 1) mod N
 Each time an element is deleted from the queue, the value of Front is
incremented by one in the same way.
Front = (Front + 1) mod N
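The assumptions above can be sketched in C as follows. This is one possible array-based layout, not the only one: the names CircularQueue, cq_enqueue and cq_dequeue and the capacity of 5 are assumptions for this example, and one array slot is deliberately left unused so that Front = Rear can mean "empty" without ambiguity.

```c
#include <stdbool.h>

#define CQ_CAPACITY 5   /* illustrative size; one slot always stays unused */

struct CircularQueue {
    int items[CQ_CAPACITY];
    int front;   /* index of the first element */
    int rear;    /* index where the next element will be stored */
};

void cq_init(struct CircularQueue *q) { q->front = 0; q->rear = 0; }

bool cq_is_empty(const struct CircularQueue *q) { return q->front == q->rear; }

bool cq_is_full(const struct CircularQueue *q) {
    return (q->rear + 1) % CQ_CAPACITY == q->front;
}

/* Enqueue at the rear; returns false if the queue is full. */
bool cq_enqueue(struct CircularQueue *q, int item) {
    if (cq_is_full(q)) return false;
    q->items[q->rear] = item;
    q->rear = (q->rear + 1) % CQ_CAPACITY;    /* wrap around */
    return true;
}

/* Dequeue from the front; returns false if the queue is empty. */
bool cq_dequeue(struct CircularQueue *q, int *out) {
    if (cq_is_empty(q)) return false;
    *out = q->items[q->front];
    q->front = (q->front + 1) % CQ_CAPACITY;  /* wrap around */
    return true;
}
```

After a dequeue frees a location at the start of the array, a later enqueue reuses it, which is how the circular queue avoids the unutilised space of a linear array queue.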
Double Ended Queue (DEQUE): It is a list of elements in which insertion and
deletion operations are performed at both ends. That is why it is called a
double-ended queue or DEQUE.
Priority Queues: This type of queue enables us to retrieve data items on the
basis of priority associated with them. Below are the two basic priority queue
choices.
Sorted Array or List: Finding and deleting the smallest element is very efficient,
but maintaining sortedness makes the insertion of new elements slow.
Applications of Queue:
 Breadth first Search can be implemented.
 CPU Scheduling
 Handling of interrupts in real-time systems
 Routing Algorithms
 Computation of shortest paths
 Detection of a cycle in a graph
Linked Lists
A linked list is a data structure in which data elements are linked to one
another. Here, each element is called a node, which has two parts:
 Info part, which stores the information.
 Address or pointer part, which holds the address of the next element of the
same type. For this reason a linked list is also known as a self-referential
structure.
Each element (node) of a list thus comprises two items: the data and a
reference to the next node.
 The last node has a reference to NULL.
 The entry point into a linked list is called the head of the list. It should be
noted that head is not a separate node, but the reference to the first node.
 If the list is empty then the head is a null reference.
A node is declared with two fields: one for storing the information and another
for storing the address of the next node, so that one can
traverse the list.
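A node declaration of this kind, together with insertion at the beginning and a traversal, might look like the sketch below. The names Node, insert_front, list_length and list_free are illustrative assumptions, and the info part is assumed to be an int.

```c
#include <stdlib.h>

/* Self-referential structure: a node holds data and a pointer to a node. */
struct Node {
    int info;            /* info part */
    struct Node *next;   /* address of the next node, or NULL for the last node */
};

/* Inserts a new node at the beginning and returns the new head.
   If allocation fails, the list is returned unchanged. */
struct Node *insert_front(struct Node *head, int info) {
    struct Node *node = malloc(sizeof *node);
    if (node == NULL) return head;
    node->info = info;
    node->next = head;
    return node;
}

/* Traverses the list from head to the last node and counts the nodes. */
int list_length(const struct Node *head) {
    int n = 0;
    for (const struct Node *p = head; p != NULL; p = p->next)
        n++;
    return n;
}

/* Releases every node of the list. */
void list_free(struct Node *head) {
    while (head != NULL) {
        struct Node *next = head->next;
        free(head);
        head = next;
    }
}
```

An empty list is represented by a head equal to NULL, so inserting into an empty list needs no special case here.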
Advantages of Linked List:
 Linked lists are dynamic data structure as they can grow and shrink during
the execution time.
 Efficient memory utilisation because here memory is not pre-allocated.
 Insertions and deletions can be done very easily at the desired position.
Disadvantages of Linked List:
 More memory is required, since each node stores an address field in addition
to its data fields.
 Access to an arbitrary data item is time consuming, because the list has to be
traversed from the head.
Operations on Linked Lists: The operations involved in linked lists are
as given below
 Creation: Used to create a linked list.
 Insertion: Used to insert a new node in the linked list at the specified
position. A new node may be inserted
o At the beginning of a linked list
o At the end of a linked list
o At the specified position in a linked list
o In case of an empty list, a new node is inserted as the first node.
 Deletion: This operation is used to delete an item (a node). A
node may be deleted from the
o Beginning of a linked list.
o End of a linked list.
o Specified position in the list.
 Traversing: It is the process of going through (accessing) all the nodes of a
linked list from one end to the other end.
Types of Linked Lists
 Singly Linked List: In this type of linked list, each node has only one
address field which points to the next node. So, the main disadvantage of
this type of list is that we can’t access the predecessor of node from the
current node.
 Doubly Linked List: Each node of the linked list has two address fields
(or links), which help in accessing both the successor node (next node) and
the predecessor node (previous node).
 Circular Linked List: It has address of first node in the link (or address)
field of last node.
 Circular Doubly Linked List: It has both the previous and next pointer in
circular manner.
Tree
A tree is a non-linear, hierarchical data structure.
Trees are used to represent data containing a hierarchical relationship between
elements, e.g., records, family trees and tables of contents. A tree is a data
structure based on a hierarchical arrangement of a set of nodes.
 Node: Each data item in a tree.

 Root: First or top data item in hierarchical arrangement.
 Degree of a Node: Number of subtrees of a given node.
o Example: Degree of A = 3, Degree of E = 2
 Degree of a Tree: Maximum degree of a node in a tree.
o Example: Degree of above tree = 3
 Depth or Height: Maximum level number of a node + 1(i.e., level number
of farthest leaf node of a tree + 1).
o Example: Depth of above tree = 3 + 1= 4
 Non-terminal Node: Any node except root node whose degree is not zero.
 Forest: Set of disjoint trees.
 Siblings: Nodes that have the same parent, e.g., D and G are siblings with
parent node B.
 Path: Sequence of consecutive edges from the source node to the
destination node.
 Internal nodes: All nodes that have child nodes are called internal
nodes.
 Leaf nodes: Nodes that have no children are called leaf nodes.
 The depth of a node is the number of edges from the root to the node.
 The height of a node is the number of edges from the node to the deepest
leaf.
 The height of a tree is the height of the root.
Trees can be used
 for underlying structure in decision-making algorithms

 to represent Heaps (Priority Queues)
 to represent B-Trees (fast access to database)
 for storing hierarchies in organizations
 for file system
Binary Tree:
A binary tree is a rooted tree-like structure in which each node has at
most two children, and each child of a node is designated as its left or right
child. In this kind of tree, the degree of any node is at most 2.
A binary tree T is defined as a finite set of elements such that either
 T is empty (called the NULL tree or empty tree), or
 T contains a distinguished node R, called the root of T, and the remaining
nodes of T form an ordered pair of disjoint binary trees T1 and T2.
Any node N in a binary tree T has either 0, 1 or 2 successors.
 The number of nodes on level i of a binary tree is at most 2^i (the root is
on level 0).
 The number n of nodes in a binary tree of height h is at least n = h + 1 and
at most n = 2^(h+1) - 1, where h is the depth of the tree.
 The depth d of a binary tree with n nodes is at least floor(lg n).
o d = floor(lg n): lower bound, when the tree is a full binary tree.
o d = n - 1: upper bound, when the tree is a degenerate tree.
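The bounds above can be checked on small trees with the following sketch. The names TreeNode, tree_height and tree_count are assumptions for this example. Height is counted in edges, so a single node has height 0 and the empty tree is given height -1 by convention here.

```c
#include <stddef.h>

/* A binary tree node with left and right children (NULL when absent). */
struct TreeNode {
    int key;
    struct TreeNode *left;
    struct TreeNode *right;
};

/* Height in edges: -1 for the empty tree, 0 for a single node. */
int tree_height(const struct TreeNode *t) {
    if (t == NULL) return -1;
    int hl = tree_height(t->left);
    int hr = tree_height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Total number of nodes in the tree. */
int tree_count(const struct TreeNode *t) {
    if (t == NULL) return 0;
    return 1 + tree_count(t->left) + tree_count(t->right);
}
```

A root with two leaf children has height 1 and 2^(1+1) - 1 = 3 nodes, the maximum for that height, while a chain of 3 nodes has height 3 - 1 = 2, the degenerate case.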
Types of Binary Tree:
 Binary search tree

 Threaded Binary Tree
 Balanced Binary Tree
 B+ tree
 Parse tree
 AVL tree
 Spanning Tree
 Digital Binary Tree
Graphs
A graph is a collection of nodes called vertices, and the connections between
them, called edges.
Directed Graph: When the edges in a graph have a direction, the graph is called
a directed graph or digraph and the edges are called directed edges or arcs.
Adjacency: If (u,v) is in the edge set we say u is adjacent to v.
Path: A sequence of vertices in which each consecutive pair is connected by an edge.
Cycle (Loop): A path with the same start and end node.
Connected Graph: There exists a path between every pair of nodes, no node is
disconnected.
Acyclic Graph: A graph with no cycles.
Weighted Graphs: A weighted graph is a graph, in which each edge has a
weight.
Weight of a Graph: The sum of the weights of all edges.
Connected Components: In an undirected graph, a connected component is a
subset of vertices that are all reachable from each other. The graph is connected
if it contains exactly one connected component, i.e. every vertex is reachable
from every other. Connected component is a maximal connected subgraph.
Subgraph: subset of vertices and edges forming a graph.
Tree: Connected graph without cycles.
Forest: Collection of trees
Strongly Connected Component: In a directed graph, a strongly connected
component is a subset of mutually reachable vertices, i.e. there is a directed path
in each direction between every two vertices in the set.
Weakly Connected Graph: A directed graph that is not strongly connected is
weakly connected if it becomes connected when edge directions are ignored.
Graph Representations: There are many ways of representing a graph:
 Adjacency List
 Adjacency Matrix
 Incidence list
 Incidence matrix
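As one example of these representations, an adjacency matrix for a small undirected graph can be sketched as below. The names Graph, graph_add_edge and graph_degree and the fixed limit of 5 vertices are assumptions for this illustration; vertices are numbered from 0.

```c
#include <stdbool.h>

#define MAX_VERTICES 5   /* illustrative fixed number of vertices */

/* adj[u][v] is true when there is an edge between u and v. */
struct Graph {
    bool adj[MAX_VERTICES][MAX_VERTICES];
};

void graph_init(struct Graph *g) {
    for (int u = 0; u < MAX_VERTICES; u++)
        for (int v = 0; v < MAX_VERTICES; v++)
            g->adj[u][v] = false;
}

/* Undirected edge: recorded in both directions. */
void graph_add_edge(struct Graph *g, int u, int v) {
    g->adj[u][v] = true;
    g->adj[v][u] = true;
}

bool graph_adjacent(const struct Graph *g, int u, int v) {
    return g->adj[u][v];
}

/* Degree of u: number of vertices adjacent to u. */
int graph_degree(const struct Graph *g, int u) {
    int degree = 0;
    for (int v = 0; v < MAX_VERTICES; v++)
        if (g->adj[u][v]) degree++;
    return degree;
}
```

The matrix answers "is u adjacent to v" in constant time but always uses space proportional to the square of the number of vertices; an adjacency list trades that lookup speed for space proportional to the number of edges.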

Programming in C
 Every C program must have a function called main

 Execution starts in function main
 C is case sensitive
 Comments start with /* and end with */. Comments may span over many
lines.
 C is a “free format” language
 The #include <stdio.h> statement instructs the C compiler to insert the
entire contents of file stdio.h in its place and compile the resulting file.
Character Set: The characters that can be used to form words, numbers and
expressions depend upon the computer on which the program runs. The
characters in C are grouped into the following categories: Letters, Digits, Special
characters and White spaces.
C Tokens: The smallest individual units of a program are known as C tokens. C has
six types of tokens: Keywords, Identifiers, Constants, Operators, Strings and
Special symbols.
Keywords: Keywords are sequences of characters that have a fixed meaning in the
language. All C keywords must be written in lower case letters,
e.g., break, char, int, continue, default, do etc.
Identifiers: A C identifier is a name used to identify a variable, function, or any
other user-defined item. An identifier starts with a letter A to Z, a to z, or an
underscore '_' followed by zero or more letters, underscores, and digits (0 to 9).
Constants: Fixed values that do not change during the execution of a C
program.
Backslash character constants are used in output functions. e.g., '\b' used for
backspace and '\n' used for new line etc.
Operator: It is the symbol that tells the computer to perform certain mathematical
or logical manipulations. e.g., Arithmetic operators (+, -, *, /) etc.
String: A string is nothing but an array of characters (printable ASCII characters).
Delimiters /Separators: These are used to separate constants, variables and
statements e.g., comma, semicolon, apostrophes, double quotes and blank
space etc.
Variable:
 A variable is a name given to a storage area that our programs
can manipulate.
 Each variable in C has a specific type, which determines the size and
layout of the variable's memory, the range of values that can be stored
within that memory, and the set of operations that can be applied to the
variable.
Data Types
Different Types of Modifier with their Range:

Type Conversions
 Implicit Type Conversion: There are certain cases in which data is
automatically converted from one type to another:
o When data is being stored in a variable and the data does not match
the type of the variable, it is converted to match the type of the
storage variable.
o When an operation is being performed on data of two different
types, the "smaller" data type is converted to match the "larger"
type.
 The following example converts the value of x to a double
precision value before performing the division. Note that if the
3.0 were changed to a simple 3 and x were an integer, then integer
division would be performed, losing any fractional part of the result.
 average = x / 3.0;
o When data is passed to or returned from functions.
 Explicit Type Conversion: Data may also be expressly converted, using
the typecast operator.
o The following example converts the value of x to a double precision
value before performing the division. (y will then be implicitly
promoted, following the guidelines listed above.)
 average = ( double ) x / y;
 Note that x itself is unaffected by this conversion.
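The difference between these conversions can be seen in a small sketch. The function names below are made up for this example, and x and y are assumed to be of type int.

```c
/* Implicit conversion: x is converted to double because 3.0 is a double. */
double average_implicit(int x) {
    return x / 3.0;
}

/* No conversion before dividing: both operands are int, so the
   fractional part is discarded. */
int average_truncated(int x) {
    return x / 3;
}

/* Explicit conversion: the cast converts x to double, and y is then
   implicitly promoted to double before the division. */
double ratio_with_cast(int x, int y) {
    return (double) x / y;
}
```

With x = 10, the truncated version yields 3 while the implicit version yields roughly 3.33; with x = 1 and y = 2, the cast version yields 0.5 where plain integer division would yield 0.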
Expression:
 lvalue:
o Expressions that refer to a memory location are called "lvalue"
expressions.
o An lvalue may appear as either the left-hand or right-hand side of an
assignment.
o Variables are lvalues and so they may appear on the left-hand side of
an assignment
 rvalue:
o The term rvalue refers to a data value that is stored at some address
in memory.
o An rvalue is an expression that cannot have a value assigned to it
which means an rvalue may appear on the right-hand side but not on
the left-hand side of an assignment.
o Numeric literals are rvalues and so they may not be assigned and
cannot appear on the left-hand side.
C Flow Control Statements
A control statement is an instruction, statement or group of statements in a
programming language which determines the sequence of execution of other
instructions or statements. C provides two styles of flow control.
1. Branching (deciding what action to take)

2. Looping (deciding how many times to take a certain action)
If Statement: It takes an expression in parentheses and a statement or block of
statements. The expression is taken to be true if it evaluates to a non-zero
value.
The switch Statement: The switch statement tests the value of a given variable
(or expression) against a list of case values and, when a match is found, the block
of statements associated with that case is executed.
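A short sketch of a switch statement follows. The function days_in_month is a made-up example, months are assumed to be numbered 1 to 12, and leap years are ignored for simplicity.

```c
/* Returns the number of days in a month (1 = January ... 12 = December),
   ignoring leap years; returns 0 for an invalid month number. */
int days_in_month(int month) {
    switch (month) {
    case 4: case 6: case 9: case 11:   /* several case values share one block */
        return 30;
    case 2:
        return 28;
    case 1: case 3: case 5: case 7: case 8: case 10: case 12:
        return 31;
    default:                           /* no case value matched */
        return 0;
    }
}
```

Each block here ends with return; in a switch whose blocks do not return, a break statement is normally placed at the end of each block so that execution does not fall through into the next case.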
The Conditional Operator (? :)
The ? : operator works like an if-else statement, except that because it is an
operator we can use it within expressions. It is a ternary operator, i.e. it
takes three operands, and it is the only ternary operator in the C language.
flag = (x < 0) ? 0 : 1;
This conditional expression is equivalent to the following if-else
statement.
if (x < 0) flag = 0;
else flag = 1;
Loop Control Structure
Loops provide a way to repeat commands and to control how often they are repeated.
This involves repeating some portion of the program either a specified number of
times or until a particular condition is satisfied.
while Loop:
initialize loop counter;
while (test loop counter using a condition/expression)
{
<Statement1>
<Statement2>
...
< decrement/increment loop counter>
}
for Loop:
for (initialize counter; test counter; increment/decrement counter)
{
<Statement1>
<Statement2>
...
}
do while Loop:
initialize loop counter;
do
{
<Statement1>
<Statement2>
...
}
while (this condition is true);
The break Statement: The break statement is used to jump out of a loop
instantly, without waiting to get back to the conditional test.
The continue Statement: The continue statement is used to take the control to
the beginning of the next iteration of the loop, bypassing the statements inside
the loop which have not yet been executed.
goto Statement: C supports an unconditional control statement, goto, to transfer
the control from one point to another in a C program.
C Variable Types
A variable is just a named area of storage that can hold a single value. There are
two main variable types: Local variable and Global variable.
Local Variable: Scope of a local variable is confined within the block or function,
where it is defined.
Global Variable: A global variable is defined outside all functions, usually at
the top of the program file, and it is visible to and can be modified by any
function that references it. Global variables are initialized automatically (to
zero) by the system when we define them. If the same variable name is used for a
global and a local variable, then the local variable takes precedence within its
scope.
Storage Classes in C
A variable name identifies some physical location within the computer, where the
string of bits representing the variable's value, is stored.
There are basically two kinds of locations in a computer where such a value
may be kept: Memory and CPU registers.
It is the variable's storage class that determines in which of the above two types
of locations, the value should be stored.
We have four types of storage classes in C: Auto, Register, Static and Extern.
Auto Storage Class: Features of this class are given below.
 Storage Location: Memory
 Default Initial Value: Garbage value
 Scope: Local to the block in which the variable is defined.
 Life: Till the control remains within the block in which the variable is defined.
Auto is the default storage class for all local variables.
Register Storage Class: Register is used to define local variables that should
be stored in a register instead of RAM. Register should only be used for
variables that require quick access, such as counters. Features of register storage
class are given below.
 Storage Location: CPU register
 Default Initial Value: Garbage value
 Scope: Local to the block in which the variable is defined.
 Life: Till the control remains within the block in which the variable is defined.
Static Storage Class: Static is the default storage class for global variables.
Features of static storage class are given below.
 Storage Location: Memory
 Default Initial Value: Zero
 Scope: Local to the block in which the variable is defined. In the case of a
global variable, the scope is throughout the program.
 Life: The value of the variable persists between different function calls.
Extern Storage Class: Extern is used to give a reference to a global variable
that is visible to all the program files. When we use extern, the variable can't be
initialized, as all it does is point the variable name at a storage location that has
been previously defined.
 Storage Location: Memory
 Default Initial Value: Zero
 Scope: Global
 Life: As long as the program's execution does not come to an end.
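The lifetime difference between auto and static variables can be illustrated with a small sketch. The function names next_id and always_one are made up for this example.

```c
/* The static local variable is initialized once and keeps its value
   between calls, although its scope is still limited to this function. */
int next_id(void) {
    static int id = 0;
    id++;
    return id;
}

/* The auto (ordinary local) variable is created afresh on every call,
   so this function returns the same value each time. */
int always_one(void) {
    int n = 0;
    n++;
    return n;
}
```

Successive calls to next_id return 1, 2, 3 and so on, while every call to always_one returns 1.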
Operator Precedence Relations:

Operator precedence relations are given below from highest to lowest order:
Functions
 A function is a self-contained block of statements that performs a coherent
task of some kind.
 Making a function is a way of isolating one block of code from other
independent blocks of code.
 A function can take a number of arguments or
parameters, and a function can be called any number of times.
 A function can call itself; such a process is called recursion.
 Functions can be of two types: (i) Library functions, and (ii) User-defined
functions
Call by Value: If we pass the values of variables to the function as parameters,
such a function call is known as call by value.
Call by Reference: Variables are stored somewhere in memory. So, instead of
passing the value of a variable, if we pass the address of the
variable to the function, then it becomes a 'call by reference'.
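The two calling styles can be compared with a swap example. The function names below are illustrative only.

```c
/* Call by value: a and b are copies, so the caller's variables
   are not affected by the exchange. */
void swap_by_value(int a, int b) {
    int temp = a;
    a = b;
    b = temp;
}

/* Call by reference: the addresses are passed, so the exchange is
   performed on the caller's variables through the pointers. */
void swap_by_address(int *a, int *b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}
```

Given int x = 1, y = 2; the call swap_by_value(x, y) leaves x and y unchanged, whereas swap_by_address(&x, &y) leaves x equal to 2 and y equal to 1.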
Recursive Function:
A function that calls itself directly or indirectly is called a recursive function. A
recursive function such as factorial uses more memory than its non-recursive
counterpart, because a recursive function requires stack support to save the
recursive function calls.
Factorial Recursive Function:
GCD Recursive Function:
Fibonacci Sequence Recursive Function:
Power Recursive Function (x^y):
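The four recursive functions named above can be sketched as follows. These are straightforward textbook formulations written for this illustration, with assumed names and unsigned argument types; they do not guard against overflow for large inputs.

```c
/* Factorial: n! = n * (n - 1)!, with 0! = 1! = 1. */
unsigned long factorial(unsigned int n) {
    return n <= 1 ? 1UL : n * factorial(n - 1);
}

/* GCD by Euclid's method: gcd(a, b) = gcd(b, a mod b), gcd(a, 0) = a. */
unsigned int gcd(unsigned int a, unsigned int b) {
    return b == 0 ? a : gcd(b, a % b);
}

/* Fibonacci sequence: fib(0) = 0, fib(1) = 1,
   fib(n) = fib(n - 1) + fib(n - 2). */
unsigned long fib(unsigned int n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

/* Power: x^y = x * x^(y - 1), with x^0 = 1. */
long power(long x, unsigned int y) {
    return y == 0 ? 1 : x * power(x, y - 1);
}
```

Each call adds a frame to the call stack until the base case is reached, which is the extra memory cost of recursion mentioned above.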
Pointers
A pointer is a variable that stores a memory address. Like all other variables, it
also has a name, has to be declared and occupies some space in memory. It is
called a pointer because it points to a particular location.
 '&' = Address-of operator
 '*' = Value-at-address operator or 'indirection' operator
 &i returns the address of the variable i.
 *(&i) returns the value stored at that address; printing the value
of *(&i) is the same as printing the value of i.
NULL Pointers
 Uninitialized pointers start out with random unknown values, just like any
other uninitialized local variable.
 Accidentally using a pointer containing a random address is one of the
most common errors encountered when using pointers, and potentially one
of the hardest to diagnose, since the errors encountered are generally not
repeatable.
 Initializing a pointer to NULL when it has no valid target lets the program
test the pointer before using it.
Combinations of * and ++
 *p++ accesses the thing pointed to by p and increments p

 (*p)++ accesses the thing pointed to by p and increments the thing pointed
to by p
 *++p increments p first, and then accesses the thing pointed to by p
 ++*p increments the thing pointed to by p first, and then uses it in a larger
expression.
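The four combinations can be traced on a small array. The function pointer_increment_demo and the array contents are assumptions made for this sketch; it records the value each expression yields, in order.

```c
/* Applies the four forms in sequence to a local array {10, 20, 30}
   and stores the value of each expression in out[0..3]. */
void pointer_increment_demo(int out[4]) {
    int a[3] = {10, 20, 30};
    int *p = a;

    out[0] = *p++;    /* yields a[0] = 10, then p points to a[1] */
    out[1] = (*p)++;  /* yields a[1] = 20, then a[1] becomes 21; p unchanged */
    out[2] = *++p;    /* p moves to a[2] first, then yields 30 */
    out[3] = ++*p;    /* a[2] becomes 31 first, then yields 31 */
}
```

The first and third forms change the pointer, while the second and fourth change the value it points to.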
Pointer Operations:
 Assignment: You can assign an address to a pointer. Typically, you do

this by using an array name or by using the address operator (&).
 Value finding (dereferencing): The * operator gives the value stored in
the pointed-to location.
 Taking a pointer address: Like all variables, pointer variables have an
address and a value. The & operator tells you where the pointer itself is
stored.
 Adding an integer to a pointer: You can use the + operator to add an
integer to a pointer or a pointer to an integer. In either case, the integer is
multiplied by the number of bytes in the pointed-to type, and the result is
added to the original address.
 Incrementing a pointer: Incrementing a pointer to an array element makes
it move to the next element of the array.
 Subtracting an integer from a pointer: You can use the - operator to
subtract an integer from a pointer; the pointer has to be the first operand.
The integer is multiplied by the number of bytes in
the pointed-to type, and the result is subtracted from the original address.
 Note that there are two forms of subtraction. You can subtract one pointer
from another to get an integer, and you can subtract an integer from a
pointer and get a pointer.
 Decrementing a pointer: You can also decrement a pointer. For
example, decrementing a pointer to the third array element makes it point to
the second array element. Note that you can use both the prefix and postfix
forms of the increment and decrement operators.
 Differencing: You can find the difference between two pointers. Normally,
you do this for two pointers to elements that are in the same array to find
out how far apart the elements are. The result is expressed in units of the
pointed-to type (a number of elements), not in bytes.
 Comparisons: You can use the relational operators to compare the values
of two pointers, provided the pointers are of the same type.
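Several of these operations appear together in the sketch below. The function names sum_range and elements_between are assumptions for this example, and both pointers are assumed to refer to elements of the same int array.

```c
#include <stddef.h>

/* Sums the elements in [begin, end) using dereferencing, pointer
   comparison and pointer increment. */
int sum_range(const int *begin, const int *end) {
    int sum = 0;
    for (const int *p = begin; p < end; p++)
        sum += *p;
    return sum;
}

/* Differencing: the result counts elements, not bytes. */
ptrdiff_t elements_between(const int *first, const int *second) {
    return second - first;
}
```

For int a[4] = {1, 2, 3, 4}; the call sum_range(a, a + 4) adds all four elements, and elements_between(&a[1], &a[3]) is 2 regardless of how many bytes an int occupies.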
Array
 It is a collection of similar elements (having the same data type).
 Array elements occupy contiguous memory locations.
 Example: a[i]: The name a of the array is a constant expression, whose
value is the address of the 0th location.
 a ≡ a+0 ≡ &a[0]
 a+1 ≡ &a[1]
 a+i ≡ &a[i]
 &(*(a+i)) ≡ &a[i] ≡ a+i
 *(&a[i]) ≡ *(a+i) ≡ a[i]
 Address of an Array Element: Address of a[i] = base address of a + i * sizeof(element)
In C language, one can have arrays of any dimensions. Let us consider a 3 × 3
matrix as an example of a multi-dimensional array. To access a particular element
of the array, we have to use two subscripts: one for the row number and the other
for the column number. The notation is of the form a[i][j], where i stands for the
row subscript and j stands for the column subscript.
Such an array can also be initialized at the point of definition.
Note: Two Dimensional Array b[i][j], where b is the base address
 For Row Major Order: Address of b[i][j] = b + (Number of columns * i + j)
* sizeof(element)
 For Column Major Order: Address of b[i][j] = b + (Number of rows * j + i)
* sizeof(element)
 *(*(b + i) + j) is equivalent to b[i][j]
 *(b + i) + j is equivalent to &b[i][j]
 *(b[i] + j) is equivalent to b[i][j]
 b[i] + j is equivalent to &b[i][j]
 (*(b+i))[j] is equivalent to b[i][j]
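The equivalences above and the row-major formula can be checked with a short sketch. The names COLS, element_via_pointers and element_via_flat_offset are assumptions for this example, which uses a 3 × 3 array of int. C stores two-dimensional arrays in row-major order.

```c
#define COLS 3   /* number of columns in each row (illustrative) */

/* Uses the pointer form *(*(b + i) + j), equivalent to b[i][j]. */
int element_via_pointers(int (*b)[COLS], int i, int j) {
    return *(*(b + i) + j);
}

/* Uses the row-major formula: element offset = COLS * i + j from the
   first element. Pointer arithmetic scales by sizeof(int) itself. */
int element_via_flat_offset(int (*b)[COLS], int i, int j) {
    int *base = &b[0][0];
    return *(base + (COLS * i + j));
}
```

With int b[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}; both functions return 6 for i = 1, j = 2, since that element lies 3 * 1 + 2 = 5 elements after b[0][0].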
Strings
In C language, strings are stored in an array of character (char) type along with
the null terminating character '\0' at the end.
Example: char name[ ] = { 'G', 'A', 'T', 'E', 'T', 'O', 'P', '\0' };
 '\0' = Null character, whose ASCII value is 0.
 '0' = Digit zero, whose ASCII value is 48.
 If the array is initialized with a string literal instead, as in
char name[ ] = "GATETOP";, the '\0' need not be written. C inserts the null
character automatically.
 %s = It is used in printf( ) as a format specification for printing out a string.
 All the following notations refer to the same element: name[i], *(name +
i), *(i + name), i[name]
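The role of the terminating '\0' can be seen in a minimal sketch of a length function. The name string_length is an assumption for this example; the standard library provides strlen for the same purpose.

```c
#include <stddef.h>

/* Counts the characters before the terminating null character. */
size_t string_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

For char name[ ] = "GATETOP"; the array occupies 8 bytes, 7 for the letters and 1 for the '\0' that C appends, while string_length(name) reports 7.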
Structures: Structures in C are used to encapsulate or group together different
data into one object. A structure is defined with the struct keyword followed by a
tag name, e.g. struct object, and a list of member declarations.
The variables we declare inside the structure are called data members.
Initializing a Structure: Structure members can be initialized when we declare a
variable of our structure.
Union: A union is a collection of heterogeneous elements, that is, a group of
elements which have different data types. Each member within a structure
is assigned its own memory location, but the members of a union all share a common
memory location. Thus, unions are used to save memory.
Like structures, unions are also defined and declared, e.g. union person Ram;
declares a variable Ram of the union type person.
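A structure and a union might be declared as in the sketch below. The type names Player and Number, their members and the helper older_age are all assumptions chosen for this illustration.

```c
/* A structure: every member has its own memory location. */
struct Player {
    char name[20];
    int age;
};

/* A union: all members share one memory location, so only the member
   most recently stored holds a meaningful value. */
union Number {
    int i;
    float f;
};

/* Structures can be passed to functions like any other value. */
int older_age(struct Player a, struct Player b) {
    return a.age > b.age ? a.age : b.age;
}
```

A structure variable can be initialized where it is declared, as in struct Player p = { "Dhoni", 35 };. The size of union Number is that of its largest member rather than the sum of its members' sizes, which is how a union saves memory.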
Computer Security:
Computer security also known as cyber security is the protection of information
systems from theft or damage to the hardware, the software and to the
information on them, as well as from disruption of the services they provide.
Security is based on the following issues:
 Privacy: The ability to keep things private/confidential.
 Trust: Do we trust data from an individual or a host?
 Authenticity: Are security credentials in order?
 Integrity: Has the system been compromised/altered already?
Why do I need to learn about Computer Security?

Good Security Standards follow the "90 / 10" Rule:
 10% of security safeguards are technical.

 90% of security safeguards rely on the computer user ("YOU") to adhere to
good computing practices
 We need both parts for effective security.
Example: The lock on the door is the 10%. You remembering to lock the lock,
checking to see if the door is closed, ensuring others do not prop the door open,
keeping control of the keys, etc. is the 90%.
Threats are classified into one of the categories below:
 Back doors: A back door in a computer system or cryptosystem is any
secret method of bypassing normal authentication or security controls.
Back doors may exist for a number of reasons, including by original design or
from poor configuration.
 Denial-of-service attack: It is designed to make a machine or network
resource unavailable to its intended users.
 Direct-access attacks: An unauthorized user gaining physical access to a
computer is most likely able to directly download data from it.
 Eavesdropping: It is the act of surreptitiously listening to a private
conversation, typically between hosts on a network.
 Spoofing: Spoofing of user identity describes a situation in which one
person or program successfully masquerades as another by falsifying data.
 Tampering: It describes a malicious modification of products. So-called
"Evil Maid" attacks and security services planting surveillance capability
into routers are examples.
 Phishing: It is the attempt to acquire sensitive information such as
usernames, passwords and credit card details directly from users.
Good computing practices and tips that apply to most people who use a
computer.
 Use passwords that can't be easily guessed and protect your passwords.
 Minimize storage of sensitive information.
 Beware of scams.
 Protect information when using the Internet and email.
 Make sure your computer is protected with anti-virus and all necessary
security "patches" and updates.
 Secure laptop computers and mobile devices at all times: Lock them up or
carry them with you.
 Shut down, lock, log off, or put your computer and other devices to
sleep before leaving them unattended and make sure they require a
secure password to start up or wake-up.
 Don't install or download unknown or unsolicited programs/apps.
 Secure your area before leaving it unattended.
 Make backup copies of files or data you are not willing to lose.
Computer Viruses:
A virus is a parasitic program that infects another legitimate program, which is
sometimes called the host. To infect the host program, the virus modifies the
host so that it contains a copy of the virus.
 Boot sector viruses: A boot sector virus infects the boot record of a hard
disk. The virus allows the actual boot sector data to be read as though a
normal start-up were occurring.
 Cluster viruses: If any program is run from the infected disk, the program
causes the virus to run as well. This technique creates the illusion that the
virus has infected every program on the disk.
 Worms: A worm is a program whose purpose is to duplicate itself.
 Bombs: This type of virus hides on the user’s disk and waits for a specific
event to occur before running.
 Trojan Horses: A Trojan horse is a malicious program that appears to be
friendly. Because Trojan horses do not make duplicates of themselves on
the victim’s disk, they are not technically viruses.
 Stealth Viruses: These viruses take up residence in the computer’s
memory, making them hard to detect.
 Macro Viruses: A macro virus is designed to infect a specific type of
document file, such as Microsoft Word or Microsoft Excel files. These types
of documents can include macros, which are small programs that execute
commands.
The following are some well-known viruses.
 CodeRed: It is a worm that infects computers running the Microsoft IIS
server. It launched a DoS attack on the White House’s website. It allows
the hacker to access the infected computer remotely.
 Nimda: It is a worm that spreads itself using different methods. It
damages computers in different ways. It modifies files, alters security
settings and degrades performance.
 SirCam: It is distributed as an email attachment. It may delete files,
degrade performance and send the files to anyone.
 Melissa: It is a virus that is distributed as an email attachment. It disables
different safeguards in MS Word. It sends itself to 50 people if Microsoft
Outlook is installed.
 Ripper: It corrupts data on the hard disk.
 MDMA: It is transferred from one MS Word file to another if both files are in
memory.
 Concept: It is also transferred as an email attachment. It saves the file in
the template directory instead of its original location.
 One_Half: It encrypts the hard disk so that only the virus may read the data. It
displays One_Half on the screen when the encryption is half completed.
A computer system can be protected from viruses by taking the following precautions:
 The latest and updated version of Anti-Virus and firewall software should be
installed on the computer.
 The Anti-Virus software must be upgraded regularly.
 USB drives should be scanned for viruses, and should not be used on
infected computers.
 Junk or unknown emails should not be opened and must be deleted
straight away.
 Unauthorized or pirated software should not be installed on the computer.
 An important protection against viruses is keeping backups of data.
The backup is used if the virus deletes or modifies data, so back up your
data on a regular basis.
 Freeware and shareware software from the internet may contain
viruses. It is important to check such software before using it.
 Your best protection is your common sense. Never click on suspicious
links, never download songs, videos or files from suspicious websites.
Never share your personal data with people you don’t know over the
internet.
Computer: A computer is a truly amazing machine that performs a
specified sequence of operations as per the set of instructions (known
as programs) given on a set of data (input) to generate desired information
(output).
A complete computer system consists of four parts:
 Hardware: Hardware represents the physical and tangible components of
the computer.
 Software: Software is a set of electronic instructions consisting of complex
codes (Programs) that make the computer perform tasks.
 User: The computer operators are known as users.
 Data: Consists of raw facts, which the computer stores and reads in the
form of numbers.
The following features characterize this electronic machine:
 Speed
 Accuracy
 Storage and Retrieval
 Repeated Processing Capabilities
 Reliability
 Flexibility
 Low cost
The following three steps constitute the data processing cycle.
 Input - Input data is prepared in some convenient form for processing. The
form will depend on the processing machine. For example, when electronic
computers are used, the input data could be recorded on any one of
several types of input medium, such as magnetic disks, tapes and so on.
 Processing - In this step input data is changed to produce data in a more
useful form. For example, paychecks may be calculated from the time
cards, or a summary of sales for the month may be calculated from the
sales orders.
 Output - The results of the preceding processing step are collected. The
particular form of the output data depends on the use of the data. For
example, output data may be paychecks for employees.
Fig: The relationship between different hardware components

Language Processors:
 Assembler: This language processor converts a program written in
assembly language into machine language.
 Interpreter: This language processor converts an HLL (High Level
Language) program into machine language by converting and executing it
line by line.
 Compiler: It also converts an HLL program into machine language, but the
manner of conversion is different. It converts the entire HLL program in one
go, and reports all the errors of the program along with the line numbers.
Software
Software represents the set of programs that govern the operation of a computer
system and make the hardware run.
This type of software is tailor-made software according to a user’s requirements.
Analog computers
 Analog computers always take input in the form of signals.
 The input data is not a number but in fact a physical quantity like
temperature, pressure, speed or velocity.
 Signals are continuous (for example, 0 to 10 V).
 Accuracy is approximately 1%.
 Example: Speedometer.
Digital Computers
 These computers take input in the form of digits and letters and
convert it into binary format.
 Digital computers are high speed, programmable electronic devices.
 Signals have two levels (0 for low/off, 1 for high/on).
 Accuracy is unlimited.
 Examples: Computers used for business and education are digital
computers.
Hybrid Computer
 A computer that combines the features of analog and digital computers is
called a Hybrid computer.
 The main examples are national defense and passenger flight radar
systems.
 They are also used to control robots.
Super Computer
 The biggest in size.
 The most expensive.
 It can process trillions of instructions in seconds.
 This computer is not used as a PC in a home, nor by a student in a
college.
 Used by governments for different calculations and heavy jobs.
 Supercomputers are used for heavy tasks like weather maps,
construction of atom bombs, earthquake prediction etc.
Mainframes
 It can also process millions of instructions per second.
 It can handle the processing of many users at a time.
 Less expensive than a Supercomputer.
 It is commonly used in hospitals and air reservation companies, as it can
retrieve data on a huge scale.
 It is normally too expensive and out of reach for a salaried
person.
 It can cost many thousands of dollars.
Mini Computer
 These computers are preferred mostly by small businesses,
colleges etc.
 These computers are cheaper than the above two.
 It is an intermediary between the microcomputer and the mainframe.
Micro Computer/ Personal Computer
 It is mostly preferred by home users.
 Cost is less compared to the above.
 Small in size.
 A microcomputer contains a microprocessor (a central processing unit on a
microchip) and memory in the form of read-only memory and random access
memory, housed in a unit that is usually called a motherboard.
Notebook Computers
 Notebook computers typically weigh less than 6 pounds and are small
enough to fit easily in a briefcase.
 The principal difference between a notebook computer and a personal
computer is the display screen.
 Many notebook display screens are limited to VGA resolution.
Programming Languages
There are two major types of programming languages. These are Low Level
Languages and High Level Languages.
Low Level languages are further divided into Machine language and Assembly
language.
Low Level Languages: The term low level means closeness to the way in which
the machine has been built. Low level languages are machine oriented and
require extensive knowledge of computer hardware and its configuration.
Machine Language: Machine Language is the only language that is directly
understood by the computer. It does not need any translator program. We also
call it machine code, and it is written as strings of 1's (one) and 0's (zero). When
this sequence of codes is fed to the computer, it recognizes the codes and
converts them into the electrical signals needed to run the program.
For example, a program instruction may look like this: 1011000111101
It is not an easy language to learn because it is difficult to understand. It
is efficient for the computer but very inefficient for programmers. It is considered
to be the first generation language.
Advantage:
 Programs in machine language run very fast because no translation
program is required for the CPU.
Disadvantages:
 It is very difficult to program in machine language. The programmer has
to know details of the hardware to write a program.
 The programmer has to remember a lot of codes to write a program, which
results in program errors.
 It is difficult to debug the program.
Assembly Language
It is the first step in improving the programming structure. A computer can
handle both numbers and letters. Therefore some combinations of letters can
be used as substitutes for the numeric machine codes.
The set of symbols and letters forms the Assembly Language, and a translator
program is required to translate the Assembly Language to machine language.
This translator program is called `Assembler'. It is considered to be a
second-generation language.
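The substitution of letters for machine codes can be sketched as a toy assembler. This is only an illustration: the three mnemonics and their 4-bit opcodes below are hypothetical and do not belong to any real processor.

```python
# Hypothetical opcodes for an imaginary machine (not a real instruction set).
OPCODES = {"LOAD": "0001", "ADD": "0010", "STORE": "0011"}


def assemble(line: str) -> str:
    """Translate one 'MNEMONIC operand' line into a 12-bit binary string."""
    mnemonic, operand = line.split()
    # 4-bit opcode followed by the operand as an 8-bit binary number.
    return OPCODES[mnemonic] + format(int(operand), "08b")


program = ["LOAD 10", "ADD 11", "STORE 12"]
machine_code = [assemble(line) for line in program]

for source, code in zip(program, machine_code):
    print(f"{source:<10} -> {code}")
```

Each assembly line maps to exactly one machine instruction, which is why the translation does not cost any execution speed.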
Advantages:
 The symbolic programming of Assembly Language is easier to understand
and saves a lot of the programmer's time and effort.
 It is easier to correct errors and modify program instructions.
 Assembly Language has the same efficiency of execution as machine
level language, because there is a one-to-one translation between an assembly
language program and its corresponding machine language program.
Disadvantages:
 Assembly language is machine dependent.
 A program written for one computer might not run on other computers with
a different hardware configuration.
High Level languages
Assembly language and machine level language require deep
knowledge of computer hardware, whereas in a higher level language you have to know
only the instructions in English words and the logic of the problem, irrespective of
the type of computer you are using.
 Higher level languages are simple languages that use English and
mathematical symbols like +, -, %, / for program construction.
 Any higher level language has to be converted to
machine language for the computer to understand.
 Higher level languages are problem-oriented languages because the
instructions are suitable for solving a particular problem.
For example, COBOL (Common Business Oriented Language) is mostly suitable
for business-oriented applications where there is very little processing and huge
output.
There are mathematically oriented languages like FORTRAN (Formula Translation)
and BASIC (Beginners All-purpose Symbolic Instruction Code) where very large
processing is required.
Thus a problem-oriented language is designed in such a way that its instructions may
be written more like the language of the problem. For example, businessmen use
business terms and scientists use scientific terms in their respective languages.
Advantages of High Level Languages
 Higher level languages have a major advantage over machine and
assembly languages: they are easy to learn and use.
 This is because they are similar to the languages we use in our day to
day life.
DBMS is the acronym of Data Base Management System. A DBMS is a collection of
interrelated data and a set of programs to access those data. It manages
very large amounts of data and supports efficient access to them.
Features of Database:
 Faithfulness: The design and implementation should be faithful to the
requirements.
 Avoid Redundancy: This is important because redundancy wastes space and
can lead to inconsistency.
 Simplicity: Simplicity requires that the design and implementation avoid
introducing more elements than are absolutely necessary.
 Right kind of element: Attributes are easier to implement, but entity sets
and relationships are necessary to ensure that the right kind of element is
introduced.
Types of Database
 Centralized Database: All data is located at a single site.
 Distributed Database: The database is stored on several computers.
The information contained in a database is represented on two levels:
1. Data (which is large and is being frequently modified)
2. Structure of data (which is small and stable in time)
Database Management System (DBMS) provides efficient, reliable, convenient
and safe multi user storage of and access to massive amounts of persistent data.
Key People Involved in a DBMS:
 DBMS Implementer: Person who builds the system.
 Database Designer: Person responsible for preparing external schemas
for applications, identifying and integrating user needs into a conceptual (or
community or enterprise) schema.
 Database Application Developer: Person responsible for implementing
database application programs that facilitate data access for end users.
 Database Administrator: Person responsible for defining the internal
schema and sub-schemas (with database designers), specifying mappings
between schemas, monitoring database usage and supervising DBMS
functionality (e.g., access control, performance optimisation, backup and
recovery policies, conflict management).
 End Users: Users who query and update the database through fixed
programs (invoked by non-programmer users) e.g., banking.
Levels of Data Abstraction: A 3-tier architecture separates its tiers from each
other based on the complexity of the users and how they use the data present in
the database. It is the most widely used architecture to design a DBMS.
 Physical Level: It is the lowest level of abstraction and describes in detail
how the data are actually stored, including complex low level data structures.
 Logical Level: It is the next higher level of abstraction and describes what
data are stored and what relationships exist among those data. At the
logical level, each such record is described by a type definition and the
interrelationship of these record types is defined as well. Database
administrators usually work at this level of abstraction.
 View Level: It is the highest level of abstraction and describes only part of
the entire database and hides the details of the logical level.
Relational Algebra:
The relational model is completely based on relational algebra. It consists of a
collection of operators that operate on relations. Its main objective is data
retrieval. It is more operational and very useful for representing execution
plans, while relational calculus is non-operational and declarative. Here,
declarative means users define queries in terms of what they want, not in terms of
how to compute it.
Basic Operation in Relational Algebra
The operations in relational algebra are classified as follows.

Selection (σ): The select operation selects tuples/rows that satisfy a given
predicate or condition. We use (σ) to denote selection. The predicate/condition
appears as a subscript to σ.
Projection (π): It selects only required/specified columns/attributes from a given
relation/table. Projection operator eliminates duplicates (i.e., duplicate rows from
the result relation).
Union (∪): It forms a relation from rows/tuples which appear in either or
both of the specified relations. For a union operation R ∪ S to be valid, the
following two conditions must be satisfied.
 The relations R and S must be of the same arity, i.e., they must have the
same number of attributes.
 The domains of the i th attribute of R and the i th attribute of S must be the
same, for all i.
Intersection (∩): It forms a relation of rows/ tuples which are present in both the
relations R and S. Ensure that both relations are compatible for union and
intersection operations.
Set Difference (-): It allows us to find tuples that are in one relation but are not in
another. The expression R – S produces a relation containing those tuples in R
but not in S.
Cross Product/Cartesian Product (×): Assume that we have n1 tuples in R and
n2 tuples in S. Then, there are n1 * n2 ways of choosing a pair of tuples; one tuple
from each relation. So, there will be (n1 * n2) tuples in result relation P if P = R × S.
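The operators above can be sketched with Python sets, where each relation is a set of tuples. The relations R and S, their (id, city) layout and the helper names select, project and cross are assumptions made for this illustration only.

```python
# Two union-compatible relations with the assumed schema (id, city).
R = {(1, "Pune"), (2, "Delhi"), (3, "Pune")}
S = {(3, "Pune"), (4, "Agra")}


def select(relation, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def project(relation, *indexes):
    """Projection: keep the chosen columns; the set removes duplicate rows."""
    return {tuple(t[i] for i in indexes) for t in relation}


def cross(r, s):
    """Cartesian product: every tuple of r paired with every tuple of s."""
    return {a + b for a in r for b in s}


pune = select(R, lambda t: t[1] == "Pune")
cities = project(R, 1)          # duplicate "Pune" rows collapse into one
union = R | S
intersection = R & S
difference = R - S
product = cross(R, S)

print(len(R), len(S), len(product))
```

Because a Python set cannot hold duplicates, projection removes duplicate rows automatically, and the product of a 3-tuple relation with a 2-tuple relation has 3 * 2 tuples.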
Schema:
A schema is also known as database schema. It is a logical design of the
database and a database instance is a snapshot of the data in the database at a
given instant of time. A relational schema consists of a list of attributes and their
corresponding domains.
Types of Schemas: Schemas can be classified into three types, according to the levels
of abstraction.
 Physical/Internal Schema: Describes the database design at the physical
level.
 Logical/Conceptual Schema/Community User View: Describes the
database design at the logical level.
 Sub-schemas/View/External Schema: Describes different views of the
database. Views may be queried, combined in queries with base relations and
used to define other views, but in general they cannot be updated freely.
Data model :
A data model is a plan for building a database. Data models define how data is
connected to each other and how they are processed and stored inside the
system.
Two widely used data models are:
 Object based logical model
 Record based logical model
Entity :
An entity is a thing (animate or inanimate) with an independent physical or
conceptual existence that is distinguishable from other things. Each entity has
attributes. In the University database context, an individual student, a faculty
member, a class room and a course are entities.
Attributes
Each entity is described by a set of attributes/properties.
Types of Attributes
 Simple Attributes: having atomic or indivisible values. Example: Dept - a
string; Phone Number - an eight digit number.
 Composite Attributes: having several components in the value. Example:
Qualification with components (Degree Name, Year, University Name).
 Derived Attributes: Attribute value is dependent on some other attribute.
Example: Age depends on Date Of Birth, so Age is a derived attribute.
 Single-valued: having only one value rather than a set of values. For
instance, Place Of Birth - a single string value.
 Multi-valued: having a set of values rather than a single value. For instance,
the Courses Enrolled, Email Address and Previous Degree attributes for a
student.
 Attributes can be: simple single-valued, simple multi-valued, composite
single-valued or composite multi-valued.
Keys
 A super key of an entity set is a set of one or more attributes whose values
uniquely determine each entity.
 A candidate key of an entity set is a minimal super key
 Customer-id is candidate key of customer
 account-number is candidate key of account
 Although several candidate keys may exist, one of the candidate keys is
selected to be the primary key.
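The difference between a super key and a candidate key can be sketched by testing attribute combinations against sample data. The customer rows and the helper names below are invented for illustration; note that real keys are a property of the schema, so a check on one sample can only show that a combination is not ruled out as a key.

```python
from itertools import combinations

# Hypothetical customer relation used only for this sketch.
ATTRS = ("customer_id", "name", "city")
rows = [
    (1, "Asha", "Pune"),
    (2, "Ravi", "Pune"),
    (3, "Asha", "Delhi"),
]


def is_super_key(attr_names):
    """True if no two rows share the same values for attr_names."""
    idx = [ATTRS.index(a) for a in attr_names]
    seen = {tuple(row[i] for i in idx) for row in rows}
    return len(seen) == len(rows)


def candidate_keys():
    """Minimal super keys: sizes are tried smallest first, so any
    combination containing an already-found key is not minimal."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            if is_super_key(combo) and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys


keys = candidate_keys()
print(keys)
```

In this sample, (customer_id, name) is a super key but not a candidate key, because customer_id alone already identifies each row.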
Keys for Relationship Sets

The combination of primary keys of the participating entity sets forms a super key
of a relationship set.
(customer-id, account-number) is the super key of depositor
 NOTE: this means a pair of entity sets can have at most one relationship in
a particular relationship set.
 If we wish to track all access-dates to each account by each customer, we
cannot assume a relationship for each access. We can use a multivalued
attribute though.
 Must consider the mapping cardinality of the relationship set when deciding
what the candidate keys are.
 Need to consider the semantics of the relationship set when selecting the primary
key in case of more than one candidate key.
ER Modeling:
Entity-Relationship model (ER model) in software engineering is an abstract way
to describe a database. Describing a database usually starts with a relational
database, which stores data in tables.
Notations/Shapes in ER Modeling
The overall logical structure of a database can be expressed graphically by an
E-R diagram. The diagram consists of the following major components.
 Rectangles: represent entity sets.
 Ellipses: represent attributes.
 Diamonds: represent relationship sets.
 Lines: link attributes to entity sets and entity sets to relationship sets.
 Double ellipses: represent multi-valued attributes.
 Dashed ellipses: denote derived attributes.
 Double lines: represent total participation of an entity in a relationship set.
 Double rectangles: represent weak entity sets.
Mapping Cardinalities / Cardinality Ratio / Types of Relationship:
Expresses the number of entities to which another entity can be associated via a
relationship set. For a binary relationship set R between entity sets A and B, the
mapping cardinality must be one of the following:
 One to One: An entity in A is associated with at most one entity in B and
an entity in B is associated with at most one entity in A.
 One to Many: An entity in A is associated with any number (zero or more)
of entities in B. An entity in B, however, can be associated with at most
one entity in A.
 Many to Many: An entity in A is associated with any number (zero or more)
of entities in B and an entity in B is associated with any number (zero or more)
of entities in A.
Specialization: Consider an entity set person with attributes name, street and
city. A person may be further classified as one of the following: Customer and
Employee. Each of these person types is described by a set of attributes that
includes all the attributes of entity set person plus possibly additional attributes.
The process of designating subgroupings within an entity set is called
specialization.
The specialization of person allows us to distinguish among persons according to
whether they are employees or customers.
The refinement from an initial entity set into successive levels of entity
subgroupings represents a top-down design process in which distinctions are
made explicitly.
Generalization: Basically generalization is a simple inversion of specialization.
Some common attributes of multiple entity sets are chosen to create higher level
entity set. If the customer entity set and the employee entity set are having
several attributes in common, then this commonality can be expressed by
generalization.
Here, person is the higher level entity set and customer and employee are lower
level entity sets. Higher and lower level entity sets also may be designated by
the terms super class and subclass, respectively.
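The super class and subclass relationship can be loosely sketched with class inheritance, where the lower level entity sets inherit every attribute of person and add their own. This is only an analogy to the E-R concept; the extra attributes credit_rating and salary are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Higher level entity set (super class)."""
    name: str
    street: str
    city: str


@dataclass
class Customer(Person):
    """Lower level entity set: all person attributes plus its own."""
    credit_rating: int  # hypothetical additional attribute


@dataclass
class Employee(Person):
    """Lower level entity set: all person attributes plus its own."""
    salary: float  # hypothetical additional attribute


customer = Customer("Asha", "MG Road", "Pune", credit_rating=7)
employee = Employee("Ravi", "Park Street", "Delhi", salary=50000.0)

print(isinstance(customer, Person), customer.city, employee.salary)
```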
Aggregation: Aggregation is used when we have to model a relationship
involving entity set and a relationship set. Aggregation is an abstraction through
which relationships are treated as higher level entities.
Integrity Constraints:
Necessary conditions to be satisfied by the data values in the relational instances
so that the set of data values constitute a meaningful database.
There are four types of Integrity constraints
Domain Constraint: The value of attribute must be within the domain.
Key Constraint: Every relation must have a primary key.
Entity Integrity Constraint: Primary key of a relation should not contain NULL
values.
Referential Integrity Constraint: In the relational model, two relations are related to
each other on the basis of attributes. Every value of the referencing attribute must
either be NULL or be present in the referenced attribute.
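The constraints above can be tried out with a small sketch using Python's built-in sqlite3 module. The dept and emp tables, their columns and the age range are assumptions made for illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and it auto-assigns a value when NULL is inserted into an INTEGER PRIMARY KEY, so the entity integrity case is not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for referential integrity in SQLite
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE emp (
        emp_id  INTEGER PRIMARY KEY,                    -- key constraint
        age     INTEGER CHECK (age BETWEEN 18 AND 65),  -- domain constraint
        dept_id INTEGER REFERENCES dept(dept_id)        -- referential integrity
    )
""")
conn.execute("INSERT INTO dept VALUES (1, 'Sales')")
conn.execute("INSERT INTO emp VALUES (1, 30, 1)")
conn.execute("INSERT INTO emp VALUES (2, 40, NULL)")  # a NULL reference is allowed


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


domain_violation = violates("INSERT INTO emp VALUES (3, 10, 1)")  # age outside domain
key_violation = violates("INSERT INTO emp VALUES (1, 30, 1)")     # duplicate primary key
ref_violation = violates("INSERT INTO emp VALUES (4, 30, 99)")    # dept 99 does not exist

print(domain_violation, key_violation, ref_violation)
```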
Schema Refinement/Normalization
Decomposition of complex records into simple records. Normalization reduces
redundancy using the non-loss decomposition principle.
Decomposition
Splitting a relation R into two or more sub relations, such as R1 and R2. A fully
normalized relation must have a primary key and a set of attributes.
Decomposition should satisfy: (i) Lossless join, and (ii) Dependency
preservation
Lossless Join Decomposition
The join of the sub relations should not create any additional (spurious) tuples,
i.e., there should not be more tuples in R1 ⋈ R2 than in R.
R ⊂ R1 ⋈ R2 ⇒ (Lossy)
R ≡ R1 ⋈ R2 ⇒ (Lossless)
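A lossless and a lossy decomposition can be compared with a short sketch that projects a relation onto two attribute sets and joins the parts again. The student relation and the helper names below are invented for illustration.

```python
def natural_join(r1, r2):
    """Join two relations (lists of dicts) on their shared attribute names."""
    joined = []
    for a in r1:
        for b in r2:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                joined.append({**a, **b})
    return joined


def project(rel, attrs):
    """Project onto attrs and drop duplicate rows."""
    unique = {tuple(row[a] for a in attrs) for row in rel}
    return [dict(zip(attrs, values)) for values in unique]


def as_set(rel):
    """Order-independent form of a relation, used for comparison."""
    return {frozenset(row.items()) for row in rel}


R = [
    {"sid": 1, "sname": "Asha", "course": "DB"},
    {"sid": 2, "sname": "Asha", "course": "OS"},
]

# Split on sid, which identifies each row: the join gives back exactly R.
lossless = natural_join(project(R, ["sid", "sname"]), project(R, ["sid", "course"]))

# Split on sname, which does not identify a row: the join adds spurious tuples.
lossy = natural_join(project(R, ["sname", "sid"]), project(R, ["sname", "course"]))

print(len(R), len(lossless), len(lossy))
```

The lossy join still contains every original tuple; the information is lost because the spurious tuples can no longer be told apart from the real ones.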
Dependency Preservation: Decomposition must not cause the loss of
any dependency.
Functional Dependency (FD): A dependency between attributes is known as a
functional dependency. Let R be the relational schema, X and Y be non-empty
sets of attributes, and t1, t2, ..., tn be the tuples of relation R. X → Y
{values for X functionally determine values for Y} holds if, whenever two
tuples agree on X, they also agree on Y.
Trivial Functional Dependency: If X ⊇ Y, then X → Y is a trivial FD.
Here, X and Y are sets of attributes of a relation R.
In a trivial FD, the attributes on the right side of the ‘→’ arrow also appear
on the left side.
Non-Trivial Functional Dependency: If X ∩ Y = φ (no common attributes) and X
→ Y holds, then it is a non-trivial FD.
(no attribute appears on both sides of the ‘→’ arrow)
Case of semi-trivial FD
Sid → Sid Sname (semi-trivial)
Because on decomposition, we get
Sid → Sid (trivial FD) and
Sid → Sname (non-trivial FD)
Properties of Functional Dependency (FD)
 Reflexivity: If X ⊇ Y, then X → Y (trivial)
 Transitivity: If X → Y and Y → Z, then X → Z
 Augmentation: If X → Y, then XZ → YZ
 Splitting or Decomposition: If X → YZ, then X → Y and X → Z
 Union: If X → Y and X → Z, then X → YZ
Attribute Closure: Suppose R(X, Y, Z) is a relation with the set of attributes
(X, Y, Z). Then the attribute closure X+ is the set of all attributes of the
relation that X functionally determines (if not all of them, then at least X
itself).
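The attribute closure can be computed by applying the given FDs repeatedly until no new attribute is added. The sketch below assumes the FDs X → Y and Y → Z purely for illustration; the function name closure is also an assumption.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each side being a set of attributes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already known, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [({"X"}, {"Y"}), ({"Y"}, {"Z"})]

x_plus = closure({"X"}, fds)
z_plus = closure({"Z"}, fds)

print(sorted(x_plus), sorted(z_plus))
```

Here the closure of X covers every attribute of R(X, Y, Z), so X is a super key, while the closure of Z contains only Z itself.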
Normal Forms/Normalization:
In relational database design, normalization is the process of organizing data
to minimize redundancy. Normalization usually involves dividing a database into
two or more tables and defining relationships between the tables. The normal
forms define the status of a relation with respect to its individual attributes.
There are five numbered normal forms, along with Boyce-Codd Normal Form (BCNF).
First Normal Form (1NF): The relation should not contain any multivalued attributes,
i.e., it should contain only atomic attributes. The main disadvantage of 1NF is high
redundancy.
Second Normal Form (2NF): Relation R is in 2NF if and only if R is in
1NF and R does not contain any partial dependency.
Partial Dependency: Let R be the relational schema having X, Y, A, which are
non-empty sets of attributes, where X = Any candidate key of the relation, Y =
Proper subset of any candidate key, and A = Non-prime attribute (i.e., A doesn't
belong to any candidate key)
Here, X → A already holds. If Y → A also holds, then it is
a partial dependency, if and only if
 Y is a proper subset of a candidate key.
 A is a non-prime attribute.
If either of the above two conditions fails, then Y → A is not a partial
dependency.
Full Functional Dependency: A functional dependency P → Q is said to be a full
functional dependency if removal of any attribute S from P means that the
dependency no longer holds.
(Student_Name, College_Name → College_Address)
If the above functional dependency is a full functional dependency, then
there must be no FDs such as the following.
(Student_Name → College_Address)
or (College_Name → College_Address)
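Whether a dependency is full or partial can be explored on sample data with a sketch like the one below. The rows and the helper names holds and is_full_dependency are invented for illustration, and a check on sample rows can only show that an FD is not contradicted by that data, not that it holds in general.

```python
def holds(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full_dependency(rows, lhs, rhs):
    """True if lhs -> rhs holds and stops holding when any attribute is dropped."""
    if not holds(rows, lhs, rhs):
        return False
    for dropped in lhs:
        reduced = [a for a in lhs if a != dropped]
        if holds(rows, reduced, rhs):
            return False
    return True


LHS = ["Student_Name", "College_Name"]
RHS = ["College_Address"]

# College_Name alone decides the address here, so the FD is only partial.
partial_rows = [
    {"Student_Name": "Asha", "College_Name": "ABC", "College_Address": "Pune"},
    {"Student_Name": "Ravi", "College_Name": "ABC", "College_Address": "Pune"},
    {"Student_Name": "Asha", "College_Name": "XYZ", "College_Address": "Delhi"},
]

# An extra row makes the address depend on both attributes together.
full_rows = partial_rows + [
    {"Student_Name": "Ravi", "College_Name": "XYZ", "College_Address": "Agra"},
]

print(is_full_dependency(partial_rows, LHS, RHS), is_full_dependency(full_rows, LHS, RHS))
```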
Third Normal Form (3NF): Let R be a relational schema. R is in 3NF if, for
every non-trivial FD X → Y over R, X is a candidate key or super key, or Y
is a prime attribute.
 At least one of the above two conditions must be true for every such FD.
 R should not contain any transitive dependency.
 For a relation schema R to be in 3NF, it must also be in 2NF.
Transitive Dependency: An FD X → Y in a relation schema R is transitive if
 There is a set of attributes Z that is not a subset of any key of R.
 Both X → Z and Z → Y hold.
For example, consider a relation R1 that is in 2NF, with the FDs AB → C and
C → D.
 In relation R1, C is not a candidate key and D is a non-prime attribute. Due to
this, R1 fails to satisfy the 3NF condition. A transitive dependency is present
here.
 Since AB → C and C → D, the dependency AB → D is transitive.

Boyce-Codd Normal Form (BCNF): Let R be a relation schema. R is in BCNF
if and only if, for every non-trivial FD X → Y over R, X is a candidate key or
super key.
If R satisfies this condition, then it also satisfies 2NF and 3NF.
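The BCNF condition can be checked mechanically: for every non-trivial FD X → Y, the closure of X must contain all attributes of the relation. The sketch below assumes a relation R1(A, B, C, D) with the FDs AB → C and C → D for illustration; the function names are also assumptions.

```python
def closure(attrs, fds):
    """Set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != all_attrs
    ]


R1 = {"A", "B", "C", "D"}
fds = [({"A", "B"}, {"C"}), ({"C"}, {"D"})]

violations = bcnf_violations(R1, fds)
print(violations)
```

Only C → D is reported: AB determines every attribute of R1, but C determines only C and D, so C is not a super key.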
Summary of 1 NF, 2 NF and 3 NF:

Fourth Normal Form (4NF): 4NF is mainly concerned with multivalued
dependencies. A relation is in 4NF if and only if, for every one of its non-trivial
multivalued dependencies X →→ Y, X is a super key (i.e., X is either a candidate
key or a superset of one).
Fifth Normal Form (5NF): It is also known as Project Join Normal Form (PJ/NF).
5NF reduces redundancy in relational databases recording multivalued facts by
isolating semantically related multiple relationships. A table or relation is said to
be in 5NF if and only if every join dependency in it is implied by the candidate
keys.
SQL:
Structured Query Language (SQL) is a language that provides an interface to
relational database systems. SQL was developed by IBM in the 1970s for use in
System R and is a de facto standard, as well as an ISO and ANSI standard.
 To deal with database objects, we need a programming language, and that
programming language is known as SQL.
Three subordinate languages of SQL are:
Type of SQL Statement              SQL Keyword             Function
Data Definition Language (DDL)     CREATE, ALTER, DROP     Used to define, change and drop the structure of a table
                                   TRUNCATE                Used to remove all rows from a table
Data Manipulation Language (DML)   SELECT, INSERT INTO,    Used to enter, modify, delete and retrieve data from a table
                                   UPDATE, DELETE FROM
Data Control Language (DCL)        GRANT, REVOKE           Used to provide control over the data in a database
                                   COMMIT, ROLLBACK        Used to define the end of a transaction
Data Definition Language (DDL):
It includes the following commands.
 CREATE To create tables in the database.
 ALTER To modify the existing table structure.
 DROP To drop the table along with its structure.
Data Manipulation Language (DML)
It is used to insert, delete, update data and perform queries on these tables.
Some of the DML commands are given below.
 INSERT To insert data into the table.
 SELECT To retrieve data from the table.
 UPDATE To update existing data in the table.
 DELETE To delete data from the table.
Data Control Language (DCL)
It is used to control user's access to the database objects. Some of the DCL
commands are:
 GRANT Used to grant select/insert/delete access.
 REVOKE Used to revoke the provided access.
Transaction Control Language (TCL): It is used to manage changes affecting
the data.
 COMMIT To save the work done, such as inserting, updating or deleting
data to/from the table.
 ROLLBACK To restore the database to its state as of the last commit.
 SQL Data Types: SQL data types specify the type, size and format of
data/information that can be stored in columns and variables.
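The effect of COMMIT and ROLLBACK described above can be sketched with Python's built-in sqlite3 module and an in-memory database. The Employee table and its values are assumptions made for illustration; the connection methods commit() and rollback() issue the corresponding SQL commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_Id INTEGER PRIMARY KEY, City TEXT)")

conn.execute("INSERT INTO Employee VALUES (1, 'LUCKNOW')")
conn.commit()  # COMMIT: the inserted row is now saved

conn.execute("UPDATE Employee SET City = 'DELHI' WHERE Emp_Id = 1")
conn.rollback()  # ROLLBACK: the update since the last commit is undone

city = conn.execute("SELECT City FROM Employee WHERE Emp_Id = 1").fetchone()[0]
print(city)
```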
Constraint Types with Description
Default Constraint: It is used to insert a default value into a column, if no other
value is specified at the time of insertion.
Syntax
CREATE TABLE Employee
(
Emp_id int NOT NULL,
Last_Name varchar (250),
City varchar (50) DEFAULT 'BANGALURU'
);
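A quick way to see the default constraint at work is the sketch below, which runs
the Employee table definition through Python's sqlite3 module. The inserted names
('Rao', 'Singh') are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        Emp_id    INTEGER NOT NULL,
        Last_Name VARCHAR(250),
        City      VARCHAR(50) DEFAULT 'BANGALURU'
    )
""")

# No City supplied: the DEFAULT value is stored.
conn.execute("INSERT INTO Employee (Emp_id, Last_Name) VALUES (1, 'Rao')")
# City supplied explicitly: the DEFAULT is not used.
conn.execute(
    "INSERT INTO Employee (Emp_id, Last_Name, City) VALUES (2, 'Singh', 'LUCKNOW')"
)

cities = conn.execute(
    "SELECT Emp_id, City FROM Employee ORDER BY Emp_id"
).fetchall()
print(cities)
```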
DDL Commands
1. CREATE TABLE <Table_Name>
(
Column_name1 <data_type>,
Column_name2 <data_type>
)
2. ALTER TABLE <Table_Name>
ALTER COLUMN <Column_Name> SET NOT NULL
3. RENAME <object_type> <object_name> TO <new_name>
4. DROP TABLE <Table_Name>
DML Commands
SELECT A1, A2, A3, …, An    (what to return)
FROM R1, R2, R3, …, Rm      (relations or tables)
WHERE condition             (filter condition, i.e., on what basis we want to
restrict the outcome/result)
If we want to write the above SQL script in the form of relational calculus, we use
the following syntax
The comparison operators which we can use in the filter condition are =, >, <, >=,
<= and <>, where '<>' means not equal to.
INSERT Statement: Used to add row(s) to the tables in a database.
INSERT INTO Employee (F_Name, L_Name) VALUES ('Atal', 'Bihari');
UPDATE Statement: It is used to modify/update or change existing data in a single
row, a group of rows or all the rows in a table.
Example:
-- Updates some rows in a table.
UPDATE Employee
SET City = 'LUCKNOW'
WHERE Emp_Id BETWEEN 9 AND 15;
-- Updates the City column for all the rows.
UPDATE Employee SET City = 'LUCKNOW';
DELETE Statement: This is used to delete rows from a table.
Example:
-- The following query deletes only the row(s) where Emp_Id is 7.
DELETE FROM Employee
WHERE Emp_Id = 7;
-- The following query deletes all the rows from the Employee table.
DELETE FROM Employee;
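The INSERT, UPDATE and DELETE statements above can be exercised together in a
small sketch. It uses Python's sqlite3 module; the column list and the sample rows
are assumptions chosen only so that each statement has something to act on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_Id INTEGER, F_Name TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Employee (Emp_Id, F_Name, City) VALUES (?, ?, ?)",
    [(7, "Atal", "DELHI"), (9, "Asha", "DELHI"), (20, "Ravi", "PUNE")],
)

# UPDATE with a WHERE clause changes only the matching rows (Emp_Id 9 here).
conn.execute("UPDATE Employee SET City = 'LUCKNOW' WHERE Emp_Id BETWEEN 9 AND 15")

# DELETE with a WHERE clause removes only the matching row.
conn.execute("DELETE FROM Employee WHERE Emp_Id = 7")
remaining = conn.execute(
    "SELECT Emp_Id, City FROM Employee ORDER BY Emp_Id"
).fetchall()

# DELETE without WHERE removes every row but keeps the table itself.
conn.execute("DELETE FROM Employee")
count_after = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(remaining, count_after)
```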
ORDER BY Clause: This clause is used to sort the result of a query in a specific
order (ascending or descending). By default, the sorting order is ascending.
SELECT Emp_Id, Emp_Name, City FROM Employee
WHERE City = 'LUCKNOW'
ORDER BY Emp_Id DESC;
GROUP BY Clause: It is used to divide the result set into groups. Grouping can
be done by a column name or by the results of computed columns when using
numeric data types.
 The HAVING clause can be used to set conditions for the GROUP BY clause.
 The HAVING clause is similar to the WHERE clause, but HAVING puts conditions
on groups.
 The WHERE clause places conditions on rows.
 The WHERE clause can't include aggregate functions, while HAVING conditions
can do so.
Example:
SELECT Emp_Id, AVG (Salary)
FROM Employee
GROUP BY Emp_Id
HAVING AVG (Salary) > 25000;
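The difference between WHERE (rows) and HAVING (groups) is easier to see on a
small data set. The sketch below is illustrative only: it groups by a City column
rather than Emp_Id so that each group holds more than one row, and the salaries are
made-up values. COUNT and AVG stand in for the aggregate functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_Id INTEGER, City TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "LUCKNOW", 30000), (2, "LUCKNOW", 40000),
     (3, "DELHI", 20000), (4, "DELHI", 24000), (5, "PUNE", 50000)],
)

query = """
    SELECT City, COUNT(*), AVG(Salary)
    FROM Employee
    WHERE Salary > 15000          -- row-level filter, applied before grouping
    GROUP BY City
    HAVING AVG(Salary) > 25000    -- group-level filter, may use aggregates
    ORDER BY City
"""
groups = conn.execute(query).fetchall()
print(groups)
```

DELHI is dropped by HAVING because its average salary (22000) is not above 25000.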
Aggregate Functions
Joins: Joins are needed to retrieve related rows from two tables on the basis of
some condition which both tables satisfy. The mandatory condition for a join is
that at least one set of column(s) should take values from the same domain in
each table.
Inner Join: Inner join is the most common join operation used in applications and
can be regarded as the default join-type. Inner join creates a new result table by
combining column values of two tables (A and B) based upon the join-predicate.
Inner joins may be further divided into three types.
1. Equi Join (satisfies equality condition)
2. Non-Equi Join (satisfies non-equality condition)
3. Self Join (one or more columns assume the same domain of values).
Outer Join: An outer join does not require each record in the two joined tables to
have a matching record. The joined table retains each record even if no other
matching record exists. In other words, it also considers the rows from the
table(s) that don't satisfy the joining condition. There are three types:
(i) Right outer join (ii) Left outer join (iii) Full outer join
Left Outer Join: The result of a left outer join for table A and B always contains
all records of the left table (A), even if the join condition does not find any
matching record in the right table (B).
Result set of T1 and T2
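As a rough illustration of how an inner join and a left outer join differ, the
sketch below builds two small tables and runs both joins with Python's sqlite3
module. The contents of T1 and T2 here are hypothetical, not the tables from the
original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (id INTEGER, name TEXT);
    CREATE TABLE T2 (id INTEGER, city TEXT);
    INSERT INTO T1 VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO T2 VALUES (2, 'X'), (3, 'Y'), (4, 'Z');
""")

# Inner join: only rows whose id appears in both tables.
inner = conn.execute(
    "SELECT T1.id, name, city FROM T1 INNER JOIN T2 ON T1.id = T2.id "
    "ORDER BY T1.id"
).fetchall()

# Left outer join: every row of T1, with NULL (None) where T2 has no match.
left = conn.execute(
    "SELECT T1.id, name, city FROM T1 LEFT OUTER JOIN T2 ON T1.id = T2.id "
    "ORDER BY T1.id"
).fetchall()
print(inner)
print(left)
```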

Right Outer Join: A right outer join closely resembles a left outer join, except
with the treatment of the tables reversed. Every row from the right table will
appear in the joined table at least once. If no matching row exists in the left
table, NULL will appear.
Result set of T1 and T2
Full Outer Join: A full outer join combines the effect of applying both left and
right outer joins. Where records in the FULL OUTER JOINed tables do not match, the
result set will have NULL values for every column of the table that lacks a
matching row. For those records that do match, a single row will be produced in
the result set.
Result set of T1 and T2 (Using tables of previous example)
Cross Join (Cartesian product): Cross join returns the Cartesian product of rows
from the tables in the join. It will produce rows which combine each row from the
first table with each row from the second table.
SELECT * FROM T1, T2;
Number of rows in result set = (Number of rows in table 1 × Number of rows in
table 2)
Result set of T1 and T2 (Using previous tables T1 and T2)
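The row-count rule for a cross join can be checked with a short sketch. The two
tables below are hypothetical stand-ins (3 rows and 2 rows), so the product should
contain 3 × 2 = 6 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (a INTEGER);
    CREATE TABLE T2 (b TEXT);
    INSERT INTO T1 VALUES (1), (2), (3);
    INSERT INTO T2 VALUES ('x'), ('y');
""")

# Listing two tables with no join condition yields the Cartesian product.
cross = conn.execute("SELECT * FROM T1, T2").fetchall()
rows_t1 = conn.execute("SELECT COUNT(*) FROM T1").fetchone()[0]
rows_t2 = conn.execute("SELECT COUNT(*) FROM T2").fetchone()[0]
print(len(cross), rows_t1 * rows_t2)
```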
Storage Structure:
The storage structure can be divided into two categories:
Volatile storage: As the name suggests, volatile storage cannot survive system
crashes. Volatile storage devices are placed very close to the CPU; normally they
are embedded onto the chipset itself. Main memory and cache memory are examples
of volatile storage. They are fast but can store only a small amount of
information.
Non-volatile storage: These memories are made to survive system crashes.
They are huge in data storage capacity, but slower in accessibility. Examples
may include hard-disks, magnetic tapes, flash memory, and non-volatile (battery
backed up) RAM.
File Organisation:
The database is stored as a collection of files. Each file is a sequence of records.
A record is a sequence of fields. Data is usually stored in the form of records.
Records usually describe entities and their attributes. e.g., an employee record
represents an employee entity and each field value in the record specifies some
attributes of that employee, such as Name, Birth-date, Salary or Supervisor.
Allocating File Blocks on Disk: There are several standard techniques for
allocating the blocks of a file on disk.
 Contiguous Allocation: The file blocks are allocated to consecutive disk
blocks. This makes reading the whole file very fast.
 Linked Allocation: In this, each file block contains a pointer to the next file
block.
 Indexed Allocation: Where one or more index blocks contain pointers to
the actual file blocks.
Files of Unordered Records (Heap Files): In the simplest type of organisation,
records are placed in the file in the order in which they are inserted, so new
records are inserted at the end of the file. Such an organisation is called a heap
or pile file.
This organisation is often used with additional access paths, such as
secondary indexes.
In this type of organisation, inserting a new record is very efficient. A linear
search is used to find a record.
Files of Ordered Records (Sorted Files): We can physically order the records of
a file on disk based on the values of one of their fields called the ordering field.
This leads to an ordered or sequential file. If the ordering field is also a key field
of the file, a field guaranteed to have a unique value in each record, then the field
is called the ordering key for the file. Binary searching is used to search a record.
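The contrast between searching a heap file and a sorted file can be sketched as
follows. This is a simplified in-memory model: a Python list stands in for the
file and each integer for a record's key, which is an assumption made purely for
illustration.

```python
from bisect import bisect_left

def linear_search(records, key):
    """Heap file: scan in insertion order. Returns (position, comparisons)."""
    for i, k in enumerate(records):
        if k == key:
            return i, i + 1
    return -1, len(records)

def binary_search(sorted_records, key):
    """Sorted file: halve the search range each step. Returns position or -1."""
    i = bisect_left(sorted_records, key)
    if i < len(sorted_records) and sorted_records[i] == key:
        return i
    return -1

heap_file = [8, 5, 1, 7, 3, 12, 9, 6]   # records in insertion order
sorted_file = sorted(heap_file)         # records ordered on the ordering key

print(linear_search(heap_file, 9))
print(binary_search(sorted_file, 9))
```

The heap file keeps insertion cheap (append at the end) but may need to examine
every record, while the sorted file needs roughly log2(n) probes.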
Indexing Structures for Files: Indexing mechanisms are used to optimize certain
accesses to data (records) managed in files, e.g., the author catalog in a library
is a type of index. A search key is an attribute or combination of attributes used
to look up records in a file.
An index file consists of records (called index entries) of the form
<search key, pointer>.
Index files are typically much smaller than the original file because only the
values for search key and pointer are stored. The most prevalent types of indexes
are based on ordered files (single-level indexes) and tree data structures
(multilevel indexes).
Types of Single Level Ordered Indexes: In an ordered index file, index entries
are stored sorted by the search key value. There are several types of ordered
indexes.
Primary Index: A primary index is an ordered file whose records are of fixed
length with two fields. The first field is of the same data type as the ordering key
field called the primary key of the data file and the second field is a pointer to a
disk block (a block address).
 There is one index entry in the index file for each block in the data file.
 Indexes can also be characterised as dense or sparse.
 Dense index: A dense index has an index entry for every search key value
in the data file.
 Sparse index: A sparse (non-dense) index, on the other hand, has index
entries for only some of the search values.
 A primary index is a non-dense (sparse) index, since it includes an entry for
each disk block of the data file rather than for every search value.
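A sparse primary index lookup might be sketched as below. The block contents are
hypothetical, lists stand in for disk blocks, and the block number plays the role
of the block pointer; the index keeps one entry per block, namely the first key
stored in that block.

```python
from bisect import bisect_right

# Hypothetical data file: each block holds records sorted on the ordering key.
blocks = [[1, 3, 5], [6, 7, 8], [9, 12, 15]]

# Sparse primary index: one entry per block (first key of block -> block number).
index_keys = [block[0] for block in blocks]

def lookup(key):
    """Return (block_number, found) for key using the sparse index."""
    pos = bisect_right(index_keys, key) - 1   # last entry whose key is <= key
    if pos < 0:
        return None, False                    # smaller than every indexed key
    return pos, key in blocks[pos]            # search only inside that one block

print(lookup(7))
print(lookup(4))
```

The index has 3 entries for 9 records, and a lookup reads just one data block.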
Clustering Index: If file records are physically ordered on a non-key field, which
does not have a distinct value for each record, that field is called the clustering
field. We can create a different type of index, called a clustering index, to speed
up retrieval of records that have the same value for the clustering field.
 A clustering index is also an ordered file with two fields. The first field is of
the same type as the clustering field of the data file.
 The second field in the clustering index is a block pointer.
 A clustering index is another example of a non-dense index.
Secondary Index: A secondary index provides a secondary means of accessing
a file for which some primary access already exists. The secondary index may be
on a field which is a candidate key and has a unique value in every record, or on a
non-key field with duplicate values. The index is an ordered file with two fields.
The first field is of the same data type as some non-ordering field of the data
file that is an indexing field. The second field is either a block pointer or a
record pointer.
A secondary index usually needs more storage space and longer search time
than does a primary index.
Multilevel Indexes: The idea behind a multilevel index is to reduce the part of
the index that has to be searched at each step. A multilevel index treats the index
file, which will now be referred to as the first (or base) level of the multilevel
index, as an ordered file. Therefore, we can create a primary index for the first
level; this index to the first level is called the second level of the multilevel
index, and so on.
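To get a feel for how quickly the levels shrink, the sketch below counts the
levels needed until the top level fits in a single block. The fan-out (number of
index entries per block) is an assumed parameter that the notes do not define, and
the figures used are illustrative.

```python
def index_levels(first_level_entries, fan_out):
    """Number of index levels needed until the top level fits in one block.

    fan_out is the assumed number of index entries that fit in one block.
    """
    levels = 1
    blocks = -(-first_level_entries // fan_out)   # ceiling division
    while blocks > 1:
        levels += 1                               # index the level below
        blocks = -(-blocks // fan_out)
    return levels

# 30,000 first-level entries with 68 entries per block:
# level 1 occupies 442 blocks, level 2 occupies 7 blocks, level 3 fits in 1 block.
print(index_levels(30000, 68))
```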
Dynamic Multilevel Indexes Using B-Trees and B+ -Trees: There are two
multilevel indexes
B-Trees
 When data volume is large and does not fit in memory, an extension of the
binary search tree to disk based environment is the B-tree.
 In fact, since the B-tree is always balanced (all leaf nodes appear at the
same level), it is an extension of the balanced binary search tree.
 The problem which the B-tree aims to solve is: given a large collection of
objects, each having a key and a value, design a disk based index structure
which efficiently supports query and update.
 A B-tree of order p, when used as an access structure on a key field to
search for records in a data file, can be defined as follows:
1. Each internal node in the B-tree is of the form
<P1, <K1, Pr1>, P2, <K2, Pr2>, …, <Kq–1, Prq–1>, Pq>
where q ≤ p.
Each Pi is a tree pointer to another node in the B-tree.
Each Prj is a data pointer to the record whose search key field value is
equal to Kj.
2. Within each node, K1 < K2 < … < Kq–1.
3. Each node has at most p tree pointers.
4. Each node, except the root and leaf nodes, has at least ⌈p/2⌉ tree
pointers.
5. A node with q tree pointers, q ≤ p, has q – 1 search key field values
(and hence has q – 1 data pointers).
e.g., A B-tree of order p = 3. The values were inserted in the order 8,
5, 1, 7, 3, 12, 9, 6.
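Searching a B-tree follows the key ordering inside each node. The sketch below
is a simplified model: data pointers are left out, and the tree is hand-built from
one working-through of the insertions 8, 5, 1, 7, 3, 12, 9, 6 with order p = 3
(root holding 5 and 8 over three leaves), which is an assumption rather than a
copy of the original figure.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                 # K1 < K2 < ... < Kq-1
        self.children = children or []   # q tree pointers (empty for a leaf)

def search(node, key):
    """Return True if key is stored anywhere in the B-tree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, key)  # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True                  # found in this node
        if not node.children:
            return False                 # reached a leaf without finding it
        node = node.children[i]          # follow tree pointer P(i+1)
    return False

# Hand-built tree of order p = 3 (at most 3 tree pointers, 2 keys per node).
root = Node([5, 8], [Node([1, 3]), Node([6, 7]), Node([9, 12])])

print(search(root, 7), search(root, 8), search(root, 4))
```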
B+ Trees
 It is the variation of the B-tree data structure.
 In a B-tree, every value of the search field appears once at some level in
the tree, along with a data pointer. In a B+-tree, data pointers are stored
only at the leaf nodes of the tree. Hence, the structure of the leaf nodes
differs from the structure of internal nodes.
 The pointers in the internal nodes are tree pointers to blocks that are tree
nodes whereas the pointers in leaf nodes are data pointers.
 B+ Tree's Structure: The structure of the B+-tree of order p is as follows:
1. Each internal node is of the form <P1, K1, P2, K2, …, Pq–1, Kq–1, Pq>
where q ≤ p and each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < K3 … < Kq–1.
3. Each internal node has at most p tree pointers and, except the root, has
at least ⌈p/2⌉ tree pointers.
4. The root node has at least two tree pointers, if it is an internal node.
5. Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, …, <Kq–1, Prq–1>, Pnext>
where q ≤ p, each Pri is a data pointer and Pnext points to the next leaf node
of the B+-tree.
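One practical use of the Pnext pointer is a range scan along the leaf level. The
sketch below is a simplification: it models only the leaf nodes, omits the data
pointers, and starts from the leftmost leaf instead of descending from the root.
The leaf contents are hypothetical.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys     # sorted search key values (data pointers omitted)
        self.next = None     # Pnext: the next leaf node in key order

def range_scan(first_leaf, low, high):
    """Collect all keys k with low <= k <= high by walking the leaf chain."""
    result = []
    leaf = first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result        # keys are ordered, so we can stop early
            if k >= low:
                result.append(k)
        leaf = leaf.next             # follow Pnext to the neighbouring leaf
    return result

# Build a chain of four leaves.
leaves = [Leaf([1, 3]), Leaf([5, 6]), Leaf([7, 8]), Leaf([9, 12])]
for current, following in zip(leaves, leaves[1:]):
    current.next = following

print(range_scan(leaves[0], 4, 8))
```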
ALL THE BEST!!

IP Address
An IP address is a fascinating product of modern computer technology designed
to allow one computer (digital device) to communicate with another via the
Internet.
There are five classes of available IP ranges: Class A, Class B, Class C, Class
D and Class E, while only A, B, and C are commonly used. Each class allows for
a range of valid IP addresses, shown in the following table.
Class   | Address Range                | Supports
Class A | 1.0.0.1 to 126.255.255.254   | Supports 16 million hosts on each of 126 networks.
Class B | 128.1.0.1 to 191.255.255.254 | Supports 65,000 hosts on each of 16,000 networks.
Class C | 192.0.1.1 to 223.255.254.254 | Supports 254 hosts on each of 2 million networks.
Class D | 224.0.0.0 to 239.255.255.255 | Reserved for multicast groups.
Class E | 240.0.0.0 to 254.255.255.254 | Reserved for future use, or Research and Development Purposes.
The range 127.x.x.x is reserved for the loopback or local host, for
example, 127.0.0.1 is the common loopback address.
The address 255.255.255.255 broadcasts to all hosts on the local network.
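The class of an IPv4 address can be read off its first octet. The function below
is a minimal sketch based on the ranges in the table above; the function name and
the return labels are made up for illustration.

```python
def ip_class(address):
    """Classify a dotted-decimal IPv4 address by its first octet."""
    first = int(address.split(".")[0])
    if first == 127:
        return "loopback"      # 127.x.x.x is reserved for the local host
    if 1 <= first <= 126:
        return "A"
    if 128 <= first <= 191:
        return "B"
    if 192 <= first <= 223:
        return "C"
    if 224 <= first <= 239:
        return "D"             # multicast
    if 240 <= first <= 255:
        return "E"             # reserved (includes the broadcast address)
    return "reserved"          # first octet 0

for addr in ["10.1.2.3", "172.16.0.1", "192.168.1.1", "224.0.0.5", "127.0.0.1"]:
    print(addr, ip_class(addr))
```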
Automatically assigned addresses:
There are several IP addresses that are automatically assigned when you set up
a home network. These default addresses are what allow your computer and
other network devices to communicate and broadcast information over your
network.
Most commonly assigned default addresses for home networks:
192.168.1.0: 0 is the automatically assigned network address.
192.168.1.1: 1 is the commonly used address used as the gateway.
192.168.1.2: 2 is also a commonly used address used for a gateway.
192.168.1.3 - 254: Addresses from 3 onwards are assigned to computers and
devices on the network.
192.168.1.255: 255 is automatically assigned on most networks as the
broadcast address.
If you have ever connected to your home network, you should be familiar with
the gateway address, often 192.168.1.1, which is the address you use to connect
to your home network router to change its settings.
Getting an IP address
By default the router you use will assign each of your computers their own IP
address, often using NAT to forward the data coming from those computers to
outside networks such as the Internet.
If you need to register an IP address that can be seen on the Internet, you must
register through InterNIC or use a web host that can assign you addresses.
Anyone who connects to the Internet is assigned an IP address by their Internet
Service Provider (ISP), which has registered a range of IP addresses.
For example, let's assume your ISP is given about 100 addresses,
109.145.93.150-250. In this range, the ISP owns addresses 109.145.93.150 to
109.145.93.250 and can assign any address in that range to its customers.
So, all these addresses belong to your ISP until they are assigned to a customer
computer.
In the case of a dial-up connection, you are given a new IP address each time
you dial into your ISP. With most broadband Internet service providers, you are
always connected to the Internet and your address rarely changes. It remains the
same until the service provider requires otherwise.
 IPv4: 32 bits and IPv6: 128 bits
 Class A IP address format is:
0NNNNNNN.HHHHHHHH.HHHHHHHH.HHHHHHHH
 Class B IP address format is:
10NNNNNN.NNNNNNNN.HHHHHHHH.HHHHHHHH
 Class C IP address format is:
110NNNNN.NNNNNNNN.NNNNNNNN.HHHHHHHH
 Class D is reserved for multicasting. In multicasting, data is not destined for
a particular host, which is why there is no need to extract the host address from
the IP address, and Class D does not have any subnet mask.
 Class E too is not equipped with any subnet mask.
 Number of networks = 2^(network bits); number of hosts = 2^(host bits) – 2.
 Internet Corporation for Assigned Names and Numbers (ICANN) is
responsible for assigning IP addresses.
 As to assigning addresses to devices, two general types of addresses can
be used: public and private.
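The network and host counts follow directly from the bit layouts listed above.
The helper below is an illustrative sketch; the bit counts passed in are the N and
H positions of each class format, not counting the fixed leading bits.

```python
def network_capacity(network_bits, host_bits):
    """Return (number of networks, usable hosts per network).

    Two host addresses are subtracted: all host bits 0 (the network address)
    and all host bits 1 (the broadcast address).
    """
    return 2 ** network_bits, 2 ** host_bits - 2

# N and H bit counts read from the class formats above.
print("Class A:", network_capacity(7, 24))
print("Class B:", network_capacity(14, 16))
print("Class C:", network_capacity(21, 8))
```

For Class A the formula gives 128 networks; the table lists 126 because the
first octets 0 and 127 are reserved.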
Public addresses
Public addresses are Class A, B, and C addresses that can be used to access
devices in other public networks, such as the Internet.
Private Addresses
Within the range of addresses for Class A, B, and C addresses are some
reserved addresses, commonly called private addresses.
 Anyone can use private addresses; however, this creates a problem if you
want to access the Internet.
 Remember that each device in the network must have a unique IP address.
 If two networks are using the same private addresses, you would run into
reachability issues.
 To access the Internet, your source IP addresses must have a unique
Internet public address. This can be accomplished through address
translation. Here is a list of private addresses.
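The private ranges defined in RFC 1918 are 10.0.0.0 to 10.255.255.255 (Class A),
172.16.0.0 to 172.31.255.255 (Class B) and 192.168.0.0 to 192.168.255.255
(Class C). A minimal check against these three ranges, using Python's standard
ipaddress module (the function name is made up for illustration):

```python
import ipaddress

# The three RFC 1918 private blocks, written in CIDR notation.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(address):
    """Return True if the IPv4 address lies in one of the private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)

for addr in ["192.168.1.1", "172.31.255.255", "172.32.0.1", "109.145.93.150"]:
    print(addr, is_rfc1918(addr))
```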
"All our dreams can come true if we have the courage to pursue them."
Thanks!!
Morning Shift Professional Knowledge Questions:
 TCP and UDP functions
 Standby in Oracle DB
 Security protocols in the network layer
 Spoofing
 In which phase of the SDLC is prototyping used?
 What does (*) mean in SQL?
 What is a kernel?
 Question related to primary key
 Which file is part of an Oracle database?
 Data mining tool
 Security supported by an 802.11b access point
 Data warehouse support (OLAP, OLTP, operational DB)
 Class A assignable range
 In a database of 50 rows we have to select 20 rows; what operation should
be used?
 Disaster recovery plan time and cost
 Question related to the Oracle 9i shared pool
 Which one is a middleware technology?
 What is a keylogger?
 Function of the physical layer
 Group of servers that share tasks
 Which query structure is true?
 Converting a message into a non-readable form is called – encryption
 Data and methods together inside
 What is prototyping?
 In which stage does prototyping come?
 Keyloggers come under?
 Class of IP and its range?
 Does a data warehouse support OLTP, OLAP or both?
 What is cloud computing?
 OSI layer in which routing is done
 What is done in OSI layer 1?
 One question on which of the given "SELECT" queries is correct
 Network of servers that keeps working in case one server fails
 Kernel means?
 If two tables have the same "primary key", what will happen? (scenarios were
given as options)
Evening Shift Professional Knowledge Questions:
 Full form of CIDR – Classless Inter-Domain Routing
 Which technique is used for estimating the cost of an SDLC model?
 2-3 questions on Oracle 9i
 The unauthorised way of manipulating data while entering data
 What does the digital signature provide for data?
 What is the main purpose of a NOS?
 What is the primary objective of the SDLC model?
 Which among the following is not part of SQL commands – Add and Replace
 Which among the following is the basic life cycle model – Waterfall
 The flowchart of data flowing in system design – Data Flow Diagram
 The conceptual level of a database is managed by whom?
 Which among the following provides security in the network layer, unlike the
application layer?
 What is the environment of an Oracle database called?
 What is another name for wireless communication?
 One question was on data marts
 2-3 questions were based on WebApp
 What is server farming?
 What is the full form of SOP in ITIL – SOP (Standard Operating Procedure) and
ITIL (Information Technology Infrastructure Library)
 Mapping of data, attributes and functions is in which level of a database?
 Which of the following is true about three-tier database architecture?
 One question was based on green testing
 Which of the following is true about Enterprises?
 In a point-to-point network in OSPF, the next address of the node will be
 The system analysts discuss with end users about implementing new
features in an IT software model. What is it called?
 In which OSI layer are connectors used – Physical Layer
 2-3 questions were based on Cloud computing
