A data structure is a specialised way of organising and storing data in memory,
so that operations can be performed on it.
For example:
We have a player's name, "Dhoni", and age, 35. Here "Dhoni" is of the String data
type and 35 is of the integer data type.
Now we can organise this data as a record like Player record.
We can collect and store player's records in a file or database as a data structure.
For example: "Dhoni" 35, "Rahul" 24, "Rahane" 28.
"Data Structures are structures programmed to store ordered data so that various
operations can be performed on it easily."
Data structure is all about:
Linked List
Tree
Graph
Stack, Queue etc.
Operations on Data Structures: The operations involved in data structures are as
follows.
Stack
A stack is an ordered collection of items into which new items may be inserted
and from which items may be deleted at one end, called the TOP of the stack. It
is a LIFO (Last In First Out) kind of data structure.
Operations on Stack:
Push: Adds an item onto the stack. PUSH (s, i); Adds the item i to the top
of stack.
Pop: Removes the most-recently-pushed item from the stack. POP (s);
Removes the top element and returns it as a function value.
size(): It returns the number of elements in the stack.
isEmpty(): It returns true if the stack is empty.
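The stack operations listed above can be sketched in C as follows. This is a minimal array-based illustration, not the only way to build a stack; the names Stack, STACK_CAPACITY, stack_init, stack_push, stack_pop, stack_size and stack_is_empty are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 100   /* assumed fixed capacity for this sketch */

typedef struct {
    int items[STACK_CAPACITY];
    size_t count;            /* number of stored items; the TOP is items[count - 1] */
} Stack;

void stack_init(Stack *s) { s->count = 0; }

bool stack_is_empty(const Stack *s) { return s->count == 0; }

size_t stack_size(const Stack *s) { return s->count; }

/* PUSH(s, i): adds i on top of the stack; returns false if the stack is full. */
bool stack_push(Stack *s, int i) {
    if (s->count == STACK_CAPACITY)
        return false;
    s->items[s->count++] = i;
    return true;
}

/* POP(s): removes the top item and stores it in *out; returns false if empty. */
bool stack_pop(Stack *s, int *out) {
    if (s->count == 0)
        return false;
    *out = s->items[--s->count];
    return true;
}
```

Pushing 1 and then 2 and popping twice yields 2 first and 1 second, which is the LIFO behaviour described above.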
Backtracking. This is a process when you need to access the most recent
data element in a series of elements.
Depth first Search can be implemented.
Function Calls: The compiler uses a stack to implement normal function
calls.
Simulation of Recursive calls: The compiler uses the same stack to
implement recursive function calls.
Parsing: Syntax analysis of compiler uses stack in parsing the program.
Expression Evaluation: A stack can be used to evaluate an expression and to
check its syntax.
o Infix expression: It is the one where the binary operator comes
between the operands.
e.g., A + B * C
o Postfix expression: Here, the binary operator comes after the
operands.
e.g., A B C * +
o Prefix expression: Here, the binary operator precedes the
operands.
e.g., + A * B C
Reversing a List: First push all the elements (for example, the characters of a
string) onto the stack and then pop them; they come off in reverse order.
Expression conversion: Infix to Postfix, Infix to Prefix, Postfix to Infix, and
Prefix to Infix
Implementation of Towers of Hanoi
Computation of a cycle in the graph
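As an illustration of the expression evaluation application above, the following C sketch evaluates a postfix expression with a stack. It assumes single-digit operands and the four operators + - * /; the function name eval_postfix and the limit of 64 stacked values are choices made for this example only.

```c
#include <ctype.h>

/* Evaluates a postfix expression of single-digit operands and + - * /.
   Stores the value in *result and returns 0, or returns -1 if the
   expression is malformed (or divides by zero). Spaces are ignored. */
int eval_postfix(const char *expr, int *result) {
    int stack[64];
    int top = 0;                        /* number of values on the stack */

    for (const char *p = expr; *p != '\0'; ++p) {
        if (isdigit((unsigned char)*p)) {
            if (top == 64)
                return -1;
            stack[top++] = *p - '0';    /* operand: push it */
        } else if (*p == '+' || *p == '-' || *p == '*' || *p == '/') {
            if (top < 2)
                return -1;              /* operator needs two operands */
            int right = stack[--top];
            int left = stack[--top];
            int value;
            switch (*p) {
                case '+': value = left + right; break;
                case '-': value = left - right; break;
                case '*': value = left * right; break;
                default:
                    if (right == 0)
                        return -1;
                    value = left / right;
                    break;
            }
            stack[top++] = value;       /* push the partial result */
        } else if (*p != ' ') {
            return -1;                  /* unexpected character */
        }
    }
    if (top != 1)
        return -1;                      /* leftover operands */
    *result = stack[0];
    return 0;
}
```

With A = 2, B = 3 and C = 4, the postfix form "234*+" of A + B * C evaluates to 14.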
Queue
It is a non-primitive, linear data structure in which elements are added/inserted at
one end (called the REAR) and elements are removed/deleted from the other end
(called the FRONT). A queue is logically a FIFO (First in First Out) type of list.
Operations on Queue:
Enqueue: Adds an item onto the end of the queue. ENQUEUE(Q, i); Adds
the item i onto the end of queue.
Dequeue: Removes the item from the front of the queue. DEQUEUE (Q);
Removes the first element and returns it as a function value.
Circular Queue: In a circular queue, the first element comes just after the last
element. In other words, a circular queue is one in which a new element is
inserted at the very first location of the queue if the last location of the queue
is full and the first location is empty.
Note:- A circular queue overcomes the problem of unutilised space in
linear queues implemented as arrays.
We can make following assumptions for circular queue.
Front will always be pointing to the first element (as in linear queue).
If Front = Rear, the queue will be empty.
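Each time a new element is inserted into the queue, the Rear is
incremented by 1, wrapping around to the first location after the last one.
Rear = (Rear + 1) mod N, where N is the number of locations in the queue
Each time an element is deleted from the queue, the value of Front is
incremented by one in the same way.
Front = (Front + 1) mod N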
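The circular queue assumptions above can be sketched in C as follows. This is one possible array-based layout, not a definitive implementation: it assumes that Rear indexes the slot just after the last element and that one slot is always left unused, so that Front = Rear can mean "empty" without being confused with "full". The names CircularQueue, QUEUE_SLOTS and the cq_ functions are made up for this example.

```c
#include <stdbool.h>

#define QUEUE_SLOTS 5   /* assumed array size; one slot always stays unused */

typedef struct {
    int items[QUEUE_SLOTS];
    int front;          /* index of the first element */
    int rear;           /* index just after the last element */
} CircularQueue;

void cq_init(CircularQueue *q) { q->front = 0; q->rear = 0; }

bool cq_is_empty(const CircularQueue *q) { return q->front == q->rear; }

bool cq_is_full(const CircularQueue *q) {
    return (q->rear + 1) % QUEUE_SLOTS == q->front;
}

/* ENQUEUE(Q, i): adds i at the rear; returns false if the queue is full. */
bool cq_enqueue(CircularQueue *q, int i) {
    if (cq_is_full(q))
        return false;
    q->items[q->rear] = i;
    q->rear = (q->rear + 1) % QUEUE_SLOTS;    /* wrap around at the end */
    return true;
}

/* DEQUEUE(Q): removes the front element into *out; returns false if empty. */
bool cq_dequeue(CircularQueue *q, int *out) {
    if (cq_is_empty(q))
        return false;
    *out = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_SLOTS;  /* wrap around at the end */
    return true;
}
```

Because both indices wrap around, slots freed at the start of the array by dequeuing are reused by later insertions, which is how the unutilised space of a linear array queue is avoided.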
Each element (node) of a list comprises two items: the data and a
reference to the next node.
A node is declared with two fields: one for storing information and another
for storing the address of the next node, so that one can
traverse the list.
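A node with these two fields might be declared in C as shown below. This is a sketch that assumes the information field is an int; the names node, data, next and the helper create_node are chosen for illustration.

```c
#include <stdlib.h>

struct node {
    int data;            /* information field */
    struct node *next;   /* address of the next node; NULL at the end of the list */
};

/* Allocates a node that is not yet linked to anything.
   Returns NULL if memory could not be allocated. */
struct node *create_node(int data) {
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->data = data;
        n->next = NULL;
    }
    return n;
}
```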
Advantages of Linked List:
Linked lists are dynamic data structure as they can grow and shrink during
the execution time.
Efficient memory utilisation because here memory is not pre-allocated.
Insertions and deletions can be done very easily at the desired position.
Operations on Linked Lists: The operations involved in linked lists are
as given below
Creation: Used to create a linked list.
Insertion: Used to insert a new node in linked list at the specified position.
A new node may be inserted
o At the beginning of a linked list
o At the end of a linked list
o At the specified position in a linked list
o In case of empty list, a new node is inserted as a first node.
Deletion: This operation is used to delete an item (a node). A
node may be deleted from the
o Beginning of a linked list.
o End of a linked list.
o Specified position in the list.
Traversing: It is a process of going through (accessing) all the nodes of a
linked list from one end to the other end.
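A few of the operations above (insertion at the beginning and at the end, deletion from the beginning, and traversal) can be sketched in C for a singly linked list of ints. The function names insert_front, insert_end, delete_front and list_length are assumptions for this example, and error handling is kept to a minimum.

```c
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

/* Insertion at the beginning: returns the new head
   (the old head is returned unchanged if allocation fails). */
struct node *insert_front(struct node *head, int data) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->data = data;
    n->next = head;
    return n;
}

/* Insertion at the end: in an empty list the new node becomes the first node. */
struct node *insert_end(struct node *head, int data) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->data = data;
    n->next = NULL;
    if (head == NULL)
        return n;
    struct node *cur = head;
    while (cur->next != NULL)   /* walk to the last node */
        cur = cur->next;
    cur->next = n;
    return head;
}

/* Deletion from the beginning: frees the first node and returns the new head. */
struct node *delete_front(struct node *head) {
    if (head == NULL)
        return NULL;
    struct node *rest = head->next;
    free(head);
    return rest;
}

/* Traversal: visits every node from the first to the last and counts them. */
size_t list_length(const struct node *head) {
    size_t count = 0;
    for (; head != NULL; head = head->next)
        ++count;
    return count;
}
```

Note that insert_front touches only the head pointer, while insert_end has to traverse the whole list first.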
Singly Linked List: In this type of linked list, each node has only one
address field which points to the next node. So, the main disadvantage of
this type of list is that we can’t access the predecessor of node from the
current node.
Doubly Linked List: Each node has two address fields
(or links), which help in accessing both the successor node (next node) and
the predecessor node (previous node).
Circular Linked List: It has address of first node in the link (or address)
field of last node.
Circular Doubly Linked List: It has both the previous and next pointer in
circular manner.
Tree
Binary Tree:
A binary tree is a tree like structure that is rooted and in which each node has at
most two children and each child of a node is designated as its left or right
child. In this kind of tree, the maximum degree of any node is at most 2.
Graphs
A graph is a collection of nodes called vertices, and the connections between
them, called edges.
Directed Graph: When the edges in a graph have a direction, the graph is called
a directed graph or digraph and the edges are called directed edges or arcs.
Adjacency: If (u,v) is in the edge set we say u is adjacent to v.
Path: A sequence of vertices in which each consecutive pair is connected by an edge.
Loop: A path with the same start and end node.
Connected Graph: There exists a path between every pair of nodes, no node is
disconnected.
Acyclic Graph: A graph with no cycles.
Weighted Graphs: A weighted graph is a graph, in which each edge has a
weight.
Weight of a Graph: The sum of the weights of all edges.
Connected Components: In an undirected graph, a connected component is a
subset of vertices that are all reachable from each other. The graph is connected
if it contains exactly one connected component, i.e. every vertex is reachable
from every other. Connected component is a maximal connected subgraph.
Subgraph: subset of vertices and edges forming a graph.
Tree: Connected graph without cycles.
Forest: Collection of trees
In a directed graph, a strongly connected component is a subset of mutually
reachable vertices, i.e. there is a path between every two vertices in the set.
Weakly Connected component: If a directed graph is connected when edge
directions are ignored but is not strongly connected, it is a weakly connected graph.
Graph Representations: There are many ways of representing a graph:
Adjacency List
Adjacency Matrix
Incidence list
Incidence matrix
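Of the representations listed above, the adjacency matrix is the simplest to sketch in C. The example below assumes a small fixed number of vertices numbered from 0; the names Graph, MAX_VERTICES and the graph_ functions are made up for illustration.

```c
#include <stdbool.h>

#define MAX_VERTICES 4   /* assumed small vertex count for this sketch */

typedef struct {
    bool adj[MAX_VERTICES][MAX_VERTICES];  /* adj[u][v] is true when edge (u, v) exists */
} Graph;

void graph_init(Graph *g) {
    for (int u = 0; u < MAX_VERTICES; ++u)
        for (int v = 0; v < MAX_VERTICES; ++v)
            g->adj[u][v] = false;
}

/* Adds edge (u, v). For an undirected edge the reverse entry is set as well. */
void graph_add_edge(Graph *g, int u, int v, bool directed) {
    g->adj[u][v] = true;
    if (!directed)
        g->adj[v][u] = true;
}

/* True when (u, v) is in the edge set. */
bool graph_is_adjacent(const Graph *g, int u, int v) {
    return g->adj[u][v];
}

/* Number of edges leaving u. */
int graph_out_degree(const Graph *g, int u) {
    int degree = 0;
    for (int v = 0; v < MAX_VERTICES; ++v)
        if (g->adj[u][v])
            ++degree;
    return degree;
}
```

The matrix answers "is (u, v) an edge?" in constant time but always uses space proportional to the square of the number of vertices; an adjacency list trades that lookup speed for space proportional to the number of edges.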
Character Set: The characters that can be used to form words, numbers and
expressions depend upon the computer on which the program runs. The
characters in C are grouped into the following categories: Letters, Digits, Special
characters and White spaces.
C Tokens: The smallest individual units are known as C tokens. C has six
types of tokens. They are: Keywords, Identifiers, Constants, Operators, Strings
and Special symbols.
Keywords: All keywords are basically the sequences of characters that have
one or more fixed meanings. All C keywords must be written in lower case letters.
e.g., break, char, int, continue, default, do etc.
Identifiers: A C identifier is a name used to identify a variable, function, or any
other user-defined item. An identifier starts with a letter A to Z, a to z, or an
underscore '_' followed by zero or more letters, underscores, and digits (0 to 9).
Constants: Fixed values that do not change during the execution of a C
program.
Backslash character constants are used in output functions. e.g., '\b' used for
backspace and '\n' used for new line etc.
Operator: It is the symbol that tells the computer to perform certain mathematical
or logical manipulations. e.g., Arithmetic operators (+, -, *, /) etc.
String: A string is nothing but an array of characters (printable ASCII characters).
Delimiters /Separators: These are used to separate constants, variables and
statements e.g., comma, semicolon, apostrophes, double quotes and blank
space etc.
Variable:
A variable is nothing but a name given to a storage area that our programs
can manipulate.
Each variable in C has a specific type, which determines the size and
layout of the variable's memory, the range of values that can be stored
within that memory and the set of operations that can be applied to the
variable.
Data Types
Implicit Type Conversion: There are certain cases in which data will get
automatically converted from one type to another:
o When data is being stored in a variable, if the data being stored does
not match the type of the variable, the data will be converted to match
the type of the storage variable.
o When an operation is being performed on data of two different
types. The "smaller" data type will be converted to match the "larger"
type.
The following example converts the value of x to a double
precision value before performing the division. Note that if the
3.0 were changed to a simple 3, then integer division would be
performed, losing any fractional values in the result.
average = x / 3.0;
o When data is passed to or returned from functions.
Explicit Type Conversion: Data may also be expressly converted, using
the typecast operator.
o The following example converts the value of x to a double precision
value before performing the division. ( y will then be implicitly
promoted, following the guidelines listed above. )
average = ( double ) x / y;
Note that x itself is unaffected by this conversion.
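The implicit and explicit conversions described above can be compared side by side in a short C sketch. The function names third_implicit, third_integer and ratio_explicit are invented for this example; the expressions inside them follow the ones in the text.

```c
/* Implicit conversion: x is an int and 3.0 is a double, so x is converted
   to double before the division is performed. */
double third_implicit(int x) {
    return x / 3.0;
}

/* Both operands are ints, so integer division is performed and the
   fractional part is lost before the result is converted to double. */
double third_integer(int x) {
    return x / 3;
}

/* Explicit conversion: the cast converts x to double, after which y is
   implicitly promoted to double as well. Neither x nor y is changed. */
double ratio_explicit(int x, int y) {
    return (double) x / y;
}
```

For x = 10, third_integer yields 3.0 while third_implicit yields a value a little above 3.33; ratio_explicit(7, 2) yields 3.5, whereas the plain integer expression 7 / 2 would yield 3.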
Expression:
lvalue:
o Expressions that refer to a memory location are called "lvalue"
expressions.
o An lvalue may appear as either the left-hand or right-hand side of an
assignment.
o Variables are lvalues and so they may appear on the left-hand side of
an assignment
rvalue:
o The term rvalue refers to a data value that is stored at some address
in memory.
o An rvalue is an expression that cannot have a value assigned to it
which means an rvalue may appear on the right-hand side but not on
the left-hand side of an assignment.
o Numeric literals are rvalues and so they may not be assigned and
cannot appear on the left-hand side.
The switch Statement: The switch statement tests the value of a given variable
(or expression) against a list of case values and when a match is found, a block
of statements associated with that case is executed:
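A small sketch of a switch statement in C is shown here. The function grade_remark and its grade letters are hypothetical and serve only to show the case labels, the break statements that end each case, and the default branch.

```c
/* Maps a grade letter to a remark. */
const char *grade_remark(char grade) {
    const char *remark;

    switch (grade) {
        case 'A':
            remark = "excellent";
            break;              /* leave the switch after a match */
        case 'B':               /* no break: 'B' falls through to 'C' */
        case 'C':
            remark = "pass";
            break;
        default:                /* runs when no case value matches */
            remark = "unknown grade";
            break;
    }
    return remark;
}
```

Without a break, execution continues into the next case, which is used deliberately here so that 'B' and 'C' share one block of statements.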
The Conditional Operator (? :)
The ? : operator works like an if-else statement, except that because it is an
operator we can use it within expressions. It is a ternary operator, in that it
takes three operands, and it is the only ternary operator in the C language.
flag = (x < 0) ? 0 : 1;
This conditional expression is equivalent to the following if-else
statement.
if (x < 0) flag = 0;
else flag = 1;
C Variable Types
A variable is just a named area of storage that can hold a single value. There are
two main variable types: Local variable and Global variable.
Local Variable: Scope of a local variable is confined within the block or function,
where it is defined.
Global Variable: A global variable is defined at the top of the program file and
can be seen and modified by any function that references it. Global
variables are initialized automatically by the system when we define them. If
the same name is used for a global and a local variable, then the local
variable takes precedence within its scope.
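The difference between the two variable types, including the case where a local variable hides a global one of the same name, can be sketched in C as below. The names counter, increment_global and shadow_demo are assumptions for this example.

```c
int counter;   /* global: no initializer given, so it starts at 0 */

/* Any function may read and modify the global variable. */
void increment_global(void) {
    counter++;
}

/* The local variable named counter hides the global one inside this
   function, so the global is left untouched. */
int shadow_demo(void) {
    int counter = 100;   /* local: exists only within this function */
    counter++;
    return counter;
}
```

After one call to increment_global the global counter is 1; calling shadow_demo returns 101 and leaves the global counter at 1.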
Storage Classes in C
A variable name identifies some physical location within the computer, where the
string of bits representing the variable's value, is stored.
There are basically two kinds of locations in a computer where such a value
may be kept: Memory and CPU registers.
It is the variable's storage class that determines in which of the above two types
of locations, the value should be stored.
We have four types of storage classes in C: Auto, Register, Static and Extern.
Auto Storage Class: Features of this class are given below.
Static Storage Class: Global variables have static storage duration by default.
Features of static storage class are given below.
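One commonly cited feature of the static storage class is that a static local variable keeps its value between function calls, unlike an auto variable. The following C sketch contrasts the two; the function names next_id and auto_counter are invented for illustration.

```c
/* A static local variable is initialised once and keeps its value
   between calls. */
int next_id(void) {
    static int id = 0;
    return ++id;
}

/* An auto (ordinary local) variable is created afresh on every call. */
int auto_counter(void) {
    int n = 0;
    return ++n;
}
```

Three successive calls to next_id return 1, 2 and 3, whereas auto_counter returns 1 every time.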
Pointers
A pointer is a variable that stores a memory address. Like all other variables, it also
has a name, has to be declared and occupies some space in memory. It is
called a pointer because it points to a particular location.
NULL Pointers
Uninitialized pointers start out with random unknown values, just like any
other variable type.
Accidentally using a pointer containing a random address is one of the
most common errors encountered when using pointers, and potentially one
of the hardest to diagnose, since the errors encountered are generally not
repeatable.
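A common way to avoid the problem described above is to initialise every pointer that has no valid target yet to NULL and to test for NULL before dereferencing. The helper value_or below is a hypothetical function written for this sketch.

```c
#include <stddef.h>

/* Returns the int that p points to, or fallback when p is NULL.
   A NULL pointer can be tested reliably; a random address cannot. */
int value_or(const int *p, int fallback) {
    if (p == NULL)
        return fallback;
    return *p;
}
```

For example, after `int *p = NULL;` the call value_or(p, -1) returns -1 instead of reading from an arbitrary address, and after `int x = 5; p = &x;` it returns 5.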
Combinations of * and ++
Pointer Operations:
Array
a = a+0 ≡ &a[0]
a+1 ≡ &a[1]
a+i ≡ &a[i]
&(*(a+i)) ≡ &a[i] ≡ a+i
*(&a[i]) ≡ *(a+i) ≡ a[i]
Address of an Array Element: address of a[i] = base address of a + i * sizeof(element)
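The identities above can be checked directly in C. In this sketch the array is assumed to hold ints, and the function names identities_hold and byte_offset are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Checks the pointer/array identities for element i of the array a. */
bool identities_hold(int *a, size_t i) {
    return (a + i == &a[i])             /* a+i is the address of a[i] */
        && (*(a + i) == a[i])           /* *(a+i) is the value a[i]   */
        && (&(*(a + i)) == a + i);      /* & and * cancel each other  */
}

/* Distance in bytes between the start of the array and element i. */
size_t byte_offset(const int *a, size_t i) {
    return (size_t)((const char *)&a[i] - (const char *)a);
}
```

For an int array a, byte_offset(a, 3) equals 3 * sizeof(int): adding i to a pointer advances it by i elements, not by i bytes.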
Strings
In C language, strings are stored in an array of character (char) type along with
the null terminating character '\0' at the end.
Example: char name[ ] = { 'G', 'A', 'T', 'E', 'T', 'O', 'P', '\0' };
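The role of the terminating '\0' can be seen in a hand-written length function. This sketch mirrors what the standard strlen does; the name my_strlen is chosen only to avoid clashing with the library function.

```c
#include <stddef.h>

/* Counts the characters that come before the terminating '\0'. */
size_t my_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}
```

For the name array in the example above, my_strlen returns 7 while sizeof name is 8, because the array also stores the null character. Writing `char name[] = "GATETOP";` produces the same eight bytes.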
Example: The lock on the door is the 10%. Remembering to lock it,
checking to see if the door is closed, ensuring others do not prop the door open,
keeping control of the keys, etc. is the 90%.
Threats classified into one of the categories below:
Good computing practices and tips that apply to most people who use a
computer.
Use passwords that can't be easily guessed and protect your passwords.
Minimize storage of sensitive information.
Beware of scams.
Protect information when using the Internet and email.
Make sure your computer is protected with anti-virus and all necessary
security "patches" and updates.
Secure laptop computers and mobile devices at all times: Lock them up or
carry them with you.
Shut down, lock, log off, or put your computer and other devices to
sleep before leaving them unattended and make sure they require a
secure password to start up or wake-up.
Don't install or download unknown or unsolicited programs/apps.
Secure your area before leaving it unattended.
Make backup copies of files or data you are not willing to lose.
Computer Viruses:
A virus is a parasitic program that infects another legitimate program, which is
sometimes called the host. To infect the host program, the virus modifies the
host so that it contains a copy of the virus.
Boot sector viruses: A boot sector virus infects the boot record of a hard
disk. The virus allows the actual boot sector data to be read as though a
normal start-up were occurring.
Cluster viruses: If any program is run from the infected disk, the program
causes the virus also to run. This technique creates the illusion that the
virus has infected every program on the disk.
Worms: A worm is a program whose purpose is to duplicate itself.
Bombs: This type of virus hides on the user’s disk and waits for a specific
event to occur before running.
Trojan Horses: A Trojan horse is a malicious program that appears to be
friendly. Because Trojan horses do not make duplicates of themselves on
the victim’s disk, they are not technically viruses.
Stealth Viruses: These viruses take up residence in the computer’s
memory, making them hard to detect.
Macro Viruses: A macro virus is designed to infect a specific type of
document file, such as Microsoft Word or Microsoft Excel files. These types
of documents can include macros, which are small programs that execute
commands.
Speed
Accuracy
Storage and Retrieval
Repeated Processing Capabilities
Reliability
Flexibility
Low cost
Input - Input data is prepared in some convenient form for processing. The
form will depend on the processing machine. For example, when electronic
computers are used, the input data could be recorded on any one of
several types of input medium, such as magnetic disks, tapes and so on.
Processing - In this step input data is changed to produce data in a more
useful form. For example, paychecks may be calculated from the time
cards, or a summary of sales for the month may be calculated from the
sales orders.
Output - The results of the preceding processing step are collected. The
particular form of the output data depends on the use of the data. For
example, output data may be paychecks for employees.
Software
Software represents the set of programs that govern the operation of a computer
system and make the hardware run.
This type of software is tailor-made software according to a user’s requirements.
Analog computers
Digital Computers
These computers take input in the form of digits and letters and
convert it into binary format.
Digital computers are high speed, programmable electronic devices.
Signals have two levels (0 for low/off, 1 for high/on).
Accuracy unlimited.
Examples: Computers used for business and education are
examples of digital computers.
Hybrid Computer
Super Computer
Mainframes
Mini Computer
Notebook Computers
Notebook computers typically weigh less than 6 pounds and are small
enough to fit easily in a briefcase.
Principal difference between a notebook computer and a personal
computer is the display screen.
Many notebook display screens are limited to VGA resolution.
Programming Languages
There are two major types of programming languages. These are Low Level
Languages and High Level Languages.
Low Level languages are further divided into Machine language and Assembly
language.
Low Level Languages: The term low level means closeness to the way in which
the machine has been built. Low level languages are machine oriented and
require extensive knowledge of computer hardware and its configuration.
Machine Language: Machine language is the only language that is directly
understood by the computer. It does not need any translator program. We also
call it machine code and it is written as strings of 1's (one) and 0's (zero). When
this sequence of codes is fed to the computer, it recognizes the codes and
converts them into the electrical signals needed to run it.
For example, a program instruction may look like this: 1011000111101
It is not an easy language to learn because it is difficult to understand. It
is efficient for the computer but very inefficient for programmers. It is considered
to be the first generation language.
Advantage:
Disadvantages
Assembly Language
It is the first step to improve the programming structure. Computers can handle
letters as well as numbers, so combinations of letters can be used to substitute
for the numeric machine codes.
The set of symbols and letters forms the Assembly Language and a translator
program is required to translate the Assembly Language to machine language.
This translator program is called 'Assembler'. It is considered to be a
second-generation language.
Advantages:
Disadvantages:
Assembly language is machine dependent.
A program written for one computer might not run in other computers with
different hardware configuration.
Higher level languages are simple languages that use English and
mathematical symbols like +, -, %, / for their program construction.
You should know that any higher level language has to be converted to
machine language for the computer to understand.
Higher level languages are problem-oriented languages because the
instructions are suitable for solving a particular problem.
Types of Database
Levels of Data Abstraction: A 3-tier architecture separates its tiers from each
other based on the complexity of the users and how they use the data present in
the database. It is the most widely used architecture to design a DBMS.
Physical Level: It is the lowest level of abstraction and describes in detail how
the data are actually stored, including the complex low-level data structures.
Logical Level: It is the next higher level of abstraction and describes what
data are stored and what relationships exist among those data. At the
logical level, each such record is described by a type definition and the
interrelationship of these record types is defined as well. Database
administrators usually work at this level of abstraction.
View Level: It is the highest level of abstraction and describes only part of
the entire database and hides the details of the logical level.
Relational Algebra:
Relational model is completely based on relational algebra. It consists of a
collection of operators that operate on relations. Its main objective is data
retrieval. It is more operational and very much useful to represent execution
plans, while relational calculus is non-operational and declarative. Here,
declarative means users define queries in terms of what they want, not in terms of
how to compute it.
Intersection (∩): It forms a relation of rows/ tuples which are present in both the
relations R and S. Ensure that both relations are compatible for union and
intersection operations.
Set Difference (-): It allows us to find tuples that are in one relation but are not in
another. The expression R – S produces a relation containing those tuples in R
but not in S.
Cross Product/Cartesian Product (×): Assume that we have n1 tuples in R and
n2 tuples in S. Then, there are n1 * n2 ways of choosing a pair of tuples, one tuple
from each relation. So, there will be (n1 * n2) tuples in result relation P if P = R × S.
Schema:
A schema is also known as database schema. It is a logical design of the
database and a database instance is a snapshot of the data in the database at a
given instant of time. A relational schema consists of a list of attributes and their
corresponding domains.
Types of Schemas: It can be classified into three parts, according to the levels
of abstraction
Data model :
A data model is a plan for building a database. Data models define how data
items are connected to each other and how they are processed and stored inside the
system.
Two widely used data models are:
Entity :
An entity may be an object with a physical existence or it may be an object with a
conceptual existence. Each entity has attributes. An entity is a thing (animate or
inanimate) of independent physical or conceptual existence that is distinguishable
from other things. In the University database context, an individual student, a
faculty member, a class room and a course are entities.
Attributes
Each entity is described by a set of attributes/properties.
Types of Attributes
Keys
A super key of an entity set is a set of one or more attributes whose values
uniquely determine each entity.
A candidate key of an entity set is a minimal super key
Customer-id is candidate key of customer
account-number is candidate key of account
Although several candidate keys may exist, one of the candidate keys is
selected to be the primary key.
NOTE: this means a pair of entity sets can have at most one relationship in
a particular relationship set.
If we wish to track all access-dates to each account by each customer, we
cannot assume a relationship for each access. We can use a multivalued
attribute though.
We must consider the mapping cardinality of the relationship set when deciding
what the candidate keys are.
We need to consider the semantics of the relationship set in selecting the primary
key when there is more than one candidate key.
ER Modeling:
Entity-Relationship model (ER model) in software engineering is an abstract way
to describe a database. Describing a database usually starts with a relational
database, which stores data in tables.
Notations/Shapes in ER Modeling
The overall logical structure of a database can be expressed graphically by an E-
R diagram. The diagram consists of the following major components.
Decomposition
Splitting a relation R into two or more sub-relations, such as R1 and R2. A fully
normalized relation must have a primary key and a set of attributes.
Decomposition should satisfy: (i) Lossless join, and (ii) Dependency
preservation
Case of semi-trivial FD
Sid → Sid Sname (semi-trivial)
Because on decomposition, we will get
Sid → Sid (trivial FD) and
Sid → Sname (non-trivial FD)
Properties of Functional Dependence (FD)
Normal Forms/Normalization:
In relational database design, the normalization is the process for organizing data
to minimize redundancy. Normalization usually involves dividing a database into
two or more tables and defining relationship between the tables. The normal
forms define the status of the relation about the individuated attributes. There are
five types of normal forms
First Normal Form (1NF): A relation should not contain any multivalued attributes;
every attribute should be atomic. The main disadvantage of 1NF is high
redundancy.
Second Normal Form (2NF): Relation R is in 2NF if and only if R is in
1NF and R does not contain any partial dependency.
Partial Dependency: Let R be the relational schema having X, Y, A, which are
non-empty set of attributes, where X = Any candidate key of the relation, Y =
Proper subset of any candidate key, and A = Non-prime attribute (i.e., A doesn't
belong to any candidate key)
In the above example, X → A already exists and if Y → A will exist, then it will
become a partial dependency, if and only if
If any of the above two conditions fail, then Y → A will also become fully
functional dependency.
Full Functional Dependency: A functional dependency P → Q is said to be a full
functional dependency if removal of any attribute S from P means that the
dependency doesn't hold any more.
(Student_Name, College_Name → College_Address)
Suppose the above functional dependency is a full functional dependency; then
we must ensure that there are no FDs as below.
(Student_Name → College_Address)
or (College_Name → College_Address)
Third Normal Form (3NF): Let R be a relational schema. R is in 3NF if, for every
non-trivial FD X → Y over R, X is a candidate key or super key, or Y is
a prime attribute.
At least one of the above conditions should be true; both may be
true.
R should not contain any transitive dependency.
For a relation schema R to be in 3NF, it is necessary to be in 2NF.
DCL (Data Control Language) is used to control users' access to the database
objects. Some of the DCL commands are:
DML Commands
SELECT A1, A2, A3, ..., An     (what to return)
FROM R1, R2, R3, ..., Rm       (relations or tables)
WHERE condition                (filter condition, i.e., the basis on which we
                               restrict the outcome/result)
If we want to write the above SQL script in the form of relational calculus, we use
the following syntax
Comparison operators which we can use in the filter condition are =, >, <, >=, <=
and <>, where '<>' means not equal to.
INSERT Statement: Used to add row (s) to the tables in a database
INSERT INTO Employee (F_Name, L_Name) VALUES ('Atal', 'Bihari')
UPDATE Statement: It is used to modify/update or change existing data in single
row, group of rows or all the rows in a table.
Example:
-- Updates some rows in a table.
UPDATE Employee
SET City = 'LUCKNOW'
WHERE Emp_Id BETWEEN 9 AND 15;
-- Updates the City column for all the rows.
UPDATE Employee SET City = 'LUCKNOW';
DELETE Statement: This is used to delete rows from a table,
Example:
-- The following query deletes the row whose Emp_Id is 7
DELETE FROM Employee
WHERE Emp_Id = 7;
-- The following query deletes all the rows from the Employee table
DELETE FROM Employee;
ORDER BY Clause: This clause is used to sort the result of a query in a specific
order (ascending or descending). By default, the sorting order is ascending.
SELECT Emp_Id, Emp_Name, City FROM Employee
WHERE City = 'LUCKNOW'
ORDER BY Emp_Id DESC;
GROUP BY Clause: It is used to divide the result set into groups. Grouping can
be done by a column name or by the results of computed columns when using
numeric data types.
The HAVING clause can be used to set conditions for the GROUP BY
clause.
The HAVING clause is similar to the WHERE clause, but HAVING puts conditions
on groups.
The WHERE clause places conditions on rows.
The WHERE clause can't include aggregate functions, while HAVING conditions
can do so.
Example:
SELECT Emp_Id, AVG (Salary)
FROM Employee
GROUP BY Emp_Id
HAVING AVG (Salary) > 25000;
Aggregate Functions
Joins: Joins are needed to retrieve data from related rows of two tables on the
basis of some condition that both tables satisfy. A mandatory condition for a
join is that at least one set of column(s) should take values from the same
domain in each table.
Inner Join: Inner join is the most common join operation used in applications and
can be regarded as the default join-type. Inner join creates a new result table by
combining column values of two tables (A and B) based upon the join-predicate.
These may be further divided into three parts.
Outer Join: An outer join does not require each record in the two joined tables to
have a matching record. The joined table retains each record even if no other
matching record exists.
It also considers the rows from table(s) even if they don't satisfy the joining
condition.
(i) Right outer join (ii) Left outer join (iii) Full outer join
Left Outer Join: The result of a left outer join for table A and B always contains
all records of the left table (A), even if the join condition does not find any
matching record in the right table (B).
Full Outer Join: A full outer join combines the effect of applying both left and
right outer joins. Where records in the FULL OUTER JOIN tables do not match, the
result set will have NULL values for every column of the table that lacks a
matching row. For those records that do match, a single row will be produced in
the result set.
Cross Join (Cartesian product): Cross join returns the Cartesian product of rows
from the tables in the join. It will produce rows which combine each row from the
first table with each row from the second table.
SELECT * FROM T1, T2;
Number of rows in result set = (Number of rows in table 1 × Number of rows in
table 2)
Result set of T1 and T2 (Using previous tables T1 and T2)
Storage Structure:
The storage structure can be divided into two categories:
Volatile storage: As the name suggests, volatile storage cannot survive
system crashes. Volatile storage devices are placed very close to the CPU;
normally they are embedded onto the chipset itself. Main memory
and cache memory are examples of volatile storage. They are fast but can store
only a small amount of information.
Non-volatile storage: These memories are made to survive system crashes.
They are huge in data storage capacity, but slower in accessibility. Examples
may include hard-disks, magnetic tapes, flash memory, and non-volatile (battery
backed up) RAM.
File Organisation:
The database is stored as a collection of files. Each file is a sequence of records.
A record is a sequence of fields. Data is usually stored in the form of records.
Records usually describe entities and their attributes. e.g., an employee record
represents an employee entity and each field value in the record specifies some
attributes of that employee, such as Name, Birth-date, Salary or Supervisor.
Allocating File Blocks on Disk: There are several standard techniques for
allocating the blocks of a file on disk
Index files are typically much smaller than the original file because only the
values for search key and pointer are stored. The most prevalent types of indexes
are based on ordered files (single-level indexes) and tree data structures
(multilevel indexes).
Types of Single Level Ordered Indexes: In an ordered index file, index entries
are stored sorted by the search key value. There are several types of ordered
indexes
Primary Index: A primary index is an ordered file whose records are of fixed
length with two fields. The first field is of the same data type as the ordering key
field, called the primary key, of the data file, and the second field is a pointer to a
disk block (a block address).
There is one index entry in the index file for each block in the data file.
Indexes can also be characterised as dense or sparse.
Dense index: A dense index has an index entry for every search key value
in the data file.
Sparse index: A sparse (non-dense) index, on the other hand, has index
entries for only some of the search values.
A primary index is a non-dense (sparse) index, since it includes an entry for
each disk block of the data file rather than for every search value.
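A sparse primary index lookup can be sketched as follows. The data, the names blocks, index and lookup, and the choice of each block's first key as its index entry are all assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical data file: each block holds records sorted by primary key.
blocks = [
    [(3, "Ann"), (7, "Bob"), (9, "Cy")],
    [(12, "Dee"), (15, "Eli"), (18, "Fay")],
    [(21, "Gus"), (25, "Hal"), (30, "Ida")],
]

# One index entry per block: (key of the block's first record, block number).
# Using the first key as the entry is an assumption of this sketch.
index = [(block[0][0], block_no) for block_no, block in enumerate(blocks)]
anchor_keys = [key for key, _ in index]

def lookup(key):
    """Return the record with `key`, or None if it is absent."""
    pos = bisect_right(anchor_keys, key) - 1  # last index entry with anchor <= key
    if pos < 0:
        return None  # key is smaller than every anchor
    block = blocks[index[pos][1]]
    for record_key, value in block:  # scan inside the single candidate block
        if record_key == key:
            return (record_key, value)
    return None
```

The index holds three entries for nine records, which is what makes it sparse: the binary search narrows the lookup to one block, and only that block is scanned.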
Clustering Index: If file records are physically ordered on a non-key field, which
does not have a distinct value for each record, that field is called the clustering
field. We can create a different type of index, called a clustering index, to speed
up retrieval of records that have the same value for the clustering field.
A clustering index is also an ordered file with two fields. The first field is of
the same type as the clustering field of the data file.
The second field in the clustering index is a block pointer.
A clustering index is another example of a non-dense index.
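As a rough sketch of the idea, the index below keeps one entry per distinct clustering value, pointing to the first block that contains it. The department-number data and the names clustering_index and records_with are hypothetical, and a Python dict stands in for the ordered index file.

```python
# Hypothetical file ordered on a non-key field (a department number).
blocks = [
    [(1, "Ann"), (1, "Bob"), (2, "Cy")],
    [(2, "Dee"), (2, "Eli"), (3, "Fay")],
    [(3, "Gus"), (5, "Hal"), (5, "Ida")],
]

# One entry per distinct clustering value -> first block holding that value.
clustering_index = {}
for block_no, block in enumerate(blocks):
    for dept, _ in block:
        clustering_index.setdefault(dept, block_no)

def records_with(dept):
    """Return all records whose clustering field equals `dept`."""
    found = []
    start = clustering_index.get(dept)
    if start is None:
        return found
    for block in blocks[start:]:
        for record in block:
            if record[0] == dept:
                found.append(record)
            elif record[0] > dept:
                return found  # file is ordered, so the run of matches has ended
    return found
```

Because the file is ordered on the clustering field, all matching records are adjacent, so the scan can stop at the first larger value.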
When the data volume is large and does not fit in memory, the B-tree extends
the binary search tree to a disk-based environment.
In fact, since the B-tree is always balanced (all leaf nodes appear at the
same level), it is an extension of the balanced binary search tree.
The problem the B-tree aims to solve is this: given a large collection of
objects, each having a key and a value, design a disk-based index structure
that efficiently supports queries and updates.
A B-tree of order p, when used as an access structure on a key field to
search for records in a data file, can be defined as follows:
1. Each internal node in the B-tree is of the form
<P1, <K1, Pr1>, P2, <K2, Pr2>, …, <Kq–1, Prq–1>, Pq>
where q ≤ p.
Each Pi is a tree pointer to another node in the B-tree.
Each Prj is a data pointer to the record whose search key field value is
equal to Kj.
2. Within each node, K1 < K2 < … < Kq–1.
3. Each node has at most p tree pointers.
4. Each node, except the root and leaf nodes, has at least ⌈p/2⌉ tree
pointers.
5. A node with q tree pointers, q ≤ p, has q – 1 search key field values
(and hence has q – 1 data pointers).
e.g., A B-tree of order p = 3. The values were inserted in the order 8,
5, 1, 7, 3, 12, 9, 6.
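Searching such a tree can be sketched as below. The tree is built by hand in the shape those insertions would give under the usual split-on-overflow rule with the median key moving up (an assumption, since the original figure is not reproduced here). BTreeNode, search, leaf and the "rec..." strings standing in for data pointers are illustrative names only.

```python
class BTreeNode:
    def __init__(self, keys, data_ptrs, children=None):
        self.keys = keys                  # K1 < K2 < ... < Kq-1
        self.data_ptrs = data_ptrs        # one data pointer per key
        self.children = children or []    # q tree pointers; empty for a leaf

def search(node, key):
    """Return the data pointer stored with `key`, or None if it is absent."""
    while node is not None:
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == key:
            return node.data_ptrs[i]      # a key can be found at any level
        if not node.children:
            return None                   # reached a leaf without a match
        node = node.children[i]           # descend between the neighbouring keys
    return None

def leaf(*keys):
    # Hypothetical data pointers: a string naming the record.
    return BTreeNode(list(keys), [f"rec{k}" for k in keys])

# Order p = 3: at most 3 tree pointers and 2 keys per node.
root = BTreeNode([5, 8], ["rec5", "rec8"],
                 [leaf(1, 3), leaf(6, 7), leaf(9, 12)])
```

For instance, search(root, 7) descends from the root into the middle child, while search(root, 5) stops at the root because a B-tree stores data pointers in internal nodes as well.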
B+ Trees
In a B-tree, every value of the search field appears once at some level in
the tree, along with a data pointer. In a B+-tree, data pointers are stored
only at the leaf nodes of the tree. Hence, the structure of the leaf nodes
differs from the structure of internal nodes.
The pointers in the internal nodes are tree pointers to blocks that are tree
nodes whereas the pointers in leaf nodes are data pointers.
B+ Tree's Structure: The structure of the B+-tree of order p is as follows:
1. Each internal node is of the form <P1, K1, P2, K2, …, Pq–1, Kq–1, Pq>
where q ≤ p and each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < K3 < … < Kq–1.
3. Each internal node has at most p tree pointers and, except the root, has
at least ⌈p/2⌉ tree pointers.
4. The root node has at least two tree pointers, if it is an internal node.
5. Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, …, <Kq–1, Prq–1>, Pnext>
where q ≤ p, each Pri is a data pointer, and Pnext points to the next leaf
node of the B+-tree.
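The linked leaves make range queries straightforward, as the following sketch suggests. The class names, the sample keys and the rule that a key equal to Ki is found under Pi are assumptions of this example, and the tree is built by hand rather than by insertion.

```python
class Leaf:
    def __init__(self, entries):
        self.entries = entries    # sorted (key, data pointer) pairs
        self.next = None          # Pnext: the next leaf to the right

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # K1 < K2 < ... < Kq-1
        self.children = children  # tree pointers P1 ... Pq

def find_leaf(node, key):
    """Follow tree pointers down to the leaf that could hold `key`."""
    while isinstance(node, Internal):
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        node = node.children[i]   # assumed rule: keys <= Ki live under Pi
    return node

def range_scan(root, low, high):
    """Return every (key, data pointer) pair with low <= key <= high."""
    out = []
    leaf = find_leaf(root, low)
    while leaf is not None:
        for key, ptr in leaf.entries:
            if key > high:
                return out
            if key >= low:
                out.append((key, ptr))
        leaf = leaf.next          # walk the leaf chain, not the tree
    return out

# Hand-built example tree with hypothetical data pointers "r1", "r3", ...
l1 = Leaf([(1, "r1"), (3, "r3"), (5, "r5")])
l2 = Leaf([(6, "r6"), (7, "r7"), (8, "r8")])
l3 = Leaf([(9, "r9"), (12, "r12")])
l1.next, l2.next = l2, l3
root = Internal([5, 8], [l1, l2, l3])
```

Only one root-to-leaf descent is needed; the rest of the range is collected by following Pnext, which is the main practical reason data pointers are kept only in the leaves.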
192.168.1.1 is commonly used as the gateway address.
192.168.1.2 is also commonly used as a gateway address.
Public addresses are Class A, B, and C addresses that can be used to access
devices in other public networks, such as the Internet.
Private Addresses
Within the Class A, B, and C address ranges are some reserved addresses,
commonly called private addresses.
Anyone can use private addresses; however, this creates a problem if you
want to access the Internet.
Remember that each device in the network must have a unique IP address.
If two networks are using the same private addresses, you would run into
reachability issues.
To access the Internet, your source IP address must be a unique public
Internet address. This can be accomplished through address
translation. Here is a list of private addresses.
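As a sketch, the check below tests an address against the three private IPv4 blocks defined in RFC 1918 (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16). It uses Python's standard ipaddress module; the names RFC1918_BLOCKS and is_rfc1918 are illustrative.

```python
import ipaddress

# The three private IPv4 blocks reserved by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # within the Class A range
    ipaddress.ip_network("172.16.0.0/12"),   # within the Class B range
    ipaddress.ip_network("192.168.0.0/16"),  # within the Class C range
]

def is_rfc1918(address):
    """Return True if `address` falls inside one of the private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in RFC1918_BLOCKS)
```

An address such as 192.168.1.1 passes this check and would therefore need address translation before it could reach the Internet, whereas 8.8.8.8 does not.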
Morning Shift Profession Knowledge Question: