in association with
and
PERAGO Information Systems PLC
Addis Ababa, Ethiopia
Document Distribution
Date of Distribution: April 30, 2015
Version: 1.0
Recipients: E-Government Directorate, Ministry of Communication and IT
Hard Copy: YES
Softcopy: YES
National Data Set - Design Document
List of Abbreviations
BP Business Process
BS Business Service
CSA Central Statistical Agency of Ethiopia
CSC Citizen Services Centre
e-Government Electronic Government
e-GIF Electronic Government Interoperability Framework
e-Services Electronic Services
EthERNET Ethiopian Educational and Research Network
ENDS Ethiopia National Data Set
FMIS Financial Management Information System
HRMS Human Resources Management System
G2B Government to Business
G2C Government to Citizen
G2G Government to Government
GTP Growth and Transformation Plan
IT Information Technology
ICT Information and Communication Technology
ICTAD Information and Communication Technology Assisted Development
ID Identification
ISO International Organization for Standardization
TOGAF The Open Group Architecture Framework
MCIT Ministry of Communication and Information Technology
MDG Millennium Development Goals
M-Services Public Services delivered on mobile devices
PKI Public Key Infrastructure
QMS Quality Management System
VOIP Voice over Internet Protocol
VSAT Very Small Aperture Terminal
LIST OF CONTENTS
No Description
Definitions
1.0 Introduction
1.1 Applicability & Use
2.0 Business Information Requirements
2.1 Business Principles and Policies
3.0 Information Systems Architecture
3.1 Application & Data Principles & Policies
3.2 Application Services & Data Matrix
3.2.1 Index of Data Commonality
4.0 Identifying Common Data Elements
5.0 Ethiopian National Dataset Design & Technology Specification
5.1 Ethiopian National Dataset Design Consideration
5.2 Ethiopian National Dataset Design Options
5.3 ENDS Data Hub High Level Architecture
5.3.1 Summary of High Level Architecture
5.4 ETL (Extract – Transform – Load)
5.4.1 Extraction
5.4.2 Extraction Mode (Offline / Online)
5.5 Decoupled Source File Integration
5.6 FTP Mapping
5.7 Recommended Extraction Methodology for ENDS
5.7.1 Flat File Consistency Check
5.7.2 Extract to Staging Server for Data Cleansing
5.8 Data Transformation & Loading to Data Mart
5.8.1 Error Logging & Handling Mechanisms
5.8.2 Business Rule Violations
5.8.3 Data Rule Violations (Data Error)
5.8.4 Key Lookup Scenario
5.8.5 Data Partitioning
6.0 Metadata
6.1 Dataset Metadata
6.2 Data Element Metadata Definition
6.3 Data Source Metadata Definition
6.4 Organization Reference Metadata Table
6.5 Lookup / Reference Tables
7.0 Central Data Mart Architecture
7.1 Star Data Warehouse Schema
7.1.1 Key Advantages of the Star Schema for ENDS
7.1.2 Fact Tables in Star Schema Datamarts
7.1.3 Dimension Tables in Star Schema Datamarts
7.2 Star Schema Key Structure for ENDS Datamart
7.3 Types of Facts in Data Warehouse
7.3.1 Additive
7.3.2 Semi-Additive
Definitions
The definitions of the various technical terms used in this document are given hereunder
and are expected to be used consistently in the planning, development, deployment and
management of the Data Architecture and the ENDS of the Government of Ethiopia.
Actor: A person, organizational entity or a system within the Enterprise that has a
role to play in the development or management of the Enterprise Architecture.
Architecture Building Block (ABB): Any component part of the EA system with
defined structure and functionality which, together with other ABBs, forms the basis of
a solution.
Solution Building Block (SBB): A component part of a solution that, together with
other SBBs, can provide a business solution for the Enterprise.
Repository: A system that manages all of the data of an enterprise, including data
and process models and other enterprise information. Hence, the data in a repository
is much more extensive than that in a data dictionary, which generally defines only
the data making up a database.
Data Mart: A Data mart is the access layer of the data warehouse environment that
is used to get data out to the users. The Data mart is a subset of the data warehouse
that is usually oriented to a specific business line or team. Data marts are small
slices of the data warehouse. Whereas data warehouses have an enterprise-wide
depth, the information in Data marts pertains to a single department or domain.
ETL: The process of extracting data from source systems and bringing it into a
central aggregated data warehouse to be used for downstream reporting and
analytical purposes is commonly called ETL, which stands for Extraction-
Transformation-Loading.
Metadata: Metadata comprises the control descriptors for data and processes underlying any
Data Warehouse System including ancillary components to the system like ETL,
Reporting, Data marts, Datasets etc. This includes Reports, Cubes, Tables
(Records, Segments, Entities, etc.), Columns (Fields, Attributes, Data Elements,
etc.), Keys and Indices.
FTP: The File Transfer Protocol (FTP) is a standard network protocol used to
transfer computer files from one host to another host over a TCP-based network,
such as the Internet. FTP is built on a client-server architecture and uses separate
control and data connections between the client and the server. FTP users may
authenticate themselves using a clear-text sign-in protocol, normally in the form of a
username and password, but can connect anonymously if the server is configured to
allow it. For secure transmission that protects the username and password and
encrypts the content, FTP is often secured with SSL/TLS (FTPS) or replaced with the
SSH File Transfer Protocol (SFTP).
Business Rule: A Business rule is a rule that defines or constrains some aspect of
business and always resolves to either true or false. Business rules are intended to
assert business structure or to control or influence the behaviour of the business.
Business rules describe the operations, definitions and constraints that apply to an
organization. Business rules can apply to people, processes, corporate behaviour
and computing systems in an organization, and are put in place to help the
organization achieve its goals.
Data Cleansing: Data Cleansing refers to a process by which data is treated for
aberrations and inconsistencies, for example the presence of numbers in text-only fields,
the absence of data in mandatory fields and the presence of unrelated data in any field.
Master Data: A single source of basic business data used across multiple
systems, applications, and/or processes.
Reference Data: Data that defines the set of permissible values to be used by
other data fields. Reference data gains in value when it is widely re-used and widely
referenced. Typically, its definition changes little (apart from occasional revisions).
Reference data is often defined by standards organizations (such as country codes as
defined in ISO 3166-1).
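The definitions of Data Cleansing, Reference Data and Business Rule given above can be illustrated with a minimal Python sketch. It is not part of the ENDS specification: the field names, the set of mandatory and text-only fields, the small subset of ISO 3166-1 country codes and the sample business rule are all hypothetical and chosen only to show the kind of checks involved.

```python
import re

# Hypothetical reference data: a small subset of ISO 3166-1 alpha-2 codes.
COUNTRY_CODES = {"ET", "KE", "DJ", "SD"}

# Hypothetical field classifications, for illustration only.
MANDATORY_FIELDS = ("full_name", "country_code")
TEXT_ONLY_FIELDS = ("full_name",)


def cleansing_errors(record):
    """Return a list of data-rule violations found in one record."""
    errors = []
    # Absence of data in mandatory fields.
    for field in MANDATORY_FIELDS:
        if not str(record.get(field) or "").strip():
            errors.append(f"missing mandatory field: {field}")
    # Presence of numbers in text-only fields.
    for field in TEXT_ONLY_FIELDS:
        if re.search(r"\d", str(record.get(field) or "")):
            errors.append(f"digits in text-only field: {field}")
    # Value must be one of the permissible values in the reference data.
    code = record.get("country_code")
    if code and code not in COUNTRY_CODES:
        errors.append(f"value not in reference data: country_code={code}")
    return errors


def rule_adult_applicant(record):
    """Hypothetical business rule; it always resolves to True or False."""
    return record.get("age", 0) >= 18


if __name__ == "__main__":
    sample = {"full_name": "Abebe 123", "country_code": "XX", "age": 30}
    for problem in cleansing_errors(sample):
        print(problem)
    print("adult applicant:", rule_adult_applicant(sample))
```

Keeping data-rule checks (which return a list of errors) separate from business rules (which return a single true/false result) mirrors the distinction the definitions draw between data errors and business rule violations.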
1.0 Introduction
The primary objective of the Ethiopian National Datasets Master Plan project
is to prepare a comprehensive master plan for development of the National Common
Data set for all present and potential Ministries and agencies of the Government in
order to assist in better accessibility, openness and integration of e-services and
applications across Ministries, Departments and Agencies and reduce the
dependency of applications and channels. In addition the master plan is expected to
support all business and database security requirements of the Ministries and inter-
ministerial applications and enhance interoperability within the Government of
Ethiopia enterprise.
As part of the Ethiopian National Datasets Master Plan project, this
document contains the design parameters, technical specifications, and management
and governance systems for the proposed Ethiopian National Datasets (ENDS).
Conceptually, the ENDS inter alia consists of a national Data Hub, system for data
inflow and outflow, databases, data marts and data warehouse as well as technology
for data management, transformation and presentation to meet the present and
potential business needs of the Government. The ENDS Design presented here is
comprehensive to cover all its component parts. Detailed analysis of the information
collected and presented in the Consultant’s previous report - Situational Analysis
Report (Baseline study and survey of the existing situation in the Government of
Ethiopia) was undertaken to analyse the various options and to optimize the overall
design of the ENDS and its associated systems.
The document is expected to be used as a blueprint for the development and
deployment of the ENDS and as such has to be treated as a reference Technical
Document both during the planning and implementation of the ENDS. Indeed, the
document can also be a useful technical reference material during ongoing
management and operations of the ENDS.
Fig 1: Management Performance Vs Data Availability (axes: Management Performance and Information Availability; the curve marks a Point of Optimality)
For evaluating the information and data needs that support the organizational
business, from strategic to operational, the present and potential business
imperatives for each component part of the Enterprise need to be considered. This
logical hierarchical approach, as laid down in the TOGAF framework and followed
herein, demands that the Target Business Needs for information and data be
identified, along with the gaps that currently exist between what is available now and
what the target architecture would demand.
The business principles and policies are the guiding rules and concepts that
apply to the business architecture domain and mainly relate to development and
deployment of business strategy, business processes and organization which in turn
influence the application and data architecture. Currently, the various ministries and
agencies of the Government of Ethiopia (GOE) have each enunciated certain
business principles and policies that are guiding their management operation.
However, enterprise-wide business principles and policies are not evident. In order to
develop uniformity and consistency across the Enterprise the following target
business principles / policies will apply. These principles and policies will directly
impact the proposed ENDS design, implementation, management and Governance.
Business Principle 9
Principle or Policy: Protection of Intellectual Property
Attribute: Statement
Attribute Description: The Intellectual Property (IP) of the GOE must be protected. This
protection must be reflected in the IT architecture, implementation, and governance
processes.
Application and Data principles / policies are the rules and concepts for
managing the information resources of the Government of Ethiopia. These principles
would provide guidelines for development of the data domain in the Enterprise
Architecture and ENDS Governance guidelines.
Enterprise are assigned the lowest index of Zero. Conversely, the data elements that
are generated, used and shared extensively within the enterprise have been
assigned the highest Index of Commonality of 5 (Five). In addition, each Index of
Commonality is multiplied by a weightage number from 1 to 2, depending on the
business importance of the dataset. For instance, a data element which is widely
shared and used across the Enterprise and is strategically important from a business
point of view will have a weighted Index of Commonality of 10 (5 x 2).
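The weighting arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the range checks and the sample rows (taken from Table 2) are assumptions made for the example, not part of the ENDS design.

```python
def weighted_index(commonality, weightage):
    """Weighted Index of Commonality = Index of Commonality x weightage.

    Assumes, as described in the text, an index between 0 and 5 and a
    weightage between 1 and 2.
    """
    if not 0 <= commonality <= 5:
        raise ValueError("Index of Commonality must be between 0 and 5")
    if not 1 <= weightage <= 2:
        raise ValueError("weightage must be between 1 and 2")
    return commonality * weightage


# A few (code, Index of Commonality, weightage) rows as listed in Table 2.
SAMPLE_ROWS = [
    ("AGH01", 3, 1),
    ("CSA1", 4, 2),
    ("CSA6", 3, 1.5),
    ("MOFED1", 4.5, 2),
]


def rank_by_weighted_index(rows):
    """Return (code, weighted index) pairs, highest weighted index first."""
    scored = [(code, weighted_index(idx, w)) for code, idx, w in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for code, score in rank_by_weighted_index(SAMPLE_ROWS):
        print(code, score)
```

Ranking datasets by the weighted index, as in the usage above, is one way to shortlist candidates for the common data set; any cut-off value would be a separate design decision.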
Table 2: Data to Support the Operational & Business service needs of the Government of Ethiopia
Columns: S.No, Agency Code, Source (Ministry / Agency Name), Code, Data Description, Index of Commonality, Weightage, Weighted Index
1 AGH Agency for Government Housing AGH01 Government Housing Dataset 3 1 3
2 ATA Agriculture Transformation Agency ATA1 Agriculture supply and distribution dataset 3 2 6
3 ATA Agriculture Transformation Agency ATA2 Agriculture Marketing Dataset 3 2 6
4 CSA Central Statistical Agency CSA1 Population Census Dataset 4 2 8
7 CSA Central Statistical Agency CSA4 Consumer and Producer Price Index 4 2 8
9 CSA Central Statistical Agency CSA6 Housing Census Data 3 1.5 4.5
12 DARO Document Authentication and Registration office DARO2 Document Repository - Company Registration, Articles of Association. 3 2 6
Exchange Authority Practitioners.
21 MOEF Ministry of Environment and Forests MOEF1 Data on land areas under Forest Cover, GIS data on watershed areas. 3 2 6
28 ERCA Ethiopian Revenue & Customs Authority ERCA1 Registered Tax Payers and Payments covers individual and business Dataset 4 2 8
30 ERCA Ethiopian Revenue & Customs Authority ERCA3 Bonded Store Permits, Tax Exemption Register and Custom clearance certificates. 3 1 3
32 FEACC Federal Ethics & Anti Corruption Commission (FEACC) FEACC2 Document Repository: Ethics policies and rules 2 2 4
Affairs office guidelines, rules and procedures.
40 MOA Ministry of Agriculture MOA2 Baselines and targets data for Key Performance Indicators Natural Resources, Livestock, Agricultural Production, budgets, Disaster Management 3 2 6
42 MOA Ministry of Agriculture MOA3 Agriculture products production, market, prices, trade and consumption data 4 2 8
44 MOA Ministry of Agriculture MOA5 Document Repository: Guidelines for new farmer trainees, Atlas data, Agri-land use manuals, GIS maps 3 1.5 4.5
45 MOCS Ministry of Civil Service MOCS1 Data on the performance of government staff 3 1.5 4.5
46 MOCS Ministry of Civil Service MOCS2 Data on the cases against staff and appeals by civil service staff with documentary evidence. 3 2 6
including appointment letters and all
recruitment related documents.
52 MOCT Ministry of Culture and Tourism MOCT2 Dataset: Local cultural habits and values 3 1.5 4.5
54 MOCT Ministry of Culture and Tourism MOCT3 Data on cultural events, trade fairs exhibitions and related programmes 2 1 2
55 MOCT Ministry of Culture and Tourism MOCT4 Data and Documents: Competence License For Multi Regional Cultural Institutions 2 1 2
57 MOE Ministry of Education MOE2 National Teacher Dataset including data on qualifications and competency certificates issued 3 1.5 4.5
62 MOFA Ministry of Federal Affairs MOFA1 Listing and registration and profile of faith and religious institutions 3 1.5 4.5
63 MOFED Ministry of Finance and Economic Development MOFED1 National Budget, cash flow, disbursement and related Data 4.5 2 9
68 MOFED Ministry of Finance and Economic Development MOFED7 National Accounts, Treasury, Debt and related data. 3 1.5 4.5
69 MOFED Ministry of Finance and Economic Development MOFED8 Evaluation and Approval Of Assistance of Borrowing Projects, Bids Documents of Consultants and approval of procurement. 3 1.5 4.5
70 MOFED Ministry of Finance and Economic Development MOFED9 Data and documents related to cooperation agreements 3 1.5 4.5
71 MOFED Ministry of Finance and Economic Development MOFED9 Data on accounts of ongoing and completed projects 4.5 2 9
76 MOFRA Ministry of Foreign Affairs MOFRA2 Data and Documents related to Passports including applications, status and related information. 2 1 2
77 MOFRA Ministry of Foreign Affairs MOFRA3 Documents and data on the Ethiopia’s relations with UN, bilateral agreements and projects and programmes with EU, USA, Asia, Africa and other international entities. 2 1 2
77 MOFRA Ministry of Foreign Affairs MOFRA4 Data regarding developments in Ethiopia to be disseminated to Ethiopians, the Diaspora and friends of Ethiopia residing abroad including database of Ethiopians living abroad, persons of Ethiopian origin and friends of Ethiopia, participation of the Ethiopian Diaspora in investment, tourism and trade, as well as technology transfer and plans and performance reports on annual Diaspora Day celebrations 2 1 2
78 MOFRA Ministry of Foreign Affairs MOFRA4 Repository of Authenticated Documents 2 1 2
79 MOH Ministry of Health MOH1 National Health Dataset, including data on health indicators among various population segments, life expectancy, child mortality etc. Public health preparedness indicators. 4 2 8
80 MOH Ministry of Health MOH2 Data on communicable and non communicable diseases including their prevalence including data on projects and programmes on prevention and cure. 4 2 8
81 MOH Ministry of Health MOH3 Data on medical services providers, coverage, projects and programmes and medical service availability and quality indicators. 3 2 6
82 MOH Ministry of Health MOH4 Medical equipment and Pharma dataset including availability, prices, quality control and related data. 2 2 4
84 MOI Ministry of Industry MOI1 Industry Dataset: Data on Industrial Investment, value, sector and geographical distribution. List of investors, historical growth, installed capacity and growth. Key performance indicators. 4 2 8
85 MOI Ministry of Industry MOI2 Document Repository: Industry development policy, plans and programmes, incentives, survey reports, industry sector profiles. 4 2 8
86 MOJ Ministry of Justice MOJ1 Data and List and profiles of legal advocates: Name, Address Level of Education, and work experiences License number, date of issuance, etc. and repository of licenses issued. 2 2 4
87 MOJ Ministry of Justice MOJ2 Dataset: Cases investigated and prosecuted. 3 2 6
88 MOJ Ministry of Justice MOJ3 Document Repository: Civil and Criminal Laws, court decisions, legal drafts. 3 2 6
89 MOJ Ministry of Justice MOJ4 Data related to the cases between Federal Government offices and public enterprises. 2 1 2
90 MOJ Ministry of Justice MOJ5 Data on law suits instituted for and on behalf of the Government. 3 1.5 4.5
91 MOJ Ministry of Justice MOJ6 Data on legal training provided and related document repository 2 1.5 3
92 MOJ Ministry of Justice MOJ7 Data on legal advice provided to other agencies and related doc. repository 3 1.5 4.5
96 MOLSA Ministry of Labor & Social Affairs MOLSA4 Foreign employees in Ethiopia. Sector wide employment data and Work permits issued, issue of clearance of work permit 2 1 2
97 MOLSA Ministry of Labor & Social Affairs MOLSA5 Data on Ethiopians employed overseas data on approval of overseas contracts 2 1 2
Of Area, Type Of Terrain, Mineral
Data and Mineral type and
estimated Amount.
102 MOM Ministry of Mines MOM1 Data on existing mining operations, operator data including Name Of Company, License Number and date and duration of operations. 3 2 6
103 MOM Ministry of Mines MOM2 Data and document on licensing and issue of competence certificates including Gold Melting Competence Certificate Gold Whole Selling Competence Certificate, Petroleum Operation License Precious Minerals Exporting Competence Certificate, Precious Minerals Trading Competence Certificate Of Precious Minerals, Lapidary and Smithery Support Letter For Buying Of Gold Transfer Of Mineral Operation License 2 1 2
104 MOM Ministry of Mines MOM3 Document repository New Methods and Mechanisms for Mining Operation; Project Profiles Of Mining Sector Development, policies and plans. Performance Reports. 2 1 2
105 MOST Ministry of Science & Technology MOST1 Data on National Math, Science Research And Innovation award candidates awards 2 1 2
106 MOST Ministry of Science & Technology MOST2 Data on National Science Academies, Technology Institutions & Professionals & Science Clubs Association, 2 1 2
107 MOST Ministry of Science & Technology MOST3 Document Repository on S&T policies, plans and programmes, technology transfers, technology profiles and performance indicators 3 2 6
108 MOT Ministry of Trade MOT1 National Business Register all business and ownership data, trade names 4 2 8
109 MOT Ministry of Trade MOT2 Register of Trade Associations, Traders, trader license details and renewal 3 1.5 4.5
110 MOT Ministry of Trade MOT3 National data on commodity (other than coffee) production, consumption prices and local market 4 2 8
111 MOT Ministry of Trade MOT4 National data on coffee production, consumption, prices, exports. Producers, international market operators etc. 4 2 8
112 MOT Ministry of Trade MOT5 Data on international trade. Imports and exports. Importers and Exporters; international trade associations. 4 2 8
113 MOT Ministry of Trade MOT6 Document Repository: Trade Policies, Guidance to traders and exporters, International Trade Agreements, Market information and profiles. 3 2 6
114 MOTN Ministry of Transport MOTN1 Import and Export Cargo data, vehicle details, Schedules. 4 1.5 6
115 MOTN Ministry of Transport MOTN2 Data on Road Safety Private Projects their evaluation and support. 2 2 4
116 MOTN Ministry of Transport MOTN3 Data on Development Projects in Transport Sector. Performance indicators. Transport Sector Statistics 3 2 6
117 MOTN Ministry of Transport MOTN4 Data on the Technical And Financial Support to Road Safety Fund Users 2 2 4
118 MOTN Ministry of Transport MOTN5 Document Repository: Policies, Plans and project profiles in the development of Transport sector 2 2 4
119 MOUD Ministry of Urban Development, Housing and Construction MOUD1 Geospatial Dataset on Urban Land Size, Type, GIS Data, Current Ownership Status, Land Bank data, Land standards and modern land data attributes 4 2 8
120 MOUD Ministry of Urban Development, Housing and Construction MOUD1 Data on urban Infrastructure supply status. List And Profiles Of Urban Areas Of The Country. Standards Of Municipal Services Provision, and Level Of Urban areas In terms of Municipal Services. 4 2 8
121 MOUD Ministry of Urban Development, Housing and Construction MOUD1 Data on Strategic and Operational Project Plans for Development Of Residential Houses including List And Profiles of Enterprises Working In The Development Of Residential Houses, Project Profiles Of Development Of Residential Houses. 2 2 4
and performance indicators and
related reports.
131 MOWIE Ministry of Water and Energy MOWIE3 Registry of water users, water professionals, consultants and contractors 4.5 2 9
132 MOWIE Ministry of Water and Energy MOWIE4 Irrigation Projects Data includes List, Location, Size, Capacity, Potential, Beneficiaries, Profiles Of Irrigation And Drainage Developments, List And Profiles Of Companies Working In The Irrigation and drainage development. 4 2 8
132 MOWIE Ministry of Water and Energy MOWIE4 National water related licensing data including issue of water Consultancy Competence Licenses, Water Contractor Competence Licenses, Water Professional Competence Licenses. 3 2 6
Net Domestic Credit, Export And
Import, Consumer Price Index
License To Banks, Insurances And
Micro Finance Institutions and New
License To Insurance Assistance.
Vehicle Assemblers , Competence
License Certificate To Vehicle
Maintenance And Body Change
Organizations, Issuance Of Daily
Plate Number,
Issuance Of Equivalent Driving
License
Issuance Of Freight Transport
Associations Competence Licensing
Permit for Djibouti Round Trip,
Notifying Criteria For New Vehicle
Import Licensing, Public Transport
Operators Competence Licensing
Public Transport Associations
Competence Licensing
143 TA Transport Authority TA6 Register of Road Accidents 2 2 4
146 HPR/HOF House of People’s Representative and House of Federation HPR2 Data on Evaluation of performance and Budget and Planning process including MDGs, National sector goals and targets. Ministerial Achievements and performance reports and national budget and financial data. 4.5 2 9
148 EFC Ethiopian Federal Courts EFC Data related to cases registered and proceeding in Federal Courts. Court Decisions 3 2 6
167 | Supporting letter for film makers, Tourists to go abroad. Application for Tourist VISA on Arrival. Driving license equivalence for foreign citizens. ID card issuance for foreign nationals. Issuing diplomats and service visas. Issuing Birth, Death and Marriage Certificate to foreign nationals. | Ministry of Tourism | Foreign National Data. ID issued to Foreign Nationals in Ethiopia, Driving Permits to Foreigners, Diplomat and service visa related data; tourist visa data. | 2 | 1 | 2
171 | Court Cases Management System: Application for a Pleading, Document Transfer, Case Filing, Adjournment of Cases | Federal Courts, Ministry of Justice, Public Prosecutor offices | Court Cases Dataset includes Register of cases filed in various courts, details of litigants, pleaders and lawyers, status of cases, hearing dates etc. | 4 | 2 | 8
172 | Information Provision about A.A. City Administration, Information on creating awareness for protecting the city from different pollution and natural resource degradation. Distributing Organizational Structure for sub cities & center Offices. Distribution of City Laws. City Entertainment, Radio Programmes, TV programme, Events and Programmes. Directives on City Road Maps, Application for blocking roads. Information on damaged | City Administration | Addis Ababa City Data including GIS data on city, locational data on places of interest, entertainment, cultural and heritage sites, City Administration information; Radio Programme data; TV Programmes Data; City Roads data; City News Data; City Tours Data; City Maps; City Pollution Data; City Parks and entertainment centre data; City Employee Data; City Bye Laws | 4 | 2 | 8
S.No: 179
Services: Passport and Visa related services. Application for PP, Renewal, Replacement.
Visa Application, Processing & Issue. ID Card issue to Foreign Nationals
Agency: Ministry of Foreign Affairs
Data Description: Passport and Visa Data. Applicant Data. Passport and Visas issued and
rejected. Foreign National in Ethiopia related data
Index of Commonality: 2; Weightage: 2; Weighted Index: 4

S.No: 180
Services: E-Procurement. Data on Government purchasing, stores, materials.
Agency: MCIT/ All agencies
Data Description: Dataset of Government Purchasing and Stores. Purchase Indents,
Approved vendors, Rate Contracts, black listed vendors. Tenders and bids. RFPs and RFI
related data and documents.
Index of Commonality: 4.5; Weightage: 2; Weighted Index: 9
The exhaustive study of the data / information flow within the Government of
Ethiopia enterprise organizations has revealed that some data elements are
agency specific and remain within the agency concerned. On the other hand, there are
data elements and information that must flow between agencies in parallel to the
work flow of the Enterprise or for undertaking multi-agency tasks. As described in the
previous section, it is imperative to distinguish between agency specific data
elements and common data elements. The previous section also described a methodology
for assigning a Data Commonality Index to the various data elements of the
Enterprise, indicating the extent to which each data element has the potential to be
exchanged and shared between agencies. Additionally, the calculated Data
Commonality Index for each data element has been multiplied by a weight from 1 to
2, depending on how important the data element / information is from a
public service or Government business perspective. The resulting weighted Data
Commonality Index can vary from 0 to 10. Data elements with a value of 0 are
agency specific and are therefore, by definition, not common
data elements. Only data elements with an index value from 1 to 10 are common data
elements that are shared and exchanged between agencies.
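The weighting step can be illustrated with a short sketch. The function names and the input checks are illustrative assumptions, not part of the methodology itself; the sample values are the index and weightage figures of rows 146, 148 and 167 listed earlier in this section.

```python
def weighted_commonality_index(commonality_index: float, weight: float) -> float:
    """Return the Data Commonality Index multiplied by its importance weight."""
    if commonality_index < 0:
        raise ValueError("commonality index cannot be negative")
    if not 1 <= weight <= 2:
        raise ValueError("weight must lie between 1 and 2")
    return commonality_index * weight


def is_common_data_element(weighted_index: float) -> bool:
    """Per the text: 0 means agency specific, 1 to 10 means common."""
    return 1 <= weighted_index <= 10


# (index of commonality, weightage) pairs taken from the rows listed earlier.
examples = {"146": (4.5, 2), "148": (3, 2), "167": (2, 1)}
for row, (index, weight) in examples.items():
    print(row, weighted_commonality_index(index, weight))
```

The three rows reproduce the weighted index values shown in the table (9, 6 and 2).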
S.No: 45
Data Set: National Energy Data sets includes Energy resources, location (GIS),
energy demand and supply, capacity, power station and dams etc.
Source Agency: Ministry of Water and Energy
Data Category: Energy and Power

S.No: 46
Data Set: National Water Data includes water sources, Water Point Data, location,
yield, captured water data, Ground Water Points Data, consumption and projections.
Source Agency: Ministry of Water and Energy
Data Category: Land and Natural Resources

S.No: 47
Data Set: Registry of water users, water professionals, consultants and contractors
Source Agency: Ministry of Water and Energy
Data Category: Land and Natural Resources

S.No: 48
Data Set: Irrigation Projects Data includes List, Location, Size, Capacity, Potential,
Beneficiaries, Profiles of Irrigation and Drainage Developments, List and Profiles of
Companies Working in the Irrigation and drainage development.
Source Agency: Ministry of Water and Energy
Data Category: Agriculture and Infrastructure

S.No: 49
Data Set: National Monetary and Financial Data includes Inflation, Investment,
Lending Rates,
Source Agency: National Bank of Ethiopia
Data Category: Economic/ Financial and Social
After removing some duplications, we are left with 64 sets of data which
could be considered as the priority candidates for the proposed ENDS. As can be
seen in the table given above, these sets have been categorized into various data
categories, each corresponding to a data domain. For instance, the data category of
Lands and Natural Resources contains all the sets of data elements that
relate to Land, Minerals, Forests, Water and Natural Resources. Similarly, the data
category of Citizen and Population contains all the sets of data elements
related to citizens and residents, demography, population distribution etc. All in all,
16 data categories have been identified. The 64 sets of data elements mentioned in
Table 6 are distributed into these 16 data categories as follows:
The first question that would naturally come to mind is: why should Ethiopia
have a National Dataset? To some extent this question is answered in the Ethiopian
E-Government Strategy document, which lays down the development of the National
Data set as a means for effectively and efficiently exchanging data between the
various ICT systems existing in the Government of Ethiopia, thereby making
interoperability possible between the individual systems existing at the agency level
and, in effect, virtually integrating the individual agency and departmental level
technology systems into a single Enterprise System.
hub that optimally meets the needs of the Government of Ethiopia Enterprise.
Some of the important considerations to be kept in view are as follows:
The Data Hub design has to be maintainable and data transfer from
ministerial systems to the central data hub / warehouse should be easy.
Various architectural and design options are available to design and develop
the ENDS. They range from the simplest, where the data is published by the agency based
systems into the ENDS Data Hub and is extracted by the applications at the user
end, to a very complex and layered architecture wherein the data is prepared and
loaded into a staging area, transformed, shared, and analysed. At the same time the
data is stored in a data warehouse for aggregation, analysis and reporting. Some of
these options are briefly described below to determine the option that best suits the
situation existing in Ethiopia.
fragmented nature of the ICT systems with relatively low level of organizational and
technology maturity, and skill and Governance deficit. A high level architectural
design for the ENDS Data Hub is proposed that is close to the design of the
Integration Hub (Figure 5). Indeed, the proposed architecture design goes beyond
the basic model to include components and functionalities closely aligned with the
needs of the Government of Ethiopia. At the agency level publishing end, ETL
data extraction and loading technology will be used to move the data to a staging
database for further transformation into Data Marts, from where it may be pulled out
to user systems for a variety of uses: for instance, delivery of public services
(e-services), support to business processes of the Government for operational and
management purposes, and provision of information and reports as per the needs of the
user agencies or departments of the Government. The various component parts
and operations of the proposed ENDS design are given below:
the next step is to draw information or intelligence from this Data Mart to cater to
various end purposes. This penultimate step in the process is carried out by
the Business Intelligence and Analysis layer, which processes the Data Marts into
cleanly packaged information sets for final consumption.
For Example:
Cities in Ethiopia
Towns in Ethiopia
List of PIN Codes in Ethiopia
List of Telecom Operators in Ethiopia
List of Hospitals in Ethiopia
List of Regional Transport Offices in Ethiopia
List of Medical Specializations currently held by Ethiopian doctors.
The final component of the ENDS technical architecture is providing access to this
aggregated, validated, packaged and analyzed data to consumers. Consumers for
ENDS data are Ministries, Agencies, e-Services and Citizens.
The Access layer can be broadly broken down into the following two channels:
a. Web Service / APIs
A Key Data Access channel to the Data Sets and Data Marts is provided
by a comprehensive Web Service Layer. Web Services are a software
APIs not only allow for data access by e-Service applications but also by
mobile & web applications developed for information dissemination to the
public. For example, this can power online mobile apps which can help a
citizen with:
Keying in a Vehicle Registration number and finding out the owner
details instantly with all known traffic offences.
Keying in a Business Name and pulling out the entire Registration
details for the business including the Business Owner, Paid-Up
Capital, Date of Registration, Partners etc.
Instantly accessing land registration details for a place by simply
passing the geo co-ordinates (via GPS enabled smartphones) to
the central Data Mart via the provided APIs.
b. Portals
Data Access portals will be provided to act as a window to all Data Views
where Consumers can search and download various data cuts for specific
period in open data formats like Excel, CSV etc.
Portals provide the following information catalogued by Subject Area,
Source Ministry, Target Application and Users:
Data Catalogues for View and Download
Data Visualizations (Graphs/Trends etc)
The process of extracting data from source systems and bringing it into the data
warehouse is commonly called ETL, which stands for extraction, transformation, and
loading. It should be noted that ETL refers to a broad process, and not merely to its
three well-defined steps. The methodology and tasks of ETL have been well known for
many years, and are not necessarily unique to data warehouse environments: a wide
variety of proprietary applications and database systems form the IT backbone of any
enterprise, and data has to be shared between these applications or systems in order
to integrate them, so that at least two applications present a unified picture to the
outside world. This data sharing was mostly addressed by mechanisms similar to what
we now call ETL.
During extraction, the desired data is identified and extracted from many different
sources, including database systems and applications. Very often, it is not possible to
identify the specific subset of interest, therefore more data than necessary has to be
extracted, so the identification of the relevant data will be done at a later point in time.
Depending on the source system's capabilities (for example, operating system
resources), some transformations may take place during this extraction process. The
size of the extracted data varies from hundreds of kilobytes up to gigabytes, depending
on the source system and the business situation. The same is true for the time delta
between two (logically) identical extractions: the time span may vary between
days/hours and minutes to near real-time. Web server log files, for example, can easily
grow to hundreds of megabytes in a very short period of time.
5.4.1 Extraction
There are various types of logical data extraction methods which may be
employed in extraction of the Ministry Systems data as a part of the operations of the
ENDS:
A. Full Extraction
Full extraction is used when the data needs to be extracted and loaded for the first time.
In full extraction, the data from the source is extracted completely. This extraction
reflects the current data available in the source system.
B. Incremental Extraction
In incremental extraction, the changes in source data need to be tracked since the last
successful extraction. Only these changes in data will be extracted and then loaded.
These changes can be detected from source data that carries a last changed
timestamp. Alternatively, a change table can be created in the source system to keep track
of the changes in the source data. One more method to get the incremental changes is
to extract the complete source data and then compute the difference (minus operation)
between the current extraction and the last extraction. This approach can be slow,
because the full source data has to be extracted and compared on every run.
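Two of the incremental extraction approaches described above can be sketched as follows. This is a minimal illustration using in-memory records; the function names, the `last_changed` field and the sample rows are assumptions made for the example and do not describe any actual Ministry system.

```python
from datetime import datetime


def extract_by_timestamp(rows, last_success):
    """Return rows whose last_changed value is later than the last successful run."""
    return [row for row in rows if row["last_changed"] > last_success]


def extract_by_difference(current_rows, previous_rows):
    """Return rows present in the current full extract but not in the previous one."""
    previous = set(previous_rows)
    return [row for row in current_rows if row not in previous]


# Timestamp-based detection: only the row changed after 10 April is picked up.
source = [
    {"id": 1, "name": "alpha", "last_changed": datetime(2015, 4, 1)},
    {"id": 2, "name": "beta", "last_changed": datetime(2015, 4, 20)},
]
changed = extract_by_timestamp(source, last_success=datetime(2015, 4, 10))

# Difference-based detection: both full extracts must be available for comparison.
previous_full = [(1, "alpha"), (2, "beta")]
current_full = [(1, "alpha"), (2, "beta-updated"), (3, "gamma")]
delta = extract_by_difference(current_full, previous_full)
```

Note that the difference shown here finds new and modified rows only; detecting deleted rows would need the comparison to be run in the opposite direction as well.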
C. Online Extraction
In online extraction the data is extracted directly from the source system. The extraction
process connects to the source system and extracts the source data. This extraction
mechanism requires significant system integration and may necessitate the
development of custom EDIs to access the various systems in the eco-system. From
an ENDS perspective, most constituent systems are disconnected and have been
configured for use in independent applications. Online connectivity for these systems for
EDI may be relatively expensive to develop and maintain.
D. Offline Extraction
The data from the source system is dumped outside of the source system into a flat file.
This flat file is used to extract the data. The flat file can be created by a routine process
daily. The advantage of this method is that this is source system agnostic. Each source
data system needs to simply export data in its native export formats which is then used
for offline extraction and loading into the Staging Database and downstream Datamart
systems.
The EDI logic and cost is therefore borne on the Server side and not on the Client
Application side. This is ideal for ENDS considering the current systems landscape at
various ministries and agencies. The most common method for transporting data is by
the transfer of flat files, using mechanisms such as FTP or other remote file system
access protocols. Data is unloaded or exported from the source system into flat files
and is then transported to the target platform using FTP or similar mechanisms.
Because source systems and data warehouses often use different operating systems
and database systems, using flat files is often the simplest way to exchange data
between heterogeneous systems with minimal transformations. However, even when
transporting data between homogeneous systems, flat files are often the most efficient
and easiest-to-manage mechanism for data transfer. Therefore, offline extraction of
data may be a more suitable option for ENDS.
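The source-side half of offline extraction can be sketched with a plain CSV export. This is only an illustration under assumptions: the function names, the CSV format and the field names are chosen for the example, and a real source system would use its own native export facility.

```python
import csv


def export_flat_file(rows, fieldnames, path):
    """Write rows (a list of dicts) to a CSV flat file and return the row count."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def read_flat_file(path):
    """Read a CSV flat file back as a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Because every value comes back as text, data typing is left to the downstream transformation step, which is one reason flat files work across heterogeneous systems.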
The Architecture for ENDS will be based on a Decoupled Source File Ingestion
methodology. This means that all input Source Files for relevant Datasets will be
extracted locally in each ministry and then transferred into a common FTP destination
for further downstream processing.
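The hub-side half of this decoupled approach can be sketched as a loader that picks up whatever files have arrived. In this sketch a local landing directory stands in for the common FTP destination, and the `staging` table layout, the function name and the use of SQLite are assumptions made purely for illustration.

```python
import csv
import json
import sqlite3
from pathlib import Path


def ingest_landing_directory(landing_dir, connection):
    """Load every CSV file found in landing_dir into a staging table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS staging "
        "(source_file TEXT, line_no INTEGER, record TEXT)"
    )
    loaded = 0
    for path in sorted(Path(landing_dir).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as handle:
            for line_no, record in enumerate(csv.DictReader(handle), start=1):
                # Keep the raw record as JSON text; typing happens downstream.
                connection.execute(
                    "INSERT INTO staging VALUES (?, ?, ?)",
                    (path.name, line_no, json.dumps(record)),
                )
                loaded += 1
    connection.commit()
    return loaded
```

The loader needs no knowledge of the source systems: each ministry only has to deliver a file, which is the point of decoupling extraction from ingestion.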
Figure 7: Proposed Offline Decoupled FTP File Based Extraction Schematic for ENDS
Having data that is not clean is very common when loading and transforming
data, especially when dealing with data coming from a variety of sources, including
external ones. If this dirty data causes you to abort a long-running load or
transformation operation, a lot of time and resources will be wasted.
Data that is logically not clean violates business rules that are known prior to any
data consumption. Most of the time, handling these kinds of errors will be incorporated
into the loading or transformation process. However, in situations where the error
identification for all records would become too expensive and the business rule can be
enforced as a data rule violation (for example, testing hundreds of columns to see if they
are NOT NULL), programmers often choose to handle even known possible logical error
cases more generically. Incorporating logical rules can be as easy as applying filter
conditions on the data input stream or as complex as feeding the dirty data into a
different transformation workflow. Some examples are as follows:
Filtering of logical data errors using SQL. Data that does not adhere to certain
conditions will be filtered out prior to being processed.
Identifying and separating logical data errors. In simple cases, this can be
accomplished using SQL.
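Both examples can be sketched with two SQL queries run through Python's built-in sqlite3 module. The `incoming` table and the business rule used here (an amount must be present and non-negative) are hypothetical and serve only to show the pattern.

```python
import sqlite3


def split_clean_and_rejected(connection):
    """Filter rows that satisfy the rule and separate those that violate it."""
    clean = connection.execute(
        "SELECT id, amount FROM incoming "
        "WHERE amount IS NOT NULL AND amount >= 0 ORDER BY id"
    ).fetchall()
    rejected = connection.execute(
        "SELECT id, amount FROM incoming "
        "WHERE amount IS NULL OR amount < 0 ORDER BY id"
    ).fetchall()
    return clean, rejected


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE incoming (id INTEGER, amount REAL)")
connection.executemany(
    "INSERT INTO incoming VALUES (?, ?)", [(1, 10.0), (2, None), (3, -5.0)]
)
clean, rejected = split_clean_and_rejected(connection)
```

The clean rows continue through the load, while the rejected rows can be routed to a separate workflow for correction instead of aborting the whole operation.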
Unlike logical errors, data rule violations are not usually anticipated by the load or
transformation process. Such unexpected data rule violations (also known as data
errors) that are not handled by an operation cause the operation to fail. Data rule
violations are error conditions that happen inside the database and cause a statement
to fail. Examples of this are data type conversion errors or constraint violations.
A typical transformation is the key lookup. For example, suppose that sales
transaction data has been loaded into a retail data warehouse. Although the data
warehouse's sales table contains a product_id column, the sales transaction data
extracted from the source system contains Universal Product Codes (UPC) instead of
product IDs. Therefore, it is necessary to transform the UPC codes into product IDs
before the new sales transaction data can be inserted into the sales table. In order to
execute this transformation, a lookup table must relate the product_id values to the UPC
codes. This table might be the product dimension table, or perhaps another table in the
data warehouse that has been created specifically to support this transformation. For
this example, we assume that there is a table named product, which has a product_id
and an upc_code column.
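The key lookup can be sketched as a single INSERT ... SELECT that joins the incoming rows to the product table. The product table and its product_id and upc_code columns come from the example above; the staged_sales table, the quantity column, the sample codes and the use of SQLite are assumptions added for illustration.

```python
import sqlite3


def build_example():
    """Create the lookup table, the staged source rows and an empty sales table."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(
        """
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, upc_code TEXT UNIQUE);
        CREATE TABLE staged_sales (upc_code TEXT, quantity INTEGER);
        CREATE TABLE sales (product_id INTEGER, quantity INTEGER);
        INSERT INTO product VALUES (1, '036000291452'), (2, '012345678905');
        INSERT INTO staged_sales VALUES ('012345678905', 3), ('036000291452', 5),
                                        ('999999999999', 1);
        """
    )
    return connection


def load_sales_with_key_lookup(connection):
    """Translate UPC codes into product IDs while loading the sales table."""
    connection.execute(
        "INSERT INTO sales (product_id, quantity) "
        "SELECT p.product_id, s.quantity "
        "FROM staged_sales AS s JOIN product AS p ON p.upc_code = s.upc_code"
    )
    connection.commit()
    return connection.execute(
        "SELECT product_id, quantity FROM sales ORDER BY product_id"
    ).fetchall()
```

With an inner join, a staged row whose UPC code has no match in the product table (the third sample row) is silently left out; a production load would normally capture such rows for review.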
6.0 Metadata
Metadata has been identified as a key success factor in data warehouse
projects. It captures all kinds of information necessary to extract, transform and load
data from source systems into the data warehouse, and afterwards to use and interpret
the data warehouse contents. Metadata can be broadly categorized into three
categories:
Dataset Availability StartDate: Date from which data for this Dataset is available.
The Data Element Metadata describes the nomenclature, data type and various
descriptors for constituent Data Elements within a Dataset. These rules are used for
Machine Reading and serving of Datasets.
Data Identifier Code: A code that uniquely tracks this Data Element.
Field Size: Size of Field. E.g. 2/4/6/10/30
Layout: One of the following:
XXXXXX
AA/AAA/AAAA
AAANNNNA
Any
NNN.NNN
CCYYMMDD – Date in ISO 8601 format to day level
Unit of Measure: NA
Mandatory: Yes/No
Organization Unit Code (*Optional): Code of the Primary Owner Organization of Dataset
Organization Unit Name (*Optional): Name of the Primary Owner Organization of Dataset
Administrative Status
Table Reference ID: A code that uniquely identifies the data element. If the data
element is used in more than one collection, it should retain its
Reference ID wherever it appears.
Version: A version number for each data element. A new version number is
allocated to a data element/concept when changes have been
made to one or more of the following attributes of the definition:
Name / Definition / Data domain, e.g. adding a new value to the
field.
Elements with frequently updated code tables, such as the Facility
code table, will not be assigned a new version for changes to data
domain.
Version Date
Element Type:
DERIVED DATA ELEMENT - A data element whose values are
derived by calculation from the values of other data elements.
COMPOSITE DATA ELEMENT - A data element whose values
represent a grouping of the values of other data elements in a
specified order.
Definition: A statement that expresses the essential nature of a data element
and its differentiation from all other data elements.
Context (Optional): A designation or description of the application environment or
discipline in which a name is applied or from which it originates.
This attribute may also include the justification for collecting the
items and uses of the information.
Data Type: The type of field in which a data element is held. For example,
character, integer, or numeric.
Field Size: The maximum number of storage units (of the corresponding data
type) to represent the data element value. Field size does not
generally include characters used to mark logical separations of
values, e.g. commas, hyphens or slashes.
Layout: The representational layout of characters in data element values
expressed by a character string representation. For example:
- ‘CCYYMMDD’ for calendar date
- ‘N’ for a one-digit numeric field
- ‘A’ for a one-character field
- ‘X’ for a field that can hold either a character or a digit, and
- ‘$$$,$$$,$$$’ for data elements about expenditure.
Data Domain: The permissible values for the data element. The set of values can
be listed or specified by referring to a code table or code tables.
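Since these rules are intended for machine reading, a Layout string can be turned into a validation check. The following is a minimal sketch under stated assumptions: 'A' is taken to mean a single letter, 'X' a letter or digit, any other character in a layout is treated as a literal separator, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed meaning of the layout symbols described above.
_SYMBOLS = {"N": r"\d", "A": "[A-Za-z]", "X": "[A-Za-z0-9]"}


def layout_to_regex(layout):
    """Translate a layout such as 'AAANNNNA' into a regular expression."""
    return "".join(_SYMBOLS.get(char, re.escape(char)) for char in layout)


def matches_layout(value, layout):
    """Return True when value conforms to the given layout."""
    if layout == "Any":
        return True
    if layout == "CCYYMMDD":
        if len(value) != 8 or not value.isdigit():
            return False
        try:
            datetime.strptime(value, "%Y%m%d")  # rejects impossible dates
        except ValueError:
            return False
        return True
    return re.fullmatch(layout_to_regex(layout), value) is not None
```

For instance, `matches_layout("20150430", "CCYYMMDD")` is true, while a value such as `"20150230"` is rejected because 30 February is not a calendar date.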
After detailed evaluation of the Ethiopian Ministerial IT & Data maturity, the best
architectural fit to cater to these requirements has been identified to be the Star
Schema.
number of much smaller dimension tables (or lookup tables), each of which contains
information about the entries for a particular attribute in the fact table.
A star query is a join between a fact table and a number of dimension tables.
Each dimension table is joined to the fact table using a primary key to foreign key join,
but the dimension tables are not joined to each other. The cost-based optimizer
recognizes star queries and generates efficient execution plans for them. A typical fact
table contains keys and measures. For example, in the sample schema, the fact table,
sales, contains the measures quantity_sold, amount, and average, and the keys
time_key, item_key, branch_key, and location_key. The dimension tables are time,
branch, item and location. A star join is a primary key to foreign key join of the
dimension tables to a fact table.
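A star query over this sample schema can be sketched with SQLite. The fact table name, its keys and the quantity_sold and amount measures follow the description above; the `_dim` suffix on the dimension tables, their descriptive columns and all sample rows are assumptions for illustration, and the average measure is left out for brevity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales (
    time_key INTEGER REFERENCES time_dim (time_key),
    item_key INTEGER REFERENCES item_dim (item_key),
    branch_key INTEGER REFERENCES branch_dim (branch_key),
    location_key INTEGER REFERENCES location_dim (location_key),
    quantity_sold INTEGER,
    amount REAL
);
INSERT INTO time_dim VALUES (1, 2014), (2, 2015);
INSERT INTO item_dim VALUES (1, 'pen'), (2, 'book');
INSERT INTO branch_dim VALUES (1, 'main');
INSERT INTO location_dim VALUES (1, 'Addis Ababa');
INSERT INTO sales VALUES (1, 1, 1, 1, 10, 50.0), (2, 1, 1, 1, 4, 20.0),
                         (2, 2, 1, 1, 2, 60.0), (2, 2, 1, 1, 1, 30.0);
"""

# Each dimension joins only to the fact table, never to another dimension.
STAR_QUERY = """
SELECT t.year, i.item_name, l.city, SUM(s.quantity_sold), SUM(s.amount)
FROM sales AS s
JOIN time_dim AS t ON s.time_key = t.time_key
JOIN item_dim AS i ON s.item_key = i.item_key
JOIN location_dim AS l ON s.location_key = l.location_key
GROUP BY t.year, i.item_name, l.city
ORDER BY t.year, i.item_name
"""


def run_star_query():
    """Build the sample mart in memory and return the aggregated rows."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(SCHEMA)
    rows = connection.execute(STAR_QUERY).fetchall()
    connection.close()
    return rows
```

The query condenses four fact rows into three result rows, one per year and item, which is the typical shape of a star query result.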
Fig 10: An Example of typical Star Schema DataMart for Sales Data
You can browse a single dimension table in order to select attribute values to
construct an efficient query.
The Star model, on the other hand, has fewer joins between dimension tables
and the fact table. In this model, if you need information on the advertiser you
only have to join the Advertiser dimension table with the fact table.
In a star schema database design, the dimensions are linked only through the
central fact table. When two dimension tables are used in a query, only one join
path, intersecting the fact table, exists between those two tables. This design
feature enforces accurate and consistent query results.
Built-in referential integrity - A star schema has referential integrity built in when
data is loaded. Referential integrity is enforced because each record in a
dimension table has a unique primary key, and all keys in the fact tables are
legitimate foreign keys drawn from the dimension tables. A record in the fact
table that is not related correctly to a dimension cannot be given the correct key
value to be retrieved.
In the star schema diagram example shown earlier in this chapter, the
measurements in the fact table are daily totals of sales in dollars, sales in units, and
cost in dollars of each product sold. The level of detail of a single record in a fact table is
called the granularity of the fact table. In this diagram, the granularity is daily item totals.
Each record in the fact table represents the total sales of a specific product in a retail
store on one day. Each new combination of product, store, or day generates a different
record in the fact table.
The most useful facts are numeric, continuously valued, and additive. A
continuously valued fact is a numeric measurement that varies every time it is
measured. A fact is additive if it makes sense to add the measurement across the
dimensions. Most queries against a fact table access thousands or hundreds of
thousands of records to construct a result set of relatively few rows. It is helpful if these
records are compressed into the result set by adding them or performing other
mathematical operations. Fact tables in a Data-Mart are populated with data extracted
from an OLTP system or a data warehouse. A snapshot of the source data is regularly
extracted and moved to the Data-Mart, usually at the same time every hour, every day,
every week, or every month.
The shape of dimension tables is typically wide and short because they contain
few records and many columns. The columns of a dimension table are also called
attributes of the dimension table. Each dimension table in a star schema database has
a single-part primary key joined to the fact table.
Most star schemas include a time dimension. A time dimension table makes it
possible to analyze historic data without using complex SQL calculations. For example,
data can be analysed by workdays as opposed to holidays, by weekdays as opposed to
weekends, by fiscal periods, or by special events. If the granularity of the fact table is
daily sales, each record in the time dimension table represents a day.
Each row in a fact table must contain a primary key value from each dimension
table. This rule is called referential integrity and is an important requirement in decision-
support databases. The reference from the foreign key to the primary key is the
mechanism for verifying key values between the two tables. Referential integrity must
be maintained to ensure valid query results. The primary key of a fact table is a
combination of its foreign keys. This is called a concatenated key. The join cardinality of
dimension tables to fact tables is one-to-many, because each record in a dimension
table can describe many records in the fact table.
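The concatenated key and the referential integrity rule can be sketched with SQLite, which enforces foreign keys once `PRAGMA foreign_keys = ON` is set on the connection. The table names, columns and sample rows below are hypothetical.

```python
import sqlite3


def build_mart():
    """Create two dimensions and a fact table keyed by both foreign keys."""
    connection = sqlite3.connect(":memory:")
    connection.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    connection.executescript(
        """
        CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE day_dim (day_key INTEGER PRIMARY KEY, calendar_date TEXT);
        CREATE TABLE sales_fact (
            product_key INTEGER NOT NULL REFERENCES product_dim (product_key),
            day_key INTEGER NOT NULL REFERENCES day_dim (day_key),
            units INTEGER,
            PRIMARY KEY (product_key, day_key)  -- the concatenated key
        );
        INSERT INTO product_dim VALUES (1, 'pen');
        INSERT INTO day_dim VALUES (20150430, '2015-04-30');
        """
    )
    return connection


def try_insert(connection, product_key, day_key, units):
    """Insert one fact row; return False if a key constraint rejects it."""
    try:
        connection.execute(
            "INSERT INTO sales_fact VALUES (?, ?, ?)", (product_key, day_key, units)
        )
        connection.commit()
        return True
    except sqlite3.IntegrityError:
        connection.rollback()
        return False
```

A fact row that points at a product missing from the dimension is rejected, as is a second row for the same product and day, so the fact table cannot drift out of step with its dimensions.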
A star schema database uses very few joins, and each join expresses the
relationship between the elements of the underlying business. For example, in the star
schema diagram at the beginning of this chapter, the join between the product
dimension table and fact table represents the relationship between the company's
products and its sales.
7.3.1 Additive
Additive facts are facts that can be summed up through all of the dimensions in
the fact table. A sales fact is a good example of an additive fact.
7.3.2 Semi-Additive
Semi-additive facts are facts that can be summed up for some of the dimensions
in the fact table, but not the others. For instance, a daily balances fact can be summed up
through the customers dimension but not through the time dimension.
7.3.3 Non-Additive
Non-additive facts are facts that cannot be summed up for any of the dimensions
present in the fact table. Examples are facts that hold calculated percentages or ratios.
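The difference between the semi-additive and non-additive cases can be made concrete with a small sketch. All figures, field names and function names are invented for illustration.

```python
def total_balance_on(day, balances):
    """Semi-additive: balances may be summed across customers for one day."""
    return sum(row["balance"] for row in balances if row["day"] == day)


def closing_balance(customer, balances):
    """Across time a balance is not summed; the latest value is taken instead."""
    own = [row for row in balances if row["customer"] == customer]
    return max(own, key=lambda row: row["day"])["balance"]


def overall_margin_percent(rows):
    """Non-additive: recompute the ratio from additive parts, never sum percentages."""
    revenue = sum(row["revenue"] for row in rows)
    profit = sum(row["profit"] for row in rows)
    return 100.0 * profit / revenue


balances = [
    {"day": "2015-04-01", "customer": "A", "balance": 100},
    {"day": "2015-04-01", "customer": "B", "balance": 50},
    {"day": "2015-04-02", "customer": "A", "balance": 120},
    {"day": "2015-04-02", "customer": "B", "balance": 50},
]
margins = [{"revenue": 100, "profit": 10}, {"revenue": 300, "profit": 90}]
```

Here the two row-level margins are 10% and 30%; adding them gives a meaningless 40%, whereas recomputing from total profit and total revenue gives the correct 25%.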
Transactional and Summary Dataset views. Transactional Data Views are meant to be
consumed by e-Service systems and other Inter-Ministerial Data sharing applications.
These views are granular to a transaction. The same Data-Mart can be viewed as
aggregated Summary data, which is relevant from an (Open) Dataset perspective.
Type Transactional Data derived and aggregated from various Land Use Mapping systems
Type Transactional
5.3.2 Total Current (Monthly) Connect Base by Technology, Channel & Region
Type Transactional
Type Transactional
Type Transactional
Type Transactional
The core Data Access Layer for ENDS for data sharing between ministries and
for supporting e-services should be defined around a service-oriented architecture
(SOA). SOA is an architectural design pattern in which application components provide
services to other components via a communications protocol, typically over a network.
The principles of service-orientation are independent of any vendor, product or
technology.
1. The metadata should be provided in a form that software systems can use to configure
dynamically by discovery and incorporation of defined services, and also to maintain
coherence and integrity. For example, metadata could be used by other applications,
like a catalogue, to perform auto discovery of services without modifying the functional
contract of a service.
2. The metadata should be provided in a form that system designers can understand and
manage with a reasonable expenditure of cost and effort.
The purpose of SOA is to allow users to combine together fairly large chunks of
functionality to form ad hoc applications built almost entirely from existing software
services. The larger the chunks, the fewer the interfaces required to implement any
given set of functionality; however, very large chunks of functionality may not prove
sufficiently granular for easy reuse. Each interface brings with it some amount of
processing overhead, so there is a performance consideration in choosing the
granularity of services. SOA as an architecture relies on service-orientation as its
fundamental design principle. If a service presents a simple interface that abstracts
away its underlying complexity, then users can access independent services without
knowledge of the service's platform implementation.
1. Consumer Interface Layer – These are GUI for end users or apps accessing apps/service
interfaces.
2. Business Process Layer – These are choreographed services representing business use-
cases in terms of applications.
3. Services – Services are consolidated together for whole-enterprise in-service inventory.
4. Service Components – The components used to build the services, such as functional and
technical libraries, technological interfaces etc.
5. Operational Systems – This layer contains the data models, enterprise data repository,
technological platforms etc.
There are four cross-cutting vertical layers, each of which is applied to and
supported by each of the horizontal layers listed above:
1. Integration Layer – starts with platform integration (protocols support), data integration,
service integration, application integration, leading to enterprise application integration
supporting B2B and B2C.
2. Quality of Service – Security, availability, performance, etc. are the quality of service
parameters, configured according to the required SLAs and OLAs.
3. Informational – Provides business information.
4. Governance – IT strategy is applied to each horizontal layer to achieve the required
operating and capability model.
Among a number of choices available for implementing Web Service protocols, the
two main candidates are SOAP and REST.
when using a public Web service that’s freely available to everyone, there is not
much need for WS-Security.
The XML used to make requests and receive responses in SOAP can
become extremely complex. In some programming languages, requests need to be
built manually, which becomes problematic because SOAP is intolerant of errors.
However, other languages can use shortcuts that SOAP provides; that can help
reduce the effort required to create the request and to parse the response.
One of the most important SOAP features is built-in error handling. If there’s a
problem with a request, the response contains error information that can be used to
fix the problem. The error reporting even provides standardized codes so that it’s
possible to automate some error handling tasks in your code.
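The error handling described above can be sketched as follows, assuming a SOAP 1.1 response in which a fault carries a faultcode and a faultstring. The sample message and the helper names are hypothetical; only the envelope namespace and the standard Client/Server fault codes come from SOAP 1.1 itself.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Hypothetical SOAP 1.1 response carrying a fault.
SAMPLE_FAULT = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Client</faultcode>
      <faultstring>Missing required element: datasetId</faultstring>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>"""


def parse_fault(response_xml):
    """Return (faultcode, faultstring) if the response is a fault, else None."""
    root = ET.fromstring(response_xml)
    fault = root.find(f"{{{SOAP_ENV_NS}}}Body/{{{SOAP_ENV_NS}}}Fault")
    if fault is None:
        return None
    # In SOAP 1.1 the faultcode and faultstring children are unqualified.
    return fault.findtext("faultcode"), fault.findtext("faultstring")


def is_caller_error(faultcode):
    """Client faults mean the request itself must be fixed before retrying."""
    return faultcode.split(":")[-1].startswith("Client")


code, text = parse_fault(SAMPLE_FAULT)
print(code, "-", text)
```

Because the codes are standardized, a consumer could, for example, retry on a Server fault but correct the request on a Client fault.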
Unlike SOAP, REST doesn’t have to use XML to provide the response.
REST-based Web services can output the data in Comma-Separated Values
(CSV), JavaScript Object Notation (JSON) and Really Simple Syndication (RSS). So
it is straightforward to obtain the output in a form that’s easy to parse within the
language that is needed for an application.
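A minimal sketch of this flexibility, using only the Python standard library: the same rows are serialised as JSON or CSV depending on what the consumer asks for. The rows, field names and the render function are hypothetical placeholders, not ENDS data or interfaces.

```python
import csv
import io
import json

# Hypothetical placeholder rows; field names and values are illustrative only.
ROWS = [
    {"region": "Region A", "year": 2015, "value": 10},
    {"region": "Region B", "year": 2015, "value": 20},
]


def render(rows, fmt):
    """Serialise the same rows as JSON or CSV, as chosen by the consumer."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


print(render(ROWS, "json"))
print(render(ROWS, "csv"))
```

The consumer picks whichever representation its own language parses most easily; the underlying data is the same in both cases.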
While SOAP is a heavyweight choice for Web service access, it does
provide enterprise-grade features such as good support for distributed enterprise
environments and built-in error handling. REST, however, is the contemporary standard
for Web service development for the following reasons:
1. REST is, for the most part, easier to use and more flexible.
2. No expensive tools are required to interact with the Web service.
3. REST boasts a smaller learning curve, which means easier
maintainability and support.
4. Efficient (SOAP uses XML for all messages, REST can use smaller
message formats)
5. Fast (no extensive processing required) - component interactions can be the
dominant factor in user-perceived performance and network efficiency.
6. Scalability – REST can support a large number of components and
interactions among components
The ENDS project will be fairly standardized, not prone to ad-hoc use, with
clear use cases and a point-to-point access use case (Consumer to Data Mart). For
these reasons, it is recommended that REST be used as the protocol of choice for
the ENDS implementation.
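To make the point-to-point Consumer to Data Mart access concrete, the sketch below maps a GET request path to an HTTP status and a JSON body. The resource paths, the dataset identifier and the handle_get function are assumptions made for illustration; the source does not define the ENDS resource layout.

```python
import json

# Hypothetical data mart contents keyed by dataset identifier.
DATA_MART = {
    "population-census": {"title": "Population Census", "format": "JSON"},
}


def handle_get(path):
    """Map a GET request path to (HTTP status, JSON body).

    Two hypothetical resources are supported:
      /datasets        -> list of dataset identifiers
      /datasets/<id>   -> metadata of one dataset
    """
    parts = [p for p in path.split("/") if p]
    if parts == ["datasets"]:
        return 200, json.dumps(sorted(DATA_MART))
    if len(parts) == 2 and parts[0] == "datasets":
        dataset = DATA_MART.get(parts[1])
        if dataset is None:
            return 404, json.dumps({"error": "dataset not found"})
        return 200, json.dumps(dataset)
    return 404, json.dumps({"error": "unknown resource"})


print(handle_get("/datasets"))
print(handle_get("/datasets/population-census"))
```

Each dataset is addressed by its own URL and read with a plain GET, which is what keeps the consumer side simple compared with building a SOAP envelope.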
The Ethiopian National Dataset project will cover three distinct types of Datasets:
1. Primary Data, e.g. Population Census, Education Census,
Economic Survey, etc.
2. Processed/Value Added Data, e.g. Budget, Planning, etc.
3. Data Generated through delivery of Government Services, e.g. Income
Tax Collection
The data contributed to the Ethiopian National Dataset Platform must
be in the specified open data format only. The data must be processed internally
to ensure that it meets the quality standard, i.e. that it is accurate, free from
legal issues, preserves the privacy of individuals and does not compromise
national security. When prioritizing the release of Datasets, departments should
publish as many high value Datasets as possible. Related Resources
(Datasets/Apps) should be grouped in a planned way and organized under Catalogs. Though
each department shall have its own criteria for high value and low value Datasets,
high value data is generally governed by the following principles:
1. Completeness
2. Primary
3. Timeliness
4. Ease of Physical and Electronic Access
5. Machine readability
6. Non-discrimination
7. Use of Commonly Owned Standards
8. Licensing
9. Permanence
10. Usage Costs
1. Head the ENDP Cell, which helps in compilation, collation, conversion and
publishing of catalogs/resources on the platform. The size of the cell varies from
department to department and depends on the quantum of resources to be
published.
2. Lead the open data initiative of the department.
3. Nominate Data Contributors.
4. Take the initiative to release as many Datasets as possible on a proactive basis.
5. Identify the High Value Datasets and schedule their release on ENDP.
6. Prepare the Negative List for the Department as per the directions in ENDP.
7. Ensure that the Datasets being published are in compliance with ENDP
through a predefined workflow process.
8. Periodically monitor the release of Datasets as per the predefined schedule.
9. Take relevant action on the feedback/suggestions received from citizens
for the Datasets belonging to the Ministry/Department/Organization.
10. Take action on suggestions for new Datasets made by the public on the OGD
Platform.
4. Monitor and manage the Open data initiative in their respective Ministry/
Department/State and ensure quality and correctness of the data.
5. Work out an open data strategy to promote proactive dissemination of
Datasets.
6. Institutionalize the creation of Datasets as part of routine functioning.
The ENDP Cell shall be headed by a Data Controller, who may be assisted by a number
of Data Contributors. The ENDP Cell shall have professionals from the data analysis,
visualization and programming domains. The policy mentions that budgetary
provisions and appropriate support for data management would be necessary for each
department/organization.
The above diagram (Fig.21) depicts the Server Infrastructure required to host
the ENDS system end-to-end. The minimum configuration of each of these
servers/machines is specified below:
RAM 16 GB
RAM 16 GB
RAM 8 GB
RAM 8 GB
RAM 8 GB
In addition to the Server infrastructure with the requisite Operating system, the
Server infrastructure will also have to host special ETL-Data Warehouse-BI software
for carrying out the various functions of the ENDS implementation as described in
this document.
There are multiple platform choices for these components as shown in Table 30
below:
Component              Microsoft BI                            Oracle                            Pentaho
ETL                    SQL Server Integration Services         Oracle Data Integrator            Pentaho Kettle
Database               SQL Server 2012                         Oracle Database 11gR2             Any of SQL Server, Oracle or MySQL
Data Mart / Warehouse  SQL Server Analysis Services            Oracle Data Mart Suite            -
Reporting              SQL Server Reporting Services           Oracle Reports                    Pentaho Dashboard
Web Service Platform   Windows Communication Foundation (WCF)  Native Oracle XML DB web service  Pentaho Web Services
Verdict                Our Choice                              -                                 Open Source
While Pentaho is the most popular open source alternative, the Microsoft BI
platform is the most accessible suite for Data Management, with a widely available
knowledge base, support and extensibility.
For Example:
Cities in Ethiopia
Towns in Ethiopia
List of PIN Codes in Ethiopia
List of Telecom Operators in Ethiopia
List of Hospitals in Ethiopia
List of Regional Transport Offices in Ethiopia
List of Medical Specializations currently held by Ethiopian doctors.
These look-up data values, or controlled vocabularies, are invaluable in their own
right as they provide a single point of access to key business information for third
party validation purposes. A Controlled Vocabulary (CV) provides common metadata
for reference by various Datasets and Meta Datasets to ensure Data Consistency
across systems. Controlled vocabularies also help with preparation for CMS or knowledge
management projects, since many of these require this sort of structure for easy data
aggregation. A Controlled Vocabulary also ensures that key Data Elements that are
used within governmental systems are also consumed by the public, thus ensuring that
data within and outside of the government is consistent to a very high degree.
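The validation use described above can be sketched as a simple membership check of one field against a controlled vocabulary. The three city names are a small sample rather than an authoritative list, and the function and field names are hypothetical.

```python
# Hypothetical controlled vocabulary: a small sample of canonical city names.
CITIES = frozenset({"Addis Ababa", "Dire Dawa", "Mekelle"})


def validate_against_vocabulary(records, field, vocabulary):
    """Split records into (valid, rejected) by checking one field against a vocabulary."""
    valid, rejected = [], []
    for record in records:
        # Surrounding whitespace is ignored; a missing field counts as invalid.
        value = str(record.get(field, "")).strip()
        (valid if value in vocabulary else rejected).append(record)
    return valid, rejected


records = [{"city": "Addis Ababa"}, {"city": "Atlantis"}, {"city": " Dire Dawa "}, {}]
valid, rejected = validate_against_vocabulary(records, "city", CITIES)
print(len(valid), "valid;", len(rejected), "rejected")
```

Because every system checks against the same vocabulary, a value accepted by one system is accepted by all of them, which is the consistency benefit the paragraph describes.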
The Dublin Core standard includes two levels: Simple and Qualified. Simple
Dublin Core comprises fifteen elements; Qualified Dublin Core includes three
additional elements (Audience, Provenance and RightsHolder), as well as a group of
element refinements (also called qualifiers) that refine the semantics of the elements
in ways that may be useful in resource discovery. The semantics of Dublin Core
have been established by an international, cross-disciplinary group of professionals
from librarianship, computer science, text encoding, the museum community, and
other related fields of scholarship and practice. Another way to look at Dublin Core is
as a "small language for making a particular class of statements about resources". In
this language, there are two classes of terms -- elements (nouns) and qualifiers
(adjectives) -- which can be arranged into a simple pattern of statements. The
resources themselves are the implied subjects in this language. In the diverse world
of the Internet, Dublin Core can be seen as a "metadata pidgin for digital tourists":
easily grasped, but not necessarily up to the task of expressing complex
relationships or concepts.
The Dublin Core basic element set comprises elements that are optional
and repeatable. Most elements also have a limited set of qualifiers or
refinements, attributes that may be used to further refine (not extend) the meaning of
the element. The Dublin Core Metadata Initiative (DCMI) has established standard
ways to refine elements and encourages the use of encoding and vocabulary
schemes. The full set of elements and element refinements conforming to DCMI
"best practice" is available, with a formal registry available as well.
The Dublin Core element set has been kept as small and simple as possible
to allow a non-specialist to create simple descriptive records for information
resources easily and inexpensively, while providing for effective retrieval of those
resources in the networked environment.
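As a sketch of how small such a descriptive record is, the example below builds an XML record from the fifteen Simple Dublin Core elements and rejects any name outside that set. The wrapper element name, the function name and the sample values are hypothetical; the element names and namespace are those of the Dublin Core Metadata Element Set, version 1.1.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # namespace of the fifteen-element set

SIMPLE_DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def build_dc_record(fields):
    """Build an XML metadata record from (element, value) pairs.

    Raises ValueError for any name outside the Simple Dublin Core set.
    """
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        if name not in SIMPLE_DC_ELEMENTS:
            raise ValueError(f"not a Simple Dublin Core element: {name}")
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


# Hypothetical description of one dataset.
record = build_dc_record([
    ("title", "Population Census"),
    ("publisher", "CSA"),
    ("format", "text/csv"),
    ("language", "en"),
])
print(ET.tostring(record, encoding="unicode"))
```

A non-specialist only needs to supply the handful of pairs that apply to the resource; the elements that do not apply are simply left out.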
3. International scope
The Dublin Core Element Set was originally developed in English, but versions
are being created in many other languages, including Finnish, Norwegian, Thai,
Japanese, French, Portuguese, German, Greek, Indonesian, and Spanish.
4. Extensibility
While balancing the needs for simplicity in describing digital resources with the
need for precise retrieval, Dublin Core developers have recognized the importance
of providing a mechanism for extending the DC element set for additional resource
discovery needs. It is expected that other communities of metadata experts will
create and administer additional metadata sets, specialized to the needs of their
communities. Metadata elements from these sets could be used in conjunction with
Dublin Core metadata to meet the need for interoperability. The DCMI Usage Board
is presently working on a model for accomplishing this in the context of "application
profiles."
The ENDS System is ultimately a central data repository that depends
on source data from a number of Ministerial Data Systems. This model raises a key
concern: what happens if a ministerial data system is a legacy system
with one or more of the following constraints?
- The system is not actively supported by any IT vendor, so a
specialized data interfacing tool cannot easily be built.
- The system is not online and therefore cannot take part in an online data
synchronization initiative with the ENDS central Data Warehouse.
In addition, the Data Interface method is a simple data file dump/export
from the source system on an as-is basis, with all data validation and management
undertaken by the (more powerful and flexible) server-side ETLs and Data
Warehouse. The premise for this design is that any data management system is
expected to have, at the very least, a simple core data extract/export feature. The
native format of such an extract may differ widely from system to system.
Changing the native data extract format for these systems would need a fair amount
of local system customization, which may not be available for legacy systems.
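The division of labour described above can be sketched as follows: the source system exports its file as-is, and a server-side routine maps the native column names to the warehouse field names and validates each row. The column names, the mapping and the function name are hypothetical, and a CSV extract is assumed purely for illustration.

```python
import csv
import io

# Hypothetical mapping from one source system's native column names
# to the field names expected by the central warehouse.
COLUMN_MAP = {"PASS_NO": "passport_number", "ISSUE_DT": "issue_date"}


def normalise_extract(raw_text, column_map):
    """Read an as-is CSV dump and return (clean_rows, errors).

    The source file is never modified; renaming and validation happen
    entirely on the server side.
    """
    clean_rows, errors = [], []
    reader = csv.DictReader(io.StringIO(raw_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        mapped = {target: (row.get(source) or "").strip()
                  for source, target in column_map.items()}
        missing = [name for name, value in mapped.items() if not value]
        if missing:
            errors.append((line_no, f"missing {', '.join(missing)}"))
        else:
            clean_rows.append(mapped)
    return clean_rows, errors


raw = "PASS_NO,ISSUE_DT,EXTRA\nEP001,2015-01-05,x\n,2015-02-01,y\n"
clean, errors = normalise_extract(raw, COLUMN_MAP)
print(clean)
print(errors)
```

Supporting another legacy system then means adding one more column mapping on the server, with no change to the legacy system itself.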
- Data Exports to the ENDS system can be scheduled via periodic automatic
routines which could be daily, weekly or monthly depending on the type of data,
frequency of change and the nature of data consumption in ENDS.
These routines can be built into the system (most systems will support this function)
and the data file upload into the ENDS FTP can also be automated.
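A minimal sketch of such a periodic routine is given below: it decides whether an export is due for a given frequency and builds a dated file name for the upload. The interval table, the function names and the file naming pattern are assumptions; in particular, "monthly" is approximated as 30 days.

```python
from datetime import date, timedelta

# Hypothetical export intervals; "monthly" is approximated as 30 days here.
INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=30),
}


def is_export_due(last_export, frequency, today):
    """Return True once the interval for this frequency has elapsed."""
    return today - last_export >= INTERVALS[frequency]


def export_filename(dataset, export_date):
    """Build a dated file name so that successive uploads never collide."""
    return f"{dataset}_{export_date:%Y%m%d}.csv"


today = date(2015, 4, 30)
if is_export_due(date(2015, 4, 23), "weekly", today):
    print("upload", export_filename("passport", today))
```

The upload step itself could then be automated with a standard FTP client library, for example Python's ftplib, once the file has been written.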
elements which are required for the specific target DataMart / DataSets in the ENDS
system. For example, for ENDS to be able to host and maintain a Datamart based
on Passport data, it is a pre-requisite to have Client systems which contain key data
on Passport Applications including new Passport Applications and outcomes, current
active Passport holder Details, changes in Passport particulars and Passport
renewals.