National Data Set - Design Document

(Incorporating the ENDS Design, Technical Specifications of the
Data Hub-databases and data warehouse, operations, governance
and management)
Development of National Data Set
Master Plan
Submitted to

Ministry of Communication and Information Technology
ALSAM Building, 4th Floor, P.O.Box 1028,
Addis Ababa, Ethiopia
By
Grail Consulting Services,
New Delhi, India

in association with

Bytech India Pvt, Ltd.
New Delhi, India

and
PERAGO Information Systems PLC
Addis Ababa, Ethiopia

March-April 2015


Document Version Control


Version: 1.0
Description: Initial draft
Authors / Contributors:
R. Raina - Author
A. Srinivasan - Author
Sumit Panda - Author
Prashant Bahera - Contributor
Sneha Shivanna - Contributor
Michael Joseph - Contributor
Ewnetu Abera - Contributor
Beruk Berhane - Contributor

Document Distribution
Date of Distribution: April 30, 2015
Version: 1.0
Recipients: E-Government Directorate, Ministry of Communication and IT
Hard Copy: Yes
Soft Copy: Yes

List of Abbreviations
BP Business Process
BS Business Service
CSA Central Statistical Agency of Ethiopia
CSC Citizen Services Centre
e-Government Electronic Government
e-GIF Electronic Government Interoperability Framework
e-Services Electronic Services
EthERNET Ethiopian Educational and Research Network
ENDS Ethiopia National Data Set
FMIS Financial Management Information System
HRMS Human Resources Management System
G2B Government to Business
G2C Government to Citizen
G2G Government to Government
GTP Growth and Transformation Plan
IT Information Technology
ICT Information and Communication Technology
ICTAD Information and Communication Technology Assisted Development
ID Identification
ISO International Organization for Standardization
TOGAF The Open Group Architecture Framework
MCIT Ministry of Communication and Information Technology
MDG Millennium Development Goals
M-Services Public Services delivered on mobile devices
PKI Public Key Infrastructure
QMS Quality Management System
VOIP Voice over Internet Protocol
VSAT Very Small Aperture Terminal

LIST OF CONTENTS
No Description
Definitions
1.0 Introduction
1.1 Applicability & Use
2.0 Business Information Requirements
2.1 Business Principles and Policies
3.0 Information Systems Architecture
3.1 Application & Data Principles & Policies
3.2 Application Services & Data Matrix
3.2.1 Index of Data Commonality
4.0 Identifying Common Data Elements
5.0 Ethiopian National Dataset Design & Technology Specification
5.1 Ethiopian National Dataset Design Consideration
5.2 Ethiopian National Dataset Design Options
5.3 ENDS Data Hub High Level Architecture
5.3.1 Summary of High Level Architecture
5.4 ETL (Extract - Transform - Load)
5.4.1 Extraction
5.4.2 Extraction Mode (Offline / Online)
5.5 Decoupled Source File Integration
5.6 FTP Mapping
5.7 Recommended Extraction Methodology for ENDS
5.7.1 Flat File Consistency Check
5.7.2 Extract to Staging Server for Data Cleansing
5.8 Data Transformation & Loading to Data Mart
5.8.1 Error Logging & Handling Mechanisms
5.8.2 Business Rule Violations
5.8.3 Data Rule Violations (Data Error)
5.8.4 Key Lookup Scenario
5.8.5 Data Partitioning
6.0 Metadata
6.1 Dataset Metadata
6.2 Data Element Metadata Definition
6.3 Data Source Metadata Definition
6.4 Organization Reference Metadata Table
6.5 Lookup / Reference Tables
7.0 Central Data Mart Architecture
7.1 Star Data Warehouse Schema
7.1.1 Key Advantages of the Star Schema for ENDS
7.1.2 Fact Tables in Star Schema Datamarts
7.1.3 Dimension Tables in Star Schema Datamarts
7.2 Star Schema Key Structure for ENDS Datamart
7.3 Types of Facts in Data Warehouse
7.3.1 Additive
7.3.2 Semi-Additive
7.3.3 Non-Additive
7.3.4 Factless Fact Table
7.4 Data Mart Architecture / Schema
7.4.1 Agricultural Export Schema
7.4.2 Land use Geo Mapping Datamart
7.4.3 ICT Connectivity Data Mart - Schema
7.4.4 Road Transport Vehicle Registration Datamart
7.4.5 National Hospital Out Patient Datamart
7.4.6 National Census - Citizen Datamart
7.4.7 National Student Assessment Datamart
7.5 Master Data Management
8.0 ENDS - Data Access with Service Oriented Architecture (SOA)
8.1 Web Services
8.1.1 SOAP - Simple Object Access Protocol
8.1.2 REST - Representational State Transfer
8.1.3 The Choice of ENDS
9.0 Organization and Management
9.1 Identification of Resources (Data Sets / Apps) and their Organization
9.2 Data Collector
9.3 ENDS Cell
9.4 Data Contributors
10.0 Servers and Software Infrastructure
10.1 Enterprise Technology Platform for ENDS
11.0 Controlled Vocabulary
12.0 Security & Information Technology Standards
12.1 Information / Data Security Standards
12.2 Meta Data Standards
13.0 Management of Legacy Systems within ENDS
14.0 Client Side Hardware and Software Platform
14.1 Client System Data Dependency

List of Figures

Fig 1 Management Performance Vs Data Availability
Fig 2 Simple Publish Extract Data Hub
Fig 3 Operational Data Store for Reporting
Fig 4 Master Data Management (MDM) Hub
Fig 5 Integration Data Hub
Fig 6 Master High Level Data Mart / Dataset Architecture
Fig 7 Proposed offline Decoupled FTP file based Extraction Schematic for ENDS
Fig 8 Proposed file Consistency check Schematic for ENDS
Fig 9 Multistage Data Transformation
Fig 10 An Example of Typical Star Schema Data Mart for Sales Data
Fig 11 Individual Data-Mart Schematic for ENDS
Fig 12 Agriculture Export Schema
Fig 13 Land use Geo Mapping Data Mart
Fig 14 ICT Connectivity Data Mart Schema
Fig 15 Road Transport Vehicle Registration Data Mart
Fig 16 National Hospital Out Patient Data Mart
Fig 17 National Census - Citizen Data Mart
Fig 18 National Student Assessment Data Mart
Fig 19 Service Oriented Architecture - Web Service Model
Fig 20 Proposed Organizational Structure of ENDS
Fig 21 ENDS - Server & Software Infrastructure
Fig 22 ENDS Controlled Vocabulary Services
Fig 23 ENDS - Integration with Legacy Systems

List of Tables

Table 1 Information / Data needs at various levels of Management
Table 2 Data to support the Operational & Business Service needs of the Government of Ethiopia
Table 3 Data to support Government E-Services / M-services
Table 4 Data to Support common Applications for the Government of Ethiopia
Table 5 Distribution of Data Commonality Index
Table 6 High Priority Sets of Data Elements
Table 7 Data Categories
Table 8 Example of FTP Mapping
Table 9 Flat File Consistency Check
Table 10 Data Cleansing Steps
Table 11 Data Set Meta Data
Table 12 Data Element Meta Data Definition
Table 13 Data Source Metadata Definition
Table 14 Organization Reference Meta Data
Table 15 Look up / Reference Tables
Table 16 Agriculture Export Schema
Table 17 Geo Mapping Data Mart Land use
Table 18 ICT Connectivity Data Mart Schema
Table 19 Road Transport Vehicle Registration Data Mart
Table 20 National Hospital Outpatient Data Mart
Table 21 National Census - Citizen Data Mart
Table 22 National Student Assessment Data Mart
Table 23 FTP Server
Table 24 FTP Backup Server
Table 25 Data Warehouse Server
Table 26 Staging Server
Table 27 Data Warehouse Back up / Fail over Server
Table 28 Web Server
Table 29 Fail over Web Server
Table 30 Enterprise Technology Platform for ENDS

Definitions

The definitions of the technical terms used in this document are given below. They
are expected to be used consistently in the planning, development, deployment and
management of the Data Architecture and the ENDS of the Government of Ethiopia.

Enterprise Architecture (EA): Though there is no single universal definition for
Enterprise Architecture, most researchers and practitioners would define EA as the
analysis and design of an Enterprise from its current to its future state from an
integrated strategy, business and technology perspective.

E-Government Interoperability Framework (e-GIF): The e-GIF gives a set of
policies, principles and standards that the Government of Ethiopia and its
institutions, organizations and departments need to follow and comply with in order
to achieve interoperability and integration of the technical systems in the
Government.

Enterprise: The Government of the Republic of Ethiopia

Organization: An organized entity within the Government of Ethiopia such as a
department, ministry or agency of the Government.

Actor: A person, organizational entity or a system within the Enterprise that has a
role to play in the development or management of the Enterprise Architecture.

Application: An IT system that supports a business function within the EA

Architecture Building Block (ABB): Any component part of the EA system with a
defined structure and functionality which, together with other ABBs, becomes the
basis of a solution.

Solution Building Block (SBB): A component part of a solution that, together with
other SBBs, can provide a business solution for the Enterprise.

Artifact: An architectural work product that describes an aspect of the architecture.

Stakeholder: An individual, team, or organization (or classes thereof) with interests
in, or concerns relative to, the outcome of the architecture. Different stakeholders
with different roles will have different concerns.

Repository: A system that manages all of the data of an enterprise, including data
and process models and other enterprise information. Hence, the data in a repository
is much more extensive than that in a data dictionary, which generally defines only
the data making up a database.

Data Warehouse: A data warehouse (DW or DWH), also known as an enterprise
data warehouse (EDW), is a system used for reporting and data analysis. DWs are
central repositories of integrated data from one or more disparate sources. They
store current and historical data and are used for creating trending reports for senior
management reporting, such as annual and quarterly comparisons.

Data Mart: A Data mart is the access layer of the data warehouse environment that
is used to get data out to the users. The Data mart is a subset of the data warehouse
that is usually oriented to a specific business line or team. Data marts are small
slices of the data warehouse. Whereas data warehouses have an enterprise-wide
depth, the information in Data marts pertains to a single department or domain.

Datasets: National Datasets define a standard set of information that is generated
from governmental data records, from any organization or system that captures the
base data. They are structured lists of individual data items, each with a clear label,
definition and set of permissible values, codes and classifications. From these,
secondary-use information is derived or compiled, which can then be used to
monitor and improve services.

ETL: The process of extracting data from source systems and bringing it into a
central aggregated data warehouse to be used for downstream reporting and
analytical purposes is commonly called ETL, which stands for Extraction-
Transformation-Loading.
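
The three ETL stages can be illustrated with a minimal Python sketch. It is not
the ENDS implementation: the source file contents, the column names and the
export_fact table are hypothetical, and an in-memory SQLite database stands in
for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract; column names and values are illustrative only.
SOURCE_CSV = """region,year,export_tonnes
Oromia,2014, 1200
Amhara,2014,950
"""

def extract(text):
    """Extract: read rows from a delimited source file."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace and convert each value to its target type."""
    return [
        (row["region"].strip(), int(row["year"]), float(row["export_tonnes"]))
        for row in rows
    ]

def load(conn, records):
    """Load: write the transformed records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS export_fact "
        "(region TEXT, year INTEGER, export_tonnes REAL)"
    )
    conn.executemany("INSERT INTO export_fact VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(conn, transform(extract(SOURCE_CSV)))
total = conn.execute("SELECT SUM(export_tonnes) FROM export_fact").fetchone()[0]
print(total)
```

Keeping the three stages as separate functions mirrors the definition above: each
stage can be tested or replaced without touching the other two.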

Metadata: Metadata comprises the control descriptors for the data and processes
underlying any Data Warehouse System, including ancillary components of the system
such as ETL, Reporting, Data marts and Datasets. It covers Reports, Cubes, Tables
(Records, Segments, Entities, etc.), Columns (Fields, Attributes, Data Elements,
etc.), Keys and Indices.

Staging Database: A Staging database is an intermediate storage area used for
data processing during the extract, transform and load (ETL) process. The data
staging area sits between the data source(s) and the data target(s), which are often
data warehouses, Data marts or other data repositories.

Controlled Vocabulary Services: In the context of ENDS, controlled vocabulary is
a method to ensure that everyone is using the same word to mean the same thing.
This consistency of terms is one of the most important concepts in knowledge
management, where effort is expended to use the same word throughout a
document or organization instead of slightly different ones to refer to the same thing.
Examples include the cities of Ethiopia and the ethnicities of Ethiopia.
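
As a rough illustration of the idea, the sketch below maps variant spellings of
a city name onto one canonical term. The vocabulary entries and the function
name normalize_city are assumptions made for this example only; they are not
taken from the ENDS controlled vocabulary itself.

```python
# Hypothetical controlled vocabulary: canonical city names and known variants.
CITY_VOCABULARY = {
    "Addis Ababa": {"addis abeba"},
    "Bahir Dar": {"bahar dar", "bahirdar"},
    "Mekelle": {"mekele", "makale"},
}

def normalize_city(raw):
    """Return the canonical term for a city name, or raise if it is unknown."""
    key = " ".join(raw.lower().split())  # lower-case and collapse whitespace
    for canonical, variants in CITY_VOCABULARY.items():
        if key == canonical.lower() or key in variants:
            return canonical
    raise ValueError(f"{raw!r} is not in the controlled vocabulary")

print(normalize_city("  addis   abeba "))
```

Rejecting unknown terms, rather than passing them through, is a design choice
made here so that new spellings are reviewed before they enter shared data.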

FTP: The File Transfer Protocol (FTP) is a standard network protocol used to
transfer computer files from one host to another host over a TCP-based network,
such as the Internet. FTP is built on a client-server architecture and uses separate
control and data connections between the client and the server. FTP users may
authenticate themselves using a clear-text sign-in protocol, normally in the form of a
username and password, but can connect anonymously if the server is configured to
allow it. For secure transmission that protects the username and password and
encrypts the content, FTP is often secured with SSL/TLS (FTPS) or replaced with
the SSH File Transfer Protocol (SFTP).

Business Rule: A Business rule is a rule that defines or constrains some aspect of
business and always resolves to either true or false. Business rules are intended to
assert business structure or to control or influence the behaviour of the business.
Business rules describe the operations, definitions and constraints that apply to an
organization. Business rules can apply to people, processes, corporate behaviour
and computing systems in an organization, and are put in place to help the
organization achieve its goals.
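
Because a business rule always resolves to true or false, it can be modelled as
a predicate over a record. The sketch below assumes a hypothetical vehicle
registration record; the field names and the two rules are illustrative and are
not drawn from the ENDS rule set.

```python
from datetime import date

# Each business rule is a predicate: it takes a record and returns True or False.
def owner_is_recorded(record):
    return bool(record.get("owner_id"))

def registered_on_or_before(record, today):
    return record["registration_date"] <= today

def violated_rules(record, today):
    """Return the names of the rules that resolve to False for this record."""
    checks = {
        "owner_is_recorded": owner_is_recorded(record),
        "registered_on_or_before_today": registered_on_or_before(record, today),
    }
    return [name for name, passed in checks.items() if not passed]

# Hypothetical record with a missing owner.
record = {"owner_id": "", "registration_date": date(2015, 3, 1)}
print(violated_rules(record, today=date(2015, 4, 30)))
```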

Data Cleansing: Data cleansing is the process by which data is treated for
aberrations and inconsistencies, e.g. the presence of numbers in text-only fields,
the absence of data in mandatory fields and the presence of unrelated data in any
field.
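
Two of the aberrations named above, digits in text-only fields and missing
mandatory values, can be detected with a short check such as the following
sketch. The field names and the split into text-only and mandatory fields are
assumptions made for illustration.

```python
import re

# Hypothetical field classification for a citizen record.
TEXT_ONLY_FIELDS = {"first_name", "city"}
MANDATORY_FIELDS = {"first_name", "national_id"}

def find_data_errors(record):
    """Return a list of data errors found in one record."""
    errors = []
    for field in sorted(MANDATORY_FIELDS):
        value = record.get(field) or ""
        if not str(value).strip():
            errors.append(f"{field}: mandatory value missing")
    for field in sorted(TEXT_ONLY_FIELDS):
        value = record.get(field) or ""
        if re.search(r"\d", str(value)):
            errors.append(f"{field}: digits in text-only field")
    return errors

record = {"first_name": "Abebe7", "city": "Adama", "national_id": ""}
print(find_data_errors(record))
```

Records with a non-empty error list would typically be held back for correction
rather than loaded; how they are routed is outside the scope of this sketch.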

Master Data: Master data is a single source of basic business data used across
multiple systems, applications, and/or processes.

Transaction data: Transaction data describes an event (the change resulting from a
transaction) and is usually described with verbs. Transaction data always has a time
dimension and a numerical value, and refers to one or more objects (i.e. the
reference data).

Reference data: Reference data defines the set of permissible values to be used by
other data fields. Reference data gains in value when it is widely re-used and widely
referenced. Typically, its definition changes little (apart from occasional
revisions). Reference data is often defined by standards organizations (such as
the country codes defined in ISO 3166-1).
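
The role of reference data as a set of permissible values can be sketched as a
lookup that other fields are validated against. The example uses a small subset
of ISO 3166-1 alpha-2 country codes; the record layout and function name are
hypothetical.

```python
# Small illustrative subset of ISO 3166-1 alpha-2 codes (reference data).
COUNTRY_CODES = {
    "ET": "Ethiopia",
    "DJ": "Djibouti",
    "KE": "Kenya",
    "IN": "India",
}

def has_valid_country(record):
    """True if the record's country_code is one of the permissible values."""
    return str(record.get("country_code", "")).upper() in COUNTRY_CODES

# Hypothetical transaction record that refers to the reference data.
shipment = {"date": "2015-04-30", "tonnes": 12.5, "country_code": "dj"}
print(has_valid_country(shipment))
```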

1.0 Introduction

The primary objective of the Ethiopian National Datasets Master Plan project
is to prepare a comprehensive master plan for the development of the National
Common Data Set for all present and potential Ministries and agencies of the
Government. The plan is intended to improve the accessibility, openness and
integration of e-services and applications across Ministries, Departments and
Agencies, and to reduce the dependency of applications and channels. In addition,
the master plan is expected to support all business and database security
requirements of the Ministries and inter-ministerial applications and to enhance
interoperability within the Government of Ethiopia enterprise.

As part of the Ethiopian National Datasets Master Plan project, this
document contains the design parameters, technical specifications, and management
and governance systems for the proposed Ethiopian National Datasets (ENDS).
Conceptually, the ENDS consists, inter alia, of a national Data Hub, a system for
data inflow and outflow, databases, data marts and a data warehouse, as well as
technology for data management, transformation and presentation to meet the
present and potential business needs of the Government. The ENDS design presented
here covers all of these component parts. The information collected and presented
in the Consultant's previous report, the Situational Analysis Report (a baseline
study and survey of the existing situation in the Government of Ethiopia), was
analysed in detail to evaluate the various options and to optimize the overall
design of the ENDS and its associated systems.

1.1 Applicability and Use

The document is expected to be used as a blueprint for the development and
deployment of the ENDS and, as such, should be treated as a reference technical
document during both the planning and the implementation of the ENDS. The
document can also serve as useful technical reference material during the ongoing
management and operation of the ENDS.

2.0 Business Information Requirements

The business of any Government, as that of any ongoing organizational entity,
is of two main kinds: strategic and long-term decision making, which involves goal
setting, long-term planning, resource allocation, and monitoring and evaluation; and
operational management and public service delivery. These business processes of
the Government, and indeed all its decision-making processes, depend on the
availability of the right information at the right time and in the right place.
International empirical research in decision sciences and information management
indicates that the quality of management and decision making increases in direct
proportion to the availability of the required information. Evidently, with better
information and data availability, the uncertainty associated with decision making
decreases. The same research also indicates that the parameters of information
needs vary across the levels of an organization and with the nature of the
management tasks (refer to Figure 1 and Table 1 below).

Fig 1: Management Performance Vs Data Availability
(Figure not reproduced. Axis labels: Management Performance, Information
Availability. Annotation: Point of Optimality.)

Table 1 Information / Data needs at various levels of Management

S. No  Management Level   Volume of Information Required  Aggregation Required  Frequency  Currency
1      Top Management     Low                             High                  Low        Low
2      Middle Management  Medium                          Medium                Medium     Medium
3      Operational Level  High                            Low                   High       High

To evaluate the information and data needs that support the organizational
business, from strategic to operational, the present and potential business
imperatives of each component part of the Enterprise need to be considered. This
logical, hierarchical approach, as laid down in the TOGAF framework and followed
herein, demands that the Target Business Needs for information and data be
identified, along with the gaps that currently exist between what is available now
and what the target architecture would demand.

2.1 Business Principles and Policies

The business principles and policies are the guiding rules and concepts that
apply to the business architecture domain. They mainly relate to the development
and deployment of business strategy, business processes and organization, which in
turn influence the application and data architecture. Currently, the various
ministries and agencies of the Government of Ethiopia (GOE) have each enunciated
certain business principles and policies that guide their management operations.
However, enterprise-wide business principles and policies are not evident. In order
to develop uniformity and consistency across the Enterprise, the following target
business principles / policies will apply. These principles and policies will
directly impact the design, implementation, management and governance of the
proposed ENDS.

Business Principle / Policy 1: Primacy of Optimal Information Management

Statement: The principles and policies of information management apply to all
organizations within the Government of Ethiopia Enterprise.

Rationale: The only way to provide a consistent and measurable level of quality
information / data to decision-makers is for all organizations of the Government of
Ethiopia to abide by this principle.

Implications: Without these principles and policies, exclusions, favouritism, and
inconsistency would rapidly undermine optimal management of information in the
Government of Ethiopia.

Business Principle / Policy 2: Maximize Benefit to the Government of Ethiopia

Statement: Information management decisions are made to provide maximum benefit
to the Government of Ethiopia as a whole.

Rationale: This principle embodies "service above self". Decisions made from a
GOE-wide perspective have greater long-term value than decisions made from any
particular organizational perspective. Maximum return on investment requires
information management decisions to adhere to GOE-wide drivers and priorities.

Implications: Achieving maximum enterprise-wide benefit will require changes in the
way we plan and manage information. Some organizations within the GOE may have to
concede their own preferences for the greater benefit of the entire GOE. Application
development priorities must be established by the GOE for the entire enterprise.
Application components should be shared across organizational boundaries.
Information management initiatives should be conducted in accordance with the GOE
plan. Individual organizations should pursue information management initiatives
that conform to the blueprints and priorities established by the GOE.

Business Principle / Policy 3: Information Management is Everybody's Business

Statement: All organizations in the GOE participate in the information management
decisions needed to accomplish the short-term and long-term objectives of the GOE.

Rationale: In order to ensure that information management is aligned with the
business of the GOE, all organizations in the GOE must be involved in all aspects of
the information environment. The business experts from across the GOE and the
technical staff responsible for developing and sustaining the information
environment need to come together as a team to jointly define the goals and
objectives of IT.

Implications: To operate as a team, every stakeholder will need to accept
responsibility for developing the information environment. Commitment of resources
will be required by each stakeholder to implement this principle.

Business Principle / Policy 4: Common Use Applications

Statement: Development of applications used across the GOE is preferred over the
development of similar or duplicative applications which are only provided to a
particular organization within the GOE.

Rationale: Duplicative capability is expensive and proliferates conflicting data.

Implications: Organizations within the GOE will not be allowed to develop
capabilities for their own use which are similar to, or duplicative of, GOE-wide
capabilities. In this way, expenditures of scarce resources to develop essentially
the same capability in marginally different ways will be reduced. Organizations
within the GOE which depend on a capability that does not serve the entire GOE must
change over to the replacement enterprise-wide capability.

Business Principle / Policy 5: Business Continuity

Statement: GOE operations are maintained in spite of system interruptions.

Rationale: As E-Government system operations become more pervasive in Ethiopia, we
become more dependent on them; therefore, we must consider the reliability of such
systems throughout their design and use. Business premises throughout the GOE must
be provided with the capability to continue their business functions regardless of
external events. Hardware failure, natural disasters, and data corruption should not
be allowed to disrupt or stop GOE activities. The GOE business functions must be
capable of operating on alternative information delivery mechanisms.

Implications: Risks of business interruption must be established in advance and
managed. Management includes, but is not limited to, periodic reviews, testing for
vulnerability and exposure, and designing mission-critical services to ensure
business function continuity through redundant or alternative capabilities.
Recoverability, redundancy, and maintainability should be addressed at the time of
design. Applications must be assessed for criticality and impact on the GOE mission,
in order to determine what level of continuity is required and what corresponding
recovery plan is necessary.

Business Principle / Policy 6: Service Orientation

Statement: The architecture is based on a design of services aligned to the GOE
activities comprising the organization or inter-organization business processes.

Rationale: Service orientation delivers enterprise agility, reduction of costs and
customer satisfaction.

Implications: Service representation utilizes business descriptions to provide
context (i.e., business process, goal, rule, policy, service interface, and service
component) and implements services using service orchestration. Service orientation
places unique requirements on the infrastructure, and implementations should use
open standards to realize interoperability and location transparency.

Business Principle / Policy 7: Compliance with Law

Statement: GOE information management processes comply with all relevant laws,
policies, and regulations.

Rationale: GOE policy is to abide by laws, policies, and regulations. This will not
preclude business process improvements that lead to changes in policies and
regulations.

Implications: All organizations of the GOE must be mindful to comply with laws,
regulations, and external policies regarding the collection, retention, and
management of data. Changes in the law and changes in regulations may drive changes
in our processes or applications.

Business Principle / Policy 8: Information Technology Responsibility

Statement: The IT organization in each ministry and agency of the GOE is responsible
for owning and implementing IT processes and infrastructure that enable solutions to
meet user-defined requirements for functionality, service levels, cost, and delivery
timing.

Rationale: Effectively align expectations with capabilities and costs so that all
projects are cost-effective. Efficient and effective solutions have reasonable costs
and clear benefits.

Implications: In each organization of the GOE a process must be created to
prioritize projects. The IT function must define processes to manage business unit
expectations. Data, application, and technology models must be created to enable
integrated quality solutions and to maximize results.

Business Principle / Policy 9: Protection of Intellectual Property

Statement: The Intellectual Property (IP) of the GOE must be protected. This
protection must be reflected in the IT architecture, implementation, and governance
processes.

Rationale: The major part of the GOE's IP will be hosted in the IT domain.

Implications: While protection of IP assets is everybody's business, much of the
actual protection is implemented in the IT domain. A security policy, governing
human and IT actors, will be required that can substantially improve protection of
IP. This must be capable of both avoiding compromises and reducing liabilities.

3.0 Information Systems Architecture

The organizations of the Government of Ethiopia Enterprise (the various
Ministries, Agencies and Authorities, Federal Courts, Regional Departments and
Organizations, and city administrations) have each, over the years, developed and
deployed information systems of varying complexity and maturity to support their
business processes and provide public services. The existing state of the
information systems of the Government, already documented in detail in the
Consultant's Situational Analysis Report, shows a low level of maturity in the
enterprise, a low level of automation, and fragmented systems. There is an
appreciable degree of data sharing and document exchange between the agencies and
organizations of the Government. However, most of it is at present undertaken in an
offline environment, in batch mode. With the level of automation currently at about
20 percent, nearly 80 percent of the enterprise's roughly 1100 business processes
are still undertaken in the traditional manner.

Assuming a target level of 90 percent automation in the enterprise in the next
10 years, the Application and Solution Gap between the applications currently
available within the existing Information Systems and those that should exist at the
targeted level of automation is large. For the purpose of designing and
developing the ENDS, therefore, it is logical to consider the targeted Information
Systems Architecture and its associated data needs.
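
The size of this gap can be estimated from the figures quoted above. The short
calculation below is a sketch only: it treats the approximate count of 1100
business processes as exact, and the even spread over 10 years is an assumption
made purely for illustration.

```python
TOTAL_PROCESSES = 1100       # "some 1100 business processes" (approximate)
CURRENT_AUTOMATION_PCT = 20  # present level of automation
TARGET_AUTOMATION_PCT = 90   # target level in the next 10 years
YEARS = 10

automated_now = TOTAL_PROCESSES * CURRENT_AUTOMATION_PCT // 100    # 220
automated_target = TOTAL_PROCESSES * TARGET_AUTOMATION_PCT // 100  # 990
gap = automated_target - automated_now                             # 770

# Assumption: the gap is closed evenly across the 10-year horizon.
per_year = gap / YEARS                                             # 77.0

print(automated_now, automated_target, gap, per_year)
```

On these figures, roughly 770 additional business processes would need
application support to reach the target.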

3.1 Application and Data Principles and Policies

Application and Data principles / policies are the rules and concepts for
managing the information resources of the Government of Ethiopia. These principles
would provide guidelines for development of the data domain in the Enterprise
Architecture and ENDS Governance guidelines.

Data Principle / Policy 1: Data is an Asset; it must be shared and accessible

Statement: Data is an asset that has value to the GOE. It should be accessible to,
and shared with, all organizations in the GOE.

Rationale: Data is a valuable resource for the GOE. It has real, measurable value
and is the basis for decision making, so it must be carefully managed to ensure that
one knows where it is, can rely upon its accuracy, and can obtain it when and where
needed.

Implications: There is an education task to ensure that all organizations within the
GOE understand the relationship between the value of data, the sharing of data, and
accessibility to data.

Data Principle / Policy 2: Data Trustee

Statement: Each data element within the GOE has a trustee accountable for data
quality.

Rationale: One of the benefits of an architected environment is the ability to share
data (e.g., text, video, sound, etc.) across the enterprise. As the degree of data
sharing grows and business units rely upon common information, it becomes essential
that only the data trustee makes decisions about the content of data. Since data can
lose its integrity when it is entered multiple times, the data trustee will have
sole responsibility for data entry, which eliminates redundant human effort and
data storage resources.

Implications: As a result of sharing data across the GOE, the trustee is accountable
and responsible for the accuracy and currency of their designated data element(s)
and, subsequently, must recognize the importance of this trusteeship
responsibility.

Data Principle / Policy 3: Common Vocabulary and Data Definitions

Statement: Data is defined consistently throughout the GOE, and the definitions are
understandable and available to all users.

Rationale: The data that will be used in the development of applications must have a
common definition throughout the GOE (in all its organizations) to enable sharing of
data. A common vocabulary will facilitate communications and enable dialog to be
effective. In addition, it is required to interface systems and exchange data.

Implications: The GOE must establish the initial common vocabulary for the business
in all its organizations. The definitions will be used uniformly throughout the GOE.
Whenever a new data definition is required, the definition effort will be
co-ordinated and reconciled with the corporate "glossary" of data descriptions. The
GOE data administrator will provide this co-ordination.


Data Principle / Policy 4

Principle or Policy: Data Security
Statement: Data is protected from unauthorized use and disclosure at all times.
Rationale: Open sharing of information and the release of information via relevant legislation must be balanced against the need to restrict the availability of classified, proprietary, and sensitive information. Existing laws and regulations require the safeguarding of national security and the privacy of data, while permitting free and open access.
Implications: Data security safeguards will need to be put in place throughout the GOE to restrict access to "view only" or "never see". Sensitivity labelling for access to pre-decisional, decisional, classified, sensitive, or proprietary information must be determined. Security must be designed into data elements from the beginning; it cannot be added later. Systems, data, and technologies must be protected from unauthorized access and manipulation. Data/information must be safeguarded against inadvertent or unauthorized alteration, sabotage, disaster, or disclosure.
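The access restrictions and sensitivity labels named in this principle can be illustrated with a small sketch. It is only an illustration under stated assumptions: the role names, the policy table and the `access_for` helper are hypothetical and are not defined anywhere in this document. Only the two access levels ("view only", "never see") and the five sensitivity labels come from the text above. The sketch defaults to "never see" whenever no rule grants access, which is one way to read "security must be designed in from the beginning".

```python
from enum import Enum


class Access(Enum):
    """The two access restrictions named in the Data Security principle."""
    NEVER_SEE = "never see"
    VIEW_ONLY = "view only"


class Sensitivity(Enum):
    """Sensitivity labels listed in the Data Security principle."""
    PRE_DECISIONAL = "pre-decisional"
    DECISIONAL = "decisional"
    CLASSIFIED = "classified"
    SENSITIVE = "sensitive"
    PROPRIETARY = "proprietary"


# Hypothetical policy table: (role, label) -> granted access.
# The role names and the grants are assumptions made for this example only.
POLICY = {
    ("trustee_agency", Sensitivity.SENSITIVE): Access.VIEW_ONLY,
    ("trustee_agency", Sensitivity.PRE_DECISIONAL): Access.VIEW_ONLY,
    ("other_agency", Sensitivity.DECISIONAL): Access.VIEW_ONLY,
}


def access_for(role: str, label: Sensitivity) -> Access:
    """Return the access level for a role and label; deny when no rule exists."""
    return POLICY.get((role, label), Access.NEVER_SEE)


if __name__ == "__main__":
    print(access_for("other_agency", Sensitivity.DECISIONAL).value)
    print(access_for("other_agency", Sensitivity.CLASSIFIED).value)
```

Keeping the rules in a single table, with denial as the default, means a data element that has not yet been labelled for a given role is never exposed by accident.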

Application Principle / Policy 1

Principle or Policy: Technology Independence
Statement: Applications to be developed and used throughout the GOE are independent of specific technology choices and therefore can operate on a variety of technology platforms.
Rationale: Independence of applications from the underlying technology allows applications to be developed, upgraded, and operated in the most cost-effective and timely way. Otherwise technology, which is subject to continual obsolescence and vendor dependence, becomes the driver rather than the user requirements themselves. The intent of this principle is to ensure that Application Software is not dependent on specific hardware and operating systems software.
Implications: This principle will require standards which support portability. Subsystem interfaces will need to be developed to enable legacy applications to interoperate with applications and operating environments developed under the GOE architecture. Middleware should be used to decouple applications from specific software solutions.

Application Principle / Policy 2

Principle or Policy: Ease-of-Use
Statement: Applications developed and deployed in the GOE are user friendly and easy to use. The underlying technology is transparent to users, so they can concentrate on tasks at hand.
Rationale: Ease-of-use is a positive incentive for use of applications. It encourages users to work within the integrated information environment instead of developing isolated systems to accomplish the task outside of the enterprise's integrated information environment.
Implications: Applications developed and deployed in the GOE will be required to have a common "look-and-feel". Hence, the common look-and-feel standard must be designed and usability test criteria must be developed. Guidelines for user interfaces should not be constrained by narrow assumptions about user location, language, systems training, or physical capability. Factors such as linguistics, customer physical infirmities (visual acuity, ability to use keyboard/mouse), and proficiency in the use of technology have broad ramifications in determining the ease-of-use of an application.

3.2 Applications, Services and Data Matrix

As a part of the project, the software applications existing in the Enterprise
and the associated flow of data within each of them have been studied and mapped.
A summary of the data elements associated with the existing applications of the
Enterprise, as well as existing and planned e-services, is given in Table 2 below.
The Table also captures the data elements that would be needed to support the new
applications and services that would need to be developed and deployed to meet the
needs of the target Architecture.

3.2.1 Index of Data Commonality


To identify the data elements that are shared across the enterprise, an
index of commonality is assigned to each data element. Data elements that
are generated and used within a single agency, and are neither shared with any
other organization within the enterprise nor collected and used elsewhere in the
Enterprise, are assigned the lowest index of zero. Conversely, data elements that
are generated, used and shared extensively within the enterprise are
assigned the highest Index of Commonality of 5 (five). In addition, each Index of
Commonality is multiplied by a weightage from 1 to 2, depending on the
business importance of the dataset. For instance, a data element which is widely
shared and used across the Enterprise and is strategically important from a business
point of view will have a weighted Index of Commonality of 10 (5 x 2).
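
The weighting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the ENDS design: the `DataElement` class and its field names are hypothetical, and the range checks simply encode the stated limits (index from 0 to 5, weightage from 1 to 2). The three sample entries reuse the codes and values of rows AGH01, CSA1 and CSA6 in Table 2.

```python
from dataclasses import dataclass

# Limits stated in the text: index of commonality 0..5, weightage 1..2.
MIN_INDEX, MAX_INDEX = 0.0, 5.0
MIN_WEIGHT, MAX_WEIGHT = 1.0, 2.0


@dataclass(frozen=True)
class DataElement:
    """Hypothetical record for one dataset entry (names are illustrative)."""
    code: str
    index_of_commonality: float
    weightage: float

    def weighted_index(self) -> float:
        """Weighted index = index of commonality x weightage."""
        if not MIN_INDEX <= self.index_of_commonality <= MAX_INDEX:
            raise ValueError(f"{self.code}: index of commonality out of range")
        if not MIN_WEIGHT <= self.weightage <= MAX_WEIGHT:
            raise ValueError(f"{self.code}: weightage out of range")
        return self.index_of_commonality * self.weightage


# Sample values taken from Table 2 (AGH01, CSA1, CSA6).
elements = [
    DataElement("AGH01", 3, 1),
    DataElement("CSA1", 4, 2),
    DataElement("CSA6", 3, 1.5),
]

# Rank datasets from most to least widely shared and business critical.
ranked = sorted(elements, key=lambda e: e.weighted_index(), reverse=True)

if __name__ == "__main__":
    for element in ranked:
        print(element.code, element.weighted_index())
```

Sorting by the weighted index gives a simple priority order for which datasets are the strongest candidates for sharing across agencies.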


Table 2: Data to Support the Operational & Business service needs of the Government of Ethiopia
S.No | Agency Code | Source: Ministry/Agency Name | Data Code | Data Description | Index of Commonality | Weightage | Weighted Index
1 | AGH | Agency for Government Housing | AGH01 | Government Housing Dataset | 3 | 1 | 3
2 | ATA | Agriculture Transformation Agency | ATA1 | Agriculture supply and distribution dataset | 3 | 2 | 6
3 | ATA | Agriculture Transformation Agency | ATA2 | Agriculture Marketing Dataset | 3 | 2 | 6
4 | CSA | Central Statistical Agency | CSA1 | Population Census Dataset | 4 | 2 | 8
5 | CSA | Central Statistical Agency | CSA2 | Economy Activities Dataset | 4 | 2 | 8
6 | CSA | Central Statistical Agency | CSA3 | Agriculture Census Data | 3 | 2 | 6
7 | CSA | Central Statistical Agency | CSA4 | Consumer and Producer Price Index | 4 | 2 | 8
8 | CSA | Central Statistical Agency | CSA5 | Household Income, Consumption and Expenditure | 4 | 2 | 8
9 | CSA | Central Statistical Agency | CSA6 | Housing Census Data | 3 | 1.5 | 4.5
10 | CSA | Central Statistical Agency | CSA7 | Government Fixed Assets Register | 3 | 1 | 4
11 | DARO | Document Authentication and Registration Office | DARO1 | Document Repository - Agreements on immovable Property and lease rights | 3 | 1 | 3

12 | DARO | Document Authentication and Registration Office | DARO2 | Document Repository - Company Registration, Articles of Association. | 3 | 2 | 6
13 | DARO | Document Authentication and Registration Office | DARO3 | Document Repository - Agreements on withholding and sale of vehicles | 3 | 1 | 3
14 | DARO | Document Authentication and Registration Office | DARO4 | Document Repository - Power of Attorney | 3 | 1 | 3
15 | EIAR | Ethiopian Agriculture Research Institute | EIAR1 | Document Repository - Agriculture Research Documents. | 2 | 1 | 2
16 | ECXA | Ethiopian Commodity Exchange Authority | ECXA1 | Documents: Recognition Grants Of Clearing Institutions | 2 | 1 | 2
17 | ECXA | Ethiopian Commodity Exchange Authority | ECXA2 | Register and licensing of Independent Auditors | 2 | 1 | 2
18 | ECXA | Ethiopian Commodity Exchange Authority | ECXA3 | Register and licensing of Investment Advisors | 2 | 1 | 2
19 | ECXA | Ethiopian Commodity Exchange Authority | ECXA4 | Register and Recognition Grant Documents of Exchange Actors | 2 | 1 | 2
20 | ECXA | Ethiopian Commodity Exchange Authority | ECXA5 | Register and licensing of Legal Practitioners. | 2 | 1 | 2

21 | MOEF | Ministry of Environment and Forests | MOEF1 | Data on land areas under Forest Cover, GIS data on watershed areas. | 3 | 2 | 6
22 | MOEF | Ministry of Environment and Forests | MOEF2 | Documents: Import Permits for industrial Chemicals | 2 | 1 | 2
23 | MOEF | Ministry of Environment and Forests | MOEF3 | Documents: Certificates-Modified Organism Free Export of Plants | 2 | 1 | 2
24 | MOEF | Ministry of Environment and Forests | MOEF3 | Permits for destruction of hazardous waste | 2 | 1 | 2
25 | EIA | Ethiopian Investment Agency | EIA1 | Investment Dataset - investment related-sectors, licenses issued names, date issued, investment value etc. | 3 | 2 | 6
26 | EIA | Ethiopian Investment Agency | EIA2 | Document Repository: Investment Licenses. | 3 | 2 | 6
27 | ERA | Ethiopian Roads Authority | ERA1 | Ethiopian Roads Dataset | 3 | 2 | 6
28 | ERCA | Ethiopian Revenue & Customs Authority | ERCA1 | Registered Tax Payers and Payments covers individual and business Dataset | 4 | 2 | 8

29 | ERCA | Ethiopian Revenue & Customs Authority | ERCA2 | Document Repository: Taxation Policies, Rules, Procedures and Templates | 4 | 2 | 8
30 | ERCA | Ethiopian Revenue & Customs Authority | ERCA3 | Bonded Store Permits, Tax Exemption Register and Custom clearance certificates. | 3 | 1 | 3
31 | FEACC | Federal Ethics & Anti Corruption Commission (FEACC) | FEACC1 | Corruption allegations, registered cases and prosecution data. | 2 | 2 | 4
32 | FEACC | Federal Ethics & Anti Corruption Commission (FEACC) | FEACC2 | Document Repository: Ethics policies and rules | 2 | 2 | 4
33 | GCAO | Government Communication Affairs Office | GCAO1 | Communication Event Organizer profiles, Events and Events Management Data | 2 | 1 | 2
34 | GCAO | Government Communication Affairs Office | GCAO2 | Local and Foreign Media Register and Profile Data | 2 | 1 | 2
35 | GCAO | Government Communication Affairs Office | GCAO3 | Communication policies and guidelines, rules and procedures. | 2 | 2 | 4

36 | GCAO | Government Communication Affairs Office | GCAO4 | Intra-government communication data including news, development results and achievements. | 3 | 2 | 6
37 | MCIT | Ministry of Communication and Information Technology | MCIT1 | ICT Awareness and Promotion related data. | 3 | 2 | 6
38 | MCIT | Ministry of Communication and Information Technology | MCIT2 | ICT, E-Governance related policies, standards and guidelines | 3 | 2 | 6
39 | MOA | Ministry of Agriculture | MOA1 | Agriculture Land Data includes Geographical Position Of Agricultural Lands, Topography Landscape Soil type and productivity. | 4 | 2 | 8
40 | MOA | Ministry of Agriculture | MOA2 | Baselines and targets data for Key Performance Indicators Natural Resources, Livestock, Agricultural Production, budgets, Disaster Management | 3 | 2 | 6
41 | MOA | Ministry of Agriculture | MOA3 | Agriculture Sector Investment and Investor Data | 4 | 2 | 8

42 | MOA | Ministry of Agriculture | MOA3 | Agriculture products production, market, prices, trade and consumption data | 4 | 2 | 8
43 | MOA | Ministry of Agriculture | MOA4 | Agriculture inputs production, prices, market, trade and consumption data | 4 | 2 | 8
44 | MOA | Ministry of Agriculture | MOA5 | Document Repository: Guidelines for new farmer trainees, Atlas data, Agri-land use manuals, GIS maps | 3 | 1.5 | 4.5
45 | MOCS | Ministry of Civil Service | MOCS1 | Data on the performance of government staff | 3 | 1.5 | 4.5
46 | MOCS | Ministry of Civil Service | MOCS2 | Data on the cases against staff and appeals by civil service staff with documentary evidence. | 3 | 2 | 6
47 | MOCS | Ministry of Civil Service | MOCS3 | Document Repository of civil service code, grades, salary scales, job description, rules, regulations and guidelines. | 3 | 2 | 6
48 | MOCS | Ministry of Civil Service | MOCS4 | Civil service recruitment data including appointment letters and all recruitment related documents. | 4 | 2 | 8

49 | MOCS | Ministry of Civil Service | MOCS5 | Repository of all documents given to civil service staff covering the full HRM life cycle (includes recruitment, promotion, performance, termination, retirement etc.) | 3 | 2 | 6
50 | MOCS | Ministry of Civil Service | MOCS6 | Documents Of Civil Service Reform; Documents Of Tools Of Institutional Transformation | 2 | 1 | 2
51 | MOCT | Ministry of Culture and Tourism | MOCT1 | International Tourist visitors dataset | 3 | 2 | 6
52 | MOCT | Ministry of Culture and Tourism | MOCT2 | Dataset: Local cultural habits and values | 3 | 1.5 | 4.5
53 | MOCT | Ministry of Culture and Tourism | MOCT3 | Tourist Locations Dataset | 2 | 2 | 4
54 | MOCT | Ministry of Culture and Tourism | MOCT3 | Data on cultural events, trade fairs exhibitions and related programmes | 2 | 1 | 2

55 | MOCT | Ministry of Culture and Tourism | MOCT4 | Data and Documents: Competence License For Multi Regional Cultural Institutions | 2 | 1 | 2
56 | MOE | Ministry of Education | MOE1 | National Student Dataset including enrolment and attendance and scholarship data | 4 | 2 | 8
57 | MOE | Ministry of Education | MOE2 | National Teacher Dataset including data on qualifications and competency certificates issued | 3 | 1.5 | 4.5
58 | MOE | Ministry of Education | MOE3 | Data on Educational Institutions data on subject curriculum, courses, capacity, location etc. | 3 | 2 | 6
59 | MOE | Ministry of Education | MOE4 | Data on National Examination Results. | 3 | 2 | 6
60 | MOE | Ministry of Education | MOE5 | Documents: Certificates of Competence, Examination Diplomas | 3 | 1.5 | 4.5
61 | MOFA | Ministry of Federal Affairs | MOFA1 | Regional Social and Economic Development Dataset | 4 | 2 | 8

62 | MOFA | Ministry of Federal Affairs | MOFA1 | Listing and registration and profile of faith and religious institutions | 3 | 1.5 | 4.5
63 | MOFED | Ministry of Finance and Economic Development | MOFED1 | National Budget, cash flow, disbursement and related Data | 4.5 | 2 | 9
63 | MOFED | Ministry of Finance and Economic Development | MOFED2 | Civil Service Payroll Data | 4 | 2 | 8
64 | MOFED | Ministry of Finance and Economic Development | MOFED3 | National Economic and Social Development data including plans, projects, programmes, indicators and achievements. | 4 | 2 | 8
65 | MOFED | Ministry of Finance and Economic Development | MOFED4 | National Population Dataset including forecasts | 4.5 | 2 | 9
66 | MOFED | Ministry of Finance and Economic Development | MOFED5 | List and Profiles of Public Budgetary Organizations, Budget Requests, Budget Ceilings, Budget Utilization Performance | 4.5 | 2 | 9
67 | MOFED | Ministry of Finance and Economic Development | MOFED6 | List And Profiles Of Internal And External Auditors and Audit Reports | 4 | 2 | 8
68 | MOFED | Ministry of Finance and Economic Development | MOFED7 | National Accounts, Treasury, Debt and related data. | 3 | 1.5 | 4.5

69 | MOFED | Ministry of Finance and Economic Development | MOFED8 | Evaluation and Approval Of Assistance of Borrowing Projects, Bids Documents of Consultants and approval of procurement. | 3 | 1.5 | 4.5
70 | MOFED | Ministry of Finance and Economic Development | MOFED9 | Data and documents related to cooperation agreements | 3 | 1.5 | 4.5
71 | MOFED | Ministry of Finance and Economic Development | MOFED9 | Data on accounts of ongoing and completed projects | 4.5 | 2 | 9
72 | MOFED | Ministry of Finance and Economic Development | MOFED10 | Data on Federal Budget Assistance to Regions and Region to Woreda (cascaded Budget) | 4.5 | 2 | 9
73 | MOFED | Ministry of Finance and Economic Development | MOFED11 | Registration and Analysis of Bank Statements and Budget Ceiling For Federal Payments | 4.5 | 2 | 9
74 | MOFED | Ministry of Finance and Economic Development | MOFED12 | Transferring Sample Signature Of Government Officials To International Financial Institutions | 3 | 2 | 6
75 | MOFRA | Ministry of Foreign Affairs | MOFRA1 | Ethiopian Diaspora Dataset | 3 | 2 | 6

76 | MOFRA | Ministry of Foreign Affairs | MOFRA2 | Data and Documents related to Passports including applications, status and related information. | 2 | 1 | 2
77 | MOFRA | Ministry of Foreign Affairs | MOFRA3 | Documents and data on the Ethiopia’s relations with UN, bilateral agreements and projects and programmes with EU, USA, Asia, Africa and other international entities. | 2 | 1 | 2
77 | MOFRA | Ministry of Foreign Affairs | MOFRA4 | Data regarding developments in Ethiopia to be disseminated to Ethiopians, the Diaspora and friends of Ethiopia residing abroad including database of Ethiopians living abroad, persons of Ethiopian origin and friends of Ethiopia, participation of the Ethiopian Diaspora in investment, tourism and trade, as well as technology transfer and plans and performance reports on annual Diaspora Day celebrations | 2 | 1 | 2
78 | MOFRA | Ministry of Foreign Affairs | MOFRA4 | Repository of Authenticated Documents | 2 | 1 | 2

79 | MOH | Ministry of Health | MOH1 | National Health Dataset, including data on health indicators among various population segments, life expectancy, child mortality etc. Public health preparedness indicators. | 4 | 2 | 8
80 | MOH | Ministry of Health | MOH2 | Data on communicable and non communicable diseases including their prevalence including data on projects and programmes on prevention and cure. | 4 | 2 | 8
81 | MOH | Ministry of Health | MOH3 | Data on medical services providers, coverage, projects and programmes and medical service availability and quality indicators. | 3 | 2 | 6
82 | MOH | Ministry of Health | MOH4 | Medical equipment and Pharma dataset including availability, prices, quality control and related data. | 2 | 2 | 4
83 | MOH | Ministry of Health | MOH5 | Health Insurance and Public Health funding data, including community health insurance availability. | 3 | 2 | 6

84 | MOI | Ministry of Industry | MOI1 | Industry Dataset: Data on Industrial Investment, value, sector and geographical distribution. List of investors, historical growth, installed capacity and growth. Key performance indicators. | 4 | 2 | 8
85 | MOI | Ministry of Industry | MOI2 | Document Repository: Industry development policy, plans and programmes, incentives, survey reports, industry sector profiles. | 4 | 2 | 8
86 | MOJ | Ministry of Justice | MOJ1 | Data and List and profiles of legal advocates: Name, Address, Level of Education, and work experiences, License number, date of issuance, etc. and repository of licenses issued. | 2 | 2 | 4
87 | MOJ | Ministry of Justice | MOJ2 | Dataset: Cases investigated and prosecuted. | 3 | 2 | 6
88 | MOJ | Ministry of Justice | MOJ3 | Document Repository: Civil and Criminal Laws, court decisions, legal drafts. | 3 | 2 | 6

89 | MOJ | Ministry of Justice | MOJ4 | Data related to the cases between Federal Government offices and public enterprises. | 2 | 1 | 2
90 | MOJ | Ministry of Justice | MOJ5 | Data on law suits instituted for and on behalf of the Government. | 3 | 1.5 | 4.5
91 | MOJ | Ministry of Justice | MOJ6 | Data on legal training provided and related document repository | 2 | 1.5 | 3
92 | MOJ | Ministry of Justice | MOJ7 | Data on legal advice provided to other agencies and related doc. repository | 3 | 1.5 | 4.5
93 | MOLSA | Ministry of Labor & Social Affairs | MOLSA1 | National Labour Employment data | 4 | 2 | 8
94 | MOLSA | Ministry of Labor & Social Affairs | MOLSA2 | Work Place Incidents Data From Various Public Organizations including Type Of Incident, Place Of Incident, Cause Of Incident, Damage Type And Amount and measures taken | 2 | 2 | 4
95 | MOLSA | Ministry of Labor & Social Affairs | MOLSA3 | Data on registered job seekers and vacancies, recruitment, terms and conditions of employment. | 3 | 2 | 6

96 | MOLSA | Ministry of Labor & Social Affairs | MOLSA4 | Foreign employees in Ethiopia. Sector wide employment data and Work permits issued, issue of clearance of work permit | 2 | 1 | 2
97 | MOLSA | Ministry of Labor & Social Affairs | MOLSA5 | Data on Ethiopians employed overseas, data on approval of overseas contracts | 2 | 1 | 2
98 | MOLSA | Ministry of Labor & Social Affairs | MOLSA6 | Data on registered collective labour agreements and related documents. | 2 | 1 | 2
99 | MOLSA | Ministry of Labor & Social Affairs | MOLSA6 | Data and documents on registration and licensing of private employment agencies. | 2 | 1 | 2
99 | MOLSA | Ministry of Labor & Social Affairs | MOLSA7 | Data on registration and accreditation of associations | 2 | 1 | 2
100 | MOLSA | Ministry of Labor & Social Affairs | MOLSA8 | Document repository, labour laws and best practices guidelines, worker safety etc. | 2 | 2 | 4
101 | MOM | Ministry of Mines | MOM1 | Geospatial dataset on existing mines and mineral resources. Which among others includes Spatial Identification Of Mining Area, Size Of Area, Type Of Terrain, Mineral Data and Mineral type and estimated Amount. | 3 | 2 | 6
102 | MOM | Ministry of Mines | MOM1 | Data on existing mining operations, operator data including Name Of Company, License Number and date and duration of operations. | 3 | 2 | 6
103 | MOM | Ministry of Mines | MOM2 | Data and document on licensing and issue of competence certificates including Gold Melting Competence Certificate, Gold Whole Selling Competence Certificate, Petroleum Operation License, Precious Minerals Exporting Competence Certificate, Precious Minerals Trading Competence Certificate Of Precious Minerals, Lapidary and Smithery, Support Letter For Buying Of Gold, Transfer Of Mineral Operation License | 2 | 1 | 2
104 | MOM | Ministry of Mines | MOM3 | Document repository New Methods and Mechanisms for Mining Operation; Project Profiles Of Mining Sector Development, policies and plans. Performance Reports. | 2 | 1 | 2
105 | MOST | Ministry of Science & Technology | MOST1 | Data on National Math, Science Research And Innovation award candidates awards | 2 | 1 | 2
106 | MOST | Ministry of Science & Technology | MOST2 | Data on National Science Academies, Technology Institutions & Professionals & Science Clubs Association | 2 | 1 | 2
107 | MOST | Ministry of Science & Technology | MOST3 | Document Repository on S&T policies, plans and programmes, technology transfers, technology profiles and performance indicators | 3 | 2 | 6
108 | MOT | Ministry of Trade | MOT1 | National Business Register all business and ownership data, trade names | 4 | 2 | 8
109 | MOT | Ministry of Trade | MOT2 | Register of Trade Associations, Traders, trader license details and renewal | 3 | 1.5 | 4.5
110 | MOT | Ministry of Trade | MOT3 | National data on commodity (other than coffee) production, consumption prices and local market | 4 | 2 | 8
111 | MOT | Ministry of Trade | MOT4 | National data on coffee production, consumption, prices, exports. Producers, international market operators etc. | 4 | 2 | 8
112 | MOT | Ministry of Trade | MOT5 | Data on international trade. Imports and exports. Importers and Exporters; international trade associations. | 4 | 2 | 8
113 | MOT | Ministry of Trade | MOT6 | Document Repository: Trade Policies, Guidance to traders and exporters, International Trade Agreements, Market information and profiles. | 3 | 2 | 6
114 | MOTN | Ministry of Transport | MOTN1 | Import and Export Cargo data, vehicle details, Schedules. | 4 | 1.5 | 6
115 | MOTN | Ministry of Transport | MOTN2 | Data on Road Safety Private Projects their evaluation and support. | 2 | 2 | 4
116 | MOTN | Ministry of Transport | MOTN3 | Data on Development Projects in Transport Sector. Performance indicators. Transport Sector Statistics | 3 | 2 | 6

117 | MOTN | Ministry of Transport | MOTN4 | Data on the Technical And Financial Support to Road Safety Fund Users | 2 | 2 | 4
118 | MOTN | Ministry of Transport | MOTN5 | Document Repository: Policies, Plans and project profiles in the development of Transport sector | 2 | 2 | 4
119 | MOUD | Ministry of Urban Development, Housing and Construction | MOUD1 | Geospatial Dataset on Urban Land Size, Type, GIS Data, Current Ownership Status, Land Bank data, Land standards and modern land data attributes | 4 | 2 | 8
120 | MOUD | Ministry of Urban Development, Housing and Construction | MOUD1 | Data on urban Infrastructure supply status. List And Profiles Of Urban Areas Of The Country. Standards Of Municipal Services Provision, and Level Of Urban areas In terms of Municipal Services. | 4 | 2 | 8
121 | MOUD | Ministry of Urban Development, Housing and Construction | MOUD1 | Data on Strategic and Operational Project Plans for Development Of Residential Houses including List And Profiles of Enterprises Working In The Development Of Residential Houses, Project Profiles Of Development Of Residential Houses. | 2 | 2 | 4

122 | MOUD | Ministry of Urban Development, Housing and Construction | MOUD1 | Data on Construction Machinery including Issuance Of Import Permits for Construction Machineries, Registration Of Construction Machineries, Issuance Of Construction Machineries Debt Seizure, Transfer Of Ownership Of Construction Machineries. | 2 | 1 | 2
123 | MOUD | Ministry of Urban Development, Housing and Construction | MOUD1 | Register of Construction Professionals, Contractors and Practicing Professionals | 2 | 1 | 2
124 | MOUD | Ministry of Urban Development, Housing and Construction | MOUD1 | Document Repository: Sector plan, project profiles, survey reports, performance indicators. | 3 | 2 | 6
125 | MOWYCA | Ministry of Women, Children and Youth Affairs | MOWYCA1 | Dataset on women related social and economic issues. Education, health etc. | 4 | 2 | 8
126 | MOWYCA | Ministry of Women, Children and Youth Affairs | MOWYCA2 | Data on strategic women focused development projects and programmes. Targets, achievements and performance indicators and related reports. | 4 | 2 | 8

127 | MOWYCA | Ministry of Women, Children and Youth Affairs | MOWYCA3 | Legal frameworks on women and youth development and gender equality and Children rights and safety protection | 3 | 2 | 6
128 | MOWYCA | Ministry of Women, Children and Youth Affairs | MOWYCA4 | Training materials on women and youth affairs, gender mainstreaming, Children rights and safety protection and women, youth and children related legal frameworks and development packages | 2 | 1.5 | 3
129 | MOWIE | Ministry of Water and Energy | MOWIE1 | National Energy Data sets includes Energy resources, location (GIS), energy demand and supply, capacity, power station and dams etc. | 4.5 | 2 | 9
130 | MOWIE | Ministry of Water and Energy | MOWIE2 | National Water Data includes water sources, Water Point Data, location, yield, captured water data, Ground Water Points Data consumption and projections. | 4.5 | 2 | 9

131 | MOWIE | Ministry of Water and Energy | MOWIE3 | Registry of water users, water professionals, consultants and contractors | 4.5 | 2 | 9
132 | MOWIE | Ministry of Water and Energy | MOWIE4 | Irrigation Projects Data includes List, Location, Size, Capacity, Potential, Beneficiaries, Profiles Of Irrigation And Drainage Developments, List And Profiles Of Companies Working In The Irrigation and drainage development. | 4 | 2 | 8
132 | MOWIE | Ministry of Water and Energy | MOWIE4 | National water related licensing data including issue of water Consultancy Competence Licenses, Water Contractor Competence Licenses, Water Professional Competence Licenses. | 3 | 2 | 6
133 | NBE | National Bank of Ethiopia | NBE1 | National Monetary and Financial Data includes Inflation, Investment, Lending Rates, Deposit Rates, Treasury Bill, Budget Deficit, Monetary Aggregates, Net Domestic Credit, Export And Import, Consumer Price Index | 4 | 2 | 8

134 | NBE | National Bank of Ethiopia | NBE2 | National Banking and insurance and micro finance institution data including lists, guidelines and reports. | 3 | 2 | 6
135 | NBE | National Bank of Ethiopia | NBE3 | Data related to Evaluation and Approval Of Memorandum and Articles Of Associations, General Assembly and Required Board Meeting Minutes Of Bank, Insurance And Microfinance Institutions | 2 | 1 | 2
136 | NBE | National Bank of Ethiopia | NBE4 | Data and documents related to Approval of Appointments of External Auditors of Bank, Insurance and Microfinance Institutions. Approval of Bank, Insurance And Micro Finance Institutions’ CEOs and Board Members | 3 | 2 | 6
137 | NBE | National Bank of Ethiopia | NBE5 | Data related to Coffee Contract Registration and Export Licensing, Issuance Of Foreign Borrowing Permit, Gold Export Permit, License To Banks, Insurances And Micro Finance Institutions and New License To Insurance Assistance. | 4 | 2 | 8

138 | TA | Transport Authority | TA1 | Vehicles Import Data including chassis number, motor number etc. | 2 | 1 | 2
139 | TA | Transport Authority | TA2 | Vehicle drivers data including type Of Driving, Level Of Driving, Driver’s Name, Address, Sex, Age, Driving License Number, Etc. | 4 | 2 | 8
139 | TA | Transport Authority | TA3 | Fuel Price Data including type Of Fuel, Date Of Fuel Price Change, price change | 4 | 2 | 8
140 | TA | Transport Authority | TA4 | Freight Transport Providers Data | 2 | 1 | 2
141 | TA | Transport Authority | TA4 | Transport schedule date and routes data | 2 | 1 | 2
142 | TA | Transport Authority | TA5 | Data related to issue of permits and licenses including Annual Vehicle Inspection Service Agent License, Competence License Certificate To Vehicle And Spare Part Importers, Competence License Certificate To Vehicle Assemblers, Competence License Certificate To Vehicle Maintenance And Body Change Organizations, Issuance Of Daily Plate Number, Issuance Of Equivalent Driving License, Issuance Of Freight Transport Associations Competence Licensing, Permit for Djibouti Round Trip, Notifying Criteria For New Vehicle Import Licensing, Public Transport Operators Competence Licensing, Public Transport Associations Competence Licensing | 2 | 1 | 2
143 | TA | Transport Authority | TA6 | Register of Road Accidents | 2 | 2 | 4
144 | TA | Transport Authority | TA7 | Document Repository: Policies, Programmes, Reports and surveys, Road Safety Guidelines etc. | 2 | 2 | 4
145 | HPR/HOF | House of People’s Representative and House of Federation | HPR1 | Data related to Presentation and enactment of the parliamentary bills including draft bills, revisions and versions | 4 | 2 | 8

146 | HPR/HOF | House of People’s Representative and House of Federation | HPR2 | Data on Evaluation of performance and Budget and Planning process including MDGs, National sector goals and targets. Ministerial Achievements and performance reports and national budget and financial data. | 4.5 | 2 | 9
147 | HPR/HOF | House of People’s Representative and House of Federation | HPR2 | Document Repository, laws and regulations and parliamentary proceedings including documents, audio and video content. | 4 | 2 | 8
148 | EFC | Ethiopian Federal Courts | EFC | Data related to cases registered and proceeding in Federal Courts. Court Decisions | 3 | 2 | 6


Table 3: Data to Support Government E-Services / M-Services

S. No | E-Service and M-Service Description | Main Govt. Agency Responsible | Dataset Description | Index of Commonality | Weightage | Weighted Index
149 | Agriculture Management Information | Ministry of Agriculture | Data on agriculture sector including agro industry, agro production and consumption, prices, exports and imports, agriculture land use, crop productivity. Irrigation, crop disease etc. | 4 | 2 | 8
150 | Agricultural Product Inspection and Certification Services: Custom clearance certificate for agricultural imports; Certificate of competence on agriculture; Certificate of Import and Export of Animal and Crop products | Ministry of Agriculture | Data on agro imports and exports, importer and exporter details, data related to agriculture competences. Details of the agro products exported and imported | 4 | 2 | 8
151 | Pension Management System (PSM) | Local Government - e-Municipality | National Pensioners Data including register of beneficiary eligible for pension, pension details etc. | 4 | 2 | 8
152 | Social Benefits Related | MOLSA / Local Govt. | Data on the benefits eligibility, level and scale of social benefits | 3 | 2 | 6
153 | Student Scholarships | MOE / Relevant Institutions | Data on students applications, qualified and selected, record of scholarships granted and availed of | 4 | 2 | 8
154 | e-education: National Exam Result, Student Placement Results, Teacher Placement Results, Publish Education Statistics Report, Student performance results; Student Registration; Teaching Permit – for Employment; Certificate for completion of high School (Original & replacement); Certificate for completion of Degree; Education Certificate – to be sent abroad | Ministry of Education | National Student and Teacher Dataset: which includes student data and teachers data, Register of students applied and appeared in examinations, comprehensive examination data. Examination Results. Teachers' qualifications and permits; examination certificate data etc. | 4.5 | 2 | 9

155 | E-health informational Services: Hospital Information; Doctors Profile Information; Booking of Appointments; Notification of Test result; Drug pricing Information; Drug Availability Information. | Ministry of Health | Data on Hospitals and clinics, Doctors and health professionals, drug availability and pricing data. Patient profiles and results | 4.5 | 2 | 9
156 | E-health Transactional services: Issuance of Drug Marketing License; Issuance and renewal of Certificates of competence to drug establishments; Issuance, renewal and cancellation of drug registration certificate; Issuance of Certificate of competence to health professionals; Issuance of Licenses to Hospitals; Issuance, Renewal and cancellation of license to conduct clinical trials. | Ministry of Health | Data related to drug marketing, register of drug marketing entities, pharmacies, companies, distributors. Register of health professionals. Data related to licensing to hospitals. Conditions and qualification for license eligibility. | 3 | 2 | 6
157 | E-Trade - Informational Services: Provision of Investor Information; Policies on Business Loans; Provision of Export/Import Trade Information to the Business community; Dissemination of market research to Exporters for investing in existing as well as new markets; Provision of Information on SMEs and Large Industries. | Ministry of Trade / Ministry of Industry | Data on investment and investors, import and export data and market research information, information on existing and potential industries. Policies and programmes for business loans and incentives. | 4 | 2 | 8
158 | Business Licensing and Registration Services (Issuance, renewal and cancellation); Licensing of Professional Contractors; Trade name and Trade mark registration and renewal; Collection of fines if a business operates without a license or with an outdated license; Licensing of construction machines. | Ministry of Trade | Data on Registered Businesses and companies, names, addresses, locations, shareholding, licence data including its validity. Licence related data of professional contractors. Business Names database. Trade Name and Trade Mark register and validity. Data related to licenses of construction machines. Documents and procedures for business registration and trademarks, licensing etc. | 4 | 2 | 8
159 | Request for Rental Housing services include Residential Housing request by customers; Request for renewal of housing lease contract by tenants; Request for maintenance; Request for transfer of right; Various Notifications. | Agency for Government Houses | Dataset of Government Houses including the Register of Government Houses including house data, locational data, specification, rental data, lease and lease data, Rental Contracts related data and Contract Repository. Request for Maintenance of housing and maintenance data. Policies and Procedures and Notifications related to Govt. Houses. | 4 | 2 | 8

160 | Information on job markets, vacancies and job descriptions and candidate availability; Unemployment Registration / Unemployment Certificate. | Ministry of Labour | Data on job market employment trends, job vacancies, Candidate Profiles, Register of unemployment certificates issued | 4 | 2 | 8
161 | Overseas employment registration for Ethiopian Nationals; Employment Agency License Renewal; Employment Agency Licensing (New); Foreign Nationals Work Permit | Ministry of Labour | Data on the Ethiopian Nationals employed abroad. Register of Ethiopians employed abroad, Register of Foreign Nationals working in Ethiopia. Employment agencies’ Register. Data on Employment agencies’ Licenses issued etc. Document Repository: Foreign Worker Permits, Employment Agencies Licenses | 3 | 1.5 | 4.5
162 | Registration of: Labour Unions, Employment Unions, Work Places, Collective Agents, Safety & Health Committees, Work place accidents Institution & Partner Organisation | Ministry of Labour | Labour Unions Data, Employment Unions Data, Work Place Data, Labour Safety Data, Worker Health Data, Worker Accident Data. Registers of Labour Unions, Work Place Collective Agents, Work Place accidents Institutions, Safety and Health Committees. | 3 | 2 | 6

163 | Informational Services to the public and businesses on expiry / renewal of insurance, licenses; on public transport timings, tickets availability, routes, stops etc.; the traffic movement, special events, road works, parking areas, detour information; regulation, policies and import duty for vehicles and Notification of Traffic violations and penalties. Online booking of tickets for interstate government transport. | Ministry of Transport / Transport Authority | Dataset on vehicles, including vehicle insurance, public transport timings, ticket availability, GIS data on public transport routes and stops, Special events, regulations, duty and taxes on vehicles. Traffic movement. Traffic violations and penalties | 4 | 2 | 8

164 | Issuance / Renewal of drivers license (National / International). Online notification and collection of fines for traffic violation | Ministry of Transport / Transport Authority | Drivers Database, Drivers data, license related data. License renewal data. Accident and traffic violations etc. | 4 | 2 | 8

165 | Application for change of ownership of vehicles. Application and renewal of vehicle Insurance. Application for request of special vehicle numbers | Ministry of Transport / Transport Authority | Private vehicle ownership data. Ownership details, insurance - insurer, validity etc. | 4 | 2 | 8
166 | Dissemination of Information on Tour Operators, Travel Packages, Itinerary, Fee, Information about Accommodations, Information regarding various resources, Cultural and heritage centres. Registration of private tour operators: Registration for a tour. Registration of Hotels, Resorts, Guest Houses and Local Accommodations. Booking a Hotel, resort etc. Permission to visit Heritage sites | Ministry of Tourism | National Tourist and Tourism Dataset, including tourist destinations GIS database; tour operators, tour packages and fees, hotels, accommodations, cultural and heritage centres. | 3 | 1 | 3

167 | Supporting letter for film makers, Tourists to go abroad. Application for Tourist VISA on Arrival. Driving license equivalence for foreign citizens; ID card issuance for foreign nationals; Issuing diplomats and service visas; Issuing Birth, Death and Marriage Certificate to foreign nationals. | Ministry of Tourism | Foreign National Data. ID issued to Foreign Nationals in Ethiopia, Driving Permits to Foreigners, Diplomat and service visa related data; tourist visa data. | 2 | 1 | 2


168 | Online application of Registration as a Tax Payer; Online filing of Tax Return {Land Tax, Rental Housing tax (paid by owner), Turnover tax, TV tax, VAT and Excise Tax}; File a declaration; Tax clearance Certificate; TIN Number Registration | Revenue Authority | National Tax Payers Dataset. Includes register of tax payers, details of tax payers of various types of taxes. Register of TIN numbers. Tax filing declarations. Document Repository: Registration Policy and Procedures, Tax Declarations. | 4 | 2 | 8
170 | Buying and Selling Property Agreement, Land Ownership; Land Transfer and Property Registration | Federal Courts, Property Registration office, City Administration | National Property Database includes land and property details, Land and property ownership - name, address, ID, GIS Land and property data. Document Repository: Policies and Procedures for property registration, GIS Land Maps, Buying and selling agreement copies. | 4 | 2 | 8


171 | Court Cases Management System; Application for a Pleading, Document Transfer, Case Filing; Adjournment of Cases | Federal Courts, Ministry of Justice, Public Prosecutor offices | Court Cases Dataset includes Register of cases filed in various courts, details of litigants, pleaders and lawyers, status of cases, hearing dates etc. | 4 | 2 | 8
172 | Information Provision about A.A. City Administration, Information on creating awareness for protecting the city from different pollution and natural resource degradation. Distributing Organizational Structure for sub cities & center Offices. Distribution of City Laws. City Entertainment, Radio Programmes, TV programme, Events and Programmes. Directives on City Road Maps, Application for blocking roads. Information on damaged roads and request for repairs | City Administration | Addis Ababa City Data including GIS data on city, locational data on places of interest, entertainment, cultural and heritage sites, City Administration information; Radio Programme data; TV Programmes Data; City Roads data; City News Data; City Tours Data; City Maps; City Pollution Data; City Parks and entertainment centre data; City Employee Data; City Bye Laws | 4 | 2 | 8

173 | City Administration Citizen services: Request for Archives of News, Television Programs, Radio Programs; Application for Sponsorship of news, Communications; Request to post News; Request for Advertising bids and names etc. | City Administration | Data on City radio and TV programmes, News, City Advertising spots. | 2 | 1 | 2

174 | City Administration Citizen Services: Application for change of name; Application for declaration of absence; Application for succession Certificate; Application for marriage certificate; Application for Certificate of Guardianship; Issuance of Resident ID; Issuance of Marriage Certificate; Issuance of Divorce Certificate; Issuance of Birth Certificate; Issuance of Non-Married Cert.; Issuance of Death Certificate | AA. City Administration | Addis Ababa Resident Data Set. This will include the resident details: names and addresses, family members, marital status. City Property ownership, Registers of marriages, guardianship of minors, Birth Certificates, Non Married Certificate, Death Certificate. Residence ID data | 4 | 2 | 8

176 | Issuance of License for Mining; Environmental Impact Assessment Permit; Request for environmental Lab test; Collecting land and land related data; Request for land and land related data; Request for park and recreation centre | A.A City Administration | Addis Ababa Land and Natural Resources and Environment Data Base. Including GIS data on urban residential land, parks and urban forests, designated mining land, mineral resources, environmentally protected areas, water resources, areas under gardens and farming. City Environment Data on air and water quality, air pollution levels. Industrial discharges, smoke and other particulate. | 4 | 2 | 8
177 | Water and Sewerage related citizens services: Request for new water line; Request for maintenance of water / sewerage lines; Request for sewerage service; Bill Collection; Request for Disposal of solid waste | City Administration | AA Water and Sewerage Dataset. GIS data on water and sewerage lines. Water Supply data, capacity data, Water and Sewerage services billing data. Water consumer data. | 4 | 2 | 8
178 | City Based Educational Citizen Services: Registration of Private ECLSE examinees; Request for summer courses registration; Request for grade report card when lost | City Administration Educational Bureau | AA City Educational Dataset. Educational Courses offered, Educational Institutions in the city. Capacity, admission procedure, Course Examination results data. Student Data | 2 | 1 | 2


179 | Passport and Visa related services: Application for PP; Renewal; Replacement; Visa Application, Processing & Issue; ID Card issue to Foreign Nationals | Ministry of Foreign Affairs | Passport and Visa Data. Applicant Data. Passport and Visas issued and rejected. Foreign National in Ethiopia related data | 2 | 2 | 4

Table 4: Data to support Common Applications for the Government of Ethiopia

S. No | Description of Application | Primary Agency / Agencies | Dataset Description | Index of Commonality | Weightage | Weighted Index
180 | E-Procurement | MCIT / All agencies | Dataset of Government Purchasing and Stores. Data on Government purchasing, Purchase Indents, stores, materials. Approved vendors, Rate Contracts, black listed vendors. Tenders and bids. RFPs and RFI related data and documents. | 4.5 | 2 | 9


181 | HRMS | Ministry of Civil Services / MCIT | National Civil Services Dataset. This includes data of all civil servants, recruitment, application data, job applicant data, training of civil servants, promotion, civil service job vacancies, interview data, selection data. Payroll data, retirement benefits data etc. | 4 | 2 | 8
182 | Financial Management System | Ministry of Finance / MCIT | National Government Treasury, budget and Financial Dataset. Data related to budget, finance and accounting and treasury operations of the Government. Agency disbursement and financial delivery data. | 4 | 2 | 8
183 | E-Office | MCIT / All ministries & agencies | Repository of Documents | 4.5 | 2 | 9
184 | Email of the Government | MCIT / All ministries & agencies | Repository of Government Emails | 4.5 | 2 | 9


4.0 Identifying Common Data Elements

As mentioned before, the e-Government programme of the Government, including
its e-services and its automated strategic and operational processes, depends on
the availability of the right information and data at the right time. The current
fragmentation of the Government's ICT systems leads to wasted resources and
suboptimal operations. It has to give way to Enterprise-wide systems, and
integration of the agency level systems at the level of applications and
databases is an essential requirement. Development and deployment of ENDS is an
initiative in this direction. Some business processes are confined to single
ministries, agencies and departments, while many other business processes,
operations and e-services are by their nature multi-agency. In the latter cases,
data exchange between Enterprise sub-systems is indispensable to achieve
optimality.

The exhaustive study of the data / information flow within the Government of
Ethiopia enterprise organizations has revealed that some data elements are agency
specific and remain within the agency concerned. On the other hand, there are
data elements and information that must flow between agencies in parallel with the
work flow of the Enterprise or for undertaking multi-agency tasks. As described in the
previous section, it is imperative to distinguish between agency specific data
elements and common data elements. The methodology described in the previous
section assigns a Data Commonality Index to the various data elements of the
Enterprise, indicating the extent to which each data element has the potential to be
exchanged and shared between agencies. Additionally, the calculated Data
Commonality Index for each data element has been multiplied by a weightage from 1
to 2, depending on how important the data element / information is from a public
service or Government business perspective. The resulting Weighted Data
Commonality Index can vary from 0 to 10. Data elements with a value of 0 are
agency specific and are therefore, by definition, not common data elements. Only
data elements with an Index value from 1 to 10 are common data elements that are
shared and exchanged between agencies.
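
The weighting rule described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the base range of 0 to 5 for the Data Commonality Index is an assumption inferred from the stated weightage of 1 to 2 and the stated maximum of 10; neither is part of the ENDS specification. Values between 0 and 1 are not addressed in the text, so the sketch treats only values of 1 or more as common.

```python
def weighted_commonality_index(commonality_index: float, weightage: float) -> float:
    """Multiply a Data Commonality Index by its importance weightage.

    Assumption: the base index lies in 0..5, so that the weighted
    result stays within the 0..10 range stated in the text.
    """
    if not 0 <= commonality_index <= 5:
        raise ValueError("commonality index is assumed to lie between 0 and 5")
    if not 1 <= weightage <= 2:
        raise ValueError("weightage must lie between 1 and 2")
    return commonality_index * weightage


def is_common_data_element(weighted_index: float) -> bool:
    """A weighted index of 0 marks agency specific data; 1 to 10 marks common data."""
    return weighted_index >= 1


# Index and weightage pairs taken from rows of Tables 2 to 4.
for index, weight in [(4.5, 2), (3, 1.5), (2, 1)]:
    weighted = weighted_commonality_index(index, weight)
    print(index, weight, weighted, is_common_data_element(weighted))
```

For instance, a set of data elements with a commonality index of 4.5 and a weightage of 2 receives a weighted index of 9, which matches the corresponding rows of the tables above.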


Based on the target Enterprise Architecture requirements with 90 percent
automation of the business processes, 184 sets of Data Elements have been
identified as described in Tables 2, 3 and 4. Ideally, all 184 sets of Data
Elements would form the National Dataset for Ethiopia. However, the elements
with a higher Weighted Data Commonality Index take preference over the elements
with a lower index. The Index distribution for the 184 sets of Data Elements is
as follows:

Table 5: Distribution of Data Commonality Index

S. No | Index Interval | Number of Sets of Data Elements | Percentage of the Total
1 | 1 to 3 | 45 | 24
2 | >3 to 5 | 31 | 17
3 | >5 to <8 | 39 | 22
4 | 8 to 10 | 69 | 37
Total | | 184 | 100
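
A distribution of this kind can be derived by binning the weighted index of each set of data elements. The sketch below is illustrative only: the function names are hypothetical, the sample values are made up for demonstration, and it assumes that the top interval includes the value 8, since sets with a weighted index of 8 appear among the high priority sets in this document.

```python
from collections import Counter

# Interval labels in ascending order; the top interval is assumed to include 8.
INTERVALS = ("1 to 3", ">3 to 5", ">5 to <8", "8 to 10")


def interval_for(weighted_index: float) -> str:
    """Return the label of the interval that contains the weighted index."""
    if 1 <= weighted_index <= 3:
        return INTERVALS[0]
    if 3 < weighted_index <= 5:
        return INTERVALS[1]
    if 5 < weighted_index < 8:
        return INTERVALS[2]
    if 8 <= weighted_index <= 10:
        return INTERVALS[3]
    raise ValueError(f"weighted index {weighted_index} is outside 1..10")


def index_distribution(weighted_indices):
    """Map each interval label to (count, percentage of the total)."""
    values = list(weighted_indices)
    if not values:
        raise ValueError("at least one weighted index is required")
    counts = Counter(interval_for(value) for value in values)
    return {
        label: (counts[label], 100.0 * counts[label] / len(values))
        for label in INTERVALS
    }


# Made-up sample values, not the actual 184 sets.
sample = [9, 9, 8, 8, 6, 4.5, 2, 2]
for label, (count, percent) in index_distribution(sample).items():
    print(f"{label}: {count} sets ({percent:.1f}%)")
```

Applying the same binning to the 184 weighted index values in Tables 2, 3 and 4 would give the counts per interval; the percentages in the table are those counts expressed as a share of 184, rounded to whole numbers.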

In the first instance it would be logical for the Government to concentrate on
those sets of Data Elements that have maximum impact in terms of coverage / usage
within the enterprise and importance from a business point of view. As can be seen
in Table 5, sixty nine (69) out of 184, or roughly 37 percent, have the highest
index and are fit candidates for inclusion in the ENDS in the first instance. These
priority sets of data elements are summarised in Table 6 below.

Table 6: High Priority Sets of Data Elements

S.No | Summary Description | Agency / Agencies Responsible | Data Category
1 | Dataset of Government Houses including the Register of Govt. Houses including house data, locational data, specification, rental data, lease and lease data, Rental Contracts data. Request for Maintenance of housing and maintenance data. | Agency for Government Houses | Infrastructure
2 | Policies and Procedures and Notifications related to Govt. Houses | Agency for Government Houses | Document Repository
3 | Population Census Dataset | Central Statistical Agency | Population / Citizen
4 | Economy Activities Dataset | Central Statistical Agency | Economic / Financial
5 | Consumer and Producer Prices | Central Statistical Agency | Economic / Financial
6 | Household Income, Consumption and Expenditure | Central Statistical Agency | Economic / Financial
7 | Registered Tax Payers and Payments individual & businesses | Ethiopian Revenue & Customs Authority | Economic / Financial / Social
8 | Document Repository: Taxation Policies, Rules, Procedures and Templates | Ethiopian Revenue & Customs Authority | Document Repository
9 | Agriculture Land Data includes Geographical Position Agricultural Lands, Topography Landscape Soil type and productivity | Ministry of Agriculture | Land & Natural Resources
10 | Agriculture Sector Investment and Investor Data | Ministry of Agriculture | Agriculture
11 | Agriculture products production, market, prices, trade and consumption data | Ministry of Agriculture | Agriculture
12 | Agriculture inputs production, prices, market, trade and consumption data | Ministry of Agriculture | Agriculture
13 | Data on agro imports and exports, importer and exporter details, data related to agriculture competences. Details of the agro products exported and imported | Ministry of Agriculture, Ministry of Trade | Business / Trade
14 | National Civil Services Data. Civil service recruitment data including appointments etc. This includes data of all civil servants, recruitment, application data, job applicant data, training of civil servants, promotion, civil service job vacancies, interview data, selection data. Payroll data, retirement benefits data etc. | Ministry of Civil Service | Civil Services
15 | National Student Dataset including enrolment and attendance and scholarship data | Ministry of Education | Education and Human Resources
16 | National Teacher Dataset including data on qualifications and competency certificates issued | Ministry of Education | Education and Human Resources
17 | Data on National Examination dates, venues and results | Ministry of Education | Education and Human Resources
18 | Regional Social and Economic Development Data | Ministry of Federal Affairs | Economic / Financial / Social
19 | National Budget, cash flow, disbursement and related Data | Ministry of Finance and Economic Development | Economic / Financial / Social
20 | Civil Service Payroll Data | Ministry of Finance and Economic Development | Civil Services
21 | National Economic and Social Dev. including plans, projects, programmes, indicators and achievements. | Ministry of Finance and Economic Development | Economic / Financial / Social
22 | National Population Data including forecasts | Ministry of Finance and Economic Development | Population / Citizen
23 | List and Profiles of Public Budgetary Organizations, Budget Requests, Budget Ceilings, Budget Utilization Performance | Ministry of Finance and Economic Development | Economic / Financial / Social
24 | List And Profiles Of Internal And External Auditors and Audit Reports | Ministry of Finance and Economic Development | Economic / Financial / Social

25 | Data on accounts of ongoing and completed projects | Ministry of Finance and Economic Development | Economic / Financial / Social
26 | Data on Federal Budget Assistance to Regions and Region to Woreda (cascaded Budget) | Ministry of Finance and Economic Development | Economic / Financial / Social
27 | Registration and Analysis of Bank Statements and Budget Ceiling For Federal Payments | Ministry of Finance and Economic Development | Economic / Financial / Social
28 | National Health Dataset, including data on health indicators among various population segments, life expectancy, child mortality etc. Public health preparedness indicators. | Ministry of Health | Public Health
29 | Data on communicable and non communicable diseases including their prevalence including data on projects and programmes on prevention and cure. | Ministry of Health | Public Health
30 | Data on Hospitals and clinics, Doctors and health professionals, drug availability and pricing data. Patient profiles and results | Ministry of Health | Public Health
31 | Data on Industrial Investment, value, sector and geographical distribution. List of investors, historical growth, installed capacity and growth. Key performance indicators. | Ministry of Industry | Industry / manufacturing
32 | Document Repository: Industry development policy, plans and programmes, incentives, survey reports, industry sector profiles, market research information, information on existing and potential industries. Policies and programmes for business loans and incentives | Ministry of Industry | Document Repository

33 | National Labour Employment data | Ministry of Labour and Social Affairs | Human Resources
34 | National Pensioners Data including register of beneficiary eligible for pension, pension details etc. | Ministry of Labour and Social Affairs and Local Government | Economic / Financial and Social
35 | Data on job market employment trends, job vacancies, Candidate Profiles, Register of unemployment certificates issued | Ministry of Labour and Social Affairs and Local Government | Economic / Financial and Social
36 | National Business Register all business and ownership data, companies, names, addresses, locations, shareholding, licence data including its validity. Licence related data of professional contractors. Business Names database. Trade Name and Trade Mark register and validity. Data related to licenses of construction machines. | Ministry of Trade | Business / Trade
37 | Documents and procedures for business registration and trademarks, licensing etc. | Ministry of Trade | Document Repository
38 | National data on commodity (other than coffee) production, consumption prices and local market | Ministry of Trade | Business / Trade
39 | National data on coffee production, consumption, prices, exports. Producers, international market operators etc. | Ministry of Trade | Business / Trade
40 | Data on international trade. Imports and exports. Importers and exporters; international trade associations | Ministry of Trade | Business / Trade
41 | Geospatial Dataset on Urban Land; Size, Type, GIS Data, Current Ownership Status, Land Bank data, Land standards and modern land data attributes | Ministry of Urban Development, Housing and Construction | Land and natural resources

42 | Data on urban Infrastructure Supply status. List And Profiles Of Urban Areas Of The Country. Standards Of Municipal Services Provision, and Level Of Urban areas In terms of Municipal Services. | Ministry of Urban Development, Housing and Construction | Infrastructure
43 | Dataset on women related social and economic issues, education, health etc. | Ministry of women, Children and Youth Affairs | Economic/ Financial and Social
44 | Data on strategic women focused development projects and programmes. Targets, achievements and performance indicators and related reports. | Ministry of women, Children and Youth Affairs | Economic/ Financial and Social
45 | National Energy Data sets includes Energy resources, location (GIS), energy demand and supply, capacity, power station and dams etc. | Ministry of Water and Energy | Energy and Power
46 | National Water Data includes water sources, Water Point Data, location, yield, captured water data, Ground Water Points Data consumption and projections. | Ministry of Water and Energy | Land and Natural Resources
47 | Registry of water users, water professionals, consultants and contractors | Ministry of Water and Energy | Land and Natural Resources
48 | Irrigation Projects Data includes List, Location, Size, Capacity, Potential, Beneficiaries, Profiles Of Irrigation And Drainage Developments, List And Profiles Of Companies Working In The Irrigation and drainage development. | Ministry of Water and Energy | Agriculture and Infrastructure
49 | National Monetary and Financial Data includes Inflation, Investment, Lending Rates, Deposit Rates, Treasury Bill, Budget Deficit, Monetary Aggregates, Net Domestic Credit, Export And Import, Consumer Price Index | National Bank of Ethiopia | Economic/ Financial and Social
50 | Data related to Coffee Contract Registration and Export Licensing, Issuance Of Foreign Borrowing Permit, Gold Export Permit. License To Banks, Insurances And Micro Finance Institutions and New License To Insurance Assistance | National Bank of Ethiopia | Economic/ Financial and Social
51 | Vehicle drivers data including type Of Driving, Level Of Driving, Driver’s Name, Address, Sex, Age, Driving License Number | Transport Authority | Vehicle & Driver Data
52 | Dataset on vehicles, including vehicle insurance, public transport timings, ticket availability, GIS data on public transport routes and stops, Special events, regulations, duty and taxes on vehicles. Traffic movement. Traffic violations and penalties | Transport Authority | Vehicle & Driver Data
53 | Fuel Price Data including type Of Fuel, Date Of Fuel Price Change | Transport Authority | Economic / Financial
54 | Data related to Presentation and enactment of the parliamentary bills, draft bills, revisions and versions | House of People’s Representative and House of Federation | Legislative and Legal
55 | Data on Evaluation of performance and Budget and Planning process including MDGs, National sector goals & targets. Achievements and performance reports and national budget and financial data. | House of People’s Representative and House of Federation | Legislative and Legal

56 | Document Repository, laws and regulations and parliamentary proceedings including documents, audio and video content | House of People’s Representative and House of Federation | Document Repository
57 | National Property Database includes land and property details, Land and property ownership (name, address, ID), GIS Land and property data. Document Repository: Policies and Procedures for property registration, GIS Land Maps, Buying and selling agreement copies. | Federal Courts, Property Registration office, City Administration | Economic/ Financial and Social
58 | Addis Ababa City Data including GIS data on city, locational data on places of interest, entertainment, cultural and heritage sites, City Administration information | Addis Ababa City Administration | Infrastructure
59 | Addis Ababa Resident Data. This will include the resident details: names and addresses, family members, marital status. City Property ownership, Registers of marriages, guardianship of minors, Birth Certificates, Marriage Certificate, Death Certificate. Residence ID data. | Addis Ababa City Administration | Citizen / Population
60 | Addis Ababa Land and Natural Resources and Environment Data Base. Including GIS data on urban residential land, parks and urban forests, designated mining land, mineral resources, environmentally protected areas, water resources, areas under gardens and farming. City Environment Data on air and water quality, air pollution levels. Industrial discharges, smoke and other particulate. | Addis Ababa City Administration | Land and Natural Resources
61 | AA Water and Sewerage Dataset. GIS data on water and sewerage lines. Water Supply data, capacity data, Water and Sewerage services billing data. Water consumer data. | Addis Ababa City Administration | Infrastructure
62 | Data of Government Purchasing and Stores. Purchase Indents, stores, materials. Approved vendors, Rate Contracts, black listed vendors. Tenders and bids. RFPs and RFI related data and documents. | Various Agencies | Administration
63 | Repository Emails | MCIT | Administration
64 | Repository of Documents | MCIT | Administration

After removing some duplications, we are left with 64 sets of data that could be considered priority candidates for the proposed ENDS. As can be seen in Table 6 above, these sets have been grouped into data categories, each corresponding to a data domain. For instance, the data category Land and Natural Resources contains all the sets of data elements that relate to land, minerals, forests, water and natural resources. Similarly, the data category Citizen / Population contains all the sets of data elements related to citizens and residents, demography, population distribution etc. In all, 17 data categories have been identified. The 64 sets of data elements listed in Table 6 are distributed across these 17 data categories as follows:

Table 7: Data Categories

S. No | Data Category | No of sets of data elements
1 | Administration | 2
2 | Agriculture | 4
3 | Business / Trade | 5
4 | Citizen / Population | 3
5 | Civil Services | 3
6 | Education and HR Skills | 3
7 | Economic/ Financial and Social | 19
8 | Energy and Power | 1
9 | Human Resources | 1
10 | Industry / Manufacturing | 1
11 | Infrastructure | 4
12 | Land and Natural Resources | 5
13 | Legislative and Legal | 2
14 | Public Health | 3
15 | Vehicle & Driver Data | 2
16 | Document Repository | 5
17 | Email Repository | 1
Total | | 64


5.0 Ethiopian National Dataset Design and Technical Specification

The first question that naturally comes to mind is why Ethiopia should have a National Dataset. To some extent this question is answered in the Ethiopian E-Government Strategy document, which lays down the development of the National Data Set as a means of effectively and efficiently exchanging data between the various ICT systems existing in the Government of Ethiopia. This makes interoperability possible between the individual systems existing at the agency level and, in effect, virtually integrates the individual agency and departmental level technology systems into a single Enterprise System.

One of the important Enterprise Architecture policies, mentioned before, is that data is to be considered an enterprise resource that needs to be shared and exchanged. Data can be, and often is, shared and exchanged via point-to-point interfaces between pairs of applications, and this is a valid architectural alternative to more complex options such as data hubs or data marts. A point-to-point interface that moves data from one application to another is much simpler to implement than any common data set, data hub or data mart, and many enterprises now have sprawling webs of such interfaces that have grown up over the years. However, Common Enterprise Data sets (data hubs) provide an attractive alternative to point-to-point interfaces. In the context of a large enterprise like the Government of Ethiopia, point-to-point data exchange would inevitably lead to unnecessary movement of the same data and poor control, would not permit optimal reuse of the data, and would make governance and management difficult.

5.1 Ethiopian National Dataset Design Consideration

Even though the Government of Ethiopia has made considerable progress in adopting ICT systems for its day-to-day operational management, most Government agencies are still at a low level of maturity in matters of systematic technology application, management and governance. There are therefore several constraints and opportunities that must be kept in view in designing an ENDS data hub that optimally meets the needs of the Government of Ethiopia Enterprise. Some of the important considerations to be kept in view are as follows:

• The Data Hub design has to be maintainable, and data transfer from ministerial systems to the central data hub / warehouse should be easy.

• The design has to be inexpensive and robust, with Electronic Data Interfacing that can fit right into the current technology setup in the various ministries in Ethiopia.

• The design should adopt a simple, maintainable and extensible Datamart/Warehouse model that can be created, configured and made live with relatively little effort from government stakeholders.

• The architecture should have minimum dependencies on third party systems. Typical governmental projects of this scale hit severe bottlenecks in execution and roll-out because of insurmountable issues in third party system interfacing and dependencies. The proposed architecture is decoupled by design from second and third party systems to ensure that rollout can happen on an as-is basis for most ministries, with the only exception being the inherent quality of data, which can be resolved locally in the source systems.

• Minimum Effort – Maximum Impact approach, to ensure that the ENDS project can be launched with minimum effort but can result in maximum data discovery and use for governance, service delivery and information sharing purposes.

• Technology Neutral – The architecture is to be technology neutral, to ensure that the government has the flexibility to choose between most mainstream ETL, Data Warehousing and Web Access technologies, including Open Source options, without being tied to any specific provider.

5.2 Ethiopian National Dataset Design Options

Various architectural and design options are available for designing and developing the ENDS. They range from the simplest, where data is published by the agency-based systems into the ENDS Data Hub and extracted by the applications at the user end, to a very complex, layered architecture in which the data is prepared and loaded into a staging area, then transformed, shared and analysed, while also being stored in a data warehouse for aggregation, analysis and reporting. Some of these options are briefly described below to determine the option that best suits the situation existing in Ethiopia.

The simplest option is for applications to publish data into the Data Hub; other user applications can then pull the data, or the publishing applications can push the data to the user applications. There is no integration in a publish-and-subscribe data hub. Each publisher’s data set is just staged as is, and taken by a subscriber. What the hub can do is coordinate the pushing and pulling of data by recognizing when a publisher is ready to publish, and informing a subscriber when data is available. This can be tied into enforced service level agreements (SLAs).

Figure 2: Simple Publish Extract Data Hub
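As an illustration only, the publish-and-subscribe behaviour described above can be sketched in a few lines of Python. The class name, topic name and sample record are hypothetical and are not part of the ENDS design; the sketch simply shows data being staged unchanged and subscribers being told when it is available.

```python
from collections import defaultdict


class PublishSubscribeHub:
    """Hypothetical hub: stages each published data set as is and notifies subscribers."""

    def __init__(self):
        self._staged = {}                      # topic -> latest staged data set
        self._subscribers = defaultdict(list)  # topic -> notification callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, dataset):
        # No integration or transformation: the data set is staged unchanged.
        self._staged[topic] = dataset
        for callback in self._subscribers[topic]:
            callback(topic)  # "data is available" notification

    def pull(self, topic):
        return self._staged[topic]


# Usage with invented sample data.
hub = PublishSubscribeHub()
notifications = []
hub.subscribe("vehicle_registrations", notifications.append)
hub.publish("vehicle_registrations", [{"plate": "AA-12345"}])
rows = hub.pull("vehicle_registrations")
```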

The second design option is that of a data store for reporting. The primary objective of a reporting data store is to shift reporting out of transactional applications, because reporting typically degrades their performance. Initially, the databases of transaction applications were simply replicated and the reports run off the replicas. Then it was realized that data from several applications could be integrated into a hub and integrated reporting run from the hub.

Figure 3: Operational Data Store for Reporting


The Master Data Management Hub is another alternative architectural design option, for situations wherein manual intervention for data management is required because some of the data providers are external entities outside the enterprise. To some extent subscription management may need to be undertaken in the hub. Perhaps the biggest difference with the MDM hub is that data content management is necessary. This means there has to be functionality to permit human operators to analyse and update the data. Most master data domains are simply too complex for automated management, and human intervention is required.

Figure 4: Master Data Management (MDM) Hub

The Integration Data Hub design illustrated in Figure 5 is an extensive integration model. This kind of hub serves to integrate data flowing via batch movement and/or messaging. It also supplies the warehouse layer, and the warehouse layer does not repeat the integration that has already been performed. The principle that underlies this hub is that integration will only be done once. Quite often, transaction applications actually do integration, and often further integration is redundant. Each application does the integration in its own way, and subtle differences mean that the data is inconsistent. Further downstream integration (e.g., in a warehouse) brings out the integration inconsistencies, if any.

Figure 5: Integration Data Hub

5.3 ENDS Data Hub High Level Architecture

The situation existing in the Government of Ethiopia as an enterprise has been surveyed and studied in detail in the Situational Analysis Report and briefly mentioned in the foregoing sections of this report, which underscore the fragmented nature of the ICT systems, the relatively low level of organizational and technology maturity, and the skill and governance deficit. Considering this situation, a high level architectural design for the ENDS Data Hub is proposed that is close to the design of the Integration Hub (Figure 5). Indeed, the proposed architecture goes beyond the basic model to include components and functionalities closely aligned with the needs of the Government of Ethiopia. At the agency-level publishing end, ETL data extraction and loading technology will be used to move the data to a staging database for further transformation into Data Marts, from where it may be pulled into user systems for a variety of uses: for instance, delivery of public services (e-services), support to the business processes of the Government for operational and management purposes, and provision of information and reports as per the needs of the user agencies or departments of the Government. The various component parts and operations of the proposed ENDS design are given below:

5.3.1 Summary of High Level Architecture


The Master Architecture includes the following broad steps:

1. Offline Data Extraction from Ministerial Applications/Systems to Staging Database.
This is the key process, in which key (difficult to access/inaccessible) data residing in various ministerial systems is extracted and brought into one common data store. This represents the single most important aspect of the ENDS initiative and includes the following steps:

a. Extracting Key Data from various Ministerial Systems as Flat File downloads by Ministerial stakeholders.
b. Consolidation of all Data files into a central FTP location.
c. Extracting data from all the files and placing them in a Staging Database where each data file is mapped to a single data entity.
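Step (c) above can be sketched as follows. This is a minimal illustration, not the ENDS implementation: it assumes comma-separated flat files with a header row, uses SQLite as a stand-in staging database, and the table name `stg_schools` and the sample rows are invented.

```python
import csv
import io
import sqlite3


def load_flat_file(conn, table, flat_file):
    """Load one delimited flat file into one staging table (one file = one data entity)."""
    reader = csv.reader(flat_file)
    header = next(reader)
    # Identifiers come from a trusted header row in this sketch.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    # Rows with the wrong number of fields are skipped rather than loaded.
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Usage with an in-memory database and invented sample data.
conn = sqlite3.connect(":memory:")
sample = io.StringIO("school_id,name,region\n1,School A,Addis Ababa\n2,School B,Amhara\n")
loaded = load_flat_file(conn, "stg_schools", sample)
```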


2. Staging Database – Data Cleansing & Validation
The data aggregated in the central storage area now has to be thoroughly cleansed and validated so that it is at a quality level suitable for further downstream processing.

This consists of the following steps:

a. Running Data Cleansing Routines in the Staging Server Database to ensure the data is free of aberrations such as special characters, null values, and inconsistent date or currency formats.

b. Running Data Validation Routines in the Staging Server Database to ensure the data conforms to the expected Business Rules, such as the expected date range or expected number range for the relevant fields.
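The two routines above might look like the following sketch. The field names (`plate`, `registered_on`, `seats`), the accepted date formats and the range limits are assumptions chosen for illustration; real business rules would come from each source ministry.

```python
import re
from datetime import date, datetime


def cleanse(record):
    """Strip special characters, turn blanks into None, normalise dates to ISO format."""
    cleaned = {}
    for field, value in record.items():
        value = re.sub(r"[^\w\s./:-]", "", value or "").strip()
        cleaned[field] = value or None
    if cleaned.get("registered_on"):
        for fmt in ("%d/%m/%Y", "%Y-%m-%d"):  # assumed input formats
            try:
                parsed = datetime.strptime(cleaned["registered_on"], fmt)
                cleaned["registered_on"] = parsed.date().isoformat()
                break
            except ValueError:
                continue
        else:
            cleaned["registered_on"] = None  # unparseable date
    return cleaned


def validate(record, earliest=date(1990, 1, 1), latest=date(2015, 12, 31), max_seats=100):
    """Return a list of business-rule violations (empty when the record is valid)."""
    errors = []
    if record.get("registered_on") is None:
        errors.append("registered_on missing or unparseable")
    elif not earliest <= date.fromisoformat(record["registered_on"]) <= latest:
        errors.append("registered_on outside expected range")
    seats = record.get("seats")
    if seats is None or not seats.isdigit() or not 1 <= int(seats) <= max_seats:
        errors.append("seats outside expected range")
    return errors


good = cleanse({"plate": " AA#12345 ", "registered_on": "01/04/2015", "seats": "5"})
bad = cleanse({"plate": "AA-1", "registered_on": "2031-01-01", "seats": ""})
good_errors = validate(good)
bad_errors = validate(bad)
```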

3. Data Transformation to Datamart
Once the aggregated data in the Staging Server has been validated and cleansed, the next step is to divide/partition this data into clearly defined Data Marts. Each Data Mart comprises a subset of data pertaining to a specific business end and is targeted at fulfilling one or more specific e-Services and Data Sharing requirements.

This process of Data Transformation comprises the following steps:

a. Aggregating various related data entities in the Staging Server into specifically defined Datamarts, which will then form the basis for Reporting and Analysis, using the Metadata rules in the Datamart table as reference.
b. Updating each Datamart’s Fact (Transactional) and Dimension (Lookup) tables on every periodic/monthly data upload.
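To make step (b) concrete, the sketch below applies two periodic uploads to a tiny star schema. SQLite, the table names `dim_vehicle_category` and `fact_registration`, and the sample rows are all illustrative assumptions; the point is only that the dimension (lookup) table is refreshed before the fact (transactional) rows that reference it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_vehicle_category (
    category_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE
);
CREATE TABLE fact_registration (
    plate TEXT PRIMARY KEY,
    category_id INTEGER REFERENCES dim_vehicle_category(category_id),
    registered_on TEXT
);
""")


def upload_batch(conn, staged_rows):
    """Apply one periodic upload: update the dimension first, then the fact table."""
    for plate, category, registered_on in staged_rows:
        # New lookup values are added once; existing ones are left untouched.
        conn.execute(
            "INSERT OR IGNORE INTO dim_vehicle_category (name) VALUES (?)", (category,)
        )
        (category_id,) = conn.execute(
            "SELECT category_id FROM dim_vehicle_category WHERE name = ?", (category,)
        ).fetchone()
        # A re-sent plate replaces the earlier fact row instead of duplicating it.
        conn.execute(
            "INSERT OR REPLACE INTO fact_registration VALUES (?, ?, ?)",
            (plate, category_id, registered_on),
        )
    conn.commit()


upload_batch(conn, [("AA-1", "Truck", "2015-03-02"), ("AA-2", "Bus", "2015-03-15")])
upload_batch(conn, [("AA-3", "Truck", "2015-04-01")])
```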

4. Business Intelligence & Analysis Layer
Once the data has been brought out of the ministerial systems, aggregated in the Staging Server, cleansed and validated, and transformed into specific Data Marts, the next step is to draw information or intelligence from these Data Marts to cater to various end purposes. This penultimate step in the process is carried out by the Business Intelligence and Analysis layer, which processes the Data Marts into cleanly packaged information sets for final consumption.

The three categories of information that are the outcome of this process are as below:
a. Transactional Datamart Views
Transactional Datamart Views provide transactional data views for access by e-governance services and for inter/intra-ministerial uses: for example, a Census database with details on each citizen, or passport details for each citizen including all issuance, renewal and cancellation details. This transaction data is the most granular data and can be used by core e-Services for the purposes of validation and data lookup. For example, an e-Service requiring proof of identity can simply send the Passport Number (as stated by the citizen) to this Datamart to extract all other identification information stored for that particular Passport Number, such as Address, City, exact First Name/Last Name etc.

b. Summary Dataset Views
Summary Dataset Views are simply business reports derived from the core transaction data, which can be used for purposes like Open Data, Planning, National State/Progress visibility etc.
As an example, Transactional Data for Vehicle Registrations in Ethiopia could include core details of all vehicles registered in Ethiopia with date/time, vehicle, owner and location details.

Summary Data in this case could include, amongst others:

• Statistics on the number of registrations carried out in Ethiopia on a monthly basis for specific vehicle categories.

• Statistics on the number of vehicles of a particular category, older than a particular age, that should be on the streets of Ethiopia now, for purposes of pollution control and road safety.
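The two summary statistics above could be derived from transactional rows roughly as follows. The record layout and sample values are invented, and vehicle age is approximated here by the registration date, which is an assumption rather than something the design specifies.

```python
from collections import Counter
from datetime import date

# Invented transactional rows: one per registered vehicle.
registrations = [
    {"plate": "AA-1", "category": "Truck", "registered_on": "2003-06-10"},
    {"plate": "AA-2", "category": "Truck", "registered_on": "2015-03-02"},
    {"plate": "AA-3", "category": "Bus", "registered_on": "2015-03-15"},
    {"plate": "AA-4", "category": "Truck", "registered_on": "2015-03-20"},
]


def monthly_registrations(rows):
    """Count registrations per (year-month, vehicle category)."""
    return Counter((row["registered_on"][:7], row["category"]) for row in rows)


def vehicles_older_than(rows, category, min_age_years, today):
    """Count vehicles of one category registered more than min_age_years ago."""
    # ISO dates compare correctly as plain strings.
    cutoff = f"{today.year - min_age_years:04d}-{today.month:02d}-{today.day:02d}"
    return sum(
        1 for row in rows
        if row["category"] == category and row["registered_on"] < cutoff
    )


monthly = monthly_registrations(registrations)
old_trucks = vehicles_older_than(registrations, "Truck", 10, date(2015, 4, 30))
```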

c. Controlled Vocabulary Services

Controlled Vocabulary Services are an important by-product of the ENDS initiative. They draw on the Dimension / Lookup values built up in the Datamarts, which are important for common National Controlled Vocabulary services.

For example:
• Cities in Ethiopia
• Towns in Ethiopia
• List of PIN Codes in Ethiopia
• List of Telecom Operators in Ethiopia
• List of Hospitals in Ethiopia
• List of Regional Transport Offices in Ethiopia
• List of Medical Specializations currently held by Ethiopian doctors.

These lookup values, or controlled vocabularies, are invaluable in their own right, as they provide a single point of access to key business information for third party validation purposes.

5. Data Access Layer

The final component of the ENDS technical architecture provides consumers with access to this aggregated, validated, packaged and analyzed data. Consumers of ENDS data are Ministries, Agencies, e-Services and Citizens.

The Access layer can be broadly broken down into the following two channels:
a. Web Service / APIs

A key data access channel to the Data Sets and Data Marts is provided by a comprehensive Web Service Layer. Web Services are software systems designed to support interoperable machine-to-machine interaction, where the consumer is decoupled from the supplier of information. This means that e-Services and other systems which depend on the central ENDS Datamarts and Datasets get access to this information via a simple Request/Response (Question/Answer) pattern without any online integration.

APIs allow data access not only by e-Service applications but also by mobile and web applications developed for information dissemination to the public. For example, this can power online mobile apps which can help a citizen with:
• Keying in a Vehicle Registration number and finding out the owner details instantly, with all known traffic offences.
• Keying in a Business Name and pulling out the entire registration details for the business, including the Business Owner, Paid-Up Capital, Date of Registration, Partners etc.
• Instantly accessing land registration details for a place by simply passing the geo co-ordinates (via GPS enabled smartphones) to the central Data Mart via the provided APIs.
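The Request/Response pattern behind such lookups can be illustrated with a small handler. The JSON field names, the in-memory `VEHICLE_DATAMART` dictionary and its contents are hypothetical stand-ins for a real web service and Datamart; no actual ENDS API is implied.

```python
import json

# Hypothetical stand-in for a transactional Datamart view (invented record).
VEHICLE_DATAMART = {
    "AA-12345": {
        "owner": "Sample Owner",
        "category": "Truck",
        "offences": ["Speeding, 2015-02-11"],
    },
}


def handle_request(request_body):
    """Answer one lookup request; the caller never touches the Datamart directly."""
    request = json.loads(request_body)
    record = VEHICLE_DATAMART.get(request.get("registration_number"))
    if record is None:
        return json.dumps({"status": "not_found"})
    return json.dumps({"status": "ok", "vehicle": record})


# A consumer sends a question and receives an answer, with no shared database.
found = json.loads(handle_request('{"registration_number": "AA-12345"}'))
missing = json.loads(handle_request('{"registration_number": "ZZ-00000"}'))
```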

b. Portals

Data Access portals will be provided to act as a window to all Data Views, where consumers can search and download various data cuts for a specific period in open data formats like Excel, CSV etc.
Portals provide the following information, catalogued by Subject Area, Source Ministry, Target Application and Users:
• Data Catalogues for View and Download
• Data Visualizations (Graphs/Trends etc)

Examples of this include the Open Data Portals of Singapore (http://www.data.gov.sg), India and the US, amongst others.


Figure 6: Master High Level Datamart/Dataset Architecture


5.4 ETL (Extract – Transform – Load)

Facilitating the ENDS Dataset Infrastructure depends on loading the ENDS Core data warehouse regularly so that it can serve its purpose of business analysis, data integration and sharing. To do this, data from one or more operational systems needs to be extracted and copied into the data warehouse. The challenge in data warehouse environments is to integrate, rearrange and consolidate large volumes of data over many systems, thereby providing a new unified information base for business intelligence and e-services.

The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for extraction, transformation, and loading. It should be noted that ETL refers to a broad process, and not merely to three well-defined steps. The methodology and tasks of ETL have been well known for many years, and are not necessarily unique to data warehouse environments: a wide variety of proprietary applications and database systems form the IT backbone of any enterprise, and data has to be shared between these applications or systems in order to integrate them, so that at least two applications present a unified picture to the outside world. This data sharing was mostly addressed by mechanisms similar to what we now call ETL.

During extraction, the desired data is identified and extracted from many different sources, including database systems and applications. Very often it is not possible to identify the specific subset of interest, so more data than necessary has to be extracted and the relevant data identified at a later point in time. Depending on the source system's capabilities (for example, operating system resources), some transformations may take place during this extraction process. The size of the extracted data varies from hundreds of kilobytes up to gigabytes, depending on the source system and the business situation. The same is true for the time delta between two (logically) identical extractions: the time span may vary from days or hours down to minutes or near real-time. Web server log files, for example, can easily grow to hundreds of megabytes in a very short period of time.

5.4.1 Extraction

There are various types of logical data extraction methods which may be
employed in extraction of the Ministry Systems data as a part of the operations of the
ENDS:

A. Full Extraction
Full extraction is used when the data needs to be extracted and loaded for the first time.
In full extraction, the data from the source is extracted completely. This extraction
reflects the current data available in the source system.

B. Incremental Extraction
In incremental extraction, only the changes in the source data since the last successful extraction are extracted and loaded, so these changes need to be tracked. They can be detected from a last-changed timestamp on the source data. Alternatively, a change table can be created in the source system to keep track of the changes in the source data. One more method is to extract the complete source data and then compute the difference (a minus operation) between the current extraction and the last extraction. This last approach can cause performance problems, because the complete source data must be extracted and compared on every run.
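Two of the change-detection methods described above, the last-changed timestamp and the minus operation, can be sketched as follows. The `last_changed` field name and the sample rows are assumptions; note that the minus operation as written finds new and changed rows but not deletions.

```python
def extract_by_timestamp(source_rows, last_extracted_at):
    """Rows whose last-changed timestamp is later than the previous successful extraction."""
    # ISO 8601 timestamps compare correctly as plain strings.
    return [row for row in source_rows if row["last_changed"] > last_extracted_at]


def extract_by_difference(current_rows, previous_rows):
    """Minus operation: rows in the current full extract that were not in the previous one."""
    previous = {tuple(sorted(row.items())) for row in previous_rows}
    return [row for row in current_rows if tuple(sorted(row.items())) not in previous]


# Invented sample extracts from two consecutive runs.
previous_extract = [
    {"id": 1, "name": "A", "last_changed": "2015-03-01T09:00:00"},
    {"id": 2, "name": "B", "last_changed": "2015-03-05T10:00:00"},
]
current_extract = [
    {"id": 1, "name": "A", "last_changed": "2015-03-01T09:00:00"},
    {"id": 2, "name": "B2", "last_changed": "2015-04-02T08:30:00"},
    {"id": 3, "name": "C", "last_changed": "2015-04-03T11:00:00"},
]

changed_by_timestamp = extract_by_timestamp(current_extract, "2015-03-31T00:00:00")
changed_by_difference = extract_by_difference(current_extract, previous_extract)
```

Both calls select the rows with `id` 2 and 3; the difference method needed both full extracts to do so, which is the cost the text warns about.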

C. Online Extraction
In online extraction the data is extracted directly from the source system. The extraction process connects to the source system and extracts the source data. This extraction mechanism requires significant system integration and may necessitate the development of custom EDIs to access the various systems in the ecosystem. From an ENDS perspective, most constituent systems are disconnected and have been configured for use in independent applications. Online connectivity for these systems for EDI may be relatively expensive to develop and maintain.

D. Offline Extraction
The data from the source system is dumped outside of the source system into a flat file, and this flat file is used to extract the data. The flat file can be created by a routine daily process. The advantage of this method is that it is source system agnostic. Each source data system simply needs to export data in its native export format, which is then used for offline extraction and loading into the Staging Database and downstream Datamart systems.

The EDI logic and cost are therefore borne at the server side and not at the client application side. This is ideal for ENDS considering the current system ecosystem at the various ministries and agencies. The most common method for transporting data is the transfer of flat files, using mechanisms such as FTP or other remote file system access protocols. Data is unloaded or exported from the source system into flat files and is then transported to the target platform using FTP or similar mechanisms. Because source systems and data warehouses often use different operating systems and database systems, using flat files is often the simplest way to exchange data between heterogeneous systems with minimal transformations. However, even when transporting data between homogeneous systems, flat files are often the most efficient and most easy-to-manage mechanism for data transfer. Therefore, offline extraction of data may be a more suitable option for ENDS.

5.4.2 Extraction Mode (Offline/Online)


Online Extraction – Not Recommended for ENDS
In online extraction the data is extracted directly from the source system. The extraction process connects to the source system and extracts the source data. This extraction mechanism requires significant system integration and may necessitate the development of custom EDIs to access the various systems in the ecosystem.

From an ENDS perspective, most constituent data systems currently in use in Ministries are highly localized, with a low to moderate level of inter-connectivity, and have been configured for use in independent applications. Online connectivity for these systems for EDI would be significantly expensive to develop and maintain and is hence not recommended for use in the ENDS architecture.

Offline Extraction – The Recommended Choice for ENDS
The data from the source system is dumped outside of the source system into a flat file, and this flat file is used to extract the data. The flat file can be created by a routine daily process. The advantage of this method is that it is source system agnostic. Each source data system simply needs to export data in its native export format, which is then used for offline extraction and loading into the Staging Database and downstream Datamart systems. The EDI logic and cost are therefore borne at the server side and not at the client application side. This is ideal for ENDS considering the current system ecosystem at the various ministries and agencies.

The most common method for transporting data is the transfer of flat files, using mechanisms such as FTP or other remote file system access protocols. Data is unloaded or exported from the source system into flat files and is then transported to the target platform using FTP or similar mechanisms. Because source systems and data warehouses often use different operating systems and database systems, using flat files is often the simplest way to exchange data between heterogeneous systems with minimal transformations. However, even when transporting data between homogeneous systems, flat files are often the most efficient and most easy-to-manage mechanism for data transfer.


5.5 Decoupled Source File Ingestion


The Architecture for ENDS will be based on a Decoupled Source File Ingestion
methodology. This means that all input Source Files for relevant Datasets will be
extracted locally in each ministry and then transferred into a common FTP destination
for further downstream processing.

5.6 FTP Mapping


The FTP Mapping Schema specifies, for each Data File, the Ministerial System it comes
from, the FTP folder to which it is transferred upon extraction, and the file name under
which the File Extract Operation picks it up for downstream processing. A typical FTP
Mapping may look like the following:

Table 8: Example of FTP Mapping


Ministry                    Ministry of Education
Target Datamart             National Schools Database
Application                 Schools Management System, MoE
File                        NSD_Weekly_Incremental_20150401
Destination FTP Location    //ENDS/MoE/NSD/
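The mapping in Table 8 could be held as a simple record, as in the sketch below. The class
name `FtpMapping`, its field names, and the rule that a file name is a fixed prefix followed
by a `YYYYMMDD` date are assumptions inferred from the single example above, not part of the
ENDS specification.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FtpMapping:
    ministry: str
    target_datamart: str
    application: str
    file_prefix: str   # assumed: the file name without its date suffix
    destination: str   # FTP folder, as in Table 8

    def file_name(self, extract_date: date) -> str:
        """Build the expected file name for one extraction date."""
        return f"{self.file_prefix}_{extract_date:%Y%m%d}"

    def destination_path(self, extract_date: date) -> str:
        """Full FTP path at which the File Extract Operation looks for the file."""
        return f"{self.destination.rstrip('/')}/{self.file_name(extract_date)}"


nsd_mapping = FtpMapping(
    ministry="Ministry of Education",
    target_datamart="National Schools Database",
    application="Schools Management System, MoE",
    file_prefix="NSD_Weekly_Incremental",
    destination="//ENDS/MoE/NSD/",
)
```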


5.7 Recommended Extraction Methodology for ENDS

Having evaluated the benefits of the various available methodologies as outlined
above, the recommended Data Extraction methodology for ENDS is an offline Data
Extraction with Decoupled Source File Ingestion over FTP.

Figure 7: Proposed Offline Decoupled FTP File Based Extraction Schematic for ENDS

5.7.1 Flat File Consistency Check


On receipt of any Data File from any Ministry, an automated File Consistency
Check is performed to ensure that the File is correct and relevant for further processing
and loading into the Staging Server for Transformation, as shown in Table 9 below.


Table 9: Flat File Consistency Check

File Name Nomenclature      Check that the File Name is consistent with the FTP Location
to FTP Folder               in which the file is found.

Expected Columns            The Columns contained in the File are matched to those
                            expected in this particular ETL mapping. If any column does
                            not match, or an expected column is missing, then the file is
                            not considered for downstream processing.

Expected File Size          The File Size is checked against the expected size for the
                            relevant ETL. Typically Input Data Files should be mapped to
                            expected file sizes. If the size does not match, then the file
                            is not considered for downstream processing.

Data Type Mismatches        Check the Data type of each of the fields contained in the
                            Input File to confirm that it is of the expected data type
                            (Text, Number, Alphanumeric String, Date, Date-Time etc.)

Row Count Mismatches        Many data files are expected to have either a specific number
                            of rows or a row count within an expected range. If the row
                            count does not match, the input Data File will not be accepted
                            for downstream processing.

Date Relevance              The Data contained in the File should be relevant to the
                            Date-Time window expected.
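The checks in Table 9 can be sketched as a single function that returns the list of failed
checks. This is an illustration under stated assumptions: the `spec` dictionary, its keys,
the column names and the sample files are all hypothetical, the file is assumed to be CSV
with a header row, and the data type check is shown for the date column only.

```python
import csv
import io
from datetime import date, datetime


def check_flat_file(name, text, spec):
    """Return the names of the failed checks; an empty list means 'accept'."""
    failures = []
    if not name.startswith(spec["file_prefix"]):
        failures.append("file name")
    low, high = spec["size_range"]
    if not low <= len(text.encode("utf-8")) <= high:
        failures.append("file size")
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    if list(reader.fieldnames or []) != spec["columns"]:
        failures.append("expected columns")
        return failures  # the remaining checks rely on the column layout
    low, high = spec["row_range"]
    if not low <= len(rows) <= high:
        failures.append("row count")
    start, end = spec["date_window"]
    for row in rows:
        try:
            value = datetime.strptime(row[spec["date_column"]], "%Y%m%d").date()
        except (TypeError, ValueError):
            failures.append("data type")
            break
        if not start <= value <= end:
            failures.append("date relevance")
            break
    return failures


# Hypothetical expectations for one weekly incremental file.
spec = {
    "file_prefix": "NSD_Weekly_Incremental_",
    "columns": ["school_id", "enrolment", "report_date"],
    "size_range": (1, 10_000),          # bytes
    "row_range": (1, 1000),
    "date_column": "report_date",
    "date_window": (date(2015, 3, 25), date(2015, 4, 1)),
}
good = "school_id,enrolment,report_date\n1,420,20150330\n2,515,20150331\n"
stale = "school_id,enrolment,report_date\n1,420,20140101\n"
```

A file is passed to the Staging Server only when the returned list is empty; any other
result names the check that caused the rejection.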


Fig 8: Proposed File Consistency Check Schematic for ENDS

5.7.2 Extract to Staging Server for Data Cleansing


The files that have passed the consistency validation are then loaded to a
Staging Server for Data Cleansing & Validation. Each file is typically loaded
into a separate table where local cleansing and validation are carried out. The main
steps to be followed in the process of Data Cleansing are given in the table below:

Table 10: Data Cleansing Steps


Unique Column Check         Check for Unique Data Columns and ensure that there is no
                            data column duplication.

Duplicate Rows              The Data loaded is to be checked for the presence of
                            duplicate rows.

Data Type Mismatches        Check for Data Type mismatches with Column Data definition
                            reference.

Row Count Mismatches        Check for Row Count mismatches, i.e. validate the number of
                            Rows in the Input file against that expected.

Date Relevance              Ensure that the date values for input data fall within the
                            expected range.

Look Up Value Validation    Validate Look Up values in the Input Data with Master Look Up
                            tables in the Datamart, e.g. City Name, Institution Names etc.
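Three of the cleansing steps in Table 10 can be sketched with plain Python as below. The
function names, the sample rows and the master city list are hypothetical and serve only
to illustrate the checks.

```python
from collections import Counter


def find_duplicate_columns(header):
    """Return column names that appear more than once in the header."""
    counts = Counter(header)
    return [name for name, n in counts.items() if n > 1]


def find_duplicate_rows(rows):
    """Return each row (as a tuple) that occurs more than once."""
    counts = Counter(tuple(row) for row in rows)
    return [row for row, n in counts.items() if n > 1]


def find_unknown_lookup_values(rows, column_index, master_values):
    """Return values in one column that are absent from the master look-up list."""
    return sorted({row[column_index] for row in rows} - set(master_values))


# Hypothetical staged rows and a hypothetical master look-up list of cities.
header = ["school_id", "name", "city"]
staged_rows = [
    (1, "School A", "Addis Ababa"),
    (2, "School B", "Adama"),
    (1, "School A", "Addis Ababa"),   # exact duplicate of the first row
    (3, "School C", "Adis Abeba"),    # spelling not present in the master list
]
master_cities = {"Addis Ababa", "Adama", "Hawassa"}

duplicate_columns = find_duplicate_columns(header)
duplicate_rows = find_duplicate_rows(staged_rows)
unknown_cities = find_unknown_lookup_values(staged_rows, 2, master_cities)
```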

5.8 Data Transformation & Loading to Datamart


The data transformation logic for most data warehouses consists of multiple
steps. For example, in transforming new records to be inserted into a sales table, there
may be separate logical transformation steps to validate each dimension key. A
common strategy is to implement each transformation as a separate SQL operation and
to create a separate, temporary staging table (such as the tables new_sales_step1 and
new_sales_step2) to store the incremental results of each step. This load-then-transform
strategy also provides a natural checkpointing scheme for the entire
transformation process, which enables the process to be more easily monitored and
restarted. However, a disadvantage of multistaging is that the space and time
requirements increase. This is shown in Figure 9 below:

Figure 9: Multistage Data Transformation


It may also be possible to combine many simple logical transformations into a
single SQL statement or single stored procedure. Doing so may provide better
performance than performing each step independently, but it may also introduce
difficulties in modifying, adding, or dropping individual transformations, as well as
recovering from failed transformations.
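The two strategies can be compared in a small sketch. It assumes SQLite purely for
illustration; the staging table names new_sales_step1 and new_sales_step2 come from the text
above, while the `product`, `store` and `new_sales` tables and their sample rows are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, upc_code TEXT);
    CREATE TABLE store (store_id INTEGER PRIMARY KEY);
    CREATE TABLE new_sales (upc_code TEXT, store_id INTEGER, amount REAL);
    INSERT INTO product VALUES (1, '0001'), (2, '0002');
    INSERT INTO store VALUES (10), (20);
    INSERT INTO new_sales VALUES
        ('0001', 10, 5.0), ('0002', 99, 7.5), ('9999', 20, 3.0);
""")

# Step 1: keep only rows whose product key validates.
conn.execute("""
    CREATE TEMP TABLE new_sales_step1 AS
    SELECT p.product_id, s.store_id, s.amount
    FROM new_sales s JOIN product p ON p.upc_code = s.upc_code
""")
# Step 2: of those, keep only rows whose store key validates.
conn.execute("""
    CREATE TEMP TABLE new_sales_step2 AS
    SELECT t.* FROM new_sales_step1 t JOIN store st ON st.store_id = t.store_id
""")


def count_rows(connection, table):
    """Row count of a fixed, trusted table name."""
    return connection.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Each staging table is a checkpoint that can be inspected or restarted from.
step_counts = []
for name in ("new_sales", "new_sales_step1", "new_sales_step2"):
    step_counts.append(count_rows(conn, name))

# The same validation combined into one statement: no checkpoints, fewer tables.
combined_count = conn.execute("""
    SELECT COUNT(*) FROM new_sales s
    JOIN product p ON p.upc_code = s.upc_code
    JOIN store st ON st.store_id = s.store_id
""").fetchone()[0]
```

Both routes end with the same surviving rows; the staged route additionally shows how many
rows each step removed, at the cost of extra storage.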

5.8.1 Error Logging and Handling Mechanisms

Having data that is not clean is very common when loading and transforming
data, especially when dealing with data coming from a variety of sources, including
external ones. If this dirty data causes you to abort a long-running load or
transformation operation, a lot of time and resources will be wasted.

5.8.2 Business Rule Violations

Data that is logically not clean violates business rules that are known prior to any
data consumption. Most of the time, handling this kind of error is incorporated
into the loading or transformation process. However, where identifying the errors for
all records would become too expensive and the business rule can be enforced as a data
rule violation (for example, testing hundreds of columns to see if they are NOT NULL),
programmers often choose to handle even known possible logical error cases more
generically. Incorporating logical rules can be as easy as applying filter
conditions on the data input stream or as complex as feeding the dirty data into a
different transformation workflow. Some examples are as follows:

• Filtering of logical data errors using SQL. Data that does not adhere to certain
conditions is filtered out prior to being processed.
• Identifying and separating logical data errors. In simple cases, this can be
accomplished using SQL.
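Both approaches listed above can be sketched with plain SQL. The example assumes SQLite, and
the table names and the business rule (enrolment must be present and non-negative) are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_enrolment (school_id INTEGER, enrolment INTEGER);
    INSERT INTO staging_enrolment VALUES (1, 420), (2, -5), (3, NULL), (4, 515);
    CREATE TABLE clean_enrolment (school_id INTEGER, enrolment INTEGER);
    CREATE TABLE rejected_enrolment (school_id INTEGER, enrolment INTEGER, reason TEXT);
""")

# Hypothetical business rule: enrolment must be present and non-negative.
RULE = "enrolment IS NOT NULL AND enrolment >= 0"

# Filtering: only rows that satisfy the rule move on.
conn.execute(
    f"INSERT INTO clean_enrolment SELECT * FROM staging_enrolment WHERE {RULE}"
)
# Separating: rows that violate the rule are kept aside for review.
conn.execute(
    "INSERT INTO rejected_enrolment "
    "SELECT school_id, enrolment, 'invalid enrolment' "
    f"FROM staging_enrolment WHERE NOT ({RULE})"
)

clean_ids = [
    row[0] for row in conn.execute("SELECT school_id FROM clean_enrolment ORDER BY school_id")
]
rejected_ids = [
    row[0] for row in conn.execute("SELECT school_id FROM rejected_enrolment ORDER BY school_id")
]
```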


5.8.3 Data Rule Violations (Data Errors)

Unlike logical errors, data rule violations are not usually anticipated by the load or
transformation process. Such unexpected data rule violations (also known as data
errors), if not handled by the operation, cause the operation to fail. Data rule
violations are error conditions that happen inside the database and cause a statement
to fail. Examples are data type conversion errors and constraint violations.
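One way to keep a long-running load from aborting on such errors is to catch the failure
per row and record it in an error table. The sketch below assumes SQLite and Python's
sqlite3 module; the `schools` and `load_errors` tables and the incoming rows are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (
        school_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE load_errors (row_data TEXT, error TEXT);
""")

# Hypothetical batch: one row breaks NOT NULL, one repeats a primary key.
incoming = [(1, "School A"), (2, None), (1, "School A again"), (3, "School C")]

loaded = 0
for row in incoming:
    try:
        conn.execute("INSERT INTO schools VALUES (?, ?)", row)
        loaded += 1
    except sqlite3.IntegrityError as exc:
        # Record the failing row and carry on with the rest of the batch.
        conn.execute("INSERT INTO load_errors VALUES (?, ?)", (repr(row), str(exc)))

error_count = conn.execute("SELECT COUNT(*) FROM load_errors").fetchone()[0]
```

Row-by-row handling is slower than a bulk insert, so in practice it is a trade-off between
load speed and the ability to isolate the offending records.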

5.8.4 Key Lookup Scenario

A typical transformation is the key lookup. For example, suppose that sales
transaction data has been loaded into a retail data warehouse. Although the data
warehouse's sales table contains a product_id column, the sales transaction data
extracted from the source system contains Universal Product Codes (UPC) instead of
product IDs. Therefore, it is necessary to transform the UPC codes into product IDs
before the new sales transaction data can be inserted into the sales table. In order to
execute this transformation, a lookup table must relate the product_id values to the UPC
codes. This table might be the product dimension table, or perhaps another table in the
data warehouse that has been created specifically to support this transformation. For
this example, we assume that there is a table named product, which has a product_id
and an upc_code column.
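The lookup can be expressed as a join between the incoming data and the product table, as in
the sketch below. SQLite is assumed for illustration; product, product_id and upc_code come
from the text, while the `new_sales` staging table, the quantity column and the sample codes
are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, upc_code TEXT UNIQUE);
    CREATE TABLE new_sales (upc_code TEXT, quantity INTEGER);
    CREATE TABLE sales (product_id INTEGER, quantity INTEGER);
    INSERT INTO product VALUES (101, '0001'), (102, '0002');
    INSERT INTO new_sales VALUES ('0001', 3), ('0002', 1), ('9999', 9);
""")

# Key lookup: replace each UPC code with its product_id while loading.
conn.execute("""
    INSERT INTO sales (product_id, quantity)
    SELECT p.product_id, n.quantity
    FROM new_sales n JOIN product p ON p.upc_code = n.upc_code
""")

# Rows whose UPC code has no match are reported instead of silently dropped.
unmatched = conn.execute("""
    SELECT n.upc_code FROM new_sales n
    LEFT JOIN product p ON p.upc_code = n.upc_code
    WHERE p.product_id IS NULL
""").fetchall()

loaded = conn.execute(
    "SELECT product_id, quantity FROM sales ORDER BY product_id"
).fetchall()
```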

5.8.5 Data Partitioning


Data warehouses often contain large tables and require techniques both for
managing these large tables and for providing good query performance across them.
Partitioning splits this data on the basis of key parameters, which makes the data
easier to manage. Partitioned tables and indexes also facilitate
administrative operations by enabling these operations to work on subsets of data.


Finally, partitioning data greatly improves manageability of very large databases
and dramatically reduces the time required for administrative tasks such as backup and
restore. Different types of Partitioning include Range Partitioning, Hash Partitioning, List
Partitioning & Composite Partitioning.

For the purposes of ENDS, the recommended Partitioning Option is Horizontal
Range Partitioning based on Date Range. Candidates for this are large data tables,
tables that can be expected to grow very large in the near future, and tables that can be
intuitively period partitioned based on their business value (e.g., by fiscal year/month
etc.)
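The idea of date-range partitioning can be illustrated independently of any database
product. The sketch below assumes monthly partitions and a hypothetical naming convention
(`<table>_pYYYYMM`); an actual deployment would use the range partitioning syntax of the
chosen database system instead.

```python
from collections import defaultdict
from datetime import date


def partition_name(table, record_date):
    """Name of the monthly range partition that holds `record_date`."""
    return f"{table}_p{record_date:%Y%m}"


def partition_bounds(year, month):
    """Half-open [start, end) date range covered by one monthly partition."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def route_rows(table, rows, date_index):
    """Group rows by the partition that their date column falls into."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_name(table, row[date_index])].append(row)
    return dict(partitions)


# Hypothetical export records: (commodity, export date).
export_rows = [
    ("coffee", date(2015, 3, 30)),
    ("sesame", date(2015, 4, 1)),
    ("coffee", date(2015, 4, 15)),
]
routed = route_rows("agri_export", export_rows, 1)
```

Because each partition covers a closed period, administrative tasks such as backup or
archival can then be applied to one month at a time.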


6.0 Metadata
Metadata has been identified as a key success factor in data warehouse
projects. It captures all kinds of information necessary to extract, transform and load
data from source systems into the data warehouse, and afterwards to use and interpret
the data warehouse contents. Metadata can be broadly categorized into three
categories:

1. Business Metadata - It has the data ownership information, business
definition, and changing policies.
2. Technical Metadata - It includes database system names, table and column
names and sizes, data types and allowed values. Technical metadata also
includes structural information such as primary and foreign key attributes and
indices.
3. Operational Metadata - It includes currency of data and data lineage.
Currency of data means whether the data is active, archived, or purged.
Lineage of data means the history of the data's migration and of the transformations
applied to it.

6.1 Dataset Metadata


The Dataset Metadata describes the structure of a Dataset, what it is meant to
describe, its uses, ownership information and classification, amongst other information
that makes the Dataset identifiable, readable and understandable by both Humans
and Machines.

Table 11: Dataset Metadata

Dataset ID                  Unique ID for Dataset

Dataset Title               A unique name of the resource viz. Consumer Price Index for
                            <Month/Year>, Variety-wise Daily Market Prices Data,
                            State-wise Construction of Deep Tube wells over the years, etc.

Dataset Description         Description of Dataset

Dataset Sector              Classification of Dataset

Dataset Sub-Sector          Sub Classification of Dataset

Number of Data Element      Number
Fields

Dataset Availability        Date from which data for this Dataset is available.
StartDate

Update Frequency            Frequency of Update of Dataset (Monthly/Weekly/Quarterly)

Dataset Owner/Source        Agency/Ministry

Dataset Version             To be specified

Dataset Version Release     To be specified
Date

Dataset Last Update         Source of Data – FK to Data Source Table

Dataset Keyword             A list of terms, separated by commas, describing and
                            indicating the content of the catalog. Example: rainfall,
                            weather, monthly statistics

Reference URL               This may include a description of the study design,
                            instrumentation, implementation, limitations, and appropriate
                            use of the Dataset or tool. In the case of multiple documents
                            or URLs, please delimit with commas or enter in separate lines.

Access Type                 Open, Priced, Registered Access or Restricted Access (G2G)
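To show how Table 11 could be made machine readable, the sketch below models a subset of
its fields as a record that serialises to JSON. The class name, the field names, the
identifier `DS-0001` and every sample value are hypothetical; only the allowed values for
Update Frequency and Access Type are taken from the table.

```python
import json
from dataclasses import asdict, dataclass, field

UPDATE_FREQUENCIES = {"Monthly", "Weekly", "Quarterly"}
ACCESS_TYPES = {"Open", "Priced", "Registered Access", "Restricted Access (G2G)"}


@dataclass
class DatasetMetadata:
    dataset_id: str
    title: str
    description: str
    sector: str
    sub_sector: str
    number_of_fields: int
    availability_start_date: str   # ISO 8601 date string
    update_frequency: str
    owner: str
    access_type: str
    keywords: list = field(default_factory=list)

    def __post_init__(self):
        # Reject values outside the lists given in Table 11.
        if self.update_frequency not in UPDATE_FREQUENCIES:
            raise ValueError(f"unknown update frequency: {self.update_frequency}")
        if self.access_type not in ACCESS_TYPES:
            raise ValueError(f"unknown access type: {self.access_type}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


# Illustrative record only; none of these values come from an actual ENDS Dataset.
rainfall = DatasetMetadata(
    dataset_id="DS-0001",
    title="Monthly Rainfall Statistics",
    description="Illustrative example record",
    sector="Environment",
    sub_sector="Weather",
    number_of_fields=4,
    availability_start_date="2015-01-01",
    update_frequency="Monthly",
    owner="Agency/Ministry",
    access_type="Open",
    keywords=["rainfall", "weather", "monthly statistics"],
)
rainfall_json = rainfall.to_json()
```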


6.2 Data Element Metadata Definition

The Data Element Metadata describes the nomenclature, data type and various
descriptors for constituent Data Elements within a Dataset. These rules are used for
Machine Reading and serving of Datasets.

Table 12: Data Element Metadata Definition


Data Element Description    Description of Data Element

Data Identifier Code        A code that uniquely tracks this Data Element.

DatasetID                   Link/Foreign Key to Dataset ID to which this Data Element
                            belongs.

Data Type                   Char/Int/Varchar

Field Size                  Size of Field
                            E.g. 2/4/6/10/30

Layout                      XXXXXX
                            AA/AAA/AAAA
                            AAANNNNA
                            Any
                            NNN.NNN
                            CCYYMMDD – Date in ISO 8601 format to day level

Unit of Measure             NA

Primary Key in Dataset      No/Yes/Composite

Guide for Use               Guidelines for use of this Data Element

Mandatory                   Yes/No

Code LookUp Table*          Custom Table for Look-Up values for this data element.
                            Can include static values or code values

Verification Rules*         To be specified

Validation Rules*           To be specified

Data Element Source ID*     Source of Data – FK to Data Source Table


6.3 Data Source Metadata Definition


The Data Source Metadata describes the source for each Dataset in terms of the
Ministerial Systems from which the core data for that particular Dataset is derived.

Table 13: Data Source Metadata Definition


DataSourceID                Unique Identifier of Data Source
Data Source Type            System/Application/Others [List]
Data Source ID              Identifier for Data Source
Source Organization ID      ID of Organization which owns this Source.
Data Source Details         Details on Data Source
Data Source Version         Version of Data Source

6.4 Organization Reference Metadata Table


The Organization Reference Metadata describes the Organizational
Stakeholders for each Dataset. Organizations in this context refer to the Ethiopian
Ministries and Agencies that are the source and custodians of such Datasets.

Table 14: Organization Reference Metadata

Org Ref ID                  Unique ID for Organization + Organization Unit who own or are
                            responsible for a Dataset

Organization Code           Code of the Primary Owner Organization of Dataset

Organization Name           Name of the Primary Owner Organization of Dataset

Organization Unit Code      Code of the Primary Owner Organization of Dataset
*Optional

Organization Unit Name      Name of the Primary Owner Organization of Dataset
*Optional


6.5 Look Up / Reference Tables


Lookup and Reference Tables contain common look-up values that are shared across
all the Datasets in the ENDS system. These Lookup tables have a high cross-referential
value as they contain a cleansed list of possible values in any Dataset. These Look Up
tables also form the basis for Common Vocabulary Services.

Table 15: Look up / Reference Tables

Administrative Status

Table Reference ID          A code that uniquely identifies the data element. If the data
                            element is used in more than one collection, it should retain
                            its Reference ID wherever it appears.

Version                     A version number for each data element. A new version number
                            is allocated to a data element/concept when changes have been
                            made to one or more of the following attributes of the
                            definition: Name / Definition / Data domain, e.g. adding a new
                            value to the field.
                            Elements with frequently updated code tables, such as the
                            Facility code table, will not be assigned a new version for
                            changes to data domain.

Version Date

Identifying & Defining Attributes

Name                        A single or multi-word designation assigned to a data element.
                            This appears in the heading for each unique data definition in
                            the Dictionaries. Previous names for the data element are
                            included in the Guide for Use section.

Table Name in Database

Element Type                DATA ELEMENT – A unit of data for which the definition,
                            identification, representation and permissible values are
                            specified by means of a set of attributes.
                            DERIVED DATA ELEMENT – A data element whose values are derived
                            by calculation from the values of other data elements.
                            COMPOSITE DATA ELEMENT – A data element whose values represent
                            a grouping of the values of other data elements in a specified
                            order.

Definition                  A statement that expresses the essential nature of a data
                            element and its differentiation from all other data elements.

Context (Optional)          A designation or description of the application environment or
                            discipline in which a name is applied or from which it
                            originates. This attribute may also include the justification
                            for collecting the items and uses of the information.

Relational & Representational Attributes

Data Type                   The type of field in which a data element is held. For
                            example, character, integer, or numeric.

Field Size                  The maximum number of storage units (of the corresponding data
                            type) to represent the data element value. Field size does not
                            generally include characters used to mark logical separations
                            of values, e.g. commas, hyphens or slashes.

Layout                      The representational layout of characters in data element
                            values expressed by a character string representation. For
                            example:
                            - ‘CCYYMMDD’ for calendar date
                            - ‘N’ for a one-digit numeric field
                            - ‘A’ for a one-character field
                            - ‘X’ for a field that can hold either a character or a digit
                            - ‘$$$,$$$,$$$’ for data elements about expenditure.

Data Domain                 The permissible values for the data element. The set of values
                            can be listed or specified by referring to a code table or
                            code tables.

Guide For Use (Optional)    Additional comments or advice on the interpretation or
                            application of the data element (this attribute has no direct
                            counterpart in the ISO/IEC Standard 11179 but has been
                            included to assist in clarification of issues relating to the
                            classification of data elements). Includes historical
                            information, advice regarding data quality, and alternative
                            names for this data element.

Verification Rules

Related Data                A reference between the data element and any related data
                            element in the Dictionary, including the type of this
                            relationship. Examples include: ‘has been superseded by the
                            data element…’, ‘is calculated using the data element…’, and
                            ‘supplements the data element…’

Administrative Attributes

Source Document (Optional)  The document from which definitional or representational
                            attributes originate.

Source Organization         The Organization responsible for the source document and/or
                            the development of the data definition (this attribute is not
                            specified in the ISO/IEC Standard 11179 but has been added for
                            completeness). The source organization is not necessarily the
                            organization responsible for the ongoing
                            development/maintenance of the data element definition.
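The Layout attribute in Table 15 lends itself to automatic validation. The sketch below
converts a layout string into a regular expression. It rests on assumptions that the table
does not state: ‘A’ is read as a single letter, ‘X’ as a letter or digit, the date letters
C, Y, M and D are treated simply as digits, and any other character must match literally.
It checks the shape of a value only, not whether a date is a real calendar date.

```python
import re

# Assumed meaning of each layout character (see the lead-in above).
_CLASSES = {
    "N": r"\d",
    "A": r"[A-Za-z]",
    "X": r"[A-Za-z0-9]",
    "C": r"\d", "Y": r"\d", "M": r"\d", "D": r"\d",
}


def layout_to_regex(layout):
    """Translate a layout string such as 'AAANNNNA' into a compiled pattern."""
    parts = []
    for ch in layout:
        # Characters without a special meaning (e.g. '.') must match literally.
        parts.append(_CLASSES.get(ch, re.escape(ch)))
    return re.compile("".join(parts))


def matches_layout(value, layout):
    """True when the whole value has the shape described by the layout."""
    return layout_to_regex(layout).fullmatch(value) is not None
```

For example, `matches_layout("20150401", "CCYYMMDD")` accepts an eight-digit date string,
while `matches_layout("12.456", "NNN.NNN")` rejects a value with too few leading digits.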


7.0 Central Data Mart Architecture


The Central Datamart of the ENDS would be a Data Warehouse that contains
cleaned, consolidated and aggregated data from various Ministries/Agencies,
segregated into individual Datamarts, each of which contains seed data for a specific
business use. In the earlier sections of this document, 15 sets of data elements have
been identified as the priority data elements corresponding to the business needs of the
Government of Ethiopia. These Datamarts will in turn cater to the Transactional and
Summary Dataset views. Choices for the Central Datamart Warehouse include the Star
Schema, Snowflake Schema and the Fact Constellation Schema, which are the three
most widely used Architecture patterns for similar end uses. Having evaluated the special
needs of ENDS, the key considerations behind the Architecture of the Central Datamart
have been identified to be:
• Simplicity of Structure
• Maintainability for a large cross-section of Input Data Sources
• Minimum Normalization to ensure flexibility from a Data Warehousing
perspective
• Easy Readability

After detailed evaluation of the Ethiopian Ministerial IT & Data maturity, the best
architectural fit to cater to these requirements has been identified to be the Star
Schema.

7.1 Star Data Warehouse Schema


The star schema is perhaps the simplest, most maintainable and easiest to
consume data warehouse schema. It is called a star schema because the entity-relationship
diagram of this schema resembles a star, with points radiating from a
central table. The center of the star consists of a large fact table and the points of the
star are the dimension tables. A star schema is characterized by one or more very
large fact tables that contain the primary information in the data warehouse, and a
number of much smaller dimension tables (or lookup tables), each of which contains
information about the entries for a particular attribute in the fact table.

A star query is a join between a fact table and a number of dimension tables.
Each dimension table is joined to the fact table using a primary key to foreign key join,
but the dimension tables are not joined to each other. A cost-based optimizer can
recognize star queries and generate efficient execution plans for them. A typical fact
table contains keys and measures. For example, in the sample schema, the fact table,
sales, contains the measures quantity_sold, amount, and average, and the keys
time_key, item_key, branch_key, and location_key. The dimension tables are time,
branch, item and location. A star join is a primary key to foreign key join of the
dimension tables to a fact table.

Fig 10: An Example of typical Star Schema DataMart for Sales Data
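A star query over the sample sales schema can be sketched as follows. SQLite is assumed for
illustration. The key and measure names follow the text above, but the dimension tables
carry a `_dim` suffix here, the descriptive columns (year, city and so on) are hypothetical,
the `average` measure is omitted, and all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales (
        time_key INTEGER REFERENCES time_dim(time_key),
        item_key INTEGER REFERENCES item_dim(item_key),
        branch_key INTEGER REFERENCES branch_dim(branch_key),
        location_key INTEGER REFERENCES location_dim(location_key),
        quantity_sold INTEGER,
        amount REAL
    );
    INSERT INTO time_dim VALUES (1, 2014), (2, 2015);
    INSERT INTO item_dim VALUES (1, 'Item A');
    INSERT INTO branch_dim VALUES (1, 'Branch 1');
    INSERT INTO location_dim VALUES (1, 'Addis Ababa'), (2, 'Adama');
    INSERT INTO sales VALUES
        (1, 1, 1, 1, 2, 100.0),
        (2, 1, 1, 1, 1, 50.0),
        (2, 1, 1, 2, 4, 200.0),
        (2, 1, 1, 2, 1, 25.0);
""")

# Star query: the fact table joins to each dimension it needs;
# the dimension tables are never joined to each other.
sales_by_year_and_city = conn.execute("""
    SELECT t.year, l.city, SUM(s.amount)
    FROM sales s
    JOIN time_dim t ON t.time_key = s.time_key
    JOIN location_dim l ON l.location_key = s.location_key
    GROUP BY t.year, l.city
    ORDER BY t.year, l.city
""").fetchall()
```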


7.1.1 Key Advantages of the Star Schema for ENDS


The main advantages of the recommended Star Schema over other alternatives are
as follows:
• Direct mapping - It provides a direct and intuitive mapping between the business
entities being analyzed by end users and the schema design.
• Tool support - It is widely supported by a large number of business intelligence
tools, which may anticipate or even require that the data-warehouse schema contains
dimension tables.
• Query Performance - Queries run faster against a star schema database than
against an OLTP system because the star schema has a small number of tables and clear
join paths, and performance is highly optimized for typical star queries. Small
single-table queries, usually of dimension tables, are almost instantaneous. Large join
queries that involve multiple tables take only seconds or minutes to run.
• Accurate and consistent results - In a star schema design, dimensions are linked
only through the central fact table. When two dimension tables are used in a query,
only one join path, intersecting the fact table, exists between those two tables. This
design feature enforces accurate and consistent query results.
• Fewer joins - Compared with alternatives such as the Snowflake Schema, the Star
model has fewer joins between the dimension tables and the fact table. For example,
to obtain information on an advertiser, only the Advertiser dimension table needs to
be joined with the fact table.
• Load Performance and Administration - Structural simplicity reduces the
time required to load large batches of data into a star schema database. By defining
facts and dimensions and separating them into different tables, the impact of a
load operation is reduced. Dimension tables can be populated once and
occasionally refreshed. New facts can be added regularly and selectively by
appending records to a fact table.
• Simpler ETL - The Star model loads dimension tables without dependencies between
dimensions, and hence the ETL job is simpler and can achieve higher
parallelism.
• Built-in Referential Integrity - A star schema is designed to enforce referential
integrity of loaded data. Referential integrity is enforced because each record in a
dimension table has a unique primary key, and all keys in the fact tables are
legitimate foreign keys drawn from the dimension tables. A record in the fact
table that is not related correctly to a dimension cannot be given the correct key
value to be retrieved.
• Easily Understood and Efficient to Navigate - A star schema is easy to understand
and navigate, with dimensions joined only through the fact table. These joins are
significant to the end user because they represent the fundamental relationships
between parts of the underlying business. Users can also browse a single dimension
table in order to select attribute values before constructing an efficient query.

7.1.2 Fact Tables in Star Schema Data-Marts


A fact table contains data columns for the numeric measurements of a business.
It also includes a set of columns that form a concatenated or composite key. Each
column of the concatenated key is a foreign key drawn from a dimensional table primary
key. Fact tables usually have few columns and many rows, which result in relatively
long and narrowly shaped tables.

In the star schema diagram example shown earlier in this chapter, the
measurements in the fact table are daily totals of sales in dollars, sales in units, and
cost in dollars of each product sold. The level of detail of a single record in a fact table is
called the granularity of the fact table. In this diagram, the granularity is daily item totals.
Each record in the fact table represents the total sales of a specific product in a retail
store on one day. Each new combination of product, store, or day generates a different
record in the fact table.

The most useful facts are numeric, continuously valued, and additive. A
continuously valued fact is a numeric measurement that varies every time it is
measured. A fact is additive if it makes sense to add the measurement across the
dimensions. Most queries against a fact table access thousands or hundreds of
thousands of records to construct a result set of relatively few rows. It is helpful if these
records are compressed into the result set by adding them or performing other
mathematical operations. Fact tables in a Data-Mart are populated with data extracted
from an OLTP system or a data warehouse. A snapshot of the source data is regularly
extracted and moved to the Data-Mart, usually at the same time every hour, every day,
every week, or every month.

7.1.3 Dimension Tables in Star Schema Data-Marts

Dimension tables store descriptions of the characteristics of a business. A
dimension is usually descriptive information that qualifies a fact. For example, each
record in a product dimension represents a specific product. In the star schema shown
at the beginning of this chapter, the product, customer, promotion, and time dimensions
describe the measurements in the fact table. Dimensions do not change, or change
slowly over time.

The shape of dimension tables is typically wide and short because they contain
few records and many columns. The columns of a dimension table are also called
attributes of the dimension table. Each dimension table in a star schema database has
a single-part primary key joined to the fact table.

An important design characteristic of a star schema database is that one can
quickly browse a single dimension table. This is possible because dimension tables are
flat and de-normalized. It is also possible to browse a single dimension table to
determine the constraints and row headers to use when the fact table is queried.

Most star schemas include a time dimension. A time dimension table makes it
possible to analyze historic data without using complex SQL calculations. For example,
data can be analysed by workdays as opposed to holidays, by weekdays as opposed to
weekends, by fiscal periods, or by special events. If the granularity of the fact table is
daily sales, each record in the time dimension table represents a day.


7.2 Star Schema Key Structure for ENDS Data-Mart


The join constraints in a star schema define the relationships between a fact table
and its dimension tables. In the star schema diagram at the beginning of the chapter,
the product key is the primary key in the product dimension table. This means that each
row in the product dimension table has a unique product key. The product key in the
fact table is a foreign key drawn from the product dimension table.

Each row in a fact table must contain a primary key value from each dimension
table. This rule is called referential integrity and is an important requirement in decision-
support databases. The reference from the foreign key to the primary key is the
mechanism for verifying key values between the two tables. Referential integrity must
be maintained to ensure valid query results. The primary key of a fact table is a
combination of its foreign keys. This is called a concatenated key. The join cardinality of
dimension tables to fact tables is one-to-many, because each record in a dimension
table can describe many records in the fact table.
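The key structure described above can be demonstrated with a small sketch. SQLite is
assumed, where foreign keys are enforced only after `PRAGMA foreign_keys = ON`; the table
names, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE sales_fact (
        product_key INTEGER NOT NULL REFERENCES product_dim(product_key),
        time_key INTEGER NOT NULL REFERENCES time_dim(time_key),
        sales_amount REAL,
        PRIMARY KEY (product_key, time_key)   -- concatenated key of foreign keys
    );
    INSERT INTO product_dim VALUES (1, 'Product A');
    INSERT INTO time_dim VALUES (20150401, '2015-04-01');
""")

# A fact row whose keys all exist in the dimension tables is accepted.
conn.execute("INSERT INTO sales_fact VALUES (1, 20150401, 120.0)")

# A fact row that points at a missing product violates referential integrity.
try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 20150401, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```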

A star schema database uses very few joins, and each join expresses the
relationship between the elements of the underlying business. For example, in the star
schema diagram at the beginning of this chapter, the join between the product
dimension table and fact table represents the relationship between the company's
products and its sales.

7.3 Types of Facts in Data Warehouse


A fact table consists of the measurements, metrics or facts of a
business process. These measurable facts are used to assess business value and to
forecast future business. The different types of facts are explained in detail below.


7.3.1 Additive
Additive facts are facts that can be summed up through all of the dimensions in
the fact table. A sales fact is a good example for additive fact.

7.3.2 Semi-Additive
Semi-additive facts are facts that can be summed up for some of the dimensions
in the fact table, but not the others. For instance, a daily balances fact can be summed
up across the customer dimension but not across the time dimension.

7.3.3 Non-Additive
Non-additive facts are facts that cannot be summed up for any of the dimensions
present in the fact table. Examples are facts that hold percentages or calculated ratios.
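The difference between the three kinds of fact can be seen in a short worked example. All
figures below are invented sample data and the function names are hypothetical.

```python
# Hypothetical daily balance facts: (day, customer, balance).
balances = [
    ("2015-04-01", "C1", 100), ("2015-04-01", "C2", 50),
    ("2015-04-02", "C1", 120), ("2015-04-02", "C2", 30),
]


def total_by_day(rows):
    """Semi-additive: balances may be summed across customers for one day."""
    totals = {}
    for day, _customer, balance in rows:
        totals[day] = totals.get(day, 0) + balance
    return totals


def closing_balance(rows, customer):
    """Across time, take the latest balance instead of summing."""
    dated = [(day, balance) for day, cust, balance in rows if cust == customer]
    return max(dated)[1]  # ISO dates sort chronologically as strings


daily_totals = total_by_day(balances)
c1_closing = closing_balance(balances, "C1")
c1_sum_over_time = sum(b for _d, c, b in balances if c == "C1")  # 220, not a real balance

# Non-additive: a ratio must be recomputed from its additive components.
sales = [(200, 50), (100, 10)]                       # (revenue, profit) per row
overall_margin = sum(p for _r, p in sales) / sum(r for r, _p in sales)
sum_of_row_margins = sum(p / r for r, p in sales)    # not a meaningful margin
```

Here the overall margin is 60 / 300 = 0.2, whereas adding the per-row margins gives
0.25 + 0.1 = 0.35, which is why ratios are stored alongside, or derived from, additive
measures.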

7.3.4 Factless Fact Table


In the real world, it is possible to have a fact table that contains no measures or
facts. These tables are called "Factless Fact tables". For example, a fact table that
has only a product key and a date key is a factless fact table. There are no measures in
this table, but the number of products sold over a period of time can still be obtained by
counting its rows. Fact tables that contain aggregated facts are often called summary
tables.

7.4 Data-Mart Architecture / Schema

As discussed earlier, the Central Data-Mart is a Data Warehouse that contains
cleaned, consolidated and aggregated data from various Ministries/Agencies,
segregated into individual Data-Marts, each of which contains specific seed data for a
specific business use.

An individual Data-Mart comprises transactional core data for a particular
business function, which at the most granular level should be able to cater for both
Transactional and Summary Dataset views. Transactional Data Views are meant to be
consumed by e-Service systems and other Inter-Ministerial Data sharing applications.
These views are granular to a transaction. The same Data-Mart can also be viewed as
aggregated Summary data, which is relevant from an (Open) Dataset perspective.

Fig 11: Individual Data-Mart Schematic for ENDS

Each of the candidate Schemas defined in this section constitutes a single
individual Datamart. Specifications include the scope, the transactions covered as well as
the possible (multiple) Summary Data Views that can be produced from each of these
Datamarts. The Candidate Datamart Schemas defined in this section are those
identified to have significant value from an inter-ministerial data sharing, open data as
well as e-Services fulfilment standpoint. This is not an exhaustive list, but is indicative of
the nature and type of Datamarts identified for the purposes of ENDS. Additional
Candidate Datamarts will be identified and added.


7.4.1 Agricultural Export Schema


Table 16: Agriculture Export Schema
Scope: The Agricultural Export Datamart is a consolidated Quantity & Value data
store of all Agricultural Exports in Ethiopia classified by Commodity, Exporting
Organization and Source Region of Export Commodity.
Transactions Covered: Each Export Transaction catalogued by Exporter Details,
Agricultural Export Commodity, Source Region of Export Commodity, Export License
Applicable, Date of Export, Export Destination, Quantity & Monetary Value
Type: Transactional
Derivable Candidate Summary Datasets:
5.1.1 Region Wise Export Commodities in Ethiopia by Value and Volume by Week/Month/Year
5.1.2 Agricultural Product Export Destination – Where are Ethiopian Exports going?
5.1.3 Commodity Wise Export Summary from Ethiopia by Value and Volume by Week/Month/Year

Fig 12: Agriculture Export Schema
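
To indicate how a Summary Dataset such as 5.1.1 could be derived from the transactional rows of this Datamart, a minimal sketch follows. The field names (region, commodity, export_date, quantity_kg, value_usd) and the sample values are assumptions for illustration and are not part of the schema in Fig 12.

```python
from collections import defaultdict
from datetime import date

# Hypothetical export transactions; field names and values are illustrative.
transactions = [
    {"region": "Oromia", "commodity": "Coffee", "export_date": date(2015, 3, 2),
     "quantity_kg": 1000, "value_usd": 4000.0},
    {"region": "Oromia", "commodity": "Coffee", "export_date": date(2015, 3, 20),
     "quantity_kg": 500, "value_usd": 2100.0},
    {"region": "Amhara", "commodity": "Sesame", "export_date": date(2015, 4, 5),
     "quantity_kg": 800, "value_usd": 1600.0},
]

def summarize_by_region_month(rows):
    """Aggregate volume and value per (region, commodity, year, month)."""
    summary = defaultdict(lambda: {"quantity_kg": 0, "value_usd": 0.0})
    for r in rows:
        key = (r["region"], r["commodity"],
               r["export_date"].year, r["export_date"].month)
        summary[key]["quantity_kg"] += r["quantity_kg"]
        summary[key]["value_usd"] += r["value_usd"]
    return dict(summary)

summary = summarize_by_region_month(transactions)
```

Changing the grouping key (for example to week or year, or to export destination) would give the other candidate Summary Datasets from the same transactional rows.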


7.4.2 Land Use Geo Mapping Datamart

Table 17: Land Use Geo Mapping Datamart
Scope: The Land Use Geo Mapping Datamart provides data on geo-mapped land
co-ordinates across Ethiopia catalogued by Region and the Land Use Purpose.
Transactions Covered: Land Use Scenarios include, amongst others, Area Under
Non-Agriculture Use, Barren and Unculturable Land, Culturable Waste Land, Current
Fallows, Fallow Lands Other than Current Fallows, Forest Area, Land Under Misc Tree
Crops and Groves not included in NAS, Net Area Sown, Permanent Pastures and other
grazing lands & Agricultural Land
Type: Transactional Data derived and aggregated from various Land Use Mapping systems
Derivable Candidate Summary Datasets:
5.2.1 National Land Utilization Dataset catalogued by detailed Land Use and constituent size in Hectares
5.2.2 Region-Wise Land Utilization Dataset catalogued by detailed Land Use and constituent size in Hectares
5.2.3 Land Use Conversion Statistics over specified Period, e.g. Conversion of Agricultural Land to Non-Agricultural Use etc.

Fig 13: Land Use Geo Mapping Datamart


7.4.3 ICT Connectivity Data-Mart Schema

Table 18: ICT Connectivity Data-Mart Schema
Scope: The ICT Connectivity Datamart is a consolidated data store of all ICT
Connections subscribed for and cancelled across Ethiopia for all Service
Providers. This provides a detailed insight into Subscriber KYC and the Penetration
of various ICT channels in the country across regions and technologies.
This forms the basis of any decision point on policy rollouts based on ICT
maturity.
Transactions Covered:
- New ICT Subscriptions (Voice, Data, Fax)
- Change of Existing ICT Subscriptions
- Cancellation of ICT Subscriptions
- With details on Service Provider, Service Packages and Subscriber Data
Type: Transactional
Derivable Candidate Summary Datasets:
5.3.1 Regional Connectivity Index for Ethiopia
5.3.2 Total Current (Monthly) Connect Base by Technology, Channel & Region

Fig 14: ICT Connectivity Datamart Schema
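
A Summary Dataset such as 5.3.2 is a running balance rather than a simple sum: new subscriptions add to the connect base and cancellations reduce it. The sketch below shows one possible derivation; the event type codes (NEW, CHANGE, CANCEL), field names and rows are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical subscription events; event types mirror "Transactions Covered".
events = [
    {"month": "2015-01", "region": "Addis Ababa", "technology": "Data", "type": "NEW"},
    {"month": "2015-01", "region": "Addis Ababa", "technology": "Data", "type": "NEW"},
    {"month": "2015-02", "region": "Addis Ababa", "technology": "Data", "type": "CANCEL"},
    {"month": "2015-02", "region": "Tigray", "technology": "Voice", "type": "NEW"},
    {"month": "2015-02", "region": "Addis Ababa", "technology": "Data", "type": "CHANGE"},
]

def connect_base(rows, as_of_month):
    """Net active connections per (region, technology) up to and including as_of_month."""
    delta = {"NEW": 1, "CANCEL": -1, "CHANGE": 0}  # a change keeps the connection active
    base = Counter()
    for r in rows:
        if r["month"] <= as_of_month:  # "YYYY-MM" strings sort chronologically
            base[(r["region"], r["technology"])] += delta[r["type"]]
    return base

base_feb = connect_base(events, "2015-02")
```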


7.4.4 Road Transport Vehicle Registration Datamart

Table 19: Road Transport Vehicle Registration Data-Mart Schema
Scope: The Road Transport Vehicle Registration Datamart is a consolidated data
store of all Vehicle Registrations in Ethiopia for all Vehicle categories.
Transactions Covered:
- New Vehicle Registrations
- Vehicle Ownership Transfers
Type: Transactional
Derivable Candidate Summary Datasets:
5.4.1 Monthly Vehicle Sales in Ethiopia by Region, by Category, by Brand, by Model
5.4.2 Top n Selling Vehicle Categories sorted by Brand & Model
5.4.3 Top n Re-selling Vehicle Categories sorted by Brand & Model

Fig 15: Road Transport Vehicle Registration Datamart


7.4.5 National Hospital Outpatient Datamart

Table 20: National Hospital Outpatient Datamart
Scope: The National Hospital Outpatient Datamart is meant to be a central
repository of all Outpatient Visits, Consultations and Registrations across
hospitals in Ethiopia where this data is available from Hospital Systems.
This shows the prevalence of key medical conditions for which first level
outpatient support is being sought nationwide. This also provides insight
into Hospital Outpatient Loads and the number of Doctors available per
1000 Patients, which greatly affects quality healthcare.
Transactions Covered:
- New Outpatient Registration
- Outpatient Doctor Consultations
- Patient Engagement (Medical Purpose)
Type: Transactional
Derivable Candidate Summary Datasets:
5.5.1 Active Outpatient Consultation Doctors by Specialization & Hospitals
5.5.2 Hospital Loads by Number of Patient Visits per Week/Month
5.5.3 Key Medical specializations in Demand across Hospitals across Region around the year

Fig 16: National Hospital Outpatient Datamart


7.4.6 National Census - Citizen Datamart

Table 21: National Census - Citizen Datamart
Scope: The National Census Citizen Datamart is meant to be a central repository of
all Census Data collected nationally. This provides detailed demographic data for
national planning purposes across development programmes.
Transactions Covered:
- Citizen Information (Gender/Age/Name)
- Residential Information
- Ethnicity/Tribe Information
- Profession / Working Sector
- Literacy Index
- Average Annual Income
Type: Transactional
Derivable Candidate Summary Datasets:
5.6.1 National Literacy Index by Region/Demography
5.6.2 National Population Spread and Development Index across Regions
5.6.3 National Employment Index across Regions and Demography
5.6.4 National Gender Ratio and Female Literacy Index

Fig 17: National Census - Citizen Datamart


7.4.7 National Student Assessment Datamart

Table 22: National Student Assessment Datamart
Scope: The National Student Assessment Results Dataset is meant to be a central
repository of all Primary and Secondary School and Tertiary Institute
student assessment outcomes. This Datamart is critical to evaluate Student
performance levels across regions and institutions, as well as general enrolment
details for various courses and subjects, which is important for National
Skills Development initiatives.
Transactions Covered:
- Assessment attempts & outcomes for Specific Courses and Subjects
- Assessment attempts & outcomes for various types of Institutes
  (Schools/Colleges/Training Institutes)
- Student Identification Information
Type: Transactional
Derivable Candidate Summary Datasets:
5.7.1 National Educational Enrolment Dataset
5.7.2 Student Enrolment by Region, Institute and Specialization
5.7.3 Regional and Institutional Student Performance Metrics benchmarked against a National Index

Fig 18: National Student Assessment Datamart


7.5 Master Data Management


Master data is business critical data which is used widely across the enterprise
and is often the basis for business service delivery or business process management.
Master Data is not transactional data but can be the basis for transactions, and is often
data that is constant and does not change over a short period. In the context of the
ENDS project it could be the citizen ID, locational data, addresses, land data, maps
etc. Following the design of the ENDS proposed here, the master data would be
naturally distributed across the various datamarts that form a part of the ENDS
system. However, a Master datamart could additionally be created to serve as a
centralised reference source of Master Data.


8.0 ENDS – Data Access with SOA

The core Data Access Layer for ENDS, for data sharing between ministries and
for supporting e-services, should be defined around a service-oriented architecture
(SOA). SOA is an architectural design pattern in which application components provide
services to other components via a communications protocol, typically over a network.
The principles of service-orientation are independent of any vendor, product or
technology.

Service Oriented Architecture revolves primarily around a Service Requester and
a Service Provider. In the case of ENDS, the Service Provider is the Data Mart, which
provides the requisite data for a specific request that comes through from the Service
Requester, which could be another ministerial system, a mobile data app or an e-Service
portal, amongst others. A service is a self-contained unit of functionality, such as
retrieving an online bank statement or retrieving the identity information for a person,
which makes a service a discretely invocable operation. In the Web Services
Description Language (WSDL), however, a "service" is an interface definition that may
list several discrete services or operations. The term is also used for a component that
is encapsulated behind an interface.

SOA makes it easier for software components on computers connected over a
network to cooperate. Every computer can run any number of services, and each
service is built in a way that ensures that the service can exchange information with any
other service in the network without human interaction and without the need to make
changes to the underlying program itself. SOA depends on data and services that are
described by metadata that should meet the following two criteria:

1. The metadata should be provided in a form that software systems can use to configure
dynamically by discovery and incorporation of defined services, and also to maintain
coherence and integrity. For example, metadata could be used by other applications,
like a catalogue, to perform auto discovery of services without modifying the functional
contract of a service.
2. The metadata should be provided in a form that system designers can understand and
manage with a reasonable expenditure of cost and effort.
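
The first criterion can be sketched as a small machine-readable catalogue that a consumer queries to discover services. This is only an illustration: the descriptor fields, service names and example.invalid URLs are hypothetical placeholders and not ENDS endpoints or a prescribed metadata format.

```python
import json

# Hypothetical service descriptors; names and URLs are placeholders, not ENDS endpoints.
SERVICE_METADATA = json.loads("""
[
  {"name": "vehicle-registrations",
   "datamart": "Road Transport Vehicle Registration",
   "endpoint": "https://example.invalid/ends/vehicles",
   "formats": ["json", "csv"]},
  {"name": "outpatient-visits",
   "datamart": "National Hospital Outpatient",
   "endpoint": "https://example.invalid/ends/outpatients",
   "formats": ["json"]}
]
""")

def discover(catalogue, required_format):
    """Return the names of services that advertise the required output format."""
    return [s["name"] for s in catalogue if required_format in s["formats"]]

def validate(descriptor):
    """Check that a descriptor carries the fields a consumer needs to configure itself."""
    required = {"name", "datamart", "endpoint", "formats"}
    return required.issubset(descriptor)

csv_services = discover(SERVICE_METADATA, "csv")
```

Because the descriptors are plain structured text, they also remain readable and manageable by system designers, which is the second criterion.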

The purpose of SOA is to allow users to combine together fairly large chunks of
functionality to form ad hoc applications built almost entirely from existing software
services. The larger the chunks, the fewer the interfaces required to implement any
given set of functionality; however, very large chunks of functionality may not prove
sufficiently granular for easy reuse. Each interface brings with it some amount of
processing overhead, so there is a performance consideration in choosing the
granularity of services. SOA as an architecture relies on service-orientation as its
fundamental design principle. If a service presents a simple interface that abstracts
away its underlying complexity, then users can access independent services without
knowledge of the service's platform implementation.

SOA-based solutions endeavour to enable business objectives while building an
enterprise-quality system. SOA architecture is viewed as the following layers relevant to
the ENDS implementation:

1. Consumer Interface Layer – These are GUI for end users or apps accessing apps/service
interfaces.
2. Business Process Layer – These are choreographed services representing business use-
cases in terms of applications.
3. Services – Services are consolidated together for whole-enterprise in-service inventory.
4. Service Components – The components used to build the services, such as functional and
technical libraries, technological interfaces etc.
5. Operational Systems – This layer contains the data models, enterprise data repository,
technological platforms etc.


There are four cross-cutting vertical layers, each of which is applied to and
supported by each of the horizontal layers listed above:

1. Integration Layer – starts with platform integration (protocols support), data integration,
service integration, application integration, leading to enterprise application integration
supporting B2B and B2C.
2. Quality of Service – Security, availability, performance etc. constitute the quality of service
parameters which are configured based on required SLAs, OLAs.
3. Informational – provide business information.
4. Governance – IT strategy is governed to each horizontal layer to achieve required
operating and capability model.


Fig 19: Service Oriented Architecture- Web Services Model


8.1 Web Services

The SOA (Service Oriented Architecture) implementation for ENDS will be
realized using Web Services. Each service implements at least one action,
such as retrieving an online application status for a governmental process, retrieving
an online bank statement, modifying an online booking or retrieving details about
any governmental data entity. Within an SOA, services use defined protocols that
describe how services pass and parse messages using description metadata, which
describes in sufficient detail not only the characteristics of these services, but also
the data that drives them.

Among a number of choices available for implementing Web Service protocols, the
two main candidates are SOAP and REST.

8.1.1 SOAP – Simple Object Access Protocol

SOAP is a standards-based Web services access protocol that has been
around for a while and enjoys all of the benefits of long-term use. SOAP relies
exclusively on XML to provide messaging services. Microsoft originally developed
SOAP to take the place of older technologies that don’t work well on the Internet
such as the Distributed Component Object Model (DCOM) and Common Object
Request Broker Architecture (CORBA). These technologies fail because they rely on
binary messaging; the XML messaging that SOAP employs works better over the
Internet.

SOAP is designed to support expansion, so it has many other acronyms
and abbreviations associated with it, such as WS-Addressing, WS-Policy,
WS-Security, WS-Federation, WS-ReliableMessaging, WS-Coordination,
WS-AtomicTransaction, and WS-RemotePortlets. SOAP is considered to be highly
extensible, and only the pieces required for a particular task need to be used. For
example, when using a public Web service that’s freely available to everyone, there is
not much need for WS-Security.

The XML used to make requests and receive responses in SOAP can
become extremely complex. In some programming languages, requests need to be
built manually, which becomes problematic because SOAP is intolerant of errors.
However, other languages can use shortcuts that SOAP provides; that can help
reduce the effort required to create the request and to parse the response.

The Web Services Description Language (WSDL) is another file that’s
associated with SOAP. It provides a definition of how the Web service works, so that
when a reference is created to it, the IDE can completely automate the process. So,
the difficulty of using SOAP depends to a large degree on the language that is used.

One of the most important SOAP features is built-in error handling. If there’s a
problem with a request, the response contains error information that can be used to
fix the problem. The error reporting even provides standardized codes so that it’s
possible to automate some error handling tasks in your code.
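
To make the XML structure and the standardized error reporting more concrete, the sketch below builds a SOAP 1.1 request envelope and reads the fault code and fault string from a fault response. The operation name GetApplicationStatus, its parameter and the sample fault are hypothetical; only the SOAP 1.1 envelope namespace and the Fault element names are taken from the SOAP specification.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

def build_request(operation, params):
    """Build a SOAP 1.1 envelope; operation and parameter names are hypothetical."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(op, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

def parse_fault(response_xml):
    """Return (faultcode, faultstring) if the response is a SOAP Fault, else None."""
    root = ET.fromstring(response_xml)
    fault = root.find(f"{{{SOAP_NS}}}Body/{{{SOAP_NS}}}Fault")
    if fault is None:
        return None
    return fault.findtext("faultcode"), fault.findtext("faultstring")

request_xml = build_request("GetApplicationStatus", {"applicationId": 42})

# Hand-written sample fault response, for illustration only.
fault_response = f"""<soap:Envelope xmlns:soap="{SOAP_NS}">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Client</faultcode>
      <faultstring>Unknown applicationId</faultstring>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>"""
fault = parse_fault(fault_response)
```

Even this small request needs an envelope, a body and namespace handling, which illustrates why hand-built SOAP requests are considered verbose.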

8.1.2 REST - Representational State Transfer

General development feedback on SOAP is that it is cumbersome and hard to
use. For example, working with SOAP in JavaScript means writing large amounts of
code to perform extremely simple tasks because the required XML structure must be
created every time. REST provides a lighter weight alternative to SOAP
and is now proving to be the de facto standard for web services on the World Wide
Web. Instead of using XML to make a request, REST relies on a simple URL in
many cases. Most Web services using REST rely exclusively on obtaining the
needed information using the URL approach. REST can use four different HTTP 1.1
verbs (GET, POST, PUT, and DELETE) to perform tasks.


Unlike SOAP, REST doesn’t have to use XML to provide the response.
REST-based Web services can output the data in Comma-Separated Values
(CSV), JavaScript Object Notation (JSON) and Really Simple Syndication (RSS). So
it is straightforward to obtain the output in a form that’s easy to parse within the
language that is needed for an application.
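
As a minimal sketch of serving the same data in more than one output format, the function below renders a list of records as either JSON or CSV using only the Python standard library. The function name render and the sample records are assumptions for illustration; the input is assumed to be a non-empty list of flat records.

```python
import csv
import io
import json

def render(records, fmt):
    """Serialize a non-empty list of flat dicts; returns (content_type, body)."""
    if fmt == "json":
        return "application/json", json.dumps(records)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(records)
        return "text/csv", buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")

# Hypothetical look-up records, for illustration only.
hospitals = [{"id": 1, "name": "Hospital A"}, {"id": 2, "name": "Hospital B"}]
json_type, json_body = render(hospitals, "json")
csv_type, csv_body = render(hospitals, "csv")
```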

Technical Description of REST’s core architectural advantages

REST's client–server separation of concerns simplifies component
implementation, reduces the complexity of connector semantics, improves the
effectiveness of performance tuning, and increases the scalability of pure server
components. Layered system constraints allow intermediaries (proxies, gateways,
and firewalls) to be introduced at various points in the communication without
changing the interfaces between components, thus allowing them to assist in
communication translation or improve performance via large-scale, shared caching.

REST enables intermediate processing by constraining messages to be
self-descriptive: interaction is stateless between requests, standard methods and media
types are used to indicate semantics and exchange information, and responses
explicitly indicate cacheability.

8.1.3 The Choice for ENDS

While SOAP is a heavyweight choice for Web service access, it does
provide enterprise grade features such as good support for distributed enterprise
environments and built-in error handling. REST, however, is the contemporary standard
for Web Service development for the web, for the following reasons:

1. REST is easier to use for the most part and is more flexible.
2. No expensive tools are required to interact with the Web service.
3. REST boasts a smaller learning curve, which means easier
maintainability and support.
4. Efficient (SOAP uses XML for all messages, REST can use smaller
message formats).
5. Fast (no extensive processing required) - component interactions can be the
dominant factor in user-perceived performance and network efficiency.
6. Scalability – REST can support a large number of components and
interactions among components.

The ENDS project will be fairly standardized, not prone to ad-hoc use, with
clear use cases and a point-to-point access use case (Consumer to Data Mart). For
these reasons, it is recommended that REST be used as the protocol of choice for
the ENDS implementation.
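
The point-to-point Consumer to Data Mart access can be sketched as a mapping from an HTTP verb and URL to a response. The example below is a simplified illustration and not an ENDS interface definition: the /vehicles resource, its fields and the in-memory rows are hypothetical, and a real deployment would sit behind a web server or framework.

```python
import json
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory stand-in for a Data Mart view.
VEHICLES = [
    {"id": 1, "region": "Addis Ababa", "category": "Car"},
    {"id": 2, "region": "Oromia", "category": "Truck"},
]

NOT_FOUND = (404, json.dumps({"error": "not found"}))

def handle(method, url):
    """Map an HTTP verb and URL to (status, body) for a read-only resource."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if parts[:1] != ["vehicles"] or len(parts) > 2:
        return NOT_FOUND
    if method != "GET":
        return 405, json.dumps({"error": "read-only resource"})
    if len(parts) == 2:  # /vehicles/<id> addresses a single record
        match = [v for v in VEHICLES if str(v["id"]) == parts[1]]
        return (200, json.dumps(match[0])) if match else NOT_FOUND
    filters = parse_qs(parsed.query)  # /vehicles?region=... filters the collection
    rows = [v for v in VEHICLES
            if all(str(v.get(k)) in values for k, values in filters.items())]
    return 200, json.dumps(rows)

status, body = handle("GET", "/vehicles?region=Oromia")
```

Each request carries everything needed to answer it in the URL itself, so no session state is kept between calls.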


9.0 Organization and Management

The organization and management of ENDS is to be seen as an ongoing, dynamic
activity that forms part of the broader e-Government programme of the Government,
rather than as a one-off project. MCIT, as the designated lead agency for implementation
of e-Governance infrastructure and systems in the Government of Ethiopia, would
naturally be expected to lead the development, implementation and ongoing
management of ENDS. MCIT would appoint an ENDS Central Cell to manage its
deployment and management. The following organizational structure is proposed for
the implementation and management of ENDS:

Figure 20: Proposed Organizational Structure of ENDS (Ministerial Cells, each with Data
Controllers and Data Contributors, and the ENDS Central Cell with Central Data Controllers)

In order to implement the National Dataset Project, the
Ministries/Departments of Government of Ethiopia have to undertake the following
activities:

1. Nominate Data Controller
2. Data Controllers in turn Nominate Data Contributors
3. Setup National Dataset Cell
4. Identify Datasets
5. Publish Catalogs and Resources (Datasets/Apps) on ENDP
6. Prepare Negative List
7. Create Action Plan for regular release of Datasets on the ENDP
8. Monitor and Manage the ENDP program of the Department


9.1 Identification of Resources (Datasets/Apps) and their organization

As a Security Guideline, each Department participating in the National
Dataset project has to prepare its Negative List. Datasets which are confidential
in nature, and which in the interest of the country’s security should not be opened to
the public, would fall into the Negative List. This list would need to be compiled and
sent to the National Dataset Authority within a defined time period. All other Datasets
which do not fall under the Negative List would be in the Open List. These Datasets would
need to be prioritized into high value Datasets and non-high value Datasets.

The Ethiopian National Dataset project will cover three distinct types of Datasets:
1. Primary Data e.g. Population Census, Education Census,
Economic Survey, etc.
2. Processed/Value Added Data e.g. Budget, Planning, etc.
3. Data Generated through delivery of Government Services e.g. Income
Tax Collection

The data contributed to the Ethiopian National Dataset Platform has
to be in the specified open data format only. The data will have to be internally
processed to ensure that the quality standard is met, i.e. that it is accurate, free from
any legal issues, maintains the privacy of individuals and does not compromise
national security. While prioritizing the release of Datasets, one should try to
publish as many high value Datasets as possible. The grouping of related Resources
(Datasets/Apps) should be planned, and they are to be organized under Catalogs. Though
each department shall have its own criteria for high value and low value Datasets,
high value data is generally governed by the following principles:
1. Completeness
2. Primary
3. Timeliness
4. Ease of Physical and Electronic Access
5. Machine readability
6. Non-discrimination
7. Use of Commonly Owned Standards
8. Licensing
9. Permanence
10. Usage Costs


9.2 Data Controller

A senior officer is to be nominated as the Data Controller or Nodal Officer for
the Department/Organization/State. The responsibilities of the Data Controller are as
follows:

1. Head the ENDP Cell, which helps in compilation, collation, conversion and
publishing of catalogs/resources on the platform. The size of the cell varies from
department to department and depends on the quantum of resources to be
published.
2. Lead the open data initiative of the department.
3. Nominate Data Contributors.
4. Take initiative to release as many Datasets as possible on a proactive basis.
5. Identify the High Value Datasets and schedule their release on ENDP.
6. Prepare the Negative List for the Department as per the directions in ENDP.
7. Ensure that the Datasets being published are in compliance with ENDP
through a predefined workflow process.
8. Periodically monitor the release of Datasets as per the predefined schedule.
9. Take relevant action on the feedback/suggestions received from citizens
for the Datasets belonging to the Ministry/Department/Organization.
10. Take action on suggestions for new Datasets made by the public on the OGD
Platform.

9.3 ENDS Cell

In order to implement ENDS, each Department would establish an ENDS Cell.
The size of the cell would vary from Department to Department and would depend
on the quantum of Datasets to be published. The ENDP Cell would be responsible
for:
1. Prepare Negative List of Datasets and communicate to DST within Six
Months.
2. Prepare a schedule of Datasets to be released in next one year.
3. Extend Technical Support for Preparation of Datasets, conversion of formats
etc.
4. Monitor and manage the Open data initiative in their respective Ministry/
Department/State and ensure quality and correctness of the data.
5. Work out an open data strategy to promote proactive dissemination of
Datasets.
6. Institutionalize the creation of Datasets as part of routine functioning.

The ENDS Cell shall be headed by the Data Controller, who could be assisted by a number
of Data Contributors. The ENDP Cell shall have professionals from the data analysis,
visualization and programming domains. The policy mentions that budgetary
provisions and appropriate support for data management for each
department/organization would be necessary.

9.4 Data Contributors


In order to cater to the contribution of Datasets from offices/organizations under
the Ministries/Departments, the Data Controller can nominate a number of Data
Contributors who would be responsible for contributing the Datasets along with their
metadata. Using the web based DMS, each Data Contributor would be able to
contribute the data as per the given metadata format, which is based on the Dublin
Core Standards. The contributed Datasets would be approved by the Data Controller
as the case may be. The Data Contributor could be an officer of the
Ministry/Department/State who would be responsible for his/her unit/division.
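
A metadata record based on Dublin Core could be checked before submission along the following lines. The element names are the fifteen elements of the Dublin Core Metadata Element Set; which of them are mandatory is an assumption made here for illustration (Dublin Core itself leaves every element optional), and the sample record is hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}
# Assumption: the mandatory subset is a local policy choice, not part of Dublin Core.
REQUIRED = {"title", "creator", "date", "format", "identifier"}

def check_metadata(record):
    """Return a sorted list of problems; an empty list means the record passes."""
    problems = [f"unknown element: {k}" for k in record if k not in DUBLIN_CORE_ELEMENTS]
    problems += [f"missing element: {k}" for k in REQUIRED if not record.get(k)]
    return sorted(problems)

# Hypothetical record as a Data Contributor might submit it.
record = {
    "title": "Monthly Vehicle Registrations",
    "creator": "Hypothetical Transport Unit",
    "date": "2015-04-30",
    "format": "text/csv",
    "identifier": "ends-vehicle-registrations-2015-04",
}
problems = check_metadata(record)
```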

The responsibilities of the Data Contributor are as follows:

1. Ensure the quality and correctness of the Datasets of his/her
unit/division.
2. Prepare and contribute input data for ingestion into the Data Marts from
nominated Ministerial Systems.
3. Liaise with Systems to ensure availability of Data Extracts from nominated
Ministerial Systems.
4. Liaise with User Groups to map responsibilities for Data Acquisition,
Compilation and Collation.


10.0 Server & Software Infrastructure

The ENDS is expected to be hosted in the National Data Centre as a common
hardware and software infrastructure for use across the Enterprise. The minimum
requirement would include the FTP Server, Staging Server, Data Warehouse Server
and Web Server, and the corresponding backup Servers.

Figure 21: ENDS: Server and Software Infrastructure

The above diagram (Fig.21) depicts the Server Infrastructure required to host
the ENDS system end-to-end. The minimum configuration of each of these
servers/machines is specified below:

Table 23: FTP Server
Machine Type: Server Computer
Processor: Intel Xeon Dual Core
RAM: 4 GB
HDD: 2 x 1 TB with Raid 1
Operating System: Windows Server 2008 x64 R2 Standard Edition
Any other Software: Microsoft Office 2013 Standard

Table 24: FTP Back Up Server
Machine Type: Server Computer
Processor: Intel Xeon Dual Core
RAM: 4 GB
HDD: 2 x 1 TB with Raid 1
Operating System: Windows Server 2008 x64 R2 Standard Edition
Any other Software: Microsoft Office 2013 Standard


Table 25: Data Warehouse Server
Processor: Intel Xeon Dual Quad Core
RAM: 16 GB
HDD: 2 x 1 TB with Raid 1
Operating System: Windows Server 2008 x64 R2 Enterprise Edition
Database: Microsoft SQL 2008 R2 Standard Edition
Any other Software: Microsoft Office 2013 Standard

Table 26: Staging Server
Processor: Intel Xeon Dual Quad Core
RAM: 16 GB
HDD: 2 x 1 TB with Raid 1
Operating System: Windows Server 2008 x64 R2 Enterprise Edition
Database: Microsoft SQL 2008 R2 Standard Edition
Any other Software: Microsoft Office 2013 Standard

Table 27: Data Warehouse Backup / Failover Server
Processor: Intel Xeon Quad Core
RAM: 8 GB
HDD: 2 x 1 TB with Raid 1
Operating System: Windows Server 2008 x64 R2 Enterprise Edition
Database: Microsoft SQL 2008 R2 Standard Edition
Any other Software: Microsoft Office 2013 Standard


Table 28: Web Server
Processor: Intel Xeon Quad Core
RAM: 8 GB
HDD: 2 x 500 GB with Raid 1
Operating System: Windows Server 2008 x64 R2 Standard Edition
Any other Software: Microsoft Office 2013 Standard; IIS (Internet Information Services) 7

Table 29: Failover Web Server
Processor: Intel Xeon Quad Core
RAM: 8 GB
HDD: 2 x 500 GB with Raid 1
Operating System: Windows Server 2008 x64 R2 Standard Edition
Any other Software: Microsoft Office 2013 Standard; IIS (Internet Information Services) 7

10.1 Enterprise Technology Platform for ENDS

In addition to the Server infrastructure with the requisite Operating system, the
Server infrastructure will also have to host special ETL-Data Warehouse-BI software
for carrying out the various functions of the ENDS implementation as described in
this document.

There are multiple platform choices for these components as shown in Table 30
below:


Table 30: Enterprise Technology Platform for ENDS

Component / Vendor     Microsoft BI                            Oracle                            Pentaho
ETL                    SQL Server Integration Services         Oracle Data Integrator            Pentaho Kettle
Database               SQL Server 2012                         Oracle Database 11gR2             Any of SQL Server, Oracle or MySQL
Data Mart / Warehouse  SQL Server Analysis Services            Oracle Data Mart Suite            -
Reporting              SQL Server Reporting Services           Oracle Reports                    Pentaho Dashboard
Web Service Platform   Windows Communication Foundation (WCF)  Native Oracle XML DB web service  Pentaho Web Services
Verdict                Our Choice                              -                                 Open Source

While Pentaho is the most popular open source alternative, the Microsoft BI
platform is the most accessible suite for Data Management, with a widely available
knowledge base, support and extensibility.


11.0 Controlled Vocabulary


Controlled Vocabulary Services are an important by-product of the ENDS
initiative. They draw on the Dimensions / Look-up values that are built up in the
Datamarts, which are important for common National Controlled Vocabulary services.

For Example:
- Cities in Ethiopia
- Towns in Ethiopia
- List of PIN Codes in Ethiopia
- List of Telecom Operators in Ethiopia
- List of Hospitals in Ethiopia
- List of Regional Transport Offices in Ethiopia
- List of Medical Specializations currently held by Ethiopian doctors

These look-up data values or controlled vocabularies are invaluable in their own
right as they provide a single point of access to key business information for third
party validation purposes. A Controlled Vocabulary (CV) provides common metadata
for reference by various Datasets and Meta Datasets to ensure Data Consistency
across systems. Controlled Vocabularies also help with preparation for CMS or knowledge
management projects, since many of these require this sort of structure for easy data
aggregation. A Controlled Vocabulary also ensures that key Data Elements that are
used within governmental systems are also consumed by the public, thus ensuring that
data within and outside of the government is consistent to a very high degree.
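
Third party validation against a controlled vocabulary can be sketched as a normalized membership test. The vocabulary content below is a single illustrative entry; in ENDS the terms would be served from the Datamart dimensions, and the function names are hypothetical.

```python
# Hypothetical controlled vocabulary; in ENDS this would come from Datamart dimensions.
TELECOM_OPERATORS = {"ethio telecom"}

def normalize(value):
    """Case-fold and collapse whitespace so that trivial variants compare equal."""
    return " ".join(value.split()).casefold()

def validate(value, vocabulary):
    """Return True when the normalized value is a term in the vocabulary."""
    return normalize(value) in {normalize(term) for term in vocabulary}

ok = validate("  Ethio   Telecom ", TELECOM_OPERATORS)
bad = validate("Unknown Operator", TELECOM_OPERATORS)
```

Normalizing both sides before comparison means that systems inside and outside the government accept the same term even when spelling variants differ only in case or spacing.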


Figure 22: ENDS: Controlled Vocabulary Services


12.0 Security & Information Technology Standards


The ENDS project deals with high value governmental data, which needs to be
protected on multiple levels from unauthorized access or information misuse. Some
of the security standards which are recommended to be followed during the
implementation, establishment and running of the ENDS project are given below. In
addition to information security standards, the ENDS also has to follow the Technical
Standards which have already been defined in the E-Government Interoperability
Framework of the Government and other Technical Policy Documents of the
Government. These will, amongst others, include the Data Exchange Standards, Data
Protection Standards and Policies, and Meta Data Standards. The important ones that
have relevance to the design and management of the ENDS are mentioned in the
following sections:

12.1 Information / Data Security Standards

ISO/IEC 27001:2013 specifies the requirements for establishing,
implementing, maintaining and continually improving an information security
management system within the context of the organization. It also includes
requirements for the assessment and treatment of information security risks tailored
to the needs of the organization. The requirements set out in ISO/IEC 27001:2013
are generic and are intended to be applicable to all organizations, regardless of type,
size or nature.

Annex A of this standard groups its 114 Information Security Management controls
into the following 14 domains:

1. Information security policies (2 controls)
2. Organization of information security (7 controls)
3. Human resource security (6 controls, applied before, during, or after
employment)
4. Asset management (10 controls)
5. Access control (14 controls)
6. Cryptography (2 controls)
7. Physical and environmental security (15 controls)
8. Operations security (14 controls)
9. Communications security (7 controls)
10. System acquisition, development and maintenance (13 controls)
11. Supplier relationships (5 controls)
12. Information security incident management (7 controls)
13. Information security aspects of business continuity management (4
controls)
14. Compliance with internal requirements, such as policies, and with
external requirements, such as laws (8 controls)

This standard provides best practice recommendations on information security
management for use by those responsible for initiating, implementing or maintaining
information security management systems (ISMS). Information security is defined
within the standard in the context of the C-I-A triad: the preservation of confidentiality
(ensuring that information is accessible only to those authorized to have access),
integrity (safeguarding the accuracy and completeness of information and
processing methods) and availability (ensuring that authorized users have access to
information and associated assets when required).

12.2 Meta Data Standards


The Technical Standards listed in e-GIF for adoption across the Government
of Ethiopia, in all its agencies, include the Dublin Core as the metadata standard.
The Dublin Core metadata standard is a simple yet effective element set for
describing a wide range of networked resources.

The Dublin Core standard includes two levels: Simple and Qualified. Simple
Dublin Core comprises fifteen elements; Qualified Dublin Core includes three
additional elements (Audience, Provenance and RightsHolder), as well as a group of
element refinements (also called qualifiers) that refine the semantics of the elements
in ways that may be useful in resource discovery. The semantics of Dublin Core
have been established by an international, cross-disciplinary group of professionals
from librarianship, computer science, text encoding, the museum community, and
other related fields of scholarship and practice. Another way to look at Dublin Core is
as a "small language for making a particular class of statements about resources". In
this language, there are two classes of terms -- elements (nouns) and qualifiers
(adjectives) -- which can be arranged into a simple pattern of statements. The
resources themselves are the implied subjects in this language. In the diverse world
of the Internet, Dublin Core can be seen as a "metadata pidgin for digital tourists":
easily grasped, but not necessarily up to the task of expressing complex
relationships or concepts.

Each element in the Dublin Core basic element set is optional
and repeatable. Most elements also have a limited set of qualifiers or
refinements, attributes that may be used to further refine (not extend) the meaning of
the element. The Dublin Core Metadata Initiative (DCMI) has established standard
ways to refine elements and encourages the use of encoding and vocabulary
schemes. The full set of elements and element refinements conforming to DCMI
"best practice" is available, along with a formal registry.

Dublin Core has as its goals:

1. Simplicity of creation and maintenance

The Dublin Core element set has been kept as small and simple as possible
to allow a non-specialist to create simple descriptive records for information
resources easily and inexpensively, while providing for effective retrieval of those
resources in the networked environment.

2. Commonly understood semantics


Discovery of information across the vast commons of the Internet is hindered
by differences in terminology and descriptive practices from one field of knowledge
to the next. The Dublin Core can help the "digital tourist" -- a non-specialist searcher
-- find his or her way by supporting a common set of elements, the semantics of
which are universally understood and supported. For example, scientists concerned
with locating articles by a particular author, and art scholars interested in works by a
particular artist, can agree on the importance of a "creator" element. Such
convergence on a common, if slightly more generic, element set increases the
visibility and accessibility of all resources, both within a given discipline and beyond.

3. International scope
The Dublin Core Element Set was originally developed in English, but versions
are being created in many other languages, including Finnish, Norwegian, Thai,
Japanese, French, Portuguese, German, Greek, Indonesian, and Spanish.

The DCMI Localization and Internationalization Special Interest Group is
coordinating efforts to link these versions in a distributed registry. Although the
technical challenges of internationalization on the World Wide Web have not been
directly addressed by the Dublin Core development community, the involvement of
representatives from virtually every continent has ensured that the development of
the standard considers the multilingual and multicultural nature of the electronic
information universe.

4. Extensibility
While balancing the needs for simplicity in describing digital resources with the
need for precise retrieval, Dublin Core developers have recognized the importance
of providing a mechanism for extending the DC element set for additional resource
discovery needs. It is expected that other communities of metadata experts will
create and administer additional metadata sets, specialized to the needs of their
communities. Metadata elements from these sets could be used in conjunction with
Dublin Core metadata to meet the need for interoperability. The DCMI Usage Board
is presently working on a model for accomplishing this in the context of "application
profiles."


13. Management of Legacy Systems within ENDS

The ENDS System is ultimately a central data repository which depends
on source data from a number of Ministerial Data Systems. This model raises a key
concern: what happens if a ministerial data system is a legacy system
with one or more of the following constraints?

• The system is not actively supported by any IT vendor, so a
specialized data interfacing tool cannot easily be built.
• The system is not online and therefore cannot be part of an online data
synchronization initiative with the ENDS central Data Warehouse.

The ENDS Architecture has been specifically designed to cater to such
constraints. Data extraction from Ministerial Systems has been deliberately designed
to be offline, with no online dependency on the source systems.

In addition, the Data Interface method is a simple data file dump/export
from the source system on an as-is basis, with all data validation and management
undertaken by the (more powerful and flexible) Server-side ETLs and Data
Warehouse. The premise for this design is that any data management system is
expected to have, at the very least, a simple core data extract/export feature. The
native format of such an extract may differ widely from system to system.
Changing the native data extract format for these systems would need a fair amount
of local system customization, which may not be available for legacy systems.

Therefore, an inversion of control is prescribed in this proposed architecture.
Instead of the source data system being made responsible for the data format and
validation, all such active interventions are moved to the central server side. The
only dependency on the source system is that it should be able to export a basic
data file which at the very least contains all key fields required for any business
reporting process.
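
The server-side handling described above can be sketched as follows. This is an illustration under stated assumptions rather than the ENDS ETL design: the source system names, the column mappings, the required fields and the function names are all hypothetical. Each source exports its file as-is, and the central side detects the delimiter, maps native column names to canonical ones and checks that the key fields are present.

```python
import csv
import io

# Hypothetical mapping from each source system's native column names to the
# canonical field names used on the central server side.
COLUMN_MAPS = {
    "passport_legacy": {"PPT_NO": "passport_number", "NAME": "holder_name"},
    "passport_new": {"passportNo": "passport_number", "fullName": "holder_name"},
}
REQUIRED_FIELDS = ("passport_number", "holder_name")


def detect_delimiter(header_line, candidates=(",", ";", "|", "\t")):
    """Pick the candidate delimiter that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


def normalize_export(source, raw_text):
    """Parse an as-is export and return (valid_rows, rejected_rows)."""
    column_map = COLUMN_MAPS[source]
    delimiter = detect_delimiter(raw_text.splitlines()[0])
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    valid, rejected = [], []
    for row in reader:
        # Keep only mapped columns; short rows yield None, treated as empty.
        record = {
            column_map[name]: (value or "").strip()
            for name, value in row.items()
            if name in column_map
        }
        if all(record.get(field) for field in REQUIRED_FIELDS):
            valid.append(record)
        else:
            rejected.append(record)
    return valid, rejected


if __name__ == "__main__":
    legacy_dump = "PPT_NO;NAME\nEP123;Abebe\n;Missing\n"
    print(normalize_export("passport_legacy", legacy_dump))
```

Supporting a new source system then only requires a new mapping entry on the server; nothing changes on the legacy system itself.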


Figure 23: ENDS Integration with Legacy Systems

If a Client Side system is Online:

- Data Exports to the ENDS system can be scheduled via periodic automatic
routines, which could be daily, weekly or monthly depending on the type of data,
its frequency of change and the nature of data consumption in ENDS.

These routines can be built into the system (most systems will support this function),
and the data file upload to the ENDS FTP Server can also be automated.

If a Client Side system is Offline:

- Data Exports from such offline systems can be manually aggregated into a
common temporary internal data storage location (which is online), from which they
can then be moved to the ENDS FTP Server via a periodic batch upload routine.
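
A periodic batch upload routine of this kind might look like the sketch below. It assumes a staging directory holding the aggregated export files and a separate directory for files already sent; the directory layout, function names and FTP settings are hypothetical. The upload step is passed in as a callable so that the same routine works with FTP or any other transport.

```python
import shutil
from ftplib import FTP, all_errors
from pathlib import Path


def upload_pending(staging_dir, sent_dir, upload):
    """Upload every file in staging_dir, then move it to sent_dir.

    `upload` is any callable taking a Path. A file is moved only after the
    callable returns without raising, so failed files stay in staging and
    are retried on the next scheduled run.
    """
    staging, sent = Path(staging_dir), Path(sent_dir)
    sent.mkdir(parents=True, exist_ok=True)
    uploaded = []
    for path in sorted(p for p in staging.iterdir() if p.is_file()):
        try:
            upload(path)
        except all_errors:
            continue  # leave the file in staging for the next batch
        shutil.move(str(path), str(sent / path.name))
        uploaded.append(path.name)
    return uploaded


def make_ftp_uploader(host, user, password, remote_dir):
    """Build an uploader that stores one file on an FTP server.

    All connection settings are placeholders supplied by the caller.
    """
    def upload(path):
        with FTP(host) as ftp:
            ftp.login(user, password)
            ftp.cwd(remote_dir)
            with open(path, "rb") as handle:
                ftp.storbinary(f"STOR {path.name}", handle)
    return upload
```

The routine would typically be run from the operating system's scheduler (for example daily or weekly), with `make_ftp_uploader` supplying the `upload` argument.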


14. Client Side Hardware and Software Platform

As described in the previous section, the ENDS system architecture is
designed to be largely decoupled from Client System dependencies.
This means that the ENDS system is NOT DEPENDENT on the following attributes
of a Client system:

1. Client System Platform
2. Client System Operating System
3. Client System Application Platform (Microsoft/Java etc.)
4. Client System Application Programming Language
5. Client System Hardware Architecture

What the Client System NEEDS TO HAVE, though, is:

1. A Data Export Function for exporting its core data into a standard flat file format
(CSV/Excel/Other Delimited etc.)
2. The ability to schedule a particular Data Export to an internal network location (if
offline) or an FTP location (if online) at a given frequency or periodicity.
3. Online accessibility for most source Client systems, or the option to have a
central (online) local storage location which will aggregate multiple data files from
various Client systems.

These must-have requirements are platform- and system-agnostic and are
generally expected to be part of any data system design and implementation. Even if
some Client systems do not have these features, building this functionality into the
systems is relatively straightforward and should take, on average, 15 to 30 Man-Days
per System-Data Dump to develop and implement.
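
For a Client system backed by a relational database, the core of a Data Export Function could be as small as the sketch below. It assumes an SQLite database purely for illustration; the table, column and function names are hypothetical, and a real Client system would use its own database driver and add scheduling, logging and error handling around this step.

```python
import csv
import sqlite3


def export_table_to_csv(connection, table, columns, out_path):
    """Dump the chosen columns of one table to a CSV file and return the row count.

    The table and column names are assumed to come from trusted configuration,
    not from user input, because they are interpolated into the SQL text.
    """
    column_list = ", ".join(f'"{name}"' for name in columns)
    cursor = connection.execute(f'SELECT {column_list} FROM "{table}"')
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)  # header row with the exported field names
        for row in cursor:
            writer.writerow(row)
            count += 1
    return count
```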

14.1 Client System Data Dependency

Notwithstanding the platform independence described so far, one obvious key
requirement is that Client Systems capture, store (and validate) the key data
elements required for the specific target DataMart / DataSets in the ENDS
system. For example, for ENDS to be able to host and maintain a Datamart based
on Passport data, it is a prerequisite to have Client systems which contain key data
on Passport Applications, including new Passport Applications and outcomes, current
active Passport holder details, changes in Passport particulars and Passport
renewals.
