
5th International Conference on

Information Technologies and


Information Society
ITIS 2013
Proceedings
Edited by Zoran Levnajic
Faculty of Information Studies in Novo mesto

Dolenjske Toplice, Slovenia, 7-9 November 2013


http://itis2013.fis.unm.si/

CONFERENCE ORGANIZATION
ORGANIZING COMMITTEE
Zoran Levnajic (chair), Janez Povh, Matej Mertik
Barbara Pavlakovic, Marjeta Grahek (administrative support)
Maja Zorcic (financial matters)

PROGRAM COMMITTEE (Paper referees)


Natasa Przulj, Imperial College London, UK
Santo Fortunato, Aalto University, Finland
Matjaz Perc, University of Maribor, Slovenia
Tijana Milenkovic, University of Notre Dame, USA
Miroslav Baca, Faculty of Organization and Informatics, Croatia
Markus Schatten, Faculty of Organization and Informatics, Croatia
Antonina Dattolo, University of Udine, Italy
Marko Bohanec, Jozef Stefan Institute, Slovenia
Mario Spremic, University of Zagreb, Croatia
Matjaz Juric, University of Ljubljana, Slovenia
Marjan Hericko, University of Maribor, Slovenia
Sanda Martincic-Ipsic, University of Rijeka, Croatia
Ana Mestrovic, University of Rijeka, Croatia
Janez Povh, Faculty of Information Studies, Novo mesto, Slovenia
Nadja Damij, Faculty of Information Studies, Novo mesto, Slovenia
Matej Mertik, Faculty of Information Studies, Novo mesto, Slovenia
Blaz Rodic, Faculty of Information Studies, Novo mesto, Slovenia
Zoran Levnajic, Faculty of Information Studies, Novo mesto, Slovenia
Borut Luzar, Faculty of Information Studies, Novo mesto, Slovenia
Bostjan Delak, ITAD, Technological Park Ljubljana, Slovenia
Igor Bernik, University of Maribor, Slovenia
Davide Rossi, University of Bologna, Italy
Marina Ivasic-Kos, University of Rijeka, Croatia
Simon Kozina, Jozef Stefan Institute, Ljubljana, Slovenia
Mitja Krajnc, University of Maribor, Slovenia
Grzegorz Majewski, Faculty of Information Studies, Novo mesto, Slovenia
Andrej Dobrovoljc, Faculty of Information Studies, Novo mesto, Slovenia
Jelena Govorcin, Faculty of Information Studies, Novo mesto, Slovenia

CONFERENCE PROGRAM
Thursday, 7 November
08:30 09:00 Registration
09:00 09:20 Opening
Simulating Social Networks
09:20 10:00 Santo Fortunato: Community Detection in Networks
10:00 10:20 Mario Karlovcec: Web application for generating subnetworks of Slovenian research collaboration
10:20 10:40 Borut Luzar: Interdisciplinarity of Slovenian research
10:40 11:00 Sweet coffee break
Modeling and Simulating Social Processes
11:00 11:40 Matteo Marsili: On sampling and modeling complex systems
11:40 12:00 Marija Mitrovic: Universality in voting behavior
12:00 12:20 Zoran Levnajic: Looking for stable pluralism
12:20 12:40 Alenka Pandiloska Jurak: Network analysis of the competence
centres in Slovenia
12:40 13:00 free
13:00 14:00 Light lunch
Data Technologies and Simulations
14:00 14:40 Peter Richtarik: Big Data Convex Optimization: Why Parallelizing Like Crazy and Being Lazy Can be Good
14:40 15:00 Joze Bucar: Case Study Web Clipping The Preliminary Work
15:00 15:20 Darko Zelenika: Automatic invoice capture in small and medium-sized Slovenian enterprises: project overview
15:20 15:40 Tomislav Fotak: Handwritten Signature Authentication Using
Statistical Measures of Basic On-line Signature Characteristics
15:40 16:00 Salty coffee break
Information Society and Simulations
16:00 16:20 Andrej Dobrovoljc: An approach to identify organizational security vulnerabilities
16:20 16:40 Igor Jugo: A proposal for a web based educational data mining
and visualization system
16:40 17:00 Gregor Polancic: Extending BPMN 2.0 Conversation diagrams
for modeling complex communication
17:00 17:20 Martin Ravnikar: The Way to Efficient Management of Complex
Engineering Design
17:20 17:40 Tatjana Welzer: Cultural Components in Information Society
17:40 18:00 free
18:00 Wine tasting and various

Friday, 8 November
Simulating Cultural Processes
09:00 09:40 Matjaz Perc: Culturomics of physics: Which words and phrases
defined the biggest breakthroughs of the 20th century?
09:40 10:00 Domagoj Margan: Preliminary Report on the Structure of Croatian Linguistic Co-occurrence Networks
10:00 10:20 Kristina Ban: Initial Comparison of Linguistic Networks Measures
for Parallel Texts
10:20 10:40 Lucia Nacinovic Prskalo: An Overview of Prosodic Modelling for
Croatian Speech Synthesis
10:40 11:00 Sweet coffee break
Increasing Well-being through IT and Simulations
11:00 11:40 Tijana Milenkovic: What can complex networks tell us about human aging?
11:40 12:00 Matjaz Tome: IT solutions to assist diabetic patients and medical
staff
12:00 12:20 David Fabjan: Long-term-care and intelligent IT in a changing
demographic landscape
12:20 12:40 Blaz Rodic: Perception of privacy in social networks among youth
12:40 13:00 Andrej Kovacic: New Ways to Manage Communication with Customers on the Internet
13:00 14:00 Light lunch
Simulating Business Processes
14:00 14:20 Jernej Agrez: TAD methodology for process pattern assessment
in weakly defined organizational formations
14:20 14:40 Grzegorz Majewski: Inclusion of tacit knowledge in the simulation
of business processes

14:40 15:00 Zeljko Dobrovic: Connection between Process Model and Data Model: Metamodelling Approach
15:00 15:20 Renato Barisic: Use of printed textbooks and digital content in
secondary school education
15:20 15:40 Bostjan Delak: How to identify knowledge and evaluate knowledge
management in organization
15:40 16:00 free
16:00 Excursion to the historic Zuzemberk castle and conference dinner in the traditional Zupancic restaurant

CONFERENCE SPONSORS

The operation is partly financed by the European Union, namely by the European Regional Development Fund. The operation is carried out within the framework of the Operational Programme for Strengthening Regional Development Potentials for the period 2007-2013, Development Priority 1: Competitiveness of enterprises and research excellence, Priority Guideline 1.1: Improving the competitive abilities of enterprises and research excellence.

Contents

1 Simulating Social Networks
2 Modeling and Simulating Social Processes  21
3 Data Technologies and Simulations  35
4 Information Society and Simulations  53
5 Simulating Cultural Processes  88
6 Increasing Well-being through IT and Simulations  113
7 Simulating Business Processes  146

Simulating Social Networks

Invited lecture:
Santo Fortunato
Dept. of Biomedical Engineering and Computational Science,
Aalto University, Finland.
Community Detection in Networks
Finding communities in networks is crucial to understand their structure and function, as well as to identify the role of the nodes and
uncover hidden relationships between nodes. In this talk I will highlight the open problems that still keep the scientific community from
having a shared set of reliable tools for the clustering analysis of real
networks. In particular, I will address the complexity of real community detection, due to the presence of overlapping communities and hierarchical structure; I will discuss the limits of the null models at the basis of existing techniques and assess the delicate issue of testing
the performance of methods.

Web application for generating subnetworks of Slovenian research collaboration

Mario Karlovcec, Dunja Mladenic
{mario.karlovcec, dunja.mladenic}@ijs.si
Jozef Stefan Institute, Artificial Intelligence Laboratory

Abstract: The objective of this work was to create a web application that can generate
subnetworks of Slovenian research collaboration, based on either research projects or
publications. The two main databases that capture national research activity (SICRIS
and COBISS) were integrated and plugged into the application, which enables the generation of subnetworks by applying different criteria and filters. This work is
relevant for further development of Slovenian research network analysis.
Keywords: network analysis, bibliographic network

1 Introduction
This paper presents a system for generating subnetworks (subgraphs) of research collaboration in Slovenia, with the focus on the architecture of the system. The system uses public data on research activity in Slovenia from two data sources: researcher
data in SICRIS and bibliographic data in COBISS. The main challenges presented in
this paper relate to merging scientific research data and bibliographic data of individual
research subjects from the two accessible resources. The proposed system has two main
modules. Integration and cleaning of the data is implemented as a separate module of
the architecture (as shown in Figure 1). The second main module of the architecture is a
publicly available web application that enables generation of subnetworks based on
different criteria selected by the user.
The network of Slovenian researchers has been the subject of analysis by several authors in the past. The growth, the emergence of a giant connected component and the clustering coefficient of the network of researchers in Slovenia based on research projects were analyzed in [1]. [2] analyses how the co-authorship structure changes over time for four research disciplines in Slovenia: biotechnology, mathematics, physics and sociology. In [3] the authors show the degree distribution and analyze the cohesion between the science groups. The problem of visualization of the Slovenian scientific community is tackled in [4]. This
work presents a system that enables generation and visualization of collaboration
subnetworks, which can be used for further research and discoveries in this field.

2 System architecture
The proposed system architecture has two main parts covering data handling at the
backend and the frontend supporting interaction with the user. The data integration part is the first part of the system architecture. There are two data sources currently used by the system: SICRIS, a database of research project activity in Slovenia [5], and COBISS, a database of scientific bibliography in Slovenia [6]. The tasks of this module include understanding the data structure and the meaning of the data, building a data model that integrates the two sources, performing the actual merging with resolving of issues and remodeling of the model, and ensuring automatic periodical updates of the database. Furthermore, a client application is built in such a way that it enables the export of portions of the data, constructed using different filters.
The second module of the system, the Web Application, consists of a user-friendly, open-access web interface that enables the selection of different subnetworks and provides several visualizations that can give insights into the data through the graphical representation of networks.
First, the retrieval of the data from the integrated database is solved. Next, at the server side of the application, the data is transformed into a suitable graph format. Finally, the visualizations are implemented using the functionalities of the Slovenian Science Atlas
[4].

Figure 1: System architecture

3 Data integration and cleaning module


The first module of the architecture is the Data integration and cleaning module. The goal of the module is to integrate the data from the two research databases and enable the creation of collaboration networks based on an arbitrary criterion. The steps required to accomplish this goal are shown in Figure 2. These are: data understanding, selection of entities, relations and attributes for the integrated data model, building a merging mechanism, creating the data model, importing the data and refining the merging mechanism, creating a mechanism for continuous updating of the integrated database, and development of a client for exporting data.

Figure 2: Infrastructure of the data integration and cleaning module. Both databases are
first processed to understand the data and select entities, relations and attributes. This is
fed into the merging mechanism.

3.1 Data description


In order to select the entities, relations and attributes for the integrated data model, a clear understanding of the data is required. In this section, a short description of the two main datasets (SICRIS and COBISS) is given.
SICRIS is a database that covers research activity in Slovenia. It is managed by the ARRS (Slovenian National Research Agency) and IZUM (Institute of Information Science in Maribor). The database contains data about national research projects registered from 1994 to the present, and it is updated on a monthly basis. The main entities of the SICRIS data model are: research project, research program, researcher, organization and research group. The main entities are connected with the following relations: researchers-projects/programs (researchers working on a project/program), researchers-organizations (researchers employed in an organization) and researchers-research groups (researchers are part of a research group). The main entities are
described with the attributes given in Table 1.
Table 1: Attributes of the main entities of the SICRIS database

Entity                      Attribute groups
research project/program    status data, classification, keywords, abstract, significance, report, organizations, researchers, equipment
researcher                  status data, contact data, classification, keywords, video, knowledge of foreign languages, education, employment, projects, programs
organization                status data, contact data, classification, research groups, researchers, projects, programs, research equipment
research group              status, contact, classification, researchers

COBISS is a Slovenian national database that contains data about scientific publications
in Slovenia. The maintainer of the database is IZUM. The bibliographic data is stored in
the COMARC/B format. COMARC/B was developed from MARC (Machine-Readable Cataloging), which originated from the US Library of Congress in 1965. COMARC/B is the state-of-the-art format for bibliographic data, alongside COMARC/H for stock data
and COMARC/A for normative data [7]. The 12 main fields of the COBISS database
are shown in Table 2.


Table 2: Main fields in the COBISS bibliographic database (in COMARC/B format)

Field id   Field name
001        Editorial format
010        International Standard Book Number (ISBN)
011        International identifier for serials and other continuing resources, in the electronic and print world (ISSN)
012        Fingerprint
013        International Standard Music Number (ISMN)
017        Other standard identifiers
020        National bibliography number
021        Number of the mandatory copy
022        Number of the official publication
040        CODEN
041        Other codes
071        Publisher's number (sound recordings and music)

3.2 Integrated data model


In order to build a single database that integrates data from both the SICRIS and COBISS databases, first, the entities, attributes and relations to be included in the integrated database were selected from the two databases. Next, the integrated database model was built, taking into account the entities and attributes that can connect the entities from the two databases. The integrated relational database model is shown in Figure 3, with only the key attributes of the entities. The final database model contains the main entities: researcher, author, research project, publication, and organization. In addition to the main entities, the model contains entities for the classification of projects and researchers, education, and the entities that connect other entities.

Figure 3: Integrated data model of Slovenian research collaboration


The entity that was the basis for integrating the two databases was the researcher table. In the SICRIS research projects database, researchers are identified with an id called MSTID. MSTID is a five-character identifier assigned by the Slovenian Research Agency (ARRS). In the COBISS database, the author of a publication is identified with the CONOR ID identifier. The data that enabled integration of the researchers from the two databases was the MSTID attribute (used by SICRIS) contained in the COBISS database. An integration issue was caused by authors of publications recorded in COBISS who were not registered as researchers in the SICRIS database. This was solved by adding an additional table called tblCobissAuthor, which covered all those cases. The outcome of the merging is a union of all the researchers listed in either of the two databases. A very common problem with merging different data sources is inconsistency of the data, i.e. different spellings of names, etc. In our case the inconsistency could be caused by different spellings of a researcher's name in different publications. Since SICRIS is a very clean database, we avoided this kind of issue by using SICRIS as the main source for researchers' names and contact attributes. The same approach was used for the other main entities: descriptive attributes were taken from SICRIS, while COBISS was used to link the main entities with publications. In summary, data integration and integrated data model development is a delicate task, because it is crucial for the rest of the application. The consistency of the two databases helped a lot with the integration process, which nevertheless required careful selection of attributes and resolving the issue of researchers with missing MSTIDs.
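As an illustration of the merging step, the following is a minimal sketch in Python/pandas of how researchers from the two databases could be matched on the MSTID attribute, with COBISS-only authors kept in a separate table (cf. tblCobissAuthor); the file names and column names are illustrative assumptions, not the actual SICRIS/COBISS exports.

    import pandas as pd

    # Hypothetical CSV exports of the two databases; the real schemas differ.
    sicris = pd.read_csv("sicris_researchers.csv")   # columns: MSTID, first_name, last_name, ...
    cobiss = pd.read_csv("cobiss_authors.csv")       # columns: CONOR_ID, MSTID, name, ...

    # Authors carrying an MSTID are matched to SICRIS researchers; names and
    # contact attributes are taken from SICRIS, the cleaner of the two sources.
    matched = cobiss.merge(sicris, on="MSTID", how="inner", suffixes=("_cobiss", "_sicris"))

    # Authors without a SICRIS record are stored separately (cf. tblCobissAuthor),
    # so the integrated database covers the union of both sources.
    cobiss_only = cobiss[~cobiss["MSTID"].isin(sicris["MSTID"])]

    print(len(matched), "matched authors;", len(cobiss_only), "COBISS-only authors")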

4 Web application
On top of the designed database that integrates SICRIS and COBISS, we have built a web application that can generate and visualize networks by different filters and criteria. The web application enables defining different node types (researcher, organization, or research group) and edge types (publication or project). The network can be constructed based on one or more reference nodes, which means that all the nodes which have a connection with a reference node are included in the network. The reference node can be selected using the researcher id, first name and last name. After the nodes are selected, an edge between each pair of nodes is created if the selected connection criterion (a common project or publication) is fulfilled. Also, different filters can be applied when generating a subnetwork: starting year, ending year, typology (in case the type of the edge is bibliography, only bibliography of the defined typology is taken into account), and keywords (units with the defined keyword in the title or keywords list).
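The construction just described can be summarized by the following minimal Python sketch (using networkx); the record format, identifiers and filter names are illustrative assumptions rather than the actual implementation of the application.

    import networkx as nx

    def build_subnetwork(records, reference_ids, kind="publication",
                         start_year=None, end_year=None):
        """Two-step construction: select the nodes connected to a reference node,
        then create an edge between every selected pair that fulfils the criterion."""
        def keep(year, k):
            if k != kind:
                return False
            if start_year is not None and year < start_year:
                return False
            if end_year is not None and year > end_year:
                return False
            return True

        # Step 1: nodes are the reference nodes plus everyone collaborating with them.
        nodes = set(reference_ids)
        for a, b, year, k in records:
            if keep(year, k) and (a in reference_ids or b in reference_ids):
                nodes.update((a, b))

        # Step 2: an edge is created for each selected pair with a joint unit.
        g = nx.Graph()
        g.add_nodes_from(nodes)
        for a, b, year, k in records:
            if keep(year, k) and a in nodes and b in nodes:
                g.add_edge(a, b)
        return g

    # Example with fictitious collaboration records (node a, node b, year, kind).
    records = [(1, 2, 2006, "publication"), (2, 3, 2009, "publication"), (3, 4, 2008, "project")]
    print(build_subnetwork(records, {1}, start_year=2005, end_year=2010).edges())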
The generated subgraph can be visualized on the website. The nodes are placed using
the ForceAtlas2 algorithm [8], which helps in interpreting the graph by emphasizing clusters of nodes. Different colors of the nodes are associated with the science fields of the researchers, while the sizes correspond to the number of connections a node has in the graph. The web application is connected with the Science Atlas [9] such that clicking on a node representing a researcher opens that researcher's profile page in the Science Atlas.
Atlas. Figure 4 shows the structure of the web application.


Figure 4: Web application for generating subnetworks of research collaboration graphs in Slovenia. Integrated data sources are the input of the application. The application is structurally divided into three parts (from top down): defining the subnetwork, the list of nodes, and the visualization, which is the connection with the Science Atlas.

5 Conclusion
This application can be of value for the scientific communities interested in investigating research networks in Slovenia. The most important qualities of the application are: (i) the integrated use of two highly credible databases of scientific activity, (ii) providing methods for generating subnetworks using an arbitrary number of reference nodes, and (iii) instant visualization of the generated subnetwork and connection with other web resources. We have elaborated the architecture of the system, described the data and the integration process, and presented the structure of the web application. In future work we will develop a client application that enables the creation of subnetworks using additional methods and filters.

6 Acknowledgements
This work was supported by the Slovenian Research Agency and the ICT Programme of
the EC under PlanetData (ICT-NoE-257641) and XLike (ICT-257790-STREP).


7 References
[1] M. Perc, Growth and structure of Slovenia's scientific collaboration network, J. Informetrics, vol. 4, pp. 475-482, 2010.
[2] L. Kronegger, F. Mali, A. Ferligoj, and P. Doreian. Collaboration structures in Slovenian scientific communities. Scientometrics, vol. 2, pp. 631-647, 2012.
[3] M. Karlovcec, A. Kastrin. Collaboration Networks of Slovenian Researchers. In Proceedings of the 5th International Conference on Information Technologies and Information Society (ITIS 2012), 7-9 November 2012, Dolenjske Toplice, Slovenia.
[4] M. Karlovcec, D. Mladenic, M. Grobelnik, M. Jermol. Visualizations of Slovenian Scientific Community. In Proceedings of the 14th International Multiconference Information Society, 10-14 October 2011, Ljubljana, Slovenia.
[5] IZUM and ARRS, URL: http://sicris.izum.si/ [August 2013].
[6] IZUM and ARRS, URL: http://cobiss.izum.si/ [August 2013].
[7] B. Furrie. Understanding MARC Bibliographic: Machine-Readable Cataloging.
[8] M. Jacomy, S. Heymann, T. Venturini and M. Bastian. ForceAtlas2, A Graph Layout Algorithm for Handy Network Visualization. 2011.
[9] IJS - Artificial Intelligence Laboratory (2013), URL: http://scienceatla.ijs.si [August 2013].


Interdisciplinarity of Slovenian research


Zoran Levnajic, Borut Luzar, Janez Povh
Faculty of Information Studies, Sevno 13, 8000 Novo mesto, Slovenia
{borut.luzar,zoran.levnajic,janez.povh}@fis.unm.si
Matjaz Perc
Faculty of Natural Sciences and Mathematics, University of Maribor
Koroska cesta 160, SI-2000 Maribor, Slovenia
matjaz.perc@uni-mb.si
Abstract. The growth of modern science is fastest along the borderlines between the
traditional disciplines. Interdisciplinarity usually benefits all fields involved in a given
research despite their diversity. In this short report, we analyze the current state of interdisciplinary science in Slovenia. By detecting the communities in co-authorship networks we investigate the rise of interdisciplinary research groups from 1960 to 2000. Our preliminary findings indicate that although interdisciplinarity is clearly gaining ground in the Slovenian research landscape, there are still many barriers to overcome.
Keywords. interdisciplinary science, social networks, simulations

Introduction

Recent scientific discoveries are increasingly occurring through interaction between the traditional sciences such as physics, biology, mathematics or sociology [2]. Neuroscience, bioinformatics or social networks are only a few examples of new disciplines that emerged from successful symbioses between fields which were at the opposite ends of the scientific spectrum only a few decades ago [3]. In fact, interdisciplinary sciences have already contributed significant results, usually obtained by cross-utilizing the approaches and paradigms from both involved fields. This trend can also be revealed by examining the funding policies of many science-funding bodies such as governments.
As in many other countries, Slovenia's scientific policy is to encourage research interaction among its scientists, which often leads to outstanding results. Yet, the success of this endeavor, although considerable, is still limited by various factors. In this brief paper, we report our preliminary findings on the growth and dynamics of interdisciplinarity in Slovenian science.
We construct social networks of Slovenian scientists by considering a pair of them to be connected if they published a research paper as co-authors. By examining the period from 1960 to 2000, we study the growth and diversification of Slovenian research. We analyze the community structure of the obtained networks, using the standard Girvan-Newman algorithm for community detection [6]. The emergence of interdisciplinarity is quantified by comparing the original scientific backgrounds of the communities' members. We identify a community as interdisciplinary if a considerable fraction of its researchers
has diverse backgrounds. Our paper finishes with a discussion of several possible ways of
defining interdisciplinarity.

Data and Methods

For our research we used the data obtained from the two main databases collecting information about research in Slovenia: the Slovenian Current Research Information System (SICRIS) [7] and the Co-operative Online Bibliographic System & Services (COBISS) [4].
The former contains data about the registered Slovenian researchers and the latter the data
about their publications.
We constructed eight co-authorship networks modeling the data from 1960 to 2000. The networks were designed to cumulatively include time-windows whose size increases by 5 years, i.e.: 1960-1965 (first network), 1960-1970 (second network), 1960-1975 (third network), etc. The last (eighth) network includes the entire period from 1960 to 2000. A researcher, represented as a network node, exists in a given network if he/she published at least one paper recorded in COBISS in that period. Two researchers/nodes are connected by an undirected edge if they co-authored at least one paper in the given period. We neglect the thickness of links, i.e., all links have unitary weight regardless of how many papers were published by a given pair of authors. We denote each of the networks by Gt, where t is the last year of the considered period.
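For illustration, a minimal Python sketch of this construction is given below, assuming publications are available as (year, list of author ids) pairs; this input format is a simplification of the actual COBISS data.

    import itertools
    import networkx as nx

    def cumulative_networks(publications, start=1960, end=2000, step=5):
        """Return {t: G_t}, where G_t covers the period from `start` up to year t."""
        networks = {}
        for t in range(start + step, end + 1, step):
            g = nx.Graph()
            for year, authors in publications:
                if start <= year <= t:
                    g.add_nodes_from(authors)                             # a node exists if it published
                    g.add_edges_from(itertools.combinations(authors, 2))  # unweighted co-authorship links
            networks[t] = g
        return networks

    publications = [(1963, ["a", "b"]), (1971, ["b", "c", "d"]), (1999, ["a", "d"])]
    nets = cumulative_networks(publications)        # eight networks: G_1965, ..., G_2000
    print(sorted(nets), nets[2000].number_of_edges())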
The networks are not connected, and the components other than the largest (giant) one are typically very small. On each largest component we apply the standard Girvan-Newman algorithm in order to detect its communities. The algorithm proceeds by successively removing the links with maximal edge betweenness M, i.e. at every step the edge through which the maximal number of shortest paths between pairs of vertices passes is removed. The algorithm stops when M ≤ n, where n is the number of vertices in the network. This stopping criterion can in fact be defined in many different ways. However, due to the specific structure of our networks, we decided to use the one which stops whenever an edge which is pendant in the initial graph should be removed. In addition, we automatically identify as communities all the components other than the largest one.
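A minimal sketch of this community extraction step, under the above reading of the stopping criterion, could look as follows (in Python with networkx); it is a reconstruction of the described procedure, not the authors' actual code.

    import networkx as nx

    def girvan_newman_pendant_stop(g0):
        """Remove edges of maximal betweenness until the edge about to be removed is
        pendant in the initial graph; the remaining components are the communities."""
        g = g0.copy()
        pendant = {v for v in g0 if g0.degree(v) == 1}
        while g.number_of_edges() > 0:
            betweenness = nx.edge_betweenness_centrality(g)
            u, v = max(betweenness, key=betweenness.get)
            if u in pendant or v in pendant:
                break                     # stopping criterion: pendant edge reached
            g.remove_edge(u, v)
        return [set(c) for c in nx.connected_components(g)]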
Once the algorithm is completed, our initial network is divided into a given number of
communities, representing research groups. We denote this set of communities as Ct , for
t being the last year of the considered period, which we analyze in the sequel. We exclude
the communities with fewer than five researchers, since even if diverse, they are too small to
embody interdisciplinarity. The total number of vertices in the remaining communities in
Ct is denoted as v(Ct ).
We now start the analysis of interdisciplinarity. SICRIS and COBISS divide the sciences into 7 main categories: Natural sciences and mathematics, Engineering sciences and technologies, Medical sciences, Biotechnical sciences, Social sciences, Humanities, and Interdisciplinary studies (1). That is to say, each researcher is identified through
his/her main field of work, belonging to one of these categories. For each community C
in Ct , we assign a seven component vector S(C), in which the i-th component represents
the proportion of the researchers belonging to the i-th science.
(1) The true number of researchers belonging to Interdisciplinary studies in the sense of COBISS's seventh category is very small, and often imprecisely quantified. For this reason we designed our own ways of measuring the emergence of interdisciplinarity, rather than simply looking at the number of researchers in this group.

It is not straightforward to define what it means for a research group to be interdisciplinary. We introduce three different measures by which we determine the state of interdisciplinarity in each of the networks. We refer to the first measure as single-bound interdisciplinarity and denote it by B(Gt). For some community C, consider the value of
M (C), which is defined as the maximal component in the vector S(C). If the value of
M (C) exceeds a given bound, the community is not interdisciplinary, otherwise it is. We
set this bound to be 80% of the sum of all components, i.e., for a community to be interdisciplinary, we require that at least 20% of its scientific content is not along the maximal
component. The value of B(Gt ) is then defined as
B(Gt) = (# interdisciplinary communities) / |Ct| .

Our second measure is the ordinary interdisciplinarity O(Gt ). For each community C we
compute its O(C) as:
O(C) = -(7/6) M(C) + 7/6 .
Notice that O(C) = 0 if M(C) = 1 and O(C) = 1 if M(C) = 1/7, which is the smallest
possible value for M (C). The ordinary interdisciplinarity of Gt is defined as
O(Gt) = Σ_{C ∈ Ct} O(C) / |Ct| .

In the second measure the size of a community is not considered. Thus we also present the normalized interdisciplinarity N(Gt) of the network, where we take into account that bigger communities contribute more to the interdisciplinarity of the network. The normalized interdisciplinarity N(C) of a community C is defined as

N(C) = O(C) · |C| .
The normalized interdisciplinarity of Gt is then computed as
N(Gt) = Σ_{C ∈ Ct} N(C) / v(Ct) .

Relying on these interdisciplinarity measures, we compute our results in relation to different periods, to be presented in the next section.
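The following minimal Python sketch illustrates how the three measures could be computed for a set of communities; the mapping of researchers to the seven science categories (indexed 0-6 here) and the community representation are illustrative assumptions.

    def science_vector(community, field, n_fields=7):
        """S(C): the proportion of researchers of each science in community C."""
        counts = [0] * n_fields
        for r in community:
            counts[field[r]] += 1
        return [c / len(community) for c in counts]

    def M(community, field):
        return max(science_vector(community, field))

    def O(community, field):
        # linear rescaling: O = 0 when M = 1, O = 1 when M = 1/7
        return -7.0 / 6.0 * M(community, field) + 7.0 / 6.0

    def interdisciplinarity(communities, field, bound=0.8):
        """Return B(G_t), O(G_t), N(G_t) for the communities of one network."""
        kept = [c for c in communities if len(c) >= 5]     # discard tiny communities
        v_ct = sum(len(c) for c in kept)
        b = sum(M(c, field) <= bound for c in kept) / len(kept)
        o = sum(O(c, field) for c in kept) / len(kept)
        n = sum(O(c, field) * len(c) for c in kept) / v_ct
        return b, o, n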

Results

In Fig. 1 we show a separation of the largest component of the co-authorship network in the period between 1960 and 1975 into communities using the standard Girvan-Newman algorithm. Different communities are visualized via different node colors. The figure was drawn using Pajek [1].
In Table 1 we present basic network properties of the eight considered networks.
Namely, numbers of vertices and edges of networks, sizes of the largest components,
numbers of analyzed communities and numbers of vertices in the analyzed communities,
respectively.
In Table 2 the values of the three interdisciplinarity measures defined above are listed.
It can be seen that all three measures are more or less stabilized, although the first starts
to decrease in 1995. Thus, there are fewer interdisciplinary communities according to the


Figure 1: A separation of the largest component of G75 into communities using the Girvan-Newman algorithm.
network   n      m      l     c    nc
G65       194    70     18    2    12
G70       454    321    76    12   144
G75       920    947    229   29   432
G80       1586   2243   690   41   913
G85       2242   3739   1195  45   1397
G90       3799   7838   2408  60   2544
G95       5689   13580  3971  74   4076
G00       7925   22600  5888  91   6000

Table 1: Basic properties of the analyzed networks. Here n, m, l, c, and nc are the number of vertices in the network, the number of edges in the network, the size of the largest component, the number of analyzed communities and the number of vertices in the analyzed communities, respectively.
first measure, but the interdisciplinary ones are bigger, so the other two measures remain
approximately the same. There is a small increment of the third measure in the year
2000 and the analysis of more recent networks will show if the interdisciplinarity is really
increasing. In fact, taking into account the growth of the total number of researchers in
Slovenia over the considered period of time, these findings can be reconciled.

Discussion

The emergence of interdisciplinarity in Slovenian science was studied by means of network analysis, and in particular, by extracting research communities from co-authorship

network   B(Gt)   O(Gt)   N(Gt)
G65       0.000   0.082   0.095
G70       0.357   0.250   0.244
G75       0.760   0.327   0.341
G80       0.634   0.312   0.355
G85       0.689   0.338   0.365
G90       0.683   0.337   0.365
G95       0.622   0.330   0.332
G00       0.560   0.300   0.359

Table 2: The values of the three measures of interdisciplinarity for every network.

networks and quantifying their interdisciplinarity. We showed that the number of Slovenian research groups with diverse scientific backgrounds has risen considerably over the
are not aware of any similar results in this direction. Nevertheless, we note that a large
portion of present co-authorship network is still far from being interdisciplinary, i.e. it is
dominated by traditional single-science research groups.
While our qualitative findings do not depend on the particularities of our methodology, specific values might depend on them. This primarily refers to the choice of community detection algorithm [5]. Given the available variety of algorithms, it is likely that a different one might reveal slightly different communities. We still expect, however, that quantifying their interdisciplinarity would lead to the same results. The stopping criterion, even within the framework of a given algorithm, might also induce discrepancies. We verified that for the Girvan-Newman algorithm our findings are essentially independent of this.
Finally, quantifying interdisciplinarity is not a standard task. Our simple binary rule
does not take into account the differences among research groups which combine more
than two sciences. Also, we have not considered which sciences are combined in a given
group, although the true interdisciplinarity might depend on that. This is a matter of further
work which is to be published elsewhere; for this preliminary report we limit ourselves to
these simple findings.

Acknowledgments

Work supported by ARRS via the program P1-0383 Complex Networks, the project J1-5454 Unravelling Biological Networks and the project L7-4119 Co-authorship networks of Slovenian scholars: Theoretical analysis and visualization user interface development.
Work was also supported by Creative Core FISNM-3330-13-500033 Simulations funded
by the European Union, The European Regional Development Fund. The operation is carried out within the framework of the Operational Programme for Strengthening Regional
Development Potentials for the period 2007-2013, Development Priority 1: Competitiveness and research excellence, Priority Guideline 1.1: Improving the competitive skills and
research excellence.


References
[1] Batagelj, V., Mrvar, A. Pajek - program for large network analysis. Connections, 21.2 (1998), 47-57.
[2] Frodeman, R., Klein, J. T., Mitcham, C. Oxford Handbook of Interdisciplinarity. Oxford University Press, Oxford, 2010.
[3] Chubin, D. E. The conceptualization of scientific specialties. The Sociological Quarterly, 17, 448-476, 1976.
[4] Institute of Information Science. Co-operative Online Bibliographic System & Services. http://www.cobiss.si/. 2013.
[5] Fortunato, S. Community detection in graphs. Physics Reports, 486, 75-174, 2010.
[6] Newman, M. E. J., Girvan, M. Finding and evaluating community structure in networks. Physical Review E, 69, 026113, 2004.
[7] Institute of Information Science. Slovenian Current Research Information System. http://sicris.izum.si/. 2013.


Modeling and Simulating Social Processes

Invited lecture:
Matteo Marsili
The Abdus Salam International Centre for Theoretical Physics,
Trieste, Italy.
On sampling and modeling complex systems
The study of complex systems is limited by the fact that only few
variables are accessible for modeling and sampling, which are not
necessarily the most relevant ones to explain the system's behavior.
In addition, empirical data typically under-sample the space of possible states. We study a generic framework where a complex system
is seen as a system of many interacting degrees of freedom, which
are known only in part, that optimize a given function. We show
that the underlying distribution with respect to the known variables
has the Boltzmann form, with a temperature that depends on the
number of unknown variables. In particular, when the unknown
part of the objective function decays faster than exponential, the
temperature decreases as the number of variables increases. We
show in the representative case of the Gaussian distribution, that
models are predictable only when the number of relevant variables
is less than a critical threshold. As a further consequence, we show
that the information that a sample contains on the behavior of the
system is quantified by the entropy of the frequency with which different states occur. This allows us to characterize the properties
of maximally informative samples: In the under-sampling regime,
the most informative frequency-size distributions have power-law
behavior and Zipf's law emerges at the crossover between the under-sampled regime and the regime where the sample contains enough
statistics to make inference on the behavior of the system. These
ideas are illustrated in some applications, showing that they can
be used to identify relevant variables or to select most informative
representations of data, e.g. in data clustering.


Universality in voting behavior


Marija Mitrovic, Arnab Chatterjee, Santo Fortunato
Department of Biomedical Engineering and Computational Science
Aalto University School of Science
P.O. Box 12200, FI-00076, Finland
{marija.mitrovic@aalto.fi}
Abstract. Statistical physics provides a conceptual framework for studying large-scale
social phenomena. Elections represent a valuable area for quantitative study of human
behavior. In proportional elections with open lists, the performance of the candidates
within the same party list, has the same distribution regardless of the country and the
year of the election. The study of election data sets from different countries with openlist proportional systems confirms that nations with similar election rules belong to the
same universality class. Deviations from this trend are associated with differences in the
election rules. Our analysis reveals that voting process is characterized with dynamics
that does not depend on the historical, political or economical context where the voters
operate.


Looking for stable pluralism


Zoran Levnajic
Faculty of Information Studies, Sevno 13, 8000 Novo mesto, Slovenia
{zoran.levnajic@fis.unm.si}
Abstract. Social phenomena are complex and therefore difficult to analyze quantitatively.
In fact, our present understanding of processes such as spreading of a popular idea or
diffusion of a new radical opinion is still only at a descriptive level. Various statistical and
computational models have been proposed in the recent decades, attempting to capture
these processes through the framework of dynamical systems and complex networks. Despite being intuitively clear, these models are usually able to predict only a limited range
of behaviors of a given social system. In this talk, we propose a different approach to examining these models. In particular, we suggest searching for specific network topologies on which the behavior of a given model of social dynamics more closely resembles realistic scenarios.


Network analysis of the competence centres in Slovenia (1)

Jelena Govorcin (2), Andrej Kastrin (3), Borut Luzar (4), Uros Pinteric (5), Janez Povh (6)
Faculty of Information Studies
Sevno 13, 8000 Novo mesto, Slovenia
{jelena.govorcin; andrej.kastrin; borut.luzar; uros.pinteric; janez.povh}@fis.unm.si
Alenka Pandiloska Jurak (7)
Iskra Zascite d.o.o.,
Stegne 25a, 1000 Ljubljana
alenka.pandiloska@gmail.com
Abstract: In this paper we present the results of research done on the network of Slovenian scholars, where nodes are research groups and there is a link between two groups if there exists at least one publication co-authored by at least one member of each of these groups. We put special focus on the sub-networks created by research groups forming the so-called Competence centres (CC). Each of these CCs received approximately 6.5 million EUR of research funding over a time span of four years to create new technological products and processes. Our expectation was that these strong financial impulses would also imply a significant improvement of the network parameters describing the size, productivity and connectivity of the CCs. We have realised that for the three CCs which we took into consideration this is not the case.
Keywords: network analysis, competence centre, research policy, Slovenia, bibliometrics

1. Introduction
Networking is one of the most important elements of globalized society.
Historically, science is one of the main fields where networking produces synergies and
breakthroughs in understanding and changing reality. At the same time, analysing the effects of networking enables us to understand the patterns and gives us the further possibility to model them in the future. In this paper, we present a model for the evaluation of
(1) The article is based on the research project L7-4119, funded by the National Research Agency of the Republic of Slovenia, and is partially supported by the ARRS research program P1-0383 and by the Creative Core FISNM-3330-13-500033 `Simulations' project funded by the European Union.
(2) Jelena Govorcin is a PhD student at the Faculty of Mathematics and Physics and a teaching assistant at the Faculty of Information Studies in Novo mesto.
(3) Andrej Kastrin is a PhD student and teaching assistant at the Faculty of Information Studies in Novo mesto.
(4) Borut Luzar, PhD, is a researcher at the Faculty of Information Studies in Novo mesto.
(5) Uros Pinteric, PhD, is an associate professor at the Faculty of Social Sciences, University of SS. Cyril and Methodius in Trnava, and at the Faculty of Information Studies in Novo mesto.
(6) Janez Povh, PhD, is an associate professor at the Faculty of Information Studies in Novo mesto.
(7) Alenka Pandiloska Jurak, M.A., is a PhD student and young researcher from the private sector at Iskra Zascite d.o.o.


networking within the Slovenian scientific community, based on bibliographical data. We rely on high quality data about Slovene researchers, with a special focus on their bibliographies, and investigate how certain policy measures (instruments), i.e. financial subsidies, increase cooperation between research groups. We limit our study to a policy measure called Competence centres, where seven groups of research and development institutions, each group containing profit and non-profit organisations (8), received approximately 6.5 million EUR for the proposed research and development program. Our aim is to figure out how this input of a huge amount of public money results in networking within each competence centre (CC), where we use bibliographic output as a descriptor of network dynamics.
Measuring the research output can be relatively easy. A relatively simple and usable tool was presented by Kavlie and Sleeckx [9]. Four main stages of research policy are indicated: input, activities, output and outcomes [8, 9]. While we are able to identify the input (resources), activities (research and training itself) and outputs (publications, awarded diplomas, patents, etc.), it is very difficult to measure the outcomes (indirect effects of the output of resource activation). Efforts to measure the impact, or at least the role, of research policy can be traced back in the past, e.g. [7]. However, over time, the evaluation of research policy measures (instruments) has had different focal points. Katz [7] was mainly interested in the impact and role of research and development in companies, while Katz and Martin [8] questioned the term of collaboration in research, which can be understood as a methodological issue of what and how one should measure at all. Xu [12] followed the modern trend of university-business research cooperation as a basis for innovation and posed the question of academia-economy networking as a factor of innovation. Aksnes et al. [1], on the other hand, tested cooperation just between two institutions based on publication output. Gunnarsson [3, n. d.], in turn, prepared an interesting analysis of the use of bibliometric data as a research policy evaluation tool for the Nordic countries. Among these examples from different fields, which cover a historical time span of more than 20 years, there is an enormous number of other studies [e.g. 10, 2], either unique or following one of the aforementioned approaches.
Our approach mainly considers research output (i.e. scientific articles, book chapters, monographs, final research reports, patents etc.), but we claim that our results also provide a partial answer to the question about outcomes. Indeed, we are confident that the most desired outcome, i.e. stronger cooperation and interconnection between profit and non-profit research organisations, can be detected via our network parameters. In this sense our research is close to the research of Xu [13], which was likewise conducted at the level of institutional networking. The main contributions of this article are therefore:
- We apply state-of-the-art network analysis methods to networks spanned by research groups included in three out of seven approved competence centres.
- We rely on high quality data about the research output of the institutions included in the study, based on two central, publicly available and well-maintained databases.
- Our approach can be easily extended to consider other types of research output, other network analysis methods and other research policy measures.
(8) Typical non-profit research groups are those working in higher education institutions or within research institutes, while typical profit research groups are those working as part of some company.


2. Sketch of Slovenian research policy measures (instruments)

The main goals of Slovenian research and development policy are defined by Slovenia's Development Strategy (SDS), which was accepted in 2005 by the government of the Republic of Slovenia for the period 2006-2013. One of the main national development goals was to introduce global competitiveness by supporting innovativeness and entrepreneurship, supporting ICT penetration, and renovation and investment into learning, education, research and development [6].
One of the policy measures to achieve the above goal was the establishment of so-called competence centres (CC) [4]. These are conglomerates of research and development institutions, each of them containing profit and non-profit organisations. All of them come from the areas of natural sciences, medicine or technical sciences, while the social sciences are mainly left out. Although their main task is to develop new technological products and processes that should increase the competitiveness of the Slovenian technological sector in the global economy, as well as to support an energy-efficient economy and the low-carbon society concept, their responsibility is also to increase the research and knowledge potential of Slovenia, which can be measured by publication output. Partners coming from the non-profit members of CCs are especially aware of the importance of publication output, so we claim that an analysis of the publishing records of all CC members reveals a lot about the output and outcome of the CCs.

3. Description of the research methodology

Slovenia has a central register of all relevant research institutions as well as researchers, called the Slovenian Current Research Information System (SICRIS). Institutions and researchers which are not registered in SICRIS are excluded from almost every research measure. In August 2013, the register contained approximately 14,200 researchers working in over 950 research organisations (profit and non-profit), divided into over 1,500 research groups (9). Their achievements are indexed in another system called the Co-operative Online Bibliographic System and Services (COBISS) and then cross-referenced with SICRIS. These rather unique and publicly available databases are maintained by the Institute of Information Science and funded by the government. Within the research project L7-4119, funded by the National Research Agency of the Republic of Slovenia, we built a new database which integrates both SICRIS and COBISS and is updated daily. It is designed to enable fast and efficient network analysis of Slovene scholars. We used it to create the networks, while all network parameters were calculated using Pajek, a program for the analysis and visualization of large networks [11].
As mentioned above, we categorise all research groups from SICRIS as profit or non-profit (see footnote 8). All bibliographical data of relevance (articles, chapters, monographs, final research reports, patents etc.) will be
(9) A research group in this context is the smallest independent unit, composed of researchers at individual research organizations, registered at the Slovenian Research Agency (ARRS).

analysed for co-authorships, with a focus on those where at least one co-author comes from a profit and one from a non-profit research group. We compare results for the
following three CCs:
1. Competence centre Devices supported by cloud computing (CC CLASS);
2. Competence centre for biotechnological development and innovations (CC
BRIN);
3. Competence centre Advanced systems of effective use of electrical energy (CC
SURE);
with results on the network of all registered research groups and researchers.
Although the CC instrument mainly followed the research policy request for technological development and innovation, as mentioned earlier, we claim that it should also stimulate the publication of new findings and discoveries; hence the financed collaborations (i.e. the CCs) should grow and develop as networks much faster than the general network with respect to the following factors:
- size, measured by the number of research groups included in the network of the
particular CC (note that for each CC this is a network spanned by members of this CC
and by other institutions having at least one joint publication with at least one member
of this CC since year 2000).
- productivity, measured by (i) number of publications where authors come from at
least two member groups of the same CC and by (ii) number of publications where
authors come from the profit and non-profit member groups of the same CC.
- connectivity, measured by density, average degree, diameter, number of connected
components and clustering coefficient.
Throughout this paper, each network will be considered as a simple undirected graph G = (V, E), where V and E represent the sets of existing nodes and links, respectively. For each of the considered networks, the following parameters were analysed:
- Network size. The size of a network G can refer to the number of nodes n = |V| or, less commonly, the number of edges m = |E|. In this paper, network size is measured by the number of nodes.

- Network density. The proportion of links in the network compared to the number of all possible links:

  D = 2m / (n(n-1)).   (1)

- Average degree. An important indicator in co-authorship networks is the average number of co-authorships between the members of the network. This can be measured by the average degree of the network, denoted by d(G). It is defined as the sum of all the degrees divided by the size of the network, i.e.

  d(G) = (1/n) Σ_{v ∈ V} d(v).   (2)

- Number of connected components. Connectivity statistics measure to what extent a network is connected. Two nodes are said to be connected when they are either directly connected through a link, or indirectly through a path of several links. A connected component is a set (sub-network) of nodes all of which are connected, and unconnected to the other nodes in the network.

- Diameter. The distance between two nodes u and v in a network G is the number of edges in a shortest path (also called a geodesic distance) connecting them. It is denoted by d(u, v). The length of the longest geodesic distance in the network G is called the diameter and denoted diam(G). Thus,

  diam(G) = max_{u,v ∈ V} d(u, v).   (3)

- Clustering coefficient. The clustering coefficient of a node v in a network G is the ratio of the existing links connecting a node's neighbours to each other, denoted e_v, to the maximum possible number of such links, which is equal to d(v)(d(v)-1)/2. Hence,

  C_v = 2 e_v / (d(v)(d(v)-1)).   (4)

  It indicates how frequently the neighbours of a node are linked with each other. The clustering coefficient for the whole network G is the average of the clustering coefficients taken over all nodes, i.e.

  C(G) = (1/n) Σ_{v ∈ V} C_v.   (5)

  Its value ranges from 0 to 1.
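As an illustration of how these parameters can be obtained in practice, the following short Python sketch uses the networkx library on a small toy graph; it is not the Pajek workflow actually used in the study.

    import networkx as nx

    g = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)])    # small toy network

    n = g.number_of_nodes()                                    # network size
    density = nx.density(g)                                    # 2m / (n(n-1))
    avg_degree = sum(d for _, d in g.degree()) / n             # (1/n) * sum of degrees
    components = nx.number_connected_components(g)
    largest = g.subgraph(max(nx.connected_components(g), key=len))
    diameter = nx.diameter(largest)                            # diameter of the largest component
    clustering = nx.average_clustering(g)                      # average of C_v over all nodes

    print(n, density, avg_degree, components, diameter, clustering)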

4. Results of the data analysis


In this part we present the results for the factors size, productivity and connectivity for the network of all active research groups registered in SICRIS (we name this network ALL) and for the sub-networks spanned by CC CLASS, CC BRIN and CC SURE. The analysis was done on the cumulative networks for the time periods: year 2000, 2000-2001, 2000-2002, ..., 2000-2012. The network ALL is therefore created for e.g. the time interval 2000-2005 by taking into account (i) all research groups that were active (i.e. have published at least one publication which is recorded in COBISS) within this time span and (ii) all bibliography co-authored by at least one member of these groups. Similarly, we created the network CC CLASS for the time interval 2000-2005 by considering all research groups which created CC CLASS (note that CC CLASS was established as a consortium only in 2010) and all research groups which have a joint publication with at least one group from CC CLASS in the period 2000-2005.

4.1. The size of networks


Table 1: Size of network (number of active research groups). Columns are the cumulative periods 2000, 2000/2001, ..., 2000/2012; the last column is the average annual growth rate in %.

            2000   00/01  00/02  00/03  00/04  00/05  00/06  00/07  00/08  00/09  00/10  00/11  00/12  growth
ALL         1073   1137   1165   1192   1214   1228   1231   1237   1243   1247   1252   1255   1257    1.33
CC CLASS      86    122    146    163    196    215    254    275    306    324    348    357    373   13.01
CC BRIN       96    122    129    144    162    175    188    207    218    228    238    250    258    8.59
CC SURE      101    136    162    179    203    222    236    271    280    289    308    318    336   10.54

Source: [5; own research]


Table 1 shows the development of the total academic network, called ALL, and of the three analysed competence centres. In all cases we can see a significant growth of the number of active research groups. However, the growth of the competence centre networks is much more significant (several times larger) compared to the growth of the total network (see the last column, where we report the average annual growth rate).
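The text does not spell out how the average annual growth rate in the last column is computed; the reported figures are consistent with the usual compound-growth formula (a reconstruction under this assumption, not a statement from the paper):

    growth = ((x_2012 / x_2000)^(1/12) - 1) * 100.

For example, for the ALL row this gives ((1257/1073)^(1/12) - 1) * 100 ≈ 1.33, matching the value reported in Table 1.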

4.2. The productivity


Table 2: Growth of the number of co-authored publications (columns as in Table 1)

            2000   00/01  00/02  00/03  00/04  00/05  00/06  00/07  00/08  00/09   00/10   00/11   00/12   growth
ALL         8331   16728  25753  35079  45482  56521  67701  79865  92495  105094  117712  130898  143673   26.78
CC CLASS      63     122    205    249    315    366    423    516    587     674     756     841     912   24.95
CC BRIN        -      10     18     18     26     40     57     67     88     112     133     158     185   35.11
CC SURE       26      65    100    139    179    243    321    382    454     518     580     645     698   31.54

Source: [5; own research]


In Table 2 the growth of the number of publications co-authored by members of at least two research groups associated with the same competence centre is presented. The accompanying Graph 1 shows that co-authored publications received some kind of weak impulse in 2005/06, when the research groups later forming the CCs started to publish together more intensively. The average annual growth of the number of publications is higher for the latter two competence centres (BRIN and SURE), while the complete network and CC CLASS have approximately the same growth.

Graph 1: Growth of the number of co-authored publications


Source: [5; own research]


Table 3 shows the dynamics of cooperation between the profit and non-profit research groups in the total network and in the sub-networks spanned by the CCs. The numbers in the table are publications having at least one co-author in a non-profit and at least one in a profit research group (see the definitions above) of the same competence centre. Note that a publication is also counted if there is an author who is at the same time a member of a public and a private research group (this is possible in Slovenia via part-time contracts). For the network ALL one can observe an almost linear growth rate, while for the three analysed competence centres it is obvious that the CC CLASS sub-network started on joint publications between profit and non-profit groups rather late, compared to the other two CCs. This might be connected to its topic.
Table 3: Growth of the number of profit/non-profit co-authorships (columns as in Table 1)

            2000   00/01  00/02  00/03  00/04  00/05  00/06  00/07  00/08  00/09  00/10  00/11  00/12  growth
ALL         2125   4443   6918   9435   12073  14882  17694  20752  23915  26789  29680  32433  35096   26.33
CC CLASS    (partial data: 12, 17)
CC BRIN     (partial data: 16, 18, 27, 30, 33, 36, 43)
CC SURE     (partial data: 10)

Source: [5; own research]


On the other hand we can observe that CC BRIN is the most active in joint publications between profit and non-profit groups despite an equal starting position with CC SURE. Another perspective could be that the CC BRIN research groups act independently of the CC financial scheme, while CC CLASS rapidly increased production after gaining the finances. CC SURE might be understood as closed due to its technical nature (they deal with the sustainable use of electricity), where private profit groups see no gain in publications. This would lead to the conclusion that CC SURE did not manage to overcome the gap between academic research and business development interests.

4.3 The connectivity


Table 4: Dynamics in network density
(Periods 2000, 2000/2001, ..., 2000/2012; the last value in each row is the average annual growth rate in %.)
ALL: 0.006, 0.008, 0.009, 0.011, 0.012, 0.014, 0.016, 0.018, 0.02, 0.02, 0.02, 0.02, 0.03; 14.35
CC CLASS: 0.07, 0.07, 0.06, 0.06, 0.05, 0.06, 0.06, 0.06, 0.06, 0.06, 0.05, 0.06, 0.06; -1.28
CC BRIN: 0.1, 0.11, 0.12, 0.12, 0.11, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.13, 0.13; 2.21
CC SURE: 0.1, 0.08, 0.06, 0.06, 0.06, 0.06, 0.06, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05; -5.61
Source: [5; own research]


Density, shown in Table 4, has two main characteristics. First, it is low compared to the maximum possible value of 1: in the case of competence centre BRIN only 13 % of all possible links are present, while in the other sub-networks, as well as in the ALL network, the density is even lower. Second, the competence centres did not make any significant progress in joint publications: CC BRIN and CC CLASS stagnated and the density of CC SURE was even halved, while the density of ALL increased five-fold from the initial year 2000 (average annual growth of 14.35 %). This shows that the growth of the CC networks (recall that it was much faster than that of ALL, see Subsection 4.1) was not followed by new links between the groups at a comparable rate. Indeed, since the number of new publications was also increasing (even much faster than the number of groups, see Table 2), this increase is mainly due to the publication efforts of existing co-authorships.
Table 5: Dynamics in average degree
(Periods 2000, 2000/2001, ..., 2000/2012; the last value in each row is the average annual growth rate in %.)
ALL: 6.54, 9.1, 11.22, 13.11, 15.19, 17.34, 19.74, 22.54, 24.79, 26.75, 28.42, 30.36, 32.12; 14.18
CC CLASS: 5.93, 8.2, 8.78, 9.08, 10.6, 12.29, 14.5, 15.82, 16.95, 18.32, 19.24, 20.1, 20.9; 11.07
CC BRIN: 9.73, 13.39, 15.1, 16.6, 18.1, 21.3, 23.45, 24.9, 26.6, 27.6, 28.31, 31.86, 34.53; 11.13
CC SURE: 10.22, 10.5, 10.42, 11.33, 11.86, 12.7, 13.74, 14.45, 15.34, 15.7, 16.5, 17.1, 18.26; 4.96
Source: [5; own research]


The dynamics of the average degree in Table 5 show how the average number of co-authoring groups grows for the members of the CCs. The average degrees are increasing, but so are the sizes of the networks, in some cases (CC CLASS and CC SURE) even faster, which results in the drop of density. The research groups in CC BRIN have the most spread-out network: each included research group collaborates with 34.5 others on average in the period 2000-2012, while for CC SURE this number is almost halved, at 18.26. It is an interesting observation that the research groups in ALL made much more progress in average degree than two out of the three competence centres, mainly because the size of the ALL network grew more slowly.
Table 6: Dynamics in number of connected components
(Periods 2000, 2000/2001, ..., 2000/2012; the last value is the average annual growth rate in %. For the CC rows only the growth rates could be recovered from the source layout.)
ALL: 172, 149, 126, 119, 105, 93, 84, 73, 65, 61, 61, 57, 52; -9.49
CC CLASS: average annual growth -3.32
CC BRIN: (values not recoverable)
CC SURE: average annual growth -13.87
Source: [5; own research]


Table 6 shows the number of connected components, i.e. sub-networks that exist independently (without mutual links) within the network. Over time the number of such components is expected to decrease, which is indeed the case in our data. It is also visible that, according to the data, two out of the three competence centres reached full connectivity of their networks already before the introduction of competence centres as a policy measure. At the same time, the Slovenian research sphere as a whole became less fragmented and more interconnected over time.
Table 7: Dynamics in diameter
(Periods 2000, 2000/2001, ..., 2000/2012; only the initial ALL value and the average annual growth rates in % could be recovered from the source layout.)
ALL: 10 in 2000; average annual growth -4.17
CC CLASS: average annual growth -3.32
CC BRIN: average annual growth -5.61
CC SURE: average annual growth -1.51
Source: [5; own research]


Table 7 presents the diameter, i.e. the greatest distance between any two research groups belonging to the same (connected) network. With increasing density, and given that the number of connected components decreases over time, it is expected that any two non-co-authoring research groups will be separated by ever fewer intermediate research groups. This is clearly visible for the total population, where the drop from 10 to 6 is evident, as well as for CC BRIN, where the drop from 6 to 3 is also significant. We point out that all networks share the so-called small-world property (small diameter). This was expected for the CC networks, while for the whole network it is a new and interesting observation.
Table 8: Dynamics in clustering coefficient
(Periods 2000, 2000/2001, ..., 2000/2012; the last value in each row is the average annual growth rate in %.)
ALL: 0.38, 0.38, 0.38, 0.38, 0.38, 0.38, 0.38, 0.38, 0.39, 0.38, 0.38, 0.38, 0.39; 0.22
CC CLASS: 0.77, 0.73, 0.72, 0.73, 0.69, 0.67, 0.65, 0.62, 0.61, 0.6, 0.59, 0.57, 0.56; -2.62
CC BRIN: 0.75, 0.74, 0.7, 0.67, 0.62, 0.62, 0.6, 0.6, 0.57, 0.58, 0.57, 0.57, 0.59; -1.98
CC SURE: 0.84, 0.79, 0.75, 0.71, 0.69, 0.64, 0.64, 0.62, 0.61, 0.59, 0.58, 0.56, 0.54; -3.61
Source: [5; own research]


The clustering coefficient presented in Table 8 characterises the whole network and can be seen as a measure of how close the considered network is to a fully connected network (clique) of the same order. Over time the network ALL preserves a stagnant level of cooperation at 0.38, while the competence centre networks become less and less clustered, i.e. their clustering coefficients decline over time. Initial cooperation within the networks forming the competence centres was significantly higher than in the network ALL, but at the end of the time interval the difference is much smaller. One would expect the opposite trend, given the relatively small size of these networks and the strong joint interest, but this is not the case, as we have already observed for the other parameters such as density and average degree.
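The measures reported in Tables 4-8 are standard network statistics. A minimal sketch of how they can be computed, for instance with the Python networkx library on a toy co-authorship graph (our illustration, not the authors' code; the diameter is taken on the largest component, since the networks may be disconnected):

import networkx as nx

# Toy co-authorship network: nodes are research groups, edges mean joint publications.
G = nx.Graph()
G.add_edges_from([("G1", "G2"), ("G2", "G3"), ("G3", "G1"), ("G3", "G4"), ("G5", "G6")])

density = nx.density(G)                                              # Table 4
avg_degree = sum(d for _, d in G.degree()) / G.number_of_nodes()     # Table 5
components = nx.number_connected_components(G)                       # Table 6
largest = G.subgraph(max(nx.connected_components(G), key=len))
diameter = nx.diameter(largest)                                      # Table 7
clustering = nx.average_clustering(G)                                # Table 8

print(density, avg_degree, components, diameter, clustering)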

5. Discussion and conclusion


The results presented in this paper show how network analysis can be used to understand the dynamics in society or its parts. Moreover, we showed that the method can be used indirectly to evaluate a public policy whose measurable results are a product of social interaction and networking.
However, we have to refute our three statements. The establishment of competence centres did not significantly boost co-authored publications in any of the three cases compared to the whole population. The networking potential of competence centres, measured by joint publications before and after their establishment, did not change. Based on the data, it is more accurate to say that previous cooperation between research groups resulted in applying for the competence centre financial instrument. Likewise, joint publications written by private and public research groups did not disproportionally increase with the establishment and financing of competence centres.


As a side observation, it can be noted that in recent years Slovenia has excluded the social sciences from this significant financing scheme: none of the competence centres included research groups connected to the major social sciences as project partners. The consequences of this fact are a topic of on-going research and will be published elsewhere.

References
[1] Aksnes, Dag W. et al. Bergen Campus Biomedical Research Cooperation Analysis using a CERIF-CRIS. In Jeffery, Keith G. and Dvořák, Jan (eds.). E-infrastructures for Research and Innovation: Linking Information Systems to Improve Scientific Knowledge Production: Proceedings of the 11th International Conference on Current Research Information Systems (Prague: June 6-9, 2012), pp. 243-252. 2012.
[2] Chinchilla-Rodriguez, Zaida et al. Blockmodeling of co-authorship networks in library and information science in Argentina: a case study. Scientometrics, Vol. 93, issue 3 (2012), pp. 699-717. 2012.
[3] Gunnarsson, Magnus. International Research Cooperation in the Nordic Countries. A publication from the NORIA-Net "The use of bibliometrics in research policy and evaluation activities". http://www.nordforsk.org/en/publikasjoner/international-research-cooperation-in-thenordic-countries. Downloaded: August 30th 2013.
[4] Internet 1: Republic of Slovenia, Ministry of Higher Education, Science and Technology. http://www.arhiv.mvzt.gov.si/si/delovna_podrocja/znanost_in_tehnologija/centri_odlicnosti_in_kompetencni_centri. Downloaded: August 28th 2013.
[5] Internet 2: Slovenian Current Research Information System (SICRIS). http://www.sicris.si/default.aspx?lang=slv. Downloaded: July 1st 2013.
[6] IMAD (Institute of Macroeconomic Analysis and Development), Government of the Republic of Slovenia. Slovenia's Development Strategy. 2005. http://www.arrs.gov.si/en/agencija/inc/ssd-new.pdf. Downloaded: July 1st 2013.
[7] Katz, Michael L. An Analysis of Cooperative Research and Development. The RAND Journal of Economics, Vol. 17, No. 4, pp. 527-543. 1986.
[8] Katz, J. Sylvan, Martin, Ben R. What is research collaboration? Research Policy, vol. 26, pp. 1-18. 1997.
[9] Kavlie, Dag and Sleeckx, Eric. Report on Monitoring and Evaluation of Competence Research Centres (CRC). Brussels: IWT. 2011.
[10] Kejžar, Nataša, Korenjak-Černe, Simona, Batagelj, Vladimir. Network analysis of works on clustering and classification from Web of Science. In Hermann Locarek-Junge, Claus Weihs (eds.): Classification as a Tool for Research: Proceedings of the 11th IFCS Biennial Conference and 33rd Annual Conference of the Gesellschaft für Klassifikation e.V., Dresden, March 13-18, 2009, pp. 525-536. 2010.
[11] Nooy, Wouter de, Mrvar, Andrej, Batagelj, Vladimir. Exploratory Social Network Analysis with Pajek, Revised and expanded second edition. Cambridge, New York: Cambridge University Press. 2011.
[12] Xu, Hui. A Regional University-Industry Cooperation Research Based on Patent Data Analysis. Asian Social Science, vol. 6, no. 11, pp. 88-94. 2010.


Data Technologies and Simulations

Invited lecture:
Peter Richtarik
School of Mathematics, University of Edinburgh, UK.
Big Data Convex Optimization: Why Parallelizing Like
Crazy and Being Lazy Can be Good
Optimization with big data calls for new approaches and theory. In the big data domain most classical approaches are not applicable, as the computational cost of even a single iteration is often excessive; these methods were developed in the past, when problems of huge size were rare. New methods are hence needed: methods that are simple, gentle with data handling and memory requirements, and scalable. If one's goal is to solve truly huge-scale problems, it is imperative that modern parallel computing architectures, such as multicore processors, graphical processing units and computer clusters, are employed. In this talk I will describe a new approach to big data (convex) optimization which uses what may seem to be an excessive amount of randomization and utilizes what may look like a crazy parallelization scheme. I will explain why this approach is in fact efficient, effective and well suited for big data optimization tasks arising in many fields, including machine learning, social media analysis and engineering. Time permitting, I may comment on other optimization methods suited for big data applications.


Case Study Web Clipping: The Preliminary Work


Jože Bučar, Dejan Zidar, Matej Mertik
Faculty of Information Studies
University of Novo mesto
Sevno 13, 8000 Novo mesto, Slovenia
{joze.bucar, matej.mertik}@fis.unm.si

Abstract: Web clipping covers retrieving and extracting static information from Web sites in order to display the data on a Web-enabled personal digital assistant (PDA). The idea is to conserve the PDA's resources by extracting static data, such as graphics, logos, photos or unnecessary text, only once and storing it on the PDA. This project is conducted in collaboration with the company Nevtron & Company, d.o.o. (www.nevtron.si), whose role involves experimental testing of our methods. The purpose is to develop applications for retrieving information about who, where, when and in what form mentions the company's news, articles, etc., and to determine the response from readers. In this paper we present the framework and preliminary work of the Web clipping project.
Key Words: web clipping, sentiment analysis, text mining, data mining

1 Introduction
Information about an organization, its products and services can occur anywhere on the web. This information may be either true or false; regardless of this fact, it has a relevant influence on public opinion and response. Knowing and understanding people's feelings about products and services has a great impact on the strategic business decisions of an organization.
The project presented in this paper is developed in collaboration with the SME Nevtron & Company d.o.o. (Ljubljana, Slovenia), which runs two successful web pages (www.racunalniske-novice.com and www.student.si). Both are leading in their fields; the first as a leading IT web page, the other mainly dedicated to the student population. The company is interested in obtaining up-to-date and accurate information about itself, its products and services (e.g. news, articles, forums, etc.) on these two pages, mobile sites and social networks, and in evaluating the relevance of the results for the company. Our research aim is to develop applications for obtaining useful information about user data, user place and user content for the company's news, articles and other unstructured data on the pages, including responses and opinions from readers.
The rest of the paper is organized as follows: Section 2 introduces the Methodology and Preliminary Work. Section 3 describes Problems. Finally, the paper ends with the Conclusion in Section 4.


2 Methodology and Preliminary work


2.1 Problem definition
The company produces a great amount of news and articles daily. It is interested in the footprint of its content on other web pages and digital media, and in users' feedback. It wants to know readers' opinions about its news and their semantic content. Automating this digital feedback would benefit the company's future editorial policy and business decisions.
Due to the complexity, we divided the project into two operational phases: (i) automatic web text retrieval and recording to the database; (ii) sentiment analysis.

2.2 Automatic web text retrieval and record to the database


Initially we checked which of the existing search engines is suitable for the problem. We chose Google Custom Search Engine, which allows us to adjust the search parameters for the project. The following functionalities were defined and integrated into the developed web portal:
- Web pages: the user can choose between an integrated set of web pages; a general search or a search of individual web pages can be performed; the user can also define new pages;
- Time frame: the user can search for online content published between selected dates, either by typing the dates or by selecting them in a built-in calendar;
- Keywords: the user can enter a set of keywords for the proposed search;
- Extension interface: possible new extensions of the application.
Within the project we have developed:
- A module that automatically retrieves HTML code and detects and records web texts from the resources into the database (capture HTML); the function identifies, optimizes the selection of and retrieves textual content (a minimal sketch of this step is given below);
- Traceability of posts (an automatic process; the search engine finds results according to the given input and search settings);
- A MySQL database in which we store information (the date of entry into the database, the URL of the parent web page, the URL of the news/article, the title of the news/article, keywords, the importance of the publishing website according to www.alexa.com (TR and GL rank) and a screenshot);
- The name and logo of the portal (Sean);
- A web module providing a tabular overview of search results; recording data into the database individually or by multiple selection of results; editing, archiving and deleting content, etc.;
- A sign-in and registration form with two permission levels (administrator and user rights);
- Access to a test environment, which is available at http://dejan.amadej.si/test.
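A minimal sketch of the retrieval-and-store step from the first item above (our illustration, not the project code: the real module works on Google Custom Search Engine results and a MySQL database, whereas this example fetches a single fixed URL and uses SQLite so that it stays self-contained):

import sqlite3
from datetime import date

import requests
from bs4 import BeautifulSoup

def capture_html(url):
    """Fetch a page and return its title and visible text."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style elements so that only readable text remains.
    for tag in soup(["script", "style"]):
        tag.decompose()
    title = soup.title.string.strip() if soup.title and soup.title.string else url
    text = " ".join(soup.get_text(separator=" ").split())
    return title, text

# Store the captured text, roughly mirroring the fields kept in the project database.
conn = sqlite3.connect("webclipping.db")
conn.execute("""CREATE TABLE IF NOT EXISTS articles
                (entry_date TEXT, url TEXT, title TEXT, body TEXT)""")
url = "https://www.racunalniske-novice.com/"   # one of the monitored portals
title, text = capture_html(url)
conn.execute("INSERT INTO articles VALUES (?, ?, ?, ?)",
             (date.today().isoformat(), url, title, text))
conn.commit()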

2.3 Sentiment analysis


From the corpus of analyzed unstructured data retrieved from the web, sentiment analysis will be performed (an illustrative classification sketch follows this list):
- Filtering, cleaning and preprocessing of the data;
- Annotation of randomly selected textual content and generation of a learning corpus for sentiment, ready for testing classification methods and algorithms;
- Testing and integration of machine learning algorithms for the classification of sentiment;
- Evaluation of the tested classification algorithms (accuracy, precision, recall);
- Branding of the company (identification of sentiment about the company, evaluation of responses to posts and analysis);
- Graphical representation of the results.
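As an illustration of the classification and evaluation steps listed above, a sketch using scikit-learn on a hypothetical toy corpus (not the project's actual pipeline or data) might look as follows:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical annotated learning corpus: texts with positive (1) / negative (0) sentiment.
texts = ["great product, fast service", "terrible support, will not buy again",
         "very happy with the article", "disappointing and slow"] * 25
labels = [1, 0, 1, 0] * 25

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.3,
                                                    random_state=42)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Evaluation criteria named in the project plan: accuracy, precision, recall.
print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall   :", recall_score(y_test, pred))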

3 Problems
During the first phase of the project we encountered many obstacles, which we managed to overcome successfully. However, we had some problems with the automatic acquisition of textual content from HTML code. Web pages vary widely, so it was necessary to develop a specific function for optimal information recognition and retrieval. Many web pages contain errors in the HTML code, and these errors can make obtaining web content difficult; for that reason the user has the option of manual entry and editing. Producing screenshots and storing them in the database took a lot of time and effort: the screenshots show the real content of web pages in high resolution, which consequently takes a lot of space. There were also problems with the integration of search within a time frame (Google Custom Search Engine) and, due to the complexity, with the construction of the MySQL database.

4 Conclusion
Today's problem is that there are vast amounts of information (customer feedback, competitor information, client emails, tweets, press releases, legal filings, product & engineering documents, etc.), which grow rapidly. Despite this saturation of information, humankind is still hungry for the knowledge derived from it. For many reasons, it is practically impossible to employ readers who could extract important knowledge about your customers, competitors, or your company's operations, organization, marketing, sales, engineering, and product & service quality. With the use of appropriate technology and knowledge we can overcome these challenges.
In this article we briefly introduced the preliminary results of the Web clipping project. The project focuses on the automatic interpretation of an SME's traces on web pages, generating users' opinions from the unstructured text of the community on the company's portals and associated social networks.

5 Acknowledgements
Work supported by the Creative Core FISNM-3330-13-500033 'Simulations' project funded by the European Union, the European Regional Development Fund, and a Nevtron & Company d.o.o. research voucher funded by the Ministry of Education, Science and Sport, Slovenia. The operation is carried out within the framework of the Operational Programme for Strengthening Regional Development Potentials for the period 2007-2013, Development Priority 1: Competitiveness and research excellence, Priority Guideline 1.1: Improving the competitive skills and research excellence.




Automatic invoice capture in small and medium-sized


Slovenian enterprises: project overview
Darko Zelenika, Simon Kegljevič, Andrej Dobrovoljc, Janez Povh
Faculty of Information Studies
Laboratory of Data Technologies
Ulica talcev 3, SI-8000 Novo mesto, Slovenia
darko.zelenika@fis.unm.si, simon.kegljevich@gmail.com,
{andrej.dobrovoljc,janez.povh}@fis.unm.si

Bernard Ženko
Jožef Stefan Institute
Department of Knowledge Technologies
Jamova cesta 39, SI-1000 Ljubljana, Slovenia
bernard.zenko@ijs.si

Božo Tomas
University of Mostar
Faculty of Mechanical Engineering and Computing
Matice hrvatske bb, 88000 Mostar, Bosnia and Herzegovina
bozo.tomas@hteronet.ba

Abstract: In today's digital age paper documents are still in use and are a part of every enterprise. Many enterprises have a hard time managing all their paper documents, especially because this requires manual transcription, which is time consuming. There is a variety of products on the market that offer solutions for automatic document capture and data extraction. The experience of Mikrografija d.o.o. suggests that small and medium-sized enterprises are slow in adopting these automatic solutions, the main reason being their high cost. What most of them primarily want is to automate the capture process for their received invoices. The goal of Mikrografija d.o.o. is to provide such a service at an affordable price, with a focus on small and medium-sized enterprises from Slovenia and other countries in the Adriatic region. They asked the Faculty of Information Studies in Novo mesto to address these issues. In this paper we present the latest results from this still ongoing project.
We analyzed and evaluated twelve commercial and open source OCR SDKs and chose the Abbyy SDK as the main development tool that fulfils the project requirements. The Abbyy SDK is a very powerful and easy-to-use document recognition tool; its only disadvantage is that developers cannot influence the main recognition functions and adjust them to their needs.
Key Words: invoice recognition and categorization, OCR SDK analysis, Abbyy FlexiCapture Engine, invoice capture software


1 Introduction
Even though nowadays computers and the internet are a part of practically every enterprise,
and their documents mostly originate on the computer in a digital form, paper documents
are still an integral component of their everyday tasks and are likely to remain so for the
foreseeable future [4]. The Economist in April 2012 proclaimed that instead of worldwide
paper consumption decreasing with the advancement of digital technology and all of the
new gadgets, it has actually increased by half since 1980 [9]. Thus, as paper documents are
not going to be eliminated, there is a need to deal with the flow of digital and paper
documents in an effective and integrated way.
Document categorization is one of the central document management and retrieval
problems. The aim is to recognize documents and group them into a predefined set of
categories. Enterprises usually perform document categorization tasks on a daily basis
mostly relying on human resources. However, manual document categorization is a very
time consuming and expensive task and also a very error prone task. As the quantity of
paper documents in different enterprises is growing daily, there is an increasing need to
process these documents automatically with as little human interaction as possible. This
automatic process saves a lot of time, money and reduces the required human resources.
The most popular solutions for automatic document recognition and categorization on the
market that enterprises can use to organize their documents are Abbyy FlexiCapture
(Abbyy) and eFlow (TIS) [1, 8]. These solutions use OCR (Optical Character Recognition)
technology to transform paper documents to digital documents and then use the results
obtained by OCR to perform document recognition, categorization and data extraction.
The purpose of this paper is to introduce an invoice recognition project on which the authors are currently working and to report its latest results. The main goal is to analyze the available open source and commercial OCR SDKs (Software Development Kits) and choose the one which best suits the project requirements for developing invoice recognition software. The analysis and evaluation of the available OCR SDKs, and a discussion of the advantages and disadvantages of the Abbyy SDK from a developer's point of view, are the main contributions of this paper. The expected overall result of the project is that the Adriatic market will get a custom-designed invoice recognition product supporting the languages of the Adriatic region.
The rest of the paper is organized as follows. In section two, the invoice recognition project
is introduced. Section three describes the analysis of SDKs and presents the initial results of
the project. In section four, the chosen SDK is presented. Finally, section five concludes the
paper.

2 Invoice recognition project


Mikrografija d.o.o. ("the contractor" in the rest of the paper) is a Slovenian company, which
offers modern solutions for electronic document management and electronic archiving [6].
Their general activity is to capture, process and store paper documents in an electronic
form.
The problem that the contractor faces is that small and medium-sized enterprises have a hard time adopting automatic data capture software solutions which would convert their paper documents into electronic ones. The main obstacle is the price of such software. However, what most of these enterprises primarily want is to automate the capture process for their received invoices. They therefore do not need complex and expensive data capture software tools like TIS and Abbyy, because most functions that these solutions offer are not needed in small and medium-sized enterprises. Our contractor wants to offer a simple and easy-to-use invoice recognition software solution tailored to the enterprises of the Adriatic region, which is reasonably priced and offers training, recognition and extraction of data from invoices (e.g., invoice number, total amount, date, etc.) [10]. Other functionalities that our contractor wants to offer are:

- Recognition of special characters (e.g. č, š, ž) which are present in the languages of the Adriatic region.
- An output format of the extracted data compatible with most of the business applications commonly used in the Adriatic region.
- Simplified interaction with commonly used business applications.
- Easy configuration and preparation of invoices for the recognition process.
- A solution that can be used on the web and on mobile devices.

Research and development of the invoice recognition software was offered as a project to the Laboratory of Data Technologies at the Faculty of Information Studies in Novo mesto, Slovenia [3]. Based on the software requirements listed above, the project is composed of the following phases. Researchers from the Faculty of Information Studies need to:
- Analyze all the available open source and commercial OCR SDK solutions that enable development of the required software.
- Choose the best solution based on the software requirements.
- Investigate the Slovenian market regarding the use of business applications (ERP, CRM).
- Investigate the most frequently used metadata on received invoices.
- Investigate the cost-effectiveness of the proposed product (customers' expectations).
- Develop the application prototype.
This project is still active and in the next two sections we describe the work performed so far and report the latest results.

3 Analysis of OCR SDKs


An analysis of commercial and open source OCR SDKs [5] was conducted prior to the selection of the Abbyy SDK as the main development tool. TIS was not included in the analysis because it does not offer an SDK for document recognition. Based on the specific needs of the contractor, multiple solutions were analyzed in search of the most appropriate OCR SDK. Ten commercial and two open source solutions were analyzed, as shown in Table 3. These solutions were rated on sixteen criteria. The criteria were set according to the contractor's input, the best functionalities of comparable solutions and the supporting processes of OCR technology. The importance of each criterion was chosen based on the question: which functionalities are needed for a good OCR system? Table 1 shows the criteria and their weights, ranging from -10 to +7 points. The functionalities of the twelve OCR SDKs according to the chosen criteria are shown in Table 2. The 'license price' criterion indicates the price class to which a particular solution belongs; a higher class number indicates a more expensive license (for open source solutions this criterion was not applied). The next type of criterion is numerical, indicating the quantity of a particular criterion (e.g. the number of supported input file formats). The last type is binary, indicating whether the solution meets the condition of a criterion or not (e.g. whether it has .NET framework support).
With this OCR SDK analysis methodology we reached the goal (the optimal OCR SDK solution) and presented the results in an accessible way, which is why this methodology was chosen for the analysis. Similar research, focused on the accuracy of OCR systems [7], uses a similar methodology.
The results of the analysis are shown in Table 3. The evaluation of the OCR SDKs was based on Eq. 1, where ci indicates the criterion weight from Table 1, ri the result of an OCR SDK on a particular criterion from Table 2, and s the final score of a particular OCR SDK in Table 3. The binary values ✓ and X from Table 2 were transformed to the numerical values 1 and 0, respectively, and then used in Eq. 1.

s = c1*r1 + c2*r2 + ... + c16*r16        (1)
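As an illustration of Eq. 1, the sketch below applies the criterion weights of Table 1 to one ratings vector. The numeric ratings used here are the entries that remain visible for the Abbyy column in the extracted layout of Table 2, with the binary criteria assumed to be ✓ (i.e. 1); under that assumption the sum reproduces the 59.9 points reported for Abbyy in Table 3.

# Criterion weights c1..c16 from Table 1 (per-language / per-format weights are applied
# to the corresponding counts in the ratings vector).
weights = [-10, 7, 0.05, 5, 3, 5, 0.5, 4, 5, 1, 7, 7, 7, 1, 7, 2]

def score(ratings):
    """Eq. 1: s = c1*r1 + c2*r2 + ... + c16*r16 (checkmarks counted as 1, X as 0)."""
    return sum(c * r for c, r in zip(weights, ratings))

# Ratings vector: price class 3, Slovenian support, 198 languages, 8 input formats,
# 11 output formats, 6 programming languages, checkmarks (1) for the remaining criteria.
ratings = [3, 1, 198, 1, 1, 1, 8, 1, 1, 11, 1, 1, 1, 6, 1, 1]
print(round(score(ratings), 2))   # 59.9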

Out of the twelve evaluated solutions, eight (1, 2, 3, 4, 5, 7, 8, 9 in Table 3) met the requirements of the contractor, and the three best solutions (1, 2, 3 in Table 3) were picked for further comparison. The Abbyy OCR SDK had the highest rating of all analyzed solutions. Based on our research, open source OCR SDK solutions are comparable to commercial ones, owing to their minimal implementation costs and satisfactory functionality. Lower-graded solutions did not provide the expected functionalities or were too expensive. The Abbyy OCR SDK is one of the most well-known solutions and offers a wide range of functionality at a high cost. The Nicomsoft OCR SDK offers a satisfactory range of functionality for a lower price and without license fees for each end user. The third-rated solution, the open source Tesseract, is free to use but lacks a few basic functionalities that other solutions offer; the most important missing features are picture preprocessing, ICR and OMR support.
Table 1: Rating criteria (criterion: rating/weight)
1. License price: -10
2. Support of specific characters in Slovenian language: +7
3. Number of different languages supported in OCR: +0.05 (per language)
4. Dictionary and optimization: +5
5. Scanner support: +3
6. Picture preprocessing: +5
7. Input file formats: +0.5 (per format)
8. Font recognition and output: +4
9. Text formatting output: +5
10. Output file formats: +1 (per format)
11. ICR (Intelligent Character Recognition) support: +7
12. OMR (Optical Mark Recognition) support: +7
13. .NET framework support: +7
14. Number of supported programming languages: +1 (per language)
15. Source code documentation and examples: +7
16. Access to internal OCR data used in voting: +2
Source: own research

Table 2: SDKs functionalities
(The per-criterion cell entries of this table could not be recovered from the source layout. The table rates the twelve evaluated OCR SDKs - ABBYY, ExpertVision, OmniPage, Lead Tools, Transym, Dynamsoft, Aquaforest, Nicomsoft, ImageGear, DotImage, Puma.net and Tesseract - against the sixteen criteria of Table 1, using ✓/X marks and numeric counts.)
Source: own research

Table 3: Results of OCR SDK analysis

#    Product                                             Points
1.   Abbyy OCR SDK                                       59.9
2.   Nicomsoft OCR SDK                                   45.3
3.   Tesseract (Ray Smith) - open source                 42.4
4.   Dynamsoft Dynamic .NET Twain                        42.5
5.   Puma.NET (Maxim Saplin) - open source               38.4
6.   LeadTools Recognition Imaging Developer Toolkit     38 (no SI1 language support)
7.   Atalasoft DotImage                                  34.1
8.   Nuance Communications OmniPage                      31.7
9.   Accusoft ImageGear                                  29
10.  Transym TOCR 4.0                                    20.6 (no SI1 language support)
11.  ExpertVision OpenRTK 7.0                            6.7 (no SI1 language support)
12.  Aquaforest OCR SDK                                  4.5 (no SI1 language support)

1 Slovenian language support (criterion number two from Table 1).
Source: own research

4 Abbyy FlexiCapture Engine SDK


Based on the comparison presented in the previous section, the Abbyy FlexiCapture Engine 10 SDK [2] (FCEngine) was selected for use in our project. FCEngine is a data capture and document processing SDK, which enables developers to build automatic document recognition, categorization and data extraction functionalities into their applications.
To recognize a document with FCEngine, document training needs to be performed first. Document training can be performed using the FCEngine training API (Application Programming Interface) or FlexiLayout Studio, which allow users to create the logic for document recognition and data extraction tasks. This logic can be archived in the form of a file called a Document Definition.
The FCEngine training API is a part of the FCEngine SDK, while FlexiLayout Studio is a separate application. The training API is intended for designing simple Document Definitions with the ability to recognize invoices based on keywords and barcodes, where the extracted data take the form of simple fields such as invoice number, date, etc. The training API also cannot handle multi-page invoices, while FlexiLayout Studio can work with multi-page invoices and design complex Document Definitions, which include more options for invoice recognition and data extraction tasks. The data extracted by FlexiLayout Studio can also be saved in a table format.
A Document Definition describes how to identify an invoice, what data need to be extracted and how to find these data. A Document Definition therefore represents a simple document classifier, which has to have unique features that are used to recognize invoices. After the invoice recognition process is performed, the trained data fields from the Document Definitions are used to extract data from the invoices. Data can be extracted into one of the following formats: xls, dbf, csv, txt and xml.
FCEngine has very detailed documentation and a set of examples, which enables developers to quickly understand and implement its functionalities in their applications. It has built-in functions which make the document training and recognition process easy. The only disadvantage is that developers cannot influence any of the main functions (e.g., training, recognition, etc.), adjust them to their own requirements or implement their own logic in them.

5 Conclusion
Automatic document categorization software can be very helpful and time-saving for enterprises. Small and medium-sized enterprises mostly deal with invoices and often avoid using such software because of its complexity and price. The authors of this paper are part of a project whose aim is to develop affordable invoice recognition software for small and medium-sized enterprises from the Adriatic region. Based on the analysis of twelve open source and commercial OCR SDKs, the highest rating was given to the Abbyy SDK, which was also chosen as the main development tool for our project. The project is still active and our further work includes finishing the application proposed in the project.

6 Acknowledgements
The presented work was supported by Creative Core FISNM-3330-13-500033 'Simulations'
project funded by the European Union, The European Regional Development Fund. The
operation is carried out within the framework of the Operational Programme for
Strengthening Regional Development Potentials for the period 2007-2013, Development
Priority 1: Competitiveness and research excellence, Priority Guideline 1.1: Improving the
competitive skills and research excellence.


7 References
[1] Abbyy. Abbyy FlexiCapture, http://www.abbyy.com/data_capture_software/, downloaded: October 2nd 2013.
[2] Abbyy. Abbyy FlexiCapture Engine, http://www.abbyy.com/flexicapture_engine/, downloaded: October 2nd 2013.
[3] DataLab, Laboratory of Data Technologies Projects, http://datalab.fis.unm.si/joomla/index.php/projects, downloaded: October 2nd 2013.
[4] Jervis, M.; Masoodian, M. Evaluation of an Integrated Paper and Digital Document Management System. In Proceedings of the 13th IFIP TC 13 International Conference on Human-Computer Interaction (INTERACT 2011), pages 100-116, Lisbon, Portugal, 2011.
[5] Kegljevič, S. Analysis and comparison of existing OCR libraries. Bachelor's thesis, Faculty of Information Studies, Novo mesto, Slovenia, 2013.
[6] Mikrografija d.o.o., http://www.mikrografija.si/, downloaded: October 2nd 2013.
[7] Stephen V. R.; Frank R. J.; Thomas A. N. The Fifth Annual Test of OCR Accuracy. Information Science Research Institute, Las Vegas, USA, 1996.
[8] The eFlow platform, http://www.topimagesystems.com/solutions/overview/eflowoverview, downloaded: October 2nd 2013.
[9] The Economist Online. How much paper does a person use on average in a year?, http://www.economist.com/blogs/graphicdetail/2012/04/daily-chart-0, downloaded: June 27th 2013.
[10] Zelenika, D.; Povh, J.; Dobrovoljc, A. Document Categorization Based On OCR Technology: An Overview. In Proceedings of the 7th European Computing Conference, pages 409-414, Dubrovnik, Croatia, 2013.


Handwritten signature authentication using statistical


measures of basic on-line signature characteristics
Miroslav Bača, Tomislav Fotak, Petra Grd
Faculty of Organization and Informatics
University of Zagreb
Pavlinska 2, 42000 Varaždin, Croatia
{miroslav.baca, tomislav.fotak, petra.grd}@foi.hr

Abstract: The personal signature has become an important part of human identity


verification. It is therefore important to protect one's signature and to be able to detect signature forgery. We present a statistics-based approach to signature verification with user-oriented signature feature weights. Only basic on-line signature features are used in the verification process. Depending on the authentication threshold, our method achieves a satisfactory FAR for unskilled and skilled forgery, but further research is needed to improve the user experience (reduce the FRR).
Key Words: handwritten signature authentication, signature verification, on-line signature, statistical measures, feature weights

1 Introduction
A handwritten signature is defined as a person's first and last name written in his or her own handwriting [1]. It is used almost everywhere, on a daily basis, and has become a common authentication method. This is not surprising, since its usage as a means of giving consent to something that needs to be done dates from ancient times.
The personal signature has become an important part of human identity verification. It is therefore important to protect one's signature and to be able to detect signature forgery. This goal can be achieved through signature biometrics. Whether off-line or on-line signature features are used, the applied biometric method must provide valid information about signature correctness with regard to the claimed identity. The handwritten signature is a behavioural biometric characteristic and almost anything can affect one's signature, so the task of proving its correctness is not always as easy as it seems.
To avoid static image analysis and to be able to operate on live handwritten signatures, researchers in this field have started analyzing dynamic signature features, such as pen pressure, signature velocity, etc. Those features are usually obtained through a special electronic device used to capture the signature. In our previous work [2] we have already extracted a basic set of mainly global signature features that are used to register a user, i.e. to store their signature vector in the local database and use it in the verification process.
In the rest of the paper we describe the basics of our simple authentication method, which uses the stored vector to calculate person-specific weights of each signature feature and uses them in the signature-correctness decision process.


2 Previous work
We have already provided a set of signature features that are only the beginning in the process of determining the ideal feature subset to be used in personal authentication. Those features are [2]: number of strokes in the signature, number of pen-ups, signature aspect ratio, signature length, signing time, pen time-down ratio, pen time-up ratio, signature speed, velocity along the x-axis, velocity along the y-axis, average pen pressure and strongest pressure moment. One may notice that these are mainly global handwritten signature features. This does not mean that we disregard local features; we rather give a basic set of features that can be used to compute others and can also be applied at the local level, e.g. we can determine all these features for each stroke.
The extracted features are similar to those presented in [4] and [5]. To analyze them we have chosen five statistical measures: mean, standard deviation, median, minimum value and maximum value, calculated from the set of all signatures retrieved during the registration process. These measures are calculated for each feature separately and stored as a part of the signature vector. Note that there is only one vector per person, containing all statistical measures of all extracted features. We thus have all the prerequisites for the authentication process.
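A minimal numpy sketch of how such a signature vector can be assembled (our illustration with random placeholder feature values; the actual feature extraction is outside the scope of this example):

import numpy as np

N_FEATURES = 12   # strokes, pen-ups, aspect ratio, length, signing time, ...

# Hypothetical registration set: 15 signatures, each reduced to 12 feature values.
rng = np.random.default_rng(0)
signatures = rng.random((15, N_FEATURES))

# Five statistical measures per feature -> 60 statistical measures per person.
vector = {
    "mean":   signatures.mean(axis=0),
    "stdev":  signatures.std(axis=0, ddof=1),
    "median": np.median(signatures, axis=0),
    "min":    signatures.min(axis=0),
    "max":    signatures.max(axis=0),
}
assert sum(v.size for v in vector.values()) == 60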
Before we proceed with the authentication process, one must be aware that on-line signature authentication is a well-known research problem and that there are several important authentication methods in the literature. The newest state of the art defines the following approaches to feature extraction and on-line signature verification [3]: Dynamic Time Warping (elastic matching), Hidden Markov Models (HMM), Gaussian Mixture Models (GMM), structural approaches, statistics-based approaches, Support Vector Machines, neural networks, transform-domain approaches, intelligent approaches, protected approaches and semi-online approaches.

3 Personal signature authentication


Our authentication model uses a feature-based statistical approach to achieve successful personal signature verification. The process starts with signature acquisition via a digitizing tablet, continues with dynamic feature extraction and feature weight calculation, and ends in the verification decision-making module. This process, including user registration, is shown in Fig. 1.
Each registered user has his or her own signature vector stored in the database. The vector contains five statistical measures of all features, meaning that it contains 60 statistical measures overall (12 features, 5 measures each). Using these measures we are able to identify the most and the least constant features, i.e. to calculate feature weights for the authentication process.


Figure 1. Personal signature registration and verification process

3.1 Feature weights calculation


Every user had to provide 10 to 15 handwritten signatures during the registration process. The goal of this process is to find which of the twelve features is the most constant, i.e. which feature has the same or a similar value in every signature, and which one is the least constant. Accordingly, the features gain their weights; the sum of all weights must be 1.
Calculating the feature weights is basically a four-step process. In the first step we calculate the coefficient of variation from the mean value, i.e. the coefficient of variation for each feature.
Let ADi be the coefficient of variation of the i-th feature, STDEVi the standard deviation of the i-th feature and MEANi the mean of the i-th feature; then ADi is the ratio of the standard deviation to the mean:

ADi = STDEVi / MEANi        (1)


The smaller the coefficient of variation, the more constant the feature. A special case occurs when it equals zero: in the authentication process, if the user's signature does not contain the specified value, the user is rejected instantly; if it does contain the specified value, the coefficient of variation is calculated by dividing the mean by itself, leading to a small weight, because that feature is expected to have the specified value.
The second step of the process is the calculation of the relative range of values for each signature feature. Since the coefficient of variation is not the best measure for features with a large standard deviation, we introduce another measure that deals properly with these features. It is calculated as:

(2)

where RRVi is the relative range of values of the i-th feature, MAXi the maximum value of the i-th feature and MINi the minimum value of the i-th feature. This is also a measure of feature stability: the more stable a feature is, the more weight it should gain.
The next step in the process is normalizing the previously derived measures to the interval [0, 1]. This is achieved by taking the reciprocal of each value (so that all values strive towards a larger number instead of zero) and calculating the Euclidean constant for each derived measure. The normalization is done by dividing each value from a measure's set by the corresponding Euclidean constant, followed by dividing each value by the sum of all values for the given measure. Since the derived measures for each signature feature are now in the interval [0, 1], we can compare the measures of each feature and calculate the final feature weights.
The last step of the process is calculating the final weights of the signature features. This is done by taking the mean of the normalized coefficient of variation and the normalized relative range of values:

w(k) = (N_AD(k) + N_RRV(k)) / 2        (3)

where w(k) is the weight of the k-th signature feature, N_AD(k) the normalized coefficient of variation of the k-th signature feature and N_RRV(k) the normalized relative range of values of the k-th signature feature. The obtained weights are used in the authentication process.

3.2 Authentication process


Each signature feature can obtain a certain number of points, depending on its weight. The maximum it can obtain is the value of its weight, and this is possible only if the given signature's feature value lies within the defined maximum deviation from the mean or median. If the value is closer to the maximum or minimum value from the vector, the feature can obtain 80 % or 90 % of the maximum points. These margins were determined empirically, leaving the user the possibility to gain some points even if the presented signature's feature value is closer to the registered extreme values. How many points each statistical measure of one signature feature can obtain is described by the following algorithm:

if ((AVG < MEAN and MEAN < MEDIAN) or (AVG > MEAN and MEAN > MEDIAN))
    feature100percent = MEDIAN
else
    feature100percent = MEAN
if (ABS(feature100percent - MIN) < ABS(feature100percent - MAX))
    feature90percent = MIN
    feature80percent = MAX
else
    feature90percent = MAX
    feature80percent = MIN

The AVG in the algorithm is the mean of the minimum value, the median and the maximum value. ABS denotes the absolute value.
Every feature of the signature acquired for verification can fall within the scope of a defined deviation. The allowed deviations from the statistical measures are calculated from AD for each feature and depend on whether the feature is closer to the minimum or the maximum value. They are shown in Table 1 for the mean and median, i.e. for the potential 100 % feature.
Feature AD    Feature closer to    Allowed deviation (-)    Allowed deviation (+)
< 0.15        Minimum              1 * STDEVi               0.5 * STDEVi
< 0.15        Maximum              0.5 * STDEVi             1 * STDEVi
0.15 - 0.3    Minimum              0.5 * STDEVi             0.25 * STDEVi
0.15 - 0.3    Maximum              0.25 * STDEVi            0.5 * STDEVi
0.3 - 1       Minimum              0.25 * STDEVi            0.125 * STDEVi
0.3 - 1       Maximum              0.125 * STDEVi           0.25 * STDEVi

Table 1. Allowed deviations from mean and median


A similar principle is applied to calculating the allowed deviations for the 80 % and 90 % features, where one half or one quarter of the standard deviation is allowed. Authentication is done by comparing the acquired signature features with the statistical measures and allowed deviations, awarding them 100 %, 90 %, 80 % or 0 % of the feature weight. The points for all features are summed and, if they exceed a predefined threshold, the user is authenticated; otherwise the user is rejected.
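A simplified sketch of this decision step (our reading of the described procedure; the helper values, margins and threshold in the example are illustrative assumptions, and the asymmetric deviations of Table 1 are collapsed into symmetric ones for brevity):

def feature_points(value, weight, f100, f90, f80, dev100, dev90, dev80):
    """Award 100/90/80/0 % of a feature's weight, depending on how close the
    acquired value is to the registered statistical measures."""
    if abs(value - f100) <= dev100:    # within the allowed deviation from mean/median
        return weight
    if abs(value - f90) <= dev90:      # close to the nearer registered extreme
        return 0.9 * weight
    if abs(value - f80) <= dev80:      # close to the farther registered extreme
        return 0.8 * weight
    return 0.0

def authenticate(acquired, profile, threshold):
    """Sum the points over all features and compare with the predefined threshold."""
    total = sum(feature_points(v, *profile[name]) for name, v in acquired.items())
    return total >= threshold

# Hypothetical profile entry for a single feature ("signing_time"):
# (weight, 100%-target, 90%-target, 80%-target, dev100, dev90, dev80)
profile = {"signing_time": (0.12, 2.4, 1.9, 3.1, 0.3, 0.15, 0.15)}
print(authenticate({"signing_time": 2.5}, profile, threshold=0.1))   # True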

4 Results and discussion


To authenticate a user, a threshold must be predefined. We tested our system with several thresholds to obtain its performance. The system was evaluated with three measures: the false acceptance rate with unskilled forgeries (FAR-N), the false acceptance rate with skilled forgeries (FAR-S) and the false rejection rate (FRR). The results are presented in Table 2.
Threshold    FAR-N     FAR-S      FRR
0.57         0.42 %    1.47 %     42.4 %
0.5          1.33 %    4.98 %     27.82 %
0.45         4.83 %    9.42 %     20.45 %
0.42         6.57 %    11.32 %    17.32 %
0.38         9.66 %    12.24 %    12.31 %

Table 2. System performance for the proposed authentication method


At first sight the results of this system might seem unusable in commercial systems, but one must be aware that this authentication method is actually very rigorous and, at higher thresholds, achieves a reliable false acceptance rate. The method is very simple and uses only basic on-line handwritten signature features. To achieve better results, a larger and more descriptive set of features should probably be extracted.

5 Conclusion
The handwritten signature and its biometric application are among the biometric characteristics most commonly studied by researchers in the academic community. We presented a statistics-based approach to signature verification with user-oriented signature feature weights. Our approach uses only basic on-line signature features and, depending on the defined threshold, achieves a satisfactory false acceptance rate, but more research is needed on feature extraction and on making the authentication method more robust to environmental factors.

6 References
[1] Anić, V. et al. Hrvatski enciklopedijski rječnik. In Croatian. Novi Liber, Zagreb, Croatia, 2002.
[2] Bača, M., Koruga, P., Fotak, T. Basic on-line handwritten signature features for personal biometric authentication. In Proceedings of the 34th International Convention MIPRO 2011, pages 116-121, Opatija, Croatia, 2011.
[3] El-Henawy, I. M., Rashad, M. Z., Nomir, O., Ahmed, K. Online Signature Verification: State of the art. International Journal of Computers & Technology, 4(2):664-678, 2013.
[4] Gupta, G. K., Joyce, R. C. A Study of Some Global Features in On-Line Handwritten Signature. The International Journal of Automated Identification Technology (IJAIT), 1(2), 2009.
[5] Lee, L. L., Berger, T., Aviczer, E. Reliable On-Line Human Signature Verification Systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6):643-647, 1996.


Information Society and Simulations


An approach to identify organizational security


vulnerabilities
Andrej Dobrovoljc
Faculty of Information Studies
University of Novo mesto
Sevno 13, 8000 Novo mesto, Slovenia
andrej.dobrovoljc@fis.unm.si
Abstract. The fundamental goal of an information security management system (ISMS) is the continuous assurance of information system (IS) operability and safety. In order to avoid risk, we need to identify and remove all sorts of vulnerabilities within it. Due to technological novelties and permanent changes in the user environment, we are faced with new vulnerabilities daily. Sustainable security and operability can be achieved only by proactive risk anticipation. The acceptance rate of the established ISMS is reflected in its organizational vulnerabilities and strengths. In our study, we propose the UTAUT acceptance model in order to measure the usage rate of an ISMS. In addition, such a model can be improved by using security-domain-specific factors which influence the acceptance of the security system. The results will allow us to proactively estimate organizational security risks.
Keywords. vulnerability, ISMS, risk, UTAUT, acceptance model, security culture

Introduction

Organizations deploy information security management systems (ISMS) in accordance with the requirements and recommendations of various safety standards, one of the most notable being ISO 27001. The requirements can be met by technical or organizational solutions, usually a balanced combination of both. Relying solely on technical solutions is not enough; we must always pay attention to the human factor as well.
Unfortunately, an acquired security certificate does not tell us enough about the real information system (IS) security. Organizations with the same security certificate can differ heavily in their security level, and even organizations without a certificate can offer higher security than certified ones if they follow proper recommendations. In this respect we pose an important question: what impact does the obtained security certificate have on actual IS security? Does it have any impact on the organization's culture?
Organizations protect their information assets according to their needs. This is reflected in a selection of suitable technologies and procedures. The implemented security components influence the security behaviour of employees and finally evolve into a security culture [1]. In order to achieve a high IS security, the employees have to strictly obey the security policy. Apparently, the real IS security depends on the security culture that prevails in the company.

IS is a combination of hardware, software, infrastructure, data, procedures and trained


personnel. With each of the components, we can bind specific risks that exist as a result
of existence of their vulnerabilities. We talk about:
- hardware vulnerabilities (natural catastrophes, physical deterioration),
- vulnerabilities of software products (unplanned usage like XSS, Buffer Overflow),
- data vulnerability (obsolete data formats),
- organizational vulnerabilities (mistakes and deficient procedures, user attitudes),
- human vulnerabilities (social engineering) [2].
Accordingly, several models for anticipation of vulnerabilities have been developed
with regard to specific elements:
- hardware: MTBF (Mean Time between Failures), MTTF (Mean Time to Failure)
- software: VDM (Vulnerability Discovery Models) [3]
- human: NLP (Neuro-linguistic Programming), TA (transactional analysis) [2]
Operating procedures are the elements of an IS that connect the other components into a useful service. In organizations with an established ISMS, security procedures become an integral part of these operating procedures. It is therefore very important to also anticipate organizational vulnerabilities. They appear as a consequence of users not accepting the security procedures. In our study, we develop a model for estimating organizational vulnerability. Such a model will be a useful tool for proactive risk anticipation.
The remainder of this paper is structured as follows. Section two briefly describes some existing technology acceptance models (TAM). In sections three and four, the research methodology is described. In the final section, a paper summary is given with a short description of further work.

Acceptance models

In the field of IT/IS, several models have been developed to assess the acceptance and use of technology. Davis [4] proposed the technology acceptance model (TAM) with the following key concepts:
- perceived ease of use (PEOU): the degree to which a person believes that the system is easy to use,
- perceived usefulness (PU): the degree to which a person believes that the system will be useful in performing his or her job,
- behavioural intention (BI): the degree to which a person has formulated conscious plans to perform or not perform some specified future behaviour.
TAM is a flexible model. It has been extended several times and adapted to specific domains. Among other important acceptance models, we should also mention [5]: the theory of reasoned action (TRA), the motivational model (MM), the theory of planned behaviour (TPB) and the innovation diffusion theory (IDT).
Venkatesh et al. [6] reviewed the existing acceptance models and formulated the unified theory of acceptance and use of technology (UTAUT). This new model outperforms the other existing acceptance models in predicting user behaviour. The UTAUT model consists of the following constructs (causal relationships are depicted in Fig. 1):
- performance expectancy: the degree to which an individual believes that using the system will help him or her to attain gains in job performance,
- effort expectancy: the degree of ease associated with the use of the system,
- social influence: the degree to which an individual perceives that important others believe he or she should use the system,
- facilitating conditions: the degree to which an individual believes that an organizational and technical infrastructure exists to support use of the system.

Figure 1: UTAUT model


We can find many studies using the UTAUT model in the areas of e-learning systems, mobile services, process automation and others [5]. To the best of our knowledge, no similar studies have been done in the security domain. Regular usage of the ISMS is extremely important for IS security. Our idea is to verify UTAUT on ISMS acceptance and then to extend it with factors that are specific to the security domain. The expected final result is an acceptance model that will help us improve ISMS usage and consequently decrease the level of organizational security risk.

Model verification method

First, we will verify the suitability of the UTAUT model for ISMS acceptance by employees. Empirical data will be collected by means of an on-line questionnaire. The survey will be carried out by inviting employees in different organizations. We will also invite some organizations with an established ISMS but without the corresponding certificate. ISMS acceptance will be checked using the concepts of the basic UTAUT model. We will use a 7-point Likert scale; respondents will make their selections according to the degree of agreement with the given assertion. The research will be carried out in three steps:

- Pre-test: a content test of the questionnaire in order to improve the understandability of the questions (5 persons).
- Pilot test: the pilot implementation of the questionnaire, where we will check the reliability of the questionnaire (empirical validation including 30 persons). In this step, we need to ensure the accuracy of the measuring instrument.
- Implementation of the questionnaire survey: the acquisition of empirical data through questionnaires (300 persons planned).
Characteristics of the respondents will be presented in the form of descriptive statistics. Empirical data on their agreement with the individual assertions will be used to assess the impact of the individual constructs of the UTAUT model. Structural equation modelling (SEM) will be used to test the fit of the model with the empirical data. The analysis will answer the question of which factors have a significant impact on the acceptance and use of the ISMS.
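As a rough illustration of how this analysis step could be carried out, the following Python sketch fits the structural part of the basic UTAUT model with the semopy package; the file name, the column names and the choice of semopy are assumptions made purely for the example and do not prescribe the tools of the study.

import pandas as pd
import semopy

# Hypothetical survey data: one averaged score per UTAUT construct per respondent.
# Columns: PE (performance expectancy), EE (effort expectancy), SI (social influence),
# FC (facilitating conditions), BI (behavioural intention), USE (actual ISMS use).
data = pd.read_csv("utaut_isms_survey.csv")

# Structural model: intention explained by PE, EE and SI; use explained by BI and FC.
description = """
BI ~ PE + EE + SI
USE ~ BI + FC
"""

model = semopy.Model(description)
model.fit(data)
print(model.inspect())           # path coefficients with standard errors and p-values
print(semopy.calc_stats(model))  # global fit indices such as CFI and RMSEA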

Specific factors of security domain

When we study users' behaviour and their attitude toward the usage of a system, the human factor is involved. Many studies in the security area have already identified and included such factors in their models [1, 7, 8]. A well-known example is the Swiss Cheese Model (SCM) of accident causation, which is used for risk analyses [7]. The basic idea behind it is to illustrate the human system (the hierarchy of the organization) with four layers of cheese. These defensive layers contain holes, which represent hidden weaknesses (vulnerabilities). When the holes in all layers align, the organisation can suffer a loss. We have to determine what these holes are and how big they are. The top three layers of the SCM model are similar to some concepts of the UTAUT model:
- Top management (setting strategy, decision-making): social influence,
- Middle management (implementation of the strategy): facilitating conditions,
- Operational level (performing procedures): actual system use,
- Layer of actual risk events: no similarity to any UTAUT concept.

In the top three layers, latent failures can happen due to decisions made by managers and users on all levels. These decisions produce special conditions, which result in pressure, inadequate equipment, lack of technology, understaffing, inexperience, etc. Decisions are a result of the organizational culture. The operational level depends greatly on the invisible level of the security culture. At this level, the subconscious beliefs of employees about the system and their assumptions about it are especially important. Higher organizational levels have a greater impact on the visible part of the culture [9]. But all these factors together have an impact on the actual use of the ISMS. In our study, we will extend the existing UTAUT model with factors from the security area and empirically test it. One of the first steps will be the identification of human factors in existing security and organizational models.
Based on the identified key factors and the causal relationships among them, we can design a methodology for proactive risk assessment. With appropriate activities, we will be able to reduce vulnerabilities on individual organizational layers and, consequently, improve the safety of the IS.

Conclusion and further work

We have presented a concept for measuring organizational vulnerabilities in organizations with an established ISMS. We propose using the UTAUT model to estimate the acceptance of the system. The greater the regular use of the ISMS, the fewer vulnerabilities exist in the IS and the better the security.
In further research, we plan to realize the proposed model with factors that are specific to the security domain. The new model will be used to estimate organizational vulnerabilities in organizations where the ISMS has already been established. The model will help the IS owner improve security by managing the factors with the biggest impact on user behaviour and ISMS usage.

Acknowledgments

Work supported by the Creative Core FISNM-3330-13-500033 Simulations project funded by the European Union, the European Regional Development Fund. The operation is carried out within the framework of the Operational Programme for Strengthening Regional Development Potentials for the period 2007-2013, Development Priority 1: Competitiveness and research excellence, Priority Guideline 1.1: Improving the competitive skills and research excellence.

References
[1] Da Veiga, A.; Eloff, J.H.P. A framework and assessment instrument for information security culture, Computers & Security, 29:196-207, 2010.
[2] Mann, I. Hacking the Human, Gower Publishing, Burlington, USA, 2009.
[3] Alhazmi, O.H.; Malaiya, Y.K.; Ray, I. Measuring, analyzing and predicting security vulnerabilities in software systems, Computers & Security, 26(3):219-228, 2007.
[4] Davis, F.D. Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology, MIS Quarterly, 13(3):319-340, 1989.
[5] Oye, N.D.; A.Iahad, N.; Ab.Rahim, N. The history of UTAUT model and its impact on ICT acceptance and usage by academicians, Education and Information Technologies, 2012.
[6] Venkatesh, V.; Morris, M.G.; Davis, G.B.; Davis, F.D. User acceptance of information technology: Toward a unified view, MIS Quarterly, 27(3):425-478, 2003.
[7] Reason, J. Human Error, Cambridge University Press, 1990.
[8] Kolkowska, E. Security subcultures in an organization - exploring value conflicts, 19th European Conference on Information Systems (ECIS 2011), Helsinki, Finland, 2011.
[9] Schein, E.H. The Corporate Culture Survival Guide, Jossey-Bass, San Francisco, USA, 1999.


A proposal for a web based educational data mining and visualization system
Igor Jugo, Božidar Kovačić, Vanja Slavuj
Department of Informatics
University of Rijeka
Radmile Matejčić 2, 51000 Rijeka, Croatia
{ijugo, bkovacic, vslavuj}@inf.uniri.hr

Abstract: Educational data mining (EDM) is a relatively new appellation for an area of research in which classic data mining methods (originating from the areas of machine learning and artificial intelligence), along with some new ones, are used on data collected from e-learning systems to clarify e-learning problems. The results of these methods are often used as a basis for automatic adaptation of e-learning systems to the needs, expectations and habits of their users. Such a system has been under development at our institution in the last few years and has been used in a number of courses. It is based on a hierarchical structure of knowledge accompanied by learning materials and tests for each knowledge unit. In this paper we present our model of a web based EDM system. The emphasis of the system will be on a modern, responsive user interface that will facilitate the usage of data mining (DM) methods by non-experts (teachers) and offer rich data visualizations to help them get a better insight into the learning process from DM outputs.
Key Words: educational data mining, LMS, data visualization, adaptable e-learning
systems

1 Introduction
The primary goal of Educational Data Mining (EDM) is to use large-scale educational datasets to better understand learning and to provide information about the learning process. In the latest overview paper on the state of the art in EDM [3], a new set of important research objectives was given: a) EDM tools have to be designed so that their use is simpler for educators or non-expert users in data mining, b) a data mining tool has to be integrated into the e-learning environment as one more traditional authoring tool, c) standardization of input data and output models, along with preprocessing, discovering and post-processing tasks, and d) traditional mining algorithms need to be tuned to take the educational context into account. Our current and future research is concerned with the first two objectives.
Data visualization is closely connected to statistical analysis and data mining. Through
graphical visualization we tap into the vast human potential for spotting patterns,
identifying exceptions and important variations in data that would be overlooked in
tabular form. The output of DM methods is usually displayed in two-dimensional charts,
but with new web technologies we can also create trees, network maps or animations
(time lapse, heat maps, path/pattern following, etc.) and add interactivity (zoom in/out
for general/detailed views, etc.).


Some authors have developed web applications for DM based data visualizations, but rarely using standard World Wide Web technologies. In [1], the authors developed a Flex/Flash based application in the field of bioinformatics, while in [5], the authors developed a Java/Matlab based application that accepts data file uploads and returns results from a small set of DM algorithms. In [2], the authors developed a student forum activity visualization tool using the Scalable Vector Graphics (SVG) web standard.
In this paper we put forth our model of a web-based EDM tool designed for educators
and non-experts in data mining. The paper is structured as follows: after a brief
introduction to the field of educational data mining, we proceed to present two
important technological foundations for data visualization in the browser. In the third
section we present our technical proposal for a web-based interface for educational data
mining and data visualization. Finally, we conclude the paper with future work plans
and references.

2 Web graphics - underlying technologies


Data visualizations on the Web followed the development of the main Web technologies. At the end of the 1990s, the Scalable Vector Graphics (SVG) standard was introduced, which enabled vector graphics in the browser. However, it had little JavaScript support. With the development of the XMLHttpRequest object, the jQuery library and support for <canvas> elements, the development of live data-based animations was made possible. Data visualizations are no longer static bitmap images but can be refreshed many times per second. Nowadays, data visualizations have two more important characteristics: interactivity and animation. With this, web-based graphics have practically reached the maturity level of desktop graphics. In addition, the need for large web browser plugins (Java, Flash, Silverlight) has been reduced and even minimized. Today, users can get web standards compliant graphics without any plugins or add-ons and are able to manipulate them, connect them to different data sources, start and stop animations and zoom in and out to get the most information from them.
There are two ways to generate graphics in a web page: using vectors or bitmaps. Vector
based graphics are created using the SVG standard (retained mode model), while bitmap
based graphics are created within the <canvas> element of the web page. <canvas>
exposes a more programmatic experience for drawing immediate mode graphics
including rectangles, paths, and images, similar to SVG. Immediate mode graphic
rendering is a fire and forget model that renders graphics directly to the screen and
then subsequently has no context as to what was done. In contrast to retained mode,
rendered graphics are not saved; a developer needs to re-invoke all drawing commands
required to describe the entire scene each time a new frame is required, regardless of
actual changes. The basic characteristics of both technologies are displayed in Table 1
below. In the last four years, many new JavaScript libraries that offer various data visualization capabilities have been developed. The main difference is the way they generate output: some use vector based graphics and some use bitmap graphics. Here we mention the two most prominent ones:
- D3 [6] is a JavaScript library for manipulating documents based on data. Developed in
2011, it uses HTML5, SVG and CSS3 to create visualizations. It has a strong emphasis
on web standards which facilitates the use of capabilities of modern browsers without
tying the user to a proprietary framework. D3 combines powerful visualization
components and a data-driven approach to DOM manipulation. D3 enables the user to
bind arbitrary data to the DOM, and then apply data-driven transformations to the
document. D3 supports large datasets and dynamic behaviors for interaction and
animation.

- Processing.js [8] is a port of the Processing Visualization Language (based on Java), designed for the Web. Developed in 2009, it uses Processing language constructs, and the JavaScript library enables the user to run the visualization in a web browser. This library uses the <canvas> HTML5 element to display the graphics. The <canvas> element is too low-level for most developers to use directly, so JavaScript libraries are necessary. Processing.js simplifies the use of 2D and 3D <canvas> operations.
Table 1: Bitmap and vector graphics technical overview [7]

Canvas:
- Pixel based (dynamic .png)
- Single HTML element
- Modified through script only
- Event model/user interaction is granular (x, y)
- Performance is better with a smaller surface, a larger number of objects (>10k), or both

SVG:
- Shape based
- Multiple graphical elements, which become part of the Document Object Model (DOM)
- Modified through script and CSS
- Event model/user interaction is abstracted (rect, path)
- Performance is better with a smaller number of objects (<10k), a larger surface, or both

As we plan to have a responsive and interactive interface, vector graphics will have an advantage over bitmaps. They will also enable us to zoom in and out, which can be used to create global or detailed visualizations and thus effectively enable the user to drill down into the data.

3 Educational data mining and visualization tool


The overall process of DM defines a set of steps that have to be followed to apply it to a perceived system and gain useful insight [4]. Although some steps of this process are very technical, we feel that it can be implemented in such a way that (once connected to the LMS) it is mostly invisible to the teacher, and that it can be done in the familiar environment of a web browser. In the case of EDM being applied to an LMS, the result of this system would be a continuous feedback loop from the LMS (providing usage/learning data) to the Web interface (providing various prepared EDM procedures) to a DM tool (performing DM algorithms on a prepared set of features). The feedback then goes back to the Web interface (providing insight via rich data visualizations) and, finally, the user implements the suggested changes in the LMS. This process is depicted in Figure 1 below.
The use of DM in e-learning systems or everyday classrooms is far from widespread.
Also, it seems that the majority of research in the EDM field is done in and for higher
education institutions. We would like the focus of this tool to be on the needs of teachers
in elementary and secondary schools. We presume that by integrating DM tools with e-learning systems (and the educational process in general) we can help broaden the application of DM in education significantly. This integration should be invisible to the teachers. In this way they could stay in the familiar environment of the e-learning system they use regularly and start using new DM powered features that will bring new, helpful insights to their work without losing time and energy learning about DM. To succeed, the system has to satisfy the general requirements that we put forth; it has to: a) be goal oriented (bring new value to the teacher and the education process as a whole), b)
be easy to use (the user interface should be intuitive, ensure easy data and feature
manipulation) and c) provide rich data visualizations (timeline analysis, animations,
interactivity). As said by other authors, the challenging part will be creating an EDM
system tailored to the needs of teachers without forcing them to learn about DM. By
involving them in the development process from the very beginning and following an
iterative development cycle with ongoing user trials, this goal should be achieved.

Figure 1: Creating a continuous feedback loop using the Web interface system
Before we start working on the proposed system we will perform a series of
presentations at local elementary and secondary schools about the possible outputs and
benefits of educational data mining with an emphasis on data visualizations. Afterwards,
we will perform interviews and surveys in order to a) identify the needs of the teachers,
b) identify the key benefits they expect from the system, c) identify specific
functionalities they would be motivated to use in this system.
This data will then be analyzed, summarized and converted into a preliminary
requirements document which will then be checked for unfeasible or contradicting
requirements. Requirements concerning data analysis will be paired with appropriate
data mining methods and algorithms. We will consider the requirements of these
methods and define the possibilities for simplification. A scenario in which teachers will
use this method will be devised and a user-friendly, simplified user interface model
(wireframes) or sequence will be sketched.
We propose a system consisting of four basic modules. A block diagram of the system can be seen in Figure 2 below. First, the Learning Data module will be used to make
the connection to the LMS and define data sources needed for various EDM actions.
The sources will consist of attributes from multiple tables. This module will monitor
data sources and periodically pull new data from the LMS. This module will be used by
the DM expert. For our LMS the primary interest will lie in knowledge domain structure analysis, so data about paths through the knowledge structure created by the
students will be needed. We will also analyze data concerning access times, learning
session duration and frequency, learning materials usage, quiz and test results, etc. Once
data sources are defined, this module will not need further attention from the DM
expert.
The central part of the Web interface is the Actions repository module. It is envisioned
as an extensible set of DM based operations of interest for the teachers (analysis
toolbox). Every action would offer a different analysis concerning the learning process,
for example prediction of student success on a course. Actions would define the
necessary data preparation routines, feature extraction, discretization, etc. needed for the DM algorithm that will be used. Concerning our LMS, one of the first actions we will develop will be pattern discovery in students' paths through the knowledge structure (a small sketch of such an analysis is given after this paragraph). Actions could be run in predefined time intervals (week, month, semester) or in an interval defined by the teacher. This part of the system would be primarily used by teachers.
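As a hint of what such an Action could compute, the sketch below counts frequent sub-paths in students' paths through the knowledge structure; the example paths, the unit names and the simple n-gram counting are illustrative assumptions rather than the system's final algorithm.

from collections import Counter

# Hypothetical learning paths: ordered knowledge units visited by three students.
paths = [
    ["intro", "loops", "arrays", "quiz1"],
    ["intro", "arrays", "loops", "quiz1"],
    ["intro", "loops", "arrays", "quiz1", "recursion"],
]

def sub_paths(path, length):
    """All contiguous sub-paths of the given length."""
    return [tuple(path[i:i + length]) for i in range(len(path) - length + 1)]

# Count how often each sub-path of length 3 occurs across all students.
counts = Counter(p for path in paths for p in sub_paths(path, 3))
for pattern, frequency in counts.most_common(5):
    print(frequency, " -> ".join(pattern))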

Figure 2: Block diagram of the proposed web application


The DM expert will be able to connect to the system, analyze the ways teachers use the
actions, and modify or create new actions. As the system will be open source, other
developers will be able to make and install new actions. Each action will employ a DM
tool in the background, concealed from the end user. The DM tool will be contacted via
API calls, by the Input/Output module that will receive data from the Action selected
by the user and create a set of arguments to make a correctly formatted call to the Weka
API. It will also handle the output file of the DM tool by preparing it to be used by the
Data Visualization module, which will also have the role of the user interface of the
application. So, this module will consist of a data visualization area, the Actions toolbar
and a data manipulation (intervals, filter) area. The key concept is that the user interface
will not be DM-oriented, i.e. it will not mimic user interfaces of common data mining
tools like Weka, but instead will be user-oriented and optimized to bring real value in
the form of insight into the learning progress of the students. Consequently, it will
optimize the overall efficacy of the educational process.
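To make the role of the Input/Output module more concrete, the following minimal sketch runs one possible Action backend by invoking Weka's command-line interface from Python; the classifier, file names and class path are placeholders, and the actual system is planned to call the Weka API rather than this simplified command-line variant.

import subprocess

# Hypothetical call: train and cross-validate a decision tree on features
# exported by an Action; weka.jar and the ARFF file are assumed to exist.
command = [
    "java", "-cp", "weka.jar",
    "weka.classifiers.trees.J48",   # one possible backend algorithm
    "-t", "course_features.arff",   # training data prepared by the Action
    "-x", "10",                     # 10-fold cross-validation
]

result = subprocess.run(command, capture_output=True, text=True, check=True)
# The raw Weka report would then be parsed and handed to the Data Visualization module.
print(result.stdout)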
For our LMS we plan to develop an adaptive component that will be based on data from
Actions that perform DM on knowledge structure usage patterns. Although it is currently being developed alongside an existing LMS, we plan to make it suitable for installation alongside other LMSs.
Teachers can use the Data Visualization module, run various Actions and use the
provided insight to modify the LMS in two web browser tabs, without the need to learn
about or even see DM tools, algorithms, database queries or any other technical details.
The role of a DM expert will be minimized after the initial connections with the LMS
are defined in the Learning Data module. Although the system could be fully integrated
into our LMS, we believe that creating a separate system with powerful connection
abilities will enable other LMSs and educational data providers to use our system for
their analysis and new applications.
From a more technical perspective, this web application will be object oriented and use
the Model-View-Controller (MVC) pattern. It will be based on the open source LAMP
(Linux, Apache, MySQL and PHP) platform. It will have a responsive interface that will
be equally usable on smartphone, tablet or desktop monitor size screens/resolutions.
The architecture will consist of a central core of the system containing a small number
of well documented classes, while the other parts of the application will be expandable
by component development and installation.

4 Conclusions
In the field of e-learning, there is a growing awareness about the need and value of
visualizations for all stakeholders of the education process. They can help students
understand complex concepts and their connections, they can offer teachers insight into
student activities, groups, access to materials or problematic test questions and they can
offer administrators an overview of results at the institutional level. From the
technological standpoint we have multiple options to choose from when generating web
graphics based on data which can be in many (well standardized) formats. The system
will be an object oriented MVC application with strict module code separation. The core
of the system will allow interested developers to create new functionalities and connect
the system to their LMSs. This tool focuses on teachers and their needs for new
information that DM methods can provide, but through a simplified, easy to use
interface and meaningful data visualizations.

5 References
[1] Koelling, J.; Langenkaemper, D.; Abouna, S.; et al. WHIDE - a web tool for visual data mining colocation patterns in multivariate bioimages. Bioinformatics, 28(8):1143-1150, 2012.
[2] Jyothi, S.; McAvinia, C.; Keating, J. A visualisation tool to aid exploration of students' interactions in asynchronous online communication. Computers & Education, 58(1):30-42, 2012.
[3] Romero, C.; Ventura, S. Educational Data Mining: A Review of the State of the Art. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40(6):601-618, 2010.
[4] Shearer, C. The New Blueprint for Data Mining. Journal of Data Warehousing, 5(4):13-22, 2000.
[5] Zorrilla, M.; García-Saiz, D. A service oriented architecture to provide data mining services for non-expert data miners. Decision Support Systems, 55(1):399-411, 2013.
[6] D3js.org. The D3 JavaScript Library, http://d3js.org/, downloaded: September 10th 2013.
[7] Microsoft MSDN. Choose between SVG and Canvas, http://msdn.microsoft.com/en-us/library/ie/gg193983%28v=vs.85%29.aspx, downloaded: September 10th 2013.
[8] Processingjs.org. Processing JavaScript Library, http://processingjs.org/, downloaded: September 10th 2013.

Extending BPMN 2.0 Conversation diagrams for modeling complex communication
Gregor Polančič, Boštjan Orban
Faculty of Electrical Engineering and Computer Science
University of Maribor
Smetanova 17, 2000 Maribor, Slovenia
gregor.polancic@um.si, bostjan.orban@uni-mb.si

Abstract: Business Process Model and Notation (BPMN) is a process modeling standard used in almost all business domains. With the release of BPMN 2.0, a new notation, Conversation diagrams, was introduced. They are useful for modeling process landscapes as well as basic interactions between different process participants. Since Conversation diagrams consist of only a few elements, our primary objective was to keep the notation simple while extending it to support the modeling of complex communication, including e-communication. At the beginning of the research, we analyzed similar diagramming techniques, e.g. BPMN, UML, ARIS EPC, the theory of databases and related conceptual modelling, as well as recommendations for graphical notations. By considering them and a SWOT analysis of Conversation diagrams, we defined 15 new elements on three different abstraction levels: conceptual, logical and physical.
Key Words: BPMN 2.0, Conversation diagrams, Conversation node, Complex
communication, E-communication, Push technologies.

1 Introduction
Business Process Model and Notation (BPMN) is a standardized graphical notation for modeling business processes. The first version of BPMN was introduced in 2005. In 2009, the Object Management Group (OMG) prepared a draft of the second BPMN version, which introduced the idea of modeling conversations. The final release of BPMN 2.0 was published in January 2011 and included notations and models for representing business processes, collaborations, choreographies and conversations [1].
Conversation diagrams (hereinafter referred to as CD) are used to represent high-level interactions between involved parties [3]. They are useful for representing an overview of a network of partners and how they communicate with each other. A weakness of CDs is the lack of elements that could represent different types of real-world communication and e-communication [1]. In this paper, we introduce the idea of extending CDs with new, intuitive and understandable elements that will enable the modeling and representation of complex communication [3, 4].

2 BPMN 2.0 Conversation diagrams


The BPMN 2.0 Conversation diagram (CD) notation consists of seven visual elements. Two elements represent different Participants, while four elements are Conversation Node elements. The last notation element is the connection line between a Participant and a Conversation. The Participant is graphically presented similarly to a BPMN pool and represents people, organizations or devices involved in a certain communication. The second Participant element is the Multiple Participant, which is recognized by the symbol ||| (Figure 1).
The other four elements are Conversation Node elements, which are graphically represented as hexagons. The basic Conversation Node element is the Conversation, which defines communication between two or more Participants (Figure 1). Semantically, a Conversation element can be expanded into a series of message flows between participants. In case a Conversation element has the symbol +, it represents a Sub-conversation, which can be expanded into a series of message flows as well as Conversations. It is similar to a Sub-process in a business process diagram [3, 4]. Global Conversation and Collaboration elements are represented as hexagons with thick borders (Figure 1). Their aim is reuse and integration with other diagrams. A Global Conversation reuses attributes and properties of other instances of diagrams, while a Collaboration element can be called by other Collaboration diagrams [4, 5]. The Conversation link connects Conversation Node elements with Participants. It is graphically represented with a solid double line (Figure 1). The conversation link always connects Conversation Node elements with Participants; it is not allowed to connect two elements of the same type [5].

Figure 1: Poster with existing Conversation diagram elements

An analysis of the strengths, weaknesses, opportunities and threats of Conversation diagrams is presented in the following table (Table 1).
Table 1: SWOT Matrix

Strengths:
- Presentation of a process-landscape view.
- Quickly learnable elements and rules [1, 2].
- Unique shapes for elements [2].

Weaknesses:
- Lack of elements for modeling complex communication [1, 2].
- No elements for e-communication.
- Inability to determine the type of participant.

Opportunities:
- Could be useful for organizations where communication is important.
- Has potential for standalone use (communication view).
- Could be extended with additional elements and concepts.

Threats:
- Lack of elements for real-world communication could decrease acceptance.
- Conversation diagrams could stay in the shadow of successful business process diagrams.

As evident from Table 1, CDs are quickly learnable and easy to understand for new and existing users. This probably results from the fact that they consist of a small number of unique shapes. On the other hand, it is sometimes difficult to model complex communication as it appears in the real world, including e-communication. This is an opportunity to extend the notation to make it more useful.

3 Proposed solution
The main objective of extending CDs is to make them more useful and expressive for users. Another objective is to define additional elements for representing complex communication such as e-communication and SOA (Service Oriented Architecture), which was perceived as a weakness in related studies [3]. Since representing complex communication might oppose the simplicity of a diagram, we propose a three-level (conceptual, logical and physical) modelling approach, which has already proved successful in the field of database modeling.

3.1 Review of related fields


In order to assure the ease of use and usefulness of the proposed CD extensions (hereinafter referred to as extended Conversation diagrams, xCD), we searched for similar ideas in other notations and related fields, e.g. UML, ARIS EPC and the theory of databases. We also incorporated the recommendations for designing new graphical notations proposed by Moody [6]. Nine recommendations for the successful design of a new notation were identified: semiotic clarity, perceptual discriminability, semantic transparency, complexity management, cognitive integration, visual expressiveness, dual coding, graphic economy and cognitive fit [6]. Besides, the graphical symbols should also satisfy the requirements regarding the construction of new elements (avoiding symbol redundancy, overload, excess and deficit) [6, 7].
Regarding the modeling notations, our proposal incorporates ideas from the Unified Modeling Language (UML). UML is a modeling language for data, business and object modeling [8, 9]. It has 12 different diagrams for software engineering. In the case of UML, we reused the idea of connecting related elements, like generalizations and extensions in UML Use case diagrams, which was added to xCD. We also analyzed UML Sequence diagrams, where we identified some ideas for new elements. One of the ideas taken from Sequence diagrams was how to define the type of Participant. We identified three different types of participants: a device, a person and an organization (Figure 2). Another idea obtained from Sequence diagrams is the concept of sequence flow for one of the new elements. The idea of the Sequence conversation (Figure 2) is to keep the same order of messages in each instance of a conversation.
Another set of ideas was obtained from the Event-driven Process Chain (EPC), which is part of the Architecture of Integrated Information Systems (ARIS). EPC is useful for modeling, analyzing and reengineering business processes. It is also easy to model with and has good semantics for describing business processes [10]. The EPC element Process path has been adapted into the xCD navigation element Background conversation (Figure 2). Using this element, we can model a detailed organizational conversation in a separate model.
The majority of ideas were obtained from BPMN's process modeling notation; for example, we decided to reuse the artifacts (comments), the annotation of rules, navigation, events and the idea of complex conversation between participants (Figure 2).
One of the main objectives for extending CDs was to support e-communication. For this purpose we chose the Service Oriented Architecture (SOA) message exchange concepts: one way, two way and data stream [11].

In the theory of databases we found the idea of how to present new elements in a way that is understandable for both users and computers. In database modeling it is common to define models on three levels. This follows from Popper's three worlds, named World 1, World 2 and World 3 [12]. The conceptual level (World 1) is suitable for users and represents symbols of the real world. The logical level (World 2) is intended for advanced users and also for computer recognition. The physical level (World 3) is mostly suitable for computers and data storage [13].

3.2. Extended conversation diagrams


Considering the related fields and the SWOT analysis of CDs, we defined 15 new elements (Figure 2).

Figure 2: Poster with new Conversation elements and links


Table 2 summarizes the new elements, together with their descriptions, examples and related fields (notations).

Table 2: New elements and connections

Participant Person
  Description: Type of participant is a person or a group of people.
  Example: Participant in the conversation is an engineer, a doctor, etc.
  Related field: UML Use case, BPMN

Participant Organization
  Description: Type of participant is an organization.
  Example: Participant in the conversation is a computer company, a bank, etc.
  Related field: UML Use case, EPC

Participant Device
  Description: Type of participant is a device.
  Example: Participant in the conversation is a computer, a smart phone, etc.
  Related field: UML Use case, BPMN

Participant rule
  Description: Participants must comply with the defined rules, written in a comment.
  Example: Participant Engineer must have a Java license.
  Related field: BPMN

Background conversation
  Description: Navigation element that points to a separate model of the organization conversation.
  Example: We are interested in the communication within the company, so we present the organization conversation in a separate model.
  Related field: EPC

Conversation rule
  Description: Conversation must comply with the defined rules, written in a comment.
  Example: Conversation takes place in the CEO's office.
  Related field: BPMN

Complex conversation
  Description: Conversation between three or more participants, where the details of the conversation are not important.
  Example: Supplier, dealer and company have a complex conversation.
  Related field: BPMN

Navigation
  Description: Navigation element that points to a separate model of the conversation.
  Example: Conversation is too complex to be modeled in the same model.
  Related field: BPMN

Sequence conversation
  Description: Specifies the order of message flows.
  Example: The order of taking a credit is the same in any case.
  Related field: UML Sequence diagram, BPMN 2.0

One way conversation
  Description: Messages in the same direction.
  Example: A weather sensor is sending data to the server.
  Related field: SOA

Two way conversation
  Description: Every message is in a different direction than the last one.
  Example: Servers are exchanging processed data.
  Related field: SOA

Push conversation
  Description: The first and the last message are in the same direction, while the directions of the remaining messages are reversed.
  Example: RSS, e-mail newsletters, live scores, etc.
  Related field: SOA

Events
  Description: Events begin, terminate or end a conversation.
  Example: An e-mail from the bank starts the conversation.
  Related field: BPMN

Generalization
  Description: Participant inherits rights from another participant.
  Example: The CEO has the same rights as an engineer, but can also arrange new business.
  Related field: UML Use case

Extend
  Description: Optional possibility to start other Conversation elements.
  Example: Successful completion of negotiations can start a new conversation about new jobs.
  Related field: UML Use case

3.3. An example of an xCD element Push conversation


The Push conversation element belongs to the xCD notation and was introduced because push conversation is common in SOA and e-communication concepts. The Push conversation element represents push technologies such as RSS, live-score websites, instant messaging, e-mail newsletters, etc. Since push conversation is commonly implemented with RSS technology, we reused the RSS graphical representation (Figure 3).

Figure 3: Push conversation element graphical representation on the conceptual level


On the logical level, the Push conversation element is represented through a series of message flows (Figure 4). In addition, it is important that the participants connected to a push conversation are devices. In the case of a push conversation, the direction of messages between the two devices is also important: the first and the last message go in the same direction, while the directions of the remaining messages are reversed.

Figure 4: Element on logical level


The physical level of the push conversation element is readable by devices and is therefore defined in XML format. The XML has to be consistent with the following XML Schema Definition (XSD) of the push conversation element.
<xsd:element name="pushConversation" type="tConversationNode"/>
<xsd:complexType name="tConversationNode">
<xsd:complexContent>
<xsd:extension base="tBaseElement">
<xsd:sequence>
<xsd:element name="messageFlowRef"
type="xsd:QName" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="participantRef"
type="xsd:QName" minOccurs="0" maxOccurs="unbounded"
fixed="Device"/>
<xsd:element name="boundaryEventRefs"
type="xsd:QName" minOccurs="0" maxOccurs="1"/>
</xsd:sequence>
<xsd:attribute name="conversationRef"
type="xsd:QName"/>
<xsd:attribute name="correlationKeyRef"
type="xsd:QName"/>
<xsd:attribute name="extendFlowRefs"
type="xsd:QName" use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>

The above XSD consists of references to message flows, participants, boundary events, the conversation, the correlation key and the extend flow, which is optional. The references are links to XSD schemas of the BPMN standard and to our newly built XSDs.
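For illustration, the following Python sketch checks an instance document against the push conversation XSD using lxml; the file names are hypothetical and the schema is assumed to import the BPMN base schemas that define tConversationNode and tBaseElement.

from lxml import etree

# Hypothetical files: the XSD above (with BPMN base schemas imported) and a model
# containing a pushConversation element.
schema = etree.XMLSchema(etree.parse("pushConversation.xsd"))
document = etree.parse("model_with_push_conversation.xml")

if schema.validate(document):
    print("The push conversation element conforms to the XSD.")
else:
    for error in schema.error_log:
        print(error.message)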

Figure 5: Push conversation example


Figure 5 represents an example of a Push conversation element. The element must be connected with two or more Devices (e.g. a server and a tablet computer). Other types of Participants, such as Organization or Person, are not allowed to connect to a Push conversation element. This element can represent technologies like RSS, instant messaging, etc.

4 Future work and challenges


We are planning to create more elements and present them on all three levels. In particular, we will focus on elements for e-communication. We will also prepare an extended meta-model of xCD and investigate possibilities for the serialization of all elements. Additionally, our plan is to introduce the new elements into different generic modeling tools, like Microsoft Visio or ARIS Business Architect. We want to analyze how complex the introduction of a new notation into existing modeling tools can be. Our last goal is to experimentally investigate whether modelers would accept the proposed extensions to Conversation diagrams.

5 Summary
Conversation diagrams were introduced in BPMN 2.0, so the number of models, implementations and studies about the usefulness of Conversation diagrams is still limited. This is an opportunity for researchers, because the notation provides many possibilities for further research and innovation. In our opinion, Conversation diagrams also have potential for modeling e-communication.

6 References
[1] Chinosi M, Trombetta A. BPMN: An introduction to the standard. Computer Standards & Interfaces, 34(1):124-134, January 2012.
[2] Chinosi M, Trombetta A. Result of the BPMN Usage Survey. April 2011.
[3] Allweyer T, Allweyer D. BPMN 2.0: Introduction to the Standard for Business Process Modeling. Norderstedt: Books on Demand GmbH; 2010.
[4] OMG. Business Process Model and Notation (BPMN). OMG; 2011.
[5] Shapiro R, White SA, Bock C, Palmer N, Muehlen M zur, Brambilla M, et al. BPMN 2.0 Handbook Second Edition: Methods, Concepts, Case Studies and Standards in Business Process Modeling Notation. Future Strategies, Incorporated; 2011.
[6] Moody DL. Why a Diagram is Only Sometimes Worth a Thousand Words: An Analysis of the BPMN 2.0 Visual Notation. Department of Information Systems, University of Twente, The Netherlands.
[7] Moody DL. The Physics of Notations: Toward a Scientific Basis for Constructing Visual Notations in Software Engineering. IEEE Transactions on Software Engineering, 35(6):756-779, 2009.
[8] Eloranta L, Kallio E, Terho I. A Notation Evaluation of BPMN and UML Activity Diagrams. Special Course in Information Systems Integration; 2006.
[9] OMG. Unified Modeling Language (OMG UML). OMG; 2011.
[10] Živković A. Druge notacije za poslovno modeliranje - ARIS EPC in UML [Other notations for business modeling - ARIS EPC and UML].
[11] Erl T. Service-oriented architecture: concepts, technology, and design. Upper Saddle River, NJ: Prentice Hall Professional Technical Reference; 2005.
[12] Popper K. Three Worlds. The Tanner Lectures on Human Values, The University of Michigan, 1978.
[13] Husemann B, Lechtenbörger J, Vossen G. Conceptual Data Warehouse Design. In Proc. of the International Workshop on Design and Management of Data Warehouses (DMDW 2000), 2000, str. 39.


The Way to Efficient Management of Complex Engineering Design
Edo Ravnikar, Borut Lužar, Martin Ravnikar, Roman Šoper
Ambient d.o.o.
Mestni trg 25, 1000 Ljubljana, Slovenia
borut.luzar@gmail.com, martin.ravnikar@ambient.si, roman.soper@ambient.si

Abstract: The increasing growth in complexity of architectural and urban design is becoming one of the central issues in the Architectural, Engineering and Construction (AEC) domain. Current software deliveries, however, are unsuited for dealing with the huge amount of information transacted in the course of the design process. This may primarily be attributed to the deficient CAD methodology that underlies the building information modelling initiatives, aimed at efficacious information management, yet unable to enlist available computer power, the essential modern requisite in dealing with complexity. A paradigm shift to challenge the faulty ways by which complex systems in the AEC domain are now addressed is overdue. This paper describes how the attributes and the application of an advanced design methodology, provisionally named RS Technology (RST), provide a powerful computational alternative.
Keywords: architecture, engineering, design, complexity, complex systems, CAD

1 Introduction
A signal objective of contemporary architectural design is the delivery of fully operational computational methods capable of handling the steadily increasing complexity of engineered artefacts [1,2,3,4,5,6]. In this perspective, complexity is a subjective measure of the difficulty of either creating the artefact itself or describing the process of its design. This inherent characteristic of the artefact/process represents the void between knowledge of components and knowledge of their compounded behaviour in cases of insufficient predictability of the output [7]. Architectural artefacts, and even more so their digital representations, are regularly referred to as complex systems. These consist of subsystems and parts whose internal organisation and connectivity to their surroundings cannot be described in simple terms [4]. This is precisely what makes them complex.

The author is also employed as a researcher at the Faculty of Information Studies in Novo mesto.

According to Luzeaux, three main characteristics define a complex system: a) its elements respond to stimuli from adjacent elements, b) it is composed of a vast number of mostly heterogeneous units, and c) the interweaving of components is such that the general global structure and its behaviour cannot be deduced solely from knowledge of particular local structures and their behaviour [8]. Maier & Rechtin define 'complex' as "composed of interconnected or interwoven parts" and 'system' as a "set of different elements so connected or related as to perform a unique function not performable by the elements alone" [9]. Both architectural [10] and engineering design [11] are consistent with these qualifications, being complex systems themselves.
When large engineering assets are at stake (building construction among the largest), promoters expect careful, failure-proof, well planned-out execution of projects, plus a guaranteed outcome. Or so they should. However, in the complex modern world, in the absence of methods that move with the times, such expectations are more and more a self-defeating travail [12]. As the number of active elements in engineered systems (architectural artefacts included), as well as the real-time demands on them, keep increasing, highly complex design and subsequent engineering require new insights and new tools to tackle the growing complexity of these structures [13].
Construction, consistently with systems engineering, is now primarily about separating large, highly complex plans, destined for reification, into smaller segments for distribution to various subcontractors to work on, and coordinating their development so that the results of the thus allotted pieces of the action can be effectively assembled at the end of the process. This schema of disassembly of global systems into sections is applied recursively to smaller and smaller parts, until the bits are sufficiently simple for single operators (men or machines) to implement, and inversely to assemble the parts to integrate the entire system into working order. The problem with this approach, however, is that it requires guidance in the distribution of tasks and coordination of assembly. Such guidance sufficed in the past, when projects were simpler. As they become more and more complex, the process exposes its deficiencies until it breaks down [12].
Eventually people get proficient at engineering systems in a typical, predictable manner, yet in the execution of complex and diversified projects this principle increasingly fails [12]. With building design steadily growing more complex, the breakdown happens earlier and earlier in the process. The larger the building and the higher its actual complexity, the sooner the malfunction occurs. To avoid the aforesaid situation, simplification aimed at the reduction of complexity is applied. In practice, at a certain point where such an approach is no longer feasible, yet the system's complexity is still intractable, partly resolved design solutions are passed on to building contractors for further impoverishment in the course of their in-house provision of executive blueprints and shop drawings.
The most efficient way of handling complex systems, in engineering design as everywhere else, is by computer power (computationally). The problem of current CAD methods in tackling complexity is their incapacity to manipulate more than a limited number of geometric entities. This is for two main reasons. One is the vast amount of geometric data and the interrelations between them. The other is the geometric constraints problem [14,15], a consequence of the former.


With conventional CAD/BIM relying on pre-rationalization1 of design and geometric constraints, the growing complexity of the object of design overwhelms its progress, slowing it down until it halts. Since, as said, this occurs unacceptably early in the process, the conventional systems, unfit to address complexity, spuriously reduce it in order to proceed. Simplification, repetition and the imposition of staple library components, together with regimented constraints pre-imposed on design models, deprive conventional CAD platforms of the flexibility necessary for the production of complete information on the totality of building parts, fit for the management of fabrication machinery.
Why is it that AEC hangs on to these dated methods of information management in construction? Part of the answer lies in past experience with strategies that worked for centuries. The present awareness that such strategies no longer work, however, exacts new design principles and tools. Some have attempted to address the issue of complexity by what has been termed generative design, whose efforts, geared to serve specific purposes, eventually proved unfit for complex engineering, while other initiatives insist on a systemic decrement of design complexity, vying to bestow on this pragmatism the stamp of method. Building information modelling, representative of the latter and aimed at remedying the current situation, is still laden with the conventional CAD paradigm, whose logic decrees that the larger a scheme, the more one should rely, in keeping it together, on reduction and simplification of its combined contents as well as components, greater size thus growing simpler, not more complex. The functional issues of building design exact the very opposite, namely flawless information on non-uniform (also pseudo-regular), uncurtailed complexity whose degree of articulation increases with size.
To overcome the stalemate, a paradigm change is essential. RST, with its patented features, effectively handles complexity along the path of its growth so as to, in abrogation of recourse to facilitation through uniformity and regressive articulation, preserve complete information throughout the process of open variation of possible solutions, as a precondition for the selective guidance of complex development.

2 RST approach
Development of architectural objects in the ways introduced in this paper is a novel approach [16], the culmination of work started in the 1970s [17], with theoretical foundations outlined in 1982 [18]. Experimental implementation had to wait until sufficiently capable computing equipment became readily affordable.
The general objective of engineering design is the creation of data necessary for the full description of physical objects to be fabricated. On top of satisfying this objective, RST achieves the additional goal of variability. Variability is signally important in the design of complex objects, qua the ability to concurrently explore multiple solutions and implement changes at any stage of an object's construction. Owing to the complexity of the considered solutions, it is the normal requirement that continuous change be possible at any stage before the initial concept is finalized. With RST's design technique that is easily done.

1 Better known as 'comprehensive advance planning'. Bar-Yam Y. (2004) Making Things Work.


As stated, a defining property of a complex object is our inability to predict its behaviour in advance. With pre-rationalization ruled out, another approach is necessary. We have, for the purposes of RST, adopted a generalized Darwinistic procedure as summarized in the BVSR (blind variation and selective retention) principles, first introduced by D. Campbell [19,20], so as to provide the design method with a theoretical framework.
RST stems from two principles: causality and locality. The former determines that each new state of a system is a causal derivation of a previous state in an arbitrarily long series of events. The latter dictates that the parameters of each operation in the system be defined by relative paths regarding the object over which the operation is executed. Results of operations are added as new objects related to that object. In RST, all objects are manipulated through the graph data structure which we describe in the sequel.

2.1 The structure of the system


In RST, the description of an object's geometry, its composition and the procedure describing its creation is called a 'model'. With objects composed of different parts, the model must have a description for each of its parts plus information about how these are to be put together (their connectivity reflected in relative positions and assembly procedures). In RST, all information is stored in a graph data structure called the 'model graph'. RST uses the model graph not only to store model data (information describing the model), but also information describing the design process itself, its steps, its parameters and their relations.
Each vertex of the model graph contains either information about an 'atomic part' of the physical object (a final, indivisible part of the global model), an abstract object suggesting a design concept (floor, roof, window, ...), an auxiliary geometry object (point, line, plane, intermediate solid object, etc.), or a special type of placeholder representing an abstract (logical) structure embedded in the model.
Two vertices of the model graph are connected by an edge if their contents are in some (logical or physical) relation within the model; e.g. vertices representing adjacent placeholders for two neighbouring components are connected by an edge. Edges are directed and each has a name attribute. During the process, new vertices are inserted into the model graph so that eventually every part of the model is embedded in it as the content of some vertex.
The three main types of operations in RST are selectors, generators and tasks.
Selectors are functions that identify subsets of vertices of the model graph such that they (or their contents) satisfy given conditions; e.g. the value of some attribute is greater than a given parameter, or a vertex is incident to an edge with a given name. The sets of vertices returned by selectors are called 'selections'. Selectors mainly identify patterns or subsets of vertices representing concepts.


Generators are functions that take selections and sets of parameters as input and create content for each vertex in a selection. The number and types of external parameters depend on the type of the generating function. For example, a sine function may be given three parameters: argument, phase and scale. The selection to which a generator is applied must be ordered in a way that provides the generating function with a discrete variable, each of the vertices being assigned a different label, usually an integer. The output of a generator can be any type of geometric primitive or a placeholder of the abstract structure.
Tasks are functions that take various parameters (numerical, textual, and functional), as well as
vertices of , specified exclusively by relative paths, as inputs. Tasks are executed upon all
vertices of a given selection, referred to as 'working selection'. Outputs are new vertices of
together with their contents. Vertices of , given as inputs to tasks, are reminiscent of constraints
(the fons et origo of ineffectual coping with complexity) in conventional CAD/parametric
systems. With RST complexity is simplified (but not reduced) by adoption of locality principle
and by avoidance of cyclical dependencies.
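To make the three operation types more concrete, the following minimal sketch (Python with
NetworkX, purely illustrative and not the RST implementation; all names and attributes are
hypothetical) shows a selector and a generator acting on a small model graph:

    import networkx as nx

    # Hypothetical model graph: vertices carry a 'content' dict, edges carry a 'name'.
    G = nx.DiGraph()
    G.add_node("floor_0", content={"type": "floor", "height": 3.2})
    G.add_node("floor_1", content={"type": "floor", "height": 3.2})
    G.add_edge("floor_0", "floor_1", name="above")

    def selector(graph, predicate):
        """Return the selection: all vertices whose content satisfies the predicate."""
        return [v for v, data in graph.nodes(data=True) if predicate(data["content"])]

    def generator(graph, selection, make_content):
        """Create content for each vertex of an ordered selection (label = position)."""
        for label, v in enumerate(sorted(selection)):
            graph.nodes[v]["content"]["generated"] = make_content(label)

    floors = selector(G, lambda c: c.get("type") == "floor")
    generator(G, floors, make_content=lambda i: {"elevation": 3.2 * i})
    print(G.nodes["floor_1"]["content"])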

2.2 Model construction


Next we describe the method of constructing a model. Figure 1 represents a diagram of the
process.

Figure 1: Flow chart diagram of RST approach.


STEP 1: The first step defines the initial graph of the underlying logical structure as the original
state of G. Experimental work showed that initial structures need not be elaborate. It suffices to
start with simple, well-understood graphs, e.g. paths, planes, toroidal grids, etc., or arbitrary
meshes.

77

The logical structure initiates the division of the design space into logical units. For example, with,
say, a high-rise building the initial logical structure might reflect the past experience that such a
tower is primarily organized as a set of floors on top of one another. The logical-structure graph
representing such an arrangement is a list of vertices, each pertaining to a floor, with vertices of
consecutive floors connected by an edge. Obviously, all vertices have two neighbours, the floor
below and the floor above, except the first and the last one. An example of a logical-structure graph
is shown in Figure 2.
Figure 2: Initial logical structure graph of a building with 7 floors (Floor 0 through Floor 6, with consecutive floors connected by an edge).
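For illustration, a minimal sketch (assuming NetworkX, not the RST tool itself) of such an initial
seven-floor logical structure could be:

    import networkx as nx

    # Initial logical structure: 7 floors, consecutive floors linked by an 'above' edge.
    floors = [f"Floor {i}" for i in range(7)]
    L = nx.DiGraph()
    L.add_nodes_from(floors)
    for lower, upper in zip(floors, floors[1:]):
        L.add_edge(lower, upper, name="above")

    print(list(L.edges(data=True))[:2])  # e.g. ('Floor 0', 'Floor 1', {'name': 'above'})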


STEP 2: The prior abstract structure comes into force in the second stage of the RST
procedure. According to the BVSR approach, a number of initial representations of the model are
created using generators and varying their parameters. Arbitrary variation of parameters can
produce unexpected, emergent results that may have radically different geometric representations,
while the underlying structure, the model graph, stays the same, its topology unchanged from
variation to variation. Emergent features represent considerable, essentially insurmountable
problems for conventional CAD applications. RST handles them easily, since the underlying
structure is sustained regardless of variation. The designer can either select or define such features
in terms of the graph structure, without recourse to geometric quantities.
Among the infinitely many different shapes there will be some that match the conditions of a
design brief. This shifts the focus from pre-rationalising solutions to discovering them, as
phenomena are discovered in the natural world.


Figure 3: Variations of a tall building shape all derived from the same axiom.
STEP 3: With the shape of a model defined, we proceed to the construction of feature placeholders,
which, after many refinements, will be transformed into fully resolved objects of the model. In
particular, when constructing an atomic part of the model, we define sequences of tasks and
execute them upon selections. Notice that arbitrary sequences of tasks may be defined, since,
owing to locality and acyclicity, the complexity of execution remains at the same level
throughout the process.
The steps described above are not disjoint. Additional vertices representing a logical structure may
therefore be added at any stage of modelling, as may new generating functions and additional
tasks applied to selections.

2.3 Model evaluation


Next we describe how the model is evaluated. Concepts proposed in a design process are abstract,
and may be extracted from operations performed on the model graph. Operations are defined in a
script language interpreted by the execution engine (Figure 4). Execution of selectors and
generators is straightforward, since they operate upon sets of vertices of G without further
dependencies.
Tasks, on the other hand, are substantially more demanding owing to simultaneous
interdependencies within the model graph. The number of tasks is usually much greater than the
number of selectors and generators. For instance, several tens of thousands of tasks are executed
in the design of a major building, while only a few hundred selectors and generators may be
defined.


Figure 4: A data model of RST approach.


Tasks are defined in a script language. The order of their definitions can be arbitrary, since the
ordering is handled by the execution engine. A task is executed only after the objects it derives
from have already been created, i.e. inserted in G. Therefore, tasks are performed in such an order
that a task is executed after all tasks that create the objects which it receives as parameters.
Owing to the way tasks are defined, such an ordering is always possible and is carried out on an
acyclic directed graph which we refer to as the dependency graph of the model. Vertices in the
dependency graph represent tasks, while two tasks, t1 and t2, are connected by a directed edge
(t1, t2) if the output of t1 is a parameter of t2. Obviously, the task t2 is executed after each task
from which a directed path leads to t2 has finished its execution - we say that t2 depends on t1.
Since every task is executed at every vertex v of a given working selection, there is a check whether
all objects prescribed as its parameters were created on the relative paths from v. If not, the task is
not executed at v.
RST creation of objects in a model thus enables a high degree of parallelization. In particular,
the execution of a task can be divided into as many processes as the size of its working selection.
Additionally, any pair of independent tasks in the dependency graph, with all the tasks they depend
on already completed, may be executed in parallel.
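The scheduling just described can be illustrated with a small sketch (Python/NetworkX, illustrative
only; the task names are hypothetical): tasks are ordered topologically over the dependency graph,
and tasks in the same 'wave' have no unfinished dependencies and could run in parallel:

    import networkx as nx

    # Hypothetical dependency graph: vertices are task names, an edge (t1, t2) means
    # the output of t1 is a parameter of t2, so t2 must run after t1.
    dep = nx.DiGraph()
    dep.add_edges_from([("make_axis", "make_floor_plate"),
                        ("make_floor_plate", "make_facade_panel"),
                        ("make_axis", "make_core")])

    # A valid execution order always exists because the graph is acyclic.
    order = list(nx.topological_sort(dep))
    print(order)

    # Group tasks into parallel "waves": a task joins the wave after its latest dependency.
    wave = {}
    for t in order:
        preds = list(dep.predecessors(t))
        wave[t] = 1 + max((wave[p] for p in preds), default=0)
    print(wave)  # e.g. {'make_axis': 1, 'make_floor_plate': 2, 'make_core': 2, 'make_facade_panel': 3}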

3 Conclusion
We have introduced a design methodology that solves many problems of modern building design
and construction which other design initiatives fail to address. Our process results in fully
resolved models of highly complex objects, delivering complete geometric, material and any
other required information for each and every part of the designed object.
Figure 5 shows an example of the detailed construction of hanging curtain elements with
connecting components. Note that it was produced exclusively by computation.


Figure 5: Construction of a complex curtain façade.

By the logic of this novel approach to the design of complex objects we managed to bypass the
pitfalls of standard constraining techniques and cross-dependencies within sets of constraints in
conventional digital models. Adherence to the causality and locality rules enables RST to pass from
attempts at tackling insoluble systems of constraints to the solution of multiple independent ones,
individually for each part of the model. Causality warrants non-circular dependencies between
constraints, while locality mandates locally polynomial computational complexity in their
evaluation. Since the maximum number of local constraints for each part of the model is small,
the overall complexity of solving constraints is asymptotically linear in the number of parts in
the model. Linearity thus enables RST to scale naturally with size.
Some will argue that too high a price is paid for such results by giving away coordination of the
overall model. In answer to that we point at another quality of RST: strict maintenance of the model
graph, the abstract, logical structure of the whole model, which coordinates every operation.
A further important advantage of RST is the highly parallelizable execution plan for model
evaluation. Again, the locality principle ensures that each operation depends only on a small
number of previously created objects in the model, so that any number of disparate parts of the
model can be evaluated in parallel. Another advantage is that each atomic operation demands only
modest computing resources, while the bulk of the model resides on secondary storage. RST is thus
well suited to clusters of low-powered standard computers and to usage in on-demand cloud
solutions.


The current implementation of RST is at the technology demonstration stage. In the future we aim
at developing a GUI for handling the logical structure of the model and for the definition of
operations (generators, selectors, tasks).
Given that our design procedure is independent of the domain (3D artefacts to be industrially
manufactured), it can be implemented in other domains where complexities also arise, such as
knowledge management, expert systems, computer programming and the like.

References
[1] Yahiaoui A., Sahraoui A.E.K., Hensen J.L.M., Brouwer P.J. (2006) A systems engineering
environment for integrated building design, Proceedings of the EuSEC - European Systems
Engineering Conf., IS - 20 September, pp. 11, Edinburgh: The International Council on
Systems Engineering
[2] Gürsel Dino I. (2012) Creative design by parametric generative systems, METU Journal of
the Faculty of Architecture, (29:1) 207-224, METU, Ankara, Turkey
[3] Bittermann M. (2009) Intelligent Design Objects (IDO): a cognitive approach for
performance-based design, PhD Thesis, Delft University of Technology, Delft 2009
[4] Sánchez Vibæk K. (2011) System structures in architecture, PhD Thesis, Centre for
Industrialised Architecture, The Royal Danish Academy of Fine Arts, School of
Architecture, Copenhagen 2011
[5] Aish R., Woodbury R. (2005) Multi-level Interaction in Parametric Design, Butz A. et al.
(eds.) Lecture Notes in Computer Science, Volume 3638, 2005, pp. 151-162, Springer-Verlag
Berlin Heidelberg 2005
[6] Herr C. M. (2002) Generative Architectural Design and Complexity Theory, 2002
Generative Art Conference paper, www.generativeart.com/on/cic/papersGA2002/16.pdf,
last accessed Oct. 2013
[7] Northrop R. B. (2010) Introduction to Complexity and Complex Systems, CRC Press, 1st
edition, Boca Raton, December 2010
[8] Luzeaux D. (2011) Engineering Large-scale Complex Systems; Luzeaux D., Ruault J.R.,
Wippler J.L. (eds.) Complex Systems and Systems of Systems Engineering, John Wiley &
Sons, Inc., Hoboken 2011
[9] Maier M.W., Rechtin E. (2009) The Art of Systems Architecting, CRC Press, 3rd edition,
January 2009
[10] Schmidt K., Wagner I. (2004) Ordering Systems: Coordinative Practices and Artifacts in
Architectural Design and Planning, Computer Supported Cooperative Work (2004) 13, pp.
349-408, Springer 2005
[11] De Weck O. L., Roos D., Magee C. L. (2011) Engineering Systems: Meeting Human Needs
in a Complex Technological World, The MIT Press, Cambridge, MA, October 2011
[12] Bar-Yam Y. (2004) Making Things Work: Solving Complex Problems in a Complex World,
NECSI Knowledge Press, Cambridge, MA 2004
[13] Bar-Yam Y. (2005) About Engineering Complex Systems: Multiscale Analysis and
Evolutionary Engineering, Brueckner S. et al. (eds.): ESOA 2004, LNCS 3464, pp. 16-31,
Springer-Verlag Berlin Heidelberg 2005
[14] Hoffmann C. M., Vermeer P. J. (1994) Geometric constraint solving in R2 and R3, D.Z.
Du, F. Hwang (eds.), Computing in Euclidean Geometry, Singapore: World Scientific
Publishing
[15] Thierry S., Schreck P., Michelucci D., Fünfzig C., Génevaux J.D. (2011) Extensions of the
witness method to characterize under-, over- and well-constrained geometric constraint
systems, Computer-Aided Design, Volume 43, Issue 10, October 2011, pp. 1234-1249
[16] Ravnikar E., Soper R. (2012) Method and apparatus for computer-aided design of
three-dimensional objects to be fabricated, US Patent 8,103,484, 2012.
[17] Ravnikar E., Kmet A. (1971) Propositions: Esquisse d'une théorie de la projetation, NEUF
No. 30, Mars-Avril 1971, pp. 93-128
[18] Ravnikar E. (1982) Tracciato morfogenetico dello sviluppo del manufatto, Padovano G.
(ed.) Territorio e Architettura: Metodologie scientifiche nell'analisi e nell'intervento, Etas
Libri, Milano 1982
[19] Ayala, F. J., Grigorievich Dobzhansky, T. (eds.) (1974) Studies in the Philosophy of
Biology: Reduction and Related Problems, University of California Press, Chapter 9:
Campbell, Donald T., Unjustified Variation and Selective Retention in Scientific
Discovery, p. 142
[20] Boyd, B. (2009) Purpose-Driven Life, The American Scholar, Spring edition, pp. 26-28


Cultural Components in Information Society


Tatjana Welzer, Marko Hölbl, Marjan Družovec, Aida Kamišalić Latifi, Lija Emmelin
Verneli
Faculty of Electrical Engineering and Computer Science
University of Maribor
Smetanova ulica 17, 2000 Maribor, Slovenia
tatjana.welzer@um.si

Abstract: The increasing availability of information communication technology has
changed communication in society and enabled the easy crossing of physical borders.
With the help of global communication systems we are able to change continents,
countries, regions, cultures and languages very easily, so easily that we are mostly
not aware that this is mobility in a virtual world, in the information society. To
bring connections and communications even closer to users, tools were developed to
support their needs in different countries and language groups (i.e. the translation of
tools into national languages).
Users of these tools and collaborative communities have to be aware that they are
writing, chatting and presenting to/with other users who may be members of different
language groups and cultural communities. Languages like English (as a common language)
are breaking down language barriers, but there is no general culture that would be
common to all.
In our contribution we concentrate on the importance of cultural issues in software
engineering and data modelling, as well as on evaluating the quality of these fields from
a cultural point of view.
Key Words: cultural components, information society, software engineering, data
modelling, quality

1 Introduction
We are living in a global world where we have to cope not only with communication
in our own environment or national culture; we also have to be global, because of the
general globalization and internationalization of companies, business and education.
Some sources report [13] problems that appear in global and distributed communication,
such as a lack of informal communication, misunderstandings, cultural problems on the
national and organizational level, communication technology problems, even time zone
differences, as well as a lack of trust between different partners.
In our contribution we concentrate on those problems which are based on culture,
cultural awareness and cross-cultural communication. In more detail, we would like to
point out culture-sensitive aspects in software engineering, conceptual modelling and
quality. These topics offer many open questions, such as which aspects we have to take
into account, how well expert groups are acquainted with cultural components, and
whether some general solutions exist in spite of the fact that we do not have a 'culture
franca'. We will devote most of our attention to conceptual modelling and present the
experience collected during the last years of research on this topic.

2 Cultural Components
The word culture has grown over the centuries to reach its currently broad
understanding [1]. Culture is a common and familiar word in any community, but we
are nevertheless confronted with many definitions of culture.
Many authors have defined culture and different cultural components. Hofstede defined
culture as follows [2], [3]: culture is a collective phenomenon, because it is shared
with people who live or lived within the same social environment. Culture consists of
the unwritten rules of the social game. It is the collective programming of the mind that
distinguishes the members of one group or category of people from others. For Lewis,
culture is an integrated pattern of human knowledge, a core belief, and a behaviour that
depends upon the capacity for symbolic thought and social learning [5]. Culture has also
been defined as a shared pattern of behaviour, but observing behaviour is not enough.
Emphasis should be placed on the meaning of that behaviour [4]. Culture can also be
presented as a particular way of life for a group of people, comprising the deposit
of knowledge, experience, beliefs, values, traditions, religion, notions of time, roles,
spatial relations, worldviews, material objects and geographical territory [6]. If people
adjust to cultural differences, they can better face challenges and become better in their
own profession [7].
Besides culture, we also have to clarify some other terms. Culture shock is closely
connected to culture, and users in the information society can be confronted with culture
shock in different ways. Culture shock leads to feelings of disorientation and anxiety
that a sojourner experiences when familiar cultural norms and values governing
behaviour are questioned in a new cultural environment [10]. It is natural to have
difficulty adjusting to a new culture. People from other cultures may have grown up
with values and beliefs that differ from our own. To understand culture shock, it helps to
understand what cultures are [10].
Among the cultural components there is also the vitally important term cultural
awareness. No or poor cultural awareness means a poor understanding of cross-cultural
dialogue, which can lead to blunders and damaging consequences, especially in
business, management and advertising, where cultural awareness seems to be of key
importance for success [2]. According to the definition, cultural awareness is the
foundation of communication, and it involves the ability to observe our cultural
values, beliefs and perceptions from the outside [10]. It becomes important in
communication with people from other cultures, and we have to understand that people
from different cultural societies can see, interpret and evaluate things in different ways.
Finally, we also have to mention cross-cultural communication, which means the
acceptance of different cultures - intercultural understanding. Volčič said: "Understanding
is the first step towards acceptance" [4].

3 Cultural components in Information Society


The characteristics of the information society, including software engineering, data modelling
and quality, are changing rapidly. Expert topics are more and more integrated and
connected. A lot of research is being done in this area, as well as in connection with global
products and culture-sensitive aspects when organizations are multicultural or products
are developed for a multicultural environment.
The changes in software engineering are recognised in the transfer from plan-driven
development to agile development and in the transfer to distributed and multicultural
organizational structures. Active interaction inside teams and between developers and
clients, as well as distributed work, is increasing the difficulty of software development.
If organizations are multicultural, one additional dimension of difficulty appears. Even
in a single unit, differences in cultural background may cause problems, but the problems
become emphasized especially in the case of distributed work [14].
The same problem also appears in software-related services. Predefined processes force
developers to follow given guidelines throughout the organization, regardless of the
geographical location and cultural background of employees. A more careful look at real
situations confirms that some processes are more culture-sensitive than others, and
practice is more and more oriented towards taking advantage of culture [14].
A similar situation is recognised for conceptual modelling with respect to culture-sensitive
aspects. International projects and projects for a customer from another lingual or cultural
environment are treated as an exception and developed according to special needs. In such
cases conceptual modelling teams will grow, since language experts and experts with
knowledge of cross-cultural communication and cultural awareness will be needed to
consider professional and functional cultures, business culture and corporate cultures.
Changes in a well-known approach like conceptual modelling can also be expected [11].
According to a long list of literature, conceptual modelling is very well defined.
The steps defined for conceptual modelling are first of all oriented towards detecting the
main objects of the enterprise, which are later presented as entities and attributes in the
entity-relationship diagram mostly used for presenting the conceptual model.
Relationships between entities are defined as well. In the CER model (Culture-based
Entity Relationship), we added steps to cover cultural differences and awareness in
the broadest sense (countries, topics, businesses) in database modelling. The main
purpose is to select culture- and language-sensitive elements that have to be discussed,
and decisions about including them in the development process have to be made [12].
The upgrade of existing tools should be easy enough, or already available elements should
be used to put some flags into the model, to mark culture-dependent or culture-sensitive
elements and their content.
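As a purely illustrative sketch (not the authors' CER tooling; all names are hypothetical),
culture-sensitive elements of a conceptual model could be flagged like this, so that they are
collected for discussion during development:

    from dataclasses import dataclass, field

    # Illustrative only: a hypothetical way to flag culture-sensitive elements in a
    # conceptual model, in the spirit of the CER idea described above.
    @dataclass
    class Attribute:
        name: str
        datatype: str
        culture_sensitive: bool = False   # flag marking culture/language dependence
        notes: str = ""                   # e.g. which locales need a decision

    @dataclass
    class Entity:
        name: str
        attributes: list = field(default_factory=list)

    customer = Entity("Customer", [
        Attribute("name", "string"),
        Attribute("date_of_birth", "date", culture_sensitive=True, notes="date format varies by locale"),
        Attribute("title", "string", culture_sensitive=True, notes="honorifics differ across cultures"),
    ])

    # Elements to be discussed with language/culture experts during modelling:
    flagged = [a.name for a in customer.attributes if a.culture_sensitive]
    print(flagged)  # ['date_of_birth', 'title']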
Cultural issues as components of software engineering and conceptual modelling, as
briefly presented above, also open new possibilities for higher product quality.
Cultural issues in those cases mean not only country-dependent parameters but also
business- and domain-dependent cultural issues. As a consequence, the data quality as
well as the information quality of products using the mentioned approach improves. The
influence on TQM as well as on some of Deming's Fourteen Points should be researched
and discussed in more detail in the near future.

4 Conclusion
In our contribution we have tried to present the importance of cultural issues, such as
cultural awareness and intercultural dialogue, for software engineering, data modelling
(especially global conceptual modelling) and quality.
Our cultural awareness in general means that we are open to the idea of changing
cultural attitudes. We can understand it as how we would like to be perceived by others
or how we are actually perceived. Cultural awareness recognizes that we are all shaped
by our cultural and lingual background, which influences how we interpret the world
around us, perceive ourselves and relate to other people [12]. According to the last
statement, cultural awareness is, and has to be, a part of everyday life in the information
society, including the professional life of information technology experts [8], [9].


We are used to presenting, using and teaching information technology topics like software
engineering, conceptual modelling and quality from the expert point of view, without
introducing the importance of intercultural dialogue, cultural awareness and language.
However, present-day needs and research activities are showing us that multiculturalism
and multilingualism are elements which add value to our work and upgrade the quality of
our expert and domain activities and the resulting models.

5 References
[1] Heimbürger, A.; Sasaki, S., Yoshida, N., Venäläinen, T., Lina, P., and Welzer, T.
Cross-cultural collaborative systems: towards cultural computing. In Book:
Information modelling and knowledge bases XXI (Frontiers in artificial
intelligence and applications, vol. 206). IOS Press, Amsterdam, 2010, pp. 403-417.
[2] Hofstede, G. Culture's Consequences: Comparing Values, Behaviors, Institutions
and Organizations Across Nations, Sage Publications, Thousand Oaks, 2001.
[3] Hofstede, G., Hofstede, G. J., and Minkov, M. Cultures and Organizations: Software
of the Mind: Intercultural Cooperation and its Importance for Survival, McGraw-Hill,
New York, San Francisco, 2010.
[4] Jaakkola, H.; Heimbürger, A. Cross-Cultural Software Engineering, In Proceedings:
MIPRO 2009, 32nd International Convention on Information and Communication
Technology, Electronics and Microelectronics, Opatija, Croatia, 2009.
[5] Lewis, R.D. When Cultures Collide: Managing Successfully Across Cultures,
Nicholas Brealey Publishing, London, 1999.
[6] Liu, S., Volčič, Z., Gallois, C. Introducing Intercultural Communication: Global
Cultures and Contexts, Sage, London, 2011.
[7] Welzer, T.; Družovec, M., Cafnik, P., Jaakkola, H., Zoric-Venuti, M. Awareness of
culture in e-learning. In Proceedings: ITHET 2010, IEEE, 2010, pp. 312-315.
[8] Welzer, T.; Družovec, M., Hölbl, M., Zoric-Venuti, M. Experiences in international
cooperation in teaching. The Journal: Elektron. elektrotech., 2010, Nr. 6, pp. 19-22.
[9] Welzer, T.; Družovec, M. and Jaakkola, H. Culture sensitive aspects in informatics
education, In Proceedings: EAEEIE 2012, 23rd EAEEIE Annual Conference,
Cagliari, Italy, 2012, pp. 1-3.
[10] Welzer, T.; Hölbl, M., Družovec, M., Brumen, B. Cultural awareness in social
media. In Proceedings: DETECT 2011, Glasgow, UK, ACM, 2011, pp. 1-5.
[11] Welzer, T.; Jaakkola, H., Bonai, M. and Družovec, M. Cultural awareness for the
21st century IT teaching, In Proceedings: Proceedings of the 13th East-European
Conference Advances in Databases and Information Systems, ADBIS 2009, Riga,
Latvia, 2009, pp. 209-213.
[12] Welzer, T.; Jaakkola, H., Družovec, M., Hölbl, M. Cultural and lingual
awareness for global conceptual modeling. In Book: Information modelling and
knowledge bases XXIV (Frontiers in artificial intelligence and applications, Vol.
251). IOS Press, Amsterdam, 2013, pp. 271-276.
[13] Welzer, T.; Jaakkola, H., Družovec, M., Hölbl, M., Brumen, B. Culture sensitive
aspects in industry, business and education. In Proceedings: 7th International
Conference on New Horizons in Industry, Business and Education, Chios Islands,
Greece, 2011, pp. 580-584.
[14] Jaakkola, H. Culture Sensitive Aspects in Software Engineering. In Book:
Conceptual Modelling and its Theoretical Foundations. Springer-Verlag, Berlin
Heidelberg, 2012, pp. 291-315.


Simulating Cultural Processes

Invited lecture:
Matjaz Perc
Faculty of Natural Sciences and Mathematics, University of
Maribor, Slovenia.
Culturomics of physics: Which words and phrases defined
the biggest breakthroughs of the 20th century?
The 20th century is often referred to as the century of physics.
From X-rays to the semiconductor industry, human society today would be very different
were it not for the progress made in physics laboratories around the world. The information
provided in the titles and abstracts of over half a million publications that were published by
the American Physical Society during the past 119 years can be used to quantify trends of
progress, and to identify the most influential trendsetters. By identifying all unique words and
phrases and determining their monthly usage patterns, one finds that the magnitudes of upward
and downward trends yield heavy-tailed distributions, and that their emergence can be attributed
to the Matthew effect. The data also confirm that periods of war decelerate scientific progress,
and that the latter is very much subject to globalization. These initial efforts suggest that it may
be interesting to extend the study to larger corpora of scientific literature, as well as to different
fields of research.


Preliminary Report on the Structure of
Croatian Linguistic Co-occurrence Networks

Domagoj Margan, Sanda Martinčić-Ipšić, Ana Meštrović
Department of Informatics
University of Rijeka
Radmile Matejcic 2, 51000 Rijeka, Croatia
{dmargan, smarti, amestrovic}@uniri.hr
Abstract. In this article, we investigate the structure of Croatian linguistic co-occurrence
networks. We examine the change of network structure properties by systematically varying the
co-occurrence window sizes, the corpus sizes and by removing stopwords. In a co-occurrence
window of size n we establish a link between the current word and the n − 1 subsequent words.
The results point out that an increase of the co-occurrence window size is followed by a decrease
in diameter, a shortening of the average path length and, expectedly, an increase of the average
clustering coefficient. The same can be noticed for the removal of stopwords. Finally, since the
size of the texts is reflected in the network properties, our results suggest that the corpus influence
can be reduced by increasing the co-occurrence window size.
Keywords. complex networks, linguistic co-occurrence networks, Croatian corpus, stopwords

1 Introduction

The complex networks sub-discipline tasked with the analysis of language has recently been
associated with the term linguistic network analysis. Text can be represented as a complex
network of linked words: each individual word is a node and interactions amongst words are
links. The interactions can be derived at different levels: structure, semantics, dependencies,
etc. Commonly they arise from a simple criterion such as the co-occurrence of two words within
a sentence or text.
The pioneering construction of linguistic networks was in 2001, when Ferrer i Cancho and
Solé [8] showed that the co-occurrence network from the British National Corpus has a small
average path length, a high clustering coefficient, and a two-regime power law degree
distribution; the network exhibits small-world and scale-free properties. Dorogovtsev and
Mendes [6] used complex networks to study language as a self-organizing network of
interacting words. The co-occurrence networks were constructed by linking two neighboring
words within a sentence. Masucci and Rodgers [10] investigated the network topology of
Orwell's 1984, focusing on the local properties, nearest neighbors and the clustering coefficient,
by linking the neighboring words. Pardo et al. [11] used the complex network clustering
coefficient as the measure of text summarization performance. The original and summarized
texts were preprocessed with stopword removal and lemmatization. For the network construction
they used a reversed window orientation, which caused a word to be connected to the previous
words with forward link directions. Caldeira et al. [4] examined the structure of the texts of
individual authors. After stopword elimination and lemmatization each sentence was added to the
network as a clique¹. Biemann et al. [2] compared networks where two neighboring words were
linked with networks where all the words co-occurring in a sentence were linked. From the
network properties they derived a quantifiable measure of generative language (n-gram artificial
language) regarding the semantics of natural language. Borge-Holthoefer [3] produced a
methodological and formal overview of complex networks from the language research
perspective. Liu and Cong [9] used complex network parameters for the classification
(hierarchical clustering) of 14 languages, where Croatian was amongst the 12 Slavic languages.
In this paper we construct linguistic co-occurrence networks from Croatian texts.
We examine the change of the network's structure properties by systematically varying the
co-occurrence window sizes, the corpus sizes and stopword removal. In a co-occurrence
window of size n we establish a link between the current word and the n − 1 subsequent words.
In Section 2 we define the network properties needed to accurately analyze small-world
and scale-free characteristics of co-occurrence networks, such as the diameter, average path
length and average clustering coefficient. In Section 3 we present the construction of 30
co-occurrence networks. The network measurements are in Section 4. In the final Section,
we elaborate on the obtained results and draw conclusions regarding future work.

2 The network structure analysis

In a network, $N$ is the number of nodes and $K$ is the number of links. In weighted
networks every link connecting two nodes has an associated weight $w \in \mathbb{R}_0^+$. The
co-occurrence window $m_n$ of size $n$ is defined as $n$ subsequent words from a text. The
number of network components is denoted by $\omega$.
For every two connected nodes $i$ and $j$, the number of links lying on the shortest path
between them is denoted by $d_{ij}$; therefore the average distance of a node $i$ from all other
nodes is:
$$ d_i = \frac{\sum_j d_{ij}}{N}. \quad (1) $$
The average path length between every two nodes $i$, $j$ is:
$$ L = \sum_{i,j} \frac{d_{ij}}{N(N-1)}. \quad (2) $$
The maximum distance results in the network diameter:
$$ D = \max_i d_i. \quad (3) $$
For weighted networks the clustering coefficient of a node $i$ is defined as the geometric
average of the subgraph link weights:
$$ c_i = \frac{1}{k_i(k_i-1)} \sum_{j,k} \left( \hat{w}_{ij}\, \hat{w}_{ik}\, \hat{w}_{jk} \right)^{1/3}, \quad (4) $$
where the link weights $\hat{w}_{ij}$ are normalized by the maximum weight in the network,
$\hat{w}_{ij} = w_{ij} / \max(w)$. The value of $c_i$ is set to 0 if $k_i < 2$.
The average clustering of a network is defined as the average value of the clustering
coefficients of all nodes in the network:
$$ C = \frac{1}{N} \sum_i c_i. \quad (5) $$
If $\omega > 1$, $C$ is computed for the largest network component.
An important property of complex networks is the degree distribution. For many real
networks this distribution follows a power law, which is defined as:
$$ P(k) \sim k^{-\gamma}. \quad (6) $$

¹ A clique in an undirected network is a subset of its nodes such that every two nodes in the subset are linked.
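Since the paper states (Section 3.2) that NetworkX was used for network analysis, a minimal
sketch of how the measures defined above could be computed on a toy weighted network might
look as follows (illustrative data only, not the authors' code):

    import networkx as nx

    # Toy weighted, undirected co-occurrence graph (illustrative data only).
    G = nx.Graph()
    G.add_weighted_edges_from([("riba", "pliva", 2.0), ("pliva", "brzo", 1.0),
                               ("riba", "brzo", 1.0), ("brzo", "trci", 3.0)])

    # Work on the largest component, as in Eq. (5) when omega > 1.
    giant = G.subgraph(max(nx.connected_components(G), key=len))

    L = nx.average_shortest_path_length(giant)          # Eq. (2), unweighted paths
    D = nx.diameter(giant)                               # Eq. (3)
    C = nx.average_clustering(giant, weight="weight")    # Eqs. (4)-(5), geometric-mean weights
    print(L, D, C)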

3 Network construction

3.1 Data

For the construction and analysis of co-occurrence networks, we used a corpus of literature
containing 10 books written in or translated into the Croatian language. For the experiments we
divided the corpus into three parts: C1 - one book, C2 - four books and C3 - ten books, where
C1 ⊂ C2 ⊂ C3, as shown in Table 1.
Stopwords are a list of the most common, short function words which do not carry strong
semantic properties, but are needed for the syntax of language (pronouns, prepositions,
conjunctions, abbreviations, interjections, ...). The Croatian stopword list contains 2,923 words
in their inflected forms. Examples of stopwords are: is, but, and, which, on, any, some.
Corpus part           C1       C2        C3
# of words            28671    252328    895547
# of unique words     9159     40221     91018
# of stopwords        371      588       629

Table 1: The statistics for the corpus of 10 books

3.2 The construction of co-occurrence networks

We constructed 30 different co-occurrence networks, weighted and directed, from the
corpus in Table 1. Words are nodes, and they are linked if they are in the same sentence
according to the size of the co-occurrence window. The co-occurrence window mn of size
n is defined as a set of n subsequent words from a text. Within a window the links are
established between the first word and the n − 1 subsequent words. During the construction
we considered the sentence boundary as the window boundary too. Three steps in the
network construction for a sentence of 5 words, and co-occurrence window sizes n = 2..5,
are shown in Fig. 1.
The weight of the link between two nodes is proportional to the overall co-occurrence
frequencies of the corresponding words within a co-occurrence window.

Figure 1: An illustration of 3 steps in a network construction with a co-occurrence window
mn of sizes n = 2...5. w1...w5 are words within a sentence.

For all three parts of the corpus C1, C2, C3, we examined the properties of co-occurrence
networks constructed with various mn, n = 2, 3, 4, 5, 6. Besides the 5 window sizes for
co-occurrence networks, we also differentiate upon the criterion of the inclusion or exclusion
of stopwords.
Network construction and analysis was implemented in the Python programming language
using the NetworkX software package, developed for the creation, manipulation and study of
the structure, dynamics and functions of complex networks [7]. Numerical analysis and
visualization of power law distributions was done with the powerlaw software package [1] for
the Python programming language.
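A minimal sketch of the window-based construction described above (assuming NetworkX; the
function and example sentences are illustrative, not the authors' code):

    import networkx as nx

    def cooccurrence_network(sentences, n=2):
        """Build a weighted, directed co-occurrence network: within each window of
        size n, link the first word to the n-1 subsequent words (sentence-bounded)."""
        G = nx.DiGraph()
        for sentence in sentences:
            words = sentence.lower().split()
            for i, w in enumerate(words):
                for v in words[i + 1:i + n]:
                    weight = G[w][v]["weight"] + 1 if G.has_edge(w, v) else 1
                    G.add_edge(w, v, weight=weight)
        return G

    # Toy example with two "sentences".
    G = cooccurrence_network(["riba pliva brzo", "riba brzo pliva"], n=3)
    print(G.number_of_nodes(), G.number_of_edges())
    print(G["riba"]["pliva"]["weight"])  # co-occurrence frequency within the window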

4 Results
        m2      m3      m4      m5      m6
Nsw     9530    9530    9530    9530    9530
N       9159    9159    9159    9159    9159
Ksw     22305   43894   64161   83192   101104
K       14627   28494   41472   53596   64840
Lsw     3.59    2.92    2.70    2.55    2.45
L       6.42    4.73    4.12    3.79    3.58
Dsw     16      9       7       6       6
D       26      15      11      10      8
Csw     0.15    0.55    0.63    0.66    0.68
C       0.01    0.47    0.56    0.60    0.64
ωsw     5       5       5       5       5
ω       15      15      15      15      15

Table 2: Networks constructed from C1. Measures noted with the sw subscript are results
with stopwords included.
The comparisons of the properties of networks differing in the co-occurrence window size
are shown in Tables 2, 3 and 4. Clearly, the results show that the networks constructed with
a larger co-occurrence window emphasize small-world properties. More precisely, the values
of the average path length and network diameter decrease proportionally to the increase of
the co-occurrence window size. Likewise, the average clustering coefficient becomes larger
in accordance with the increment of mn.
becomes larger in accordance with the increment of mn .

        m2      m3      m4      m5      m6
Nsw     40809   40809   40809   40809   40809
N       40221   40221   40221   40221   40221
Ksw     156857  307633  445812  572463  688484
K       108449  207437  296233  375535  446547
Lsw     3.25    2.81    2.64    2.52    2.43
L       4.69    3.86    3.54    3.35    3.23
Dsw     18      12      8       7       6
D       24      14      11      9       9
Csw     0.25    0.58    0.65    0.68    0.70
C       0.02    0.43    0.52    0.56    0.59
ωsw     9       9       9       9       9
ω       33      33      33      33      33

Table 3: Networks constructed from C2. Measures noted with the sw subscript are results
with stopwords included.

        m2      m3      m4      m5      m6
Nsw     91647   91647   91647   91647   91647
N       91018   91018   91018   91018   91018
Ksw     464029  911277  1315888 1680848 2009187
K       360653  684008  963078  1202869 1409599
Lsw     3.10    2.74    2.58    2.47    2.38
L       4.17    3.55    3.30    3.16    3.08
Dsw     23      13      9       7       7
D       34      19      14      11      9
Csw     0.32    0.61    0.67    0.69    0.71
C       0.03    0.42    0.51    0.55    0.58
ωsw     22      22      22      22      22
ω       64      64      64      64      64

Table 4: Networks constructed from C3. Measures noted with the sw subscript are results
with stopwords included.
In Tables 2, 3 and 4 we also compare the characteristics of the networks with and without
stopword removal. In addition to the proportional strengthening of small-world properties
with the increase of mn, the same phenomenon appears with the inclusion of stopwords
in the process of building the network. All of the networks show smaller network distance
measures and a greater clustering coefficient with the stopwords included.

Furthermore, stopwords have an impact on the average clustering coefficient in such a way
that increasing the corpus size with stopwords included results in a higher clustering
coefficient, while increasing the corpus size with stopwords excluded results in a lower
clustering coefficient (Fig. 2). This may be explained by the high impact of stopwords as
the main hubs. Table 5 shows that stopwords are much stronger hubs than the hubs which
emerge after the exclusion of stopwords.
                 SW included                                SW excluded
       m2                    m6                    m2                     m6
word          degree   word           degree  word            degree  word             degree
i (and)       29762    i (and)        67890   kad (when)       4260   kad (when)       14921
je (is)       13924    je (is)        53484   rekao (said)     2036   rekao (said)      5755
u (in)        13116    se (self)      42563   sad (now)        1494   jedan (one)       5142
se (self)     11033    u (in)         41188   reče (said)      1319   sad (now)         5062
na (on)        9084    da (yes, that) 35632   jedan (one)      1318   ljudi (people)    4836
da (yes)       8103    na (on)        29417   ima (has)        1281   dana (day)        4679
a (but)        6637    su (are)       22366   ljudi (people)   1264   ima (has)         4406
kao (as)       5452    a (but)        21919   dobro (good)     1119   reče (said)       4178
od (from)      4773    kao (as)       18141   dana (day)        998   dobro (good)      3964
za (for)       4708    ne (no)        16211   reći (say)        968   čovjek (human)    3496

Table 5: Top ten hubs in networks constructed from C3.

Figure 2: The impact of stopwords on the average clustering coefficient in accordance with the
various sizes of the corpus parts. Csw (from networks constructed with stopwords included) is
represented by solid lines, while C (from networks constructed with stopwords excluded) is
represented by dashed lines. (a) m3 networks, (b) m6 networks.
Numerical results of the power law distribution analysis indicate the presence of a power law
distribution. The visualization of the power law distribution for 4 networks created from C3 is
shown in Fig. 3. We found that networks constructed with stopwords included generally
represent a good power law fit starting from the optimal xmin. The numeric values of γ for the
power law distributions shown in Fig. 3 are respectively: 2.167, 2.172, 2.339, 2.040. The
networks with stopwords included have a better power law fit.

Figure 3: Comparison of plots. Probability density function (p(X), lower line) and complementary
cumulative distribution function (p(X ≥ x), upper line) of node degrees from networks constructed
from C3: (a) m2, stopwords included, (b) m6, stopwords included, (c) m2, stopwords excluded,
(d) m6, stopwords excluded.
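As an illustration of the power law analysis mentioned above, a small sketch using the powerlaw
package [1] (with toy degree values, not the actual corpus data) could look like this:

    import powerlaw

    # Illustrative only: fit a power law to a degree sequence (toy values here);
    # degrees would come from the co-occurrence networks described above.
    degrees = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 12, 20, 35, 80]

    fit = powerlaw.Fit(degrees, discrete=True)   # estimates xmin and the exponent gamma
    print(fit.power_law.alpha, fit.power_law.xmin)

    # Compare the power law against an exponential alternative.
    R, p = fit.distribution_compare('power_law', 'exponential')
    print(R, p)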

5 Conclusion

In this work we have presented multiple metrics of complex networks constructed as
co-occurrence networks from the Croatian language. Since the sensitivity of linguistic network
parameters to the corpus size and stopwords [4, 5] is a known problem in the construction of
linguistic networks, we analyzed the Croatian co-occurrence network. We presented the results
of 30 networks constructed with the aim of examining variations among: corpus size, stopword
removal and the size of the co-occurrence window.
The results in Tables 2, 3 and 4 point out that the increase of the co-occurrence window size is
followed by a decrease of the diameter D, a shortening of the average path length L and,
expectedly, an increase of the average clustering coefficient C. It is worth noticing that the
increased window size contributed to the results in the same way as the increase of the used
quantity of text did, suggesting emphasized small-world properties. The larger size of the
co-occurrence window plays a key role in the strengthening of the small-world properties of the
networks. This observation should be considered in detail in future work.
Furthermore, the inclusion of stopwords in the process of network construction causes the same
effect. It is evident from Table 5 that stopwords, although they have no strong semantic
properties, act as hubs, which can be cumbersome for semantic text analysis. The inclusion of
stopwords in co-occurrence networks also seems to contribute to a better power law fit,
regardless of the co-occurrence window size. We point out the varying behaviour (dynamics) of
the clustering coefficient with increasing corpus size. According to our results, it depends on the
presence of stopwords in the corpus: increasing the corpus size with stopwords included
increases the value of C, while increasing the corpus size with stopwords excluded decreases
the value of C.
Finally, since the size of the texts is reflected in the network properties, our results suggest that
the influence of the corpus can be reduced by increasing the co-occurrence window size. This
paper is a preliminary study of the Croatian linguistic network, and more detailed research
should be performed in the future. Firstly, the results should be tested on a larger corpus, and
the power law and scale-free properties proven. Additionally, research towards extracting
network semantics is a new and thrilling branch of our pursuit.

References
[1] Jeff Alstott, Ed Bullmore, and Dietmar Plenz. powerlaw: a Python package for analysis of
heavy-tailed distributions. arXiv preprint arXiv:1305.0215, 2013.
[2] Chris Biemann, Stefanie Roos, and Karsten Weihe. Quantifying semantics using complex
network analysis. In COLING, pages 263-278, 2012.
[3] Javier Borge-Holthoefer and Alex Arenas. Semantic networks: Structure and dynamics.
Entropy, 12(5):1264-1302, 2010.
[4] Silvia M.G. Caldeira, T.C. Petit Lobão, Roberto Fernandes Silva Andrade, Alexis Neme,
and J.G. Vivas Miranda. The network of concepts in written texts. The European Physical
Journal B - Condensed Matter and Complex Systems, 49(4):523-529, 2006.
[5] Monojit Choudhury, Diptesh Chatterjee, and Animesh Mukherjee. Global topology of word
co-occurrence networks: Beyond the two-regime power-law. In Proceedings of the 23rd
International Conference on Computational Linguistics: Posters, pages 162-170. Association
for Computational Linguistics, 2010.
[6] Sergey N. Dorogovtsev and José Fernando F. Mendes. Language as an evolving word web.
Proceedings of the Royal Society of London. Series B: Biological Sciences,
268(1485):2603-2606, 2001.
[7] Aric Hagberg, Pieter Swart, and Daniel S. Chult. Exploring network structure, dynamics,
and function using NetworkX. Technical report, Los Alamos National Laboratory (LANL), 2008.
[8] Ramon Ferrer i Cancho and Richard V. Solé. The small world of human language.
Proceedings of the Royal Society of London. Series B: Biological Sciences,
268(1482):2261-2265, 2001.
[9] HaiTao Liu and Jin Cong. Language clustering with word co-occurrence networks based on
parallel texts. Chinese Science Bulletin, 58(10):1139-1144, 2013.
[10] A.P. Masucci and G.J. Rodgers. Network properties of written human language. Physical
Review E, 74(2):026102, 2006.
[11] Thiago Alexandre Salgueiro Pardo, Lucas Antiqueira, M. das Graças Nunes, O.N. Oliveira,
and Luciano da Fontoura Costa. Using complex networks for language processing: The case of
summary evaluation. In Communications, Circuits and Systems Proceedings, 2006 International
Conference on, volume 4, pages 2678-2682. IEEE, 2006.

Initial Comparison of Linguistic Network Measures for Parallel Texts

Kristina Ban, Ana Meštrović, Sanda Martinčić-Ipšić
Department of Informatics
University of Rijeka
Radmile Matejčić 2, 51000 Rijeka, Croatia
kristina.ban89@gmail.com, amestrovic@uniri.hr, smarti@uniri.hr

Abstract: In this paper we compare the properties of linguistic networks for the Croatian,
English and Italian languages. We constructed co-occurrence networks from parallel text
corpora, consisting of the translations of five books into the three languages. We generated an
Erdős-Rényi random graph with the same number of nodes and links, which enabled the
comparison with the linguistic co-occurrence networks, showing small-world properties.
Furthermore, the comparison of the Croatian, English and Italian linguistic networks showed
that, besides the expected commonalities of the networks, there are also certain differences. The
network measures across the three studied languages differ particularly in the shortest path
length. The results indicate that the size of the corpus and anomalies in the text affect the
network structure.
Key words: linguistic networks, co-occurrence networks, small-world, parallel texts

1. Introduction
Network analysis nowadays exhibits a growing popularity because it provides a way to
analyse real complex systems. Language is an example of a complex system and in the last
decade it has been the subject of many network-based studies, highlighting the field of
linguistic networks. Various linguistic networks can be analysed, such as syntax networks [2-4],
semantic networks [5], phonological networks [6-8], syllable networks [14-15] and word
co-occurrence networks [1,8].
The focus of the research in linguistic networks has shifted from one language to multiple
languages. The work in [10] examines structural differences in Chinese and English by
comparing the intensity and density of the connections in networks. In [9] the network
properties of the English and German Wikipedia are compared. The paper by Liu and Jin [11]
studied language networks on multilingual parallel texts of 15 languages, one of the 12 Slavic
languages being Croatian. Network parameters were used for the hierarchical classification of
the languages.
Besides multiple language studies (language differentiation and classification), the research
community's attention is also focused on the detection of literary genre or author, based on the
analysis of complex networks. The authors in [21] examine the correlation between the network
properties and author characteristics in terms of the clustering coefficient, in- and out-degree,
degree distribution and component dynamics. The corpus used included over 40 books by eight
authors in English. The work [16] investigates the properties of the writing style of five Persian
authors in 36 books. The network-derived measures, degree distribution and power law
exponent, were used for authorship identification.


Our research is an initial attempt at the analysis of parallel corpora of Croatian, Italian and
English literature. We examined the comparative network properties of the three languages in
terms of language and book differentiation. The parallel nature of the corpus, consisting of the
translations of five books into three languages, gives the opportunity to compare network
properties across languages and to check the translation consistency on the book level.
Section 2 of the paper presents the key measures of complex networks. Section 3 discusses the
experimental setup and in Section 4 the results are shown. The paper concludes with a
discussion and further research plans.

2. Methodology
Every network is constructed of $N$ nodes and $K$ links. The degree $k_i$ of a node $i$ is the number
of connections that the node has. The average degree of the network is:
$$ \langle k \rangle = \frac{1}{N} \sum_i k_i. \quad (1) $$
For every two connected nodes $i$ and $j$ the number of links lying on the path between them is
denoted as $d_{ij}$; therefore the average distance of a node $i$ from all other nodes is:
$$ d_i = \frac{\sum_j d_{ij}}{N}. \quad (2) $$
The shortest path length $L$ is the average value of $d_i$ over all nodes:
$$ L = \frac{1}{N} \sum_i d_i, \quad (3) $$
and the maximum distance between two nodes in the network is the diameter $D$:
$$ D = \max_i d_i. \quad (4) $$
The clustering coefficient of a node is described as the probability of the presence of a link
between any two neighbours of the node. It is calculated as the ratio between the number of
links $E_i$ that actually exist amongst the $k_i$ neighbours of node $i$ and the total possible number of
such links:
$$ c_i = \frac{2E_i}{k_i(k_i-1)}. \quad (5) $$
The average clustering of a network $C$ is the average value of the clustering coefficients of all
the nodes:
$$ C = \frac{1}{N} \sum_i c_i. \quad (6) $$
One of the commonly examined properties of real-world networks is the small-world property
[1]. A network is a small-world network if its shortest path length is comparable to that of a
random graph, $L \approx L_{ER}$, and its clustering coefficient is much larger, $C \gg C_{ER}$, where
$L_{ER}$ is the shortest path length and $C_{ER}$ is the clustering coefficient of an Erdős-Rényi (ER)
random graph with the same number of nodes and links [22].
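A small sketch of this small-world check against an ER graph with the same number of nodes
and links (assuming NetworkX; the example graph is a stand-in, not the authors' corpus):

    import networkx as nx

    # Illustrative sketch: compare a co-occurrence-like graph with an Erdos-Renyi
    # random graph that has the same number of nodes and links (small-world check).
    G = nx.les_miserables_graph()          # stand-in for a book's co-occurrence network
    N, K = G.number_of_nodes(), G.number_of_edges()

    ER = nx.gnm_random_graph(N, K, seed=42)

    def stats(g):
        giant = g.subgraph(max(nx.connected_components(g), key=len))
        return nx.average_clustering(giant), nx.average_shortest_path_length(giant)

    C, L = stats(G)
    C_er, L_er = stats(ER)
    print(f"C={C:.3f} vs C_ER={C_er:.3f}, L={L:.2f} vs L_ER={L_er:.2f}")
    # Small-world behaviour: C >> C_ER while L stays comparable to L_ER.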

3. Experiments
3.1. Data
We prepared a twofold balanced corpus: parallel translations of five books in Croatian, Italian
and English. Each book was originally written in one of the languages and translated into the
other two. We took care that for each language at least one native book is present and that the
length of the books varies from short to long (Table 1).
Every text was cleared of the table of contents, the author's biography and page numbers.
Afterwards the corpus was tokenized; the punctuation marks, special characters and stopwords
were removed, and inflected word forms were lemmatized. For Croatian we used a stopword
list of 2922 words, for English 341 words and for Italian 371 words. Table 1 shows

the number of words with and without stopwords per book, depending on the language. For
Croatian we used the Croatian Lemmatization Server [20]; for Italian and English, TreeTagger
[19].
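For illustration, a minimal preprocessing sketch along these lines (assuming NLTK; the stopword
subset is illustrative only and lemmatization is omitted):

    from nltk.tokenize import word_tokenize

    # Illustrative preprocessing sketch (not the authors' pipeline): tokenize, drop
    # punctuation and stopwords; lemmatization (TreeTagger / Croatian Lemmatization
    # Server) is omitted here and only hinted at.
    stopwords_hr = {"i", "je", "u", "se", "na", "da", "a"}   # tiny illustrative subset

    def preprocess(text, stopwords):
        tokens = word_tokenize(text.lower())
        return [t for t in tokens if t.isalpha() and t not in stopwords]

    print(preprocess("Riba je plivala brzo, i to na suncu.", stopwords_hr))
    # ['riba', 'plivala', 'brzo', 'to', 'suncu']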
Table 1. The total number of words in the books with and without stopwords, by book and by
language. The Croatian books show a smaller number of words, but after the removal of the
stopwords the total number of words is higher than in the Italian and English books.

Language   Book     With stopwords   Without stopwords
English    B1-EN    47684            16372
           B2-EN    147537           56525
           B3-EN    27299            10120
           B4-EN    235245           89245
           B5-EN    204517           76476
Italian    B1-IT    48487            33657
           B2-IT    156325           115855
           B3-IT    25523            20136
           B4-IT    235207           183435
           B5-IT    213147           157878
Croatian   B1-HR    44433            18627
           B2-HR    125997           59293
           B3-HR    24507            10973
           B4-HR    217987           100308
           B5-HR    198188           90299

3.2. Networks construction from books


We constructed a co-occurrence network for each book: 15 directed and 15 undirected networks
from the cleaned corpus. Words are represented as nodes and linked if they appear as adjacent
words in the text. For the directed network two words are connected with an arc if one precedes
the other. The same applies for the undirected network, only the words are connected with an
undirected link. Additionally, we also generated an ER random graph with the same number of
nodes and links for each network.
We used the Python programming language with its module NLTK [20] for text processing,
the NetworkX module [12] for the construction and analysis of the networks, and the Gephi
software [13] for the manipulation and visualization of the networks.

4. Results
As shown in Tables 2 and 3, co-occurrence networks based on parallel texts share common
properties: a small shortest path length L and diameter D and a high clustering coefficient C in
comparison with their associated ER graphs. The ratio between the clustering coefficient of the
linguistic and the random networks varies from a minimum of about 29 to a maximum of about
148. The linguistic networks for all three languages thus have small-world properties. Another
shared property of the undirected networks is a higher C and a smaller L and D compared to the
same measures of the directed network of the same book. This means that the undirected
networks are closer to small-world networks, which is an expected result. However, there is one
exception in the results for book B5, the diameter of which increased in the undirected network.
Table 2. The results for the directed networks of five books in three languages: N - number of nodes,
<k> - average node degree, CDIR - clustering coefficient, LDIR - shortest path length, DDIR - diameter;
CER - clustering coefficient, LER - shortest path length and DER - diameter of the ER random graph.

Language   Book     N       <k>    CDIR    LDIR   DDIR   CER       LER    DER
English    B1-EN    2389    5.40   0.070   3.33   10     0.00228   4.60   15
           B2-EN    7322    6.50   0.054   3.56   13     0.00089   5.34   19
           B3-EN    1798    4.38   0.076   3.23   10     0.00247   4.53   14
           B4-EN    12126   5.87   0.072   3.52   12     0.00049   5.93   20
           B5-EN    10027   6.38   0.051   3.64   14     0.00064   5.59   20
Italian    B1-IT    3858    4.28   0.052   3.51   13     0.00111   5.06   17
           B2-IT    9120    6.45   0.044   3.64   13     0.00071   5.51   21
           B3-IT    2269    4.30   0.068   3.32   10     0.00191   4.65   14
           B4-IT    14009   6.34   0.047   3.62   14     0.00045   5.95   22
           B5-IT    13403   5.86   0.044   3.65   14     0.00044   6.01   20
Croatian   B1-HR    4155    3.74   0.047   3.65   12     0.00090   5.09   16
           B2-HR    12610   4.23   0.034   3.92   13     0.00033   5.93   21
           B3-HR    2970    3.23   0.049   3.51   11     0.00110   4.69   15
           B4-HR    15256   5.40   0.051   3.74   13     0.00036   6.20   20
           B5-HR    15985   4.91   0.038   3.87   14     0.00031   6.25   21

Table 3. The results for the undirected networks of five books in three languages: N - number of nodes,
<k> - average node degree, CUNDIR - clustering coefficient, LUNDIR - shortest path length, DUNDIR -
diameter; CER - clustering coefficient, LER - shortest path length and DER - diameter of the ER random
network.

Language   Book     N       <k>     CUNDIR   LUNDIR   DUNDIR   CER     LER    DER
English    B1-EN    2389    10.8    0.145    3.32     8        0.005   3.52   6
           B2-EN    7322    13      0.109    3.36     8        0.002   3.74   6
           B3-EN    1798    8.76    0.155    3.30     8        0.004   3.67   6
           B4-EN    12126   11.74   0.144    3.52     8        0.001   4.07   7
           B5-EN    10027   12.76   0.103    3.60     23       0.001   4.00   7
Italian    B1-IT    3858    8.56    0.108    3.45     9        0.003   4.08   7
           B2-IT    9120    12.9    0.088    3.35     11       0.001   3.83   6
           B3-IT    2269    8.6     0.137    3.29     9        0.004   3.82   7
           B4-IT    14009   12.68   0.096    3.42     9        0.001   4.02   6
           B5-IT    13403   11.72   0.088    3.60     19       0.001   4.12   7
Croatian   B1-HR    4155    7.48    0.099    3.58     10       0.002   4.36   8
           B2-HR    12610   8.46    0.069    3.67     11       0.001   4.67   8
           B3-HR    2970    6.46    0.098    3.54     9        0.003   4.47   8
           B4-HR    15256   10.8    0.103    3.49     10       0.001   4.31   7
           B5-HR    15985   9.82    0.077    3.77     22       0.001   4.49   8

Further analysis of B5 revealed a proportion of Latin and German text, where Latin names were
inflected in Croatian and subsequently not lemmatized, which caused additional anomalies in
the results. The English lemmatizer failed due to the same problem.
Table 4 presents the network measures for B5 after the removal of the Latin and German words.
Compared to the initial B5 results from Tables 2 and 3, DDIR and DUNDIR have decreased, as
expected. The undirected network changed more than the directed one. The results suggest that
the Latin and German parts of the book created loops which caused CDIR to decrease. At the
same time B5 in Italian behaves differently due to the close nature of Italian and Latin, which
was partially captured during lemmatization.
Table 4. The new values for the directed and undirected networks of B5 by language.

Directed:
Book     N        <k>      CDIR    LDIR   DDIR
B5-EN    9355     6.754    0.054   3.59   13
B5-IT    10674    6.739    0.051   3.53   13
B5-HR    12817    5.463    0.042   3.82   14

Undirected:
Book     N        <k>      CUNDIR   LUNDIR   DUNDIR
B5-EN    9355     62754    0.108    3.42     17
B5-IT    10674    6.739    0.103    3.43     15
B5-HR    12817    5.463    0.085    3.54     15

The differences across languages are presented in Fig. 1: in general, English has a higher
clustering coefficient than Croatian.

Figure 1. Values of the average degree and clustering coefficient for 15 directed and 15 ER
random networks, grouped by language.

The shortest path lengths L are the highest for Croatian, in the middle for Italian and the lowest
for the English language networks, as shown in Fig. 2. Similar results are presented in [12],
where it is shown that the Croatian language has larger values of L and D but a C twice as small
as that of English. According to the graphs shown in Fig. 2, the shortest path length seems to be
more influenced by the language than the diameter. D depends on the book size, but it is also
sensitive to potential anomalies in the book's language, as was previously shown for book B5.

Figure 2. The first row shows the ratio between the diameters of the books by language for
directed and undirected networks. The second row shows the differentiation by the shortest
path length.

5. Conclusion
In this paper we have examined linguistic networks for the Croatian, English and Italian languages. The measures of 30 directed and undirected co-occurrence networks for five books in three languages have been compared.
It has been shown that for all three languages the co-occurrence networks share small-world properties and corpus sensitivity. Corpus size and possible anomalies in the text have an impact on the network structure in all three languages. An anomaly such as the introduction of another language causes the diameter of the undirected network to become much higher than the diameter of the directed network, as shown in the case of book B5. In addition, the results show the expected differences between the measures of directed and undirected networks for all three languages.
Further examination shows that the network measures differ across languages: the clustering coefficients of the English and Italian books are closer to each other than to those of the Croatian books. The Croatian language exhibits a higher path length in both directed and undirected networks, which may be caused by its relatively free word order. The word order of English is stricter than that of Italian, which is reflected in the directed networks in Fig. 2. The Croatian language also has the smallest clustering coefficient, which can indicate a richer morphology. This result is partly sensitive to the degraded lemmatization of Croatian, which is likewise grounded in its complex morphology.
Finally, the shortest path length and the clustering coefficient show language differentiation potential and should be analysed on larger corpora to test whether they can be used as language classifiers. On the other hand, the diameter is more related to the individual books, which implies that it could be used as a measure of the author's vocabulary or verbosity. In further work, all results should be tested on larger corpora in more languages in order to classify authorship or book genres from network parameters.

6. References
[1] R. Ferrer i Cancho and R. V. Solé, The small-world of human language, Proceedings of The
Royal Society of London, Series B(268), pages 2261-2265, 2001.
[2] R. V. Solé, B. Corominas-Murtra, S. Valverde, L. Steels, Language Networks: their structure,
function and evolution, Trends in Cognitive Sciences, 2005.
[3] R. Ferrer i Cancho, R. V. Solé and R. Köhler, Patterns in syntactic dependency networks,
Physical Review E 69, 051915, 2004.
[4] H. Liu and C. Hu, Can syntactic networks indicate morphological complexity of a
language, EPL 93, 28005, 2011.
[5] J. Borge-Holthoefer and A.Arenas, Semantic Networks: Structure and Dynamics, Entropy,
12, 1264-1302, 2010.
[6] S. Arbesman, S.H. Strogatz, M.S.Vitevitch, Comparative Analysis of Networks of
Phonologically Similar Words in English and Spanish, Entropy, 12, pages 327-337,
2010.
[7] S. Arbesman, S.H. Strogatz, M.S.Vitevitch,The Structure of Phonological Networks
across Multiple Languages, International Journal of Bifurcation and Chaos, 20(2):679-685, 2009.
[8] A.P. Masucci and G.J. Rodgers, Network properties of written human language, Physical
Review E, 74.026102, 2006.
[9] F. C. Pembe and H. Bingol, Complex Networks in Different Languages: A Study of an
Emergent Multilingual Encyclopedia, Proceedings of the Sixth International Conference
on Complex Systems, 3, pages 612-617, 2008.
[10] L. Sheng and C. Li, English and Chinese language as weighted networks, Physica A,
388:2561-2570, 2009.
[11] H. Liu and C. Jin, Language Clustering with Word Co-occurrence Networks Based on Parallel
Texts, Chin. Sci. Bull., 58(10), pages 1139-1144, 2013.
[12] A. A. Hagberg, D. A. Schult and P. J. Swart, Exploring network structure, dynamics, and
function using NetworkX, in Proceedings of the 7th Python in Science Conference
(SciPy2008), Gaël Varoquaux, Travis Vaught, and Jarrod Millman (Eds), (Pasadena,
CA USA), pages 11-15, 2008.
[13] M. Bastian, S. Heymann, and M. Jacomy, Gephi: An open source software for exploring
and manipulating networks, 2009.
[14] M. Medeiros Soares, G.Corso, L. S. Lucena, The Network of syllables in Portuguese,
Physica A, 355(2-4): 678-684, 2005.
[15] K. Ban, I. Ivakić and A. Meštrović, A preliminary study of Croatian Language Syllable
Networks, MIPRO SP, pages 1697-1701, 2013.
[16] A. Mehri, A. H. Darooneh and A. Shariati, The complex networks approach for
authorship attribution of books, Physica A, 391(7):2429-2437, 2012.


[17] S. Bird, E. Klein and E. Loper, Natural Language Processing with Python, O'Reilly
Media, 2009.
[18] M. Bastian, S. Heymann, and M. Jacomy, Gephi: An open source software for exploring
and manipulating networks, ICWSM, The AAAI Press, 2009.
[19] The Center for Information and Language Processing, TreeTagger - a language
independent part-of-speech tagger, http://www.ims.unistuttgart.de/projekte/corplex/TreeTagger/, downloaded: June, 2013.
[20] M. Tadic and S. Fulgosi, Building the Croatian Morphological Lexicon, Proceedings of
the EACL2003, pages :41-46, 2003.
[21] L. Antiqueira, T. A. S. Pardo, M. das G. V. Nunes and O. N. Oliveira Jr., Inteligencia
Artificial, 11(36), pages 51-58, 2007.
[22] P. Erdős, A. Rényi, On the evolution of random graphs, Publ. Math. Inst. Hung. Acad.
Sci. 5 (1960) 17-60.


An Overview of Prosodic Modelling for Croatian Speech Synthesis
Lucia Načinović Prskalo, Sanda Martinčić-Ipšić
Department of Informatics
University of Rijeka
Radmile Matejčić 2, 51000 Rijeka, Croatia
{lnacinovic, smarti}@inf.uniri.hr

Abstract: In order to include prosody in text-to-speech (TTS) systems, prosodic knowledge needs to be acquired, represented and incorporated. The two main prosodic features important for modelling prosody in TTS systems are duration and the F0 contour. The various approaches to modelling these features can be categorized into three main groups: rule based, statistical and minimalistic. Some of the best known approaches to duration modelling are Klatt's model, classification and regression trees (CARTs) and neural networks, while the best known F0 models are ToBI, Fujisaki and Tilt. A procedure for automatic intonation event detection in Croatian texts based on the Tilt model was evaluated in terms of root mean square error values for the generated F0 contours.
Key Words: prosody modelling, speech synthesis, TTS, duration models, F0 contour models, prosodic characteristics of Croatian

1 Introduction
The main task of speech synthesis is the generation of a voice signal, understandable to the listener, from text input. This implies that the synthesized speech should sound natural and possess the prosodic characteristics of natural human speech. Speech conveys a wide range of information through duration, intonation, emphasis, grouping of words into phrases, voice quality, rhythm, etc., and these features are collectively referred to as prosody. Prosody plays a great role in intelligibility, and especially in the naturalness of synthesized speech.
Humans acquire the ability to use prosodic knowledge naturally, but this knowledge is difficult to articulate. For a machine to synthesize speech from text, the prosodic knowledge needs to be acquired, represented and incorporated, so predicting prosodic patterns directly from text is not an easy task [1]. There are, however, different models and algorithms that attempt to predict prosodic elements from text. These range from models based on a set of rules to data-driven models, such as classification and regression trees (CARTs) [2] and Hidden Markov models [3]. Besides the models that fall into one of the basic categories, there are models that use additional methodology (JEMA: joint feature extraction and modelling) [4] or combine the rule-based approach with the data-driven approach [5].
This paper discusses the prosodic analysis component of TTS systems. A procedure for automatic intonation event detection on Croatian texts is evaluated with root mean square error values for F0 contours generated using the Tilt model.
The paper is organized as follows. In the second section basic concepts of prosody in
TTS systems are described. In the third section rule based, statistical and minimalistic
approaches to prosody acquiring are outlined. Duration models are presented in the
fourth section and F0 contour models in the fifth section. The sixth section outlines the basic prosodic characteristics of Croatian, and the seventh outlines related work for languages cognate to Croatian. A procedure for automatic intonation event detection
for Croatian speech synthesis is presented in the eighth section. The paper concludes
with our plans for future work for prosody modelling for Croatian TTS.

2 Prosody in TTS: basic concepts and definitions


Prosody is a complex combination of phonetic factors whose task is to express attitude and assumptions and to draw attention, acting as a parallel channel of communication in our daily speech [1].
The semantic content transmitted via a voice or text message is also called denotation, while the emotional aspects and the effects of intent that the speaker wishes to convey are the part of the message called connotation. Prosody plays an important role in the transmission of denotation, and a major role in the transmission of connotation (the speaker's attitude toward the message, toward the listener and toward the overall communication event) [1].
Prosody represents the acoustic properties of speech that transmit information not conveyed by the word meaning, such as emotions, discourse features and syntax [6].
The two prosodic features considered most important for the quality of synthesized speech are duration and the F0 contour.
Duration refers to the duration of all speech units: paragraphs, sentences, intonation units, spoken words, syllables and phonemes. However, for TTS the duration of phonetic segments rather than the duration of words and syllables is used [7] [8]. One of the reasons for this is that the pauses (boundaries) between segments, which are among the most important prosodic features, can be determined automatically relatively easily.
Research regarding duration at the level of syllables and phonemes has mostly focused on the duration of syllables in read speech [9] [10]. It has been shown that the duration of vowels depends on many factors, including the articulatory context (the phonemes before and after the vowel), accent (both word accent and sentence accent) and position (the position of the syllable in the word and in the speech unit).
The fundamental frequency (F0) contour is determined by many factors such as segmental factors (microintonation), patterns of stress, melody, rhythm, gender, attitude, and the physical and emotional state of the speaker. The two main approaches to intonation acquisition are phonological models and phonetic (parameter based) models.

3 Basic approaches to prosody acquiring


Three main approaches to prosody acquiring have been distinguished so far: rule based
approach, statistical approach and minimalistic approach.

3.1 Rule based


The rule based approach to implementing prosody in synthesized speech uses written rules to predict prosodic characteristics from text. One of the best known rule based approaches to duration modelling is Klatt's MITalk system [11]. For F0 contour modelling the best known rule based model is Pierrehumbert's system [12], in which the contour is described as a series of target values connected together by transition rules. The target values are expressed as locations within the current pitch range. Which syllables within the phrase are assigned a target depends on the stress pattern. For example, in a declarative neutral intonation, all pitch accents are high (H); when the phrase is terminal, the phrase-final tones are low-low (L-L), and if it is nonterminal, they are low-high (L-H).

3.2 Statistical
The statistical model is trained on labelled data. Hand-labelled prosodic features are used for parameter estimation. The parameters represent the probability of prosodic events in the context of different linguistic features, and the model is then used to predict the most likely prosodic labels for any input text.
One of the methods used in the statistical approach is decision trees (CART - classification and regression trees) [13] [14] [15]. A list of possible features must be determined, and the system automatically selects the features with the greatest predictive ability.
Hidden Markov Models (HMMs) are another method that can be used to predict prosodic events. In [3] HMMs are used to predict phrase boundaries, and the model is trained on information about the word type and the preceding predicted boundary. This approach requires a large amount of data for model training.

3.3 Minimalistic approach


In the minimalistic approach, large natural language corpora are used to train prosodic models and as a source of the units needed for concatenative synthesis. There are several instances of each unit (most often diphones) with different characteristics in different phonetic and prosodic environments. One of the first systems that used the unit selection approach in speech synthesis was CHATR, a generic speech synthesis system [16].

4 Approaches to duration modelling


As mentioned before, one of the two most important prosodic features in speech synthesis is duration. There are different approaches to duration modelling, and some of them are described in this section.

4.1 Klatt's duration model


This model was developed in the 1970s and 1980s and is an integral part of the MITalk formant speech synthesizer [11]. It is composed of sequential rules that include phonetic environment features, accents, shortening and lengthening of syllables at certain positions, etc. The basic assumption in Klatt's model is that each segment has its inherent duration; each rule increases or decreases the duration of the segment by a certain percentage, and the duration of a segment cannot be decreased below a minimal length.

4.2. CARTs
Some of the features that can be included in duration modelling with classification and regression trees are the phoneme identity, the identity of the phoneme to the left, the identity of the phoneme to the right, etc. There are different programs for training CARTs; one of them is, for example, the Wagon procedure in the Festival Speech Synthesis System [17].
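As a rough, hedged illustration of this idea (using scikit-learn's generic regression tree rather than the Festival/Wagon implementation), the sketch below predicts phone durations from context features; the feature names and the tiny training set are invented for the example.

# Hedged sketch of CART-style duration prediction; data and features are invented.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeRegressor

# Each row describes one phone and its measured duration in seconds.
X_raw = [
    {"phone": "a", "prev": "k", "next": "t", "stressed": 1},
    {"phone": "a", "prev": "t", "next": "#", "stressed": 0},
    {"phone": "t", "prev": "a", "next": "a", "stressed": 0},
    {"phone": "k", "prev": "#", "next": "a", "stressed": 1},
]
y = [0.11, 0.08, 0.05, 0.06]                # durations (s), invented

vec = DictVectorizer(sparse=False)          # one-hot encode the categorical context features
X = vec.fit_transform(X_raw)

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)

# Predict the duration of an unseen phone in context.
query = vec.transform([{"phone": "a", "prev": "k", "next": "#", "stressed": 1}])
print(tree.predict(query))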

4.3. Neural networks


Neural networks can also be used in duration modelling [18]. The model first predicts the duration of the syllable and then refines it with the phoneme durations. For each syllable, a vector consisting of information about the number of phonemes in the syllable, accent, part-of-speech tag, etc. is computed.

5 Approaches to F0 modelling
Phonological approaches to the prosodic analysis of speech use a set of abstract phonological categories (tones, breaks, etc.) to describe the F0 contour, and each category has its own linguistic function. An example of this approach is the ToBI intonation model [19].
Parameter based approaches attempt to describe the F0 contour using a set of continuous parameters. Such approaches are, for example, the Tilt intonation model [20] and the Fujisaki model [21].

5.1 ToBI
ToBI (Tones and Break Indices) [19] takes a linguistic or phonological approach, specifying a small set of discrete labels which identify the intonational space of accents and tones. It is used for transcribing accents and phrasing (the grouping of words). ToBI distinguishes two pitch accents, H* and L*, and four main boundary tones: L-L%, L-H%, H-H% and H-L%. One pitch accent is associated with each accented word and one boundary tone is associated with the end of each prosodic phrase.

5.2 Fujisaki
The Fujisaki model [21] describes the F0 contour as a superposition of two contributions: a phrase component and an accent component. The phrase component models the baseline and the accent component models micro-prosodic variations. The F0 contour is generated as the superposition of the outputs of two second order linear filters added to a base frequency value; the two filters generate the phrase and accent components. The base frequency is the minimum frequency value of the speaker.
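As a rough illustration (not part of the original paper), the sketch below generates an F0 contour from the standard Fujisaki formulation, with a phrase control function Gp(t) = alpha^2 * t * exp(-alpha*t) and a clipped accent control function; all command times, amplitudes and filter constants are invented.

# Hedged sketch of the Fujisaki superposition model; parameter values are invented.
import numpy as np

def phrase_ctrl(t, alpha=2.0):
    """Impulse response of the phrase-command filter."""
    return np.where(t >= 0, alpha**2 * t * np.exp(-alpha * t), 0.0)

def accent_ctrl(t, beta=20.0, gamma=0.9):
    """Step response of the accent-command filter, clipped at gamma."""
    g = 1.0 - (1.0 + beta * t) * np.exp(-beta * t)
    return np.where(t >= 0, np.minimum(g, gamma), 0.0)

t = np.linspace(0, 2.0, 400)                  # 2 s utterance
Fb = 80.0                                     # base frequency of the speaker (Hz), assumed
phrases = [(0.0, 0.5)]                        # (onset time T0, amplitude Ap), invented
accents = [(0.3, 0.7, 0.4), (1.1, 1.4, 0.3)]  # (onset T1, offset T2, amplitude Aa), invented

lnF0 = np.log(Fb) * np.ones_like(t)
for T0, Ap in phrases:
    lnF0 += Ap * phrase_ctrl(t - T0)
for T1, T2, Aa in accents:
    lnF0 += Aa * (accent_ctrl(t - T1) - accent_ctrl(t - T2))

F0 = np.exp(lnF0)                             # resulting F0 contour in Hz
print(F0.min(), F0.max())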

5.3 Tilt
Tilt [20] is a phonetic model of intonation that represents intonation as a sequence of continuously parameterized events (pitch accents or boundary tones). These parameters, called tilt parameters, are determined directly from the F0 contour. The basic units of the Tilt model are intonation events, the linguistically relevant parts of the F0 contour. The parameters important for event detection are the rise amplitude (Hz), rise duration (seconds), fall amplitude (Hz), fall duration (seconds), position (seconds) and F0 height (Hz). These parameters can be transformed into the Tilt parameters [20]:
tilt-amplitude (Hz), the sum of the magnitudes of the rise and fall amplitudes: A_tilt = |A_rise| + |A_fall|;
tilt-duration (seconds), the sum of the rise and fall durations: D_tilt = D_rise + D_fall;
tilt, a dimensionless number which expresses the overall shape of the event, independent of its amplitude or duration: tilt = (|A_rise| - |A_fall|) / (2 A_tilt) + (D_rise - D_fall) / (2 D_tilt).
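To make the transformation concrete, here is a small sketch (values invented, not taken from the paper's data) that computes the three Tilt parameters of a single event from its rise and fall amplitudes and durations.

# Hedged sketch of the Tilt parameterization of one intonation event [20].
def tilt_parameters(a_rise, d_rise, a_fall, d_fall):
    """a_* in Hz, d_* in seconds; returns (tilt_amplitude, tilt_duration, tilt)."""
    tilt_amp = abs(a_rise) + abs(a_fall)
    tilt_dur = d_rise + d_fall
    tilt = ((abs(a_rise) - abs(a_fall)) / (2.0 * tilt_amp)
            + (d_rise - d_fall) / (2.0 * tilt_dur))
    return tilt_amp, tilt_dur, tilt

# Example: an accent rising 30 Hz over 120 ms and falling 20 Hz over 180 ms (invented).
print(tilt_parameters(30.0, 0.12, -20.0, 0.18))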

6 Prosodic characteristics of the standard Croatian language


In most European languages the core of the intonation unit is the accented syllable of the stressed word, while in Croatian the core comprises the accented syllable and the syllable following it, because of the differentiation between ascending and descending stress.

In Croatian, there are six different intonation cores: descending (\), ascending (/),
descending-ascending (\/), descending-ascending-descending or reversed (\ / \),
ascending and descending or complex (/ + \) and flat (-). Their distribution is not related
to the grammatical syntactic types [22].
The most common intonation beginning in Croatian is descending, after which any type
of intonation core can follow. The intonation ending is always descending or low and
flat except after a flat core, when it is high and flat. If the end of intonation core is low,
intonation ending extends into a flat, low tone.
Syllables in the standard Croatian can be accented or unaccented, long or short and high
or low (tone). In one spoken word, only one accented syllable is allowed in Croatian.
The most common accented syllable is the first syllable of the word (in about 66% of
the words in the text), then the second (in about 23% of the words), the third (6.7%) and
the fourth (1.6%) [22].
Only one syllable in a spoken word is accented and all the others are unaccented. Before the accented syllable all syllables are of high tone and short, and after it they are of low tone and short or long.
Long accented syllables are 50% longer than long unaccented syllables, and short accented syllables are 30% longer than short unaccented ones [22].
Prosodic structure is an aspect of prosody which refers to the fact that some words group together and some have a break or natural pause between them. At the boundaries between prosodic phrases we often hear a change in the rhythm of the speech or a pause. The prosodic unit smaller than the prosodic phrase and greater than the phonological word is called the clitic group or spoken word. It consists of a word and a proclitic or enclitic. A clitic is a morpheme that is grammatically independent but phonologically dependent on another word (e.g. /koli/). In Croatian the low tone accent can only be found on the first syllable of a word, and when there is a proclitic in front of the word the accent moves from the first syllable to the proclitic. If a word has three or more syllables, the accent stays on the first syllable of the word [22].

7 Related work for languages cognate to Croatian


The Slovenian language is similar to Croatian in its prosodic characteristics. Several studies regarding the implementation of prosody into TTS have been conducted for Slovene. Šef and Gams [23] developed a prosody generating system for TTS. They used the approach of duration modelling at two levels: intrinsic (type of voice, the voice environment, record type, syllable emphasis, etc.) and extrinsic (speed of pronunciation, position of the words within phrases and the number of syllables in a word). In F0 modelling, they distinguish two main phases: segmentation of the text into intonation units and definition of F0 contours for specific intonation units. Šef [24] also explored the automatic accentuation of Slovene words. First, it was determined whether each vowel is stressed or unstressed, and then the accents were corrected using decision trees, taking into account the number of accented vowels and the word length.
Marinčič et al. [25] analyzed automatic accentuation in the Slovene language and compared the human and machine capacity for accent allocation. Gros [26] recorded a long continuous speech database and studied the influence of speech rate on the duration of syllables and phonemes. She presented models of intonation for the Slovenian language, based on the intrinsic level (word level) and the extrinsic level (levels higher than the word level).
The Czech language can to some extent be compared to the Croatian language in its prosodic characteristics. Romportl and Kala [27] described statistical modelling of F0, intensity and duration for the Czech language. Tihelka et al. [28] describe a speech synthesis system for the Czech language which includes a prosodic characteristics module based on the unit selection approach.
Tihelka and Matoušek [29] also incorporated phonetic transcription and prosodic rules to convert an input text to its phonetic form and to estimate its suprasegmental features in the ARTIC system. Kondelova et al. [30] proposed a statistical approach to prosody contour modelling based on sentence classification for the Slovak language.
Sečujski [31] developed an accentuation dictionary of Serbian designed for Serbian speech synthesis.

8 Automatic intonation event detection for Croatian speech synthesis


A procedure for automatic intonation event detection on Croatian texts based on the Tilt
model was proposed in [32]. In order to detect intonation events automatically, we
chose a representative set of utterances and marked four main prosodic events (pitch
accents, boundaries, connections and silences) within each utterance. Then we trained
HMMs to mark events automatically on a larger set of utterances. To extract F0 features
from the training set of utterances we used RAPT algorithm [33] as implemented in
Voicebox Matlab toolbox. The obtained F0 contours contained some noise, which we smoothed with a three-point median filter. We set the F0 value to 0 Hz to represent the unvoiced segments where F0 cannot be determined, and in another attempt we used linear interpolation to determine the missing values. We thus obtained three different F0 feature sets: the raw output of the RAPT algorithm, the smoothed and the interpolated contours. We parameterized the detected events with tilt parameters and generated F0 contours from those parameters. In order to evaluate the obtained F0 contours, we compared the three F0 contours, based on the three models for automatic event detection trained on the raw, smoothed and interpolated F0 features, with the original contour. The F0 contour synthesized using hand-labelled events was also compared with the original F0. The usual measure for F0 contour evaluation is the root mean square error (RMSE) between the original and the generated F0 contour. The obtained results are shown in Table 1 and a graphical comparison is shown in Fig. 1.
Table 1: Root Mean Square Error values for generated F0 contours

Event label model    RMSE (Hz)
raw                  25.16
smoothed             26.69
interpolated         25.57
hand-labelled        23.11

Figure 1: Comparison of the generated F0 contours with the original F0 (four panels: hand-labelled, raw, smoothed and interpolated event models; each panel plots f0 (Hz) against t (s)).
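The post-processing and evaluation steps described in this section (three-point median smoothing, linear interpolation over unvoiced frames, and RMSE against the original contour) can be sketched as follows; the two short arrays are placeholders and the RAPT tracker itself is not reproduced.

# Hedged sketch of the F0 post-processing and RMSE evaluation; input arrays are placeholders.
import numpy as np
from scipy.signal import medfilt

f0_raw = np.array([120., 0., 118., 125., 0., 0., 130., 128.])        # 0 Hz marks unvoiced frames
f0_ref = np.array([121., 119., 118., 124., 126., 128., 129., 127.])  # "original" contour

# Three-point median filter applied to the raw track.
f0_smooth = medfilt(f0_raw, kernel_size=3)

# Linear interpolation over unvoiced (zero) frames.
voiced = f0_raw > 0
f0_interp = f0_raw.copy()
f0_interp[~voiced] = np.interp(np.flatnonzero(~voiced),
                               np.flatnonzero(voiced), f0_raw[voiced])

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

print(rmse(f0_smooth, f0_ref), rmse(f0_interp, f0_ref))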



9 Conclusion and future work


A procedure for automatic intonation event detection on Croatian texts based on the Tilt
model was evaluated in terms of Root Mean Square Error values for generated F0
contours. Three different F0 feature sets (the raw output of the RAPT algorithm, the smoothed and the interpolated contours) were compared.
The results we obtained are preliminary and we expect better results after training the model on a larger set of sentences. All F0 contours obtained from automatically detected events have similar RMSE values and perform comparably to the hand-labelled case, which encourages us to use this method in future work. We plan to build CARTs for Tilt parameter prediction from text. We also plan to build a duration model for Croatian and to automatically accent Croatian words with CARTs. We will then incorporate the obtained duration and F0 models into a Croatian TTS system and evaluate the generated speech.

10 References
[1] Huang X.; Acero A.; Hon H. Spoken Language Processing: A Guide to Theory,
Algorithm and System Development. New Jersey: Prentice Hall, 2001.
[2] Dusterhoff K. E.; Black A.; Taylor P. Using Decision Trees within the Tilt
Intonation Model to Predict f0 Contours. Eurospeech, pp. 1627-1630, 1999.
[3] Taylor P.; Black A. Assigning Phrase Breaks from Part-of-Speech Sequences.
Computer Speech and Language, pp. 99-117, 1998.
[4] Rojc M; Aguero P. D.; Bonafonte A; Kacic Z. Training the tilt intonation model
using the JEMA methodology. Interspeech, pp. 3273-3276, 2005.
[5] Aylett M. Merging Data Driven and Rule Based Prosodic Models for Unit
Selection TTS. Pittsburgh, 2004.
[6] Fordyce C.S. Prosody Prediction for Speech Synthesis Using Transformational
Rule-Based Learning. 1998.
[7] Santen van J. Segmental Duration and Speech Timing. Computing Prosody, pp.
225-250, 1997.
[8] Santen van J. Assignment of Segmental Duration in Text-to-Speech Synthesis.
Computer Speech and Language, pp. 95-128, 1994.
[9] Kato H; Tsuzaki M; Sagisaka Y. Acceptability for Temporal Modification of Single
Vowel Segments in Isolated Words. J. Acoust. Soc. Am., pp. 540-549, 1998.
[10] Stergar J.; Erdem C. Adapting Prosody in a Text-to-Speech System. Products and
Services; from R&D to Final Solutions, 2010.
[11] Allen J.; Hunnicut S.; Klatt D. Text-to-Speech: The MITalk System. Cambridge:
Cambridge University Press, 1987.
[12] Pierrehumbert J.B. Synthesizing intonation. J. Acoust. Soc. Am., pp. 985-995,
1981.
[13] Hirschberg J. Pitch Accent in Context: Predicting Intonational Prominence from
Text. Artificial Intelligence, vol. 3, pp. 305-340, 1995.
[14] Ross K.; Ostendorf M. Prediction of abstract prosodic labels for speech synthesis.
Computer Speech and Language, vol. 10, pp. 155-185, 1996.
[15] Ostendorf M.; Veileux N. A Hierarchical Stochastic Model for Automatic
Prediction of Prosodic Boundary Location. Computational Linguistics, vol. 20, pp.
27-54, 1994.

[16] Taylor P.; Black A. CHATR: a generic speech synthesis system. In COLING '94,
pp. 983-986, 1994.
[17] The Festival Speech Synthesis System. [Online].
http://www.cstr.ed.ac.uk/projects/festival/
[18] Campbell W. N. Syllable-based segmental durations. Talking Machines: Theories,
Models, and Designs, pp. 43-60, 1992.
[19] Silverman K.M. et al., TOBI: A Standard Scheme for Labeling Prosody. Banff,
1992.
[20] Taylor P. Analysis and Synthesis of Intonation using the Tilt Model. Journal of the
Acoustical Society of America, pp. 1697-1714, 2000.
[21] Fujisaki H.; Ohno S. Analysis and Modeling of Fundamental Frequency Contours
of English Utterances. In Speech Communication, vol. 47, 2005, pp. 59-70.
[22] Babić S. et al. Povijesni pregled, glasovi i oblici hrvatskoga književnog jezika.
Globus, Nakladni zavod, Zagreb, 1991.
[23] Šef T.; Gams M. SPEAKER (GOVOREC): A Complete Slovenian Text-to-Speech
System. International Journal of Speech Technology, 6, pp. 277-287, 2003.
[24] Šef T. Automatic Accentuation of Words for Slovenian TTS System. In
Proceedings of the 5th WSEAS International Conference on Signal Processing, pp.
155-160, 2006.
[25] Marinčič D.; Tušar T.; Gams M.; Šef T. Analysis of Automatic Stress Assignment
in Slovene. Informatica, pp. 35-50, 2009.
[26] Gros J. Samodejno tvorjenje govora iz besedil. Doctoral dissertation, 1997.
[27] Romportl J.; Kala J. Prosody Modelling in Czech Text-to-Speech Synthesis. In
Proceedings of the 6th ISCA Workshop on Speech Synthesis, pp. 200-205, Bonn,
2007.
[28] Tihelka D.; Kala J.; Matoušek J. Enhancements of Viterbi search for fast unit
selection synthesis. Interspeech, pp. 174-177, 2010.
[29] Tihelka D.; Matoušek J.; Romportl J. Current state of Czech text-to-speech system
ARTIC. Berlin, Heidelberg, 2006.
[30] Kondelova A.; Toth J.; Rozinaj G. Statistical Approach for Prosody Contour
Modeling Based on Sentence Classification. Elektrorevue, pp. 40-44, 2013.
[31] Sečujski M. Akcenatski rečnik srpskog jezika namenjen sintezi govora na osnovu
teksta. DOGS 2002, 2002.
[32] Načinović L.; Pobar M.; Martinčić-Ipšić S.; Ipšić I. Automatic Intonation Event
Detection Using Tilt Model for Croatian Speech Synthesis. In Information
Sciences and e-Society, Zagreb, 2011, pp. 383-391.
[33] Talkin D. A robust algorithm for pitch tracking (RAPT). Speech coding and
synthesis, pp. 495-518, 1995.


Increasing Well-being through IT and Simulations

Invited lecture:
Tijana Milenkovic
Dept. Computer Science and Engineering, University of Notre
Dame, Indiana, USA.
What can complex networks tell us about human aging?
Networks have been invaluable models for studying an enormous array of real-world phenomena in many domains, such as social or technological systems. The biomedical domain is no exception. In biological networks, nodes correspond to biomolecules such as genes or proteins, and edges correspond to physical or functional interactions between the biomolecules. And whereas genomic sequence research has revolutionized our understanding of how the cell works, genes and their protein products carry out cellular processes by interacting with each other instead of acting in isolation. This is exactly what biological networks model. Thus, studying these data via complex network analysis is expected to give further insights into the organizational principles of life, evolution, disease, and therapeutics.
This talk will focus on using our recent strategies for network alignment, network clustering, and dynamic network analysis to mine
complex biological network data, with the goal of predicting new biological knowledge. In particular, we will focus on uncovering new
knowledge about human aging, for the following reasons. Since susceptibility to diseases increases with age, studying human aging is
of societal importance. However, human aging is hard to study experimentally due to long lifespan and ethical constraints. Thus, we
will show that the above network analysis strategies can be used to
computationally identify key players in aging.


Information solutions for diabetes patients and healthcare professionals
Matjaž Tome, Andrej Dobrovoljc
Faculty of Information Studies
University of Novo mesto
Sevno 13, 8000 Novo mesto, Slovenia
{tome.matjaz@gmail.com, andrej.dobrovoljc@fis.unm.si}

Abstract: In the last decade a dramatic increase in diabetes has been recorded worldwide. People suffering from diabetes are under great psychological pressure. It is reflected in the concern to perform daily measurements of blood glucose, to take into account diet restrictions, to think about possible health complications and to attend periodic medical examinations. The use of smart phones, computers and user-friendly software solutions can significantly improve the quality of life of diabetes patients. Such solutions can help them to control and monitor the illness more effectively. On the other hand, health care institutions are also faced with organizational problems. The use of contemporary computer and mobile networks enables a fast and efficient flow of information, which is the key point in connecting patients with the health care institutions.
Key Words: diabetes, smart phones, information solution, web application

1 Introduction
According to the International Diabetes Federation (IDF), in 2011 there were 366 million people in the world suffering from diabetes. At the same time, the IDF notes that type 2 diabetes is increasing in all countries of the world. Most patients are in the population between 40 and 59 years of age. In 2011, the costs of treatment at the global level were 360 billion. This represents 11% of the total costs of all treatments of adults between 20 and 79 years of age. The IDF also noted a worrying trend of increasing diabetes incidence. Based on the IDF assumptions, the number of diabetic patients will increase to 552 million people by 2030 [1].
In 2012 alone, the number of patients rose by 1.37%, which represents 371 million people. Consequently, the costs of treatment increased by 1.29% and the total amount for 2012 was 364 billion [2].
According to the Ministry of Health, there were 136,000 diabetics in the Republic of Slovenia in the year 2010. Among them, 109,000 patients were already registered; in the others the illness had not yet been diagnosed. In 2010, 92,500 people took medicines for the treatment of diabetes. The direct medical costs in 2010 amounted to 87.8 million. If Slovenia follows this global trend, there will be 205,000 diabetics in Slovenia by the year 2030.
These astonishingly high numbers pose a very important question: how can we use information technology in order to help improve the life of diabetics?
In the next section we present the current situation in the area of software solutions for diabetics. Section three describes the proposed software solutions for patients and employees in health institutions. The paper is concluded with a short summary.

2 Existing IT solutions and requirements
Before starting the design of the information technology solution, we found and examined the existing software solutions. Simultaneously, we studied the basic facts about the treatment and monitoring of diabetes. Fortunately, we received professional assistance at the Department of Internal Medicine of the Novo mesto hospital. During the familiarization phase with diabetes, we found out that there are many opportunities to improve the quality of life of diabetics. An important discovery for us was the fact that these patients are under constant psychological pressure. The psychological pressure is mainly reflected in:
- care for regular medical examinations,
- care for carrying out regular measurements of blood glucose,
- concern for the correct calculation of the required insulin,
- awareness of the impact of diet on the amount of glucose in the blood,
- awareness of the consequences of a too low or too high level of glucose in the blood.
In an effort both to improve the quality of life of patients and to facilitate the work of the employees in health care, we defined the following key goals that our system has to reach:
- better communication between patients and healthcare institutions,
- a notification system for all key actors in the process of treatment,
- development of a decision-making system for planning meals,
- a system of reminders for upcoming events (e.g. regular visits to the hospital),
- a patient tracking system.
Before we started with the development of the solution, we investigated the existing software solutions and checked their functionalities. We compared them and identified their strengths and weaknesses. We identified the following software solutions for diabetics:
- Diabetes Software: Make Diabetes Easier [8],
- GGC - GNU Gluco Control [7] and
- sugaR: Method of optimizing the diabetes [3].
The following table (Tab. 1) summarizes some important properties of the studied products. We found big differences among them. They are reflected mainly in:
- limitation to only some operating systems,
- potential use of the IT solution on smart phones,
- complexity of usage,
- quality of reports and
- a less or more comprehensive approach.

Table 1: Comparison of functionalities [9]

3 Design of the information solution
Our comprehensive IT solution consists of two parts. The first one is a software solution intended for diabetics, and the second one for healthcare institutions. The IT solution for diabetics will be developed for smart phones. Our intention is to prepare applications for all three platforms that are popular today (iOS, Android and Windows Phone), and they should support the following functionalities:
- entry of daily values (the amount of sugar in the blood, the amount of insulin, the date and time of the measurement),
- decision making support (the impact of certain foods on the amount of glucose in the blood),
- reports (measurements in a given period),
- identification and reporting of the GPS location of the diabetic (providing security for the patient by informing the trustee about their current location),
- reminders (automatic patient reminders for upcoming events),
- questionnaires (electronic completion of the questionnaires before arrival to the periodic medical examination),
- entry and transmission of measurements of the last three days prior to the periodic medical examination to the hospital,
- entry and transmission of measurements to the hospital in the case of a basal insulin test.
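As an illustration of the "entry of daily values" and "transmission of measurements" functions listed above, the sketch below shows one possible measurement record and upload call; the field names and the endpoint URL are hypothetical and not part of the proposed system.

# Hedged sketch: a hypothetical daily-measurement record and its upload to the hospital server.
import json
from datetime import datetime
from urllib import request

measurement = {
    "patient_id": "demo-001",                      # hypothetical identifier
    "glucose_mmol_l": 6.8,                         # blood glucose value
    "insulin_units": 4,                            # administered insulin
    "measured_at": datetime.now().isoformat(),     # date and time of the measurement
    "gps": {"lat": 45.80, "lon": 15.17},           # optional location for the trustee
}

req = request.Request("https://example-hospital.si/api/measurements",   # placeholder URL
                      data=json.dumps(measurement).encode("utf-8"),
                      headers={"Content-Type": "application/json"})
# request.urlopen(req)  # left commented out: the endpoint above does not exist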

The IT solution for healthcare institutions must be designed for use as a web application. It has to support all important browsers on the market today: Internet Explorer (version 8 and above), Firefox, Chrome and Opera. The main view should allow access to the following four options:
- add patient,
- all patients,
- questionnaire,
- table of measurements.
The first option, Add patient, is realized by using the search function in the main database of all insured citizens in Slovenia. The result of a successful search (basic patient data: name, surname, address, etc.) is recorded in the local database of diabetic patients. Other important data are entered manually. The second option, All patients, reveals the following information:
- personal data,
- measurements,
- questionnaire,
- periodic medical examinations.
Besides that, all calls to complete the questionnaire and notifications of periodic examinations add reminders to the patient's mobile application. Reminders in the patient's mobile application cannot be edited or deleted. The figures below show the flow of the data for the entities and individual features of the IT solution.

Figure 1: Data flow diagram [9]

Figure 2: Data flow diagram [9]

The proposed information solution was designed in the first place to improve the quality of life of diabetics. Besides that, it tries to connect them to the healthcare institutions by simplifying and facilitating the work of employees. The solution is therefore useful for both sides. We decided to verify the acceptance of the solution through a survey on both sides. Analysis of the questionnaires showed the following findings:
- Detailed knowledge of the IT solution resulted in a stronger belief of health professionals that the solution will raise the quality of life of diabetic patients.
- The new way of preparing for the periodic medical examination is reflected in a higher quality of preparation as well as in a shorter preparation time.
- Automated notification of patients about periodic medical examinations simplifies the job of employees in the health institutions.
- According to the responses of diabetics, the most important functions are Reminders, GPS location, Web based questionnaires and Tables of measurements.
- According to the responses of employees, the most important functions are Tables of measurements, GPS location, Decision making support and Support for periodic examinations.

4 Conclusion
One of the ways to improve the quality of life of diabetes patients is the use of modern information technologies. On the web we can find both proprietary and open source solutions for monitoring and controlling diabetes. Analysis of the existing software solutions showed that some of them lack important functionality or are difficult to use. We proposed a solution which connects the patients with the health institutions and brings important improvements for both sides.
The share of smart phones among the customers of the biggest Slovenian operator reached one third in 2012 [4]. In the same year, 24.6% of the Slovenian population had access to broadband and 29.1% to the mobile broadband network [5], and 76% of households owned a personal computer [6]. We can conclude that today's infrastructure is sufficient for the success of the proposed solution.

References
[1] INTERNATIONAL DIABETES FEDERATION (2012) The global burden. Accessible via: http://www.idf.org/diabetesatlas/5e/theglobalburden (08.03.2013).
[2] INTERNATIONAL DIABETES FEDERATION (2013) IDF diabetes atlas update 2012. Accessible via: http://www.idf.org/diabetesatlas/5e/Update2012 (08.03.2013).
[3] MOLLER, STEFFEN (2009) Introduction to the R package sugaR. Accessible via: http://cran.rproject.org/web/packages/sugaR/vignettes/sugaR.pdf (12.03.2013).
[4] MOBITEL TEHNIK (2012) The proportion of smartphones steadily growing, mainly at the expense of Android. Accessible via: http://tehnik.mobitel.si/delezuporabnikovpametnihtelefonovprinasvztrajnorastepredvsemnaracunandroida/ (08.05.2013).
[5] RIS - RABA INTERNETA V SLOVENIJI (2012) In Slovenia the share of fixed broadband access below the EU average. Accessible via: http://www.ris.org/db/26/12563/Novice/Slovenija_po_delezu_fiksnega_sirokopasovnega_dostopa_pod_povprecjem_EU/?&p1=276&p2=285&p3=1318 (08.05.2013).
[6] STATISTIČNI URAD REPUBLIKE SLOVENIJE (2012) Use of information-telecommunication technologies in households and by individuals, detailed data, Slovenia, 2012 - final data. Accessible via: http://www.stat.si/novica_prikazi.aspx?id=5179 (08.05.2013).
[7] GGC (2011) GGC - GNU Gluco Control. Accessible via: http://ggc.sourceforge.net/index.php?show=news (12.03.2013).
[8] DIGITAL ALTITUDES LLC (2013) Diabetes Pilot: Software for Diabetes. Accessible via: http://www.diabetespilot.com/ (12.03.2013).
[9] Matjaž Tome (2013) IT solutions to assist diabetic patients and medical staff. Novo mesto: Faculty of Information Studies: self-published.


Long-Term-Care and Intelligent IT in a changing demographic landscape
David A. Fabjan, Vedrana Vidulin
Department of Intelligent Systems
Jožef Stefan Institute
Jamova cesta 39, 1000 Ljubljana, Slovenia
{david.fabjan; vedrana.vidulin}@ijs.si

Abstract: The paper presents an outlook on long-term care (LTC) and the benefits that integrated intelligent IT systems for LTC can have on monitoring activities. Monitoring activities are crucial in assuring a person's appropriate care, safety and medical treatment. Common monitoring scenarios that will be addressed are: monitoring behaviour, general well-being, access control to dangerous zones, observation of the results of treatments, early detection of mental disorders and alerting the appropriate services in critical situations. In such situations, intelligent IT incorporating body sensors and decision support systems can reduce costs without decreasing the quality of care provided, successfully managing the continuous increase of elderly in professional care. In addition, such solutions provide non-intrusive monitoring for interns.
Key words: Long-Term Care, Demographics, Intelligent IT, Monitoring infrastructure

1. Introduction
Long-Term Care (LTC) is care for people needing daily living support over a prolonged
period of time. How societies address the issue of LTC is linked to social, moral and ethical
norms, government policy and other country-specific circumstances [1].
Despite the recent and justified uproar about collecting and analysing large quantities of personal data, large groups of people still need a continuous body monitoring infrastructure. Body monitoring is a broadly used term addressing the detection of unusual events related to a person's age, disease or various other personal and safety reasons.
In the future, IT support for the ageing population, underpinned by advancements in sensor technologies, will present one of the most promising research fields for organisations and industry. To improve LTC, the most promising approach seems to be the development of pervasive and intelligent body monitoring services. This assumption is based on various, sometimes interlinked, facts, some of which we present below.


2. Outlook on LTC
The information age is changing our society. The old continent is experiencing declining family sizes and changing residential patterns for the elderly. With the equal participation of women in the formal labour market, the ageing of the population is putting additional pressure on formal aged care systems.
Large parts of the population are becoming wealthier. They expect a higher quality of care services and demand socially verified and responsive care systems. On the other hand, the scale and affordability of care must be urgently addressed. Expert systems, such as health system analytics for the continuous monitoring of various diseases, must come at an affordable price, with great accuracy and unobtrusive use.
The most prevalent problem for LTC is the speed at which the population is ageing and the accompanying age-related illnesses. Between 1998 and 2008, the share of the population aged 65 years or older increased by 12% across the EU, while the share of those aged 80 years and over increased by 32%. In most countries this also led to an increase in professional aged care [2]. Such an increase is putting pressure on fiscal stability and is increasing the size of the health-related LTC system.
The policies related to the aged care system are important policies to be addressed by the OECD and the enlarged EU countries. This is obvious from the statistics presented in Figure 1, showing the number of LTC workers across OECD countries expressed per 100 persons older than 80. The statistics were collected for the year 2008, or later years when newer statistics were available. The graph indicates a high demand for aged care workers in developed countries that may soon outgrow the size of the traditional LTC workforce. Some countries, for example Spain and Austria, already report shortages of workers in the sector.

Figure 1: The number of LTC workers in comparison to the LTC workers with full time equivalent (FTE).

3. Addressing the problem


Providing quality care in later life is a key concern of Europeans as they age. People want to know about and expect an affordable quality of later life. Aged care is thus becoming the fastest growing part of the health care system [3]. At current levels, it will need large incremental expansion investments, or a rethinking and implementation of better care strategies.
The outlook on the landscape of traditional LTC is marked by reports of inadequate funding levels versus the actual costs for the intern. The EU aged care system is experiencing rapid demand growth due to demographic ageing, increases in expectations, and the increasing prevalence of accompanying chronic disease, combined with a lack of expert care personnel.
Age-related public expenditure in EU is projected to increase from 23.1 % of GDP in 2007 to
27.8 % of GDP in 2060 [4]. It is projected that the ratio of retirees to workers in Europe will
double to 0.54 by 2050 (from four workers per retiree to two workers per retiree) [3].
Chronic diseases, frailty, mental disorders and physical disabilities tend to become more prevalent in older age. This may result in a lower quality of life for those who suffer from such conditions, while the burden of these conditions may also impact health care and pension provisions.
Figure 2 confirms that the number of dependents will grow at a much faster pace than health care expenditure. The projections made in the 2009 Ageing report suggest that the number of dependents in the EU-27 receiving formal care will more than double (up 128.1% overall) during the next 50 years, from 9.2 million persons in 2010 to 21 million by 2060.

There is a clear need for decisive action, including more innovative policy responses, to ensure the provision of accessible and sustainable high quality care in the future. This includes the implementation of advanced technological solutions and the integration of aged care within the broader health and hospital systems, as well as in the social system in general. Informal and professional care personnel are projected to increase by around 14% by 2020, in contrast to a more than 50% increase in demand. Workforce investments have become critical to quality care, given the labour intensity of the aged care sector, with shortages already becoming apparent.
Part of the solution is taking care of the elderly in domestic environments under the supervision of informal carers1 and professional care staff such as medical doctors and other trained aged care professionals. Such monitoring and personalized treatment is not scalable with the current means. With appropriate automation procedures, we can introduce flexible, secure, and scalable LTC in addition to achieving higher-quality care for increasing numbers of people.
Technological advances have enabled the possibilities for seamless, safe and easy-to-use body sensors. Available intelligent IT solutions are tailored to non-intrusively follow the intern's well-being, with predefined parameters, on mobile and home IT appliances. An example of a simple expert body data acquisition system is a bracelet with embedded relevant sensors, paired with a mobile device.

1 Informal carers are defined as family members, close relatives, friends or neighbours who provide care as non-professionals. The care provided includes emotional support and assistance.

Figure 2: Projections of health care expenditures and the number of dependants by 2060.
The sensors collect body signal inputs (location, temperature, kinetics, blood pressure, etc.). The mobile device aggregates the data from all body sensors, performs simple analytics and sends the majority of the data to the server side for in-depth analysis. The server side then returns the information about the analysis outcome to the mobile device interface, prompting the user to take the appropriate action.
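A minimal sketch of this split between on-device aggregation and alerting is given below; the sensor readings, safe ranges and alert rule are invented for illustration and do not come from the projects described in this paper.

# Hedged sketch: mobile-side aggregation of body-sensor readings with a simple threshold alert.
from statistics import mean

readings = {                       # invented sample data from wearable sensors
    "heart_rate_bpm": [98, 112, 118, 121, 119],
    "skin_temp_c": [36.4, 36.5, 36.6],
}

LIMITS = {"heart_rate_bpm": (40, 110), "skin_temp_c": (35.0, 38.0)}  # assumed safe ranges

def summarize(readings):
    """Simple on-device analytics: per-sensor averages sent onward for in-depth analysis."""
    return {name: mean(values) for name, values in readings.items()}

def check_alerts(summary, limits):
    """Flag values outside the configured range so that a carer can be notified."""
    return [name for name, value in summary.items()
            if not (limits[name][0] <= value <= limits[name][1])]

summary = summarize(readings)
print(summary, check_alerts(summary, LIMITS))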

4. IT with a multi-layered structure as a solution


The most promising architecture for an elderly care IT system is a multi-layered architecture that complies with the EU policies and standards regarding health-care systems. The architecture comprises four layers.
Technological Information Infrastructure (TII) layer is the infrastructure intended to constantly
(and online) collect a persons data during their daily activities. The monitoring can be done
everywhere, from the patients homes to buildings whenever they are equipped with the
corresponding sensors.
Interpretation and Decision Support (DIDS) layer interprets the data collected at the TII layer
and transforms it into information necessary to support decision making of relevant parties. For
medical doctors, DIDS can provide various basic diagnostic parameters (i.e., those obtained from individual sensor signals) and combine them into higher-level measures, indicators or
parameters.
Managing and Security layer is comprised of an architecture and a data model designed according to the EU security standards for the handling of health data. The layer implements security
policies, such as Access Control Policy, Incident Response and Recover Policy, and Privacy and
Confidentiality Policy.
Human Interface layer includes user-interfaces for presenting the patient responses either to
medical staff or to their friends and relatives. This layer combines the outcomes of all other
layers.

5. Recently finished or ongoing research of the Department contributing to the topic of health IT systems
Confidence - The main objective of project Confidence (Ubiquitous Care System to Support
Independent Living) is the development and integration of innovative technologies to build a
care system for the detection of abnormal events (such as falls) or unexpected behaviours that
may be related to a health problem in elderly people [6]. The project was presented at the
European Parliament, together with the project Chiron.
Chiron is developing an integrated framework for personalized healthcare at home, in a
nomadic environment and in the hospital. A patient is equipped with wearable sensors, which
continuously monitor his condition. The sensors are connected to a smartphone, which issues
warnings and advises the patient based on a personalized health assessment model [7].
The Commodity12 project is building a multi-layered multi-parametric infrastructure for continuous monitoring of diabetes types 1 and 2. The system will exploit multi-parametric data to provide healthcare workers and patients with clinical indicators for the treatment of diabetes types 1 and 2. It will focus on the interaction between diabetes and cardiovascular diseases [8].

6 Conclusion
A good part of future LTC projects must, and will, provide research into an infrastructure and intelligent services with various levels of monitoring. Affordable expert monitoring can provide analytics on the progress of the most common chronic diseases with kinetic and movement symptoms. Quality advances in professional aged health care can be achieved through rapid investment in intelligent IT assistive technologies and by giving higher priority to consumer-directed care that meets individual needs. With such an approach, it is possible to reduce the costs of LTC without decreasing the quality of life. The common goal is to enable people in LTC to live independently, with only specific event-related contacts with professional and informal carers. Standardised, inexpensive off-the-shelf technologies are able to collect, store and analyse the physical and mental state of the LTC intern and assess the related information in order to provide personalized recommendations.

References
[1] L. Rachel Ngai, Christopher A. Pissarides. Welfare Policy and the Distribution of Hours of
Work. Paper. Centre for Economic Performance. England. 2009.
[2] OECD Stats 2010 2011.
[3] G. Carone, D. Costello. Can Europe Afford to Grow Old? Finance and Development,
International Monetary Fund. 2006.
[4] EUROSTAT. Sustainable development - Demographic changes. 2011.
[5] European Commission, European Economy - Economic and budgetary projections for the EU
27 Member States 2008-2060. Report. 2009.
[6] Confidence FP7 project, http://dis.ijs.si/confidence.
[7] Chiron ARTEMIS project, http://www.chiron-project.eu
[8] Commodity12 FP7 research project, www.commodity12.eu.


Perception of privacy in social networks among youth


Dejan Fortuna, Blaž Rodič
Faculty of Information Studies
University of Novo mesto
Sevno 13, 8000 Novo mesto, Slovenia
{Bla.Rodi}@fis.unm.si

Abstract: The paper presents the results of a study in which we assessed how the
younger Facebook users in Slovenia perceived their privacy in social networks. We
postulated a research question "To what extent are the users of Facebook aware of the
privacy related issues?" and a hypothesis "Understanding of privacy on Facebook is
different from the real world as Facebook users are willing to share more personal data
online than in the real world." We carried out a quantitative study among younger
Facebook users, and the results suggest that users have a limited awareness of the
threats to privacy in social networks, and that they assign less importance to privacy in
social networks than in the real world, as they are likely to share more personal
information in social networks than on the street.
Key Words: online social networks, privacy, Facebook, personal data, privacy policy

1 Introduction
The Internet has brought new threats to the security of personal data and privacy of
users of online services. In the field of personal data and privacy security the online
social networks present a distinct challenge, both in terms of security policy of such
networks and in the users attitude to their privacy.
In Slovenia in year 2010, 60% of respondents had a profile in at least one social
network. Younger users dominate social networks, since 75% of all pupils or students
use at least one social network [1].
Our empirical observation is that the users of social networks feel less exposed when making virtual contacts than in real life, and are willing to share a lot of personal information with a wide circle of friends. The reason is, in our opinion, the virtual nature of social networks, where individuals have a different attitude towards other persons than in real life, and trust has a different meaning. This is primarily reflected in the size of the user's network of friends, which is much larger than in real life. Another difference is that real life friendships are defined differently than in social networks [2].
If we look at the use of social networking sites in terms of time spent, the most
popular network is Facebook. By mid-March 2011, 627,360 people in Slovenia had used
Facebook, which is 51% of those who have ever used the Internet, and 37% of the
Slovenian population between 10 and 74 years of age. Most users are aged between 25
and 34 years, followed by the age group of 18 to 24 years [3].
Jones and Hiram Soltren [4] state that privacy in Facebook is jeopardized by three
key factors: (1) users reveal too much personal information; (2) Facebook does not take
adequate measures to protect personal privacy; and (3) third parties, mainly advertising companies, are actively looking for information about the end user. Only when these three factors are put together can we see the whole problem of privacy on Facebook.
According to the report Privacy and Human Rights 1999, privacy is jeopardized by
three major trends: globalization (removing geographical restrictions on the flow of
data), convergence of technologies (which are increasingly interoperable with one
another), and multimedia (data can quickly change from one form to another) [5].

2 Methodology
In our study we observed how users perceive their privacy in social networks compared
to how they perceive it in the real world, and how well they know the privacy issues and
security measures in social networks. Due to the prevalence of Facebook, we have
focused our research on the users of this online social network.
We have asked the following research question: "To what extent are the users of
Facebook aware of the privacy related issues?"
At the same time, we postulated the following hypothesis: "Understanding of
privacy on Facebook is different from the real world as Facebook users are willing to
share more personal data online than in the real world."
In the case of social networks, it seems that the word privacy is really losing its
meaning. Many people are reluctant to share information such as their hobbies and birth
date with casual acquaintances in the street. But it seems that if the same person were to
add us as a friend on Facebook, the reluctance would fade. This is in our opinion due to
the physical separation from "friends" and events in the virtual world of online social
networks. Because of this separation, individuals have a stronger sense of security in
their virtual interactions than in real-life interactions.
Our research sample included Facebook users within our personal social network
and in our friends' social networks. The sample was further limited by age and included
only users between 16 and 25 years of age. This limit was imposed to focus the results
on the age group which dominates social networks [3]. Because of the sample size (201
respondents) and the geographical dispersion of the sample, we chose an online survey
as our research tool. We used the tool 1KA, which is available online at
https://www.1ka.si/. The tool is free to use and is developed within the social
informatics study programme at the Faculty of Social Sciences of the University of
Ljubljana.
In order to determine how Facebook users perceive privacy in this network, we used
a questionnaire consisting of 27 questions, of which the first three were
demographic. Below we present the main findings of the research.

3 Results
Our survey was at least partially completed by 201 of 353 respondents (57%), which is
a good response rate for an online survey. The percentage of fully completed surveys was
slightly lower, at 50%. 61% of respondents were male and 39% were female, a
notable difference from the demographic data of a previous nationwide survey,
where 51% of users were male and 49% female [6]. Most respondents had
completed four years of secondary school (62%), followed by respondents (13%) who had
completed a higher school or Bologna first-level study (Figure 1).
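Whether such a gender split departs from the nationwide 51/49 distribution can be checked with a simple goodness-of-fit test. The sketch below is only an illustration added here for clarity; it assumes that all 201 respondents answered the gender question, which is our assumption rather than something reported above.

```python
# Illustrative check of the observed gender split (61% male, 39% female) against the
# nationwide shares (51% / 49%) reported in [6]. Assumes all 201 respondents answered
# the gender question; this is an assumption made for the example.
from scipy.stats import chisquare

n = 201
observed = [round(0.61 * n), round(0.39 * n)]   # male, female counts in the sample
expected = [0.51 * n, 0.49 * n]                 # counts expected under the nationwide shares

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")   # a small p suggests a real difference
```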


Figure 1: The educational level of the respondents


With the next set of questions we wanted to see how the users take care of their safety
on Facebook by using some basic measures such as signing out and using a secure
connection. The majority of respondents (85%) who access their account via a public
computer always sign out, while many users who use Facebook on their own PCs and
mobile devices never sign out: 41% of users on their own PCs and 52% of mobile device
users do not sign out (Figure 2).

Figure 2: Do you sign out of Facebook after you stop using it?

Next we asked users about the size of their networks, and how many people they know
only through Facebook. The most common (21%) is a network with 201 to 300 friends,
closely followed (20%) by networks of 701 or more friends; 18% of networks have
from 301 to 400 friends, and 16% have from 401 to 500 friends (Figure 3).
According to the RIS, the average user in 2012 had 130 friends [7].


Figure 3: How many friends do you have on Facebook?


The next question also referred to the number of friends, but here we wanted to find out
how many friends the users have met only on Facebook. The result is quite surprising and
perhaps also worrying, as 18% of users have 91 or more friends that they've never met
outside of Facebook (Figure 4). However, 44% of the users have only up to 10
Facebook-only friends. Facebook has introduced the option to categorize friends into
several groups according to what we want to share with them, which can be used to
protect one's privacy. In our research, 78% of the respondents were aware of this option.
However, only one third of these 78% use it.

Figure 4: How many of your Facebook friends do you know only through Facebook?

We also asked the users whether they have changed the default Facebook privacy
settings. The majority of users share their posts only with their network, and 74% have
changed the default settings (Figure 5).


Figure 5: Have you changed the default privacy settings?

The terms of use state that the user must provide accurate personal information when
registering [8]. We asked respondents what personal information they added to their
profile and whether the information is correct. The results are shown in Figure 6. Basic
personal information such as name was entered by 99% of users, probably so that they
can be found and contacted more easily by their real-life acquaintances. Many users also entered
information about their birthday (82%). Of these 82%, 64% have entered the entire birth
date, while 36% of users entered only the day and month of birth. Upon registration, we
can also provide an e-mail address, which can be hidden from other users. 46% of
respondents entered their e-mail address but do not share it, while 50% also share their
e-mail address with friends. Other information that users share includes their place of
residence (51%), education (76%), workplace (27%), hobbies (25%) and telephone
number (13%).

Figure 6: What information about yourself do you disclose on Facebook?


When it comes to sharing personal information it seems that people are willing to share
a lot more online than in real life. Therefore we also asked Facebook users what
personal information they would be willing to share on the street with a stranger or a
casual acquaintance (Figure 7). From the answers we can conclude that, as expected, we
are willing to share personal information only with some people. More than half are
willing to share personal information in certain circumstances (sometimes). Basic
personal information (name and surname) would never be shared by 11% of
respondents, which is significantly more than the 1% of respondents on Facebook, and an
even greater percentage of respondents would never share their birth date (29%
compared with 18% on Facebook) and phone numbers (only 5% would always share
their number in the street, compared with 13% who share their phone number on
Facebook).

Figure 7: What personal information would you be willing to share on the street with a
stranger or a casual acquaintance?

4 Discussion
In our study, we presented a research question and one hypothesis.
Research question: "To what extent are the users of Facebook aware of the privacy
related issues?"
Users are to some extent aware of the problems related to privacy. Thus they
limit access to their profile and posts, but they also sometimes add complete
strangers to their network. One statistic that has improved since 2008 is the changing of
default privacy settings: in 2008 only 20% of users changed the default privacy settings,
compared to 75% of the respondents in our survey. However, few people sign out of
Facebook after use. More than half of the respondents did not read the terms and
policies of registration, although they had to confirm that they did when registering.
Hypothesis: "Understanding of privacy on Facebook is different from the real world
as Facebook users are willing to share more personal data online than in the real world."
Although in our opinion it is easier to determine whether you can trust a person
when talking with them live than in the virtual world, the perception of privacy has
changed, not only in the virtual space, but also in real life. As stated by the founder of
Facebook, privacy is no longer a social norm and everything is public, everything is to
be shared [9]. It is not surprising that according to the findings of our research the
young people are much more willing to share personal information with acquaintances
and strangers on Facebook than in the street. The hypothesis can therefore be
confirmed.

We can conclude that today's youth has a relaxed attitude towards privacy in social
networks and is prepared to share personal data and information about their
private lives with many casual acquaintances. Although this is disturbing due to the
potential for abuse, there is also a bright side to this phenomenon: the world is
becoming a global village in which geographical restrictions have a declining influence
on the shaping of communities, and we can postulate a new hypothesis for future
research: in online social networks, young people socialize regardless of differences in
race, nationality and religion.

References
[1] Jerman Kueliki, A.; Lebar, L. and Vehovar, V. Socialna omrežja 2011,
http://www.ris.org/db/13/12076/RIS%20poro%C4%8Dila/
Socialna_omrezja_2011/?&p1=276&p2=285&p3=1318, 2011, downloaded: 19. 3.
2013.
[2] Gross, R.; Acquisti, A. Information revelation and privacy in online social
networks. Workshop on privacy in the electronic society,
http://privacy.cs.cmu.edu/dataprivacy/projects/facebook/facebook1.pdf, 2005,
downloaded: 3. 3. 2013.
[3] Raba interneta v Sloveniji (2011) Na Facebooku skoraj 630.000 prebivalcev
Slovenije, http://www.ris.org/index.php?fl=2&lact=1&bid=11980&
parent=27, downloaded: 20. 3. 2013.
[4] Jones, H.; Hiram Soltren, J. Facebook: Threats to privacy,
http://groups.csail.mit.edu/mac/classes/6.805/student-papers/fall05papers/facebook.pdf, 2005, downloaded: 30. 4. 2013.
[5] Kovačič, M. Zasebnost na internetu. Ljubljana: Mirovni inštitut, 2003.
[6] Raba interneta v Sloveniji (2013) Na Facebooku skoraj 750.000 slovenskih
uporabnikov, http://www.ris.org/db/27/12535/Raziskave/Na_Facebooku_
skoraj_750000_slovenskih_uporabnikov/?&p1=276&p2=285&p3=1318,
downloaded: 17. 7. 2013.
[7] Raba interneta v Sloveniji (2012) Povprečen Facebook uporabnik ima 130
prijateljev, http://www.ris.org/db/26/12323//Povprecen_Facebook_
uporabnik_ima_130_prijateljev/?q=prijateljifacebook&qdb=26&qsort=0,
downloaded: 19. 7. 2013.
[8] Facebook. Statement of rights and responsibilities,
https://www.facebook.com/legal/terms, downloaded: 21. 5. 2013.
[9] Kučić, L. J. Počutim se kot nori okoljevarstvenik iz šestdesetih let. Delo, Sobotna
priloga, 12. november 2011, pp. 18-19.


New Ways to Manage Communication with Customers on the Internet
Andrej Kovačič, Ph.D.
Faculty for Media
Leskoškova 9D, 1000 Ljubljana, Slovenia
{andrej.kovacic@ceos.si}
Nevenka Podgornik, Ph.D.
School of Advanced Social Studies in Nova Gorica
Gregorčičeva 19, 5000 Nova Gorica, Slovenia
{nevenka.podgornik@fuds.si}
Abstract: The internet is expanding rapidly and as such it seems to provide
numerous potential advantages to marketers in terms of reducing costs,
expanding time horizons, and increasing the organization's reach. As
a result, the academic discussion on this topic is abundant. However,
the majority of the literature too often addresses the analysis of
consumers and the effects of marketers separately and from an economic
perspective. We believe the combined approach focusing on the need for
a relationship has not received the attention it deserves in the
transition to the digital economy. The main aim of this article is
thus to provide an analytical evaluation of the communication between
consumers and marketers forming and maintaining a long-term
relationship on the internet.
Keywords: internet, communication, relationship marketing

1. Introduction
Relationship marketing can be defined, according to Kotler [3], as the "... practice of
building long-term satisfying relations with key parties - customers, suppliers,
distributors - in order to retain their long-term preference and business". The principal
aim of relationship marketing is thus to build, develop, and maintain the relationship
with the customer as an individual, rather than target an anonymous mass broken up
into homogenous segments. This approach is according to Cova [2] justified by the
following arguments:
If the market can no longer be cut up into stable segments, the only alternative
is to address consumers individually.
If the consumers are unpredictable, it is not as important to predict their
behavior as to be able to react immediately to their aspirations through
maintenance of a continuous relationship.
If the consumers wish to differentiate themselves, they ask for personalized
products and services, which only a continuous, close relationship can assure.
Although the argumentation may be new, the concept of relationship marketing is not.
Different marketing techniques have long been tested to establish a personalized and
individual relationship with the customer. Nevertheless, it is with the new information and
communication technologies used on the internet that some relationship marketing
strategies become affordable and economically acceptable (for example with Facebook).
According to the Cross-Channel Marketing Report (based on a 2013 survey of
nearly 900 companies), just 22% of companies conduct no relationship marketing
at all, whereas 30% of companies say they are very committed to relationship marketing
and 46% are committed to a certain extent [11].

2. Change in value for consumers


Besides enabling some new marketing strategies, the new environment has completely
changed the way companies communicate and deliver value to their customers. On the
internet the burden is on marketers to animate consumers who are now actively
involved in their own message construction. In fact many marketers are saying that this
phenomenon is changing the whole field of marketing in fundamental ways.
The first factor which affects building a relationship with consumers is technology.
Internet, for example, makes certain strategies affordable and expands the capacity for
relationship marketing. As Sisodia and Wolfe [9] argue, there is a " ...symbiotic
relationship between technology advances and the change in marketing paradigm
toward relationship marketing." This cycle can clearly be seen in Figure 1.
Figure 1: A cycle between IT and Relationship marketing [9]
(diagram nodes: Changing Customers, Increased Competition, Greater Demand for
Relationship Marketing, New Technology Development, IT Revolution, Relationship
Marketing More Affordable & Effective)

A greater demand for relationship marketing triggers new technology development.
With new technology, relationship marketing becomes more affordable and effective.
Figure 1 also shows us that the incentive for relationship marketing comes in part from
the developments in information technology, from rising customer expectations,
greater competitive pressures, and changes in marketing thinking.


The second driver towards relationship marketing is a change in customer behavior.
Some marketing principles may never change; for example, customers trust good brands
and companies. Nevertheless, as the new interactive medium gains in importance and
companies are losing market share, many marketers want to understand which of
their repeatedly tested concepts might need modification.
The internet is not like other media, and only by understanding its characteristics
can marketers extend their internet marketing benefits to their full capacity. In the new
environment marketers have to learn the new "rules on the market" to be able to implement
the (sometimes radical) necessary changes towards relationship marketing. By
understanding the new environment, marketers may at the same time discover that many
of these "rules" speak in favor of, and may to some extent justify, a relationship marketing
approach.

3. New "rules" for marketers


With the information age, marketers face changes in the consumer markets, and the new
rules can be summarized with the following points. First, the internet becomes a collection
of niche communities. Despite the millions of users, the internet must not be understood
as a mass medium. Secondly, there is a power shift from a passive audience to online
consumers who actively select which internet content they are interested in. They decide
what marketing information they will receive, about which products and services, and
under what conditions. In online marketing, the consumer, with control of the mouse,
controls the interaction. Third, marketers face increased velocity in delivering value to
their customers and in changing the environment. Everyone, including the competitors,
knows instantly what the other is doing. Fourth, knowledge management becomes vital.
As Strauss and Frost [10] stress, the technology automatically records actions of users in
a digital format that can be easily, quickly, and mathematically calculated and analyzed.
Marketing managers can track the results of their strategies as they are implemented.
Organizations build large databases of information. This information can then be turned
into marketing knowledge and can provide a strong tool for implementing relationship
marketing strategies. Finally, there is now a shift towards a new matrix organizational
structure and thus towards a new interdisciplinary approach in marketing [10].
The marketing department can no longer develop and implement strategies alone. Social
sciences and consumer psychology play an increasingly important role in
understanding the true nature of the relationship between consumers and companies.
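As a rough illustration of how such automatically recorded actions can be turned into marketing knowledge, the sketch below aggregates a hypothetical action log into per-customer profiles. The log format and field names are invented for the example and do not come from Strauss and Frost.

```python
# Minimal sketch: aggregating a hypothetical log of recorded user actions into
# per-customer profiles. Field names and records are invented for illustration.
from collections import defaultdict

actions = [
    {"customer": "c001", "action": "view",     "product": "bike"},
    {"customer": "c001", "action": "purchase", "product": "bike"},
    {"customer": "c002", "action": "view",     "product": "tent"},
    {"customer": "c001", "action": "view",     "product": "helmet"},
]

profiles = defaultdict(lambda: {"views": 0, "purchases": 0, "products": set()})
for a in actions:
    profile = profiles[a["customer"]]
    key = "purchases" if a["action"] == "purchase" else "views"
    profile[key] += 1                       # count the action
    profile["products"].add(a["product"])   # remember which products the customer touched

for customer, p in sorted(profiles.items()):
    print(customer, p["views"], p["purchases"], sorted(p["products"]))
```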

4. Forming a relationship - do more with existing consumers


A company's demand can derive from new customers and repeat customers. Traditional
marketing theory and practice have focused on attracting new customers, while little was
done to keep them. This preoccupation with customer acquisition rather than customer
retention has been criticized as a 'leaky bucket' approach to business [5]. As long as
enough new customers are acquired to replace those existing customers lost through the
hole in the bucket, success in the form of sales is achieved. Mudie [5] has estimated that
most organizations lose significantly more than 30 per cent of their customers before or
at the time of a repurchase decision, mainly through poor service and neglect of the
relationship. In the past, growing markets meant a sufficient supply of new customers.
Companies could keep filling the marketing "bucket" with new customers without
worrying about losing old customers through holes in the bottom of the "bucket" [5].
Today, on the other hand, as Kotler and Armstrong [4] argue, companies are facing
some new marketing realities and the emphasis is shifting. Among these realities are changing
demographics, a slow-growth economy, more sophisticated competitors, over-production
capacity and fewer new customers. Consequently the costs of attracting new
customers are rising, and according to Kotler and Armstrong [4] it now costs five times
as much to attract a new customer as to keep a current one. Additionally,
companies are also realizing that losing a customer means not only losing a single sale
but losing the entire stream of purchases that the customer makes in a lifetime.
Finally, the problem of traditional transaction marketing is that an increasingly
turbulent and fragmented market demands a greater and greater variety of goods and
services. Transaction marketers focusing on the sale tend to respond to these needs by
developing more and more products. However, in the end, as Payne [6] argues,
companies only bombard their customers with too many choices, among which the
customers cannot find the right one for their individual needs - customers do not want
more choices. They want exactly what they want, when, where, and how they want it.
Technology can be combined with relationship marketing knowledge to make it
possible for companies to deliver better services and products. First, interactive database
technology permits companies to gather large amounts of data on individual customers'
needs and preferences. Second, information technology and flexible manufacturing
systems enable companies to customize large volumes of goods or services for
individual customers at a relatively low cost [7]. Apart from designing strategies to
attract new customers and create transactions with them, companies are now practicing
relationship marketing. The emphasis is on maintaining profitable long-term relationships
with customers by creating superior customer value and satisfaction. The shift towards
relationship marketing thinking can be seen in Figure 2.
Figure 2: The shift towards relationship marketing
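Kotler and Armstrong's retention argument can be made concrete with a rough customer lifetime value calculation. The sketch below is purely illustrative: the yearly margin, retention rate, discount rate and acquisition cost are assumed numbers, not figures from the cited sources.

```python
# Rough, illustrative customer lifetime value (CLV) calculation. All numbers are
# assumptions made for the example, not data from the cited sources.
def lifetime_value(yearly_margin, retention_rate, discount_rate, years):
    """Discounted stream of margins expected from one retained customer."""
    return sum(
        yearly_margin * (retention_rate ** t) / ((1 + discount_rate) ** t)
        for t in range(years)
    )

clv = lifetime_value(yearly_margin=100.0, retention_rate=0.8, discount_rate=0.1, years=10)
acquisition_cost = 50.0                  # assumed cost of winning a brand-new customer
retention_cost = acquisition_cost / 5    # the "five times cheaper" rule of thumb from [4]

print(f"lifetime value ~ {clv:.0f}, acquisition cost {acquisition_cost:.0f}, "
      f"retention cost {retention_cost:.0f}")
```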

The relationship concept at first appeared to be suitable only for a niche market of rich
clients. Today modern IT, particularly the new interactive medium of the internet, provides an
opportunity to bring personalized and customized products to the mass market at a
mass-produced price. Nevertheless, this concept requires new thinking that breaks away
from the traditional concepts of transaction-oriented mass marketing and mass
production. The most important differences are summarized in Table 1.
Table 1: The differences between transactional and relationship marketing [1]

Transactional paradigm concept | Relationship paradigm concept | Comments
Market segment | Individual customer | Transactional marketing identifies a statistical customer - the hypothetical human who is composed of statistically averaged attributes drawn from research. Relationship marketing focuses on individual customer needs.
Duration of transaction | Lifetime relationship | The pursuit of customer loyalty is more of a journey.
Margin | Life-time | The justification of a relationship approach is the life-time value of these prospective customers, not the unit sale.
Market share | Most valued customers | For companies implementing relationship marketing, relationships with the customers are more important than the current and unstable market share.
Mass marketing monologue | Direct marketing | The new marketing requires feedback as a two-way communication.
Passive consumers | Empowered clients | Transactional marketing is all about seduction and propaganda and it depends on a passive, narcotized receptor - the legendary "couch potato".
Despite the apparently clear separation, many companies find themselves somewhere
between the two concepts [8]. This happens mainly because companies are often
reluctant to change quickly and radically, especially at this early stage of relationship
marketing adoption, where everyone is testing new concepts and waiting for results. In
the future, however, wider gaps are expected between companies implementing
relationship marketing and those not implementing it.

5. Conclusion
Understanding the internet and the relationships that can be sustained over a long period is
undoubtedly important for marketers implementing strategies in the future. On the one
hand, the internet empowers relationship marketing. On the other hand, implementing
the technology itself is not enough. The two forces are empowering each other, and
therefore instead of addressing them separately we need to focus on a combined
sociological as well as economic approach.

The basic idea of relationship marketing is to focus mainly on keeping the existing
customers. It seems that the parallelism between the forces of technological change and
radically new thinking in marketing has been highly fortuitous. The internet has emerged
just in time to allow marketers to implement many essential aspects of relationship
marketing. The new technologies have become more affordable, enabling marketers to
deploy them more widely. The internet's nature as a collection of niche communities and the new
rules, such as the power shift, increased velocity, the importance of knowledge
management, and the interdisciplinary focus, can all be seen as reasons for applying
relationship marketing on the internet. To what extent relationship marketing can be
applied to other markets (suppliers, referral, internal, influence and recruitment) remains
to be seen. In addition, it is yet to be investigated what the cross-influences between the
markets are and what additional benefits can be expected when relationship marketing is
applied to every aspect of a company.
The internet has made the relationship between the marketer and the consumer increasingly
personalized, whether consumers or companies like this change or not. This
enables greater potential for one-to-one relationship marketing, a new relationship
marketing paradigm. The internet's one-to-one, interactive nature is thus a good environment
for building customer relationships. It is bringing companies and customers closer
together to learn from each other. A vital element in relationship marketing is the
information about the consumer's needs. The necessary information is increasingly
received through one-to-one as opposed to the previous many-to-many communication
model.
The internet has also been particularly influential in driving marketing toward mass
customization. Information has to be turned into marketing knowledge used for mass
customization, so that companies can deliver products and services tailored to individual
needs. Mass customization heralds a new era in product design, marketing as well as
manufacturing. For many products, customization can now be accomplished on a mass
basis for little or no more cost than standardized production.

References:
[1] Chaffey, Dave, Mayer, Richard, & Johnston, Kevin (2000): Internet Marketing:
Strategy, Implementation and Practice. Harlow: Pearson Education Ltd.
[2] Cova, Bernard (1999): "From Marketing to Societing: When the Link Is More
Important than the Thing." Rethinking Marketing: Towards Critical Marketing
Accountings. Ed. Douglas Brownlie, Mike Saren, Robin Wensley et al. London,
California, New Delhi: SAGE Publications, pp.: 64-84.
[3] Kotler, Philip (1997): Marketing Management. Analysis, Planning, Implementation
and Control. 8th ed. New Jersey: Prentice Hall International, inc.
[4] Kotler, Philip, & Armstrong, Gary (1997): Marketing: An Introduction. 4th ed. New,
Jersey: Prentice- Hall International, inc.
[5] Mudie, Peter (1997): Marketing: An Analytical Perspective. London: Prentice Hall
Europe
[6] Payne, Adrian (2000): "Relationship Marketing: The UK Perspective." Handbook of
Relationship Marketing. Ed. Jagdish N. Sheth, Atul Parvatiyar. Thousand Oaks:
Sage Publications Inc., pp.: 39-69.
[7] Peppers, Don, Rogers, Martha, & Dorf, Bob (2000):"Do You Want to Keep Your
Customers Forever." Markets of One, Creating Customer-Unique Value through
Mass Customization. Ed. James H. Gilmore, Joseph B. Pine II. Harvard: Harvard
Business Review Books, pp.: 75-99.


[8] Rapp, Reinhold, & Giehler, Miriam (1999): "Relationship Marketing im Internet."
Handbuch Relationship Marketing: Konzeption und erfolgreiche Umsetzung. Ed.
Adrian Payne, Reinhold Rapp. München: Verlag Franz Vahlen GmbH, pp.: 275-291.
[9] Sisodia, Rajendra S., & Wolfe, David B. (2000):"Informational Technology: Its Role
in Building, Maintaining and Enhancing Relations." Handbook of Relationship
Marketing. Ed. Jagdish N. Sheth, Atul Parvatiyar. Thousand Oaks: Sage Publications
Inc., pp.: 525-56.
[10] Strauss, Judy, Frost, Raymond, & El-ansary, Adel (2008). E-Marketing. 8th ed.
New Jersey: Prentice Hall International, inc.
[11] Web Cross-Channel Marketing Report (2013). available on
http://econsultancy.com/si/reports/cross-channel-marketing-report


Selecting a cross-platform framework for mobile application development - food market communities LOKeT
Matej Mertik
Faculty of Information Studies
University of Novo mesto
Sevno 13, 8000 Novo mesto, Slovenia
{matej.mertik}@fis.unm.si

Abstract: Ensuring food security and promoting sustainable agricultural practices are
two major global challenges for European agriculture, which will be in the front line for
many of the challenges that European society is facing today; different strategies and
top-down approaches have been implemented to address these issues [1]. In today's
information society such top-down approaches alone are not sufficient to be successful.
In this paper we present some of the development issues of LOKeT, a Pilot
Programme of local food market mobile services that was designed to support an
alternative bottom-up, open source approach by strengthening local food production.
The project was co-financed by the European Regional Development Fund and the
Slovenian Ministry of Education, Science and Sport.
Key Words: mobile application, phone web-based application, local food systems,
opensource

1 Introduction
Sustainable development might be additionally supported by innovative mobile services
that easily connect consumers and producers. One such service is the open source
platform LOKeT, which was developed in a collaboration between academia and local
industry. LOKeT is a cross-platform open source mobile application that connects
producers and consumers of local/regional food, where the user can plan a trip to collect
his or her weekly basket of local food. It addresses producers (touristic farms, family
farms, processors) in a defined area on the one side and consumers of locally produced
food on the other. LOKeT provides the following functionality for producers: (a) easy
entry of the producer's products, (b) easy entry of opening hours during the weekend,
(c) creating and editing the producer's local market (adding or removing products), and
(d) editing the properties of a product (price, availability). The following consumer
functionality is supported: (a) access to the list of producers around the consumer's
current position, (b) access to the map according to a list of products, (c) a list of basket
items, and (d) display of the distance and direction to producers. The LOKeT open
source platform was developed for various mobile platforms. Due to the limited
development time and the requirement to support several platforms - iOS, Android,
Windows Phone and BlackBerry - the selection of an appropriate platform and
development tool was necessary. As with all things in software development, for
LOKeT this choice depended on the specifications and requirements, as there are no
silver bullets.
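Consumer functionality (d), the display of distance and direction to a producer, reduces to computing the great-circle distance and the initial bearing between two GPS points. The sketch below shows one standard way to do this; the coordinates are made up and the code is not taken from the actual LOKeT implementation.

```python
# Illustrative distance and direction between a consumer's position and a producer,
# using the standard haversine and initial-bearing formulas. Coordinates are made up;
# this is not the actual LOKeT code.
import math

def distance_and_bearing(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)

    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    distance_km = 2 * r * math.asin(math.sqrt(a))

    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    bearing_deg = (math.degrees(math.atan2(y, x)) + 360) % 360

    return distance_km, bearing_deg

# Example: a consumer near Novo mesto and a producer a few kilometres away (made-up points).
print(distance_and_bearing(45.804, 15.169, 45.830, 15.250))
```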


2 Multiple phone web-based application framework


With mobile device manufacturers each having their own preferred development
environment, and with the growth of mobile phone applications that are World Wide
Web capable, web-based application frameworks have arisen to help developers
write applications that can be deployed on multiple devices. A multiple phone
web-based application framework is software designed to support the
development of phone applications that are written as embedded dynamic websites and
may leverage native phone capabilities. There are many platforms on the market. In this
paper we present two possible approaches (PhoneGap and Appcelerator Titanium) with
their strengths and weaknesses, and the platform finally selected for LOKeT.

2.1 PhoneGap
is a mobile development framework produced by Nitobi and later purchased by Adobe
Systems [2,3]. It enables software programmers to build applications for mobile devices
using JavaScript, HTML5, and CSS3, instead of device-specific languages such as
Objective-C or Java. The resulting applications are hybrid: they are neither
truly native (all layout rendering is done via web views instead of the platform's native
UI framework) nor purely web-based (they are not just web applications, but are
packaged as applications for distribution and have access to native device APIs).
PhoneGap was considered as a cross-platform framework for LOKeT development
against its requirements specification.
Strengths
it is small and simple
it is a web browser-based application (any native platform that supports a web
view can be a PhoneGap platform)
the requirements for registering native code to receive messages from the web view
are very modest
simple native extensions can be developed rapidly
supports the HTML, CSS and JavaScript web languages
Weaknesses
the quality of the user interface in a PhoneGap application will vary
the WebKit-based rendering engine is strong on iOS but limited on Android
the standard cross-browser issues familiar to web developers remain
it cannot be extended with native user interface components
very few native APIs are exposed to PhoneGap applications by default, which
makes platform integration limited

2.2 Appcelerator Titanium

is a platform for developing mobile, tablet and desktop applications using web
technologies. It was introduced in December 2008 [4,5]. Support for iPhone and
Android-based mobile applications was added later, followed by support for developing
iPad-based tablet and BlackBerry applications. Appcelerator Titanium Mobile is one of
several multiple phone web-based application frameworks allowing web developers to
apply existing skills (such as JavaScript) to create native applications for the mentioned
platforms.
Strengths
access to a wide array of native features and functionality out of the box
extensions with visual components reduce the functionality gap between Titanium
and pure native applications
the look and feel of a Titanium application uses common native UI widgets
a high-level native programming API in JavaScript

Weaknesses
supporting each native platform is a massive undertaking
some of the user interface components may not yet perform as well as their native
counterparts

3 Crossplatform for LOKeT development


While this might seem a rather simple decision, it usually is not. Based on the open
source philosophy of the LOKeT application and the properties of its architectural
design, PhoneGap as well as Titanium meets most of its requirements. However,
considering the open source platform philosophy, the following disadvantages of
PhoneGap were taken into consideration: (a) the strong iOS but limited Android WebKit
rendering engine, and (b) the PhoneGap philosophy. PhoneGap web applications are
wrapped in a native application shell, and can be installed via the native application
stores for multiple platforms. In one respect this could become a reality in the future;
however, no platform makes web applications first-class citizens today. Developing in
such a way also usually means strong restrictions on extension with native user
interfaces. On the other hand, Titanium seems a much more complex development tool,
which still has bugs and a weakness in user interface components that may not yet
perform as well as their native counterparts. However, the high-level native
programming API in JavaScript, access to a wide array of native features and
functionality out of the box, and a real mobile application at the end (not a web
application wrapped in a native shell) are the reasons that Appcelerator Titanium was
used as the cross-platform framework for LOKeT open source development.

4 Conclusion and final remarks


LOKeT, a Pilot Programme of local food market mobile services, was designed to
support an alternative bottom-up open source approach. The project was co-financed
by the European Regional Development Fund and the Slovenian Ministry of Education,
Science and Sport, and represents an open source platform where producers and
consumers of local food can be easily integrated. As the specification requirements
strictly demanded support for a wide range of mobile application platforms (Android,
iOS, Windows Phone, BlackBerry), the use of a cross-platform development framework
was necessary given the project's time scope. In this article we presented some of the
strengths and weaknesses of current mobile cross-platform frameworks and the reasons
for selecting the appropriate one for this particular project. Some final remarks remain.
Using a cross-platform framework turned out to be an obstacle when finalizing some of
the UI elements for the different platforms, exposing some of the weaknesses mentioned
in the comparison above. It was also decided that future open source development will
target only one native platform, considering the complex nested structure of statements
required when providing a solution for multiple platforms. In this sense we selected
Android, as the most widely used open source application platform, for future
development.


5 Acknowledgements
The development of the application was co-financed by the European Regional
Development Fund and the Slovenian Ministry of Education, Science and Sport [6].

6 References
[1] Opinion of the Committee of the Regions on Local food systems (outlook
opinion), Official Journal of the European Union, C 104/1
[2] Adobe Announces Agreement to Acquire Nitobi, Creator of PhoneGap.
Adobe.com. 2011-10-03.
[3] PhoneGap, PhoneGap, http://phonegap.com/, downloaded: October 1st 2013
[4] Appcelerator Raises $4.1 Million for Open Source RIA Platform. Techcrunch.
[5] Appcelerator, Appcelerator, http://www.appcelerator.com/, downloaded: October 1st
2013
[6] Faculty of Information Studies. LOKeT, http://loket.fis.unm.si, downloaded:
October 1st 2013
[7] IITP, Building food market communities with the Opensource LOKeT Project,
http://www.iitp.org.nz/conference2013/MatejMertik, downloaded: October 23rd
2013


Simulating Business Processes


TAD methodology for process pattern assessment in weakly defined organizational formations

Jernej Agrež, Nadja Damij
Faculty of Information Studies in Novo mesto
Sevno 13, 8000 Novo mesto, Slovenia
{jernej.agrez, nadja.damij}@fis.unm.si

Abstract: This paper presents an approach to assessing process patterns within weakly
defined organizational formations with the TAD methodology, which was developed for business
process reengineering. The significant contribution of the paper is a set of proposals on how the
methodology should be adjusted in order to make it fully capable of mapping process-based
patterns within highly unstable organizational environments. We also reveal the importance of
individuals who are present within the organizational environment and who contribute to
the instability of the organizational formations. Not only do they influence process pattern
dynamics based on activity flow, they also create important deviations from the primary
list of entities, causing a highly unstable organizational structure.
Key Words: TAD, process patterns, colouring, organizational formation, entity

1 Introduction
Process patterns within weakly defined organizational formations can be assessed with
an adjusted methodology that was originally developed for process analysis in formal
organizations. In this paper we first introduce the theoretical background of process
architecture and patterns within an organizational environment. We also present the motivation
for using the TAD methodology when analyzing process patterns. In the following section we
present the reality show Survivor, which we used as a case of an environment built of weakly
defined organizational formations. Later in this section we continue with the reasoning behind
the adjustments of the TAD methodology needed to make it fully capable of mapping
organizational formations' process patterns. Before the conclusion we describe the pattern-based
behavior of organizational formations that we were able to detect during the research.

2 Process architecture
Process architecture is a structural element of an organization, present in global enterprises
[1] as well as within informal organizations [2]. It defines the structure of the processes and the
relations among them from the perspective of the whole organization or of individual
organizational fragments [3]. Process architecture can be identified and modeled [4], with the
aim of better understanding the process-based structure established within the observed
organizational environment. Through modeling and diagrammatic description we get clear
insights into which patterns form the process architecture [5] and gain the ability to categorize
patterns into those that highly influence process-based organizational behavior and those that
emerge without any important influence [6].


2.1 Process patterns in formal organization


Stable, well-structured process patterns that form the process architecture appear as
goal-oriented structures that follow specific business objectives [7]. Such patterns present the
healthy core of business processes and can be used as a foundation for business process
reengineering [8]. For the purposes of pattern-based reengineering, a pattern repository is
created and put to use in the form of a cloud. Within the cloud, functional patterns are used to
create those parts of the business process where they can create highly optimized effects [9].
At the same time we can also identify semi-structured patterns that deviate from a stable
structure, but can still be mapped and optimized. Beside these, there remain activity-flow-based
fragments that create no structured pattern, but can be mapped as loose block
structures [10].

2.2 Process patterns in organizational formation


Organizational formations as an informal social structure are themselves patterns of
relations among the individuals who are actively involved in the formation [11]. The existing
relations are structural links that create the shape of the organizational formation, but at the
same time goal orientation is the primary driver of the formation's existence and runs its
processes [12]. The organizational behavior of such formations is constrained by the involved
individuals and the attributes they bring to the formation [13], which means that a highly
fluctuating environment creates an unstable process framework and low-maturity process
management. If a formation is capable of creating a stable entity core of relation-based,
connected individuals, this motivates the rise of process stability, but at the same time it
does not necessarily imply a capability of process control and management [14]. Even though
the available internal process control mechanisms are not put to any form of systematic and
mature usage, the limited instability still creates an organizational environment that allows
detecting, mapping and predicting process patterns on the basis of the stable entity core.

2.3 TAD methodology for pattern assessment


The Tabular Application Development (TAD) methodology consists of several phases that are
used to map and describe a process-oriented organizational system [15]. TAD presents an ideal
methodological solution for the assessment of process patterns within business organizations,
because it creates a process map based on the activity flow and on the entities that are related to
the different activities. In comparison with other methodologies, which are mostly based on a
diagrammatic presentation of the activity flow without considering the aspect of entities, TAD
gives us the possibility of building process patterns based on entities also within less formal
organizational environments.

3 Survivor reality show process patterns assessment case study


The Survivor reality show is a competition produced in seasons. One season lasts
thirty-nine days and includes twenty participants competing for the same goal, a
reward of one million dollars. The main objective of the competition is to avoid
elimination from the game and to become the last person of the original group of
participants, who is selected as the winner by a jury of previously eliminated competitors. At
the early stage of the game, participants are divided into separate groups. This creates
motivation for rivalry between the groups and usually puts competition within a
single group in second place. Later during the game, the groups merge into one group,
where every single participant aims to remain in the game as long as possible and
eventually to become the sole survivor, the only remaining player, who wins the prize. The only
way to be eliminated from the game is, besides serious illness or injury, to be voted out by
the other participants. The ambition to avoid elimination creates alliances within the groups,
which present good examples of goal-oriented organizational formations.

3.1 Organizational formation and process definition


Given the limited timeline and diverse dynamics of the Survivor season that we
analysed, we were able to detect that organizational formations emerged with low stability
and were subject to constant change. To be able to detect patterns, we divided the
timeline into four phases that follow the unfolding of the Survivor competition: the early phase,
the phase before merging, the phase after the merge, and the phase with only 5 players remaining.
All four phases consist of the same processes and entities, which are formed on the basis of the
attributes that create the most obvious divisions among participants. To be able to track the
process patterns occurring during all four phases, we followed the emergence of different
formations, which we defined as a pair or group of participants that work systematically to
achieve the set goal. Every formation consists of at least one entity that is formed out of at
least one participant. Through such an approach we used the Activity tables of the TAD
methodology as the main platform and added elements that describe formations, entities and
individual participants. We also implemented previously tested pattern colouring for easier
recognition and comparison of the patterns.
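To illustrate what such an extended activity table might look like as a data structure, the sketch below links each activity to a formation, an entity, an individual and a pattern colour, and then groups the activities by colour. The activities, names and colours are invented for illustration and do not reproduce the actual Survivor data or the TAD tool itself.

```python
# Illustrative sketch of an extended TAD-style activity table: each row links an
# activity to the formation, entity and individual performing it, plus a pattern
# "colour" label. All values are invented, not the actual Survivor data.
activity_table = [
    {"activity": "form alliance",  "formation": "F1", "entity": "E1", "individual": "P03", "colour": "red"},
    {"activity": "win immunity",   "formation": "F1", "entity": "E1", "individual": "P03", "colour": "red"},
    {"activity": "vote out rival", "formation": "F1", "entity": "E2", "individual": "P07", "colour": "blue"},
    {"activity": "form alliance",  "formation": "F2", "entity": "E3", "individual": "P11", "colour": "red"},
]

def patterns_by_colour(table):
    """Group activities by pattern colour and record which formations use each pattern."""
    patterns = {}
    for row in table:
        info = patterns.setdefault(row["colour"], {"activities": set(), "formations": set()})
        info["activities"].add(row["activity"])
        info["formations"].add(row["formation"])
    return patterns

for colour, info in patterns_by_colour(activity_table).items():
    print(colour, sorted(info["activities"]), sorted(info["formations"]))
```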

3.2 Results
The results that we collected through the process pattern mapping reveal differences
among the organizational formations that emerged within the reality show and that drove the
dynamics of the game from its beginning until the final phase. Among the sixty-five identified
activities we detected different patterns that were used by organizational formations during
different phases of the game. In the first phase of the game the crucial pattern was building a
strong alliance, which affected success during immunity and reward challenges, causing
irreversible damage to the formation that was not able to establish strong internal
connections. In the second phase it was of key importance to release the original ties and
selectively maintain the physical and mental strength of the formations in order to face the
other competitors as strongly as possible. In the third phase the emergence of small but stable
formations brought success along with the ability to win the challenges that are part of the
game, and the final phase switched the importance to pairs, which were able to set up a proper
strategy for winning the final jury vote. All critical patterns are based on activities that
contribute to setting up, sustaining and adjusting or abandoning formations at the time that the
individuals who populate them consider to be perfect in order to get them closer to the final goal.


Figure 1: Process architecture and pattern colouring

4 Conclusion
An important attribute of weakly defined organizational formations, when trying to
map and assess their process patterns, is the low-level stability of the formation itself, which
consequently shortens the existence of certain patterns that are not necessarily highly
unstable themselves. To be able to detect, and at the same time evaluate, the dynamics of the
patterns, we must observe the organizational behavior of formations in several time sections.
The more sections we can identify during the whole timeline, the more precisely we can assess
the changes among organizational formations and their process patterns. Given the highly
unstable environment, it is of crucial importance to be able to assess patterns based not only on
the entity-activity relation, but also on the individual-activity relation. Even though we have the
ability to define entities within organizational formations, they are subject to
individual decision making and behavioral twists, which introduces additional dynamics
based on entity change. To be able to fully understand process evolution patterns within
weakly defined organizational formations, entity change in an unstable environment
presents a good topic for further research.


5 Acknowledgements
Work supported by Creative Core FISNM-3330-13-500033 'Simulations' project funded
by the European Union, The European Regional Development Fund. The operation is
carried out within the framework of the Operational Programme for Strengthening Regional
Development Potentials for the period 2007-2013, Development Priority 1:
Competitiveness and research excellence, Priority Guideline 1.1: Improving the
competitive skills and research excellence.

6 References
[1] Barros, O; Julio, C. Enterprise and process architecture patterns. Business Process
Management Journal, 17(4): 1463-7154, 2011.
[2] Morton, S.C; Brookes, N.J; Smart, P.K; Backhouse C.J; Burns, N.D. Managing the
informal organisation: conceptual model. International Journal of Productivity and
Performance Management, 53(3): 214-232, 2004.
[3] Barros, O. Business process trends, http://bptrends.com/publicationfiles/05-07-ARTBusiness%20Processes%20and%20Design-Barros.pdf , downloaded: July 25th, 2013.
[4] Gersch, M; Hewing, M; Schöler, B. Business Process Blueprinting - an enhanced view
on process performance. Business Process Management Journal, 17(5): 732-747, 2011.
[5] Paul, C. J. The process of building a Process Manager: Architecture and design patterns.
IBM Systems Journal, 46(3): 479-495, 2007.
[6] Wang, L; Hsi-Yung, F; Cai, N. Architecture design for distributed process planning.
Journal of Manufacturing Systems, 22(2): 99-115, 2003.
[7] Andersson, B; Bider, I; Johannesson, P; Perjons, E. Towards a formal definition of goaloriented business process patterns. Business Process Management Journal, 11(6): 650-662,
2005.
[8] Barros, O. Business process patterns and frameworks. Business Process Management
Journal, 13(1): 1463-7154, 2007.
[9] Nowak, A; Fehling, C; Leymann, F. Pattern-driven green adaptation of process-based
applications and their runtime infrastructure. Computing, 94(1): 463-487, 2012.
[10] Ouyang, C; Dumas, M; Hofstede, A; van der Aalst, W. Pattern-based translation of
BPMN Process Models to BPEL Web Services. International Journal of Web Services
Research, 5(1): 42-62, 2008.
[11] Bratton, M. Formal versus informal institutions in Africa. Journal of Democracy,
18(3): 96-110, 2007.
[12] Toni, A, F; Nonino, F. The key roles in the informal organization: a network analysis
perspective. The Learning Organization, 17(1): 86-103, 2010.
[13] Scott, R. Informal politics. The Review of Politics, 69(3): 497-502, 2007.
[14] Röglinger, M; Pöppelbuß, J; Becker, J. Maturity models in business process
management. Business Process Management Journal, 18(2): 328-346, 2012.
[15] Damij, T. An object-oriented methodology for information systems development and
business process reengineering, Journal of Object - Oriented Programming, 13(4): 23-34,
2000.


Inclusion of tacit knowledge in the simulation of business processes

Grzegorz Majewski, Nadja Damij
Institute for process management,
Faculty of Information Studies, Ulica talcev 3, Novo mesto, Slovenia
{g.majewski, nadja.damij}@fis.unm.si
Abstract. This paper investigates potential methods of including tacit knowledge
in the business process simulation agenda. It identifies key characteristics of tacit
knowledge and how they can relate to business process simulation. After that it
proposes a generic approach based on the widely known black box
technique. It distinguishes the various situations that may occur when considering the
simulation of tacit knowledge.
This work is supported by Creative Core
FISNM-3330-13-500033 'Simulations' project funded by the European Union, The
European Regional Development Fund. The operation is carried out within the
framework of the Operational Programme for Strengthening Regional Development
Potentials for the period 2007-2013, Development Priority 1: Competitiveness and
research excellence, Priority Guideline 1.1: Improving the competitive skills and
research excellence.
Key Words. business process simulation, business process modeling, tacit knowledge,
black box

1 Introduction
Knowledge is widely regarded as a critical asset in modern organizations
(Teerajetgul and Chareonngam, 2008). It has been seen as a source of income generation
and a way to gain competitive advantage in profit-oriented organizations. However, in
order to achieve these goals organizations face the challenge of properly managing their
knowledge. Knowledge management can be perceived as a fundamental process for
managing organizations (Bailey and Clarke, 2000). This process is both technology- as
well as people-dependent. Knowledge itself has been classified according to a variety of
criteria. One such classification was made by Nonaka and Takeuchi (1995). This
classification distinguishes explicit and tacit knowledge. The first kind is easily
codified into a form (e.g. a written or electronic one) which can be transferred to others.
Tacit knowledge is less tangible; therefore the processes of its generation and transfer are
more complex.
Tacit knowledge may be perceived as a combination of education, training and
real-life experiences. An individual may acquire tacit knowledge such as mental models
and technical skills through observation, imitation and practice (Teerajetgul and
Chareonngam, 2008). Furthermore, tacit knowledge can be perceived as such either
because of intrinsic difficulties in sharing it (e.g. its amount, or experiences that have to be
lived through) or because the person, group or organization bearing it may not be fully
aware of it. Polanyi (1996) provides an insight into the essence of tacit knowledge in the
words "We know more than we can tell". Further clarification of these words can be given in
the form of examples such as riding a bicycle or swimming. Once these activities are
learnt it is difficult to pass on the knowledge associated with them. They may however be
learnt by observing and experiencing them. Tacit and explicit knowledge are however
very closely connected to each other, and the distinction between them must be treated with
caution. Tacit knowledge may be possessed by itself; however, explicit knowledge must
rely on being tacitly understood and applied (Senker 1995, p. 426).
As a person, group or organization progresses through real-life experiences, it
acquires a lot of information. Some of this information is acquired unintentionally. The brain
is known to record and store experiences from the senses; however, not all of these
experiences are fully processed. Some of the experiences and information may be
quickly and effortlessly recalled, while others are difficult to remember. They may
however be recalled in special circumstances, when there is a vital need for that. Another
example of tacit knowledge that may exist on the edge of awareness is daily
routines. These almost mechanical activities (although they may differ in nature
depending on the position) are performed in a similar way across the organization, from a
simple manual worker to a top executive. Quite often a given person, when asked how a
given goal or task was achieved, is unable to give a detailed response. This is a
fact familiar to any business process analyst. Moreover, tacit knowledge is
usually built on top of experiences gathered over many years. In order to effectively generate
business knowledge, either explicit or tacit, it is a necessity for business leaders to imitate the
experiences that built that knowledge in the first place. It is a common truth that business
process simulation and modelling are an effective way to analyze even time-consuming
business processes that can span a wide timeframe. In other words, business simulations
provide a field of interaction where multiyear experiences are created in compressed
timeframes (Lefebvre, 2011). In this view, business process simulation serves as a tool
that can be utilized both to gain insight into not fully recognized business processes and
as a tacit knowledge transfer method that mimics long-life experiences.
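One way to read the black box idea in a simulation setting is that the analyst does not try to model how an experienced worker decides, but only reproduces the observed outcome distribution of that worker's activity. The sketch below illustrates this with made-up duration data; it is an assumption-laden illustration, not the implementation proposed in this paper.

```python
# Illustrative sketch: an activity that depends on tacit knowledge is treated as a
# black box and represented only by its observed outcomes. Durations are made up;
# this is not the paper's prescribed implementation.
import random

observed_durations = [34, 36, 41, 29, 38, 33, 40, 31]  # minutes, from a hypothetical expert

def black_box_activity(observations):
    """Sample a duration from observed behaviour instead of modelling the decision rules."""
    return random.choice(observations)

def simulate_process(runs=1000):
    total = 0.0
    for _ in range(runs):
        duration = 15.0                                      # explicit, rule-based administrative step
        duration += black_box_activity(observed_durations)   # tacit, black-box step
        total += duration
    return total / runs

print(f"average simulated cycle time: {simulate_process():.1f} minutes")
```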

Literature Review

Nowadays organizations have to operate in unknown environments, which
moreover change rapidly. Decisions in such environments are often guided by intuition
and lifelong experience. Moreover, they often have far-reaching consequences and need
to be taken in a very limited timeframe. There is a crucial need to understand the nature
and role of tacit knowledge as well as possible ways of its dissemination both within the
organization and externally. One way to achieve that is from a process perspective, which
is concerned with the accumulation of implicit knowledge acquired over time in
organizational processes (Venkitachalam and Busch 2012, p. 356). This perspective may,
however, not be enough to adequately comprehend all of the intangible dynamics
intrinsic to tacit knowledge generation, adoption and diffusion. Apart from that, tacit
knowledge is recognized as highly contextual (Busch, 2008). Moreover, its transfer,
interpretation and application require multiple stakeholders. Therefore, only discussing
what tacit knowledge is, or pure investigation of its features, should have lower priority (in
both scholarship and business debate) than examining how tacit knowledge may be
better made use of (Venkitachalam and Busch 2012, p. 357).
There is a crucial need for clarification and understanding of the significance of tacit knowledge and its potential application in certain knowledge management domains. Some of these domains include: the role of tacit knowledge in organizational learning, tacit knowledge codification and transfer techniques, the influence of tacit knowledge on intellectual capital, the use of tacit knowledge in communities of practice, the group aspect of tacit knowledge (knowledge networks and teams), and the relation of tacit knowledge to technology.
Modern organizations have realized the need for and the advantages of investing in developing employee capabilities as part of their training and work environment. However, the influence of the employee profile on the use of tacit knowledge is not adequately evident in the literature (Venkitachalam and Busch 2012, p. 364).
In the past, the knowledge management discipline focused too much on technology (e.g. expert systems), and the human factor was therefore often overlooked. This approach attracted a significant amount of criticism, as it focused its attention on the technology itself and on the design of, for example, intelligent machines using artificial intelligence (AI) techniques, which was often not adequate to real-world challenges. Instead, knowledge management advocates the design of tools, techniques and technologies that augment human capabilities. Apart from that, there are many studies that examined the meaning and definition of tacit knowledge, and comparatively very few studies that have investigated how to analyze tacit knowledge (Venkitachalam and Busch 2012, p. 364).

Individual vs. group tacit knowledge


Knowledge (both explicit and tacit) can be related to and analyzed at the level of the individual or of the group (community or organization) (Nonaka and Takeuchi, 1995; Merx-Chermin and Nijhof, 2005; Teerajetgul and Chareonngam, 2008; Venkitachalam and Busch 2012). At the individual level, knowledge is expected to consist mainly of tacit knowledge, which is not typically articulated but may be codified depending upon the circumstances. At the group level one can expect a greater share of explicit knowledge. This is quite obvious given that knowledge sharing processes (which require some sort of codification) occur more often at the group level. Some authors regard procedural knowledge as a form of tacit knowledge (Colonia-Willner, 2004; Sternberg and Hedlund, 2002; Bossen and Dalsgaard 2005). This sort of knowledge is usually used to carry out daily activities and is relevant to the person making use of it (individual level). At the organizational level, this type of knowledge becomes practical intelligence for the organization (Venkitachalam and Busch, 2012).
The role of groups, teams, communities, networks in the modern organizations and
their approach to tacit knowledge is of crucial importance (Jorgensen, 2004). There are a
variety of factors that influence knowledge sharing processes within teams: e.g. trust,
sense of belonging, composition of teams, culture and technology. Another important thing to remember is that knowledge, and in particular tacit knowledge, is "sticky" by nature
(Szulanski 2003). This indicates that the more valuable the (tacit) knowledge, the less
likely the individual, group, team, community or society is to share or transfer it out. In
this view sharing or transferring such knowledge may mean losing the competitive
advantage over other individuals, groups, teams or organizations. Some authors reveal
that in some cases sharing of knowledge (and in particular tacit knowledge) causes the
individual or team to become less important to the organization (Desouza and Evaristo,
2004). Moreover the more time and resources were devoted to generating such
knowledge the less likely its sharing or transfer is to occur.

Black-Box (BB) approach

Tacit knowledge, due to its nature as explained in the previous sections, may pose difficulties when being included in business process simulation. Moreover, the characteristics of tacit knowledge may differ from one industry to another and from one type of organization to another. In this paper the authors propose a generic approach that can be utilized regardless of the type of industry, organization or particular characteristics of the tacit knowledge. In order to provide such a generic solution, it is necessary to realize that tacit knowledge is very closely related to the individual, group or community and is very difficult to separate from this underlying base: "when recognizing the very contextual nature of tacit knowledge, it makes little sense to attribute properties to knowledge that does not exist outside human consciousness" (Venkitachalam and Busch 2012, p. 361). The first step in including tacit knowledge in business process simulation is therefore the identification of the resource (e.g. individual, group or community) that possesses the tacit knowledge which allows higher productivity. In other words, if a given individual, group or community achieves better results than others, this may be an indication that it possesses strategic tacit knowledge which increases overall productivity. Inclusion of this tacit knowledge may provide greater insight into the internal process flow. This section further investigates a potential solution for including tacit knowledge in business process simulation, describing an approach originating from the well-known Black-Box concept.
Black box is a term in computer science denoting that it is possible to observe, measure and analyze the input(s) and output(s) of a given system, device, program, object, module or application, while it is not possible to gain insight into the internal mechanics of the process that translates the input(s) into the output(s) (Delinchant et al., 2007, p. 369). The Black-Box approach, due to these characteristics, may provide a very good generic starting point for including tacit knowledge in business process simulation. As explained previously, tacit knowledge is intrinsically related to an individual, group or community. It may therefore not be possible to measure its impact on business processes directly, but it may be possible to measure the difference in how input(s) are translated into output(s) in two distinct cases. The first case is when the resource (individual, group or community) that possesses the tacit knowledge is present (i.e. the business process takes full benefit of the tacit knowledge, or rather of the resource that possesses it). The second case occurs when the resource possessing the tacit knowledge is artificially removed from the business process or replaced with a resource which only possesses generic explicit knowledge. The process analyst can easily compare the output(s) of both cases and deduce how important the tacit knowledge is to the successful completion of the business process, how much time or resources can be spared with the use of this particular tacit knowledge, and how well the input(s) will be translated into the desired output(s). Such an analysis is relatively easy to carry out and should bear close similarities to the common tasks performed by process analyst(s). Moreover, its ease of use meets the criteria for a generic approach (i.e. one that can be used in any situation with relatively low effort/cost). Once the information obtained through this step is available, the process analyst can decide whether there is a need to further investigate what particular tacit knowledge may be involved in a particular business process.
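To make the comparison concrete, the following is a minimal sketch of how an analyst might contrast the two cases. It is our own illustration, not part of the original approach; all parameter values and names are assumptions chosen for the example.

```python
# A hedged sketch of the Black-Box comparison: the same process is "run" many
# times with the resource that holds the tacit knowledge and with a generically
# trained replacement, and only the observable outputs are compared.
import random

def run_process(mean_duration_h, defect_prob, n_runs=1000, seed=42):
    """Simulate n_runs of a black-box process; return mean duration and defect share."""
    rng = random.Random(seed)
    durations = [rng.expovariate(1.0 / mean_duration_h) for _ in range(n_runs)]
    defects = sum(1 for _ in range(n_runs) if rng.random() < defect_prob)
    return sum(durations) / n_runs, defects / n_runs

# Assumed, illustrative parameters: the expert (tacit knowledge present) works
# faster and produces fewer defective outputs than the replacement.
expert_time, expert_defects = run_process(mean_duration_h=4.0, defect_prob=0.02)
generic_time, generic_defects = run_process(mean_duration_h=6.5, defect_prob=0.08)

print(f"Extra time per run without tacit knowledge: {generic_time - expert_time:.2f} h")
print(f"Extra defect share without tacit knowledge: {generic_defects - expert_defects:.2%}")
```

The difference between the two sets of outputs is the measurable footprint of the tacit knowledge, obtained without any attempt to open the "box" itself.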
In some cases the reason for knowledge remaining in a tacit state might be cost: "Whether a particular bit of knowledge is in principle articulable or necessarily tacit is not the relevant question in most behavioral situations. Rather, the question is whether the costs associated with the obstacles to articulation are sufficiently high so that the knowledge in fact remains tacit" (Nelson and Winter 1982, p. 82). At this point the process analyst should calculate the costs associated with the obstacles to articulating the tacit knowledge. Should these costs be higher than the potential gains (as observed in the previous step), it would be advisable to simply disregard the tacit knowledge in the simulated process(es). However, it is important to remember that this situation may change in the future (i.e. the costs of articulating the tacit knowledge may drop below the potential gains); it is therefore sensible to monitor changes in the process(es) as well and, should such a change occur, adjust the analysis by including the tacit knowledge.
It is widely accepted that tacit knowledge differs from industry to industry and from one organization to another. The Black-Box approach allows such differences to be disregarded while assessing whether the inclusion of tacit knowledge in the business process simulation is an economically viable option. The next section of this paper provides an insight into the post Black-Box analysis.

Post-BB analysis

Once the Black-Box analysis of tacit knowledge in the business process(es) is concluded, it may be reasonable to analyze the whole situation further. The previous section hinted at some of the most important questions that need to be answered (e.g. whether it is economically feasible to investigate the articulation of tacit knowledge). At this point it is necessary to realize that the proportion and impact of tacit knowledge on a business process may be greatly influenced by the nature of the business process as well as by the industry type or the organization's profile. In the case of individuals who are mainly involved in manual labor (e.g. manufacturing, mining or agriculture), it is possible to state that tacit knowledge (although still important) may make a relatively small contribution to the final outcome of the process that the individual is involved in. The situation changes radically in the case of individuals who are mainly involved in cognitive labor (e.g. software programming, research and development, professional services).

It is possible to state that the replacement (as suggested by the Black-Box approach presented previously) of a manual laborer with extensive tacit knowledge by another who possesses only generic explicit knowledge or training in the field may result only in slight delays or output(s) of slightly lower quality. This statement, however, would be entirely wrong in the case of replacing, for example, a software programmer with years of experience by another possessing only generic training; the difference in productivity in this case may reach a factor of up to twenty (Atwood 2004). On the other hand, manual workers and the corresponding manual processes are more common. Therefore, even a slight improvement in such processes may be multiplied (relatively more easily than in the case of predominantly cognitive processes) by the numerous instances in which such a process occurs. In other words, the inclusion of tacit knowledge in predominantly manual processes that occur very often may result in considerably better outputs or greater savings observable on a mass scale. Predominantly cognitive processes may not even be able to reach such a mass scale, given the lower number of workers involved in them. Moreover, predominantly manual processes usually have a lower cost of articulating tacit knowledge so that it becomes available to other workers, compared with highly cognitive processes.
At this point (having the information presented in the previous section), it is feasible to reconsider the economic side of tacit knowledge and the nature of the processes influenced by it. The process analyst should investigate how often a given process will occur in the real world and then re-evaluate (using the Black-Box approach) whether it is feasible to include the tacit knowledge in further analysis, remembering that any potential benefits may be multiplied by the number of occurrences of the given process in the real world.
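As a rough illustration of this re-evaluation, with entirely made-up numbers, the per-occurrence gain observed in the Black-Box comparison can be multiplied by the expected process frequency and compared with the articulation cost in the spirit of Nelson and Winter's argument:

```python
# A minimal, hedged sketch of the feasibility check: expected gains over a
# planning horizon versus the (assumed) cost of articulating the tacit knowledge.
def articulation_is_worthwhile(gain_per_occurrence, occurrences_per_year,
                               articulation_cost, horizon_years=3):
    """Return True if expected gains over the horizon exceed the articulation cost."""
    expected_gain = gain_per_occurrence * occurrences_per_year * horizon_years
    return expected_gain > articulation_cost

# Frequent manual process: small per-run gain, very many occurrences.
print(articulation_is_worthwhile(gain_per_occurrence=2.0,
                                 occurrences_per_year=50_000,
                                 articulation_cost=120_000))   # True

# Rare cognitive process: large per-run gain, few occurrences, costly to articulate.
print(articulation_is_worthwhile(gain_per_occurrence=400.0,
                                 occurrences_per_year=20,
                                 articulation_cost=60_000))    # False
```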
Business process simulation can further aid the processes of tacit knowledge creation and transfer. As explained previously, business simulation offers a field of interaction where multiyear experiences are created in compressed timeframes. In this view, the process analyst provides an artificial environment to the process stakeholders in which they can observe the process (input(s), output(s), decision points, etc.). These stakeholders can learn from this observation in a similar way to how they would learn from real-world experiences. It is, however, important to remember that the business processes in such an analysis are an approximation of the real-world business processes; they will never carry one hundred percent of the information that the real-world setting would convey. Nonetheless, this may be a cheaper, less time-consuming option for learning the tacit knowledge other people have gained through lifelong experiences. This option should, however, be pursued only once the previous steps have been accomplished.
This section of the paper covered the potential post Black-Box analysis of tacit knowledge in business process simulation. It distinguished potential differences between tacit knowledge in mainly manual and predominantly cognitive processes. It then suggested revisiting the comparison of the potential costs incurred by the articulation of tacit knowledge with the benefits that may arise from using that tacit knowledge in a given situation. Finally, it presented the option of compressing long-lasting experiences, which may involve tacit knowledge, into a business process simulation that may be further shared with process stakeholders. The next section of this paper focuses on the conclusions of this work.

Conclusions and recommendations for further work

This paper's major contribution to the field of business process simulation is the presentation of a generic approach that allows the inclusion of tacit knowledge in business process simulation. An approach based on the Black-Box concept was investigated as a potential solution. It proved to be a viable option for investigating the impact of tacit knowledge (associated with a resource such as an individual or group) on a business process. Due to its characteristics, this approach can be applied regardless of the nature of the business process, the characteristics of the industry or the type of organization. Moreover, it is relatively easy to apply and can therefore serve as a starting point for more sophisticated analyses.
Additionally, the characteristics of mainly manual and predominantly cognitive processes were presented, together with the implications for tacit knowledge and process simulation. This was followed by the feasibility consideration and by a potential option of utilizing business process simulation as a way of compressing lifelong experiences and presenting them to the business process stakeholders.
Future work could endeavor to further investigate the differences in tacit knowledge depending on industry type, organization profiles and business process characteristics. Such analysis could form part of the post Black-Box analysis, where the feasibility assessment could be further aided by greater insight into such differences. Eventually, such work could result in concepts that could aid the development of novel software for business process analysis that would incorporate tacit knowledge; currently there is no such software package available on the market.

References
[1] Bucher, T. and Winter, R., (2009), Project types of business process management,
Towards a scenario structure to enable situational method engineering for business
process management, Business Process Management Journal, Vol. 15, No. 4, pp.
548-568.
[2] Venkitachalam, K. and Busch, P., (2012), Tacit knowledge: review and possible
research directions, Journal of Knowledge Management, Vol. 16, No. 2, pp. 356-371.
[3] Teerajetgul, W. and Chareonngam, C., (2008), Tacit knowledge utilization in Thai
construction projects, Journal of Knowledge Management, Vol. 12, No. 1, pp.
164-174.
[4] Senker, J., (1995), Tacit Knowledge and Models of Innovation, Industrial and
Corporate Change, Vol. 4, No. 2, pp. 425-447.
[5] Suppiah, V. and Sandhu, M. S., (2011), Organisational cultures influence on tacit
knowledge-sharing behaviour, Journal of Knowledge Management, Vol. 15, No. 3,
pp. 462-477.
[6] Linde, C., (2001), Narrative and social tacit knowledge, Journal of Knowledge
Management, Vol. 5, No. 2, pp. 160-170.
[7] Abdullah, M. S., Kimble, C., Benest, I. and Paige, R., (2006), Knowledge-based
systems: a re-evaluation, Journal of Knowledge Management, Vol. 10, No. 3, pp.
127-142.

[8] Gavrilova, T. and Andreeva, T., (2012), Knowledge elicitation techniques in a


knowledge management context, Journal of Knowledge Management, Vol. 16, No. 4,
pp. 523-537.
[9] Byosiere, P. and Luethge, D. J., (2008), Knowledge domains and knowledge
conversion: an empirical investigation, Journal of Knowledge Management, Vol. 12,
No. 2, pp. 67-78.
[10]Proenca, M. T. V. C., de Oliveira, E. T. V. D., (2009), From normative to tacit
knowledge: CVs analysis in personnel selection, Employee Relations, Vol. 31, No. 4,
pp. 427-447.
[11]Wang, X., (2013), Forming mechanisms and structures of a knowledge transfer
network: theoretical and simulation research, Journal of Knowledge Management,
Vol. 17, No. 2, pp. 278-289.
[12] Atwood, J., (2004), Skill Disparities in Programming, http://www.codinghorror.com/blog/2004/09/skill-disparities-in-programming.html (last accessed 29.08.2013).
[13]Jorgensen, B. (2004), Individual and organisational learning: a model for reform for
public
organisations, Foresight, Vol. 6 No. 2, pp. 91-103.
[14]Mulder, U. and Whiteley, A., (2007), Emerging and capturing tacit knowledge: a
methodology for a bounded environment, Journal of Knowledge Management, Vol.
11, No. 1, pp. 68-83.
[15] Willoughby, K. and Galvin, P. (2005), Inter-organizational collaboration, knowledge
intensity, and the sources of innovation in the bioscience-technology industries,
Knowledge, Technology, and Policy, Vol. 18 No. 3, pp. 56-73.
[16]Nelson, R. and Winter, S. (1982), An Evolutionary Theory of Economic Change,
Harvard University Press, Cambridge, MA.
[17]Nonaka, I. and Takeuchi, H., (1995), The Knowledge-Creating Company, Oxford
University Press, New York, NY.
[18]Desouza, K.C. and Evaristo, J.R. (2004), Managing knowledge in distributed
projects, Communications of the ACM, Vol. 47 No. 4, pp. 87-91.
[19]Bailey, C. and Clarke, M., (2000), How do managers use knowledge about
knowledge management, Journal of Knowledge Management, Vol. 4, No. 3, pp.
235-243.
[20] Lefebvre, J. R., (2011), Simulations Accelerate Tacit Knowledge Transfer,
http://clomedia.com/articles/view/simulations-accelerate-tacit-knowledge-transfer
(accessed 25.08.2013).
[21]Polanyi, M. (1969), The Logic of Tacit Inference, in M. Grene (ed.), Knowing and Being, Routledge and Kegan Paul, London.
[22]Merx-Chermin, M. and Nijhof, W. (2005), Factors influencing knowledge creation
and innovation in an organisation, Journal of European Industrial Training, Vol. 29
No. 2, pp. 135-147.
[23]Bossen, C. and Dalsgaard, P. (2005), Conceptualization and appropriation: the
evolving use of a collaborative knowledge management system, Proceedings of
AARHUS05, Aarhus, Denmark, 2005, pp. 99-108.
[24]Szulanski, G. (2003), Sticky Knowledge: Barriers to Knowing in the Firm, Sage Publications, Thousand Oaks, CA.

[25]Colonia-Willner, R. (2004), Self-service systems: new methodology reveals customer real-time actions during merger, Computers in Human Behavior, Vol. 20 No. 2, pp. 243-267.
[26]Busch, P. (2008), Tacit Knowledge in Organizational Learning, IGI-Global Hershey,
Pennsylvania, PA.
[27]Delinchant, B., Duret, D., Estrabaut, L., Huu, G. H. N., Du Peloux, B., Rakotoarison,
H. L., Verdiere, F. and Wurtz, F., (2007), An optimizer using the software component
paradigm for the optimization of engineering systems, COMPEL: The International
Journal for Computation and Mathematics in Electrical and Electronic Engineering,
Vol. 26 No. 2, pp. 368-379.
[28]Sternberg, R. and Hedlund, J. (2002), Practical intelligence, g, and work psychology,
Human
Performance, Vol. 15 Nos. 1/2, pp. 143-160.

Connection between Process Model and Data Model:
Metamodelling Approach
Željko Dobrović, Katarina Tomičić-Pupek, Martina Tomičić Furjan
Faculty of Organization and Informatics
University of Zagreb
Pavlinska 2, 42000 Varazdin, Croatia
{zeljko.dobrovic, katarina.tomicic, martina.tomicic}@foi.hr
Abstract: Process modelling and data modelling appeared as methods within information systems development methodology. Nowadays, however, they are also used as organizational methods in business process redesign efforts and in documentation analysis and optimization. These two methods are not completely independent; on the contrary, there is a strong natural relationship between them. Regardless of the way the process model was developed (data flow diagram, workflow diagram), it represents the starting point for data model development. All data flows (information flows) that appear in a process model contribute cumulatively to the unique data model once their information content has been modelled. Therefore, CASE tools that support process modelling and data modelling have some concepts in common. In this paper we explore these concepts by using metamodelling.
Keywords: Process model, data model, metamodel.

1 Introduction
Business process modelling and business data modelling are usually accepted as two
activities in a range of activities in information systems development (ISD). For
example, in accordance with information engineering methodology [13], process
modelling as a structural analysis of the system is a part of the main project of logical
design of information systems (ISs), while data modelling is a part of the executional
project of logical IS design.
The significance of process modelling and data modelling lies as well in the
construction of business and information architecture [7], [6]. There are three significant
architectures: business, information and technological. Business architecture represents
the organisation structure, from top management to final operational levels and is
expressed in the form of a diagram with a description of jobs for each unit in the
organisation. Top management runs the organisation supervising the implementation of
the strategic plan in all business subsystems. The strategic planning of the information
system results in information subsystems [12]. In most cases they do not correspond to
the business subsystems, but they make a base for information management. Along with
the process and data modelling for each information subsystem, information system
strategic planning results in information architecture [19].
"Business architectures contribute to clarify the complexity within an organization and form a useful starting point from which to develop subsequent functional, information, process and application architectures" [18].
Enterprise Architecting (EA) is the process of developing enterprise Information
Technology architecture. An EA focuses on a holistic and integrated view of the why,
where, and who uses IT systems and how and what they are used for within an
organization. An enterprise architect develops the strategy and enables the decisions for
designing, developing, and deploying IT systems to support the business as well as to
assess, select, and integrate the technology into the organization's infrastructure.
Alignment between business and IT is one of the top issues for CIOs and IS
managers.[1]
Van Steenbergen and Brinkkemper introduce an architecture effectiveness model (AEM) to express how enterprise architecture practices are meant to contribute to the business goals of an organization [16]. Sanz et al. introduce a Componentized Industry Business Architecture as a vehicle to make processes better integrated with other critical dimensions in organizational design; this architecture provides the foundation for a taxonomy of processes and enables process models to be created or potentially rationalized against a comprehensive framework [15].
The main concepts describing information architecture are business process and
information flow. Both form significant parts of the process model. Whichever process
modelling method you use (Data Flow Diagram, WorkFlow Diagram [13], IDEF0
[11],[17]) business processes will be modelled in each of them, and the obtained model
will contain information flows (data flows). These information flows shall, after making
the process model, be modelled individually, making up a unique data model of the
organisation.

2 Process Modelling
The term process, or the term business process, which is used more and more
nowadays, does not have a unique definition. The manner in which we view business
process today has developed by transit of civilisation from the industrial into the
information age in mid 1980s. The first definitions of the term process can be found in
early 1970s. For example, in [3], the term process usually means a manufacturing
process. Back then, the most commonly used term was operation / activity. The same
book defines the framework of the manufacturing processes, and it is believed that such
a term covered the entire range of activities performed fully manually, through a semiautomatic systems, man-machine, to completely automatized processes, where a man as
workforce has supervisory function only. The main characteristic of the process is
transformation of input into output.
Administrative processes were clearly identified as processes transforming information. Their significance began to grow rapidly even then, since at the beginning of the 1970s the number of administrative workers in the USA surpassed the number of workers employed in manufacturing [3]. It was then that administrative processes started gaining significance and the term integrated data processing system first appeared. Business process modelling was restricted to making assembly diagrams in manufacturing processes, while there was no business process modelling in the current, IT sense of the word.
One of the first process modelling methods, SADT (Structured Analysis and Design Technique), was developed and tested in the field from 1969 to 1973 [11]. Although the method enabled modelling of all types of processes, the term process is not used in the original materials of this method, but is replaced by the term activity. In 1973, a form of this method was established, called IDEF0 (Integration DEFinition), which became a federal system (organisation) modelling standard in the USA and Europe [17].
Even now, there is still no unique definition of a business process; a business process is described rather than defined. In [8], a business process is described as step-by-step rules specific to the solution of a business problem; moreover, the process executes a range of activities in a certain time interval in order to achieve a certain organisational goal. R.N. Khan in [10] defines a business process as a range of activities performed in series or in parallel by two or more individuals or computer applications, in order to achieve the general goal of the organisation. According to the same author, business process modelling is a part of business process management, and business process management is the area of modelling, automating, managing and optimizing business processes through their life cycle, in order to increase profitability.
Nowadays, processes are modelled with the help of different modelling methods. That is
how we obtain different diagrams that describe business processes: data flow diagram
(fig.1), workflow diagram [14], IDEF0 diagram [11], [17].

Figure 1: Data Flow Diagram

2.1 Process model metamodel


In order to better understand the process model itself, and to facilitate understanding of the connection between the process model and the data model, we define the process model metamodel. A metamodel is a model of a model [9]; it enables a CASE tool to be built in which the process modelling results can be kept for all organisations whose processes were modelled.
To define the metamodel, it is necessary to understand the concepts of the process model itself. The process model, as a result of logical design of the information system, answers the question: what is done in the organisational system? To answer this question as comprehensibly as possible, a graphic presentation of the process model is used, e.g. in the form of a data flow diagram (DFD). The DFD (fig. 1) is part of the structured system analysis (SSA) method and is drawn up for each level of functional decomposition of the object system. A process at the highest level of the system represents a function, the medium level is the process level, and the lowest level of functional decomposition is the set of activities. Although function, process and activity differ in the level of detail with which they describe the object system, they are essentially synonyms and we shall call them all processes. Each process represents a transformation of an input data flow into an output data flow; the data flow therefore arises as the next DFD concept. It is an organised set of data (a document, an oral order, ...) entering or exiting a process. The object system does not exist on its own, but is part of some meaningful environment with which it exchanges information. The external system is the DFD concept that is the source of input data flows into the observed object system, or the destination of the data flows that the observed system generates. In a realistic object system there is always a time period between generating data and using them; the term data store is therefore introduced to describe this characteristic in the DFD. A data store represents an interruption of the data flow and introduces a time delay between the generation of data in one process and their use by another process.
The process model metamodel (fig. 2) is a data model about the process model concepts. In other words, the metamodel contains data on the processes, data stores, external systems and data flows from the DFDs of all levels of functional decomposition of the object system. The process modelling method enables the process model to be drafted, and its metamodel enables the knowledge collected by that method to be kept and documented. Apart from that, this metamodel provides the basis for building a CASE tool that supports the process modelling method.

Figure 2: Basics of process model metamodel


We shall elaborate on figure 2. Process, data store and external agent are three process model concepts that are independent and are therefore presented as separate entities. Data flow is a concept that is not independent but represents a connection between other concepts; it is therefore displayed with four aggregations, depending on the concepts it connects. These aggregations are: writes flow, reads flow, out_flow, and in_flow. In more detail, a process writes flow(s) into a data store, a process reads flow(s) from a data store, an external agent can give incoming flow(s) to a process, and an external agent can receive outgoing flow(s) from a process.
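Below is a minimal sketch (our own illustration with assumed names, not the authors' CASE repository design) of how these concepts could be kept as records; for brevity the four aggregations are collapsed into a single DataFlow record with a kind field.

```python
# Illustrative sketch (assumed names): the three independent concepts and the
# data-flow aggregations from Figure 2, held as simple Python records.
from dataclasses import dataclass

@dataclass(frozen=True)
class Process:
    name: str

@dataclass(frozen=True)
class DataStore:
    name: str

@dataclass(frozen=True)
class ExternalAgent:
    name: str

@dataclass(frozen=True)
class DataFlow:
    """One of the four aggregations: 'writes', 'reads', 'in' or 'out'."""
    kind: str
    source: object          # Process, DataStore or ExternalAgent
    target: object
    name: str

# Example repository content: an Invoicing process reads customer data from a
# store and sends an invoice to an external client.
invoicing = Process("Invoicing")
customers = DataStore("Customers")
client = ExternalAgent("Client")
repository = [
    DataFlow("reads", customers, invoicing, "customer record"),
    DataFlow("out", invoicing, client, "invoice"),
]
```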

3 Data Modelling
Data modelling is an activity in the information system development, which comes after
the process modelling. We get the basis for business data modelling by collecting all
information (data) flows between individual processes. The data modelling method [2]
is a defined procedure of finding and displaying information objects (entities) and their
mutual relationships (fig. 3).
The most commonly used model nowadays is relation data model, bases of which are
given in [4], [5]. The data model is often called information model. That model is
defined as specification of data structures and business rules [2]. Data modelling is
often called information modelling, and is defined as a technique for describing
information structures and preserving information on requirements and rules[2].

3.1 Data model metamodel


One of the concepts that appear in a DFD is the data flow (fig. 1). A process accepts a data flow as input, transforms it and generates a new data flow as its output. If the organisational system is well organised, all data flows appearing in it are standardised and formalised to a certain extent. Generally speaking, a data flow is a set of data that are, in the ideal case, standardised throughout the relevant document. In the SSA method metamodel (fig. 2), data flows are kept in the following aggregations/entities: writes flow, reads flow, out_flow, and in_flow.
The object of the transformation executed by the business processes, from the point of view of the corresponding information system, is the elementary data contained in the data flows recorded in the above aggregations, i.e. entities. However, it is not sufficient to list the data flows (sets of data) flowing through the system. All elementary data relevant for the functioning of the system that are present in the data flows must be presented as information objects interconnected in a certain way. The defined procedure for finding and displaying those information objects and their mutual relationships is called the data modelling method.
The result of applying the data modelling method is a data model (fig. 3). There are several data modelling methods; in this work the recommended method is the ER (entity-relationship) method and the accompanying entity relationship model.

Figure 3: Data model concepts


In order to give a clear, simplified image of the data structure used by the organisation, a finite set of concepts must be defined that sufficiently clarifies the information objects and the connections between them. The recommended entity relationship model metamodel has the basic concepts represented in fig. 4.

Figure 4: Basics of ER data model metamodel

The ENTITY concept represents the basic term of the entity relationship (ER) method and marks an actual item or an abstract term that can be clearly distinguished from its environment. The ENTITY TYPE concept represents the different types of entities (weak, strong, aggregated). RELATIONSHIP is a concept marking a relationship existing between entities, in reality or in thought. ATTRIBUTE is the name of a property that a certain entity or relationship possesses. PARTICIPATE is an aggregated concept marking the participation of entities in different relationships. RELATIONSHIP TYPE is a concept that determines the type of relationship (identifying or non-identifying).
The metamodel of the ER method is a data model of the ER model concepts. This metamodel, like the process modelling method metamodel (fig. 2), has a double role: on one side it elaborates the ER method, clearly displaying the relationships among the method concepts, and on the other side it represents the basis for extending the data dictionary defined by the process model metamodel. We now have the basic elements to develop the integrated metamodel.
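As a minimal sketch (assumed names, not the authors' repository design), these ER metamodel concepts could be represented as simple records, with PARTICIPATE capturing which entities take part in which relationships:

```python
# Illustrative sketch of the ER metamodel concepts from Figure 4.
from dataclasses import dataclass, field

@dataclass
class ErEntity:
    name: str
    entity_type: str                    # e.g. "weak", "strong", "aggregated"
    attributes: list = field(default_factory=list)

@dataclass
class ErRelationship:
    name: str
    relationship_type: str              # "identifying" or "non-identifying"
    attributes: list = field(default_factory=list)

@dataclass
class Participate:                      # an entity takes part in a relationship
    entity: ErEntity
    relationship: ErRelationship

# Example: CUSTOMER places ORDER.
customer = ErEntity("CUSTOMER", "strong", ["customer_id", "name"])
order = ErEntity("ORDER", "weak", ["order_no", "date"])
places = ErRelationship("places", "identifying")
participations = [Participate(customer, places), Participate(order, places)]
```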

4 Connection between process model and data model: common integrated metamodel
The connection between business processes and business data is visible in the works of J. Martin [12], [13], where the two sides of the information engineering pyramid are clearly shown: the functional (process) side and the data side. Apart from that, J. Zachman [19] shows the process- and data-based perspectives of the organisation in his famous information system architecture.
The natural way to show that the process model and the data model are connected is through a common integrated metamodel of these two modelling methods. In other words, the results that we obtain through the DFD and ER methods are connected in a certain way. The metamodels from figures 2 and 4 will serve that purpose; the manner of connecting them is defined in figure 5.
The figure shows three sets of concepts (entities/aggregations). The left set contains the concept ENTITY, belonging to the ER method metamodel. On the other side, the right set contains the aggregations which are elements of the process modelling method metamodel and which represent the different forms of data flow of the organizational system. The set in the middle consists of aggregations containing the results of modelling the data flows of the system. In other words, the middle concepts (WF STRUCTURE, RF STRUCTURE, IF STRUCTURE, OF STRUCTURE) represent the data structure of the organizational data flows. For example, WF STRUCTURE consists of all the entities that appeared while analysing data flows from processes to data stores (fig. 2). In the same manner, RF STRUCTURE consists of all the entities that appeared while analysing data flows from data stores to processes. IF STRUCTURE and OF STRUCTURE relate to data flows from external agents to processes and from processes to external agents, respectively.
The middle concepts in figure 5 represent the intersection between the process model metamodel and the data model metamodel and serve as a basis for the development of an integrated repository of a CASE tool supporting both process modelling and data modelling.
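A minimal sketch (assumed names, not the authors' repository) of these middle concepts is given below: each STRUCTURE row records which ER entity was identified while modelling a particular kind of data flow, thereby linking the two metamodels.

```python
# Illustrative sketch of the WF/RF/IF/OF STRUCTURE rows from Figure 5.
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowStructure:
    flow_kind: str     # "writes", "reads", "in" or "out" flow aggregation
    flow_name: str     # data flow from the process model metamodel
    entity_name: str   # ENTITY from the ER metamodel

# Example: modelling the outgoing "invoice" flow yields the Customer and Item entities.
of_structure = [
    FlowStructure("out", "invoice", "Customer"),
    FlowStructure("out", "invoice", "Item"),
]
```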

Figure 5: Integration of process metamodel and data metamodel

5 Conclusion
The process model and the data model are naturally connected, and the connection becomes obvious the moment we use the information system design methodology correctly. The process model answers the question "What happens in the organisation?". The answer to that question is in the relevant process model, which is a result of structured analysis of the system. Although structured analysis is part of the structured methods, it is well known that a process model (function model) may also be obtained by applying object-oriented analysis. In any case, the significant concepts of the process model are the process and the data flows that connect the processes.
By taking over the data flows from the process model and by modelling the data in every data flow, we gradually obtain a complete data model of the organisation. The business data model obtained is translated into a relational form, so that it can be used simply and efficiently in the implementation of the relevant database in the selected database management system.
The process model and the data model are connected by the data flow concept from the process model. This connection is most visible if we observe the process model and the data model through the relevant integrated metamodel, the basics of which are proposed in this paper. A metamodel has a dual purpose: (1) it enables understanding of the very procedure of process and data modelling and (2) it represents the basis for building a comprehensive CASE tool for supporting the information system of the organisation.

6 References
[1] Armour, F.; Kaisler, S.; Huizinga, E. Introduction to business and enterprise
architecture: processes, approaches and challenges Minitrack, System Science
(HICSS), 45th Hawaii International Conference, vol. no.3, pp. 24-29, 2012.
[2] Bruce, T.A. Designing quality databases with IDEF1x information models, Dorset
House, USA, 1992.
[3] Buffa, E.S. Basic Production Management, J.Wiley & Sons, USA, 1971.
[4] Chen P.P.S. The entity-relationship model: toward a unified view of data, ACM Transactions on Database Systems, 1(1):9-36, 1976.


[5] Codd E.F. A relational model of data for large shared data banks, Communications of the ACM, 13(6):377-387, 1970.
[6] Cook M.A. Building enterprise information architectures, Prentice Hall, USA,
1996.
[7] Dobrović, Ž. Strategijsko planiranje, poslovna i informacijska arhitektura [Strategic planning, business and information architecture], In Proceedings of CASE 12 conference, pages 60-72, Opatija, 2000.
[8] Havey, M. Essential business process modeling, O'Reilly Media Inc., CA, 2005.
[9] Hay, C. D. Data Model Patterns A Metadata Map, Morgan Kaufmann Publishers,
USA, 2006.
[10] Khan, R.N. Business process management, Meghan-Kiffer Press, USA, 2004.
[11] Marca, D.A.; McGowan, C.L. IDEF0/SADT - Business process and enterprise modeling, Eclectic Solutions, CA, 1993.
[12] Martin, J.; Leben, J. Strategic information planning methodologies, Prentice-Hall,
USA, 1989.
[13] Martin, J. Information engineering, I, II, III Prentice Hall, Englewood Cliffs, New
Jersey, 1990.
[14] Mayer, R.J.; Crump, J.W.; Fernandes, R. IDEF methods compendium of methods
report, Armstrong Laboratory, Wright-Patterson, Ohio, 1995.
[15] Sanz, J. L. C.; Leung, Y.; Terrizzano, I.; Becker, V.; Glissmann, S.; Kramer, J.; Ren,
G. Industry Operations Architecture for Business Process Model Collections,
Business Process Management Workshops, Lecture Notes in Business Information
Processing Vol.100:62-74, 2012.
[16] Van Steenbergen, M.; Brinkkemper, S. Modeling the contribution of enterprise
architecture practice to the achievement of business Goals, Information Systems
Development 2010, pp 609-618, 2010.
[17] U.S. NIST (National Institute of Standards and Technology), DISA (Defence
Information Systems Agency): Integration Definition for Function Modeling
(IDEF0), Federal Information Processing Standards, Publication 183, 1993.
[18] Versteeg, G.; Bouwman, H. Business architecture: A new paradigm to relate
business strategy to ICT, Information Systems Frontiers, 8(2): 91-102, 2006.
[19] Zachman, J.A. A framework for information systems architecture, IBM Systems
Journal, 26(3):276-292, 1987.

Use of Printed Textbooks and Digital Content in


Secondary School Education
Renato Barisic
Algebra University College for Applied Computer Engineering
Ilica 242, 10000 Zagreb, Croatia
renato.barisic@racunarstvo.hr

Abstract: Modern information and communication technologies suited to be used in the


educational process have to bridge the gap between the desires of today's students for
modern educational content and compulsory literature in the form of printed textbooks.
This paper presents the results of research on use of printed textbooks in the
Humanities, Social and Natural Sciences and Engineering in the third and fourth grade
of secondary school. The survey was conducted among 806 students from 21 secondary
schools from 9 Croatian counties. An analysis of research on the use of printed
textbooks in the final years of secondary school shows that a significant number of
textbooks are used on a small scale during the day at school and even to a lesser extent
during independent work at home. The aim of this paper is to highlight the need to find
alternative and complementary solutions whose application should modernize the
educational process and present courses to students in an attractive and motivating way
on a variety of devices that today's students have at home, at school and on the move.
Keywords: content, education, information, student, textbook.

1 Introduction
The daily creation and sharing of unlimited amounts of information can no longer be followed with traditional printed resources, due to the slowness of their production and the inability to change their content at the time new knowledge and technological achievements are created.
In the last decade, several authors have dealt with printed textbooks as everyday teaching resources by analyzing the attributes of efficiency, convenience, rationality and expediency.
Using digital content provides flexibility for individualized learning. It increases motivation and students' self-esteem because it helps them to build their own knowledge. However, the computer itself does not necessarily mean that students will develop the ability of divergent thinking, creativity, cooperation, responsibility, decision making, democratic behavior, etc.
Even if we do not raise doubts about the accuracy, completeness or timeliness of classic printed textbooks, we need to rethink whether a teaching aid in that form is suitable for educating modern, technology-minded young generations, and consider whether modern tech-savvy teachers would use a classic printed teaching tool as their first choice for high-quality transmission of knowledge in educational institutions of the 21st century. Are there alternatives or complements, i.e. are there opportunities, methods and systems that can bring the classroom closer to the demands and desires of today's students and motivate them to learn and explore?

2 Methodology
2.1 Method
The research was conducted during the period from 22nd November 2012 to 13th May 2013 in 21 secondary schools in 9 Croatian counties. A survey was selected as the data collection method.
Prior to conducting the survey, permission was sought from the responsible persons: the principal, school counselor, psychologist or teacher. The survey was carried out in groups, on a voluntary basis, using a paper questionnaire, for one or two classes that were present in the classroom at the same time. At the beginning of the survey the interviewer demonstrated the questionnaire to the students and gave them filling instructions. The interviewer emphasized that the survey is completely anonymous and does not collect any personal or school data. The time to fill in the questionnaire was limited to 10 minutes.

2.2 Instrument
The data collection instrument was a questionnaire. The questionnaire completed by the students consists of general questions and questions about the use of printed textbooks and digital content.
The general questions about age, gender, type of school, and the direction of the class are primarily used for the analysis and presentation of the structure of the respondents according to various criteria. Given that the target sample were students of the third and fourth grade of secondary school, the question about the respondents' age offered a choice between 13 and 19. The question about gender could be answered by circling M for male or F for female. When asked about the type of school, the students could respond with T for technical school or with G for gymnasium. The question about the direction of education was set up as an open question which could be answered in free writing. The last question is a general question about the current grade, with possible answers 1 to 4. The target population consisted of fourth-grade and, at times, third-grade students; still, there was an option of circling first or second grade if someone from a lower grade appeared.
The questions about the use of printed textbooks and digital content are set as a closed set of multiple-choice questions. The question about the use of printed textbooks during the school day was constructed in the form of a table whose columns highlight three main divisions: subject, at school, at home. The subject column is a list of subjects in the Humanities (Croatian, foreign languages, History, Religion, Ethics), Social Sciences (Politics and Economy), Natural Sciences (Mathematics, Physics, Geography) and Engineering (Computer Science, Information Science). The at school and at home column groups are subdivided into the offered intensities: never, rarely, often or always. With this approach the students were able to immediately designate the usage intensity of printed textbooks at school and at home for each course. Below this table there are two questions about the use of digital content at school and during independent work at home, with the offered answers: never, rarely, often and very often.

2.3 Sample
The questionnaire was answered by 806 students. There were not invalid questionnaires
and possibly unanswered questions are during data enrollment set to 0 or X indicating
170

no response and at figures that is shown as a generally accepted abbreviation "n/a",


which means no answer. Such an approach provides a number of students who did not
answer a particular question.
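As a minimal, assumed illustration of how such answers could be tabulated (this is not the authors' processing code), the intensity answers for one subject can be counted and turned into shares, with 0/X coded as "n/a":

```python
# Hedged sketch: tabulate intensity answers for one subject into counts and shares.
from collections import Counter

SCALE = ["never", "rarely", "often", "always", "n/a"]

def tabulate(answers):
    """answers: raw answers for one subject; anything outside the scale becomes 'n/a'."""
    cleaned = [a if a in SCALE[:-1] else "n/a" for a in answers]
    counts = Counter(cleaned)
    total = len(cleaned)
    return {level: (counts[level], counts[level] / total) for level in SCALE}

# Hypothetical responses for one subject at school.
sample = ["often", "rarely", "never", "0", "always", "often", "X", "rarely"]
for level, (n, share) in tabulate(sample).items():
    print(f"{level:>6}: {n} ({share:.0%})")
```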

Given that these are secondary school students of the third grade (114) and the fourth grade (692), the respondents are aged between 16 and 19. The sample includes both male (717) and female (89) students. Technical school is attended by 760 students in several technical directions, and gymnasium is attended by 46 students in the general and mathematics directions.
The structure of respondents by age, gender, type of school and grade is visible in Fig. 1, 2, 3 and 4.

Figure 1: Age

Figure 2: Gender

Figure 3: Type of school

Figure 4: Grade

3 Results
3.1 Use of printed textbooks at school
Data on the use of printed textbooks at school, elsewhere presented and analyzed in groups according to the sciences the subjects belong to, are in this section displayed in one place regardless of the scientific field, in order to observe the use or non-use of textbooks in each subject. The chart and the accompanying tables are shown in Fig. 5 and Fig. 6.

Figure 5: Use of printed textbooks at school number of responses

Figure 6: Use of printed textbooks at school share in answers

3.2 Use of printed textbooks at home


Data on the use of printed textbooks at home, elsewhere presented and analyzed in groups according to the sciences the subjects belong to, are in this section displayed in one place regardless of the scientific field, in order to observe the use or non-use of textbooks in each subject. The chart and the accompanying tables are shown in Fig. 7 and Fig. 8.

Figure 7: Use of printed textbooks at home number of responses

Figure 8: Use of printed textbooks at home share in answers

3.3 Use of digital content at school


The data and the analysis of the given answers about the usage of digital content (presentations, e-learning content, interactive content, computer simulations, quizzes, games, .pdf, .doc and .docx documents, ...) during the day at school are shown in Fig. 9 and Fig. 10 and the accompanying tables.

Figure 9: Use of digital content at school number of responses

Figure 10: Use of digital content at school share in answers

3.4 Use of digital content at home


The data and the analysis of the given answers about the usage of digital content (presentations, e-learning content, interactive content, computer simulations, quizzes, games, .pdf, .doc and .docx documents, ...) during independent work at home are shown in Fig. 11 and Fig. 12 and the accompanying tables.

Figure 11: Use of digital content at home number of responses

Figure 12: Use of digital content at home share in answers

4 Conclusion
The results of the survey conducted among students of the third and fourth grade of secondary schools on the use of printed textbooks for nine subjects in the Humanities, Social and Natural Sciences and Engineering show that textbooks in only four of the nine courses are regularly used in the classroom at school, and that only one of the nine is used regularly during independent work at home. The research has shown that there is a significant number of subjects for which the classic printed textbooks are used very little, and some hardly at all. This especially applies to the use of textbooks at home, but it is evident that usage at school for the majority of the analyzed subjects is not much better. These data indicate that students are not motivated and do not want to use printed textbooks to any significant extent, and that teachers do not insist on their use at school, let alone at home.
The analysis of digital content use has shown that such content is extensively used during the day at school and during students' independent work at home. It is obvious that both the teachers and especially the modern generations of students have already adopted electronic communication and the presentation of teaching content assisted by digital content and technology as common and everyday.

5 Bibliography
[1] Barisic, R: Geanium Interactive Chronological Visualization System, Croatian
Journal of Education, Faculty of Teacher Education University of Zagreb, Zagreb,
Vol: 13, No. 4, 2011., p.151-174, ISSN: 1846-1204
[2] Brusilovsky, P. et al.: Teaching Information Retrieval With Web-based Interactive
Visualization, Journal of Education for Library and Information Science, Vol. 51,
No. 3, 2010., p. 187-200, ISSN: 0748-5786
[3] Spanovic, S.: Pedagogical Aspects of E-textbooks, Educational Sciences, Vol. 12,
No. 2, 2010., p. 459-470
[4] Tkalac Vercic, A., Sincic Coric, D., Poloski Vokic, N.: Handbook of research
methodology - how to design, implement and describe the scientific research,
M.E.P. d.o.o., Zagreb, 2010.
[5] Zelenika, R.: Methodologies and technologies of scientific and professional work
(fourth edition), Faculty of Economics in Rijeka, Rijeka, 2000.

How to identify knowledge and evaluate knowledge


management in organization
Boštjan Delak
ITAD, Revizija in svetovanje, d.o.o. Technological Park Ljubljana
Pot za Brdom 100, 1000 Ljubljana, Slovenia, www.itad.si
bostjan.delak@itad.si

Abstract: Knowledge is recognized as the most important strategic asset every

organization has. It is very important to identify, capture/acquire, share, reuse and


unlearn knowledge. These activities are managed through Knowledge Management
(KM). It is a rather challenging task to evaluate the level of KM in an organization.
The paper presents two approaches, COBIT 5 and Framework for Information System
Due Diligence (FISDD), to be used for knowledge and KM level identification. The
research objective is to identify which approach could identify KM levels in the
organization more quickly and effectively. The research evaluation is based on two
real case studies, where both approaches will be used. The analysis is planned in four
phases, which are described. As the paper is a research in progress report, it sets
out the current status of the research and its further planned steps.

Key Words: Knowledge, Knowledge management, IS audit, IS analysis

1 Introduction
The next step beyond data and information is knowledge [6]. Knowledge is recognized
as the most important strategic asset that each organization has. It is very important to
identify, capture/acquire, share, reuse and unlearn knowledge. This is managed through
knowledge management (KM). Moos et al. argued that a key challenge is to disclose how an organization can acquire and utilize relevant knowledge [13] and how this is related to the organization's innovative success. Nonaka & Takeuchi [16] described that "in an economy where the only certainty is uncertainty, the one sure source of lasting competitive advantage is knowledge" and also explained how Japanese organizations deal with knowledge, innovation and their success. Nonaka and Konno described how Japanese organizations manage the place ("ba") where knowledge is located and how to share it [15]. Another issue is knowledge sharing, and there are several research papers describing this topic. Pirhonen & Vartiainen discussed what kind of knowledge transfer is required to reduce the risks when replacing a project manager [18]. Nodari et al. presented a review of scientific writings and a research model that relates the intra-organizational and inter-organizational sharing processes to absorptive capacity and organizational performance [14]. Recently, numerous scientific papers describing knowledge creation in software development teams have been published. Spohrer et al. described the role of pair programming and peer code review [21]. Dissanayake et al. described knowledge creation in agile software development and the important aspect of creativity [4].
A great challenge is how to identify the level of knowledge, knowledge identification, knowledge creation, KM and knowledge sharing in the observed organization. Henczel suggested performing an information audit as the first step towards KM evaluation [7]. The author is trying to find an answer to her question of whether an IS audit could define the level of KM in the observed organization, and the plan for how to obtain this result is described further in this paper.
The remainder of the paper is organized as follows. In the next section the terms
knowledge, knowledge life cycle, KM and KM systems are presented. This is followed
by a brief presentation of two approaches for information system (IS) analysis. Section
four describes the motivation behind the research, which is followed by a description of
the research. Discussion describes the current status and related work. Finally, the
conclusion outlines the implications of the research in practice and further possible
research activities.

2 Knowledge management
Knowledge is considered to be an important resource to maintain the competitiveness of
an organization [11]. Nonaka & Takeuchi [16] have defined knowledge by comparing it
with information: "Knowledge, unlike information, is about beliefs and commitment."
They say that knowledge, like information, is about meaning. Another explanation is that
knowledge is a function of a particular stance, perspective, or intention [11].
Knowledge is an asset, but its value is much harder to assess than that of physical
assets. Knowledge may be categorized into two types: tacit and explicit [16]. Polanyi
[19] defines tacit knowledge as personal, context-specific and thus not easily visible and
expressible, nor easy to formalize and communicate to others. Professor Levy
described tacit knowledge very graphically as "what someone has between the ears". On
the other hand, Polanyi refers to explicit knowledge as being transmittable in some
systematic language such as words, numbers, diagrams or models [19]. Nonaka &
Takeuchi expand Polanyi's tacit knowledge into two dimensions, technical and
cognitive. The technical dimension is often referred to as know-how, while the cognitive
dimension consists of beliefs, ideals, values, schemata and mental models [16].
Knowledge creation takes place through transformation of tacit knowledge to explicit
and back as Nonaka & Takeuchi explained in their knowledge life cycle with a
knowledge spiral that contains the following phases: socialization, externalization,
combination and internalization [16].
Over the past few years, papers have described further developments of KM. Some of
them describe different paths from knowledge to KM, e.g. from tacit knowledge
to KM [10]. KM is an effort to increase useful knowledge in the organization, as
explained by McInerney [12], who adds that KM promotes the sharing of appropriate
knowledge artifacts. As ICT nowadays plays an important role in major competitive
organizations, several papers address the role of ICT in KM [11]. Rosemann & Chan
define a framework for enterprise knowledge. They identify the stages of the knowledge life
cycle: identification, creation, transfer, storage, reuse and unlearning of knowledge [20].

3 IS analyses
There are several methods, standards, tools, and frameworks that can be used to
conduct, analyse, or deliver a specific type of IS due diligence of a particular IS area. IS
auditors all over the globe, members of ISACA (Information System Audit and Control
Association), use the COBIT (Control Objectives for Information and related
Technologies) methodology in their day-to-day operations [8]. There are several
approaches available for IS due diligence, but the author used a specific approach called
FISDD (Framework for Information System Due Diligence), which enables delivery of
a rapid IS due diligence and also has an integrated decision model. This framework
consists of four phases: preparation, realization / on-site review, analysis, and decision.
Each of these phases involves specific activities, sub-processes, supporting documents
(questionnaires, templates, etc.), and results. The time frame for each phase may vary
depending on the size of the observed organization, the location(s) and available
documentation. Other vital parts of the framework include predefined questionnaires,
different types of reports and the decision model [2]. The FISDD approach, with some
basic questions regarding KM, has already been used in some cases to identify KM in IS due
diligence processes [3].
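To make this phase structure more concrete, the following minimal Python sketch shows one way the four FISDD phases and their supporting artifacts could be represented. It is illustrative only; the concrete activity and document names are hypothetical placeholders chosen here and are not taken from the published framework.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Phase:
    # One phase of an IS due diligence framework (illustrative sketch only).
    name: str
    activities: List[str] = field(default_factory=list)
    supporting_documents: List[str] = field(default_factory=list)  # questionnaires, templates, ...
    results: List[str] = field(default_factory=list)               # reports, decision inputs, ...

# Hypothetical outline of the four phases described above.
fisdd_phases = [
    Phase("preparation", ["agree on scope", "collect documentation"], ["predefined questionnaires"]),
    Phase("realization / on-site review", ["conduct interviews", "complete questionnaires"]),
    Phase("analysis", ["evaluate answers", "prepare reports"], results=["due diligence report"]),
    Phase("decision", ["apply the decision model"], results=["decision recommendation"]),
]

for phase in fisdd_phases:
    print(phase.name, "->", ", ".join(phase.activities))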

4 Motivations
As ISACA has extended the objectives of IT governance with COBIT 5, it is the
author's challenge to assess whether COBIT 5 supports the KM domain well enough to be used for
such an analysis. The author's objective is to identify which approach can identify the
level of KM in the organization more quickly and effectively. He applied two
approaches for analyzing IS in order to identify the level of knowledge in the
organization: the COBIT 5 methodology and the FISDD. The author chose the two
approaches based on his experience. As an IS auditor he uses COBIT and the new
COBIT 5 methodology daily for enterprise IT governance, and he has also
used the FISDD for different IS due diligences.
He has updated the FISDD by adding KM-related questions to identify the level of KM.
The following KM level marks have been proposed: non-existent, ad hoc, initial,
defined, managed and optimized. As knowledge is a valuable asset in every
organization, the FISDD will continuously be improved, as reports will bring additional
information.
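As a hedged illustration of how such level marks could be applied in practice, the short Python sketch below maps questionnaire answer scores to one of the six proposed levels. The 0-5 answer scale and the simple averaging rule are assumptions made here for illustration only; the paper does not prescribe a scoring formula.

# Proposed KM level marks, ordered from lowest to highest maturity.
KM_LEVELS = ["non-existent", "ad hoc", "initial", "defined", "managed", "optimized"]

def km_level(scores):
    # Map questionnaire answer scores (assumed 0-5 scale) to a KM level mark.
    # The scale and the averaging rule are illustrative assumptions only.
    if not scores:
        return KM_LEVELS[0]
    avg = sum(scores) / len(scores)
    index = min(int(round(avg)), len(KM_LEVELS) - 1)
    return KM_LEVELS[index]

# Example: answers to KM-related questions on the assumed 0-5 scale.
print(km_level([2, 3, 3, 4]))  # prints "defined"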
His motivation is to identify which approach is the most suitable for identifying
knowledge in the observed organization.
The hypotheses are:
H1: With COBIT 5 you can identify explicit knowledge and also some tacit knowledge
in the observed organization and its ICT.
H2: With the FISDD you can identify explicit knowledge and also some tacit knowledge
in the observed organization and its ICT.
Both hypotheses will be evaluated through real case studies and interviews in two
different organizations in different industries.
Based on the results, there might be some activities for possible upgrades of both
approaches.

5 Proposed research
Two real case studies will be conducted with both approaches to test the
abovementioned hypotheses. This paper is a research-in-progress report. The proposed
research consists of four phases: phase one, upgrade the FISDD framework with KM-related
questions; phase two, define an IS audit program for KM identification with the
COBIT 5 methodology; phase three, select the organizations for the case studies; and phase
four, perform the case studies. Each approach will have three sub-goals: to establish the
status of the KM system in the observed organization, to establish how knowledge creation is
documented, and to identify explicit and tacit knowledge in the ICT team. The hypotheses
are expected to be confirmed through the case studies. One case study will be based on a
general IS due diligence using the FISDD approach, and the other on an IS audit based on
a COBIT 5 audit program. After the case studies have been completed, a questionnaire
will be sent to the involved managers to collect their expert opinions regarding the
particular case study.

6 Discussion
The author is currently working on phase two. As ISACA is still developing other
related COBIT 5 documents, this phase will take more time than the author originally
planned.
Aggestam et al. identified a specific type of risk, knowledge loss, and distinguished
seven types of knowledge loss [1]. The author has also incorporated this kind of risk
evaluation into the FISDD, will try to include it in the IS audit program for
knowledge and KM level evaluation, and will present these results as well.
In phase one, the FISDD IS status questionnaire was updated with additional questions
relating to tacit and explicit knowledge. The updates concern the ways knowledge
is collected and shared, knowledge dimensions, knowledge repositories, collaboration
techniques, the social competences of individuals, methods for KM motivation, software
used for internal networking, risks related to KM, etc.
In parallel with phase two, some activities related to phase three are already ongoing,
such as identifying potential organizations in which to conduct the real case studies. As soon as
phase two is completed, the organizations for the case studies will be selected.

7 Conclusion
Farhadi argued how important knowledge management (KM) audit due diligence is,
with the aim of explaining the relationship between intangible knowledge (tacit
knowledge) assets and inorganic business growth through mergers and acquisitions [5].
He added a new dimension to the area of due diligence in general, as well as to specialized IS
due diligences. Jennex & Olfman described four different KMS success models and
presented a framework for assessing KMS success [9]. Such an approach is valid for
organizations with an implemented KMS, which might not be suitable for due diligence
activities. Paliszkiewicz researched how organizational trust has a positive
influence on organizational performance [17]. She mentions that modern organizations
have identified the importance of trust related to KM as a means to gain and sustain
competitive advantage. She states that trust among employees is an essential
prerequisite for knowledge sharing. Trust is more of a social competence, and this kind of
evaluation would require additional questionnaires and additional resources, so the
researcher decided not to include it in this research.
The author's future work includes finalizing the research and presenting the
findings at KM conferences and in research papers. Additional future work includes
further analysis of the FISDD areas related to KM and KM level evaluation, and possible
cooperation with ISACA in order to upgrade COBIT 5 with additional KM activities.


8 References
[1] Aggestam, L; Söderström, E; Persson, A. Seven Types of Knowledge Loss in the
Knowledge Capture Process. ECIS 2010 Proceedings. Paper 13. 2010
[2] Delak, B; Bajec, M. Information system due diligence data as an input for
knowledge management. Online journal of applied knowledge management. 1(2):
pp.15-24. 2013
[3] Delak, B; Bajec, M. Framework for the delivery of information system due
diligence. Information systems management. 30(1): pp.137-149. 2013
[4] Dissanayake, I; Dantu, R.; Nerur, S. Knowledge Management in Software
Development: The Case of Agile Software. AMCIS 2013 Proceedings. 2013
[5] Farhadi, M. Intellectual Assets & Knowledge Due Diligence. University of Reading
- Henley Business School. 2009.
Available at SSRN: http://dx.doi.org/10.2139/ssrn.1359663
[6] Gray, P. Knowledge management. Proceedings of the Americas Conference on
Information Systems. Paper 292. 1999
[7] Henczel, S. The information audit as a first step towards effective Knowledge
Management: an opportunity for the special librarian. INSPEL 34(3/4). pp.210-226.
2000
[8] ISACA. COBIT 5: Enabling Processes. 2012
[9] Jennex, M.E; Olfman, L. Assessing Knowledge Management Success /
Effectiveness Models. International Journal of Knowledge Management (IJKM).
1(2). pp.33-49. 2005
[10] Kakabadse, N.K; Kouzmin, A; Kakabadse, A. From Tacit Knowledge to
Knowledge Management: Leveraging Invisible Assets. Knowledge and Process
Management. 8(3). pp.137-154. 2001
[11] Mahapatra, R.K; Sarkar, S. The Role of Information Technology in Knowledge
Management. AMCIS 2000 Proceedings. Paper 421. 2000
[12] McInerney, C. Knowledge management and the dynamic nature of knowledge.
Journal of the American Society for Information Science and Technology. 53(12).
pp.1008-1016. 2002
[13] Moos, B; Beimborn, D; Wagner, H-T; Weitzel, T. Knowledge Management
Systems, Absorptive Capacity, and Innovation Success. ECIS 2011 Proceedings.
Paper 145. 2011
[14] Nodari, F; Oliveira, M; Maçada, A.C.G. Knowledge Sharing, Absorptive Capacity
and Organizational Performance. ECIS 2013 Proceedings. Paper 69. 2013
[15] Nonaka, I; Konno, N. The Concept of Ba: Building a Foundation for Knowledge
Creation. California Management Review. 40(3). pp.40-54. 1998
[16] Nonaka, I; Takeuchi, H. The Knowledge-Creating Company. Oxford University
Press. 1995
[17] Paliszkiewicz, J; Koohang, A. Organizational trust as a foundation for knowledge
sharing and its influence on organizational performance. Online journal of applied
knowledge management. 1(2): pp.116-127. 2013
[18] Pirhonen, M; Vartiainen, T. Replacing the Project Manager in Information System
Projects: What Knowledge Should be Transferred? AMCIS 2007 Proceedings. Paper
47. 2007
[19] Polanyi, M. The Tacit Dimension. Routledge and Kegan Paul. London. 1966
[20] Rosemann, M; Chan, R. A Framework to Structure Knowledge for Enterprise
Systems. AMCIS 2000 Proceedings. Paper 23. 2000
[21] Spohrer, K; Kude, T; Schmidt, C.T; Heinzl, A. Knowledge Creation In Information
Systems Development Teams: The Role Of Pair Programming And Peer Code
Review. ECIS 2013 Proceedings. Paper 244. 2013


Proceedings of the 5th International Conference on Information Technologies and Information Society ITIS 2013 held in Dolenjske toplice,
Slovenia, 7-9 November, 2013
Webpage: http://itis2013.fis.unm.si/
Proceedings edited by: Zoran Levnajic
Proceedings are a collection of contributions presented at the conference
Copyright by: Faculty of Information Studies in Novo mesto, Slovenia, 2013
Online free publication
Published in: Dolenjske toplice, Slovenia, 2013
Main conference sponsor: Creative Core FISNM-3330-13-500033 "Simulations", a project funded by the European Union, the European Regional Development Fund. The operation is carried out within the framework of the Operational Programme for Strengthening Regional Development Potentials for the period 2007-2013, Development Priority 1: Competitiveness and research excellence, Priority Guideline 1.1: Improving the competitive skills and research excellence.

