Intelligent Systems for Security Informatics
About this ebook
The Intelligent Systems Series comprises titles that present state-of-the-art knowledge and the latest advances in intelligent systems. Its scope includes theoretical studies, design methods, and real-world implementations and applications.
The most prevalent topics in Intelligence and Security Informatics (ISI) include data management, data and text mining for ISI applications, terrorism informatics, deception and intent detection, terrorist and criminal social network analysis, public health and bio-security, crime analysis, cyber-infrastructure protection, transportation infrastructure security, policy studies and evaluation, and information assurance, among others. This book covers the most active research work in recent years.
- Pulls together key information on ensuring national security around the world
- The latest research on this subject is concisely presented within the book, with several figures to support the text.
- Will be of interest to attendees of the Intelligence and Security Informatics conference series, which includes the IEEE International Conference on Intelligence and Security Informatics (IEEE ISI)
Intelligent Systems for Security Informatics - Christopher C Yang
Preface
The Intelligence and Security Informatics conference series, which includes the IEEE International Conference on Intelligence and Security Informatics (IEEE ISI), the European Intelligence and Security Informatics Conference (EISIC), and the Pacific Asia Workshop on Intelligence and Security Informatics (PAISI), started about a decade ago. Since then, it has brought together many academic researchers, law enforcement and intelligence experts, and information technology consultants and experts to discuss their research and practices. The topics in ISI include data management, data and text mining for ISI applications, terrorism informatics, deception and intent detection, terrorist and criminal social network analysis, public health and bio-security, crime analysis, cyber-infrastructure protection, transportation infrastructure security, policy studies and evaluation, and information assurance, among others. In this book, we have covered the most active research work in recent years.
The intended readership of this book includes (i) public and private sector practitioners in the national/international and homeland security area, (ii) consultants and contractors engaged in ongoing relationships with federal, state, local, and international agencies on projects related to national security, (iii) graduate-level students in Information Sciences, Public Policy, Computer Science, Information Assurance, and Terrorism, and (iv) researchers engaged in security informatics, homeland security, information policy, knowledge management, public administration, and counter-terrorism.
We hope that readers will find the book valuable and useful in their study or work. We also hope that the book will contribute to the ISI community. Researchers and practitioners in this community will continue to grow and share research findings to contribute to national safety around the world.
Christopher C. Yang, Drexel University
Wenji Mao, Chinese Academy of Sciences
Xiaolong Zheng, Chinese Academy of Sciences
Hui Wang, National University of Defense Technology
Chapter 1
Revealing the Hidden World of the Dark Web
Social Media Forums and Videos¹
Hsinchun Chen∗, Dorothy Denning†, Nancy Roberts†, Catherine A. Larson∗, Ximing Yu∗ and Chun-Neng Huang∗, ∗Management Information Systems Department, The University of Arizona, Tucson, Arizona, USA, †Department of Defense Analysis, Naval Postgraduate School, Monterey, California, USA
Chapter Outline
1.1 Introduction
1.2 The Dark Web Forum Portal
1.2.1 Data Identification and Collection
1.2.2 Evolution of the Dark Web Forum Portal
Version 1.0
Version 2.0
Version 2.5
1.2.3 Summary of the Three Versions
1.2.4 Case Studies using the Dark Web Forum Portal
Case study I. Dark Forums in Eastern Afghanistan: How to influence the Haqqani audience
Case study II. Psychological operations
Conclusion
1.3 The Video Portal
1.3.1 System Design
1.3.2 Data Acquisition
1.3.3 Data Preparation
1.3.4 Portal System
Access control
Browsing
Searching
Post-search filtering
Multilingual translation
Social network analysis
1.4 Conclusion and Future Directions
Acknowledgments
References
1.1 Introduction
The Internet presence of terrorists, hate groups, and other extremists continues to be of significant interest to counter-terrorism investigators, intelligence analysts, and other researchers in government, industry, and academia, in fields as diverse as: psychology, sociology, criminology, and political science; computational and information sciences; and law enforcement, homeland security, and international policy. Through analysis of primary sources such as terrorists’ own websites, videos, chat sites, and Internet forums, researchers and others attempt, for example, to identify who the terrorists and extremists are, how they are using the Internet and for what intent, who the intended audience is, etc. [1]. For example, the United Nations’ Counter-terrorism Implementation Task Force in 2009 issued a report describing member states’ concerns about continued terrorist use of the Internet for fundraising, recruitment, and cyber attacks, among other things, and analyzed steps to address this use [2]. McNamee et al. [3] examined the message themes found in hate group websites to understand how these groups recruited and reacted to threats through the formation of group identity. Post [4] noted how terrorists had created a “virtual community of hatred” and wrote of the need to develop a psychology-based counter-terrorism program to, in part, inhibit potential participants from joining, reduce support for these groups, and undermine their activities.
In 2002, partly in response to burgeoning interest in terrorist use of the Internet, particularly in the aftermath of 9/11, and partly as a natural expansion of its previous work in border security, and information sharing and data mining for law enforcement, the Artificial Intelligence (AI) Lab of the University of Arizona founded its “Dark Web” project. “Dark Web,” as it has become known, is a long-term scientific research program that aims to study international terrorism via a computational, data-centric approach (http://ai.arizona.edu/research/).
Dark Web focuses on the hidden, “dark” side of the Internet, where terrorists and extremists use the Web to disseminate their ideologies, recruit new members, and even share terrorism training materials. Project goals are twofold: (1) to collect, as comprehensively as possible, all relevant web content generated by international extremist and terrorist groups, including websites, forums, chat rooms, blogs, social networking sites, videos, virtual world, etc.; and (2) to develop algorithms, tools, and visualization techniques that will enhance researchers’ and investigators’ abilities to analyze these sites and their relevance, and that are generalizable to and useful across a wide range of domains.
The next section provides an overview of the genesis and evolution of the Dark Web Forum Portal and includes an examination of the data sources and collection. The following section provides an overview of video portal development. The chapter ends with a conclusion and directions for future work.
1.2 The Dark Web Forum Portal
The Dark Web project has for several years collected a wide variety of data related to and emanating from extremist and terrorist groups. These data have included websites, multimedia material linked to the websites, forums, blogs, virtual world implementations, etc. Forums, as dynamic, interactive discussion sites that support online conversations, have proven to be of significant interest. Through the anonymity of posting under screen names, they allow for and support free expression. They are an especially rich source of information for studying organizations and individuals, the evolution of ideas and trends, and other social phenomena. In forums, ongoing conversations are captured in threads, with each thread roughly corresponding to a subject area or topic. The replies, called postings or messages, are generally time-stamped and attributable to a particular poster (author). Analysis of the threads and messages can often reveal dynamic trends in topics and discussions, the sequencing of ideas, and relationships between posters.
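The thread and message structure described above can be sketched as a small data model. This is a minimal illustration only: the class names `Message` and `Thread`, their fields, and the `posters` helper are assumptions made for the example, not the portal's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Set


@dataclass(frozen=True)
class Message:
    """One posting in a thread (hypothetical fields)."""
    message_id: str
    author: str          # screen name of the poster
    posted_at: datetime  # time stamp recorded by the forum
    body: str


@dataclass
class Thread:
    """A conversation on one topic, holding its messages."""
    thread_id: str
    title: str
    messages: List[Message] = field(default_factory=list)

    def posters(self) -> Set[str]:
        """Return the distinct screen names that posted in this thread."""
        return {m.author for m in self.messages}


thread = Thread("t1", "Example topic")
thread.messages.append(Message("m1", "alice", datetime(2009, 1, 1, 12, 0), "First post"))
thread.messages.append(Message("m2", "bob", datetime(2009, 1, 1, 12, 5), "A reply"))
thread.messages.append(Message("m3", "alice", datetime(2009, 1, 1, 12, 9), "A follow-up"))
```

Keeping the author and time stamp on every message is what later makes it possible to study trends over time and relationships between posters.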
1.2.1 Data Identification and Collection
The forum sites collected for the Dark Web project were identified with input from terrorism researchers, security and military educators, and other experts. They were selected in part because each is generally dedicated to topics relating to Islamic ideology and theology; they range from “moderate” to “extremist” in their opinions and ideologies.
Once the forums are identified, semi-automated collection programs known as “spiders” are used to crawl them and capture all messages including metadata, such as author (also known as “poster”), date, and time. The date and time stamps are especially important for helping to maintain the reply network: the order in which messages are posted and replied to. The spiders are described in more detail below.
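To illustrate why the time stamps matter, the sketch below orders captured messages chronologically and derives a simple reply sequence from them. The record layout and the rule that each message replies to its immediate predecessor are simplifying assumptions for this example; the source does not specify how the portal reconstructs replies.

```python
from datetime import datetime

# Hypothetical captured records: (message_id, poster, time stamp).
captured = [
    ("m3", "carol", datetime(2009, 3, 1, 10, 30)),
    ("m1", "alice", datetime(2009, 3, 1, 9, 0)),
    ("m2", "bob", datetime(2009, 3, 1, 9, 45)),
]


def reply_sequence(records):
    """Sort messages chronologically and pair each one with its predecessor.

    Assumption: within a flat thread, a message is treated as a reply to
    the message posted immediately before it.
    """
    ordered = sorted(records, key=lambda r: r[2])
    return [(later[1], earlier[1]) for earlier, later in zip(ordered, ordered[1:])]


pairs = reply_sequence(captured)
```

Without reliable time stamps the sort key is lost, and the (replier, replied-to) pairs could not be recovered from the crawl order alone.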
The forums were originally collected to serve as a research testbed for use in the Lab, particularly to support work in sentiment and affect analysis, and the study of radicalization processes over time.
Access to these forums is now provided to researchers and others through the Dark Web Forum Portal [5]. The portal contains approximately 15,000,000 messages in five languages: Arabic, English, French, German, and Russian. The English- and Arabic-language forums selected include major jihadist websites; some of the Arabic forums have English-language sections. Three French forums, and the single forums in German and Russian, provide representative content for extremist groups producing content in these languages. Collectively, the forums have approximately 350,000 members/authors. The portal also provides statistical analysis, download, translation, and social network visualization functions for each selected forum.
Incremental spidering keeps the content up to date [6]. Tools developed for searching, browsing, translation, analysis, and visualization are described in a later section.
1.2.2 Evolution of the Dark Web Forum Portal
Version 1.0
This section covers the development of the portal and includes references to previous work where certain aspects of the portal research and development are explained in more detail.
As mentioned above, the Dark Web forums were originally collected to serve as a research testbed for the Artificial Intelligence Lab to develop techniques for analyzing the Internet presence and content of hate and extremist groups (e.g. Refs [7–11]). At the time, little previous research had been done on Dark Web forum data integration and searching. Dark Web forums are heterogeneous, widely distributed, numerous, difficult to access, and can mysteriously appear and disappear with no notice or warning. The growing amount of forum material makes searching increasingly difficult [12]. For researchers interested in analyzing or monitoring Dark Web content, data integration and retrieval are critical issues [10]. Without a centralized system, it is labor-intensive, time-consuming, and expensive to search and analyze Dark Web forum data.
Two other characteristics of Dark Web forums create barriers to use. The first is the dynamic nature of the forums, which creates difficulties for analyzing and visualizing interactions between participants. Visualization can reveal hitherto hidden relationships and networks behind online activity [13]. Social Network Analysis (SNA) is a graph-based method that can be used to analyze the network structure of a group or population [14]. SNA has been used to study various real-world networks [15]. Web forums are ideal platforms for social network research because by default they record participants’ communications and the postings are retrievable [16]. However, few prior studies had actually incorporated an SNA function into a real-time system.
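As a rough illustration of SNA as a graph-based method, the sketch below builds an undirected interaction network from reply pairs and computes degree centrality for each member. The sample pairs and function names are hypothetical, and this plain-Python version is not the portal's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical (replier, replied_to) pairs taken from forum threads.
replies = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("alice", "bob"),
    ("dave", "alice"),
]


def build_network(pairs):
    """Return undirected edge weights and each member's set of neighbours."""
    weights = Counter()
    neighbours = defaultdict(set)
    for src, dst in pairs:
        if src == dst:
            continue  # ignore self-replies
        weights[frozenset((src, dst))] += 1
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    return weights, neighbours


def degree_centrality(neighbours):
    """Fraction of the other members each member interacts with."""
    n = len(neighbours)
    if n < 2:
        return {member: 0.0 for member in neighbours}
    return {member: len(adj) / (n - 1) for member, adj in neighbours.items()}


weights, neighbours = build_network(replies)
centrality = degree_centrality(neighbours)
```

In this toy network every exchange involves one member, so that member has the highest centrality; edge weights record how often each pair interacted.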
A second characteristic is the multilingual nature of the forums. Forums can be found in many of the world’s languages, and forums collected for Dark Web study were in Arabic and English, initially, with French, German, and Russian forums being added later. It was thus critical that the language barrier be addressed.
Based on the research gaps discussed above, it was clear that a systematic and integrated approach to collecting, searching, browsing, and analyzing Dark Web forum data was needed. We developed these research questions to guide the next steps [5]:
• Q1: How can we develop a Web portal for Dark Web forums which will effectively integrate data from multiple forum data sources?
• Q2: How can we develop efficient, accurate search and browse methods across multiple forum data sources in our portal?
• Q3: How can we incorporate real-time translation functionality into our portal to enable automatic forum data translation from non-English (e.g. Arabic) to English?
• Q4: How can we incorporate real-time, user-interactive social network analysis into our portal to analyze and visualize the interactions among forum participants?
The first iteration of the portal was developed based on the system design shown in Figure 1.1.
Figure 1.1 Early system design of the Dark Web Forum Portal.
The early system design contained three modules:
• Data acquisition – Using spidering programs, web pages from the selected online forums were collected. In the first iteration of the portal, we included six Arabic forums and one English-language forum with a total of about 2.3M messages.
• Data preparation – Using parsing programs, the detailed forum data and metadata were extracted from the raw HTML web pages and stored locally in a database.
• System functionality – Using Apache Tomcat for the portal and Microsoft SQL Server 2000 for the database, functions including searching and browsing could be supported. Forums could be searched individually or collectively. For forum statistics analysis, Java applet-based charts were created to show the trends based on the numbers of messages produced over time. The multilingual translation function was implemented using the then-current Google Translation API (http://code.google.com/apis/ajaxlanguage/documentation/#Translation). The social network visualization function provided dynamic, user-interactive networks implemented using JUNG (http://jung.sourceforge.net/) to visualize the interactions among forum members.
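The data preparation and statistics steps above can be sketched as follows, using only the Python standard library. The HTML markup, attribute names, and the `PostParser` class are invented for illustration; real forums each have their own page layouts, and the portal's parsers are not described at this level of detail.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical forum markup; real forums each need their own parsing rules.
RAW_PAGE = """
<div class="post" data-author="alice" data-date="2008-05-01">First message</div>
<div class="post" data-author="bob" data-date="2008-05-17">Second message</div>
<div class="post" data-author="alice" data-date="2008-06-02">Third message</div>
"""


class PostParser(HTMLParser):
    """Collect author, date, and text for each <div class="post"> element."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "post":
            self._current = {
                "author": attrs.get("data-author"),
                "date": attrs.get("data-date"),
                "text": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.posts.append(self._current)
            self._current = None


parser = PostParser()
parser.feed(RAW_PAGE)
parser.close()

# Messages per month (YYYY-MM), the kind of count a trend chart would plot.
per_month = Counter(post["date"][:7] for post in parser.posts)
```

The extracted records would then be written to the database, and the monthly counts correspond to the message-volume trends shown in the forum statistics charts.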
Figure 1.2 shows a results screen from a single-forum search using the term “bomb” in the forum Alokab. Alokab is in Arabic; the search term “bomb” was used to retrieve matching threads (shown in the middle column, labeled “Thread Title”), and the translation function was then invoked to translate on the fly from Arabic to English (“Thread Title Translation”).
Figure 1.2 Screenshot of single-forum search result.
An evaluation was conducted with a small group of users, each of whom performed all tasks related to each function. All search tasks were completed successfully on both our portal and a benchmark system; however, searching was faster on our system. Users also reacted positively to the translation and SNA functions when asked to rate, on a seven-point Likert scale, their overall satisfaction with the portal, including its usefulness and ease of use.
This first iteration of the portal was created to address the challenges involved in integrating data from multiple forum data sources in multiple languages, developing search and browse methods effective for use across multiple data sources, and incorporating into a portal real-time translation and real-time social network analysis functions that are typically stand-alone. More details about the first version of the system and the user evaluation can be found in Zhang et al. [5].
Version 2.0
Version 2.0 was developed with several goals in mind:
• Increase the scope of data collection while minimizing the amount of human effort or intervention needed.
• Improve the currency of the data presented in the portal and develop the means to keep it updated in as automated a fashion as possible.
• Enhance searching and browsing from a user perspective.
To increase the scope of data collection and keep the collection up to date, we needed to examine our spidering procedures. Spiders [17] are defined as software programs that traverse the World Wide Web information space by following hypertext links and retrieving web documents by standard HTTP protocol.
As explained in our previous research, there are six important characteristics of spidering programs: accessibility, collection type, content richness, URL-ordering features, URL-ordering techniques, and collection update procedure [18]. A functional spider program must handle the registration requirement of targeted forums (accessibility), extract the desired information from various data types (collection type), filter out irrelevant file types (content richness), sort queued URLs based on given heuristics (URL-ordering features and techniques), and keep the collection up to date (collection update procedure). An incremental spidering process was added to the data acquisition module of the system [6].
The addition of the incremental spidering component allowed the portal to stay up to date within 2 weeks of forum postings. It also enabled us to acquire a great many more forums and to increase the collection from seven forums with 2.3M messages in the first version to 29 forums and more than 13M messages in the second. Tests performed during the development of version 2.0 showed, for example, that the incremental spider allowed us to collect 29,000 messages in less than 45 minutes [6].
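The idea behind incremental spidering can be sketched as a crawl that skips anything already collected. The in-memory `FORUM` dictionary stands in for pages a real spider would fetch over HTTP, and the function name and message identifiers are hypothetical; the actual spider described in [6] is more involved.

```python
# Simulated forum state: thread id -> list of (message_id, text).
# A real spider would fetch these pages over HTTP after logging in.
FORUM = {
    "t1": [("m1", "hello"), ("m2", "reply")],
    "t2": [("m3", "new topic")],
}


def incremental_crawl(forum, seen_ids):
    """Return only messages not collected before and record them as seen."""
    new_messages = []
    for thread_id, messages in forum.items():
        for message_id, text in messages:
            if message_id in seen_ids:
                continue  # collected in an earlier pass
            seen_ids.add(message_id)
            new_messages.append((thread_id, message_id, text))
    return new_messages


seen = set()
first_pass = incremental_crawl(FORUM, seen)   # full collection

FORUM["t1"].append(("m4", "late reply"))      # forum gains one posting
second_pass = incremental_crawl(FORUM, seen)  # only the new posting
```

Because later passes store only unseen messages, the cost of keeping a forum current depends on how much was posted since the last pass, not on the size of the whole forum.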
Another goal, as listed above, was to improve the searching and browsing experience of users. More flexible Boolean searching was added, to allow users to perform “AND” and “OR” searches. Users could also now enter their search terms in English (or any language) and retrieve matches regardless of the original language of the portal. The display was improved to allow users to comprehend, at a glance, how to view, translate, or download results, whether threads or messages.
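A minimal sketch of Boolean AND and OR searching over an inverted index is shown below. The sample messages and function names are assumptions for illustration, and the approach is a generic one rather than the portal's version 2.0 search code.

```python
import re
from collections import defaultdict

# Hypothetical message store: message id -> text.
MESSAGES = {
    1: "Meeting moved to Friday",
    2: "Training video posted on Friday",
    3: "New training material available",
}


def build_index(messages):
    """Map each lower-cased word to the ids of messages containing it."""
    index = defaultdict(set)
    for message_id, text in messages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(message_id)
    return index


def search(index, terms, operator="AND"):
    """Combine per-term postings by intersection (AND) or union (OR)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    if operator == "AND":
        return set.intersection(*postings)
    if operator == "OR":
        return set.union(*postings)
    raise ValueError(f"unsupported operator: {operator}")


index = build_index(MESSAGES)
both = search(index, ["training", "friday"], "AND")
either = search(index, ["training", "friday"], "OR")
```

AND narrows the result to messages containing every term, while OR widens it to messages containing any of them, which is why OR queries depend more heavily on good ranking.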
Version 2.5
While version 2.0 addressed many of the issues we identified in usability tests, improvements in searching were still needed. Search is one of the most important and well-used functions in the portal and, as of version 2.0, the search results were still not very satisfactory in the following aspects:
• Query parsing: While version 2.0 added some Boolean searching capability, it did not support complex, sophisticated queries.
• Search ranking: The search ranking was problematic when users entered multiple keywords joined by “OR”.
• Hit highlighting: Matched keywords were not always correctly highlighted; some highlighted words did not match the input search terms.
• Searching efficiency: Searching for messages in more than five forums simultaneously was very slow from a user perspective.
Given these issues, we embarked on a newer version of the portal, version 2.5, based on version 2.0. We adopted Lucene, a popular Java-based full-text indexing framework for the indexing and searching of thread titles and message contents (http://lucene.apache.org/).
Features of Lucene include high-performance indexing that scales well, and accurate and efficient search algorithms. Its index size is roughly 20–30% of the size of the text indexed, and Lucene Java is compatible with Lucene implemented in other programming languages. Incremental and batch indexing are both fast. It offers ranked searching in which the “best” results are returned first, and also offers a wide range of query types.
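To make the idea of ranked searching concrete, the sketch below scores documents for a multi-keyword OR query with a simple TF-IDF sum and returns the highest-scoring documents first. This is a generic illustration in plain Python with invented sample documents; it does not use Lucene and does not reproduce Lucene's scoring formula.

```python
import math
import re
from collections import Counter

# Hypothetical documents standing in for thread titles or message bodies.
DOCS = {
    "d1": "bomb threat reported",
    "d2": "training video notes",
    "d3": "video of the bomb training",
    "d4": "weather report",
}


def tokenize(text):
    """Split text into lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def rank(docs, query_terms):
    """Order documents by summed TF-IDF of the query terms (OR semantics)."""
    tokens = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in tokens.items():
        score = 0.0
        for term in query_terms:
            df = sum(1 for c in tokens.values() if term in c)
            if df == 0 or counts[term] == 0:
                continue
            idf = math.log(1 + n_docs / df)  # rarer terms weigh more
            score += counts[term] * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


results = rank(DOCS, ["bomb", "training"])
```

A document matching both keywords outranks documents matching only one, and documents matching neither are left out, which is the behavior users expect from an OR query with ranking.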
Implementing Lucene to work for multilingual searching required analysis before proceeding. The Dark Web Forum Portal (DWFP) contains 29 forums in five languages: English, Arabic, French, German, and Russian. We examined the languages contained in the 29 forums manually and found that among the 17 Arabic forums, 16 are purely in Arabic. An exception was the forum Alqimmah, which contains a considerable number of English messages. All seven English-language forums contain Arabic messages. All French, German, and Russian forums also contain Arabic messages. See Table 1.1 for a listing of