
Handling memory leaks in Java programs

Find out when memory leaks are a concern and how to prevent them
Memory leaks in Java programs? Absolutely. Contrary to popular belief,
memory management is still a consideration in Java programming. In this
article, you'll learn what causes Java memory leaks and when these leaks
should be of concern. You'll also get a quick hands-on lesson for tackling
leaks in your own projects.

Jim Patrick (patrickj@us.ibm.com), Advisory Programmer, IBM Pervasive Computing


01 February 2001

How memory leaks manifest themselves in Java programs
Most programmers know that one of the beauties of using a programming language such as
Java is that they no longer have to worry about allocating and freeing memory. You simply
create objects, and Java takes care of removing them when they are no longer needed by
the application through a mechanism known as garbage collection. This process means that
Java has solved one of the nasty problems that plague other programming languages -- the
dreaded memory leak. Or has it?
Before we get too deep into our discussion, let's begin by reviewing how garbage collection
actually works. The job of the garbage collector is to find objects that are no longer needed
by an application and to remove them when they can no longer be accessed or referenced.
The garbage collector starts at the root nodes (for example, classes that persist throughout
the life of a Java application) and sweeps through all of the objects that are referenced. As it
traverses these objects, it keeps track of which ones are actively being referenced. Any objects
that are no longer being referenced are then eligible to be garbage collected. The memory resources
used by these objects can be returned to the Java virtual machine (JVM) when the objects
are deleted.
So it is true that Java code does not require the programmer to be responsible for memory
management cleanup, and that it automatically garbage collects unused objects. However,
the key point to remember is that an object is only counted as being unused when it is no
longer referenced. Figure 1 illustrates this concept.
Figure 1. Unused but still referenced

The figure illustrates two classes that have different lifetimes during the execution of a Java
application. Class A is instantiated first and exists for a long time or for the entire life of the
program. At some point, class B is created, and class A adds a reference to this newly
created class. Now let's suppose class B is some user interface widget that is displayed and
eventually dismissed by the user. Even though class B is no longer needed, if the reference
that class A has to class B is not cleared, class B will continue to exist and to take up memory
space even after the next garbage collection cycle is executed.
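The situation in Figure 1 can be sketched in a few lines. This is only an illustration: LongLivedOwner and Widget are hypothetical names standing in for "class A" and "class B", and the code assumes a current Java version (generics did not exist in JDK 1.1.8).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the long-lived "class A" in Figure 1. */
class LongLivedOwner {

    /** Hypothetical stand-in for the short-lived "class B", e.g. a UI widget. */
    static class Widget {
        private final byte[] payload = new byte[1024]; // memory held by the widget

        int size() {
            return payload.length;
        }
    }

    // Every widget ever shown is referenced from here.
    private final List<Widget> widgets = new ArrayList<>();

    Widget show() {
        Widget w = new Widget();
        widgets.add(w); // the owner now references the widget
        return w;
    }

    /** Leaky dismissal: the widget is no longer needed but is still referenced. */
    void dismissWithoutCleanup(Widget w) {
        // Nothing removes w from 'widgets', so it stays reachable.
    }

    /** Proper dismissal: drop the reference so the widget becomes unreachable. */
    void dismiss(Widget w) {
        widgets.remove(w);
    }

    int retainedCount() {
        return widgets.size();
    }

    public static void main(String[] args) {
        LongLivedOwner owner = new LongLivedOwner();
        Widget w = owner.show();
        owner.dismissWithoutCleanup(w);
        System.out.println("retained after leaky dismiss: " + owner.retainedCount());
        owner.dismiss(w);
        System.out.println("retained after proper dismiss: " + owner.retainedCount());
    }
}
```

As long as the owner's list still holds the widget, no garbage collection cycle can reclaim it, however many times the collector runs.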

When are memory leaks a concern?


If your program is getting a java.lang.OutOfMemoryError after executing for a while, a
memory leak is certainly a strong suspect. Beyond this obvious case, when should memory
leaks become a concern? The perfectionist programmer would answer that all memory leaks
need to be investigated and corrected. However, there are several other points to consider
before jumping to this conclusion, including the lifetime of the program and the size of the
leak.
Consider the possibility that the garbage collector may never even run during an
application's lifetime. There is no guarantee as to when or if the JVM will invoke the garbage
collector -- even if a program explicitly calls System.gc(). Typically, the garbage collector
won't be automatically run until a program needs more memory than is currently available. At
this point, the JVM will first attempt to make more memory available by invoking the garbage
collector. If this attempt still doesn't free enough resources, then the JVM will obtain more
memory from the operating system until it finally reaches the maximum allowed.
Take, for example, a small Java application that displays some simple user interface
elements for configuration modifications and that has a memory leak. Chances are that the
garbage collector will not even be invoked before the application closes, because the JVM
will probably have plenty of memory to create all of the objects needed by the program with
leftover memory to spare. So, in this case, even though some dead objects are taking up
memory while the program is being executed, it really doesn't matter for all practical
purposes.
If the Java code being developed is meant to run on a server 24 hours a day, then memory
leaks become much more significant than in the case of our configuration utility. Even the
smallest leak in some code that is meant to be continuously run will eventually result in the
JVM exhausting all of the memory available.
And in the opposite case where a program is relatively short lived, memory limits can be
reached by any Java code that allocates a large number of temporary objects (or a handful
of objects that eat up large amounts of memory) that are not de-referenced when no longer
needed.
One last consideration is that the memory leak isn't a concern at all. Java memory leaks
should not be considered as dangerous as leaks that occur in other languages such as C++
where memory is lost and never returned to the operating system. In the case of Java
applications, we have unneeded objects clinging to memory resources that have been given
to the JVM by the operating system. So in theory, once the Java application and its JVM
have been closed, all allocated memory will be returned to the operating system.

Determining if an application has memory leaks


To see if a Java application running on a Windows NT platform is leaking memory, you might
be tempted to simply observe the memory settings in Task Manager as the application is
run. However, after observing a few Java applications at work, you will find that they use a
lot of memory compared to native applications. Some Java projects that I have worked on
can start out using 10 to 20 MB of system memory. Compare this number to the native
Windows Explorer program shipped with the operating system, which uses something on the
order of 5 MB.
The other thing to note about Java application memory use is that the typical program
running with the IBM JDK 1.1.8 JVM seems to keep gobbling up more and more system
memory as it runs. The program never seems to return any memory back to the system until
a very large amount of physical memory has been allocated to the application. Could these
situations be signs of a memory leak?
To understand what is going on, we need to familiarize ourselves with how the JVM uses
system memory for its heap. When running java.exe, you can use certain options to control
the startup and maximum size of the garbage-collected heap (-ms and -mx, respectively).
The Sun JDK 1.1.8 uses a default 1 MB startup setting and a 16 MB maximum setting. The
IBM JDK 1.1.8 uses a default maximum setting of one-half the total physical memory size of
the machine. These memory settings have a direct impact on what the JVM does when it
runs out of memory. The JVM may continue growing the heap rather than wait for a garbage
collection cycle to complete.
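To see what heap limits a running JVM is actually using, a program can ask the runtime directly. The sketch below is an illustration rather than part of the original investigation: HeapSettings is a hypothetical class name, and Runtime.maxMemory() assumes a JVM newer than the 1.1.8 releases discussed here (on current JVMs the startup and maximum heap options are spelled -Xms and -Xmx).

```java
/** Hypothetical helper that reports the heap sizes seen by the running JVM. */
class HeapSettings {

    /** Returns {max, total, free} heap sizes in bytes. */
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.maxMemory(), rt.totalMemory(), rt.freeMemory() };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        long mb = 1024L * 1024L;
        // Upper limit the heap may grow to (the maximum heap setting).
        System.out.println("max heap:     " + s[0] / mb + " MB");
        // Memory currently obtained from the operating system for the heap.
        System.out.println("current heap: " + s[1] / mb + " MB");
        // Unused portion of the current heap.
        System.out.println("free in heap: " + s[2] / mb + " MB");
    }
}
```

Running it with different heap options shows why Task Manager figures alone are misleading: the current heap can keep growing toward the maximum even when most of it is free.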
So for the purposes of finding and eventually eliminating a memory leak, we are going to
need better tools than task monitoring utility programs. Memory debugging programs
(see Resources) can come in handy when you're trying to detect memory leaks. These
programs typically give you information about the number of objects in the heap, the number
of instances of each object, and the memory being used by the objects. In addition, they
may also provide useful views showing each object's references and referrers so that you
can track down the source of a memory leak.
Next, I will show how I detected and removed a memory leak using the JProbe debugger
from Sitraka Software to give you some idea of how these tools can be deployed and the
process required to successfully remove a leak.

A memory leak example


This example centers on a problem that manifested itself after a tester spent several hours
working with a Java JDK 1.1.8 application that my department was developing for
commercial release. The underlying code and packages to this Java application were
developed by several different groups of programmers over time. The memory leaks that
cropped up within the application were caused, I suspect, by programmers who did not truly
understand the code that had been developed elsewhere.
The Java code in question allowed a user to create applications for a Palm personal digital
assistant without having to write any Palm OS native code. By using a graphical interface,
the user could create forms, populate them with controls, and then wire events from these
controls to create the Palm application. The tester discovered that the Java application
eventually ran out of memory as he created and deleted forms and controls over time. The
developers hadn't detected the problem because their machines had more physical memory.
To investigate this problem, I used JProbe to determine what was going wrong. Even with
the powerful tools and memory snapshots that JProbe provides, the investigation turned out
to be a tedious, iterative process that involved first determining the cause of a given memory
leak and then making code changes and verifying the results.
JProbe has several options to control what information is actually recorded during a
debugging session. After some experimentation, I decided that the most efficient way to get
the information I needed was to turn off the performance data collection and concentrate on
the captured heap data. JProbe provides a view called the Runtime Heap Summary that
shows the amount of heap memory in use over time as the Java application is running. It
also provides a toolbar button to force the JVM to perform garbage collection when desired.
This capability turned out to be very useful when trying to see if a given instance of a class
would be garbage collected when it was no longer needed by the Java application. Figure 2
shows the amount of heap storage that is in use over time.
Figure 2. Runtime Heap Summary

In the Heap Usage Chart, the blue portion indicates the amount of heap space that has been
allocated. After I started the Java program and it reached a stable point, I forced the garbage
collector to run, which is indicated by the sudden drop in the blue curve before the green line
(this line indicates a checkpoint was inserted). Next, I added, then deleted four forms and
again invoked the garbage collector. The fact that the level blue area after the checkpoint is
higher than the level blue area before the checkpoint tells us that a memory leak is likely, as
the program has returned to its initial state of only having a single visible form. I confirmed
the leak by looking at the Instance Summary, which indicates that the FormFrame class
(which is the main UI class for the forms) has increased in count by four after the checkpoint.

Finding the cause


My first step in trying to isolate the problems reported by the tester was to come up with
some simple, reproducible test cases. For this example, I found that after simply adding a form,
deleting the form, and then forcing a garbage collection, many class instances
associated with the deleted form were still alive. This problem was apparent in the JProbe
Instance Summary view, which counts the number of instances that are on the heap for each
Java class.
To pinpoint the references that were keeping the garbage collector from properly doing its
job, I used JProbe's Reference Graph, shown in Figure 3, to determine which classes were
still referencing the now-deleted FormFrame class. This process turned out to be one of the
trickiest aspects of debugging this problem as I discovered many different objects were still
referencing the unused object. The trial-and-error process of figuring out which of these
referrers was truly causing the problem was quite time consuming.
In this case, a root class (upper-left corner in red) is where the problem originated. The class
that is highlighted in blue on the right is along the path that has been traced from the
original FormFrame class.
Figure 3. Tracing a memory leak in a reference graph

For this specific example, it turned out that the primary culprit was a font manager class that
contained a static hashtable. After tracing back through the list of referrers, I found that the
root node was a static hashtable that stored the fonts in use for each form. The various
forms could be zoomed in or out independently, so the hashtable contained a vector with all
of the fonts for a given form. When the zoom view of the form was changed, the vector of
fonts was fetched and the appropriate zoom factor was applied to the font sizes.
The problem with this font manager class was that while the code put the font vector into the
hashtable when the form was created, no provision was ever made to remove the vector
when the form was deleted. Therefore, this static hashtable, which essentially existed for the
life of the application itself, was never removing the keys that referenced each form.
Consequently, the form and all of its associated classes were left dangling in memory.

Applying a fix
The simple solution to this problem was to add a method to the font manager class that
allowed the hashtable's remove() method to be called with the appropriate key when the
form was deleted by the user. The removeKeyFromHashtables() method is shown below:
public void removeKeyFromHashtables(GraphCanvas graph) {
    if (graph != null) {
        // remove key from hashtable to prevent memory leak
        viewFontTable.remove(graph);
    }
}

Next, I added a call to this method to the FormFrame class. FormFrame uses Swing's internal
frames to actually implement the form UI, so the call to the font manager was added to the
method that is executed when an internal frame has completely closed, as shown here:
/**
 * Invoked when a FormFrame is disposed. Clean out references to prevent
 * memory leaks.
 */
public void internalFrameClosed(InternalFrameEvent e) {
    FontManager.get().removeKeyFromHashtables(canvas);
    canvas = null;
    setDesktopIcon(null);
}

After I made these code changes, I used the debugger to verify that the object count
associated with the deleted form decreased when the same test case was executed.

Preventing memory leaks


You can prevent memory leaks by watching for some common problems. Collection classes,
such as hashtables and vectors, are common places to find the cause of a memory leak.
This is particularly true if the class has been declared static and exists for the life of the
application.

Another common problem occurs when you register a class as an event listener without
bothering to unregister when the class is no longer needed. Also, many times member
variables of a class that point to other classes simply need to be set to null at the appropriate
time.
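The event listener problem can be sketched as follows. All names here (ListenerLeakSketch, EventSource, Panel, ChangeListener) are hypothetical and do not come from the application described above; the point is only that a long-lived event source keeps every registered listener reachable until it is explicitly removed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the register/unregister pattern for listeners. */
class ListenerLeakSketch {

    interface ChangeListener {
        void changed(String what);
    }

    /** Long-lived event source; it references every registered listener. */
    static class EventSource {
        private final List<ChangeListener> listeners = new ArrayList<>();

        void addListener(ChangeListener l) {
            listeners.add(l);
        }

        void removeListener(ChangeListener l) {
            listeners.remove(l);
        }

        int listenerCount() {
            return listeners.size();
        }
    }

    /** Short-lived component that registers itself with the source. */
    static class Panel implements ChangeListener {
        private final EventSource source;

        Panel(EventSource source) {
            this.source = source;
            source.addListener(this);
        }

        @Override
        public void changed(String what) {
            // React to the event, e.g. repaint.
        }

        /** Unregister when the panel is no longer needed. */
        void close() {
            source.removeListener(this);
        }
    }

    public static void main(String[] args) {
        EventSource source = new EventSource();
        Panel panel = new Panel(source);
        System.out.println("listeners while open: " + source.listenerCount());
        panel.close();
        System.out.println("listeners after close: " + source.listenerCount());
    }
}
```

If close() were never called, the source would keep the panel, and everything the panel references, alive for as long as the source itself exists.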

Conclusion
Finding the cause of a memory leak can be a tedious process, not to mention one that will
require special debugging tools. However, once you become familiar with the tools and the
patterns to look for in tracing object references, you will be able to track down memory leaks.
In addition, you'll gain some valuable skills that may not only save a programming project,
but also provide insight as to what coding practices to avoid to prevent memory leaks in
future projects.

How Garbage Collection works in Java


I have read many articles on garbage collection in Java. Some of them are too complex to understand, and
some of them don't contain enough information to understand how garbage collection works. So I
decided to write up my own experience as an article, or tutorial if you like, that explains what garbage
collection in Java is and how it works, in simple words that are easy to follow but with enough detail to
be useful.

This article continues my previous articles, How Classpath works in Java and How to write Equals method in java.
Before moving ahead, let's recall a few important points about garbage collection in Java:
1) Objects are created on the heap in Java irrespective of their scope, e.g. local or member variable. It is
worth noting that class variables or static members are created in the method area of the Java memory space, and
both the heap and the method area are shared between threads.
2) Garbage collection is a mechanism provided by the Java Virtual Machine to reclaim heap space from objects
that are eligible for garbage collection.
3) Garbage collection relieves the Java programmer of manual memory management, which is an essential part of C++
programming, and leaves more time to focus on business logic.
4) Garbage collection in Java is carried out by a daemon thread called the Garbage Collector.
5) Before removing an object from memory, the garbage collection thread invokes the finalize() method of that
object, which gives it an opportunity to perform any cleanup required.
6) You as a Java programmer cannot force garbage collection in Java; it will only be triggered if the JVM decides it
needs a garbage collection based on the Java heap size.
7) Methods like System.gc() and Runtime.gc() send a garbage collection request
to the JVM, but it's not guaranteed that garbage collection will happen.
8) If there is no memory space left in the heap for creating a new object, the Java Virtual Machine throws
OutOfMemoryError (java.lang.OutOfMemoryError: Java heap space).
9) J2SE 5 (Java 2 Standard Edition) adds a new feature called ergonomics. The goal of ergonomics is to provide
good performance from the JVM with a minimum of command line tuning.

When an Object becomes Eligible for Garbage Collection


An object becomes eligible for garbage collection (GC) if it is not reachable from any live thread or
any static reference. In other words, an object becomes eligible for garbage collection when all
references to it are gone, for example because they have been set to null. Cyclic dependencies are not counted
as references, so if object A has a reference to object B and object B has a reference to object A and they
don't have any other live reference, then both objects A and B will be eligible for garbage collection.
Generally an object becomes eligible for garbage collection in Java in the following cases:
1) All references to that object are explicitly set to null, e.g. object = null.
2) The object is created inside a block and its reference goes out of scope once control exits that block.
3) The parent object is set to null: if an object holds a reference to another object and you set the container object's
reference to null, the child or contained object automatically becomes eligible for garbage collection, provided
nothing else references it.
4) If an object has only live references via WeakHashMap, it will be eligible for garbage collection. To learn
more about HashMap, see How HashMap works in Java.
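The cyclic case can be tried out with a small sketch. CycleSketch and Node are hypothetical names, and the WeakReference is used here only as a probe that does not keep the object alive. Because System.gc() is only a request, the second line of output may differ between JVMs and runs.

```java
import java.lang.ref.WeakReference;

/** Hypothetical sketch: two objects that reference only each other. */
class CycleSketch {

    static class Node {
        Node other;
    }

    /** Builds two nodes that reference each other and returns the first one. */
    static Node makeCycle() {
        Node a = new Node();
        Node b = new Node();
        a.other = b;
        b.other = a;
        return a;
    }

    public static void main(String[] args) {
        Node a = makeCycle();
        // A weak reference lets us observe the node without keeping it reachable.
        WeakReference<Node> probe = new WeakReference<>(a);
        System.out.println("probe set before dropping 'a': " + (probe.get() != null));

        // Drop the only outside reference; the two nodes now reference only each other.
        a = null;
        System.gc(); // a request only, the JVM may ignore it
        System.out.println("probe cleared after gc request: " + (probe.get() == null));
    }
}
```

If the cycle alone kept the nodes alive, the probe could never be cleared; on a typical JVM it is cleared once a collection actually runs.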

Heap Generations for Garbage Collection in Java


Java objects are created in the heap, and the heap is divided into three parts or generations for the sake of garbage
collection: the Young (or New) generation, the Tenured or Old generation, and the Perm area of the
heap.
The New generation is further divided into three parts known as Eden space, Survivor 1 and Survivor 2 space.
When an object is first created in the heap, it is created in the new generation inside Eden space. If it survives a
subsequent minor garbage collection, it is moved to Survivor 1 and then Survivor 2, and once it has survived
enough collections it is promoted to the Old or tenured generation.
The Permanent generation or Perm area of the heap is somewhat special: it is used to store metadata
related to classes and methods in the JVM, and it also hosts the String pool provided by the JVM, as discussed in my string
tutorial why String is immutable in Java. There are many opinions about whether garbage collection
happens in the perm area of the Java heap or not. As far as I know, this is JVM dependent,
and it does happen at least in Sun's implementation of the JVM. You can try this yourself by creating millions of Strings
and watching for garbage collection or OutOfMemoryError.
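One way to see how a particular JVM divides its memory is to list its memory pools through the standard java.lang.management API. The sketch below uses a hypothetical class name, MemoryPools. Pool names depend on the JVM version and the collector in use, so the output will not necessarily show the Eden, Survivor, Tenured and Perm names used above (for example, HotSpot replaced the permanent generation with Metaspace in Java 8).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the memory pools of the running JVM. */
class MemoryPools {

    /** Returns one entry per pool, e.g. the pool name followed by its type. */
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // getType() tells heap pools apart from non-heap pools.
            names.add(pool.getName() + " (" + pool.getType() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : poolNames()) {
            System.out.println(name);
        }
    }
}
```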

Types of Garbage Collector in Java


The Java runtime (J2SE 5) provides various types of garbage collectors, which you can choose based
upon your application's performance requirements. Java 5 adds three garbage collectors in addition to the
serial garbage collector. Each is a generational garbage collector that has been implemented to increase
the throughput of the application or to reduce garbage collection pause times.
1) Throughput garbage collector: This garbage collector uses a parallel version of the young
generation collector. It is used if the -XX:+UseParallelGC option is passed to the JVM on the command line.
The tenured generation collector is the same as the serial collector.
2) Concurrent low pause collector: This collector is used if -Xincgc or -XX:+UseConcMarkSweepGC is
passed on the command line. It is also referred to as the Concurrent Mark Sweep garbage collector. The
concurrent collector is used to collect the tenured generation and does most of the collection concurrently with
the execution of the application. The application is paused for short periods during the collection. A parallel
version of the young generation copying collector is used with the concurrent collector. The Concurrent Mark Sweep
garbage collector is the most widely used garbage collector in Java, and its algorithm first marks the objects that
need to be collected when garbage collection is triggered.
3) The incremental (sometimes called train) low pause collector: This collector is used only if
-XX:+UseTrainGC is passed on the command line. This garbage collector has not changed since Java 1.4.2
and is currently not under active development. It will not be supported in future releases, so avoid using it;
see the 1.4.2 GC Tuning document for information on this collector.
An important point to note is that -XX:+UseParallelGC should not be used with -XX:+UseConcMarkSweepGC.
The argument parsing in the J2SE platform starting with version 1.4.2 should only allow legal combinations of
command line options for garbage collectors, but earlier releases may not detect all illegal combinations,
and the results for illegal combinations are unpredictable. It's not recommended to use this garbage collector in
Java.
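After passing one of these options, you can check which collectors the JVM actually ended up with. The sketch below is an illustration using the standard java.lang.management API; ActiveCollectors is a hypothetical class name, and the collector names reported differ between JVM vendors and versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that reports the collectors active in the running JVM. */
class ActiveCollectors {

    /** Returns the names of the garbage collectors the JVM is using. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated time since JVM start, per collector.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

Typically one bean covers the young generation collector and another the tenured generation collector, which makes it easy to spot an unexpected combination.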

JVM Parameters for garbage collection in Java


Garbage collection tuning is a long exercise and requires a lot of application profiling and patience to get
right. While working with high volume, low latency electronic trading systems, I have worked on some
projects where we needed to improve the performance of a Java application by profiling it and finding what was causing
full GC. I found that garbage collection tuning largely depends on the application profile: what kinds of objects the
application has, what their average lifetime is, and so on. For example, if an application has too many short-lived
objects, then making Eden space larger will reduce the number of minor collections. You can also
control the size of both the young and tenured generations using JVM parameters. For example, setting
-XX:NewRatio=3 means that the ratio between the young and tenured generations is 1:3. You have to be careful
when sizing these generations, as making the young generation larger will reduce the size of the tenured generation,
which will force major collections to occur more frequently; these pause the application threads for their
duration and so reduce throughput. The parameters NewSize and MaxNewSize are used to
specify the young generation size from below and above. Setting these equal to one another fixes the size of the young
generation. In my opinion, a detailed understanding of garbage collection
in Java is a must before doing garbage collection tuning, and I would recommend reading the garbage collection
document provided by Sun Microsystems for detailed knowledge of garbage collection in Java. Also, to get a full list
of JVM parameters for a particular Java Virtual Machine, please refer to the official documents on garbage collection
in Java. I found this link quite helpful:
http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html
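The effect of NewRatio is simple arithmetic, sketched below. GenerationSizing is a hypothetical helper, "heap" here means young plus tenured only, and a real JVM also applies alignment and minimum sizes, so treat the numbers as approximate.

```java
/** Hypothetical helper: approximate generation sizes for a given NewRatio. */
class GenerationSizing {

    /**
     * With young:tenured = 1:newRatio, the young generation gets
     * 1 / (newRatio + 1) of the combined young + tenured space.
     */
    static long youngSize(long heapSize, int newRatio) {
        return heapSize / (newRatio + 1);
    }

    /** The tenured generation gets whatever the young generation does not. */
    static long tenuredSize(long heapSize, int newRatio) {
        return heapSize - youngSize(heapSize, newRatio);
    }

    public static void main(String[] args) {
        long heapMb = 1024; // combined young + tenured size in MB
        int newRatio = 3;   // as in -XX:NewRatio=3
        System.out.println("young:   " + youngSize(heapMb, newRatio) + " MB");
        System.out.println("tenured: " + tenuredSize(heapMb, newRatio) + " MB");
    }
}
```

For a 1024 MB heap and a ratio of 3, this gives 256 MB for the young generation and 768 MB for the tenured generation.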

Full GC and Concurrent Garbage Collection in Java


The concurrent garbage collector in Java uses a single garbage collector thread that runs concurrently with
the application threads, with the goal of completing the collection of the tenured generation before it becomes
full. In normal operation, the concurrent garbage collector is able to do most of its work with the application
threads still running, so only brief pauses are seen by the application threads. As a fallback, if the concurrent
garbage collector is unable to finish before the tenured generation fills up, the application is paused and the
collection is completed with all the application threads stopped. Such collections with the application stopped
are referred to as full garbage collections or full GC, and are a sign that some adjustments need to be made to
the concurrent collection parameters. Always try to avoid or minimize full GC
because it affects the performance of a Java application. When you work in the finance domain on electronic trading
platforms and high volume, low latency systems, the performance of a Java application becomes extremely critical,
and you definitely want to avoid full GC during the trading period.

Summary on Garbage collection in Java


1) The Java heap is divided into three generations for the sake of garbage collection. These are the young generation,
the tenured or old generation, and the Perm area.
2) New objects are created in the young generation and subsequently moved to the old generation.
3) The String pool is created in the Perm area of the heap; garbage collection can occur in perm space, but this varies
from JVM to JVM.
4) Minor garbage collection moves live objects from Eden space to the Survivor 1 and Survivor 2
spaces and promotes objects that survive long enough from the young to the tenured generation; major collection
cleans the tenured generation.
5) Whenever a major garbage collection occurs, application threads stop for that period, which reduces the
application's performance and throughput.
6) A few performance improvements have been applied to garbage collection in Java 6, and we usually
use JRE 1.6.20 for running our application.
7) The JVM command line options -Xms and -Xmx are used to set the starting and maximum size of the Java heap. The ideal
ratio of these parameters is either 1:1 or 1:1.5 in my experience; for example, you can have both
-Xms and -Xmx as 1 GB, or -Xms as 1.2 GB and -Xmx as 1.8 GB.
8) There is no manual way of doing garbage collection in Java.

Read more: http://javarevisited.blogspot.com/2011/04/garbage-collection-in-java.html

Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning

Note: For Java SE 8, see Java Platform, Standard Edition HotSpot Virtual Machine Garbage
Collection Tuning Guide.

Table of Contents
1. Introduction
2. Ergonomics
3. Generations
   o Performance Considerations
   o Measurement
4. Sizing the Generations
   o Total Heap
   o The Young Generation
      Survivor Space Sizing
5. Available Collectors
   o Selecting a Collector
6. The Parallel Collector
   o Generations
   o Ergonomics
      Priority of goals
      Generation Size Adjustments
      Default Heap Size
   o Excessive GC Time and OutOfMemoryError
   o Measurements
7. The Concurrent Collector
   o Overhead of Concurrency
   o Concurrent Mode Failure
   o Excessive GC Time and OutOfMemoryError
   o Floating Garbage
   o Pauses
   o Concurrent Phases
   o Starting a Concurrent Collection Cycle
   o Scheduling Pauses
   o Incremental Mode
      Command Line Options
      Recommended Options
      Basic Troubleshooting
   o Measurements
8. Other Considerations
9. Resources

1. Introduction
The Java Platform, Standard Edition (Java SE) is used for a wide variety of applications,
from small applets on desktops to web services on large servers. In support of this diverse
range of deployments, the Java HotSpot virtual machine implementation (Java HotSpot
VM) provides multiple garbage collectors, each designed to satisfy different requirements.
This is an important part of meeting the demands of both large and small applications.
However, users, developers and administrators that need high performance are burdened with
the extra step of selecting the garbage collector that best meets their needs. A significant step
toward removing this burden was made in J2SE 5.0: the garbage collector is selected based
on the class of the machine on which the application is run.
This better choice of the garbage collector is generally an improvement, but is by no means
always the best choice for every application. Users with strict performance goals or other
requirements may need to explicitly select the garbage collector and tune certain parameters
to achieve the desired level of performance. This document provides information to help with
those tasks. First, the general features of a garbage collector and basic tuning options are
described in the context of the serial, stop-the-world collector. Then specific features of the
other collectors are presented along with factors to consider when selecting a collector.
When does the choice of a garbage collector matter? For some applications, the answer is
never. That is, the application can perform well in the presence of garbage collection with
pauses of modest frequency and duration. However, this is not the case for a large class of
applications, particularly those with large amounts of data (multiple gigabytes), many threads
and high transaction rates.
Amdahl observed that most workloads cannot be perfectly parallelized; some portion is
always sequential and does not benefit from parallelism. This is also true for the Java
platform. In particular, virtual machines from Sun Microsystems for the Java platform prior
to J2SE 1.4 do not support parallel garbage collection, so the impact of garbage collection on
a multiprocessor system grows relative to an otherwise parallel application.
The graph below models an ideal system that is perfectly scalable with the exception of
garbage collection. The red line is an application spending only 1% of the time in garbage
collection on a uniprocessor system. This translates to more than a 20% loss in throughput on
32 processor systems. At 10% of the time in garbage collection (not considered an outrageous
amount of time in garbage collection in uniprocessor applications) more than 75% of
throughput is lost when scaling up to 32 processors.
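These figures follow from Amdahl's law. The sketch below reproduces them under the same idealized assumption as the model: garbage collection is the only serial portion of the work and everything else scales perfectly. AmdahlGc is a hypothetical class name used for illustration.

```java
/** Hypothetical helper: throughput lost to serial garbage collection. */
class AmdahlGc {

    /** Amdahl's law: speedup on n processors when a fraction of the work is serial. */
    static double speedup(double serialFraction, int processors) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
    }

    /** Fraction of the ideal n-fold throughput that is lost. */
    static double throughputLoss(double serialFraction, int processors) {
        return 1.0 - speedup(serialFraction, processors) / processors;
    }

    public static void main(String[] args) {
        int processors = 32;
        for (double gcFraction : new double[] { 0.01, 0.10 }) {
            System.out.printf("GC %.0f%% of time: speedup %.2f, loss %.1f%%%n",
                    gcFraction * 100,
                    speedup(gcFraction, processors),
                    throughputLoss(gcFraction, processors) * 100);
        }
    }
}
```

With 1% of the time in garbage collection the speedup on 32 processors is about 24.4 rather than 32, a loss of roughly 24%; with 10% the speedup is about 7.8, a loss of roughly 76%.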

This shows that negligible speed issues when developing on small systems may become
principal bottlenecks when scaling up to large systems. However, small improvements in
reducing such a bottleneck can produce large gains in performance. For a sufficiently large
system it becomes well worthwhile to select the right garbage collector and to tune it if
necessary.
The serial collector is usually adequate for most "small" applications (those requiring heaps
of up to approximately 100MB on modern processors). The other collectors have additional
overhead and/or complexity which is the price for specialized behavior. If the application
doesn't need the specialized behavior of an alternate collector, use the serial collector. An
example of a situation where the serial collector is not expected to be the best choice is a
large application that is heavily threaded and run on a machine with a large amount of
memory and two or more processors. When applications are run on such server-class
machines, the parallel collector is selected by default (see Ergonomics below).
This document was developed using Java SE 6 on the Solaris Operating System (SPARC(R)
Platform Edition) as the reference. However, the concepts and recommendations presented
here apply to all supported platforms, including Linux, Microsoft Windows and the Solaris
Operating System (x86 Platform Edition). In addition, the command line options mentioned
are available on all supported platforms, although the default values of some options may be
different on each platform.

2. Ergonomics
A feature referred to here as ergonomics was introduced in J2SE 5.0. The goal of ergonomics
is to provide good performance with little or no tuning of command line options by selecting
the

garbage collector,
heap size,
and runtime compiler

at JVM startup, instead of using fixed defaults. This selection assumes that the class of the
machine on which the application is run is a hint as to the characteristics of the application
(i.e., large applications run on large machines). In addition to these selections is a simplified
way of tuning garbage collection. With the parallel collector the user can specify goals for a
maximum pause time and a desired throughput for an application. This is in contrast to
specifying the size of the heap that is needed for good performance. This is intended to
particularly improve the performance of large applications that use large heaps. The more
general ergonomics is described in the document entitled Ergonomics in the 5.0 Java Virtual
Machine. It is recommended that the ergonomics as presented in this latter document be
tried before using the more detailed controls explained in this document.
Included in this document are the ergonomics features provided as part of the adaptive size
policy for the parallel collector. This includes the options to specify goals for the
performance of garbage collection and additional options to fine tune that performance.

3. Generations
One strength of the J2SE platform is that it shields the developer from the complexity of
memory allocation and garbage collection. However, once garbage collection is the principal
bottleneck, it is worth understanding some aspects of this hidden implementation. Garbage
collectors make assumptions about the way applications use objects, and these are reflected in
tunable parameters that can be adjusted for improved performance without sacrificing the
power of the abstraction.
An object is considered garbage when it can no longer be reached from any pointer in the
running program. The most straightforward garbage collection algorithms simply iterate over
every reachable object. Any objects left over are then considered garbage. The time this
approach takes is proportional to the number of live objects, which is prohibitive for large
applications maintaining lots of live data.
Beginning with J2SE 1.2, the virtual machine incorporated a number of different garbage
collection algorithms that are combined using generational collection. While naive garbage
collection examines every live object in the heap, generational collection exploits several
empirically observed properties of most applications to minimize the work required to
reclaim unused ("garbage") objects. The most important of these observed properties is the
weak generational hypothesis, which states that most objects survive for only a short period
of time.
The blue area in the diagram below is a typical distribution for the lifetimes of objects. The X
axis is object lifetimes measured in bytes allocated. The byte count on the Y axis is the total
bytes in objects with the corresponding lifetime. The sharp peak at the left represents objects
that can be reclaimed (i.e., have "died") shortly after being allocated. Iterator objects, for
example, are often alive for the duration of a single loop.

Some objects do live longer, and so the distribution stretches out to the right. For
instance, there are typically some objects allocated at initialization that live until the process
exits. Between these two extremes are objects that live for the duration of some intermediate
computation, seen here as the lump to the right of the initial peak. Some applications have
very different looking distributions, but a surprisingly large number possess this general
shape. Efficient collection is made possible by focusing on the fact that a majority of objects
"die young."

To optimize for this scenario, memory is managed in generations, or memory pools holding
objects of different ages. Garbage collection occurs in each generation when the generation
fills up. The vast majority of objects are allocated in a pool dedicated to young objects (the
young generation), and most objects die there. When the young generation fills up it causes a
minor collection in which only the young generation is collected; garbage in other
generations is not reclaimed. Minor collections can be optimized assuming the weak
generational hypothesis holds and most objects in the young generation are garbage and can
be reclaimed. The costs of such collections are, to the first order, proportional to the number
of live objects being collected; a young generation full of dead objects is collected very
quickly. Typically some fraction of the surviving objects from the young generation are
moved to the tenured generation during each minor collection. Eventually, the tenured
generation will fill up and must be collected, resulting in a major collection, in which the
entire heap is collected. Major collections usually last much longer than minor collections
because a significantly larger number of objects are involved.
As noted above, ergonomics selects the garbage collector dynamically in order to provide
good performance on a variety of applications. The serial garbage collector is designed for
applications with small data sets and its default parameters were chosen to be effective for
most small applications. The throughput garbage collector is meant to be used with
applications that have medium to large data sets. The heap size parameters selected by
ergonomics plus the features of the adaptive size policy are meant to provide good
performance for server applications. These choices work well in most, but not all, cases.
Which leads to the central tenet of this document:
If garbage collection becomes a bottleneck, you will most likely have to
customize the total heap size as well as the sizes of the individual generations.
Check the verbose garbage collector output and then explore the sensitivity of
your individual performance metric to the garbage collector parameters.

The default arrangement of generations (for all collectors with the exception of the parallel
collector) looks something like this.

At initialization, a maximum address space is virtually reserved but not allocated to physical
memory unless it is needed. The complete address space reserved for object memory can be
divided into the young and tenured generations.
The young generation consists of eden and two survivor spaces. Most objects are initially
allocated in eden. One survivor space is empty at any time, and serves as the destination of
any live objects in eden and the other survivor space during the next copying collection.
Objects are copied between survivor spaces in this way until they are old enough to be
tenured (copied to the tenured generation).
A third generation closely related to the tenured generation is the permanent generation
which holds data needed by the virtual machine to describe objects that do not have an
equivalence at the Java language level. For example objects describing classes and methods
are stored in the permanent generation.

Performance Considerations
There are two primary measures of garbage collection performance:

1. Throughput is the percentage of total time not spent in garbage collection, considered over
long periods of time. Throughput includes time spent in allocation (but tuning for speed of
allocation is generally not needed).
2. Pauses are the times when an application appears unresponsive because garbage collection
is occurring.

Users have different requirements of garbage collection. For example, some consider the
right metric for a web server to be throughput, since pauses during garbage collection may be
tolerable, or simply obscured by network latencies. However, in an interactive graphics
program even short pauses may negatively affect the user experience.
Some users are sensitive to other considerations. Footprint is the working set of a process,
measured in pages and cache lines. On systems with limited physical memory or many
processes, footprint may dictate scalability. Promptness is the time between when an object
becomes dead and when the memory becomes available, an important consideration for
distributed systems, including remote method invocation (RMI).
In general, a particular generation sizing chooses a trade-off between these considerations.
For example, a very large young generation may maximize throughput, but does so at the
expense of footprint, promptness and pause times. Young generation pauses can be minimized
by using a small young generation at the expense of throughput. To a first approximation, the
sizing of one generation does not affect the collection frequency and pause times for another
generation.
There is no one right way to size generations. The best choice is determined by the way the
application uses memory as well as user requirements. Thus the virtual machine's choice of a
garbage collector is not always optimal and may be overridden with command line options
described below.

Measurement
Throughput and footprint are best measured using metrics particular to the application. For
example, throughput of a web server may be tested using a client load generator, while
footprint of the server might be measured on the Solaris Operating System using the pmap
command. On the other hand, pauses due to garbage collection are easily estimated by
inspecting the diagnostic output of the virtual machine itself.
The command line option -verbose:gc causes information about the heap and garbage
collection to be printed at each collection. For example, here is output from a large server
application:
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]

Here we see two minor collections followed by one major collection. The numbers before
and after the arrow (e.g., 325407K->83000K from the first line) indicate the combined size of
live objects before and after garbage collection, respectively. After minor collections the size
includes some objects that are garbage (no longer alive) but that cannot be reclaimed. These
objects are either contained in the tenured generation, or referenced from the tenured or
permanent generations.
The next number in parentheses (e.g., (776768K) again from the first line) is the committed
size of the heap: the amount of space usable for Java objects without requesting more
memory from the operating system. Note that this number does not include one of the
survivor spaces, since only one can be used at any given time, and also does not include the
permanent generation, which holds metadata used by the virtual machine.
The last item on the line (e.g., 0.2300771 secs) indicates the time taken to perform the
collection; in this case approximately a quarter of a second.
The format for the major collection in the third line is similar.
The format of the output produced by -verbose:gc is subject to change in future releases.
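For scripted analysis, lines in the format shown above can be picked apart with a regular expression. The following is a minimal sketch that assumes exactly the layout of the three sample lines; since the format may change between releases, the pattern should be treated as an assumption. The class name VerboseGcLine and its fields are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: parses one -verbose:gc line in the format of the sample output above. */
final class VerboseGcLine {
    // Matches lines such as "[GC 325407K->83000K(776768K), 0.2300771 secs]".
    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final boolean major;      // true for "Full GC" (major collection)
    final long beforeK;       // size of objects before the collection
    final long afterK;        // size of objects after the collection
    final long committedK;    // committed heap size
    final double seconds;     // time taken by the collection

    private VerboseGcLine(Matcher m) {
        major = m.group(1).equals("Full GC");
        beforeK = Long.parseLong(m.group(2));
        afterK = Long.parseLong(m.group(3));
        committedK = Long.parseLong(m.group(4));
        seconds = Double.parseDouble(m.group(5));
    }

    /** Returns the parsed line, or null if the text is not in the expected format. */
    static VerboseGcLine parse(String text) {
        Matcher m = LINE.matcher(text.trim());
        return m.matches() ? new VerboseGcLine(m) : null;
    }

    public static void main(String[] args) {
        VerboseGcLine gc = parse("[Full GC 267628K->83769K(776768K), 1.8479984 secs]");
        System.out.println((gc.major ? "major" : "minor") + " collection freed "
                + (gc.beforeK - gc.afterK) + "K in " + gc.seconds + " s");
    }
}
```

Summing the seconds field over a run and dividing by the elapsed time gives a rough estimate of the fraction of time spent in garbage collection.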

The option -XX:+PrintGCDetails causes additional information about the collections to be
printed. An example of the output with -XX:+PrintGCDetails using the serial garbage
collector is shown here.
[GC [DefNew: 64575K->959K(64576K), 0.0457646 secs] 196016K->133633K(261184K), 0.0459067 secs]

This output indicates that the minor collection recovered about 98% of the young generation
(DefNew: 64575K->959K(64576K)) and took 0.0457646 secs (about 45 milliseconds).
The usage of the entire heap was reduced to about 51% (196016K->133633K(261184K)), and
there was some slight additional overhead for the collection (over and above the
collection of the young generation), as indicated by the final time of 0.0459067 secs.
The option -XX:+PrintGCTimeStamps will add a time stamp at the start of each collection.
This is useful to see how frequently garbage collections occur.
111.042: [GC 111.042: [DefNew: 8128K->8128K(8128K), 0.0000505 secs]111.042:
[Tenured: 18154K->2311K(24576K), 0.1290354 secs] 26282K->2311K(32704K),
0.1293306 secs]

The collection starts about 111 seconds into the execution of the application. The minor
collection starts at about the same time. Additionally the information is shown for a major
collection delineated by Tenured. The tenured generation usage was reduced to about 10%
18154K->2311K(24576K) and took 0.1290354 secs (approximately 130 milliseconds).
As was the case with -verbose:gc, the format of the output produced by
-XX:+PrintGCDetails is subject to change in future releases.

4. Sizing the Generations


A number of parameters affect generation size. The following diagram illustrates the
difference between committed space and virtual space in the heap. At initialization of the
virtual machine, the entire space for the heap is reserved. The size of the space reserved can
be specified with the -Xmx option. If the value of the -Xms parameter is smaller than the value
of the -Xmx parameter, not all of the space that is reserved is immediately committed to the
virtual machine. The uncommitted space is labeled "virtual" in this figure. The different parts
of the heap (permanent generation, tenured generation and young generation) can grow to the
limit of the virtual space as needed.
Some of the parameters are ratios of one part of the heap to another. For example the
parameter NewRatio denotes the relative size of the tenured generation to the young
generation. These parameters are discussed below.

Total Heap
Note that the following discussion regarding growing and shrinking of the heap and default
heap sizes does not apply to the parallel collector. (See the section on ergonomics for details
on heap resizing and default heap sizes with the parallel collector.) However, the parameters
that control the total size of the heap and the sizes of the generations do apply to the parallel
collector.
Since collections occur when generations fill up, the frequency of collections is inversely
proportional to the amount of memory available, so throughput generally improves as memory
is added. Total available memory is the most important factor affecting
garbage collection performance.
By default, the virtual machine grows or shrinks the heap at each collection to try to keep the
proportion of free space to live objects at each collection within a specific range. This target
range is set as a percentage by the parameters -XX:MinHeapFreeRatio=<minimum> and
-XX:MaxHeapFreeRatio=<maximum>, and the total size is bounded below by -Xms<min> and
above by -Xmx<max>. The default parameters for the 32-bit Solaris Operating System
(SPARC Platform Edition) are shown in this table:

Parameter            Default Value
MinHeapFreeRatio     40
MaxHeapFreeRatio     70
-Xms                 3670k
-Xmx                 64m

Default values of heap size parameters on 64-bit systems have been scaled up by
approximately 30%. This increase is meant to compensate for the larger size of objects on a
64-bit system.
With these parameters, if the percent of free space in a generation falls below 40%, the
generation will be expanded to maintain 40% free space, up to the maximum allowed size of
the generation. Similarly, if the free space exceeds 70%, the generation will be contracted so
that only 70% of the space is free, subject to the minimum size of the generation.
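The expansion and contraction rule can be written out as arithmetic. The sketch below is a simplified model of the rule as described in the paragraph above, not the virtual machine's actual resizing code; the class HeapFreeRatio and the method targetCapacityK are hypothetical names, and alignment and per-collector details are ignored.

```java
/** Sketch: generation resizing driven by MinHeapFreeRatio and MaxHeapFreeRatio. */
final class HeapFreeRatio {
    /**
     * Capacity (in KB) a generation would be resized to, given the space in use
     * after a collection. Ratios are percentages, as for -XX:MinHeapFreeRatio
     * and -XX:MaxHeapFreeRatio; minK and maxK bound the result.
     */
    static long targetCapacityK(long usedK, long capacityK, long minK, long maxK,
                                int minFreeRatio, int maxFreeRatio) {
        double freePercent = 100.0 * (capacityK - usedK) / capacityK;
        long target = capacityK;
        if (freePercent < minFreeRatio) {
            // Expand until minFreeRatio percent of the generation is free.
            target = (long) Math.ceil(usedK * 100.0 / (100 - minFreeRatio));
        } else if (freePercent > maxFreeRatio) {
            // Contract until only maxFreeRatio percent of the generation is free.
            target = (long) Math.ceil(usedK * 100.0 / (100 - maxFreeRatio));
        }
        return Math.max(minK, Math.min(maxK, target));
    }

    public static void main(String[] args) {
        // 9000K in use out of 10000K leaves 10% free, below the 40% minimum.
        System.out.println(targetCapacityK(9_000, 10_000, 3_670, 65_536, 40, 70) + "K");
    }
}
```

With the default ratios, a generation holding 9000K of objects in 10000K of space is expanded to 15000K, so that 40% of it is free, unless the maximum size is reached first.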
Large server applications often experience two problems with these defaults. One is slow
startup, because the initial heap is small and must be resized over many major collections. A
more pressing problem is that the default maximum heap size is unreasonably small for most
server applications. The rules of thumb for server applications are:

Unless you have problems with pauses, try granting as much memory as possible to the
virtual machine. The default size (64MB) is often too small.
Setting -Xms and -Xmx to the same value increases predictability by removing the most
important sizing decision from the virtual machine. However, the virtual machine is then
unable to compensate if you make a poor choice.
In general, increase the memory as you increase the number of processors, since allocation
can be parallelized.

For reference, there is a separate page explaining some of the available command-line
options.

The Young Generation


The second most influential knob is the proportion of the heap dedicated to the young
generation. The bigger the young generation, the less often minor collections occur.
However, for a bounded heap size a larger young generation implies a smaller tenured
generation, which will increase the frequency of major collections. The optimal choice
depends on the lifetime distribution of the objects allocated by the application.
By default, the young generation size is controlled by NewRatio. For example, setting
-XX:NewRatio=3 means that the ratio between the young and tenured generation is 1:3. In
other words, the combined size of the eden and survivor spaces will be one fourth of the total
heap size.

The parameters NewSize and MaxNewSize bound the young generation size from below and
above. Setting these to the same value fixes the young generation, just as setting -Xms and
-Xmx to the same value fixes the total heap size. This is useful for tuning the young generation
at a finer granularity than the integral multiples allowed by NewRatio.
Survivor Space Sizing

If desired, the parameter SurvivorRatio can be used to tune the size of the survivor spaces,
but this is often not as important for performance. For example, -XX:SurvivorRatio=6 sets
the ratio between each survivor space and eden to 1:6. In other words, each survivor space will
be one sixth the size of eden, and thus one eighth the size of the young generation (not one
seventh, because there are two survivor spaces).
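The NewRatio and SurvivorRatio arithmetic can be checked with a few lines of code. This is a sketch of the ratios as described in the text only; the class and method names are hypothetical, and the sizes the virtual machine actually chooses may differ slightly because of alignment.

```java
/** Sketch: sizes implied by -XX:NewRatio and -XX:SurvivorRatio (all sizes in KB). */
final class YoungGenSizing {
    /** Young generation size when tenured:young is newRatio:1. */
    static long youngK(long heapK, int newRatio) {
        return heapK / (newRatio + 1);
    }

    /** Size of each of the two survivor spaces when eden:survivor is survivorRatio:1. */
    static long survivorK(long youngK, int survivorRatio) {
        // young = eden + 2 survivors = (survivorRatio + 2) survivor-sized units
        return youngK / (survivorRatio + 2);
    }

    /** Eden is whatever remains of the young generation after both survivor spaces. */
    static long edenK(long youngK, int survivorRatio) {
        return youngK - 2 * survivorK(youngK, survivorRatio);
    }

    public static void main(String[] args) {
        long young = youngK(65_536, 3);
        System.out.println("young=" + young + "K eden=" + edenK(young, 6)
                + "K survivor=" + survivorK(young, 6) + "K");
    }
}
```

For a 64MB heap with NewRatio=3 and SurvivorRatio=6, this yields a young generation of one fourth of the heap and survivor spaces of one eighth of the young generation each, matching the fractions given above.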
If survivor spaces are too small, copying collection overflows directly into the tenured
generation. If survivor spaces are too large, they will be uselessly empty. At each garbage
collection the virtual machine chooses a threshold number of times an object can be copied
before it is tenured. This threshold is chosen to keep the survivors half full. The
command-line option -XX:+PrintTenuringDistribution can be used to show this threshold and the
ages of objects in the new generation. It is also useful for observing the lifetime distribution
of an application.
Here are the default values for the 32-bit Solaris Operating System (SPARC Platform
Edition); the default values on other platforms are different.
                 Default Value
Parameter        Client JVM     Server JVM
NewRatio
NewSize          2228K          2228K
MaxNewSize       not limited    not limited
SurvivorRatio    32             32

The maximum size of the young generation will be calculated from the maximum size of the
total heap and NewRatio. The "not limited" default value for MaxNewSize means that the
calculated value is not limited by MaxNewSize unless a value for MaxNewSize is specified on
the command line.
The rules of thumb for server applications are:

First decide the maximum heap size you can afford to give the virtual machine. Then plot
your performance metric against young generation sizes to find the best setting.

Note that the maximum heap size should always be smaller than the amount of
memory installed on the machine, to avoid excessive page faults and thrashing.
If the total heap size is fixed, increasing the young generation size requires reducing the
tenured generation size. Keep the tenured generation large enough to hold all the live data
used by the application at any given time, plus some amount of slack space (10-20% or
more).
Subject to the above constraint on the tenured generation:
o Grant plenty of memory to the young generation.
o Increase the young generation size as you increase the number of processors, since
allocation can be parallelized.

5. Available Collectors
The discussion to this point has been about the serial collector. The Java HotSpot VM
includes three different collectors, each with different performance characteristics.
1. The serial collector uses a single thread to perform all garbage collection work, which makes
it relatively efficient since there is no communication overhead between threads. It is
best-suited to single processor machines, since it cannot take advantage of multiprocessor
hardware, although it can be useful on multiprocessors for applications with small data sets
(up to approximately 100MB). The serial collector is selected by default on certain hardware
and operating system configurations, or can be explicitly enabled with the option
-XX:+UseSerialGC.
2. The parallel collector (also known as the throughput collector) performs minor collections in
parallel, which can significantly reduce garbage collection overhead. It is intended for
applications with medium- to large-sized data sets that are run on multiprocessor or
multi-threaded hardware. The parallel collector is selected by default on certain hardware and
operating system configurations, or can be explicitly enabled with the option
-XX:+UseParallelGC.
o New: parallel compaction is a feature introduced in J2SE 5.0 update 6 and enhanced
in Java SE 6 that allows the parallel collector to perform major collections in parallel.
Without parallel compaction, major collections are performed using a single thread,
which can significantly limit scalability. Parallel compaction is enabled by adding the
option -XX:+UseParallelOldGC to the command line.
3. The concurrent collector performs most of its work concurrently (i.e., while the application is
still running) to keep garbage collection pauses short. It is designed for applications with
medium- to large-sized data sets for which response time is more important than overall
throughput, since the techniques used to minimize pauses can reduce application
performance. The concurrent collector is enabled with the option
-XX:+UseConcMarkSweepGC.

Selecting a Collector
Unless your application has rather strict pause time requirements, first run your application
and allow the VM to select a collector. If necessary, adjust the heap size to improve
performance. If the performance still does not meet your goals, then use the following
guidelines as a starting point for selecting a collector.
1. If the application has a small data set (up to approximately 100MB), then

o select the serial collector with -XX:+UseSerialGC.


2. If the application will be run on a single processor and there are no pause time
requirements, then
o let the VM select the collector, or
o select the serial collector with -XX:+UseSerialGC.
3. If (a) peak application performance is the first priority and (b) there are no pause time
requirements or pauses of one second or longer are acceptable, then
o let the VM select the collector, or
o select the parallel collector with -XX:+UseParallelGC and (optionally) enable
parallel compaction with -XX:+UseParallelOldGC.
4. If response time is more important than overall throughput and garbage collection pauses
must be kept shorter than approximately one second, then
o select the concurrent collector with -XX:+UseConcMarkSweepGC. If only one or
two processors are available, consider using incremental mode, described below.

These guidelines provide only a starting point for selecting a collector because
performance is dependent on the size of the heap, the amount of live data
maintained by the application and the number and speed of available
processors. Pause times are particularly sensitive to these factors, so the
threshold of one second mentioned above is only approximate: the parallel
collector will experience pause times longer than one second on many data size
and hardware combinations; conversely, the concurrent collector may not be
able to keep pauses shorter than one second on some combinations.
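The four guidelines above can be encoded as a small decision function. This is only one possible reading of them, and like the guidelines it is a starting point; the class CollectorChooser, its parameters and the order in which the rules are checked are assumptions made for illustration. Only the returned option strings come from this document.

```java
/** Sketch: one possible encoding of the collector selection guidelines. */
final class CollectorChooser {
    /**
     * Suggests a collector option.
     *
     * @param dataSetMB          approximate size of the application's data set
     * @param processors         number of available processors
     * @param needsShortPauses   true if pauses must stay under roughly one second
     */
    static String suggest(long dataSetMB, int processors, boolean needsShortPauses) {
        if (dataSetMB <= 100) {
            return "-XX:+UseSerialGC";          // guideline 1: small data set
        }
        if (needsShortPauses) {
            return "-XX:+UseConcMarkSweepGC";   // guideline 4: response time first
        }
        if (processors == 1) {
            return "-XX:+UseSerialGC";          // guideline 2: single processor
        }
        return "-XX:+UseParallelGC";            // guideline 3: peak performance first
    }

    public static void main(String[] args) {
        System.out.println(suggest(2_000, 8, false));
    }
}
```

In cases 2 and 3 the guidelines also allow simply letting the VM select the collector; the sketch returns the explicit option instead so that every branch produces a concrete answer.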

If the recommended collector does not achieve the desired performance, first attempt to
adjust the heap and generation sizes to meet the desired goals. If still unsuccessful, then try a
different collector: use the concurrent collector to reduce pause times and use the parallel
collector to increase overall throughput on multiprocessor hardware.

6. The Parallel Collector


The parallel collector (also referred to here as the throughput collector) is a generational
collector similar to the serial collector; the primary difference is that multiple threads are used
to speed up garbage collection. The parallel collector is enabled with the command line
option -XX:+UseParallelGC. By default, only minor collections are executed in parallel;
major collections are performed with a single thread. However, parallel compaction can be
enabled with the option -XX:+UseParallelOldGC so that both minor and major collections
are executed in parallel, to further reduce garbage collection overhead.
On a machine with N processors the parallel collector uses N garbage collector threads;
however, this number can be adjusted with a command line option (see below). On a host
with one processor, the parallel collector will likely not perform as well as the serial collector
because of the overhead required for parallel execution (e.g., synchronization). However,
when running applications with medium- to large-sized heaps, it generally outperforms the
serial collector by a modest amount on machines with two processors, and usually performs
significantly better than the serial collector when more than two processors are available.

The number of garbage collector threads can be controlled with the command line option
-XX:ParallelGCThreads=<N>. If explicit tuning of the heap is being done with command line
options, the size of the heap needed for good performance with the parallel collector is to first
order the same as needed with the serial collector. Enabling the parallel collector should just
make the minor collection pauses shorter. Because there are multiple garbage collector
threads participating in the minor collection there is a small possibility of fragmentation due
to promotions from the young generation to the tenured generation during the collection.
Each garbage collection thread reserves a part of the tenured generation for promotions and
the division of the available space into these "promotion buffers" can cause a fragmentation
effect. Reducing the number of garbage collector threads will reduce this fragmentation effect
as will increasing the size of the tenured generation.

Generations
As mentioned earlier, the arrangement of the generations is different in the parallel collector.
That arrangement is shown in the figure below.

Ergonomics

Starting in J2SE 5.0, the parallel collector is selected by default on server-class machines as
detailed in the document Garbage Collector Ergonomics. In addition, the parallel collector
uses a method of automatic tuning that allows desired behaviors to be specified instead of
generation sizes and other low-level tuning details. The behaviors that can be specified are:

Maximum garbage collection pause time


Throughput
Footprint (i.e., heap size)

The maximum pause time goal is specified with the command line option
-XX:MaxGCPauseMillis=<N>. This is interpreted as a hint that pause times of <N>
milliseconds or less are desired; by default there is no maximum pause time goal. If a pause
time goal is specified, the heap size and other garbage collection related parameters are
adjusted in an attempt to keep garbage collection pauses shorter than the specified value.
Note that these adjustments may cause the garbage collector to reduce the overall throughput
of the application and in some cases the desired pause time goal cannot be met.
The throughput goal is measured in terms of the time spent doing garbage collection vs. the
time spent outside of garbage collection (referred to as application time). The goal is
specified by the command line option -XX:GCTimeRatio=<N>, which sets the ratio of garbage
collection time to application time to 1 / (1 + <N>).
For example, -XX:GCTimeRatio=19 sets a goal of 1/20 or 5% of the total time in garbage
collection. The default value is 99, resulting in a goal of 1% of the time in garbage collection.
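The relationship between the option value and the resulting goal is simple enough to verify directly. The sketch below just evaluates the 1 / (1 + <N>) formula given above; the class and method names are hypothetical.

```java
/** Sketch: the garbage collection time goal implied by -XX:GCTimeRatio=<N>. */
final class GcTimeRatio {
    /** Fraction of total time that may be spent in garbage collection: 1 / (1 + n). */
    static double gcTimeGoal(int n) {
        return 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {19, 99}) {
            System.out.printf("-XX:GCTimeRatio=%d -> %.1f%% of total time in GC%n",
                    n, gcTimeGoal(n) * 100);
        }
    }
}
```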
Maximum heap footprint is specified using the existing option -Xmx<N>. In addition, the
collector has an implicit goal of minimizing the size of the heap as long as the other goals are
being met.
Priority of goals

The goals are addressed in the following order


1. Maximum pause time goal
2. Throughput goal
3. Minimum footprint goal

The maximum pause time goal is met first. Only after it is met is the throughput goal
addressed. Similarly, only after the first two goals have been met is the footprint goal
considered.
Generation Size Adjustments

The statistics such as average pause time kept by the collector are updated at the end of each
collection. The tests to determine if the goals have been met are then made and any needed
adjustments to the size of a generation are made. The exception is that explicit garbage
collections (e.g., calls to System.gc()) are ignored in terms of keeping statistics and making
adjustments to the sizes of generations.

Growing and shrinking the size of a generation is done by increments that are a fixed
percentage of the size of the generation so that a generation steps up or down toward its
desired size. Growing and shrinking are done at different rates. By default a generation grows
in increments of 20% and shrinks in increments of 5%. The percentage for growing is
controlled by the command line flag -XX:YoungGenerationSizeIncrement=<Y> for the
young generation and -XX:TenuredGenerationSizeIncrement=<T> for the tenured
generation. The percentage by which a generation shrinks is adjusted by the command line
flag -XX:AdaptiveSizeDecrementScaleFactor=<D>. If the growth increment is X percent,
the decrement for shrinking is X / D percent.
If the collector decides to grow a generation at startup, there is a supplemental percentage
added to the increment. This supplement decays with the number of collections and there is
no long-term effect of this supplement. The intent of the supplement is to increase startup
performance. There is no supplement to the percentage for shrinking.
If the maximum pause time goal is not being met, the size of only one generation is shrunk at
a time. If the pause times of both generations are above the goal, the size of the generation
with the larger pause time is shrunk first.
If the throughput goal is not being met, the sizes of both generations are increased. Each is
increased in proportion to its respective contribution to the total garbage collection time. For
example, if the garbage collection time of the young generation is 25% of the total collection
time and if a full increment of the young generation would be by 20%, then the young
generation would be increased by 5%.
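The step sizes described in this section can be illustrated with a short calculation. The sketch below models only the percentages stated in the text; it ignores the startup supplement and is not the collector's actual code. The class GenerationResize and its methods are hypothetical, and the scale factor of 4 used in the example is inferred from the stated defaults of 20% growth and 5% shrinkage.

```java
/** Sketch of the step sizes used when the parallel collector resizes a generation. */
final class GenerationResize {
    /** Size (KB) after one growth step of incrementPercent, e.g. 20 for 20%. */
    static long grow(long sizeK, int incrementPercent) {
        return sizeK + sizeK * incrementPercent / 100;
    }

    /** Size (KB) after one shrink step of incrementPercent / decrementScaleFactor percent. */
    static long shrink(long sizeK, int incrementPercent, int decrementScaleFactor) {
        return sizeK - sizeK * incrementPercent / (100L * decrementScaleFactor);
    }

    /**
     * Growth percentage applied when the throughput goal is missed: the full
     * increment scaled by the generation's share of total collection time.
     */
    static double throughputGrowthPercent(double genGcSeconds, double totalGcSeconds,
                                          int incrementPercent) {
        return incrementPercent * genGcSeconds / totalGcSeconds;
    }

    public static void main(String[] args) {
        System.out.println("grow 10000K by 20%: " + grow(10_000, 20) + "K");
        System.out.println("shrink 10000K by 20%/4: " + shrink(10_000, 20, 4) + "K");
        System.out.println("young share 25% of GC time: "
                + throughputGrowthPercent(2.5, 10.0, 20) + "% growth");
    }
}
```

The last call reproduces the example in the text: a young generation responsible for 25% of the collection time, with a full increment of 20%, grows by 5%.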
Default Heap Size

If not otherwise set on the command line, the initial and maximum heap sizes are calculated
based on the amount of memory on the machine. The proportion of memory to use for the
heap is controlled by the command line options DefaultInitialRAMFraction and
DefaultMaxRAMFraction, as shown in the table below. (In the table, memory represents the
amount of memory on the machine.)
Formula

Default

initial heap size

memory /
DefaultInitialRAMFraction

memory / 64

maximum heap
size

MIN(memory /
DefaultMaxRAMFraction, 1GB)

MIN(memory / 4,
1GB)

Note that the default maximum heap size will not exceed 1GB, regardless of how much
memory is installed on the machine.
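The formulas in the table can be evaluated for a given machine as follows. This is a sketch of the table only, using the default fractions of 64 and 4 shown there; the class DefaultHeapSize and its methods are hypothetical names, and the virtual machine may apply further platform-specific adjustments.

```java
/** Sketch: default initial and maximum heap sizes for the parallel collector. */
final class DefaultHeapSize {
    static final long GB = 1024L * 1024 * 1024;

    /** memory / DefaultInitialRAMFraction */
    static long initialBytes(long memoryBytes, int initialRamFraction) {
        return memoryBytes / initialRamFraction;
    }

    /** MIN(memory / DefaultMaxRAMFraction, 1GB) */
    static long maximumBytes(long memoryBytes, int maxRamFraction) {
        return Math.min(memoryBytes / maxRamFraction, GB);
    }

    public static void main(String[] args) {
        long memory = 2 * GB;
        long mb = 1024L * 1024;
        System.out.println("initial: " + initialBytes(memory, 64) / mb + "MB, maximum: "
                + maximumBytes(memory, 4) / mb + "MB");
    }
}
```

On a machine with 2GB of memory the defaults come to 32MB initial and 512MB maximum; with 16GB installed the maximum is capped at 1GB.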

Excessive GC Time and OutOfMemoryError

The parallel collector will throw an OutOfMemoryError if too much time is being spent in
garbage collection: if more than 98% of the total time is spent in garbage collection and less
than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is
designed to prevent applications from running for an extended period of time while making
little or no progress because the heap is too small. If necessary, this feature can be disabled
by adding the option -XX:-UseGCOverheadLimit to the command line.
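The condition described above can be expressed as a simple predicate. This is a minimal sketch of the stated thresholds only; how the VM actually measures time and recovered space is not specified here, and the names are hypothetical.

```java
// Sketch of the overhead-limit condition described above (not HotSpot source).
final class GcOverheadLimitSketch {

    /**
     * Returns true when more than 98% of elapsed time went to garbage
     * collection and less than 2% of the heap was recovered.
     */
    static boolean exceedsOverheadLimit(long gcMillis, long totalMillis,
                                        long bytesRecovered, long heapBytes) {
        if (totalMillis <= 0 || heapBytes <= 0) {
            return false; // nothing measured yet
        }
        double gcTimeFraction = (double) gcMillis / totalMillis;
        double recoveredFraction = (double) bytesRecovered / heapBytes;
        return gcTimeFraction > 0.98 && recoveredFraction < 0.02;
    }

    public static void main(String[] args) {
        // 990 ms of every 1000 ms in GC, recovering 1 MB of a 100 MB heap.
        System.out.println(exceedsOverheadLimit(990, 1000, 1L << 20, 100L << 20)); // true
        // Same GC time, but 10% of the heap recovered: the limit is not hit.
        System.out.println(exceedsOverheadLimit(990, 1000, 10L << 20, 100L << 20)); // false
    }
}
```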

Measurements
The verbose garbage collector output from the parallel collector is essentially the same as that
from the serial collector.

7. The Concurrent Collector


The concurrent collector is designed for applications that prefer shorter garbage collection
pauses and that can afford to share processor resources with the garbage collector while the
application is running. Typically applications which have a relatively large set of long-lived
data (a large tenured generation), and run on machines with two or more processors tend to
benefit from the use of this collector. However, this collector should be considered for any
application with a low pause time requirement; for example, good results have been observed
for interactive applications with tenured generations of a modest size on a single processor,
especially if using incremental mode. The concurrent collector is enabled with the command
line option -XX:+UseConcMarkSweepGC.
Similar to the other available collectors, the concurrent collector is generational; thus both
minor and major collections occur. The concurrent collector attempts to reduce pause times
due to major collections by using separate garbage collector threads to trace the reachable
objects concurrently with the execution of the application threads. During each major
collection cycle, the concurrent collector will pause all the application threads for a brief
period at the beginning of the collection and again toward the middle of the collection. The
second pause tends to be the longer of the two pauses and multiple threads are used to do the
collection work during that pause. The remainder of the collection including the bulk of the
tracing of live objects and sweeping of unreachable objects is done with one or more garbage
collector threads that run concurrently with the application. Minor collections can interleave
with an on-going major cycle, and are done in a manner similar to the parallel collector (in
particular, the application threads are stopped during minor collections).
The basic algorithms used by the concurrent collector are described in the technical report A
Generational Mostly-concurrent Garbage Collector. Note that precise implementation details
may, however, differ slightly as the collector is enhanced from one release to another.

Overhead of Concurrency
The concurrent collector trades processor resources (which would otherwise be available to
the application) for shorter major collection pause times. The most visible overhead is the use
of one or more processors during the concurrent parts of the collection. On an N processor
system, the concurrent part of the collection will use K/N of the available processors, where
1 <= K <= ceiling{N/4}. (Note that the precise choice of and bounds on K are subject to
change.) In addition to the use of processors during concurrent phases, additional overhead is
incurred to enable concurrency. Thus while garbage collection pauses are typically much
shorter with the concurrent collector, application throughput also tends to be slightly lower
than with the other collectors.
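To make the bound on K concrete, the following sketch evaluates ceiling{N/4} for a few processor counts. It only illustrates the upper bound stated above; the actual choice of K is internal to the VM and, as noted, subject to change. The class name is hypothetical.

```java
// Sketch of the upper bound on K described above (not HotSpot source).
final class ConcurrentProcessorShareSketch {

    /** Upper bound on K from the text: ceiling(N / 4). */
    static int maxConcurrentProcessors(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one processor");
        }
        return (n + 3) / 4; // integer ceiling of n / 4
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8, 16}) {
            int k = maxConcurrentProcessors(n);
            System.out.printf("N=%d: K <= %d (at most %.0f%% of processors)%n",
                    n, k, 100.0 * k / n);
        }
    }
}
```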
On a machine with more than one processing core, there are processors available for
application threads during the concurrent part of the collection, so the concurrent garbage
collector thread does not "pause" the application. This usually results in shorter pauses, but
again fewer processor resources are available to the application and some slowdown should
be expected, especially if the application utilizes all of the processing cores maximally. Up to
a limit, as N increases the reduction in processor resources due to concurrent garbage
collection becomes smaller, and the benefit from concurrent collection increases. The
following section, concurrent mode failure, discusses potential limits to such scaling.
Since at least one processor is utilized for garbage collection during the concurrent phases,
the concurrent collector does not normally provide any benefit on a uniprocessor (single-core)
machine. However, there is a separate mode available that can achieve low pauses on
systems with only one or two processors; see incremental mode below for details.

Concurrent Mode Failure


The concurrent collector uses one or more garbage collector threads that run simultaneously
with the application threads with the goal of completing the collection of the tenured and
permanent generations before either becomes full. As described above, in normal operation,
the concurrent collector does most of its tracing and sweeping work with the application
threads still running, so only brief pauses are seen by the application threads. However, if the
concurrent collector is unable to finish reclaiming the unreachable objects before the tenured
generation fills up, or if an allocation cannot be satisfied with the available free space blocks
in the tenured generation, then the application is paused and the collection is completed with
all the application threads stopped. The inability to complete a collection concurrently is
referred to as concurrent mode failure and indicates the need to adjust the concurrent
collector parameters.

Excessive GC Time and OutOfMemoryError


The concurrent collector will throw an OutOfMemoryError if too much time is being spent in
garbage collection: if more than 98% of the total time is spent in garbage collection and less
than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is
designed to prevent applications from running for an extended period of time while making
little or no progress because the heap is too small. If necessary, this feature can be disabled
by adding the option -XX:-UseGCOverheadLimit to the command line.
The policy is the same as that in the parallel collector, except that time spent
performing concurrent collections is not counted toward the 98% time limit. In
other words, only collections performed while the application is stopped count
toward excessive GC time. Such collections are typically due to a concurrent
mode failure or an explicit collection request (e.g., a call to System.gc()).

Floating Garbage
The concurrent collector, like all the other collectors in HotSpot, is a tracing collector that
identifies at least all the reachable objects in the heap. In the parlance of Jones and Lins it is
an incremental update collector. Because application threads and the garbage collector thread
run concurrently during a major collection, objects that are traced by the garbage collector
thread may subsequently become unreachable by the time collection finishes. Such
unreachable objects that have not yet been reclaimed are referred to as floating garbage. The
amount of floating garbage depends on the duration of the concurrent collection cycle and on
the frequency of reference updates, also known as mutations, by the application. Furthermore,
since the young generation and the tenured generation are collected independently, each acts
as a source of roots to the other. As a rough rule of thumb, try increasing the size of the tenured
generation by 20% to account for the floating garbage. Floating garbage in the heap at the
end of one concurrent collection cycle is collected during the next collection cycle.

Pauses
The concurrent collector pauses an application twice during a concurrent collection cycle.
The first pause is to mark as live the objects directly reachable from the roots (e.g., object
references from application thread stacks and registers, static objects and so on) and from
elsewhere in the heap (e.g., the young generation). This first pause is referred to as the initial
mark pause. The second pause comes at the end of the concurrent tracing phase and finds
objects that were missed by the concurrent tracing due to updates by the application threads
of references in an object after the concurrent collector had finished tracing that object. This
second pause is referred to as the remark pause.

Concurrent Phases
The concurrent tracing of the reachable object graph occurs between the initial mark pause
and the remark pause. During this concurrent tracing phase one or more concurrent garbage
collector threads may be using processor resources that would otherwise have been available
to the application and, as a result, compute-bound applications may see a commensurate fall
in application throughput during this and other concurrent phases even though the application
threads are not paused. After the remark pause, there is a concurrent sweeping phase which
collects the objects identified as unreachable. Once a collection cycle completes, the
concurrent collector will wait, consuming almost no computational resources, until the start of
the next major collection cycle.

Starting a Concurrent Collection Cycle


With the serial collector a major collection occurs whenever the tenured generation becomes
full and all application threads are stopped while the collection is done. In contrast, a
concurrent collection needs to be started at a time such that the collection can finish before
the tenured generation becomes full; otherwise the application would observe longer pauses
due to concurrent mode failure. There are several ways a concurrent collection can be started.

Based on recent history, the concurrent collector maintains estimates of the time remaining
before the tenured generation will be exhausted and of the time needed for a concurrent
collection cycle. Based on these dynamic estimates, a concurrent collection cycle will be
started with the aim of completing the collection cycle before the tenured generation is
exhausted. These estimates are padded for safety, since a concurrent mode failure can be
very costly.
A concurrent collection will also start if the occupancy of the tenured generation exceeds an
initiating occupancy, a percentage of the tenured generation. The default value of this
initiating occupancy threshold is approximately 92%, but the value is subject to change from
release to release. This value can be manually adjusted using the command line option
-XX:CMSInitiatingOccupancyFraction=<N>

where <N> is an integral percentage (0-100) of the tenured generation size.
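The occupancy test can be illustrated with the standard java.lang.management.MemoryUsage type. This is a sketch under the assumption that occupancy is measured against the generation's committed size; the class and method names are hypothetical, and the collector's own bookkeeping may differ.

```java
import java.lang.management.MemoryUsage;

// Sketch of an initiating-occupancy check (assumption: occupancy is
// used bytes divided by committed bytes; not HotSpot source).
final class InitiatingOccupancySketch {

    /** Occupancy of a generation as a percentage of its committed size. */
    static double occupancyPercent(MemoryUsage usage) {
        long capacity = usage.getCommitted();
        return capacity > 0 ? 100.0 * usage.getUsed() / capacity : 0.0;
    }

    /** True when occupancy exceeds the initiating threshold (0-100). */
    static boolean shouldStartCycle(MemoryUsage tenured, int initiatingPercent) {
        return occupancyPercent(tenured) > initiatingPercent;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical tenured generation: 95 MB used of 100 MB committed.
        MemoryUsage tenured = new MemoryUsage(0, 95 * mb, 100 * mb, 100 * mb);
        System.out.println(shouldStartCycle(tenured, 92)); // true: 95% > 92%
        System.out.println(shouldStartCycle(tenured, 96)); // false: 95% < 96%
    }
}
```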

Scheduling Pauses
The pauses for the young generation collection and the tenured generation collection occur
independently. They do not overlap, but may occur in quick succession such that the pause
from one collection, immediately followed by one from the other collection, can appear to be
a single, longer pause. To avoid this, the concurrent collector attempts to schedule the remark
pause roughly midway between the previous and next young generation pauses. This
scheduling is currently not done for the initial mark pause, which is usually much shorter than
the remark pause.

Incremental Mode
The concurrent collector can be used in a mode in which the concurrent phases are done
incrementally. Recall that during a concurrent phase the garbage collector thread is using one
or more processors. The incremental mode is meant to lessen the impact of long concurrent
phases by periodically stopping the concurrent phase to yield back the processor to the
application. This mode, referred to here as i-cms, divides the work done concurrently by
the collector into small chunks of time which are scheduled between young generation
collections. This feature is useful when applications that need the low pause times provided
by the concurrent collector are run on machines with small numbers of processors (e.g., 1 or
2).
The concurrent collection cycle typically includes the following steps:

1. stop all application threads and identify the set of objects reachable from roots, then
resume all application threads
2. concurrently trace the reachable object graph, using one or more processors, while the
application threads are executing
3. concurrently retrace sections of the object graph that were modified since the tracing in the
previous step, using one processor
4. stop all application threads and retrace sections of the roots and object graph that may have
been modified since they were last examined, then resume all application threads
5. concurrently sweep up the unreachable objects to the free lists used for allocation, using
one processor
6. concurrently resize the heap and prepare the support data structures for the next collection
cycle, using one processor

Normally, the concurrent collector uses one or more processors during the entire concurrent
tracing phase, without voluntarily relinquishing them. Similarly, one processor is used for the
entire concurrent sweep phase, again without relinquishing it. This overhead can be too much
of a disruption for applications with response time constraints that might otherwise have
utilized the processing cores, particularly when run on systems with just one or two
processors. Incremental mode solves this problem by breaking up the concurrent phases into
short bursts of activity, which are scheduled to occur mid-way between minor pauses.
i-cms uses a duty cycle to control the amount of work the concurrent collector is allowed to
do before voluntarily giving up the processor. The duty cycle is the percentage of time
between young generation collections that the concurrent collector is allowed to run. i-cms
can automatically compute the duty cycle based on the behavior of the application (the
recommended method, known as automatic pacing), or the duty cycle can be set to a fixed
value on the command line.
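The duty cycle definition above amounts to a simple proportion. The sketch below assumes a known interval between young generation collections, which in practice the VM estimates at run time; the names and the 2000 ms interval are hypothetical.

```java
// Sketch of the duty cycle arithmetic described above (not HotSpot source).
final class DutyCycleSketch {

    /**
     * Time the concurrent collector may run between two young generation
     * collections, given the interval between them and a duty cycle (0-100).
     */
    static long allowedRunMillis(long intervalMillis, int dutyCyclePercent) {
        if (dutyCyclePercent < 0 || dutyCyclePercent > 100) {
            throw new IllegalArgumentException("duty cycle must be 0-100");
        }
        return intervalMillis * dutyCyclePercent / 100;
    }

    public static void main(String[] args) {
        // Hypothetical: minor collections 2000 ms apart.
        System.out.println(allowedRunMillis(2000, 10)); // 200
        System.out.println(allowedRunMillis(2000, 50)); // 1000
    }
}
```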
Command Line Options

The following command-line options control i-cms (see below for recommendations for an
initial set of options):
Each option is listed with its description and its default value in J2SE 5.0 and earlier and in
Java SE 6 and later.

Option: -XX:+CMSIncrementalMode
Description: Enables incremental mode. Note that the concurrent collector must also be
enabled (with -XX:+UseConcMarkSweepGC) for this option to work.
Default value: disabled (J2SE 5.0 and earlier); disabled (Java SE 6 and later)

Option: -XX:+CMSIncrementalPacing
Description: Enables automatic pacing. The incremental mode duty cycle is automatically
adjusted based on statistics collected while the JVM is running.
Default value: disabled (J2SE 5.0 and earlier); enabled (Java SE 6 and later)

Option: -XX:CMSIncrementalDutyCycle=<N>
Description: The percentage (0-100) of time between minor collections that the concurrent
collector is allowed to run. If CMSIncrementalPacing is enabled, then this is just the initial
value.
Default value: 50 (J2SE 5.0 and earlier); 10 (Java SE 6 and later)

Option: -XX:CMSIncrementalDutyCycleMin=<N>
Description: The percentage (0-100) which is the lower bound on the duty cycle when
CMSIncrementalPacing is enabled.
Default value: 10 (J2SE 5.0 and earlier); 0 (Java SE 6 and later)

Option: -XX:CMSIncrementalSafetyFactor=<N>
Description: The percentage (0-100) used to add conservatism when computing the duty cycle.
Default value: 10 (J2SE 5.0 and earlier); 10 (Java SE 6 and later)

Option: -XX:CMSIncrementalOffset=<N>
Description: The percentage (0-100) by which the incremental mode duty cycle is shifted to
the right within the period between minor collections.
Default value: 0 (J2SE 5.0 and earlier); 0 (Java SE 6 and later)

Option: -XX:CMSExpAvgFactor=<N>
Description: The percentage (0-100) used to weight the current sample when computing
exponential averages for the concurrent collection statistics.
Default value: 25 (J2SE 5.0 and earlier); 25 (Java SE 6 and later)

Recommended Options

To use i-cms in Java SE 6, use the following command line options:


-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps

The first two options enable the concurrent collector and i-cms, respectively. The last two
options are not required; they simply cause diagnostic information about garbage collection
to be written to stdout, so that garbage collection behavior can be seen and later analyzed.
Note that in J2SE 5.0 and earlier releases, we recommend the following as an initial set of
command line options for i-cms:
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
-XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 \
-XX:CMSIncrementalDutyCycle=10

These are the same as recommended for Java SE 6, with the addition of three options that
control i-cms automatic pacing. The additional options simply specify the values that became
the default in Java SE 6.
Basic Troubleshooting

The i-cms automatic pacing feature uses statistics gathered while the program is running to
compute a duty cycle so that concurrent collections complete before the heap becomes full.
However, past behavior is not a perfect predictor of future behavior and the estimates may
not always be accurate enough to prevent the heap from becoming full. If too many full
collections occur, try the following steps, one at a time:
Step                                                       Options
1. Increase the safety factor:                             -XX:CMSIncrementalSafetyFactor=<N>
2. Increase the minimum duty cycle:                        -XX:CMSIncrementalDutyCycleMin=<N>
3. Disable automatic pacing and use a fixed duty cycle:    -XX:-CMSIncrementalPacing -XX:CMSIncrementalDutyCycle=<N>

Measurements
Below is the output from the concurrent collector with the options -verbose:gc
-XX:+PrintGCDetails, with a few minor details removed. Note that the output for the
concurrent collector is interspersed with the output from the minor collections; typically
many minor collections occur during a concurrent collection cycle. The CMS-initial-mark:
indicates the start of the concurrent collection cycle. The CMS-concurrent-mark: indicates
the end of the concurrent marking phase and CMS-concurrent-sweep: marks the end of the
concurrent sweeping phase. Not discussed before is the precleaning phase indicated by
CMS-concurrent-preclean:. Precleaning represents work that can be done concurrently in
preparation for the remark phase CMS-remark. The final phase is indicated by the
CMS-concurrent-reset: and is in preparation for the next concurrent collection.
[GC [1 CMS-initial-mark: 13991K(20288K)] 14103K(22400K), 0.0023781 secs]
[GC [DefNew: 2112K->64K(2112K), 0.0837052 secs] 16103K->15476K(22400K), 0.0838519 secs]
...
[GC [DefNew: 2077K->63K(2112K), 0.0126205 secs] 17552K->15855K(22400K), 0.0127482 secs]
[CMS-concurrent-mark: 0.267/0.374 secs]
[GC [DefNew: 2111K->64K(2112K), 0.0190851 secs] 17903K->16154K(22400K), 0.0191903 secs]
[CMS-concurrent-preclean: 0.044/0.064 secs]
[GC [1 CMS-remark: 16090K(20288K)] 17242K(22400K), 0.0210460 secs]
[GC [DefNew: 2112K->63K(2112K), 0.0716116 secs] 18177K->17382K(22400K), 0.0718204 secs]
[GC [DefNew: 2111K->63K(2112K), 0.0830392 secs] 19363K->18757K(22400K), 0.0832943 secs]
...
[GC [DefNew: 2111K->0K(2112K), 0.0035190 secs] 17527K->15479K(22400K), 0.0036052 secs]
[CMS-concurrent-sweep: 0.291/0.662 secs]
[GC [DefNew: 2048K->0K(2112K), 0.0013347 secs] 17527K->15479K(27912K), 0.0014231 secs]
[CMS-concurrent-reset: 0.016/0.016 secs]
[GC [DefNew: 2048K->1K(2112K), 0.0013936 secs] 17527K->15479K(27912K), 0.0014814 secs]

The initial mark pause is typically short relative to the minor collection pause time. The
concurrent phases (concurrent mark, concurrent preclean and concurrent sweep) normally last
significantly longer than a minor collection pause, as indicated by the example output above.
Note, however, that the application is not paused during these concurrent phases. The remark
pause is often comparable in length to a minor collection. The remark pause is affected by
certain application characteristics (e.g., a high rate of object modification can increase this
pause) and the time since the last minor collection (i.e., more objects in the young generation
may increase this pause).
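To compare pause lengths with concurrent phase lengths in output like the sample above, the trailing duration of each line can be pulled out with a regular expression. This is a minimal sketch written against the sample lines shown here only; log formats vary between releases and flags, the class name is hypothetical, and treating the second of the two concurrent-phase values as the elapsed time is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract durations from log lines shaped like the sample above.
final class CmsLogLineSketch {
    // Matches the trailing "<seconds> secs]" of a line. For concurrent phases
    // written as "a/b secs]" this captures the second value, b.
    private static final Pattern TRAILING_SECS =
            Pattern.compile("(\\d+\\.\\d+) secs\\]\\s*$");

    /** Concurrent phases are logged with a "CMS-concurrent-" label. */
    static boolean isConcurrentPhase(String line) {
        return line.contains("CMS-concurrent-");
    }

    /** Returns the trailing duration in seconds, or -1 if none is found. */
    static double trailingSeconds(String line) {
        Matcher m = TRAILING_SECS.matcher(line);
        return m.find() ? Double.parseDouble(m.group(1)) : -1.0;
    }

    public static void main(String[] args) {
        String[] lines = {
            "[GC [1 CMS-initial-mark: 13991K(20288K)] 14103K(22400K), 0.0023781 secs]",
            "[CMS-concurrent-mark: 0.267/0.374 secs]",
            "[GC [1 CMS-remark: 16090K(20288K)] 17242K(22400K), 0.0210460 secs]"
        };
        for (String line : lines) {
            String kind = isConcurrentPhase(line) ? "concurrent" : "pause";
            System.out.println(kind + " " + trailingSeconds(line));
        }
    }
}
```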

8. Other Considerations
Permanent Generation Size
The permanent generation does not have a noticeable impact on garbage collector
performance for most applications. However, some applications dynamically generate and
load many classes; for example, some implementations of JavaServer Pages (JSP) pages.
These applications may need a larger permanent generation to hold the additional classes. If
so, the maximum permanent generation size can be increased with the command-line option
-XX:MaxPermSize=<N>.

Finalization; Weak, Soft and Phantom References


Some applications interact with garbage collection by using finalization and weak, soft, or
phantom references. These features can create performance artifacts at the Java programming
language level. An example of this is relying on finalization to close file descriptors, which
makes an external resource (descriptors) dependent on garbage collection promptness.
Relying on garbage collection to manage resources other than memory is almost always a bad
idea.
The Resources section includes an article that discusses in depth some of the pitfalls of
finalization and techniques for avoiding them.
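As an illustration of the alternative to finalization, the sketch below closes a file deterministically with try-with-resources. Note that this construct was introduced in Java 7, later than the releases this section discusses; on older releases the same effect requires an explicit close() in a finally block. The class name and temporary file are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: release a file descriptor explicitly instead of via finalization.
final class ExplicitCloseSketch {

    /** Reads the first line; the reader is closed when the block exits. */
    static String firstLine(Path file) throws IOException {
        try (BufferedReader reader =
                     Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        } // descriptor released here, independent of any garbage collection
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sketch", ".txt"); // hypothetical file
        try {
            Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
            System.out.println(firstLine(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```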

Explicit Garbage Collection


Another way applications can interact with garbage collection is by invoking full garbage
collections explicitly by calling System.gc(). This can force a major collection to be done
when it may not be necessary (i.e., when a minor collection would suffice), and so in general
should be avoided. The performance impact of explicit garbage collections can be measured
by disabling them using the flag -XX:+DisableExplicitGC, which causes the VM to ignore
calls to System.gc().
One of the most commonly encountered uses of explicit garbage collection occurs with RMI's
distributed garbage collection (DGC). Applications using RMI refer to objects in other virtual
machines. Garbage cannot be collected in these distributed applications without occasionally
collecting the local heap, so RMI forces full collections periodically. The frequency of these
collections can be controlled with properties. For example,
java -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 ...

specifies explicit collection once per hour instead of the default rate of once per minute.
However, this may also cause some objects to take much longer to be reclaimed. These
properties can be set as high as Long.MAX_VALUE to make the time between explicit
collections effectively infinite, if there is no desire for an upper bound on the timeliness of
DGC activity.

Soft References
Soft references are kept alive longer in the server virtual machine than in the client. The rate
of clearing can be controlled with the command line option
-XX:SoftRefLRUPolicyMSPerMB=<N>, which specifies the number of milliseconds a soft
reference will be kept alive (once it is no longer strongly reachable) for each megabyte of free
space in the heap. The default value is 1000 ms per megabyte, which means that a soft
reference will survive (after the last strong reference to the object has been collected) for 1
second for each megabyte of free space in the heap. Note that this is an approximate figure
since soft references are cleared only during garbage collection, which may occur
sporadically.
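The survival estimate above is a simple product, and the usual way to consume a soft reference is to check for null on every access. The sketch below shows both; the class name, the 64 MB of free space, and the 1 KB payload are hypothetical, and the figure it computes is approximate for the reason just given.

```java
import java.lang.ref.SoftReference;

// Sketch: soft reference survival estimate and null-checked access.
final class SoftReferenceSketch {

    /** Approximate survival time: milliseconds per free megabyte times free megabytes. */
    static long approxSurvivalMillis(long msPerMb, long freeHeapMb) {
        return msPerMb * freeHeapMb;
    }

    public static void main(String[] args) {
        // With 1000 ms per megabyte and 64 MB free: about 64 seconds.
        System.out.println(approxSurvivalMillis(1000, 64) + " ms");

        SoftReference<byte[]> cached = new SoftReference<>(new byte[1024]);
        byte[] data = cached.get(); // null once the collector has cleared it
        System.out.println(data != null ? "still cached" : "cleared, reload needed");
    }
}
```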

Solaris 8 Alternate libthread


The Solaris 8 Operating System supports an alternate version of the threading library,
libthread, that binds threads to light-weight processes (LWPs) directly. Some applications can
benefit greatly from the use of this alternate libthread and it is a potential benefit for any
threaded application. The following commands will load the alternate libthread for java
(Bourne shell syntax is shown):
LD_PRELOAD=/usr/lib/lwp/libthread.so.1
export LD_PRELOAD
java ...

The above is necessary only on Solaris 8, since the alternate libthread is the default in the
Solaris 9 Operating System and is the only libthread available starting with Solaris 10.

9. Resources
1. HotSpot VM Frequently Asked Questions (FAQ)
2. GC output examples describes how to interpret the output from the different collectors.
3. How to Handle Java Finalization's Memory-Retention Issues covers finalization pitfalls and
ways to avoid them.
4. Richard Jones and Rafael Lins, Garbage Collection: Algorithms for Automated Dynamic
Memory Management, Wiley and Sons (1996), ISBN 0-471-94148-4

As used on the web site, the terms "Java Virtual Machine" and "JVM" mean a virtual
machine for the Java platform.

Static methods (in fact all methods) as well as static variables are stored in the
PermGen section of the heap, since they are part of the reflection data (class
related data, not instance related).
Update for clarification:
Note that only the variables and their technical values (primitives or references)
are stored in PermGen space.
If your static variable is a reference to an object, that object itself is stored in the
normal sections of the heap (young/old generation or survivor space). Those
objects (unless they are internal objects like classes etc.) are not stored in PermGen
space.

Example:
static int i = 1; // the value 1 is stored in the PermGen section
static Object o = new SomeObject(); // the reference (pointer/memory address) is stored in
                                    // the PermGen section; the object itself is not

A word on garbage collection:


Do not rely on finalize() as it's not guaranteed to run. It is totally up to the
JVM to decide when to run the garbage collector and what to collect, even if an
object is eligible for garbage collection.
Of course you can set a static variable to null and thus remove the reference to the
object on the heap but that doesn't mean the garbage collector will collect it (even
if there are no more references).
Additionally finalize() is run only once, so you have to make sure it doesn't
throw exceptions or otherwise prevent the object from being collected. If you halt
finalization through some exception, finalize() won't be invoked on the same
object a second time.
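The point about clearing a static reference can be observed with a weak reference as a probe. This is a sketch only: System.gc() is a request, so the second line of output may differ between JVMs, versions, and settings. The class and field names are hypothetical.

```java
import java.lang.ref.WeakReference;

// Sketch: clearing a static reference makes the object eligible for
// collection, but does not guarantee that it is collected.
final class StaticReferenceSketch {
    // The reference belongs to the class data; the object lives in the ordinary heap.
    static Object holder = new Object();

    public static void main(String[] args) {
        WeakReference<Object> probe = new WeakReference<>(holder);
        System.out.println("reachable before clearing: " + (probe.get() != null));

        holder = null;  // the object is now eligible for collection...
        System.gc();    // ...but this is only a request, not a guarantee

        // May print true or false depending on the JVM and its settings.
        System.out.println("collected after System.gc(): " + (probe.get() == null));
    }
}
```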
A final note: how code, runtime data etc. are stored depends on the JVM which is
used, i.e. HotSpot might do it differently than JRockit and this might even differ
between versions of the same JVM. The above is based on HotSpot for Java 5
and 6 (those are basically the same) since at the time of answering I'd say that
most people used those JVMs. Due to major changes in the memory model as of
Java 8, the statements above might not be true for Java 8 HotSpot - and I didn't
check the changes of Java 7 HotSpot, so I guess the above is still true for that
version, but I'm not sure here.

The terminology of JVM memory areas is key to understanding the JVM as a whole. In this
article, let us discuss the important memory areas in the JVM.

Heap Memory

Class instances and arrays are stored in heap memory. Heap memory is also called shared
memory, because it is the place where multiple threads share the same data.

Non-heap Memory
It comprises the method area and other memory required for internal processing, so the major
player here is the method area.
Method Area

As noted above, the method area is part of non-heap memory. It stores per-class structures
and the code for methods and constructors. Per-class structures include runtime constants
and static fields.

Memory Pool
Memory pools are created by JVM memory managers at runtime. A memory pool may
belong to either heap or non-heap memory.
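The pools of a running JVM can be listed with the standard java.lang.management API, which reports for each pool whether it is heap or non-heap. A minimal sketch follows; it uses streams and so assumes Java 8 or later, the class name is hypothetical, and the pool names printed depend on the JVM, its version, and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

// Sketch: list the memory pools of the running JVM by type.
final class MemoryPoolListingSketch {

    /** Counts the pools of the given type reported by the running JVM. */
    static long countPools(MemoryType type) {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(pool -> pool.getType() == type)
                .count();
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!pool.isValid()) {
                continue; // getUsage() returns null for invalid pools
            }
            System.out.println(pool.getType() + "  " + pool.getName()
                    + "  used=" + pool.getUsage().getUsed() + " bytes");
        }
    }
}
```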

Runtime Constant Pool


A runtime constant pool is a per-class or per-interface runtime representation of the
constant_pool table in a class file. Each runtime constant pool is allocated from the Java
virtual machine's method area.

Java Stacks or Frames


Java stacks are private to a thread. Every thread has its own program counter (PC) and
Java stack. The Java stack stores frames, which hold intermediate values, dynamic linking
information, method return values and exception dispatch data. It is used in place of registers.

Memory Generations
The HotSpot VM's garbage collector uses generational garbage collection. It separates the
JVM's memory into two parts, called the young generation and the old generation.
Young Generation

Young generation memory consists of two parts, Eden space and survivor space. Short-lived
objects live in Eden space, and every object starts its life there. When GC happens, an object
that is still alive is moved to survivor space, and the other, dereferenced objects are
removed.
Old Generation Tenured and PermGen

Old generation memory has two parts, the tenured generation and the permanent generation
(PermGen). PermGen is a familiar term: we are used to seeing errors reporting that PermGen
space is not sufficient. GC moves live objects from survivor space to the tenured generation.
The permanent generation contains metadata of the virtual machine, such as class and method
objects.

Discussion:
The Java specification doesn't give hard and fast rules about the design of the JVM heap data
area. It is left to JVM implementers, who can decide on things like whether to allocate a fixed
or a dynamic memory size.

Key Takeaways

Local Variables are stored in Frames during runtime.


Static Variables are stored in Method Area.
Arrays are stored in heap memory.

References:

MemoryPoolMXBean provides an API to explore memory usage, threshold notifications,
peak memory usage and memory usage monitoring.
Java Docs API for JConsole
The Threads and Locks chapter of the Java Language Specification says a lot about Java memory.

JBoss Performance Tuning part 1


Performance tuning is not a silver bullet. Simply put, good system performance depends on
good design, good implementation, defined performance objectives, and performance tuning.
Since JBoss performance tuning also involves tuning the environment in which JBoss runs,
this first tutorial starts with the JVM and OS settings under which JBoss produces the best
results. Then we'll see some specific JBoss config settings.

This tutorial is updated to release 4.x of the application server.
JBoss tuning tip 1: Tune the garbage collector

One strength of the J2SE platform is that it shields the developer from the complexity of
memory allocation. However, once garbage collection is the principal bottleneck, it is worth
understanding some aspects of this hidden implementation.
An object is considered garbage when it can no longer be reached from any pointer in the
running program. The most straightforward garbage collection algorithms simply iterate over
every reachable object. Any objects left over are then considered garbage. The time this
approach takes is proportional to the number of live objects.

The complete address space reserved for object memory can be divided into the young and
tenured generations.

The young generation consists of eden and two survivor spaces. Most objects are initially
allocated in eden. One survivor space is empty at any time, and serves as the destination of
any live objects in eden and the other survivor space during the next copying collection.
Objects are copied between survivor spaces in this way until they are old enough to be
tenured (copied to the tenured generation).
A third generation closely related to the tenured generation is the permanent generation,
which holds data needed by the virtual machine to describe objects that do not have an
equivalent at the Java language level. For example, objects describing classes and methods
are stored in the permanent generation.

The command line option -verbose:gc causes information about the heap and garbage
collection to be printed at each collection. For example, here is output from a large server
application:

It has been demonstrated that an application that spends 10% of its time in garbage
collection can lose 75% of its throughput when scaled out to 32 processors
(http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html)
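To get a rough idea of how much time a running application spends in garbage collection, the standard management beans can be queried from inside the JVM. This is a sketch, not a profiler: the collection times reported are approximate, for concurrent collectors they may not equal time during which the application was stopped, and the class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: estimate the share of JVM uptime spent in garbage collection.
final class GcTimeShareSketch {

    /** Fraction of elapsed time spent in GC, in the range 0.0 to 1.0. */
    static double gcFraction(long gcMillis, long uptimeMillis) {
        return uptimeMillis > 0 ? (double) gcMillis / uptimeMillis : 0.0;
    }

    /** Sums the approximate accumulated collection time over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC share of uptime: %.2f%%%n",
                100.0 * gcFraction(totalGcMillis(), uptime));
    }
}
```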
JBoss tuning tip 2: Set -Xms and -Xmx to the same value

By default, the virtual machine grows or shrinks the heap at each collection to try to keep the
proportion of free space to live objects at each collection within a specific range.
Setting -Xms and -Xmx to the same value increases predictability by removing the most
important sizing decision from the virtual machine.
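A quick way to check whether the heap can still be resized at run time is to compare the committed heap with the maximum heap. This is only a sketch: the class name HeapSettings and the canGrow helper are illustrative, and on some collectors maxMemory() reports slightly less than the -Xmx value, so treat the numbers as approximate.

```java
class HeapSettings {
    static final long MB = 1024L * 1024L;

    /** True when the JVM may still grow the heap, i.e. committed size is below the maximum. */
    static boolean canGrow(long committedBytes, long maxBytes) {
        return committedBytes < maxBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory(); // heap currently reserved from the OS
        long max = rt.maxMemory();         // upper bound the heap may grow to

        System.out.println("Committed heap (MB): " + committed / MB);
        System.out.println("Maximum heap (MB):   " + max / MB);
        System.out.println("Free in committed heap (MB): " + rt.freeMemory() / MB);
        System.out.println("Heap may still be resized: " + canGrow(committed, max));
    }
}
```

Run it once with only -Xmx set and once with -Xms and -Xmx set to the same value to compare the two situations.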
JBoss tuning tip 3: Use server VM

The server JVM is better suited to longer running applications. To enable it, simply set the -server option on the command line.
JBoss tuning tip 4: Turn off distributed gc

The RMI system provides a reference counting distributed garbage collection algorithm. This
system works by having the server keep track of which clients have requested access to
remote objects running on the server. When a reference is made, the server marks the object
as "dirty", and when a client drops the reference, it is marked as "clean". However, this
system is quite expensive and by default runs every minute.
Set it to run every 30 minutes at least:
-Dsun.rmi.dgc.client.gcInterval=1800000
-Dsun.rmi.dgc.server.gcInterval=1800000
JBoss tuning tip 6: Turn on parallel gc

If you have multiple processors you can do your garbage collection with multiple threads. By
default the parallel collector runs one collection thread per processor; that is, if you have
an 8 processor box then you'll garbage collect your data with 8 threads. In order to turn on the
parallel collector use the flag -XX:+UseParallelGC. You can also specify how many threads
you want to dedicate to garbage collection using the flag -XX:ParallelGCThreads=8.

JBoss tuning tip 7: Don't use huge heaps, use a cluster

More JVMs with smaller heaps can outperform fewer JVMs with larger heaps. So instead of huge
heaps, use additional server nodes. Set up a JBoss cluster and balance work between nodes.
JBoss tuning tip 8: Don't choose a heap larger than 70% of your OS memory

Choose a maximum heap size of not more than 70% of the memory to avoid excessive page
faults and thrashing.


JBoss tuning tip 9: Tune the Heap ratio

This is one of the most important tuning factors: the heap ratio. The heap ratio specifies how
the total heap will be partitioned between the young and the tenured space. What
happens if you have lots of long lived data (cached data, collections)? Maybe you're in this
situation:

The problem here is that the long lived data overflows the tenured generation. When a
collection was needed, the tenured generation was basically full of live data. Much of the young
generation was also filled with long lived data. The result was that a minor collection could
not be done successfully (there wasn't enough room in the tenured generation for the
anticipated promotions out of the young generation), so a major collection was done.
The major collection worked fine, but the result again was that the tenured generation was
full of long lived data and there was long lived data in the young generation. There was also
free space in the young generation for more allocations, but the next collection was again
destined to be a major collection.

This will eventually slow your application to a crawl!


By decreasing the space in the young generation and putting that space into the tenured
generation (a value of NewRatio larger than the default value was chosen), there was enough
room in the tenured generation to hold all the long lived data and also space to support minor
collections. This particular application used lots of short lived objects so after the fix mostly
minor collections were done.
NewRatio is a flag that specifies how the total heap is partitioned between the young and
tenured generations. Its value is tenured-generation-size / young-generation-size. For example,
setting -XX:NewRatio=3 means that the ratio between the young and tenured generation is
1:3, so the young generation takes one quarter of the heap.
If you want more precise control over the young generation: NewSize is the initial size of
the young generation, and MaxNewSize specifies the maximum size of the young generation.
What are the recommended heap ratios? Set the tenured generation to be approximately
two times the size of the young generation. With 2GB of RAM, the recommended sizes
are 1200MB for the heap and 400MB for the young generation.
This recommendation is only a starting point; you have to tune from there, and to do that you
have to gather and analyze the garbage collection statistics.
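The arithmetic behind NewRatio is easy to get backwards, so here is a small sketch that works it out. The class and method names are illustrative only, and a real JVM rounds generation sizes to its own alignment, so the actual sizes can differ slightly from these figures.

```java
class HeapRatio {
    /** Young generation size for a total heap, where newRatio = tenured / young. */
    static long youngSize(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    /** Tenured generation size: whatever the young generation does not take. */
    static long tenuredSize(long heap, int newRatio) {
        return heap - youngSize(heap, newRatio);
    }

    public static void main(String[] args) {
        long heapMb = 1200; // the heap size recommended in the text for 2GB of RAM
        for (int ratio : new int[] {2, 3}) {
            System.out.println("-XX:NewRatio=" + ratio
                    + " -> young=" + youngSize(heapMb, ratio) + "MB"
                    + ", tenured=" + tenuredSize(heapMb, ratio) + "MB");
        }
    }
}
```

With a 1200MB heap, a NewRatio of 2 gives the 400MB young generation recommended above, while a NewRatio of 3 shrinks the young generation to 300MB.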
JBoss tuning tip 10: Monitor the free memory with monitors and snapshots

See these tips:


How to monitor jboss graphically ?
How to monitor JBoss with snapshots?

JBoss tuning tip 11: Tune the Operating System

Each operating system sets default tuning parameters differently. For Windows platforms, the
default settings are usually sufficient. However, the UNIX and Linux operating systems
usually need to be tuned appropriately.
Solaris tuning parameters:

Check the following TCP parameters with your sysadmin:


/dev/tcp tcp_time_wait_interval
/dev/tcp tcp_conn_req_max_q
/dev/tcp tcp_conn_req_max_q0
/dev/tcp tcp_ip_abort_interval
/dev/tcp tcp_keepalive_interval
/dev/tcp tcp_rexmit_interval_initial
/dev/tcp tcp_rexmit_interval_max
/dev/tcp tcp_rexmit_interval_min
/dev/tcp tcp_smallest_anon_port
/dev/tcp tcp_xmit_hiwat
/dev/tcp tcp_recv_hiwat
/dev/ce instance
/dev/ce rx_intr_time
Tip: Use the netstat -s -P tcp command to view all available TCP parameters.
Set TCP-related tuning parameters using the ndd command
Example: ndd -set /dev/tcp tcp_conn_req_max_q 16384
Tune the /etc/system file
Each socket connection to JBoss consumes a file descriptor. To optimize socket performance,
you may need to configure your operating system to have the appropriate number of file
descriptors.
See the Solaris documentation about these parameters:
set rlim_fd_cur
set rlim_fd_max
set tcp:tcp_conn_hash_size (Solaris 8 and 9)
set ip:ipcl_conn_hash_size (Solaris 10)
set shmsys:shminfo_shmmax (Note: this should only be set for machines that have at least
4 GB of RAM.)
set autoup
set tune_t_fsflushr
Linux tuning parameters:

Since in Linux everything is a file, check the file-max parameter:

cat /proc/sys/fs/file-max
Add fs.file-max=102642 to /etc/sysctl.conf
Raise ulimit with /etc/security/limits.conf (or ulimit -n for the current session)

Increase default socket send/receive buffer


sysctl -w net.core.rmem_default=262144
(default socket receive buffer)
sysctl -w net.core.wmem_default=262144
(default socket send buffer)
sysctl -w net.core.rmem_max=262144
(max socket receive buffer)
sysctl -w net.core.wmem_max=262144
(max socket send buffer size)

Optimize MTU. The default maximum transmission unit (MTU) on Ethernet is 1500 bytes. If you
are sending larger packets on a network where every device supports jumbo frames, it's a good
idea to increase the MTU size in order to reduce packet fragmentation (especially if you have
a slow network):
vi /etc/sysconfig/network-scripts/ifcfg-xxx (eth0 for instance)
add "MTU=9000" (for gigabit ethernet)
restart the interface (ifdown eth0;ifup eth0)
Use Big Memory Pages
Default page size is 4KB (usually too small!)
Check page size with:
$ cat /proc/meminfo
If you see "HugePages_Total", "HugePages_Free" and "Hugepagesize", you can apply this
optimization.
Here's how to do it (2GB heap size example):
$ echo 2147483647 > /proc/sys/kernel/shmmax
$ echo 1000 > /proc/sys/vm/nr_hugepages
In Sun's JVM, add this flag: -XX:+UseLargePages

JBoss tuning tip 12: Lots of requests? Check the JBoss thread pool

The JBoss thread pool is defined in conf/jboss-service.xml:

<mbean code="org.jboss.util.threadpool.BasicThreadPool"
       name="jboss.system:service=ThreadPool">
    <attribute name="Name">JBoss System Threads</attribute>
    <attribute name="ThreadGroupName">System Threads</attribute>
    <attribute name="KeepAliveTime">60000</attribute>
    <attribute name="MaximumPoolSize">10</attribute>
    <attribute name="MaximumQueueSize">1000</attribute>
    <attribute name="BlockingMode">run</attribute>
</mbean>

For most applications these defaults will work well. However, if you are running an
application that issues lots of requests to JBoss (such as EJB invocations), then monitor your
thread pool. Open the Web Console and look for the MBean
jboss.system:service=ThreadPool.

Start a monitor on the QueueSize parameter. Does QueueSize reach MaximumPoolSize? Then you
probably need to set a higher MaximumPoolSize attribute.
Watch out! Speak first with your sysadmin and ensure that the CPU capacity supports the
increase in threads.
Watch out! If your threads make use of JDBC connections, you'll probably also need to increase
the JDBC connection pool accordingly. Also verify that your HTTP connector is configured
to handle that number of requests.
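To get a feel for how a bounded pool behaves once all of its threads are busy, here is a sketch that uses the JDK's own java.util.concurrent.ThreadPoolExecutor as a stand-in. It is not the JBoss BasicThreadPool and does not read the JBoss MBean; the class name QueueWatch and its method are made up for this illustration. The pool and queue limits mirror the MaximumPoolSize and MaximumQueueSize values from the configuration above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class QueueWatch {
    /** Submits slow tasks and returns how many of them had to wait in the queue. */
    static int queuedAfterSubmitting(int tasks, int maxPoolSize, int maxQueueSize) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxPoolSize, maxPoolSize, 60_000, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(maxQueueSize));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                // Throws RejectedExecutionException if both pool and queue are full.
                pool.execute(() -> {
                    try {
                        release.await(); // simulate a request that takes a long time
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getQueue().size();
        } finally {
            release.countDown(); // let the simulated requests finish
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        // 15 slow requests against 10 threads: the extra requests wait in the queue.
        System.out.println("Queued requests: " + queuedAfterSubmitting(15, 10, 1000));
    }
}
```

A queue that keeps growing under normal load is the signal described above: the pool has fewer threads than the workload needs.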
JBoss tuning tip 13: Check the Embedded web container

JBoss supports connectors for http, https, and ajp. The configuration file is server.xml and it's
deployed in the root of the JBoss web container (in JBoss 4.2.0
it's "JBOSS_HOME\server\default\deploy\jboss-web.deployer"):

<Connector port="8080" address="${jboss.bind.address}"
           maxThreads="250" maxHttpHeaderSize="8192"
           emptySessionPath="true" protocol="HTTP/1.1"
           enableLookups="false" redirectPort="8443" acceptCount="100"
           connectionTimeout="20000" disableUploadTimeout="true" />

The underlying HTTP connector of JBoss needs to be fine tuned for production settings. The
important parameters are:

maxThreads - This indicates the maximum number of threads to be allocated for handling
client HTTP requests. This figure corresponds to the number of concurrent users that are going
to access the application. Depending on the machine configuration, there is a physical limit
beyond which you will need to do clustering.
acceptCount - This is the number of requests that are put in the request queue when all
available threads are in use. When this number is exceeded, client machines get a request
timeout response.
compression - If you set this attribute to force, the content will be compressed by JBoss
and sent to the browser. The browser will extract it and display the page on screen. Enabling
compression can substantially reduce the bandwidth requirements of your application.
So how do you know if it's necessary to raise your maxThreads number? Again, open the
web console and look for the MBean jboss.web:name=http-127.0.0.1-8080,type=ThreadPool.
The key attribute is currentThreadsBusy. If it's about 70-80% of maxThreads, you
should consider raising the number of maxThreads.

Watch out! If you increase the maxThreads count, you need to raise your JBoss thread pool
accordingly.
JBoss tuning tip 14: Turn off JSP Compilation in production

The JBoss application server regularly checks whether a JSP requires compilation to a servlet
before executing it. On a production server, JSP files won't change, and hence you can
configure the settings for increased performance.
Open the web.xml in the deploy/jboss-web.deployer/conf folder. Look for the jsp servlet in the
file and modify the following XML fragment as given below:

<init-param>
    <param-name>development</param-name>
    <param-value>false</param-value>
</init-param>
<init-param>
    <param-name>checkInterval</param-name>
    <param-value>300</param-value>
</init-param>

References:
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
http://people.redhat.com/alikins/system_tuning.html
http://community.jboss.org/wiki/JBossASTuningSliming
What are the benefits of knowing how garbage collection (GC) works in Java? Satisfying your
intellectual curiosity as a software engineer would be a valid reason, but understanding
how GC works can also help you write much better Java applications.
This is a very personal and subjective opinion of mine, but I believe that a person well versed
in GC tends to be a better Java developer. If you are interested in the GC process, that means
you have experience in developing applications of a certain size. If you have thought carefully
about choosing the right GC algorithm, that means you completely understand the features of
the application you have developed. Of course, this may not be a common standard for a good
developer. However, few would object when I say that understanding GC is a requirement for
being a great Java developer.
This is the first of a series of "Become a Java GC Expert" articles. I will cover the GC
introduction this time, and in the next article, I will talk about analyzing GC status and GC
tuning examples from NHN.
The purpose of this article is to introduce GC to you in an easy way. I hope this article proves
to be very helpful. Actually, my colleagues have already published a few great articles on
Java Internals which became quite popular on Twitter. You may refer to them as well.
Returning to garbage collection, there is a term that you should know before learning
about GC. The term is "stop-the-world." Stop-the-world will occur no matter which GC
algorithm you choose. Stop-the-world means that the JVM is stopping the application from
running to execute a GC. When stop-the-world occurs, every thread except for the threads
needed for the GC will stop their tasks. The interrupted tasks will resume only after the GC
task has completed. GC tuning often means reducing this stop-the-world time.

Generational Garbage Collection

In Java, program code does not explicitly allocate memory and then free it. Some people
set the relevant object to null or call the System.gc() method to remove the memory explicitly.
Setting it to null is not a big deal, but calling the System.gc() method will affect the system
performance drastically, and must not be carried out. (Thankfully, I have not yet seen any
developer in NHN calling this method.)
In Java, as the developer does not explicitly remove the memory in the program code, the
garbage collector finds the unnecessary (garbage) objects and removes them. This garbage
collector was created based on the following two hypotheses. (It is more correct to call them
suppositions or preconditions, rather than hypotheses.)

Most objects soon become unreachable.


References from old objects to young objects only exist in small numbers.

These hypotheses are called the weak generational hypothesis. In order to preserve the
strengths of this hypothesis, the heap in HotSpot VM is physically divided into two: the young
generation and the old generation.
Young generation: Most of the newly created objects are located here. Since most objects
soon become unreachable, many objects are created in the young generation, then disappear.
When objects disappear from this area, we say a "minor GC" has occurred.
Old generation: The objects that did not become unreachable and survived from the young
generation are copied here. It is generally larger than the young generation. As it is bigger in
size, the GC occurs less frequently than in the young generation. When objects disappear
from the old generation, we say a "major GC" (or a "full GC") has occurred.
Let's look at this in a chart.

Figure 1: GC Area & Data Flow.

The permanent generation from the chart above is also called the "method area," and it
stores classes or interned character strings. So, this area is definitely not for objects that
survived from the old generation to stay permanently. A GC may occur in this area. A GC
that takes place here is still counted as a major GC.
Some people may wonder:
What if an object in the old generation needs to reference an object in the young
generation?
To handle these cases, there is something called a "card table" for the old generation,
in which each entry covers a 512-byte chunk of memory. Whenever an object in the old generation
references an object in the young generation, it is recorded in this table. When a GC is
executed for the young generation, only this card table is searched to find references into the
young generation, instead of checking the references of all the objects in the old generation.
This card table is managed with a write barrier. The write barrier is a device that allows
faster minor GCs. Though a bit of overhead occurs because of this, the overall GC time is
reduced.

Figure 2: Card Table Structure.
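
The idea of the card table can be sketched in a few lines of Java. This is a simplified model, not HotSpot's implementation: the class CardTable, its methods, and the use of plain long values as addresses are all assumptions made for illustration. What it shows is the mapping from an address to the card covering its 512-byte chunk, and how a minor GC only has to look at the dirty cards.

```java
class CardTable {
    static final int CARD_SHIFT = 9; // 2^9 = 512 bytes covered by each card

    private final long oldGenBase; // hypothetical start address of the old generation
    private final byte[] cards;    // one entry per 512-byte chunk

    CardTable(long oldGenBase, long oldGenSize) {
        this.oldGenBase = oldGenBase;
        this.cards = new byte[(int) ((oldGenSize + 511) >>> CARD_SHIFT)];
    }

    /** Index of the card that covers the given old generation address. */
    int cardIndex(long address) {
        return (int) ((address - oldGenBase) >>> CARD_SHIFT);
    }

    /** Write barrier: runs when a reference field at fieldAddress is updated. */
    void writeBarrier(long fieldAddress) {
        cards[cardIndex(fieldAddress)] = 1; // mark the card dirty
    }

    boolean isDirty(int card) {
        return cards[card] != 0;
    }

    /** What a minor GC does: visit only the dirty cards, then reset them. */
    int scanAndClearDirtyCards() {
        int dirty = 0;
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] != 0) {
                dirty++;      // a real collector would scan this 512-byte chunk here
                cards[i] = 0;
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTable table = new CardTable(0, 4096); // 4KB old generation -> 8 cards
        table.writeBarrier(1030);                 // falls into card 2 (bytes 1024 to 1535)
        System.out.println("Card 2 dirty: " + table.isDirty(2));
        System.out.println("Cards to scan: " + table.scanAndClearDirtyCards());
    }
}
```

The trade-off mentioned above is visible here: every reference store pays for one extra array write, but the minor GC scans a handful of cards instead of the entire old generation.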

Composition of the Young Generation


In order to understand GC, let's learn about the young generation, where the objects are
created for the first time. The young generation is divided into 3 spaces.

One Eden space


Two Survivor spaces

There are 3 spaces in total, two of which are Survivor spaces. Objects move through these
spaces in the following order:
1. The majority of newly created objects are located in the Eden space.
2. After one GC in the Eden space, the surviving objects are moved to one of the Survivor
spaces.
3. After a GC in the Eden space, the objects are piled up into the Survivor space, where other
surviving objects already exist.
4. Once a Survivor space is full, surviving objects are moved to the other Survivor space. Then,
the Survivor space that is full will be changed to a state where there is no data at all.
5. The objects that survive after these steps have been repeated a number of times are moved
to the old generation.

As you can see by checking these steps, one of the Survivor spaces must remain empty. If
data exists in both Survivor spaces, or the usage is 0 for both spaces, then take that as a sign
that something is wrong with your system.
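The flow of objects through the young generation can be modeled with a toy simulation. This is a sketch under simplifying assumptions, not how HotSpot is implemented: spaces are plain lists with no size limit, every minor GC copies the live objects from Eden and the occupied Survivor space into the empty Survivor space, and the promotion age is a fixed, made-up constant (TENURING_THRESHOLD). All class and field names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class YoungGenSim {
    static final int TENURING_THRESHOLD = 3; // hypothetical; the real JVM adapts this value

    static final class Obj {
        final String name;
        boolean live = true; // whether the application still references this object
        int age = 0;         // number of minor GCs survived so far

        Obj(String name) {
            this.name = name;
        }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // Survivor space currently holding objects
    List<Obj> to = new ArrayList<>();   // Survivor space kept empty between GCs
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o); // new objects start in Eden
        return o;
    }

    void minorGc() {
        copyLive(eden);
        copyLive(from);
        eden.clear();
        from.clear();
        List<Obj> tmp = from; // swap roles: the space just filled becomes "from"
        from = to;
        to = tmp;
    }

    private void copyLive(List<Obj> space) {
        for (Obj o : space) {
            if (!o.live) {
                continue; // garbage is simply not copied
            }
            o.age++;
            if (o.age >= TENURING_THRESHOLD) {
                old.add(o); // survived long enough: promote to the old generation
            } else {
                to.add(o);
            }
        }
    }

    public static void main(String[] args) {
        YoungGenSim heap = new YoungGenSim();
        heap.allocate("cache entry");
        heap.allocate("temporary buffer").live = false;
        for (int gc = 1; gc <= 3; gc++) {
            heap.minorGc();
            System.out.println("after GC " + gc + ": eden=" + heap.eden.size()
                    + " survivor(from)=" + heap.from.size()
                    + " survivor(to)=" + heap.to.size()
                    + " old=" + heap.old.size());
        }
    }
}
```

After every simulated minor GC one of the two Survivor lists is empty, which is the invariant described above.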
The process of data piling up into the old generation through minor GCs can be shown as in
the below chart:

Figure 3: Before & After a GC.


Note that in HotSpot VM, two techniques are used for faster memory allocations. One is
called "bump-the-pointer," and the other is called "TLABs (Thread-Local Allocation
Buffers)."
The bump-the-pointer technique tracks the last object allocated in the Eden space. That object
is located at the top of the Eden space. When an object is created afterwards, the VM
checks only whether the size of the object fits in the remaining Eden space. If it does,
the new object is placed in the Eden space and becomes the new top. So, when new
objects are created, only the most recently added object needs to be checked, which allows much
faster memory allocations. However, it is a different story if we consider a multithreaded
environment. To allocate objects from multiple threads in the Eden space in a thread-safe way,
a lock is unavoidable, and performance will drop due to lock contention. TLABs
are the solution to this problem in HotSpot VM. Each thread is given a small
portion of the Eden space as its own share. As each thread can only access
its own TLAB, the bump-the-pointer technique can allocate memory
without a lock.
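Both techniques fit in a short sketch. This is a model of the idea only: the classes Eden and Tlab, the integer offsets standing in for addresses, and the sizes used in main are assumptions for illustration, and HotSpot's real implementation lives in native code. The shared Eden hands out whole buffers with an atomic compare-and-set, and each buffer is then used by a single thread with a plain pointer bump.

```java
import java.util.concurrent.atomic.AtomicInteger;

class BumpAllocation {
    /** A buffer owned by one thread: allocation is a bounds check plus an addition. */
    static final class Tlab {
        private int top;       // offset where the next object will be placed
        private final int end; // first offset past this buffer

        Tlab(int start, int end) {
            this.top = start;
            this.end = end;
        }

        /** Returns the offset of the new object, or -1 if it does not fit. */
        int allocate(int size) {
            if (size > end - top) {
                return -1; // caller would request a fresh TLAB from Eden
            }
            int address = top;
            top += size; // bump the pointer
            return address;
        }
    }

    /** Shared Eden: threads synchronize here only when they need a new TLAB. */
    static final class Eden {
        private final AtomicInteger top = new AtomicInteger(0);
        private final int capacity;

        Eden(int capacity) {
            this.capacity = capacity;
        }

        /** Carves a TLAB out of Eden, or returns null when Eden is exhausted. */
        Tlab newTlab(int tlabSize) {
            while (true) {
                int start = top.get();
                if (tlabSize > capacity - start) {
                    return null; // in a real VM a minor GC would run at this point
                }
                if (top.compareAndSet(start, start + tlabSize)) {
                    return new Tlab(start, start + tlabSize);
                }
            }
        }
    }

    public static void main(String[] args) {
        Eden eden = new Eden(1024);
        Tlab mine = eden.newTlab(256);
        System.out.println("first object at offset " + mine.allocate(16));
        System.out.println("second object at offset " + mine.allocate(8));
    }
}
```

The design point is that the contended operation (newTlab) happens once per buffer, while the frequent operation (allocate) touches only thread-private state.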
This has been a quick overview of the GC in the young generation. You do not necessarily
have to remember the two techniques that I have just mentioned. You will not go to jail for
not knowing them. But please remember that objects are first created in the Eden
space, and that long-surviving objects are moved to the old generation through the Survivor
spaces.

GC for the Old Generation


The old generation basically performs a GC when the data is full. The execution procedure
varies by the GC type, so it would be easier to understand if you know the different types of GC.
As of JDK 7, there are 5 GC types.
1. Serial GC
2. Parallel GC
3. Parallel Old GC (Parallel Compacting GC)
4. Concurrent Mark & Sweep GC (or "CMS")
5. Garbage First (G1) GC

Among these, the serial GC must not be used on a production server. This GC type was
created when there was only one CPU core on desktop computers. Using the serial GC will
drop the application performance significantly.
Now let's learn about each GC type.
Serial GC (-XX:+UseSerialGC)

The GC in the young generation uses the type we explained in the previous paragraph. The
GC in the old generation uses an algorithm called "mark-sweep-compact."
1. The first step of this algorithm is to mark the surviving objects in the old generation.
2. Then, it checks the heap from the front and leaves only the surviving ones behind (sweep).
3. In the last step, it fills up the heap from the front with the objects so that the objects are
piled up consecutively, and divides the heap into two parts: one with objects and one
without objects (compact).

The serial GC is suitable for a small memory and a small number of CPU cores.
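
The three steps can be illustrated on a toy heap. This is a teaching sketch, not the HotSpot algorithm: the heap is a Java array of slots, references between objects are slot indexes, and the class MarkSweepCompact with its Obj type and collect method are names invented for this example. It assumes every root and every reference points at an occupied slot.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class MarkSweepCompact {
    static final class Obj {
        final String name;
        final int[] refs; // slot indexes of the objects this object references
        boolean marked;

        Obj(String name, int... refs) {
            this.name = name;
            this.refs = refs;
        }
    }

    /** Collects the heap in place, rewrites roots, and returns the survivor count. */
    static int collect(Obj[] heap, int[] roots) {
        // 1. Mark: flag everything reachable from the roots.
        Deque<Integer> pending = new ArrayDeque<>();
        for (int root : roots) {
            pending.push(root);
        }
        while (!pending.isEmpty()) {
            Obj o = heap[pending.pop()];
            if (o == null || o.marked) {
                continue;
            }
            o.marked = true;
            for (int ref : o.refs) {
                pending.push(ref);
            }
        }

        // 2. Sweep: walk the heap from the front and drop unmarked objects.
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !heap[i].marked) {
                heap[i] = null;
            }
        }

        // 3. Compact: slide survivors to the front, remembering where each one went.
        int[] forward = new int[heap.length];
        Arrays.fill(forward, -1);
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                forward[i] = next;
                heap[next] = heap[i];
                if (next != i) {
                    heap[i] = null;
                }
                next++;
            }
        }

        // Rewrite references and roots to the new slot indexes; reset the mark bits.
        for (int i = 0; i < next; i++) {
            Obj o = heap[i];
            for (int j = 0; j < o.refs.length; j++) {
                o.refs[j] = forward[o.refs[j]];
            }
            o.marked = false;
        }
        for (int j = 0; j < roots.length; j++) {
            roots[j] = forward[roots[j]];
        }
        return next;
    }

    public static void main(String[] args) {
        // A references C; B and D are unreachable garbage.
        Obj[] heap = {new Obj("A", 2), new Obj("B"), new Obj("C"), new Obj("D", 1)};
        int[] roots = {0};
        int live = collect(heap, roots);
        System.out.println(live + " objects survive, free space starts at slot " + live);
    }
}
```

After the call, the survivors occupy the first slots and everything behind them is free, which is the "one part with objects and one part without" layout described in step 3.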

Parallel GC (-XX:+UseParallelGC)

Figure 4: Difference between the Serial GC and Parallel GC.


From the picture, you can easily see the difference between the serial GC and parallel GC.
While the serial GC uses only one thread to process a GC, the parallel GC uses several
threads to process a GC, and is therefore faster. This GC is useful when there is enough
memory and a large number of cores. It is also called the "throughput GC."
Parallel Old GC (-XX:+UseParallelOldGC)

Parallel Old GC has been supported since a JDK 5 update. Compared to the parallel GC, the only
difference is the GC algorithm for the old generation. It goes through three steps: mark,
summary, and compaction. The summary step identifies the surviving objects separately for the
areas where a GC has previously been performed, and is thus different from the sweep step of
the mark-sweep-compact algorithm. Its steps are a little more complicated.

CMS GC (-XX:+UseConcMarkSweepGC)

Figure 5: Serial GC & CMS GC.


As you can see from the picture, the Concurrent Mark-Sweep GC is much more complicated
than any of the other GC types that I have explained so far. The initial mark step is simple:
only the surviving objects closest to the class loader are searched, so the
pause time is very short. In the concurrent mark step, the objects referenced by the
surviving objects that have just been confirmed are tracked and checked. What distinguishes
this step is that it proceeds while other threads are running at the same time. In the remark
step, the objects that were newly added or stopped being referenced during the concurrent mark
step are checked. Lastly, in the concurrent sweep step, the garbage collection procedure takes
place. The garbage collection is carried out while other threads are still running.
Since this GC type is performed in this manner, the pause time for GC is very short. The
CMS GC is also called the low latency GC, and is used when the response time of all
applications is crucial.
While this GC type has the advantage of short stop-the-world time, it also has the following
disadvantages.

It uses more memory and CPU than other GC types.


The compaction step is not provided by default.

You need to review carefully before using this type. Also, if the compaction task needs to be
carried out because of heavy memory fragmentation, the stop-the-world time can be longer
than with any other GC type. You need to check how often and how long the compaction task is
carried out.
G1 GC

Finally, let's learn about the garbage first (G1) GC.

Figure 6: Layout of G1 GC.


If you want to understand G1 GC, forget everything you know about the young generation
and the old generation. As you can see in the picture, objects are allocated to one cell of
the grid, and then a GC is executed. Then, once that area is full, objects are allocated to
another area, and then a GC is executed. The steps where the data moves from the three spaces
of the young generation to the old generation cannot be found in this GC type. This type was
created to replace the CMS GC, which has caused a lot of issues and complaints in the long
term.
The biggest advantage of the G1 GC is its performance. It is faster than any of the other GC
types that we have discussed so far. But in JDK 6, it is an early access feature and can be
used only for testing. It is officially included in JDK 7. In my personal opinion, we need to go
through a long test period (at least 1 year) before NHN can use JDK 7 in actual services, so
you probably should wait a while. Also, I have heard a few times that a JVM crash occurred after
applying the G1 in JDK 6. Please wait until it is more stable.
I will talk about GC tuning in the next issue, but I would like to ask you one thing in
advance. If the size and the type of all objects created in the application were identical,
all the GC options for WAS used in our company could be the same. But the size and the lifespan
of the objects created by WAS vary depending on the service, and the type of equipment varies
as well. In other words, just because a certain service uses the GC option "A," it does not
mean that the same option will bring the best results for a different service. It is necessary to
find the best values for the WAS threads, the WAS instances for each type of equipment, and each
GC option by constant tuning and monitoring. This did not come from my personal experience,
but from the discussion of the engineers making Oracle JVM for JavaOne 2010.
In this issue, we have only glanced at the GC for Java. Please look forward to our next issue,
where I will talk about how to monitor the Java GC status and tune GC.
I would like to note that I referred to a new book released in December 2011 called "Java
Performance" (Amazon, it can also be viewed from safari online, if the company provides an
account), as well as Memory Management in the Java HotSpot(TM) Virtual Machine, a white
paper provided by the Oracle website. (The book is different from "Java Performance
Tuning.")
By Sangmin Lee, Senior Engineer at Performance Engineering Lab, NHN Corporation.
