programs
Find out when memory leaks are a concern and how to prevent them
Memory leaks in Java programs? Absolutely. Contrary to popular belief,
memory management is still a consideration in Java programming. In this
article, you'll learn what causes Java memory leaks and when these leaks
should be of concern. You'll also get a quick hands-on lesson for tackling
leaks in your own projects.
no longer being referenced are then eligible to be garbage collected. The memory resources
used by these objects can be returned to the Java virtual machine (JVM) when the objects
are deleted.
So it is true that Java code does not require the programmer to be responsible for memory
management cleanup, and that it automatically garbage collects unused objects. However,
the key point to remember is that an object is only counted as being unused when it is no
longer referenced. Figure 1 illustrates this concept.
Figure 1. Unused but still referenced
The figure illustrates two classes that have different lifetimes during the execution of a Java
application. Class A is instantiated first and exists for a long time or for the entire life of the
program. At some point, class B is created, and class A adds a reference to this newly
created class. Now let's suppose class B is some user interface widget that is displayed and
eventually dismissed by the user. Even though class B is no longer needed, if the reference
that class A has to class B is not cleared, class B will continue to exist and to take up memory
space even after the next garbage collection cycle is executed.
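The situation in Figure 1 can be sketched in a few lines of Java. This is a minimal illustration only, not code from the application discussed later; the names Widget, LongLivedOwner, and ReferencedButUnused are hypothetical. A WeakReference is used purely as a probe to observe whether the widget is still reachable.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for "class B": a short-lived UI widget.
class Widget {
    // A sizable payload so that a leaked instance is noticeable.
    final byte[] payload = new byte[1024 * 1024];
}

// Hypothetical stand-in for "class A": lives as long as the program.
class LongLivedOwner {
    private final List<Widget> widgets = new ArrayList<>();

    void add(Widget w)     { widgets.add(w); }
    void release(Widget w) { widgets.remove(w); }
    int held()             { return widgets.size(); }
}

class ReferencedButUnused {
    static final LongLivedOwner OWNER = new LongLivedOwner();

    // Creates a widget, registers it with the owner, and "dismisses" it by
    // dropping the local reference. Only the owner still refers to it.
    static WeakReference<Widget> showAndDismiss() {
        Widget w = new Widget();
        OWNER.add(w);
        return new WeakReference<>(w);
    }

    public static void main(String[] args) {
        WeakReference<Widget> probe = showAndDismiss();

        // System.gc() is only a request, but a strongly reachable object
        // is never collected, so the probe cannot have been cleared here.
        System.gc();
        System.out.println("reachable while owner holds it: " + (probe.get() != null));

        // The fix: clear the owner's reference when the widget is dismissed.
        Widget w = probe.get();
        OWNER.release(w);
        w = null;

        // The widget is now eligible for collection; whether it has been
        // reclaimed by this point is up to the JVM.
        System.gc();
        System.out.println("reachable after release: " + (probe.get() != null));
    }
}
```

The first line printed is always true, because the owner's list still refers to the widget. The second line depends on the collector, which is why the sketch makes no claim about it.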
collector -- even if a program explicitly calls System.gc(). Typically, the garbage collector
won't be automatically run until a program needs more memory than is currently available. At
this point, the JVM will first attempt to make more memory available by invoking the garbage
collector. If this attempt still doesn't free enough resources, then the JVM will obtain more
memory from the operating system until it finally reaches the maximum allowed.
Take, for example, a small Java application that displays some simple user interface
elements for configuration modifications and that has a memory leak. Chances are that the
garbage collector will not even be invoked before the application closes, because the JVM
will probably have plenty of memory to create all of the objects needed by the program with
leftover memory to spare. So, in this case, even though some dead objects are taking up
memory while the program is being executed, it really doesn't matter for all practical
purposes.
If the Java code being developed is meant to run on a server 24 hours a day, then memory
leaks become much more significant than in the case of our configuration utility. Even the
smallest leak in some code that is meant to be continuously run will eventually result in the
JVM exhausting all of the memory available.
And in the opposite case where a program is relatively short lived, memory limits can be
reached by any Java code that allocates a large number of temporary objects (or a handful
of objects that eat up large amounts of memory) that are not de-referenced when no longer
needed.
One last consideration is the case where the memory leak isn't a concern at all. Java memory leaks
should not be considered as dangerous as leaks that occur in other languages such as C++
where memory is lost and never returned to the operating system. In the case of Java
applications, we have unneeded objects clinging to memory resources that have been given
to the JVM by the operating system. So in theory, once the Java application and its JVM
have been closed, all allocated memory will be returned to the operating system.
memory as it runs. The program never seems to return any memory back to the system until
a very large amount of physical memory has been allocated to the application. Could these
situations be signs of a memory leak?
To understand what is going on, we need to familiarize ourselves with how the JVM uses
system memory for its heap. When running java.exe, you can use certain options to control
the startup and maximum size of the garbage-collected heap (-ms and -mx, respectively).
The Sun JDK 1.1.8 uses a default 1 MB startup setting and a 16 MB maximum setting. The
IBM JDK 1.1.8 uses a default maximum setting of one-half the total physical memory size of
the machine. These memory settings have a direct impact on what the JVM does when it
runs out of memory. The JVM may continue growing the heap rather than wait for a garbage
collection cycle to complete.
So for the purposes of finding and eventually eliminating a memory leak, we are going to
need better tools than task monitoring utility programs. Memory debugging programs
(see Resources) can come in handy when you're trying to detect memory leaks. These
programs typically give you information about the number of objects in the heap, the number
of instances of each object, and the memory being used by the objects. In addition, they
may also provide useful views showing each object's references and referrers so that you
can track down the source of a memory leak.
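Short of a full memory debugger, the heap figures discussed above can also be read from inside the program. The sketch below assumes a JDK recent enough to provide Runtime.maxMemory() (it is not available in the 1.1.8 JDKs mentioned earlier); the class name HeapReport is hypothetical. It reports totals only and cannot show which objects are holding the memory.

```java
class HeapReport {
    static final long MB = 1024L * 1024L;

    // Part of the current heap that is occupied by objects, whether
    // they are still needed or are garbage awaiting collection.
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("used  = " + usedBytes(rt) / MB + " MB");
        System.out.println("total = " + rt.totalMemory() / MB + " MB (current heap size)");
        System.out.println("max   = " + rt.maxMemory() / MB + " MB (upper limit for the heap)");
    }
}
```

Logging these values periodically shows whether "used" keeps climbing after each collection, which is the pattern a memory debugger would then help to explain.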
Next, I will show how I detected and removed a memory leak using the JProbe debugger
from Sitraka Software to give you some idea of how these tools can be deployed and the
process required to successfully remove a leak.
to be a tedious, iterative process that involved first determining the cause of a given memory
leak and then making code changes and verifying the results.
JProbe has several options to control what information is actually recorded during a
debugging session. After some experimentation, I decided that the most efficient way to get
the information I needed was to turn off the performance data collection and concentrate on
the captured heap data. JProbe provides a view called the Runtime Heap Summary that
shows the amount of heap memory in use over time as the Java application is running. It
also provides a toolbar button to force the JVM to perform garbage collection when desired.
This capability turned out to be very useful when trying to see if a given instance of a class
would be garbage collected when it was no longer needed by the Java application. Figure 2
shows the amount of heap storage that is in use over time.
Figure 2. Runtime Heap Summary
In the Heap Usage Chart, the blue portion indicates the amount of heap space that has been
allocated. After I started the Java program and it reached a stable point, I forced the garbage
collector to run, which is indicated by the sudden drop in the blue curve before the green line
(this line indicates a checkpoint was inserted). Next, I added, then deleted four forms and
again invoked the garbage collector. The fact that the level blue area after the checkpoint is
higher than the level blue area before the checkpoint tells us that a memory leak is likely, as
the program has returned to its initial state of only having a single visible form. I confirmed
the leak by looking at the Instance Summary, which indicates that the FormFrame class
(which is the main UI class for the forms) has increased in count by four after the checkpoint.
For this specific example, it turned out that the primary culprit was a font manager class that
contained a static hashtable. After tracing back through the list of referrers, I found that the
root node was a static hashtable that stored the fonts in use for each form. The various
forms could be zoomed in or out independently, so the hashtable contained a vector with all
of the fonts for a given form. When the zoom view of the form was changed, the vector of
fonts was fetched and the appropriate zoom factor was applied to the font sizes.
The problem with this font manager class was that while the code put the font vector into the
hashtable when the form was created, no provision was ever made to remove the vector
when the form was deleted. Therefore, this static hashtable, which essentially existed for the
life of the application itself, was never removing the keys that referenced each form.
Consequently, the form and all of its associated classes were left dangling in memory.
Applying a fix
The simple solution to this problem was to add a method to the font manager class that
allowed the hashtable's remove() method to be called with the appropriate key when the
form was deleted by the user. The removeKeyFromHashtables() method is shown below:
public void removeKeyFromHashtables(GraphCanvas graph) {
    if (graph != null) {
        viewFontTable.remove(graph);
    }
}
Next, I added a call to this method to the FormFrame class. FormFrame uses Swing's internal
frames to actually implement the form UI, so the call to the font manager was added to the
method that is executed when an internal frame has completely closed, as shown here:
/**
 * Invoked when a FormFrame is disposed. Clean out references to prevent
 * memory leaks.
 */
public void internalFrameClosed(InternalFrameEvent e) {
    FontManager.get().removeKeyFromHashtables(canvas);
    canvas = null;
    setDesktopIcon(null);
}
After I made these code changes, I used the debugger to verify that the object count
associated with the deleted form decreased when the same test case was executed.
Another common problem occurs when you register a class as an event listener without
bothering to unregister when the class is no longer needed. Also, many times member
variables of a class that point to other classes simply need to be set to null at the appropriate
time.
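The listener problem described above can be sketched as follows. All names here (EventSource, View, dispose) are hypothetical and chosen for illustration; the sketch does not use Swing, but the same pairing of register and unregister calls applies to Swing listeners.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical long-lived event source, e.g. an application-wide model.
class EventSource {
    interface Listener { void onEvent(String event); }

    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener l)    { listeners.add(l); }
    void removeListener(Listener l) { listeners.remove(l); }
    int listenerCount()             { return listeners.size(); }

    void fire(String event) {
        // Iterate over a copy so listeners may unregister during dispatch.
        for (Listener l : new ArrayList<>(listeners)) {
            l.onEvent(event);
        }
    }
}

// Hypothetical short-lived view that listens to the source.
class View implements EventSource.Listener {
    private EventSource source;
    private String lastEvent;

    View(EventSource source) {
        this.source = source;
        source.addListener(this); // the source now holds a reference to this view
    }

    @Override
    public void onEvent(String event) { lastEvent = event; }

    String lastEvent() { return lastEvent; }

    // Call when the view is no longer needed: unregister the listener
    // and clear member references that point to other objects.
    void dispose() {
        if (source != null) {
            source.removeListener(this);
            source = null;
        }
    }
}

class ListenerCleanup {
    public static void main(String[] args) {
        EventSource source = new EventSource();
        View view = new View(source);
        source.fire("opened");
        System.out.println("listeners while open: " + source.listenerCount());
        view.dispose();
        System.out.println("listeners after dispose: " + source.listenerCount());
    }
}
```

Without the call to dispose(), the long-lived source would keep the view reachable for as long as the source itself exists.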
Conclusion
Finding the cause of a memory leak can be a tedious process, not to mention one that will
require special debugging tools. However, once you become familiar with the tools and the
patterns to look for in tracing object references, you will be able to track down memory leaks.
In addition, you'll gain some valuable skills that may not only save a programming project,
but also provide insight as to what coding practices to avoid to prevent memory leaks in
future projects.
all references are null. Cyclic references do not keep objects alive: if object A holds a reference to
object B, object B holds a reference to object A, and neither is reachable through any other live reference, then
both A and B are eligible for garbage collection.
Generally, an object becomes eligible for garbage collection in Java in the following cases:
1) All references to the object are explicitly set to null, e.g. object = null.
2) The object is created inside a block and its reference goes out of scope once control exits that block.
3) The parent object is set to null: if an object holds a reference to another object and the container object's
reference is set to null, the contained object automatically becomes eligible for garbage collection, provided
nothing else references it.
4) The object is reachable only through weak references, such as the keys of a WeakHashMap. To learn
more about HashMap, see How HashMap works in Java.
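The WeakHashMap case in the list above can be sketched as follows. The class and field names are hypothetical, and the example assumes the stored value does not refer back to its key (otherwise the key would stay strongly reachable through the map).

```java
import java.util.Map;
import java.util.WeakHashMap;

class WeakCacheSketch {
    // Keys are held weakly: an entry does not, by itself, keep its key alive.
    static final Map<Object, String> FONTS_BY_FORM = new WeakHashMap<>();

    public static void main(String[] args) {
        Object form = new Object();           // hypothetical stand-in for a form
        FONTS_BY_FORM.put(form, "Dialog-12"); // the value must not refer back to the key

        System.gc();
        // The form is still strongly referenced by the local variable,
        // so the entry cannot have been discarded.
        System.out.println("entry present: " + FONTS_BY_FORM.containsKey(form));

        form = null; // last strong reference dropped
        System.gc(); // the entry may now be discarded, at the collector's discretion
        System.out.println("entries left: " + FONTS_BY_FORM.size());
    }
}
```

The second printed value is deliberately left unspecified: once the key is only weakly reachable the entry is eligible for removal, but the JVM decides when that actually happens.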
full GC and I found that garbage collection tuning largely depends on the application profile: what kinds of objects
the application creates, what their average lifetimes are, and so on. For example, if an application has many
short-lived objects, making the Eden space larger will reduce the number of minor collections. You can also
control the sizes of the young and tenured generations using JVM parameters; for example, setting
-XX:NewRatio=3 means that the ratio between the young and tenured generations is 1:3. Be careful
when sizing these generations: for a fixed heap size, making the young generation larger reduces the size of the
tenured generation, which forces major collections to occur more frequently. Major collections pause application
threads for their duration, which reduces throughput. The parameters NewSize and MaxNewSize bound the
young generation size from below and above; setting them equal to one another fixes the young generation
size. In my opinion, a detailed understanding of garbage collection in Java is a must before doing any garbage
collection tuning, and I would recommend reading the garbage collection document provided by Sun Microsystems
for detailed knowledge of garbage collection in Java. For a full list of JVM parameters for a particular Java
virtual machine, refer to the official documentation on garbage collection in Java. I found this link quite helpful:
http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html
Note: For Java SE 8, see Java Platform, Standard Edition HotSpot Virtual Machine Garbage
Collection Tuning Guide.
Table of Contents
1. Introduction
2. Ergonomics
3. Generations
   o Performance Considerations
   o Measurement
4. Sizing the Generations
   o Total Heap
   o The Young Generation
        Survivor Space Sizing
5. Available Collectors
   o Selecting a Collector
6. The Parallel Collector
   o Generations
   o Ergonomics
        Priority of goals
        Generation Size Adjustments
        Default Heap Size
   o Excessive GC Time and OutOfMemoryError
   o Measurements
7. The Concurrent Collector
   o Overhead of Concurrency
   o Concurrent Mode Failure
   o Excessive GC Time and OutOfMemoryError
   o Floating Garbage
   o Pauses
   o Concurrent Phases
   o Starting a Concurrent Collection Cycle
   o Scheduling Pauses
   o Incremental Mode
        Command Line Options
        Recommended Options
        Basic Troubleshooting
   o Measurements
8. Other Considerations
9. Resources
1. Introduction
The Java Platform, Standard Edition (Java SE) is used for a wide variety of applications,
from small applets on desktops to web services on large servers. In support of this diverse
range of deployments, the Java HotSpot virtual machine implementation (Java HotSpot
VM) provides multiple garbage collectors, each designed to satisfy different requirements.
This is an important part of meeting the demands of both large and small applications.
However, users, developers and administrators that need high performance are burdened with
the extra step of selecting the garbage collector that best meets their needs. A significant step
toward removing this burden was made in J2SE 5.0: the garbage collector is selected based
on the class of the machine on which the application is run.
This better choice of the garbage collector is generally an improvement, but is by no means
always the best choice for every application. Users with strict performance goals or other
requirements may need to explicitly select the garbage collector and tune certain parameters
to achieve the desired level of performance. This document provides information to help with
those tasks. First, the general features of a garbage collector and basic tuning options are
described in the context of the serial, stop-the-world collector. Then specific features of the
other collectors are presented along with factors to consider when selecting a collector.
When does the choice of a garbage collector matter? For some applications, the answer is
never. That is, the application can perform well in the presence of garbage collection with
pauses of modest frequency and duration. However, this is not the case for a large class of
applications, particularly those with large amounts of data (multiple gigabytes), many threads
and high transaction rates.
Amdahl observed that most workloads cannot be perfectly parallelized; some portion is
always sequential and does not benefit from parallelism. This is also true for the Java
platform. In particular, virtual machines from Sun Microsystems for the Java platform prior
to J2SE 1.4 do not support parallel garbage collection, so the impact of garbage collection on
a multiprocessor system grows relative to an otherwise parallel application.
The graph below models an ideal system that is perfectly scalable with the exception of
garbage collection. The red line is an application spending only 1% of the time in garbage
collection on a uniprocessor system. This translates to more than a 20% loss in throughput on
32 processor systems. At 10% of the time in garbage collection (not considered an outrageous
amount of time in garbage collection in uniprocessor applications) more than 75% of
throughput is lost when scaling up to 32 processors.
This shows that negligible speed issues when developing on small systems may become
principal bottlenecks when scaling up to large systems. However, small improvements in
reducing such a bottleneck can produce large gains in performance. For a sufficiently large
system it becomes well worthwhile to select the right garbage collector and to tune it if
necessary.
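The figures quoted above for 32 processors can be reproduced with a short calculation. The sketch below assumes that the model behind the graph is Amdahl's law with garbage collection as the only sequential portion; the class and method names are hypothetical.

```java
class GcScaling {
    // Amdahl's law: speedup on n processors when a fraction `serial` of the
    // single-processor run time cannot be parallelized.
    static double speedup(double serial, int n) {
        return 1.0 / (serial + (1.0 - serial) / n);
    }

    // Fraction of the ideal (n-fold) throughput that is lost.
    static double throughputLoss(double serial, int n) {
        return 1.0 - speedup(serial, n) / n;
    }

    public static void main(String[] args) {
        for (double gcFraction : new double[] {0.01, 0.10}) {
            System.out.printf("GC fraction %.2f on 1 CPU: %.1f%% of throughput lost on 32 CPUs%n",
                    gcFraction, throughputLoss(gcFraction, 32) * 100);
        }
    }
}
```

Under this model the loss on 32 processors works out to roughly 23.7% for a 1% garbage collection fraction and roughly 75.6% for a 10% fraction, consistent with the "more than 20%" and "more than 75%" stated in the text.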
The serial collector is usually adequate for most "small" applications (those requiring heaps
of up to approximately 100MB on modern processors). The other collectors have additional
overhead and/or complexity which is the price for specialized behavior. If the application
doesn't need the specialized behavior of an alternate collector, use the serial collector. An
example of a situation where the serial collector is not expected to be the best choice is a
large application that is heavily threaded and run on a machine with a large amount of
memory and two or more processors. When applications are run on such server-class
machines, the parallel collector is selected by default (see Ergonomics below).
This document was developed using Java SE 6 on the Solaris Operating System (SPARC(R)
Platform Edition) as the reference. However, the concepts and recommendations presented
here apply to all supported platforms, including Linux, Microsoft Windows and the Solaris
Operating System (x86 Platform Edition). In addition, the command line options mentioned
are available on all supported platforms, although the default values of some options may be
different on each platform.
2. Ergonomics
A feature referred to here as ergonomics was introduced in J2SE 5.0. The goal of ergonomics
is to provide good performance with little or no tuning of command line options by selecting
the garbage collector, heap size, and runtime compiler at JVM startup, instead of using fixed
defaults. This selection assumes that the class of the
machine on which the application is run is a hint as to the characteristics of the application
(i.e., large applications run on large machines). In addition to these selections is a simplified
way of tuning garbage collection. With the parallel collector the user can specify goals for a
maximum pause time and a desired throughput for an application. This is in contrast to
specifying the size of the heap that is needed for good performance. This is intended to
particularly improve the performance of large applications that use large heaps. The more
general ergonomics is described in the document entitled Ergonomics in the 5.0 Java Virtual
Machine. It is recommended that the ergonomics as presented in this latter document be
tried before using the more detailed controls explained in this document.
Included in this document are the ergonomics features provided as part of the adaptive size
policy for the parallel collector. This includes the options to specify goals for the
performance of garbage collection and additional options to fine tune that performance.
3. Generations
One strength of the J2SE platform is that it shields the developer from the complexity of
memory allocation and garbage collection. However, once garbage collection is the principal
bottleneck, it is worth understanding some aspects of this hidden implementation. Garbage
collectors make assumptions about the way applications use objects, and these are reflected in
tunable parameters that can be adjusted for improved performance without sacrificing the
power of the abstraction.
An object is considered garbage when it can no longer be reached from any pointer in the
running program. The most straightforward garbage collection algorithms simply iterate over
every reachable object. Any objects left over are then considered garbage. The time this
approach takes is proportional to the number of live objects, which is prohibitive for large
applications maintaining lots of live data.
Beginning with the J2SE 1.2, the virtual machine incorporated a number of different garbage
collection algorithms that are combined using generational collection. While naive garbage
collection examines every live object in the heap, generational collection exploits several
empirically observed properties of most applications to minimize the work required to
reclaim unused ("garbage") objects. The most important of these observed properties is the
weak generational hypothesis, which states that most objects survive for only a short period
of time.
The blue area in the diagram below is a typical distribution for the lifetimes of objects. The X
axis is object lifetimes measured in bytes allocated. The byte count on the Y axis is the total
bytes in objects with the corresponding lifetime. The sharp peak at the left represents objects
that can be reclaimed (i.e., have "died") shortly after being allocated. Iterator objects, for
example, are often alive for the duration of a single loop.
Some objects do live longer, and so the distribution stretches out to the right. For
instance, there are typically some objects allocated at initialization that live until the process
exits. Between these two extremes are objects that live for the duration of some intermediate
computation, seen here as the lump to the right of the initial peak. Some applications have
very different looking distributions, but a surprisingly large number possess this general
shape. Efficient collection is made possible by focusing on the fact that a majority of objects
"die young."
To optimize for this scenario, memory is managed in generations, or memory pools holding
objects of different ages. Garbage collection occurs in each generation when the generation
fills up. The vast majority of objects are allocated in a pool dedicated to young objects (the
young generation), and most objects die there. When the young generation fills up it causes a
minor collection in which only the young generation is collected; garbage in other
generations is not reclaimed. Minor collections can be optimized assuming the weak
generational hypothesis holds and most objects in the young generation are garbage and can
be reclaimed. The costs of such collections are, to the first order, proportional to the number
of live objects being collected; a young generation full of dead objects is collected very
quickly. Typically some fraction of the surviving objects from the young generation are
moved to the tenured generation during each minor collection. Eventually, the tenured
generation will fill up and must be collected, resulting in a major collection, in which the
entire heap is collected. Major collections usually last much longer than minor collections
because a significantly larger number of objects are involved.
As noted above, ergonomics selects the garbage collector dynamically in order to provide
good performance on a variety of applications. The serial garbage collector is designed for
applications with small data sets and its default parameters were chosen to be effective for
most small applications. The throughput garbage collector is meant to be used with
applications that have medium to large data sets. The heap size parameters selected by
ergonomics plus the features of the adaptive size policy are meant to provide good
performance for server applications. These choices work well in most, but not all, cases.
Which leads to the central tenet of this document:
If garbage collection becomes a bottleneck, you will most likely have to
customize the total heap size as well as the sizes of the individual generations.
Check the verbose garbage collector output and then explore the sensitivity of
your individual performance metric to the garbage collector parameters.
The default arrangement of generations (for all collectors with the exception of the parallel
collector) looks something like this.
At initialization, a maximum address space is virtually reserved but not allocated to physical
memory unless it is needed. The complete address space reserved for object memory can be
divided into the young and tenured generations.
The young generation consists of eden and two survivor spaces. Most objects are initially
allocated in eden. One survivor space is empty at any time, and serves as the destination of
any live objects in eden and the other survivor space during the next copying collection.
Objects are copied between survivor spaces in this way until they are old enough to be
tenured (copied to the tenured generation).
A third generation closely related to the tenured generation is the permanent generation
which holds data needed by the virtual machine to describe objects that do not have an
equivalence at the Java language level. For example objects describing classes and methods
are stored in the permanent generation.
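The spaces described above can be listed at run time through the standard java.lang.management API. This is a minimal sketch with a hypothetical class name. The pool names it prints depend on the JVM version and the selected collector, and on Java 8 and later the permanent generation no longer exists (class metadata is held in Metaspace instead).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class PoolReport {
    // Names of the pools that make up the garbage-collected heap.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            System.out.printf("%-32s %-16s used=%dK committed=%dK%n",
                    pool.getName(), pool.getType(),
                    usage.getUsed() / 1024, usage.getCommitted() / 1024);
        }
    }
}
```

With the serial collector, the heap pools typically correspond to eden, a survivor space, and the tenured generation, although the exact names are an implementation detail.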
Performance Considerations
There are two primary measures of garbage collection performance:
1. Throughput is the percentage of total time not spent in garbage collection, considered over
long periods of time. Throughput includes time spent in allocation (but tuning for speed of
allocation is generally not needed).
2. Pauses are the times when an application appears unresponsive because garbage collection
is occurring.
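Throughput as defined above can be estimated from inside a running application. The sketch below uses the standard GarbageCollectorMXBean and RuntimeMXBean interfaces; the class name is hypothetical. The result is only an approximation, because the collection time reported by a collector that works partly concurrently is not purely pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcThroughput {
    // Fraction of total time NOT spent in garbage collection.
    static double throughput(long gcMillis, long totalMillis) {
        if (totalMillis <= 0) {
            return 1.0;
        }
        return 1.0 - (double) gcMillis / totalMillis;
    }

    // Accumulated collection time over all collectors, in milliseconds.
    static long totalGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                sum += t;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("approximate throughput: %.2f%%%n",
                100.0 * throughput(totalGcMillis(), uptimeMillis));
    }
}
```

As the definition says, the number is only meaningful when considered over long periods of time, so it should be sampled after the application has been running under representative load.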
Users have different requirements of garbage collection. For example, some consider the
right metric for a web server to be throughput, since pauses during garbage collection may be
tolerable, or simply obscured by network latencies. However, in an interactive graphics
program even short pauses may negatively affect the user experience.
Some users are sensitive to other considerations. Footprint is the working set of a process,
measured in pages and cache lines. On systems with limited physical memory or many
processes, footprint may dictate scalability. Promptness is the time between when an object
becomes dead and when the memory becomes available, an important consideration for
distributed systems, including remote method invocation (RMI).
In general, a particular generation sizing chooses a trade-off between these considerations.
For example, a very large young generation may maximize throughput, but does so at the
expense of footprint, promptness and pause times. Young generation pauses can be minimized
by using a small young generation at the expense of throughput. To a first approximation, the
sizing of one generation does not affect the collection frequency and pause times for another
generation.
There is no one right way to size generations. The best choice is determined by the way the
application uses memory as well as user requirements. Thus the virtual machine's choice of a
garbage collector is not always optimal and may be overridden with command line options
described below.
Measurement
Throughput and footprint are best measured using metrics particular to the application. For
example, throughput of a web server may be tested using a client load generator, while
footprint of the server might be measured on the Solaris Operating System using the pmap
command. On the other hand, pauses due to garbage collection are easily estimated by
inspecting the diagnostic output of the virtual machine itself.
The command line option -verbose:gc causes information about the heap and garbage
collection to be printed at each collection. For example, here is output from a large server
application:
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
Here we see two minor collections followed by one major collection. The numbers before
and after the arrow (e.g., 325407K->83000K from the first line) indicate the combined size of
live objects before and after garbage collection, respectively. After minor collections the size
includes some objects that are garbage (no longer alive) but that cannot be reclaimed. These
objects are either contained in the tenured generation, or referenced from the tenured or
permanent generations.
The next number in parentheses (e.g., (776768K) again from the first line) is the committed
size of the heap: the amount of space usable for Java objects without requesting more
memory from the operating system. Note that this number does not include one of the
survivor spaces, since only one can be used at any given time, and also does not include the
permanent generation, which holds metadata used by the virtual machine.
The last item on the line (e.g., 0.2300771 secs) indicates the time taken to perform the
collection; in this case approximately a quarter of a second.
The format for the major collection in the third line is similar.
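The fields described above can be extracted mechanically. The following sketch parses lines in exactly the format shown in the sample output; the class name and the regular expression are assumptions made for this illustration, and lines in any other format are simply rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLogLine {
    // Matches e.g. "[GC 325407K->83000K(776768K), 0.2300771 secs]".
    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final boolean full;     // true for a major ("Full GC") collection
    final long beforeK;     // size of objects before the collection
    final long afterK;      // size of objects after the collection
    final long committedK;  // committed size of the heap
    final double seconds;   // time taken by the collection

    private GcLogLine(boolean full, long beforeK, long afterK, long committedK, double seconds) {
        this.full = full;
        this.beforeK = beforeK;
        this.afterK = afterK;
        this.committedK = committedK;
        this.seconds = seconds;
    }

    // Returns the parsed line, or null if the text is not in the expected format.
    static GcLogLine parse(String text) {
        Matcher m = LINE.matcher(text.trim());
        if (!m.matches()) {
            return null;
        }
        return new GcLogLine(
                m.group(1).equals("Full GC"),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Double.parseDouble(m.group(5)));
    }

    long reclaimedK() {
        return beforeK - afterK;
    }

    public static void main(String[] args) {
        GcLogLine line = parse("[GC 325407K->83000K(776768K), 0.2300771 secs]");
        System.out.println("reclaimed " + line.reclaimedK() + "K in " + line.seconds + " secs");
    }
}
```

For the first sample line, the collection reclaimed 325407K - 83000K = 242407K.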
The format of the output produced by -verbose:gc is subject to change in future releases.
indicates that the minor collection recovered about 98% of the young generation, DefNew:
64575K->959K(64576K) and took 0.0457646 secs (about 45 milliseconds).
The usage of the entire heap was reduced to about 51% 196016K->133633K(261184K) and
that there was some slight additional overhead for the collection (over and above the
collection of the young generation) as indicated by the final time of 0.0459067 secs.
The option -XX:+PrintGCTimeStamps will add a time stamp at the start of each collection.
This is useful to see how frequently garbage collections occur.
111.042: [GC 111.042: [DefNew: 8128K->8128K(8128K), 0.0000505 secs]111.042:
[Tenured: 18154K->2311K(24576K), 0.1290354 secs] 26282K->2311K(32704K),
0.1293306 secs]
The collection starts about 111 seconds into the execution of the application. The minor
collection starts at about the same time. Additionally the information is shown for a major
collection delineated by Tenured. The tenured generation usage was reduced to about 10%
18154K->2311K(24576K) and took 0.1290354 secs (approximately 130 milliseconds).
As was the case with -verbose:gc, the format of the output produced by -
Total Heap
Note that the following discussion regarding growing and shrinking of the heap and default
heap sizes does not apply to the parallel collector. (See the section on ergonomics for details
on heap resizing and default heap sizes with the parallel collector.) However, the parameters
that control the total size of the heap and the sizes of the generations do apply to the parallel
collector.
Since collections occur when generations fill up, throughput is inversely proportional to the
amount of memory available. Total available memory is the most important factor affecting
garbage collection performance.
By default, the virtual machine grows or shrinks the heap at each collection to try to keep the
proportion of free space to live objects at each collection within a specific range. This target
range is set as a percentage by the parameters -XX:MinHeapFreeRatio=<minimum> and
-XX:MaxHeapFreeRatio=<maximum>, and the total size is bounded below by -Xms<min> and
above by -Xmx<max>. The default parameters for the 32-bit Solaris Operating System
(SPARC Platform Edition) are shown in this table:
Parameter           Default Value
MinHeapFreeRatio    40
MaxHeapFreeRatio    70
-Xms                3670k
-Xmx                64m
Default values of heap size parameters on 64-bit systems have been scaled up by
approximately 30%. This increase is meant to compensate for the larger size of objects on a
64-bit system.
With these parameters, if the percent of free space in a generation falls below 40%, the
generation will be expanded to maintain 40% free space, up to the maximum allowed size of
the generation. Similarly, if the free space exceeds 70%, the generation will be contracted so
that only 70% of the space is free, subject to the minimum size of the generation.
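The resizing rule just described can be written out as arithmetic. This is a simplified model for illustration only, with hypothetical names; the real virtual machine also applies alignment and other constraints that are not modeled here.

```java
class HeapResizeModel {
    // Capacity a generation would be resized to after a collection, given the
    // space still occupied (liveBytes) and the free-ratio bounds in percent.
    static long targetCapacity(long liveBytes, long capacity,
                               int minFreePct, int maxFreePct,
                               long minCapacity, long maxCapacity) {
        double freePct = 100.0 * (capacity - liveBytes) / capacity;
        long target = capacity;
        if (freePct < minFreePct) {
            // Expand until minFreePct of the space is free.
            target = (long) Math.ceil(liveBytes * 100.0 / (100 - minFreePct));
        } else if (freePct > maxFreePct) {
            // Contract until only maxFreePct of the space is free.
            target = (long) Math.floor(liveBytes * 100.0 / (100 - maxFreePct));
        }
        // Respect the minimum and maximum sizes of the generation.
        return Math.max(minCapacity, Math.min(maxCapacity, target));
    }

    public static void main(String[] args) {
        // Sizes in MB, using the default ratios of 40 and 70 from the table.
        System.out.println("60 live of 80:  resize to " + targetCapacity(60, 80, 40, 70, 4, 1000));
        System.out.println("30 live of 200: resize to " + targetCapacity(30, 200, 40, 70, 4, 1000));
    }
}
```

In the first case only 25% of the space is free, so the capacity grows to 100 (60 occupied, 40% free). In the second case 85% is free, so the capacity shrinks to 100 (30 occupied, 70% free).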
Large server applications often experience two problems with these defaults. One is slow
startup, because the initial heap is small and must be resized over many major collections. A
more pressing problem is that the default maximum heap size is unreasonably small for most
server applications. The rules of thumb for server applications are:
Unless you have problems with pauses, try granting as much memory as possible to the
virtual machine. The default size (64MB) is often too small.
Setting -Xms and -Xmx to the same value increases predictability by removing the most
important sizing decision from the virtual machine. However, the virtual machine is then
unable to compensate if you make a poor choice.
In general, increase the memory as you increase the number of processors, since allocation
can be parallelized.
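To see the effect of -Xms and -Xmx on a running program, the standard java.lang.Runtime
methods can be queried. The sketch below is a minimal probe; the class name HeapBoundsProbe
and the method snapshot are hypothetical, and the figures the JVM reports may differ slightly
from the values given on the command line.

```java
// Hypothetical probe that reports the heap sizes seen by the running JVM.
final class HeapBoundsProbe {

    /** Returns {committed, free, max} heap sizes in bytes. */
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.totalMemory(), rt.freeMemory(), rt.maxMemory() };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] s = snapshot();
        System.out.println("committed heap: " + s[0] / mb + " MB");
        System.out.println("free in committed heap: " + s[1] / mb + " MB");
        System.out.println("maximum heap: " + s[2] / mb + " MB");
    }
}
```

Running it with, for example, java -Xms512m -Xmx512m HeapBoundsProbe should show committed and
maximum sizes that are close to each other, since the heap is no longer allowed to grow or
shrink.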
For reference, there is a separate page explaining some of the available command-line
options.
The parameters NewSize and MaxNewSize bound the young generation size from below and
above. Setting these to the same value fixes the young generation, just as setting -Xms and
-Xmx to the same value fixes the total heap size. This is useful for tuning the young
generation at a finer granularity than the integral multiples allowed by NewRatio.
Survivor Space Sizing
If desired, the parameter SurvivorRatio can be used to tune the size of the survivor spaces,
but this is often not as important for performance. For example, -XX:SurvivorRatio=6 sets
the ratio between each survivor space and eden to 1:6. In other words, each survivor space
will be one sixth the size of eden, and thus one eighth the size of the young generation (not
one seventh, because there are two survivor spaces).
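The "one eighth" figure follows from young generation = eden + 2 survivor spaces with
eden = SurvivorRatio x survivor. A minimal sketch of that arithmetic, using the hypothetical
class SurvivorRatioSketch and sizes in an arbitrary unit:

```java
// Hypothetical sketch: how SurvivorRatio divides the young generation.
final class SurvivorRatioSketch {

    /** One survivor space is young / (ratio + 2), because there are two of them. */
    static long survivorSize(long youngGenSize, int survivorRatio) {
        return youngGenSize / (survivorRatio + 2);
    }

    /** Eden is whatever remains after both survivor spaces are set aside. */
    static long edenSize(long youngGenSize, int survivorRatio) {
        return youngGenSize - 2 * survivorSize(youngGenSize, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 8192; // e.g. an 8192K young generation
        System.out.println("survivor: " + survivorSize(young, 6) + "K");
        System.out.println("eden:     " + edenSize(young, 6) + "K");
    }
}
```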
If survivor spaces are too small, copying collection overflows directly into the tenured
generation. If survivor spaces are too large, they will be uselessly empty. At each garbage
collection the virtual machine chooses a threshold number of times an object can be copied
before it is tenured. This threshold is chosen to keep the survivors half full. The
command-line option -XX:+PrintTenuringDistribution can be used to show this threshold and the
ages of objects in the new generation. It is also useful for observing the lifetime distribution
of an application.
Here are the default values for the 32-bit Solaris Operating System (SPARC Platform
Edition); the default values on other platforms are different.
                 Default Value
Parameter        Client JVM     Server JVM
NewRatio         8              2
NewSize          2228K          2228K
MaxNewSize       not limited    not limited
SurvivorRatio    32             32
The maximum size of the young generation will be calculated from the maximum size of the
total heap and NewRatio. The "not limited" default value for MaxNewSize means that the
calculated value is not limited by MaxNewSize unless a value for MaxNewSize is specified on
the command line.
The rules of thumb for server applications are:
First decide the maximum heap size you can afford to give the virtual machine. Then plot
your performance metric against young generation sizes to find the best setting.
Note that the maximum heap size should always be smaller than the amount of
memory installed on the machine, to avoid excessive page faults and thrashing.
If the total heap size is fixed, increasing the young generation size requires reducing the
tenured generation size. Keep the tenured generation large enough to hold all the live data
used by the application at any given time, plus some amount of slack space (10-20% or
more).
Subject to the above constraint on the tenured generation:
o Grant plenty of memory to the young generation.
o Increase the young generation size as you increase the number of processors, since
allocation can be parallelized.
5. Available Collectors
The discussion to this point has been about the serial collector. The Java HotSpot VM
includes three different collectors, each with different performance characteristics.
1. The serial collector uses a single thread to perform all garbage collection work, which makes
it relatively efficient since there is no communication overhead between threads. It is
best-suited to single processor machines, since it cannot take advantage of multiprocessor
hardware, although it can be useful on multiprocessors for applications with small data sets
(up to approximately 100MB). The serial collector is selected by default on certain hardware
and operating system configurations, or can be explicitly enabled with the option
-XX:+UseSerialGC.
2. The parallel collector (also known as the throughput collector) performs minor collections in
parallel, which can significantly reduce garbage collection overhead. It is intended for
applications with medium- to large-sized data sets that are run on multiprocessor or
multi-threaded hardware. The parallel collector is selected by default on certain hardware and
operating system configurations, or can be explicitly enabled with the option
-XX:+UseParallelGC.
o New: parallel compaction is a feature introduced in J2SE 5.0 update 6 and enhanced
in Java SE 6 that allows the parallel collector to perform major collections in parallel.
Without parallel compaction, major collections are performed using a single thread,
which can significantly limit scalability. Parallel compaction is enabled by adding the
option -XX:+UseParallelOldGC to the command line.
3. The concurrent collector performs most of its work concurrently (i.e., while the application is
still running) to keep garbage collection pauses short. It is designed for applications with
medium- to large-sized data sets for which response time is more important than overall
throughput, since the techniques used to minimize pauses can reduce application
performance. The concurrent collector is enabled with the option -XX:+UseConcMarkSweepGC.
Selecting a Collector
Unless your application has rather strict pause time requirements, first run your application
and allow the VM to select a collector. If necessary, adjust the heap size to improve
performance. If the performance still does not meet your goals, then use the following
guidelines as a starting point for selecting a collector.
1. If the application has a small data set (up to approximately 100MB), then select the serial
collector with the option -XX:+UseSerialGC.
These guidelines provide only a starting point for selecting a collector because
performance is dependent on the size of the heap, the amount of live data
maintained by the application and the number and speed of available
processors. Pause times are particularly sensitive to these factors, so the
threshold of one second mentioned above is only approximate: the parallel
collector will experience pause times longer than one second on many data size
and hardware combinations; conversely, the concurrent collector may not be
able to keep pauses shorter than one second on some combinations.
If the recommended collector does not achieve the desired performance, first attempt to
adjust the heap and generation sizes to meet the desired goals. If still unsuccessful, then try a
different collector: use the concurrent collector to reduce pause times and use the parallel
collector to increase overall throughput on multiprocessor hardware.
The number of garbage collector threads can be controlled with the command line option
-XX:ParallelGCThreads=<N>. If explicit tuning of the heap is being done with command line
options, the size of the heap needed for good performance with the parallel collector is to first
order the same as needed with the serial collector. Enabling the parallel collector should just
make the minor collection pauses shorter. Because there are multiple garbage collector
threads participating in the minor collection there is a small possibility of fragmentation due
to promotions from the young generation to the tenured generation during the collection.
Each garbage collection thread reserves a part of the tenured generation for promotions and
the division of the available space into these "promotion buffers" can cause a fragmentation
effect. Reducing the number of garbage collector threads will reduce this fragmentation effect
as will increasing the size of the tenured generation.
Generations
As mentioned earlier, the arrangement of the generations is different in the parallel collector.
That arrangement is shown in the figure below.
Ergonomics
Starting in J2SE 5.0, the parallel collector is selected by default on server-class machines as
detailed in the document Garbage Collector Ergonomics. In addition, the parallel collector
uses a method of automatic tuning that allows desired behaviors to be specified instead of
generation sizes and other low-level tuning details. The behaviors that can be specified are:
The maximum pause time goal is specified with the command line option
-XX:MaxGCPauseMillis=<N>. This is interpreted as a hint that pause times of <N>
milliseconds or less are desired; by default there is no maximum pause time goal. If a pause
time goal is specified, the heap size and other garbage collection related parameters are
adjusted in an attempt to keep garbage collection pauses shorter than the specified value.
Note that these adjustments may cause the garbage collector to reduce the overall throughput
of the application and in some cases the desired pause time goal cannot be met.
The throughput goal is measured in terms of the time spent doing garbage collection vs. the
time spent outside of garbage collection (referred to as application time). The goal is
specified by the command line option -XX:GCTimeRatio=<N>, which sets the ratio of garbage
collection time to application time to 1 / (1 + <N>).
For example, -XX:GCTimeRatio=19 sets a goal of 1/20 or 5% of the total time in garbage
collection. The default value is 99, resulting in a goal of 1% of the time in garbage collection.
Maximum heap footprint is specified using the existing option -Xmx<N>. In addition, the
collector has an implicit goal of minimizing the size of the heap as long as the other goals are
being met.
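The throughput goal arithmetic described above, 1 / (1 + <N>), can be checked with a short
calculation. The class GcTimeRatioSketch below is a hypothetical helper written for this
illustration; it only evaluates the formula and does not query the JVM.

```java
// Hypothetical sketch: share of total time allowed for GC under -XX:GCTimeRatio=<N>.
final class GcTimeRatioSketch {

    /** Returns the target fraction of total time spent in garbage collection. */
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println("GCTimeRatio=19 -> " + gcTimeFraction(19) * 100 + "% of total time");
        System.out.println("GCTimeRatio=99 -> " + gcTimeFraction(99) * 100 + "% of total time");
    }
}
```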
Priority of goals
The maximum pause time goal is met first. Only after it is met is the throughput goal
addressed. Similarly, only after the first two goals have been met is the footprint goal
considered.
Generation Size Adjustments
The statistics such as average pause time kept by the collector are updated at the end of each
collection. The tests to determine if the goals have been met are then made and any needed
adjustments to the size of a generation are made. The exception is that explicit garbage
collections (e.g., calls to System.gc()) are ignored in terms of keeping statistics and making
adjustments to the sizes of generations.
Growing and shrinking the size of a generation is done by increments that are a fixed
percentage of the size of the generation so that a generation steps up or down toward its
desired size. Growing and shrinking are done at different rates. By default a generation grows
in increments of 20% and shrinks in increments of 5%. The percentage for growing is
controlled by the command line flag -XX:YoungGenerationSizeIncrement=<Y> for the
young generation and -XX:TenuredGenerationSizeIncrement=<T> for the tenured
generation. The percentage by which a generation shrinks is adjusted by the command line
flag -XX:AdaptiveSizeDecrementScaleFactor=<D>. If the growth increment is X percent,
the decrement for shrinking is X / D percent.
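As a worked example of the X / D rule: the default growth increment of 20% and the default
shrink step of 5% correspond to a scale factor D of 4. The sketch below applies one growth
step and one shrink step to a generation size. The class GenerationStepSketch and its
methods are hypothetical names, and rounding to whole units is an assumption of this sketch.

```java
// Hypothetical sketch of the stepwise grow/shrink arithmetic.
final class GenerationStepSketch {

    /** Shrink percentage derived from the growth increment and the scale factor D. */
    static double shrinkPercent(double growPercent, double decrementScaleFactor) {
        return growPercent / decrementScaleFactor;
    }

    /** Size after one growth step of growPercent. */
    static long grow(long size, double growPercent) {
        return size + Math.round(size * growPercent / 100.0);
    }

    /** Size after one shrink step of (growPercent / decrementScaleFactor). */
    static long shrink(long size, double growPercent, double decrementScaleFactor) {
        double percent = shrinkPercent(growPercent, decrementScaleFactor);
        return size - Math.round(size * percent / 100.0);
    }

    public static void main(String[] args) {
        System.out.println("shrink step: " + shrinkPercent(20, 4) + "%");
        System.out.println("1000 grown once:  " + grow(1000, 20));
        System.out.println("1000 shrunk once: " + shrink(1000, 20, 4));
    }
}
```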
If the collector decides to grow a generation at startup, there is a supplemental percentage
added to the increment. This supplement decays with the number of collections and there is
no long-term effect of this supplement. The intent of the supplement is to increase startup
performance. There is no supplement to the percentage for shrinking.
If the maximum pause time goal is not being met, the size of only one generation is shrunk at
a time. If the pause times of both generations are above the goal, the size of the generation
with the larger pause time is shrunk first.
If the throughput goal is not being met, the sizes of both generations are increased. Each is
increased in proportion to its respective contribution to the total garbage collection time. For
example, if the garbage collection time of the young generation is 25% of the total collection
time and if a full increment of the young generation would be by 20%, then the young
generation would be increased by 5%.
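The proportional increase in the example above is simply the full increment scaled by the
generation's share of total collection time. A minimal sketch, with the hypothetical class
ThroughputGrowthSketch:

```java
// Hypothetical sketch: growth step when the throughput goal is not being met.
final class ThroughputGrowthSketch {

    /**
     * Returns the percentage by which a generation is grown, given its own GC
     * time, the total GC time, and the full growth increment in percent.
     */
    static double increasePercent(double generationGcTime, double totalGcTime,
                                  double fullIncrementPercent) {
        return fullIncrementPercent * generationGcTime / totalGcTime;
    }

    public static void main(String[] args) {
        // Young generation accounts for 25 of 100 time units; full increment is 20%.
        System.out.println(increasePercent(25, 100, 20) + "%");
    }
}
```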
Default Heap Size
If not otherwise set on the command line, the initial and maximum heap sizes are calculated
based on the amount of memory on the machine. The proportion of memory to use for the
heap is controlled by the command line options DefaultInitialRAMFraction and
DefaultMaxRAMFraction, as shown in the table below. (In the table, memory represents the
amount of memory on the machine.)
Heap Size            Formula                                     Default
initial heap size    memory / DefaultInitialRAMFraction          memory / 64
maximum heap size    MIN(memory / DefaultMaxRAMFraction, 1GB)    MIN(memory / 4, 1GB)
Note that the default maximum heap size will not exceed 1GB, regardless of how much
memory is installed on the machine.
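The default sizing formulas can be evaluated directly. The sketch below uses the hypothetical
class DefaultHeapSizeSketch and takes the amount of physical memory as a parameter rather than
detecting it; it reproduces the arithmetic only.

```java
// Hypothetical sketch of the default initial and maximum heap size formulas.
final class DefaultHeapSizeSketch {

    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    /** memory / DefaultInitialRAMFraction */
    static long initialHeap(long physicalMemory, int initialRamFraction) {
        return physicalMemory / initialRamFraction;
    }

    /** MIN(memory / DefaultMaxRAMFraction, 1GB) */
    static long maxHeap(long physicalMemory, int maxRamFraction) {
        return Math.min(physicalMemory / maxRamFraction, GB);
    }

    public static void main(String[] args) {
        long memory = 2 * GB; // assumed machine size for the example
        System.out.println("initial: " + initialHeap(memory, 64) / MB + " MB");
        System.out.println("maximum: " + maxHeap(memory, 4) / MB + " MB");
    }
}
```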
The parallel collector will throw an OutOfMemoryError if too much time is being spent in
garbage collection: if more than 98% of the total time is spent in garbage collection and less
than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is
designed to prevent applications from running for an extended period of time while making
little or no progress because the heap is too small. If necessary, this feature can be disabled
by adding the option -XX:-UseGCOverheadLimit to the command line.
Measurements
The verbose garbage collector output from the parallel collector is essentially the same as that
from the serial collector.
Overhead of Concurrency
The concurrent collector trades processor resources (which would otherwise be available to
the application) for shorter major collection pause times. The most visible overhead is the use
of one or more processors during the concurrent parts of the collection. On an N processor
system, the concurrent part of the collection will use K/N of the available processors, where
1 <= K <= ceiling{N/4}. (Note that the precise choice of and bounds on K are subject to
change.) In addition to the use of processors during concurrent phases, additional overhead is
incurred to enable concurrency. Thus while garbage collection pauses are typically much
shorter with the concurrent collector, application throughput also tends to be slightly lower
than with the other collectors.
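The bound 1 <= K <= ceiling{N/4} is easy to tabulate. The class ConcurrentProcessorBound
below is a hypothetical helper that computes only the upper bound stated above; as the text
notes, the collector's actual choice of K is subject to change.

```java
// Hypothetical sketch: upper bound on processors used by the concurrent phases.
final class ConcurrentProcessorBound {

    /** ceiling(N / 4) computed with integer arithmetic; N must be at least 1. */
    static int maxConcurrentProcessors(int processors) {
        return (processors + 3) / 4;
    }

    /** Largest share K/N of the machine that the concurrent phases may occupy. */
    static double maxShare(int processors) {
        return (double) maxConcurrentProcessors(processors) / processors;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 1, 2, 4, 8, 16 }) {
            System.out.println(n + " processors: K <= " + maxConcurrentProcessors(n)
                    + ", share <= " + maxShare(n));
        }
    }
}
```

The table this prints shows why the relative cost falls as N grows: on one processor the
upper bound is the whole machine, while on 16 processors it is a quarter of it.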
On a machine with more than one processing core, there are processors available for
application threads during the concurrent part of the collection, so the concurrent garbage
collector thread does not "pause" the application. This usually results in shorter pauses, but
again fewer processor resources are available to the application and some slowdown should
be expected, especially if the application utilizes all of the processing cores maximally. Up to
a limit, as N increases the reduction in processor resources due to concurrent garbage
collection becomes smaller, and the benefit from concurrent collection increases. The
following section, concurrent mode failure, discusses potential limits to such scaling.
Since at least one processor is utilized for garbage collection during the concurrent phases,
the concurrent collector does not normally provide any benefit on a uniprocessor
(single-core) machine. However, there is a separate mode available that can achieve low pauses
on systems with only one or two processors; see incremental mode below for details.
Floating Garbage
The concurrent collector, like all the other collectors in HotSpot, is a tracing collector that
identifies at least all the reachable objects in the heap. In the parlance of Jones and Lins it is
an incremental update collector. Because application threads and the garbage collector thread
run concurrently during a major collection, objects that are traced by the garbage collector
thread may subsequently become unreachable by the time collection finishes. Such
unreachable objects that have not yet been reclaimed are referred to as floating garbage. The
amount of floating garbage depends on the duration of the concurrent collection cycle and on
the frequency of reference updates, also known as mutations, by the application. Furthermore,
since the young generation and the tenured generation are collected independently, each acts
as a source of roots to the other. As a rough rule of thumb, try increasing the size of the tenured
generation by 20% to account for the floating garbage. Floating garbage in the heap at the
end of one concurrent collection cycle is collected during the next collection cycle.
Pauses
The concurrent collector pauses an application twice during a concurrent collection cycle.
The first pause is to mark as live the objects directly reachable from the roots (e.g., object
references from application thread stacks and registers, static objects and so on) and from
elsewhere in the heap (e.g., the young generation). This first pause is referred to as the initial
mark pause. The second pause comes at the end of the concurrent tracing phase and finds
objects that were missed by the concurrent tracing due to updates by the application threads
of references in an object after the concurrent collector had finished tracing that object. This
second pause is referred to as the remark pause.
Concurrent Phases
The concurrent tracing of the reachable object graph occurs between the initial mark pause
and the remark pause. During this concurrent tracing phase one or more concurrent garbage
collector threads may be using processor resources that would otherwise have been available
to the application and, as a result, compute-bound applications may see a commensurate fall
in application throughput during this and other concurrent phases even though the application
threads are not paused. After the remark pause, there is a concurrent sweeping phase which
collects the objects identified as unreachable. Once a collection cycle completes, the
concurrent collector will wait, consuming almost no computational resources, until the start of
the next major collection cycle.
Based on recent history, the concurrent collector maintains estimates of the time remaining
before the tenured generation will be exhausted and of the time needed for a concurrent
collection cycle. Based on these dynamic estimates, a concurrent collection cycle will be
started with the aim of completing the collection cycle before the tenured generation is
exhausted. These estimates are padded for safety, since a concurrent mode failure can be
very costly.
A concurrent collection will also start if the occupancy of the tenured generation exceeds an
initiating occupancy, a percentage of the tenured generation. The default value of this
initiating occupancy threshold is approximately 92%, but the value is subject to change from
release to release. This value can be manually adjusted using the command line option
-XX:CMSInitiatingOccupancyFraction=<N>
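The occupancy test itself is a simple percentage comparison. The sketch below, using the
hypothetical class InitiatingOccupancySketch, shows that comparison for a tenured generation
of a given capacity; it does not model the time-based estimates described earlier.

```java
// Hypothetical sketch: does tenured occupancy exceed the initiating threshold?
final class InitiatingOccupancySketch {

    /** True when used / capacity is above initiatingOccupancyPercent. */
    static boolean exceedsThreshold(long used, long capacity, int initiatingOccupancyPercent) {
        return used * 100 > (long) initiatingOccupancyPercent * capacity;
    }

    public static void main(String[] args) {
        long capacityKb = 20288; // tenured capacity in K, chosen for the example
        System.out.println(exceedsThreshold(13991, capacityKb, 92)); // about 69% full
        System.out.println(exceedsThreshold(19000, capacityKb, 92)); // about 94% full
    }
}
```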
Scheduling Pauses
The pauses for the young generation collection and the tenured generation collection occur
independently. They do not overlap, but may occur in quick succession such that the pause
from one collection, immediately followed by one from the other collection, can appear to be
a single, longer pause. To avoid this, the concurrent collector attempts to schedule the remark
pause roughly midway between the previous and next young generation pauses. This
scheduling is currently not done for the initial mark pause, which is usually much shorter than
the remark pause.
Incremental Mode
The concurrent collector can be used in a mode in which the concurrent phases are done
incrementally. Recall that during a concurrent phase the garbage collector thread is using one
or more processors. The incremental mode is meant to lessen the impact of long concurrent
phases by periodically stopping the concurrent phase to yield back the processor to the
application. This mode, referred to here as i-cms, divides the work done concurrently by
the collector into small chunks of time which are scheduled between young generation
collections. This feature is useful when applications that need the low pause times provided
by the concurrent collector are run on machines with small numbers of processors (e.g., 1 or
2).
The concurrent collection cycle typically includes the following steps:
1. stop all application threads and identify the set of objects reachable from roots, then
resume all application threads
2. concurrently trace the reachable object graph, using one or more processors, while the
application threads are executing
3. concurrently retrace sections of the object graph that were modified since the tracing in
the previous step, using one processor
4. stop all application threads and retrace sections of the roots and object graph that may
have been modified since they were last examined, then resume all application threads
5. concurrently sweep up the unreachable objects to the free lists used for allocation, using
one processor
6. concurrently resize the heap and prepare the support data structures for the next
collection cycle, using one processor
Normally, the concurrent collector uses one or more processors during the entire concurrent
tracing phase, without voluntarily relinquishing them. Similarly, one processor is used for the
entire concurrent sweep phase, again without relinquishing it. This overhead can be too much
of a disruption for applications with response time constraints that might otherwise have
utilized the processing cores, particularly when run on systems with just one or two
processors. Incremental mode solves this problem by breaking up the concurrent phases into
short bursts of activity, which are scheduled to occur mid-way between minor pauses.
i-cms uses a duty cycle to control the amount of work the concurrent collector is allowed to
do before voluntarily giving up the processor. The duty cycle is the percentage of time
between young generation collections that the concurrent collector is allowed to run. i-cms
can automatically compute the duty cycle based on the behavior of the application (the
recommended method, known as automatic pacing), or the duty cycle can be set to a fixed
value on the command line.
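Because the duty cycle is a percentage of the time between young generation collections, the
time budget it grants is a one-line calculation. The class DutyCycleSketch below is a
hypothetical illustration; the interval between collections is passed in as an assumed
measurement.

```java
// Hypothetical sketch: time the concurrent collector may run under a given duty cycle.
final class DutyCycleSketch {

    /** Milliseconds of concurrent work allowed between two young generation collections. */
    static long allowedRunMillis(long millisBetweenYoungCollections, int dutyCyclePercent) {
        return millisBetweenYoungCollections * dutyCyclePercent / 100;
    }

    public static void main(String[] args) {
        long interval = 2000; // assumed 2 seconds between young generation collections
        System.out.println("duty cycle 10%: " + allowedRunMillis(interval, 10) + " ms");
        System.out.println("duty cycle 50%: " + allowedRunMillis(interval, 50) + " ms");
    }
}
```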
Command Line Options
The following command-line options control i-cms (see below for recommendations for an
initial set of options):
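                                      Default Value
Option                                J2SE 5.0 and earlier    Java SE 6
-XX:+CMSIncrementalMode               disabled                disabled
-XX:+CMSIncrementalPacing             disabled                enabled
-XX:CMSIncrementalDutyCycle=<N>       50                      10
-XX:CMSIncrementalDutyCycleMin=<N>    10                      0
-XX:CMSIncrementalSafetyFactor=<N>    10                      10
-XX:CMSIncrementalOffset=<N>          0                       0
-XX:CMSExpAvgFactor=<N>               25                      25
Description of -XX:CMSIncrementalDutyCycleMin=<N>: the percentage (0-100) which is the lower
bound on the duty cycle when CMSIncrementalPacing is enabled.
Description of -XX:CMSIncrementalSafetyFactor=<N>: the percentage (0-100) used to add
conservatism when computing the duty cycle.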
Recommended Options
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
The first two options enable the concurrent collector and i-cms, respectively. The last two
options are not required; they simply cause diagnostic information about garbage collection
to be written to stdout, so that garbage collection behavior can be seen and later analyzed.
Note that in J2SE 5.0 and earlier releases, we recommend the following as an initial set of
command line options for i-cms:
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
-XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 \
-XX:CMSIncrementalDutyCycle=10
These are the same as recommended for Java SE 6, with the addition of three options that
control i-cms automatic pacing. The additional options simply specify the values that became
the default in Java SE 6.
Basic Troubleshooting
The i-cms automatic pacing feature uses statistics gathered while the program is running to
compute a duty cycle so that concurrent collections complete before the heap becomes full.
However, past behavior is not a perfect predictor of future behavior and the estimates may
not always be accurate enough to prevent the heap from becoming full. If too many full
collections occur, try the following steps, one at a time:
Step                              Options
1. Increase the safety factor:    -XX:CMSIncrementalSafetyFactor=<N>
Measurements
Below is the output from the concurrent collector with the options -verbose:gc
-XX:+PrintGCDetails, with a few minor details removed. Note that the output for the
concurrent collector is interspersed with the output from the minor collections; typically
many minor collections occur during a concurrent collection cycle. The CMS-initial-mark:
indicates the start of the concurrent collection cycle. The CMS-concurrent-mark: indicates
the end of the concurrent marking phase and CMS-concurrent-sweep: marks the end of the
concurrent sweeping phase. Not discussed before is the precleaning phase indicated by
CMS-concurrent-preclean:. Precleaning represents work that can be done concurrently in
preparation for the remark phase CMS-remark. The final phase is indicated by the
CMS-concurrent-reset: and is in preparation for the next concurrent collection.
[GC [1 CMS-initial-mark: 13991K(20288K)] 14103K(22400K), 0.0023781 secs]
[GC [DefNew: 2112K->64K(2112K), 0.0837052 secs] 16103K->15476K(22400K),
0.0838519 secs]
...
[GC [DefNew: 2077K->63K(2112K), 0.0126205 secs] 17552K->15855K(22400K),
0.0127482 secs]
[CMS-concurrent-mark: 0.267/0.374 secs]
[GC [DefNew: 2111K->64K(2112K), 0.0190851 secs] 17903K->16154K(22400K),
0.0191903 secs]
[CMS-concurrent-preclean: 0.044/0.064 secs]
[GC [1 CMS-remark: 16090K(20288K)] 17242K(22400K), 0.0210460 secs]
[GC [DefNew: 2112K->63K(2112K), 0.0716116 secs] 18177K->17382K(22400K),
0.0718204 secs]
[GC [DefNew: 2111K->63K(2112K), 0.0830392 secs] 19363K->18757K(22400K),
0.0832943 secs]
...
[GC [DefNew: 2111K->0K(2112K), 0.0035190 secs] 17527K->15479K(22400K),
0.0036052 secs]
[CMS-concurrent-sweep: 0.291/0.662 secs]
[GC [DefNew: 2048K->0K(2112K), 0.0013347 secs] 17527K->15479K(27912K),
0.0014231 secs]
[CMS-concurrent-reset: 0.016/0.016 secs]
[GC [DefNew: 2048K->1K(2112K), 0.0013936 secs] 17527K->15479K(27912K),
0.0014814 secs]
The initial mark pause is typically short relative to the minor collection pause time. The
concurrent phases (concurrent mark, concurrent preclean and concurrent sweep) normally last
significantly longer than a minor collection pause, as indicated by the example output above.
Note, however, that the application is not paused during these concurrent phases. The remark
pause is often comparable in length to a minor collection. The remark pause is affected by
certain application characteristics (e.g., a high rate of object modification can increase this
pause) and the time since the last minor collection (i.e., more objects in the young generation
may increase this pause).
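When analyzing output like the example above, it can help to extract the numbers from the
minor collection records. The sketch below parses one DefNew record with java.util.regex. The
class MinorGcLineParser is a hypothetical helper written against the sample output shown
here; the log format differs between collectors and releases, so the pattern should be
treated as an assumption that fits these lines only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for DefNew records in the sample output above.
final class MinorGcLineParser {

    // Groups: young before, young after, young capacity, young pause,
    //         heap before, heap after, heap capacity, total pause.
    private static final Pattern MINOR_GC = Pattern.compile(
            "\\[GC \\[DefNew: (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\] "
            + "(\\d+)K->(\\d+)K\\((\\d+)K\\),\\s+([\\d.]+) secs\\]");

    /**
     * Returns {youngBefore, youngAfter, heapBefore, heapAfter} in K,
     * or null when the text is not a DefNew record.
     */
    static long[] parse(String record) {
        Matcher m = MINOR_GC.matcher(record);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
            Long.parseLong(m.group(5)), Long.parseLong(m.group(6))
        };
    }

    /** Total heap space reclaimed by the collection, in K. */
    static long heapReclaimedKb(long[] sizes) {
        return sizes[2] - sizes[3];
    }

    public static void main(String[] args) {
        String record = "[GC [DefNew: 2112K->64K(2112K), 0.0837052 secs] "
                + "16103K->15476K(22400K), 0.0838519 secs]";
        long[] sizes = parse(record);
        System.out.println("heap reclaimed: " + heapReclaimedKb(sizes) + "K");
    }
}
```

The pattern accepts any whitespace after the heap sizes, so a record that wraps onto a second
line, as in the listing above, still matches if both lines are passed in together.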
8. Other Considerations
Permanent Generation Size
The permanent generation does not have a noticeable impact on garbage collector
performance for most applications. However, some applications dynamically generate and
load many classes; for example, some implementations of JavaServer Pages (JSP) pages.
These applications may need a larger permanent generation to hold the additional classes. If
so, the maximum permanent generation size can be increased with the command-line option
-XX:MaxPermSize=<N>.
specifies explicit collection once per hour instead of the default rate of once per minute.
However, this may also cause some objects to take much longer to be reclaimed. These
properties can be set as high as Long.MAX_VALUE to make the time between explicit
collections effectively infinite, if there is no desire for an upper bound on the timeliness of
DGC activity.
Soft References
Soft references are kept alive longer in the server virtual machine than in the client. The rate
of clearing can be controlled with the command line option
-XX:SoftRefLRUPolicyMSPerMB=<N>, which specifies the number of milliseconds a soft
reference will be kept alive (once it is no longer strongly reachable) for each megabyte of free
space in the heap. The default value is 1000 ms per megabyte, which means that a soft
reference will survive (after the last strong reference to the object has been collected) for 1
second for each megabyte of free space in the heap. Note that this is an approximate figure
since soft references are cleared only during garbage collection, which may occur
sporadically.
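The retention time implied by this policy is the product of the free heap space and the
per-megabyte setting. A minimal sketch of that calculation, using the hypothetical class
SoftRefLifetimeSketch; as noted above, the real figure is approximate because clearing happens
only during garbage collection.

```java
// Hypothetical sketch: approximate soft reference retention under SoftRefLRUPolicyMSPerMB.
final class SoftRefLifetimeSketch {

    /** Milliseconds a soft reference is kept after it is no longer strongly reachable. */
    static long survivalMillis(long freeHeapMb, long msPerMb) {
        return freeHeapMb * msPerMb;
    }

    public static void main(String[] args) {
        // 64 MB of free heap with the default of 1000 ms per megabyte.
        System.out.println(survivalMillis(64, 1000) / 1000 + " seconds");
    }
}
```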
The above is necessary only on Solaris 8, since the alternate libthread is the default in the
Solaris 9 Operating System and is the only libthread available starting with Solaris 10.
9. Resources
1. HotSpot VM Frequently Asked Questions (FAQ)
2. GC output examples describes how to interpret the output from the different collectors.
3. How to Handle Java Finalization's Memory-Retention Issues covers finalization pitfalls and
ways to avoid them.
4. Richard Jones and Rafael Lins, Garbage Collection: Algorithms for Automated Dynamic
Memory Management, Wiley and Sons (1996), ISBN 0-471-94148-4
Static methods (in fact all methods) as well as static variables are stored in the
PermGen section of the heap, since they are part of the reflection data (class
related data, not instance related).
Update for clarification:
Note that only the variables and their technical values (primitives or references)
are stored in PermGen space.
If your static variable is a reference to an object, that object itself is stored in the
normal sections of the heap (young/old generation or survivor space). Those
objects (unless they are internal objects like classes etc.) are not stored in PermGen
space.
Example:
static int i = 1; // the value 1 is stored in the PermGen section
static Object o = new SomeObject(); // the reference (pointer/memory address) is stored in
                                    // the PermGen section; the object itself is not
The terms used for the JVM's memory areas are key to understanding the JVM as a whole. This
article discusses the important memory areas in the JVM.
Heap Memory
Class instances and arrays are stored in heap memory. Heap memory is also called shared
memory, because it is where multiple threads share the same data.
Non-heap Memory
It comprises the Method Area and other memory required for internal processing. The major
player here is the Method Area.
Method Area
As noted above, the method area is part of non-heap memory. It stores per-class
structures and the code for methods and constructors. Per-class structures include runtime
constants and static fields.
Memory Pool
Memory pools are created by JVM memory managers at runtime. A memory pool may
belong to either heap or non-heap memory.
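The pools of a running JVM can be listed with the standard java.lang.management API. The
sketch below is a minimal example; the class MemoryPoolLister and the method describePools
are hypothetical names, and the pool names printed depend on the JVM and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists each memory pool and whether it is heap or non-heap.
final class MemoryPoolLister {

    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            String kind = (pool.getType() == MemoryType.HEAP) ? "heap" : "non-heap";
            lines.add(pool.getName() + " [" + kind + "] used=" + usedKb + "K");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```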
Memory Generations
The HotSpot VM's garbage collector uses generational garbage collection. It separates the
JVM's memory into two parts, called the young generation and the old generation.
Young Generation
Young generation memory consists of two parts, Eden space and survivor space. Short-lived
objects reside in Eden space, and every object starts its life there. When GC happens, an
object that is still alive is moved to survivor space, and dereferenced objects are
removed.
Old Generation Tenured and PermGen
Old generation memory has two parts, the tenured generation and the permanent generation
(PermGen). PermGen is a popular term, familiar from errors reporting that PermGen space is
not sufficient. GC moves live objects from survivor space to the tenured generation. The
permanent generation contains metadata of the virtual machine, such as class and method
objects.
Discussion:
The Java specification doesn't give hard and fast rules about the design of the JVM heap data
area. It is left to the JVM implementers, who can decide on things like whether to allocate a
fixed or a dynamic memory size.
Key Takeaways
References:
MemoryPoolMXBean provides an API to explore memory usage, threshold notifications,
peak memory usage and memory usage monitoring.
Java Docs API for JConsole
The Threads and Locks chapter of the Java Language Specification discusses Java memory at length.
Performance tuning is not a silver bullet. Simply put, good system performance depends on:
good design, good implementation, defined performance objectives, and performance tuning.
Since JBoss performance tuning also involves tuning the environment on which JBoss
runs, the first tutorial starts by discussing the JVM settings and OS settings with which
JBoss can produce the best results. Then we'll see some specific JBoss config settings.
One strength of the J2SE platform is that it shields the developer from the complexity of
memory allocation. However, once garbage collection is the principal bottleneck, it is worth
understanding some aspects of this hidden implementation.
An object is considered garbage when it can no longer be reached from any pointer in the
running program. The most straightforward garbage collection algorithms simply iterate over
every reachable object. Any objects left over are then considered garbage. The time this
approach takes is proportional to the number of live objects.
The complete address space reserved for object memory can be divided into the young and
tenured generations.
The young generation consists of eden and two survivor spaces. Most objects are initially
allocated in eden. One survivor space is empty at any time, and serves as the destination of
any live objects in eden and the other survivor space during the next copying collection.
Objects are copied between survivor spaces in this way until they are old enough to be
tenured (copied to the tenured generation).
A third generation, closely related to the tenured generation, is the permanent generation,
which holds data needed by the virtual machine to describe objects that do not have an
equivalent at the Java language level. For example, objects describing classes and methods
are stored in the permanent generation.
The command-line option -verbose:gc causes information about the heap and garbage
collection to be printed at each collection. For example, here is output from a large server
application:
It has been demonstrated
that an application that spends 10% of its time in garbage collection can lose 75% of its
throughput when scaled out to 32 processors
(http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html).
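One way to arrive at a figure of this size is Amdahl's law. The sketch below assumes that garbage collection is the only serial part of the workload and that everything else scales perfectly across processors. The class and method names are made up for the illustration.

```java
// Hypothetical helper class, named for this example only.
class GcScaling {
    // Amdahl's law: speedup on n processors when a fraction `serial`
    // of the work cannot be parallelized.
    static double speedup(double serial, int n) {
        return 1.0 / (serial + (1.0 - serial) / n);
    }

    // Fraction of the ideal n-fold throughput that is lost.
    static double throughputLoss(double serial, int n) {
        return 1.0 - speedup(serial, n) / n;
    }

    public static void main(String[] args) {
        // 10% of the time in (serial) GC, scaled out to 32 processors.
        System.out.printf("speedup = %.2f, loss = %.1f%%%n",
                speedup(0.10, 32), 100 * throughputLoss(0.10, 32));
    }
}
```

With these inputs the speedup is 1 / (0.10 + 0.90 / 32), roughly 7.8 instead of the ideal 32, which corresponds to a loss of about three quarters of the potential throughput.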
JBoss tuning tip 2: Set -Xms and -Xmx to the same value
By default, the virtual machine grows or shrinks the heap at each collection to try to keep the
proportion of free space to live objects within a specific range.
Setting -Xms and -Xmx to the same value increases predictability by removing the most
important sizing decision from the virtual machine.
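A quick way to confirm that a running JVM was really started with matching values is to inspect its startup arguments. This is only a sketch: the class HeapFlags and its method are made up for this example, and the comparison is purely textual, so "1g" and "1024m" count as different.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class, named for this example only.
class HeapFlags {
    // True when both -Xms and -Xmx are present and carry the same value text.
    static boolean sameXmsXmx(List<String> jvmArgs) {
        String xms = null;
        String xmx = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xms")) {
                xms = arg.substring(4);
            } else if (arg.startsWith("-Xmx")) {
                xmx = arg.substring(4);
            }
        }
        return xms != null && xms.equalsIgnoreCase(xmx);
    }

    public static void main(String[] args) {
        // The arguments the current JVM was launched with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("fixed heap size: " + sameXmsXmx(jvmArgs));
    }
}
```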
JBoss tuning tip 3: Use server VM
The server JVM is better suited to longer running applications. To enable it, simply set the -server option on the command line.
JBoss tuning tip 4: Turn off distributed gc
The RMI system provides a reference counting distributed garbage collection algorithm. This
system works by having the server keep track of which clients have requested access to
remote objects running on the server. When a reference is made, the server marks the object
as "dirty", and when a client drops the reference, it is marked as being "clean". However, this
system is quite expensive and by default runs every minute.
Set it to run every 30 minutes at least:
-Dsun.rmi.dgc.client.gcInterval=1800000
-Dsun.rmi.dgc.server.gcInterval=1800000
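The values above are in milliseconds, so 30 minutes becomes 1800000. The small sketch below derives both flags from an interval in minutes. The class DgcInterval is made up for this example.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, named for this example only.
class DgcInterval {
    // Builds the two system-property flags for a given interval in minutes.
    static String[] flags(long minutes) {
        long millis = TimeUnit.MINUTES.toMillis(minutes);
        return new String[] {
            "-Dsun.rmi.dgc.client.gcInterval=" + millis,
            "-Dsun.rmi.dgc.server.gcInterval=" + millis
        };
    }

    public static void main(String[] args) {
        for (String flag : flags(30)) {
            System.out.println(flag);
        }
    }
}
```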
JBoss tuning tip 6: Turn on parallel gc
If you have multiple processors, you can do your garbage collection with multiple threads. By
default the parallel collector runs one collection thread per processor; that is, if you have
an 8 processor box, then you'll garbage collect your data with 8 threads. To turn on the
parallel collector, use the flag -XX:+UseParallelGC. You can also specify how many threads
you want to dedicate to garbage collection using the flag -XX:ParallelGCThreads=8.
More JVMs with smaller heaps can outperform fewer JVMs with larger heaps. So instead of huge
heaps, use additional server nodes. Set up a JBoss cluster and balance work between nodes.
JBoss tuning tip 8: Don't choose a heap larger than 70% of your OS memory
Choose a maximum heap size of not more than 70% of the memory to avoid excessive page
faults and thrashing.
This is one of the most important tuning factors: the heap ratio. The heap ratio specifies how
the total heap is partitioned between the young and the tenured space. What
happens if you have lots of long-lived data (cached data, collections)? Maybe you're in this
situation:
The problem here is that the long-lived data overflows the tenured generation. When a
collection is needed, the tenured generation is basically full of live data. Much of the young
generation is also filled with long-lived data. The result is that a minor collection cannot
be done successfully (there isn't enough room in the tenured generation for the
anticipated promotions out of the young generation), so a major collection is done instead.
The major collection works fine, but the result is again that the tenured generation is
full of long-lived data and there is long-lived data in the young generation. There is also
free space in the young generation for more allocations, but the next collection is again
destined to be a major collection.
Each operating system sets default tuning parameters differently. For Windows platforms, the
default settings are usually sufficient. However, UNIX and Linux operating systems
usually need to be tuned appropriately.
Solaris tuning parameters:
Optimize MTU. The standard Ethernet maximum transmission unit (MTU) is 1500 bytes. If you are
sending larger packets, it's a good idea to increase the MTU size in order to reduce packet
fragmentation (especially if you have a slow network):
1. Edit the interface file: vi /etc/sysconfig/network-scripts/ifcfg-xxx (ifcfg-eth0, for instance).
2. Add "MTU=9000" (for gigabit Ethernet).
3. Restart the interface (ifdown eth0; ifup eth0).
Use Big Memory Pages
The default page size is 4 KB (usually too small!).
Check the page size with:
$ cat /proc/meminfo
If you see "HugePages_Total", "HugePages_Free" and "Hugepagesize", you can apply this
optimization.
Here's how to do it (2 GB heap size example):
$ echo 2147483647 > /proc/sys/kernel/shmmax
$ echo 1000 > /proc/sys/vm/nr_hugepages
In Sun's JVM, add this flag: -XX:+UseLargePages
JBoss tuning tip 12: Lots of Requests ? check JBoss thread pool
<mbean code="org.jboss.util.threadpool.BasicThreadPool"
       name="jboss.system:service=ThreadPool">
    <attribute name="MaximumQueueSize">1000</attribute>
    <attribute name="BlockingMode">run</attribute>
</mbean>
For most applications these defaults will work well. However, if you are running an
application which issues lots of requests to JBoss (such as EJB invocations), then monitor your
thread pool. Open the Web Console and look for the MBean
jboss.system:service=ThreadPool.
Start a monitor on the QueueSize attribute. Does QueueSize reach MaximumPoolSize? Then you
probably need to set a higher value for the MaximumPoolSize attribute.
Watch out! Speak first with your sysadmin and ensure that the CPU capacity supports the
increase in threads.
Watch out! If your threads make use of JDBC connections, you'll probably also need to increase
the JDBC connection pool accordingly. Also verify that your HTTP connector is configured
to handle that number of requests.
JBoss tuning tip 13: Check the Embedded web container
JBoss supports connectors for HTTP, HTTPS, and AJP. The configuration file is server.xml, and
it is deployed in the root of the JBoss web container (in JBoss 4.2.0
it's "JBOSS_HOME\server\default\deploy\jboss-web.deployer"):
<Connector port="8080" address="${jboss.bind.address}"
    maxThreads="250" maxHttpHeaderSize="8192"
    emptySessionPath="true" protocol="HTTP/1.1"
    enableLookups="false" redirectPort="8443" acceptCount="100"
    connectionTimeout="20000" disableUploadTimeout="true" />
The underlying HTTP connector of JBoss needs to be fine-tuned for production settings. The
important parameters are:
maxThreads - The maximum number of threads to be allocated for handling
client HTTP requests. This figure corresponds to the number of concurrent users that are going
to access the application. Depending on the machine configuration, there is a physical limit
beyond which you will need to do clustering.
acceptCount - The number of requests that are put in the request queue when all
available threads are in use. When this number is exceeded, client machines get a request
timeout response.
compression - If you set this attribute to force, the content will be compressed by JBoss
and sent to the browser. The browser will decompress it and display the page on screen. Enabling
compression can substantially reduce the bandwidth requirements of your application.
So how do you know if it's necessary to raise your maxThreads number? Again, open the
web console and look for the MBean jboss.web:name=http-127.0.0.1-8080,type=ThreadPool.
The key attribute is currentThreadsBusy. If it's about 70-80% of maxThreads, you
should consider raising maxThreads.
Watch out! If you increase the maxThreads count, you need to raise your JBoss thread pool
accordingly.
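The 70-80% rule of thumb can be written as a simple check. The sketch below only does the arithmetic: the class ConnectorLoad is made up for this example, and the two numbers would have to be read by hand from the currentThreadsBusy attribute and the maxThreads setting mentioned above.

```java
// Hypothetical helper class, named for this example only.
class ConnectorLoad {
    // Ratio of busy threads to the configured maximum.
    static double utilization(int currentThreadsBusy, int maxThreads) {
        if (maxThreads <= 0) {
            throw new IllegalArgumentException("maxThreads must be positive");
        }
        return (double) currentThreadsBusy / maxThreads;
    }

    // True when utilization has reached the given threshold (for example 0.70).
    static boolean shouldRaiseMaxThreads(int currentThreadsBusy, int maxThreads, double threshold) {
        return utilization(currentThreadsBusy, maxThreads) >= threshold;
    }

    public static void main(String[] args) {
        // 180 busy threads out of maxThreads="250" is 72% utilization.
        System.out.println(shouldRaiseMaxThreads(180, 250, 0.70));
    }
}
```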
JBoss tuning tip 14: Turn off JSP Compilation in production
The JBoss application server regularly checks whether a JSP requires compilation to a servlet
before executing it. On a production server, JSP files won't change, so you can
configure the settings for increased performance.
Open web.xml in the deploy/jboss-web.deployer/conf folder. Look for the jsp servlet in the
file and modify the following XML fragment as given below:
<init-param>
    <param-name>development</param-name>
    <param-value>false</param-value>
</init-param>
<init-param>
    <param-name>checkInterval</param-name>
    <param-value>300</param-value>
</init-param>
References:
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
http://people.redhat.com/alikins/system_tuning.html
http://community.jboss.org/wiki/JBossASTuningSliming
What are the benefits of knowing how garbage collection (GC) works in Java? Satisfying the
intellectual curiosity as a software engineer would be a valid cause, but also, understanding
how GC works can help you write much better Java applications.
This is a very personal and subjective opinion of mine, but I believe that a person well versed
in GC tends to be a better Java developer. If you are interested in the GC process, that means
you have experience in developing applications of certain size. If you have thought carefully
about choosing the right GC algorithm, that means you completely understand the features of
the application you have developed. Of course, this may not be a common standard for a good
developer. However, few would object when I say that understanding GC is a requirement for
being a great Java developer.
This is the first of a series of "Become a Java GC Expert" articles. I will cover the GC
introduction this time, and in the next article, I will talk about analyzing GC status and GC
tuning examples from NHN.
The purpose of this article is to introduce GC to you in an easy way. I hope this article proves
to be very helpful. Actually, my colleagues have already published a few great articles on
Java Internals which became quite popular on Twitter. You may refer to them as well.
Returning to Garbage Collection, there is a term that you should know before learning
about GC. The term is "stop-the-world." Stop-the-world will occur no matter which GC
algorithm you choose. Stop-the-world means that the JVM is stopping the application from
running to execute a GC. When stop-the-world occurs, every thread except for the threads
needed for the GC will stop their tasks. The interrupted tasks will resume only after the GC
task has completed. GC tuning often means reducing this stop-the-world time.
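To get a rough feel for how much time a running JVM has spent collecting, the standard GarbageCollectorMXBean can be queried. This is only a sketch: the class GcPauseReport is made up for this example, and the reported collection time is an approximation that, for collectors doing part of their work concurrently, is not identical to stop-the-world time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, named for this example only.
class GcPauseReport {
    // Sums the approximate accumulated collection time (in ms) over all collectors.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("total: " + totalCollectionMillis() + " ms");
    }
}
```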
In Java, program code does not explicitly designate memory and then free it. Some people
set the relevant object to null or call the System.gc() method to release memory explicitly.
Setting a reference to null is not a big deal, but calling System.gc() will affect system
performance drastically and must not be done. (Thankfully, I have not yet seen any
developer in NHN calling this method.)
In Java, as the developer does not explicitly remove the memory in the program code, the
garbage collector finds the unnecessary (garbage) objects and removes them. This garbage
collector was created based on the following two hypotheses. (It is more correct to call them
suppositions or preconditions, rather than hypotheses.)
These hypotheses are called the weak generational hypothesis. To preserve the
strengths of this hypothesis, the heap in HotSpot VM is physically divided into two parts:
the young generation and the old generation.
Young generation: Most of the newly created objects are located here. Since most objects
soon become unreachable, many objects are created in the young generation, then disappear.
When objects disappear from this area, we say a "minor GC" has occurred.
Old generation: The objects that did not become unreachable and survived from the young
generation are copied here. It is generally larger than the young generation. As it is bigger in
size, the GC occurs less frequently than in the young generation. When objects disappear
from the old generation, we say a "major GC" (or a "full GC") has occurred.
Let's look at this in a chart.
The permanent generation from the chart above is also called the "method area," and it
stores classes or interned character strings. So, this area is definitely not for objects that
survived from the old generation to stay permanently. A GC may occur in this area. The GC
that took place here is still counted as a major GC.
Some people may wonder:
What if an object in the old generation needs to reference an object in the young
generation?
To handle these cases, there is something called a "card table" in the old generation,
which is a 512-byte chunk. Whenever an object in the old generation references an object in
the young generation, it is recorded in this table. When a GC is executed for the young
generation, only this card table is searched to determine whether or not an object is subject
to GC, instead of checking the references of all the objects in the old generation. This card
table is managed with a write barrier. The write barrier is a device that allows faster
minor GC. Though a bit of overhead occurs because of it, the overall GC time is
reduced.
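The idea can be sketched as a toy model, not as HotSpot's actual implementation. It assumes one card per 512-byte chunk of the old generation and, following the description above, marks a card only when the stored reference points into the young generation. The class CardTable and its method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a card table; names and layout are assumptions for this example.
class CardTable {
    static final int CARD_SHIFT = 9; // 2^9 = 512 bytes covered by each card
    private final boolean[] dirty;

    CardTable(int oldGenBytes) {
        int cards = (oldGenBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT;
        dirty = new boolean[cards];
    }

    // Write barrier: runs on every reference store into a field at the given
    // old-generation offset, and marks the covering card when the target is young.
    void recordStore(int fieldOffset, boolean targetInYoung) {
        if (targetInYoung) {
            dirty[fieldOffset >>> CARD_SHIFT] = true;
        }
    }

    // Minor GC: returns the cards that must be scanned for old-to-young
    // references, then clears them. Everything else in the old generation is skipped.
    List<Integer> drainDirtyCards() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) {
                result.add(i);
                dirty[i] = false;
            }
        }
        return result;
    }
}
```

The extra work in recordStore is the small overhead mentioned above. The payoff is that a minor GC scans only the dirty cards rather than the whole old generation.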
There are 3 spaces in total, two of which are Survivor spaces. The order in which each space
is used is as follows:
1. The majority of newly created objects are located in the Eden space.
2. After one GC in the Eden space, the surviving objects are moved to one of the Survivor
spaces.
3. After a GC in the Eden space, the objects are piled up into the Survivor space, where other
surviving objects already exist.
4. Once a Survivor space is full, surviving objects are moved to the other Survivor space. Then,
the Survivor space that is full will be changed to a state where there is no data at all.
5. Objects that are still alive after these steps have been repeated a number of times are
moved to the old generation.
As you can see by checking these steps, one of the Survivor spaces must remain empty. If
data exists in both Survivor spaces, or the usage is 0 for both spaces, then take that as a sign
that something is wrong with your system.
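The movement of objects through these spaces can be illustrated with a toy simulation. It is a simplification under stated assumptions: spaces have no size limit, every minor GC copies the survivors from Eden and the occupied Survivor space into the empty one, and an object is promoted once its age reaches a made-up tenuringThreshold. The class YoungGenSim and all of its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation; all names and the promotion rule are assumptions for this example.
class YoungGenSim {
    static final class Obj {
        final String name;
        int age;             // number of minor GCs survived
        boolean live = true; // stands in for "still reachable"
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // Survivor space currently holding objects
    List<Obj> to = new ArrayList<>();   // Survivor space kept empty between GCs
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;

    YoungGenSim(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    // New objects always start in Eden.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    // Copies live objects into the empty Survivor space, or into the old
    // generation once they are old enough, then swaps the Survivor roles.
    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) {
                continue; // garbage is simply not copied
            }
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o);
            } else {
                to.add(o);
            }
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }
}
```

After every call to minorGc, the list named to is empty again, which mirrors the rule that one Survivor space must always remain empty.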
The process of data piling up into the old generation through minor GCs can be shown as in
the below chart:
objects are created, only the most recently added object needs to be checked, which allows
much faster memory allocations. However, it is a different story in a multithreaded
environment. To store objects used by multiple threads in the Eden space in a thread-safe
way, a lock is unavoidable, and performance drops because of lock contention. TLABs
(Thread-Local Allocation Buffers) are the solution to this problem in HotSpot VM. They give
each thread a small portion of the Eden space as its own share. As each thread can only access
its own TLAB, memory can be allocated without a lock, even with the bump-the-pointer
technique.
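The two techniques can be sketched together as a toy model. This is not how HotSpot is implemented: the shared Eden pointer is modelled with an AtomicLong, and a TLAB is a private range carved out of it. The classes TlabSketch, Eden and Tlab are made up for this illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model; class names and layout are assumptions for this example.
class TlabSketch {
    // Shared Eden: all threads compete for the same bump pointer,
    // so every reservation needs an atomic update.
    static final class Eden {
        private final long capacity;
        private final AtomicLong top = new AtomicLong(0);

        Eden(long capacity) { this.capacity = capacity; }

        // Reserves `size` bytes; returns the start offset, or -1 if Eden is full.
        long reserve(long size) {
            while (true) {
                long start = top.get();
                long end = start + size;
                if (end > capacity) {
                    return -1;
                }
                if (top.compareAndSet(start, end)) {
                    return start;
                }
            }
        }
    }

    // Thread-local buffer: owned by one thread, so bumping the pointer
    // needs neither a lock nor an atomic operation.
    static final class Tlab {
        private long top;
        private final long end;

        Tlab(long start, long size) {
            this.top = start;
            this.end = start + size;
        }

        // Returns the offset of the new object, or -1 when the buffer is exhausted.
        long allocate(long size) {
            if (top + size > end) {
                return -1;
            }
            long address = top;
            top += size;
            return address;
        }
    }
}
```

A thread would call Eden.reserve once to obtain a buffer, then serve many allocations from its Tlab. Only when the buffer is exhausted does it go back to the shared pointer.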
This has been a quick overview of the GC in the young generation. You do not necessarily
have to remember the two techniques that I have just mentioned. You will not go to jail for
not knowing them. But please remember that objects are first created in the Eden
space, and that long-surviving objects are moved to the old generation through the Survivor
space.
Serial GC
Parallel GC
Parallel Old GC (Parallel Compacting GC)
Concurrent Mark & Sweep GC (or "CMS")
Garbage First (G1) GC
Among these, the serial GC must not be used on a production server. This GC type was
created when desktop computers had only one CPU core. Using the serial GC will
drop application performance significantly.
Now let's learn about each GC type.
Serial GC (-XX:+UseSerialGC)
The GC in the young generation uses the type we explained in the previous paragraph. The
GC in the old generation uses an algorithm called "mark-sweep-compact."
1. The first step of this algorithm is to mark the surviving objects in the old generation.
2. Then, it checks the heap from the front and leaves only the surviving ones behind (sweep).
3. In the last step, it fills up the heap from the front with the objects so that the objects are
piled up consecutively, and divides the heap into two parts: one with objects and one
without objects (compact).
The serial GC is suitable for a small memory and a small number of CPU cores.
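The three steps can be illustrated on a toy heap, modelled here as an array of slots. This is a minimal sketch of the general mark-sweep-compact idea, not of HotSpot's implementation. The class MarkSweepCompact and its members are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

// Toy heap; all names are assumptions for this example.
class MarkSweepCompact {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // Step 1 (mark): flag every object reachable from the roots.
    static void mark(Collection<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) {
                continue;
            }
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    // Step 2 (sweep): walk the heap from the front and drop unmarked objects.
    static void sweep(Obj[] heap) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !heap[i].marked) {
                heap[i] = null;
            }
        }
    }

    // Step 3 (compact): slide survivors to the front so that the heap splits
    // into a used part and a free part. Returns the index of the first free slot.
    static int compact(Obj[] heap) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            Obj o = heap[i];
            if (o != null) {
                heap[i] = null;
                heap[free++] = o;
                o.marked = false; // reset for the next collection
            }
        }
        return free;
    }

    static int collect(Obj[] heap, Collection<Obj> roots) {
        mark(roots);
        sweep(heap);
        return compact(heap);
    }
}
```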
Parallel GC (-XX:+UseParallelGC)
Parallel Old GC has been supported since JDK 5 update. Compared to the parallel GC, the only
difference is the GC algorithm for the old generation. It goes through three steps: mark,
summary, and compaction. The summary step identifies the surviving objects separately for the
areas where GC has previously been performed, and is thus different from the sweep step of the
mark-sweep-compact algorithm. It involves somewhat more complicated steps.
CMS GC (-XX:+UseConcMarkSweepGC)
You need to review this type carefully before using it. Also, if a compaction task needs to be
carried out because of heavy memory fragmentation, the stop-the-world time can be longer
than with any other GC type. You need to check how often and how long the compaction task is
carried out.
G1 GC
paper provided by the Oracle website. (The book is different from "Java Performance
Tuning.")
By Sangmin Lee, Senior Engineer at Performance Engineering Lab, NHN Corporation.