Lecture Notes: 9

Dangling pointers
A dangling pointer is a pointer to storage that is no longer allocated. Dangling pointers are nasty bugs because they seldom crash a program until long after they were created, which makes them hard to find. Programs that create dangling pointers often appear to work on small inputs but are likely to fail on large or complex ones.

Dangling pointers and wild pointers are pointers that do not point to a valid object of the appropriate type, or to the distinguished null pointer value in languages that support one. A dangling pointer arises when an object is deleted or deallocated without the value of the pointer being modified, so that the pointer still refers to the deallocated memory. Because the system may reallocate the previously freed memory, dereferencing a dangling pointer can produce unpredictable behaviour: the memory may now contain completely different data. This is especially dangerous if the program writes through the dangling pointer, since unrelated data may be silently corrupted, leading to subtle bugs that are extremely difficult to find, or to segmentation faults or general protection faults (Windows). If the overwritten data is bookkeeping data used by the system's memory allocator, the corruption can destabilize the whole system.

Wild pointers arise when a pointer is used before being initialized to some known state, which is possible in some programming languages. They show the same erratic behaviour as dangling pointers, though they are less likely to go undetected.
Cause of dangling pointers
In many languages (particularly the C programming language, which assumes the programmer will take care of all design issues and hence does not include many of the checks present in higher-level languages), deleting an object from memory explicitly, or destroying a stack frame on return, does not alter any associated pointers. A pointer still points to the location in memory where the object or data was, even though it has since been deleted and the memory may now be used for other purposes; this creates a dangling pointer. A straightforward example is shown below:

    {
        char *cp = NULL;
        /* ... */
        {
            char c;
            cp = &c;
        } /* The memory location that c was occupying is released here */
        /* cp is now a dangling pointer */
    }

One way to avoid the dangling pointer above is to set cp to NULL after the inner block is exited, or to otherwise guarantee that cp will not be used again without further initialization in the code that follows.

Notes Compiled for III Sem B.Tech CSE C & D batch


Another frequent source of dangling pointers is a jumbled combination of malloc() and free() library calls: a pointer becomes dangling when the block of memory it points to is freed. As with the previous example, one way to avoid this is to set the pointer back to NULL after freeing the memory, as demonstrated below:

    #include <stdlib.h>

    {
        char *cp = malloc(A_CONST);
        /* ... */
        free(cp);   /* cp now becomes a dangling pointer */
        cp = NULL;  /* cp is no longer dangling */
        /* ... */
    }

Lastly, a common programming misstep that creates a dangling pointer is returning the address of a local variable. Since local variables are deallocated when the function returns, any pointer to a local variable becomes dangling once the stack frame is destroyed:

    char *func(void)
    {
        char ca[] = "Pointers and Arrays - II";
        /* ... */
        return ca;  /* BUG: ca is deallocated when func returns */
    }

If the address of ca must be returned, ca should be declared with the static storage-class specifier, so that it retains its value between calls to func, or it should be allocated dynamically (e.g. via malloc), in which case the memory must later be freed to avoid a memory leak.

Cause of wild pointers
Wild pointers are created by omitting the necessary initialization before a pointer's first use. Strictly speaking, every pointer in a language that does not enforce initialization begins as a wild pointer. This most often happens when control flow jumps over the initialization rather than when it is omitted entirely; most compilers can warn about this.

Avoiding dangling pointer errors
One approach is the Boehm garbage collector, a conservative garbage collector that replaces the standard memory allocation functions of C and C++. It eliminates dangling pointer errors altogether by disabling explicit frees and reclaiming objects through garbage collection.
In languages like Java, dangling pointers cannot occur because there is no mechanism to explicitly deallocate memory. Rather, the garbage collector may deallocate memory, but only when the object is no longer reachable from any references.

Dangling pointer detection
To expose dangling pointer errors, one common programming technique is to set pointers to the null pointer, or to an invalid address, once the storage they point to has been released. When the null pointer is dereferenced, the program (in most languages) terminates immediately, so there is no opportunity for data corruption or unpredictable behaviour. This makes the underlying programming mistake easier to find and resolve. The technique does not help, however, when there are multiple copies of the pointer.

Garbage collection
The basic idea and terms
The core GC strategy is: at some frequency, distinguish live objects from dead objects (tracing) and reclaim the dead ones. Live objects are those reachable from the roots: globals, locals, the registers of active methods, and so on.

Analogy: imagine walking to the refrigerator and getting out a bowl of grapes. You pick up the bunch by the stem and look in the bottom of the bowl; there is a bunch of black and blue mouldy grapes -- that is the "garbage". Anything not reachable from the stem has gone bad.

Now imagine a heap of dynamic memory for a running program. Certain variables point into the heap to your data structures. When those variables no longer point at a data structure, nothing can reach it, so it is garbage.

[Figure: the heap before garbage collection]

[Figure: the heap after garbage collection]


Garbage collection (GC) is a form of automatic memory management. The garbage collector, or collector, attempts to reclaim garbage: memory used by objects that will never be accessed or mutated again by the application. Garbage collection was invented by John McCarthy around 1959 to solve the problems of manual memory management in his Lisp programming language.

Garbage collection is often portrayed as the opposite of manual memory management, which requires the programmer to specify which objects to deallocate and return to the memory system. Note that there is some ambiguity of terms: theory often speaks of manual versus automatic garbage collection rather than manual memory management versus garbage collection, and does not restrict garbage collection to memory management, considering instead that any logical or physical resource may be garbage-collected.

Description
The basic principle of how a garbage collector works is:
1. Determine what data objects in a program will not be accessed in the future.
2. Reclaim the resources used by those objects.
By making manual memory deallocation unnecessary, garbage collection frees the programmer from having to worry about releasing objects that are no longer needed, which can otherwise consume a significant amount of design effort. It also helps make programs more stable, because it prevents several classes of runtime errors. For example, it prevents dangling pointer errors, where a reference to a deallocated object is used. (The pointer still points to the location in memory where the object or data was, even though it has since been deleted and the memory may now be used for other purposes.)

Categories of garbage collectors
Tracing garbage collectors
Tracing garbage collectors are the most common type of garbage collector.
They first determine which objects are reachable (or potentially reachable), and then discard all remaining objects.

Reachability of an object
Informally, a reachable object can be defined as an object for which there exists some name in the program environment that leads to it, either directly or through references from other reachable objects. More precisely, objects can be reachable in only two ways:
1. A distinguished set of objects is assumed to be reachable -- these are known as the roots. Typically they include all the objects referenced from anywhere in the call stack (that is, all local variables and parameters in the functions currently being invoked) and any global variables.
2. Anything referenced from a reachable object is itself reachable; that is, reachability is transitive.


The reachability definition of "garbage" is not optimal, insofar as the last time a program uses an object could be long before that object falls out of the environment's scope. A distinction is sometimes drawn between syntactic garbage, those objects the program cannot possibly reach, and semantic garbage, those objects the program will in fact never use again. The problem of precisely identifying semantic garbage can easily be shown to be undecidable: a program that allocates an object X, runs an arbitrary input program P, and uses X if and only if P finishes would require a semantic garbage collector to solve the halting problem. Although conservative heuristic methods for semantic garbage detection remain an active research area, essentially all practical garbage collectors focus on syntactic garbage as described here.

Basic algorithm
These collectors are called tracing collectors because they trace through the working set of memory, and they perform collection in cycles. A cycle is started when the collector decides (or is notified) that it needs to reclaim storage, which in particular happens when the system is low on memory. The original method is a naive mark-and-sweep, in which the entire memory set is touched several times; it has been superseded by the so-called tri-colour marking method.

In the naive method, each object in memory has a flag (typically a single bit) reserved for garbage collection use only. This flag is always cleared (counter-intuitively), except during the collection cycle. The first stage of collection walks the entire root set, marking each accessible object as in-use. All objects transitively accessible from the root set are marked as well. Finally, each object in memory is examined again; those with the in-use flag still cleared are not reachable by any program or data, and their memory is freed.
(For objects which are marked in-use, the in-use flag is cleared again, preparing for the next cycle.) This method has several disadvantages, the most notable being that the entire system must be suspended during collection; no mutation of the working set can be allowed. This causes programs to 'freeze' periodically (and generally unpredictably), making real-time and time-critical applications impossible. In addition, the entire working memory must be examined, much of it twice, potentially causing problems in paged memory systems.

Because of these pitfalls, all modern tracing garbage collectors implement some variant of the tri-colour marking abstraction, although simple collectors (such as the mark-and-sweep collector) often do not make the abstraction explicit. Tri-colour marking works as follows:
1. Create the initial white, grey, and black sets; these sets will be used to track progress during the cycle. Initially the white set, or condemned set, contains the objects that are candidates for having their memory recycled. The black set contains the objects that can cheaply be proven to have no references to objects in the white set; in many implementations it starts off empty. The grey set contains all the remaining objects, which may or may not have references to objects in the white set (and elsewhere). These sets partition memory: every object in the system, including the root set, is in precisely one set.
2. Pick an object from the grey set. Blacken it (move it to the black set) by greying all the white objects it references directly.
3. Repeat the previous step until the grey set is empty.


4. When there are no more objects in the grey set, all the objects remaining in the white set are provably unreachable, and the storage they occupy can be reclaimed.

The tri-colour marking algorithm preserves an important invariant: no black object points directly to a white object. This ensures that the white objects can be safely destroyed once the grey set is empty. (Some variations on the algorithm do not preserve the tri-colour invariant, but they use a modified form for which all the important properties hold.) The tri-colour method has an important advantage: it can be performed 'on the fly', without halting the system for significant periods. This is accomplished by marking objects as they are allocated and during mutation, maintaining the various sets. By monitoring the sizes of the sets, the system can perform garbage collection periodically rather than as needed, and the need to touch the entire working set on each cycle is avoided.

Reference counting
Reference counting is a form of automatic memory management in which each object has a count of the number of references to it. An object's reference count is incremented when a reference to it is created and decremented when a reference is destroyed. The object's memory is reclaimed when the count reaches zero. There are two major disadvantages to reference counting:

1. If two or more objects refer to each other, they can form a cycle in which neither is ever collected, because their mutual references never let their reference counts reach zero. Some reference-counting GC systems (such as the one in CPython) use specific cycle-detecting algorithms to deal with this issue.
2. In naive implementations, each assignment of a reference, and each reference falling out of scope, requires modifying one or more reference counters (optimizations are described in the literature). When used in a multithreaded environment, these increments and decrements may need to be interlocked, which is usually a very expensive operation.

Availability
Generally speaking, higher-level programming languages are more likely to have garbage collection as a standard feature. In languages that do not have it built in, garbage collection can often be added as a library, as with the Boehm garbage collector for C and C++. Functional programming languages such as ML, Haskell, and APL have garbage collection built in. Lisp, which introduced functional programming, is especially notable for using garbage collection before the technique was in common use. Other dynamic languages, such as Perl, Ruby, Python, and PHP, also tend to use GC. Object-oriented programming languages such as Smalltalk, Java, and ECMAScript usually provide integrated garbage collection; a notable exception is C++.


Historically, languages intended for beginners, such as BASIC and the Lisp-derived Logo, have often used garbage collection for variable-length data types such as strings and lists, so as not to burden novice programmers with the hassles of memory management. On early microcomputers, with their limited memory and slow processors, garbage collection could often cause apparently random, inexplicable pauses in the middle of program operation.

Memory Compaction
It may happen that fragmentation of the memory space leaves no free segment large enough for the operation of a particular program, even though enough total free memory exists. This indicates a need for memory compaction. Memory compaction moves all of the allocated segments to one "end" of the memory space, all adjacent, so that the free segments are combined into a single large free block extending to the other end of memory. Since this involves moving data, it invalidates pointers to the data. For this reason, most compaction schemes require a "master pointer" to each item in the allocated region, so that only one pointer needs to be adjusted when an item is moved. Most memory compaction is performed "on demand", i.e. when the program runs out of space. This is likely to be a disadvantage in a real-time application, where an unexpected "pause" of a few seconds while memory is reshuffled will disrupt the process. An "incremental" memory compactor would be worthy of study.

