What is Garbage Collection?

Garbage collection is the strategy adopted by Microsoft .NET to automatically free objects that are no longer in use or that have gone out of scope. This article covers the concepts of garbage collection and the strategies .NET adopts for handling managed memory efficiently.

Garbage collection is the process of releasing the memory used by objects that are no longer referenced. Different platforms and languages do this in different ways. The Common Language Runtime (CLR) requires that objects be created on the managed heap, but we do not have to clean up the memory once an object goes out of scope or is no longer needed. This is unlike C and C++, where heap memory must be cleaned up explicitly using the free function in C or the delete operator in C++. The garbage collector normally stays out of the application's way and runs when the runtime decides that a collection is needed, for example when memory becomes limited, and it then releases memory automatically. We cannot tell in advance exactly when the garbage collector will do its job.

Note: Generally, the garbage collector runs when the .NET runtime determines that a garbage collection is required. You can force the garbage collector to run at a certain point in your code by calling System.GC.Collect(). The System.GC class is a .NET class that represents the garbage collector, and the Collect() method initiates a garbage collection. The GC class is intended for rare situations in which you know that it's a good time to call the garbage collector; for example, if you have just released your references to a large number of objects. However, the logic of the garbage collector does not guarantee that all unreferenced objects will be removed from the heap in a single garbage collection pass.
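As a minimal sketch of the note above, the following program drops a batch of temporary arrays and then forces a collection. The class name ForcedCollectionDemo and the allocation sizes are made up for illustration; only the System.GC calls come from .NET. The byte counts it prints vary from run to run, so no particular output should be expected.

```csharp
using System;

static class ForcedCollectionDemo
{
    // Allocates short-lived arrays, drops them, then forces a collection.
    // Returns the managed heap size reported before and after the forced collection.
    public static (long Before, long After) Run()
    {
        for (int i = 0; i < 1000; i++)
        {
            byte[] temp = new byte[10_000]; // unreachable again after each iteration
            temp[0] = 1;
        }

        long before = GC.GetTotalMemory(false); // false: do not wait for a collection
        GC.Collect();                           // request a collection of all generations
        GC.WaitForPendingFinalizers();          // let any pending finalizers finish
        long after = GC.GetTotalMemory(false);
        return (before, after);
    }

    static void Main()
    {
        var (before, after) = Run();
        Console.WriteLine($"Before: {before} bytes, after: {after} bytes");
    }
}
```

Calling GC.Collect() like this is reasonable in a demonstration; in production code it is usually better to leave the timing to the runtime.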
Problems with Manual Memory Management
One of the major causes of program failure today, particularly in applications that run for a long time, is manual memory management. It leads to two main problems.

The first arises when a programmer allocates a block of memory (i.e. in random access memory) intending to free it later, but mistakenly forgets to release it once it is no longer required. This condition is known as a "memory leak".

If the application runs long enough, these leaks accumulate and the application runs out of memory. That is no big deal in a program like Notepad, which the user runs for a few minutes and then shuts down. But in a critical application such as a web server, which is supposed to run continuously for days or weeks, the accumulated leaks eventually lead to application failure.

The second arises when a programmer manually deletes an object, but the same code or other objects mistakenly try to access that memory location later. This can hang or crash the application. Worse, the memory that made up the deleted object may still contain plausible-looking values, in which case the program continues to run with corrupted data.

These two bugs are worse than most other application bugs because what the consequences will be, and when they will occur, is typically unpredictable. The application misbehaves in unpredictable ways at unpredictable times. With other bugs, when we see the application misbehaving, we can usually reproduce the problem and fix it.
Automatic Memory Management
Microsoft .NET provides a solution to the bugs described above. Microsoft made automatic memory management part of the .NET Common Language Runtime (CLR), which makes it available to every .NET language. In a .NET application, when the CLR detects that the application is no longer using a block of memory, the CLR releases that memory (i.e. the application does not have to explicitly free memory that it allocated). This mechanism runs automatically in the background and is known as garbage collection.

It solves the problems of manual memory management without your having to write a single line of code. You can't forget to delete an object, because the system cleans it up for you when it is no longer required, and you can't access a deleted object through an invalid reference, because the object won't be deleted as long as you hold a reference to it.

Garbage collection is like an automatic seat belt: the passengers cannot forget to buckle it. Moreover, an automatic belt requires more space and machinery than a manual one. Similarly, a .NET application requires more system resources than an application that manages its memory manually.

The main aim of .NET is faster development with fewer bugs. The programmer should be able to think only about program logic, not about concerns such as memory management.
How Does Garbage Collection Work
An Overview

Memory is not infinite. The garbage collector must perform a collection in order to free some memory. The garbage collector's optimizing engine determines the best time to perform a collection based upon the allocations being made (the exact criteria are guarded by Microsoft). When the garbage collector performs a collection, it checks for objects in the managed heap that are no longer being used by the application and performs the necessary operations to reclaim their memory.
However, for automatic memory management to work, the garbage collector has to know the location of the roots, i.e. it must be able to tell when an object is no longer in use by the application. This knowledge is made available to the GC in .NET through a concept known as metadata. Every data type used in .NET software includes metadata that describes it. With the help of metadata, the CLR knows the layout of each object in memory, which helps the garbage collector in the compaction phase of garbage collection. Without this knowledge the garbage collector wouldn't know where one object instance ends and the next begins.

Garbage Collection Algorithm
Application Roots


Every application has a set of roots. Roots identify storage locations that either refer to objects on the managed heap or are set to null.
For example:

* All the global and static object pointers in an application.
* Any local variable/parameter object pointers on a thread's stack.
* Any CPU registers containing pointers to objects in the managed heap.
* Pointers to objects in the freachable queue.

The list of active roots is maintained by the just-in-time (JIT) compiler and the common language runtime, and is made accessible to the garbage collector's algorithm.

Implementation

Garbage collection in .NET is done using tracing collection and specifically the CLR implements the Mark/Compact collector.
This method consists of two phases as described below.

Phase I: Mark
Find memory that can be reclaimed.

When the garbage collector starts running, it makes the assumption that all objects in the heap are garbage. In other words, it assumes that none of the application's roots refer to any objects in the heap.

The following steps are included in Phase I:
1. The GC identifies the application roots, i.e. the live object references.
2. It starts walking the roots and building a graph of all objects reachable from the roots.
3. If the GC attempts to add an object that is already present in the graph, it stops walking down that path. This serves two purposes. First, it helps performance significantly, since the GC doesn't walk through a set of objects more than once. Second, it prevents infinite loops should you have any circular linked lists of objects. Thus cycles are handled properly.

Once all the roots have been checked, the garbage collector's graph contains the set of all objects that are somehow reachable from the application's roots; any objects that are not in the graph are not accessible by the application, and are therefore considered garbage.
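The mark phase can be sketched as a graph walk. The code below is a toy model, not the CLR's implementation: the Node class and the Mark method are names invented here, and real roots live on stacks, in registers and in static fields rather than in a list.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a heap object: a name plus its outgoing references.
sealed class Node
{
    public string Name;
    public List<Node> References = new List<Node>();
    public Node(string name) { Name = name; }
}

static class MarkPhaseDemo
{
    // Returns the set of nodes reachable from the given roots.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var reachable = new HashSet<Node>();
        var pending = new Stack<Node>();
        foreach (Node root in roots)
            if (root != null) pending.Push(root); // a root may be set to null

        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            // Add returns false when the node is already in the graph,
            // so this path is not walked again and cycles terminate.
            if (!reachable.Add(current)) continue;
            foreach (Node child in current.References)
                pending.Push(child);
        }
        return reachable;
    }

    static void Main()
    {
        var a = new Node("A"); var b = new Node("B");
        var c = new Node("C"); var d = new Node("D");
        a.References.Add(b);
        b.References.Add(a); // cycle between A and B
        c.References.Add(d); // C and D are not reachable from the root

        HashSet<Node> live = Mark(new[] { a });
        Console.WriteLine(live.Count); // prints 2 (A and B)
    }
}
```

Everything outside the returned set (C and D here) is what the text above calls garbage.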

Phase II: Compact
Move all the live objects to the bottom of the heap, leaving free space at the top.

Phase II includes the following steps:

1. The garbage collector now walks through the heap linearly, looking for contiguous blocks of garbage objects (now considered free space).
2. The garbage collector then shifts the non-garbage objects down in memory, removing all of the gaps in the heap.
3. Moving the objects in memory invalidates all pointers to the objects. So the garbage collector modifies the application's roots so that the pointers point to the objects' new locations.
4. In addition, if any object contains a pointer to another object, the garbage collector is responsible for correcting these pointers as well.

After all the garbage has been identified, all the non-garbage has been compacted, and all the non-garbage pointers have been fixed-up, a pointer is positioned just after the last non-garbage object to indicate the position where the next object can be added.
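The compaction steps can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the heap is an array of equally sized slots, a null slot stands for garbage, and "pointers" are array indexes. The names Compact and FixUp are invented for this example.

```csharp
using System;
using System.Collections.Generic;

static class CompactPhaseDemo
{
    // heap[i] holds an object name, or null where garbage was found.
    // Slides live entries toward index 0 and returns a map of old index -> new index.
    // nextFree receives the index where the next object would be allocated.
    public static Dictionary<int, int> Compact(string[] heap, out int nextFree)
    {
        var moved = new Dictionary<int, int>();
        int target = 0;
        for (int source = 0; source < heap.Length; source++)
        {
            if (heap[source] == null) continue; // gap left by a garbage object
            if (source != target)
            {
                heap[target] = heap[source];    // shift the live object down
                heap[source] = null;
            }
            moved[source] = target;
            target++;
        }
        nextFree = target;
        return moved;
    }

    // Rewrites stored indexes (the toy pointers) to the objects' new locations.
    public static void FixUp(int[] pointers, Dictionary<int, int> moved)
    {
        for (int i = 0; i < pointers.Length; i++)
            pointers[i] = moved[pointers[i]];
    }

    static void Main()
    {
        string[] heap = { "A", null, "C", null, "E" };
        int[] roots = { 2, 4 }; // the roots point at C and E

        Dictionary<int, int> moved = Compact(heap, out int nextFree);
        FixUp(roots, moved);

        Console.WriteLine(string.Join(",", roots)); // prints 1,2
        Console.WriteLine(nextFree);                // prints 3
    }
}
```

The value in nextFree plays the role of the pointer positioned just after the last non-garbage object.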

Finalization

The .NET Framework's garbage collection implicitly keeps track of the lifetime of the objects that an application creates, but it does not do so for the unmanaged resources (e.g. a file, a window or a network connection) that objects encapsulate.

Unmanaged resources must be explicitly released once the application has finished using them. The .NET Framework provides the Object.Finalize method: a method that the garbage collector must run on the object to clean up its unmanaged resources, prior to reclaiming the memory used by the object. Since the Finalize method does nothing by default, it must be overridden if explicit cleanup is required.

It would not be surprising if you considered Finalize just another name for C++ destructors. Although both are responsible for freeing the resources used by objects, they have very different semantics. In C++, destructors are executed immediately when the object goes out of scope, whereas a Finalize method is called once, when garbage collection gets around to cleaning up the object.

The potential existence of finalizers complicates the job of garbage collection in .NET by adding some extra steps before freeing an object.

Whenever a new object that has a Finalize method is allocated on the heap, a pointer to the object is placed in an internal data structure called the finalization queue. When an object is not reachable, the garbage collector considers the object garbage. The garbage collector scans the finalization queue looking for pointers to these objects. When a pointer is found, it is removed from the finalization queue and appended to another internal data structure called the freachable queue, which makes the object no longer part of the garbage. At this point, the garbage collector has finished identifying garbage. The garbage collector compacts the reclaimable memory, and a special runtime thread empties the freachable queue, executing each object's Finalize method.

The next time the garbage collector is invoked, it sees that the finalized objects are truly garbage, and the memory for those objects is then simply freed.

Conclusion: When an object requires finalization, it dies, then lives again (is resurrected) and finally dies for good. It is recommended to avoid the Finalize method unless it is required. Finalize methods increase memory pressure, because the memory and resources used by the object are not released until two garbage collections have occurred. Since you have no control over the order in which Finalize methods are executed, relying on them may lead to unpredictable results.
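One common way to limit the cost of finalization is to offer explicit cleanup and keep the finalizer only as a safety net. The sketch below assumes a hypothetical ResourceHolder class whose "unmanaged resource" is simulated by a flag; in C#, the ~ResourceHolder() syntax is how a Finalize override is written.

```csharp
using System;

// Hypothetical wrapper around an unmanaged resource (simulated by a flag).
class ResourceHolder : IDisposable
{
    private bool released;
    public bool IsReleased => released;

    // Explicit, deterministic cleanup.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this); // cleanup is done, so skip finalization
    }

    // Finalizer: runs only if Dispose was never called.
    ~ResourceHolder()
    {
        Release();
    }

    private void Release()
    {
        if (released) return;
        // A real class would free its unmanaged resource here (assumed; nothing is held).
        released = true;
    }
}

static class DisposeDemo
{
    static void Main()
    {
        var holder = new ResourceHolder();
        using (holder)
        {
            // work with the resource
        }
        Console.WriteLine(holder.IsReleased); // prints True
    }
}
```

Because Dispose calls GC.SuppressFinalize, an object cleaned up this way does not have to survive an extra collection waiting for its finalizer.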
Understanding Object Generations

When the CLR is attempting to locate unreachable objects, it does not literally examine each and every object placed on the managed heap. Obviously, doing so would involve considerable time, especially in larger (i.e., real-world) applications.

To help optimize the process, each object on the heap is assigned to a specific "generation." The idea behind generations is simple: The longer an object has existed on the heap, the more likely it is to stay there. For example, the object implementing Main() will be in memory until the program terminates. Conversely, objects that have been recently placed on the heap are likely to be unreachable rather quickly (such as an object created within a method scope). Given these assumptions, each object on the heap belongs to one of the following generations:

* Generation 0: Identifies a newly allocated object that has never been marked for collection
* Generation 1: Identifies an object that has survived one garbage collection (i.e., it was examined during a collection but was not removed because it was still reachable)
* Generation 2: Identifies an object that has survived more than one sweep of the garbage collector

The garbage collector will investigate all generation 0 objects first. If marking and sweeping these objects frees the required amount of memory, any surviving objects are promoted to generation 1. For example, if generation 0 objects A, B, and E survive the collection, they are promoted to generation 1 once the required memory has been reclaimed.

If all generation 0 objects have been evaluated, but additional memory is still required, generation 1 objects are then investigated for their "reachability" and collected accordingly. Surviving generation 1 objects are then promoted to generation 2. If the garbage collector still requires additional memory, generation 2 objects are then evaluated for their reachability. At this point, if a generation 2 object survives a garbage collection, it remains a generation 2 object given the predefined upper limit of object generations.

The bottom line is that by assigning a generational value to objects on the heap, newer objects (such as local variables) will be removed quickly, while older objects (such as a program's application object) are not "bothered" as often.
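Promotion can be observed with GC.GetGeneration. The following is a small sketch with an invented class name, GenerationDemo. On typical runtimes it prints 0, 1, 2, but the exact values depend on the runtime and its configuration, so treat them as an observation rather than a guarantee.

```csharp
using System;

static class GenerationDemo
{
    // Reports the generation of one object before and after two forced collections.
    public static int[] Observe()
    {
        object obj = new object();
        int first = GC.GetGeneration(obj);   // newly allocated
        GC.Collect();
        int second = GC.GetGeneration(obj);  // after surviving one collection
        GC.Collect();
        int third = GC.GetGeneration(obj);   // after surviving a second collection
        GC.KeepAlive(obj);                   // keep obj reachable up to this point
        return new[] { first, second, third };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Observe()));
        Console.WriteLine($"Highest generation: {GC.MaxGeneration}");
    }
}
```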
Strong and Weak References

The garbage collector can reclaim only objects that are not reachable through any reference from the application's roots. An object that is reachable cannot be garbage collected; a reference that keeps it reachable is known as a strong reference. An object can also be referred to through a weak reference; the object that a weak reference refers to is called its target. An object is eligible for garbage collection if no strong references to it remain, irrespective of the number of weak references to it.

The managed heap contains two internal data structures whose sole purpose is to manage weak references: the short weak reference table and the long weak reference table.

Weak references are of two types:

* A short weak reference doesn't track resurrection.
i.e. its target is set to null as soon as the object becomes unreachable, before the object's finalization method has run.

* A long weak reference tracks resurrection.
i.e. its target is set to null only after the garbage collector has determined that the object's storage is reclaimable: if the object has a Finalize method, the method has been called and the object was not resurrected.

These two tables simply contain pointers to objects allocated within the managed heap. Initially, both tables are empty. When you create a WeakReference object, an object is not allocated from the managed heap. Instead, an empty slot in one of the weak reference tables is located; short weak references use the short weak reference table and long weak references use the long weak reference table.
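In code, both kinds of weak reference are created with the System.WeakReference class; passing true as the second constructor argument requests a long weak reference. The sketch below uses an invented class name, WeakReferenceDemo. It only checks the case that is deterministic, a target that is still strongly referenced, because whether and when an unreferenced target is cleared depends on the collector.

```csharp
using System;

static class WeakReferenceDemo
{
    // Returns true if the target is still available while a strong reference exists.
    public static bool TargetAliveWhileStronglyHeld()
    {
        byte[] data = new byte[1024];
        var shortWeak = new WeakReference(data);       // short: does not track resurrection
        var longWeak = new WeakReference(data, true);  // long: tracks resurrection

        GC.Collect();

        // Always copy Target into a strong reference before using it.
        object target = shortWeak.Target;
        bool alive = target != null && longWeak.IsAlive && longWeak.TrackResurrection;

        GC.KeepAlive(data); // the strong reference is held until this point
        return alive;
    }

    static void Main()
    {
        Console.WriteLine(TargetAliveWhileStronglyHeld()); // prints True
    }
}
```

Once the last strong reference is gone, Target may return null after any collection, which is why the null check is part of the usual access pattern.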


Consider an example of what happens when the garbage collector runs. The diagrams (Figure 1 & 2) show the state of all the internal data structures before and after the GC runs.

Now, here's what happens when a garbage collection (GC) runs:

1. The garbage collector builds a graph of all the reachable objects. In the above example, the graph will include objects B, C, E, G.
2. The garbage collector scans the short weak reference table. If a pointer in the table refers to an object that is not part of the graph, then the pointer identifies an unreachable object and the slot in the short weak reference table is set to null. In the above example, the slot for object D is set to null since D is not a part of the graph.
3. The garbage collector scans the finalization queue. If a pointer in the queue refers to an object that is not part of the graph, then the pointer identifies an unreachable object and the pointer is moved from the finalization queue to the freachable queue. At this point, the object is added to the graph since the object is now considered reachable. In the above example, though objects A, D, F are not included in the graph they are treated as reachable objects because they are part of the finalization queue. Finalization queue thus gets emptied.
4. The garbage collector scans the long weak reference table. If a pointer in the table refers to an object that is not part of the graph (which now contains the objects pointed to by entries in the freachable queue), then the pointer identifies an unreachable object and the slot is set to null. Since both objects C and F are a part of the graph (after the previous step), neither slot is set to null in the long weak reference table.
5. The garbage collector compacts the memory, squeezing out the holes left by the unreachable objects. In the above example, object H is the only object that gets removed from the heap and its memory is reclaimed.
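Steps 2 to 5 can be replayed with a toy simulation that uses the object names from the example above. Objects are represented by strings, the graph from step 1 is passed in ready-made, and the class and method names are invented for this sketch; it mirrors the bookkeeping only, not the CLR's real data structures.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CollectionPassDemo
{
    // Applies steps 2-5 and returns the names of the objects whose memory is reclaimed.
    public static List<string> Run(
        string[] heap, HashSet<string> graph,
        string[] shortWeak, List<string> finalizationQueue,
        List<string> freachable, string[] longWeak)
    {
        // Step 2: null out short weak slots whose target is not in the graph.
        for (int i = 0; i < shortWeak.Length; i++)
            if (shortWeak[i] != null && !graph.Contains(shortWeak[i])) shortWeak[i] = null;

        // Step 3: move unreachable finalizable objects to the freachable queue
        // and add them to the graph, since they are reachable again.
        foreach (string name in finalizationQueue.ToList())
        {
            if (graph.Contains(name)) continue;
            finalizationQueue.Remove(name);
            freachable.Add(name);
            graph.Add(name);
        }

        // Step 4: null out long weak slots whose target is still not in the graph.
        for (int i = 0; i < longWeak.Length; i++)
            if (longWeak[i] != null && !graph.Contains(longWeak[i])) longWeak[i] = null;

        // Step 5: everything outside the graph is squeezed out of the heap.
        return heap.Where(name => !graph.Contains(name)).ToList();
    }

    static void Main()
    {
        string[] heap = { "A", "B", "C", "D", "E", "F", "G", "H" };
        var graph = new HashSet<string> { "B", "C", "E", "G" }; // result of step 1
        string[] shortWeak = { "D" };
        var finalizationQueue = new List<string> { "A", "D", "F" };
        var freachable = new List<string>();
        string[] longWeak = { "C", "F" };

        List<string> reclaimed = Run(heap, graph, shortWeak, finalizationQueue, freachable, longWeak);
        Console.WriteLine(string.Join(",", reclaimed));  // prints H
        Console.WriteLine(string.Join(",", freachable)); // prints A,D,F
    }
}
```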

The System.GC class

The System.GC class represents the garbage collector and contains many methods and properties, some of which are described in this section.

GC.Collect Method
This method is used to force a garbage collection of all the generations. It can also force a garbage collection of generation 0 through a particular generation passed to it as a parameter. The signatures of the overloaded Collect methods are:
public static void Collect();
public static void Collect(int generation);

GC.GetTotalMemory Method
This method returns the number of bytes currently thought to be allocated in managed memory. It accepts a boolean parameter; if the parameter is true, the method may wait for a garbage collection to occur before returning.

GC.KeepAlive Method
This method extends the lifetime of the object passed to it as a parameter: the object is not eligible for collection before the point at which the method is called. The signature of this method is as follows:
public static void KeepAlive(object objToKeepAlive);
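A typical use is an object that is never mentioned again in the code but must stay alive while something else depends on it, such as a timer. The sketch below is an illustration with made-up names and delays; the call to GC.KeepAlive marks the point up to which the timer must not be collected.

```csharp
using System;
using System.Threading;

static class KeepAliveDemo
{
    // Runs a timer across a forced collection and returns how often it fired.
    public static int Run()
    {
        int ticks = 0;
        var timer = new Timer(_ => Interlocked.Increment(ref ticks), null, 0, 10);

        Thread.Sleep(200);
        GC.Collect();        // without the KeepAlive call below, the timer could be collected here
        Thread.Sleep(200);

        GC.KeepAlive(timer); // the timer is guaranteed to be alive up to this call
        timer.Dispose();
        return ticks;
    }

    static void Main()
    {
        Console.WriteLine($"Timer fired {Run()} times");
    }
}
```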

GC.ReRegisterForFinalize Method
This method re-registers an object for finalization, i.e., it makes the object eligible for finalization again. The method signature is as follows:
public static void ReRegisterForFinalize(object obj);

GC.SuppressFinalize Method
This method suppresses finalization of an object. The prototype of this method is:
public static void SuppressFinalize(object obj);

GC.GetGeneration Method
This method returns the current generation of an object, or of the target of a weak reference. The signatures of this overloaded method are:
public static int GetGeneration(object obj);
public static int GetGeneration(WeakReference wo);

GC.MaxGeneration Property
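This property returns the highest generation number that the system currently supports. Since generations are numbered from 0, a value of 2 means that three generations are available.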

GC.WaitForPendingFinalizers Method
This method blocks the current thread until all the pending finalizers have finished executing. The signature of this method is:
public static void WaitForPendingFinalizers();
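The usual pairing of Collect and WaitForPendingFinalizers can be sketched as follows. The Tracked class and the counter are invented for the example. Whether the finalizer has actually run by the time the method returns depends on the runtime and build settings (the count is typically 1 in an optimized build), so the printed number is not guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

static class FinalizerDemo
{
    static int finalized; // number of Tracked finalizers that have run

    class Tracked
    {
        ~Tracked() { Interlocked.Increment(ref finalized); }
    }

    // Kept in its own non-inlined method so no reference lingers in the caller's frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Allocate()
    {
        new Tracked();
    }

    public static int Run()
    {
        Allocate();
        GC.Collect();                  // moves the unreachable object to the freachable queue
        GC.WaitForPendingFinalizers(); // blocks until the finalizer thread has emptied the queue
        return Volatile.Read(ref finalized);
    }

    static void Main()
    {
        Console.WriteLine($"Finalizers run so far: {Run()}");
    }
}
```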