What Is Garbage Collection In Python And How Does It Work?

Python is one of the most popular programming languages and is used to build a wide variety of projects. In 2021, it ranked third on the TIOBE index, a reflection of how widely it has been adopted.

Ease of use and support by a large community of Python developers make it a good choice for data science, building web-based applications, and more.

One of the critical but less discussed concepts in Python is garbage collection.

A Python programmer must thoroughly understand the principles of memory management and why garbage collection is used in this language.

Accordingly, in this article we will learn how the garbage collection mechanism works in Python. We will then cover the points you should pay attention to when writing Python programs that rely on it.

What is garbage collection in Python, and why do we need it?

If Python is your first programming language, you may not know much about the concept of garbage collection, so let’s start with the basics.

Memory Management

Programming languages use objects to perform operations. Objects can be simple values such as strings, integers, and Booleans, or more complex data structures such as lists, hashes, or classes.

The values of the objects your application uses are stored in memory for quick access. In most programming languages, a variable defined in a program is a pointer to the address of an object in memory. When a variable is used, the application process reads the value that the variable points to in memory and operates on it.

In early programming languages, programmers were responsible for managing the memory used by the application. Before creating a list or an object, you first had to allocate memory for it. When you were finished with it, you had to release that memory explicitly. If you did not, the application would steadily consume the system’s free memory, and system performance would drop sharply. For example, when you allocate memory dynamically in C, you must free it once the work is done so that the memory behind the pointers is returned to the system.

What are the consequences of not freeing memory?

If you don’t release memory after you have finished using it, you create a “memory leak.” Over time, a leak causes the application to use more and more memory, which becomes a serious problem for long-running applications. Eventually the application is still running and needs memory to do its work, but no free memory is left for it to allocate.

The other problem is freeing memory incorrectly. If the application frees memory that is still in use, one of two things can happen: the application may crash, or its data may be corrupted. A variable that points to freed memory is called a “dangling pointer.” Problems like these pushed language designers toward automatic memory management and garbage collection in modern programming languages.

Automatic memory management and garbage collection

With automatic memory management, programmers no longer need to manage memory themselves; the language runtime handles it. There are several methods for automatic memory management. One of the most popular is reference counting. With reference counting, the runtime keeps track of all references to an object. When an object has no references left, the runtime marks it as unusable, and its memory can be reclaimed.

Automatic memory management provides several significant advantages to programmers. Programmers can work solely on the application’s business logic without focusing on low-level memory details. Automated memory management is an efficient solution to prevent memory leaks and dangling pointers.

However, automatic memory management comes at a cost. Your program must use additional memory and computation to track all the references. In addition, many languages with automatic memory management use a stop-the-world process for garbage collection. In this process, program execution is temporarily paused so that the garbage collector can accurately identify and collect unused objects.

Thanks to Moore’s Law, faster CPUs, and the multi-gigabyte main memory found in personal computers, the advantages of automatic memory management usually outweigh its disadvantages. Most modern programming languages, such as Java, Python, and Go, use automatic memory management.

Of course, some languages, including C++, Objective-C, and Rust, still do without a runtime garbage collector and rely on manual management or compile-time mechanisms instead. Now that we have an overview of memory management and garbage collection, let’s explore the garbage collection mechanism in Python.

How Python uses the garbage collection mechanism

Suppose we have installed CPython, the reference implementation of Python and by far the most widely used one. Other implementations, such as PyPy, Jython (based on Java), or IronPython (based on .NET), have their own uses. To see which implementation is installed on your system, run the following command in a Linux terminal:

$ python -c 'import platform; print(platform.python_implementation())'

Or run these lines in the Python REPL, which works the same way on Linux and Windows:

>>> import platform
>>> print(platform.python_implementation())
CPython
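
The same check can be wrapped in a small script when you want to record the runtime in logs or bug reports. The following is only a sketch: the function name runtime_summary is made up for this illustration, while platform.python_implementation(), platform.python_version(), and sys.implementation.name are standard library calls.

```python
import platform
import sys


def runtime_summary():
    """Return a short description of the running interpreter (hypothetical helper)."""
    return "{} {} ({})".format(
        platform.python_implementation(),  # for example "CPython" or "PyPy"
        platform.python_version(),         # version string of the interpreter
        sys.implementation.name,           # lowercase implementation identifier
    )


summary = runtime_summary()
print(summary)
```

The memory management details described in the rest of this article apply to CPython; other implementations may behave differently.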

Python manages memory and garbage collection based on reference counting and generational garbage collection.

Reference counting in CPython

The primary garbage collection mechanism in CPython is reference counting. Whenever you create an object in CPython, the object has both a Python type (such as list, dictionary, or function) and a reference count.

At a basic level, whenever a new reference to an object is made, its reference count is incremented by one. When a reference to the object is removed, the count is decremented by one. When the reference count reaches 0, the memory allocated to the object is freed. Note that your application cannot disable CPython’s reference counting. Some developers argue that reference counting performs poorly and has flaws; for example, it cannot detect reference cycles. However, it has the advantage of removing an object as soon as no references to it remain, which frees the associated memory immediately.
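
The immediate release can be observed with a weak reference, which lets us ask whether an object still exists without keeping it alive. This is a minimal sketch that assumes CPython; the class name Payload and the variable names are invented for the illustration, and other Python implementations may release the object later.

```python
import weakref


class Payload:
    """Hypothetical class used only for this demonstration."""


first = Payload()
second = first               # a second reference to the same object
probe = weakref.ref(first)   # a weak reference does not raise the count

del first
still_alive = probe() is not None  # True: 'second' still refers to the object

del second
released = probe() is None         # True on CPython: the count reached zero

print(still_alive, released)
```

A weak reference is used here instead of sys.getrefcount because it reports directly whether the object has been destroyed.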

Viewing reference counts in Python

You can use the sys module from the Python standard library to check the number of references to a particular object. The reference count of an object increases in the following situations:

  •  Assigning the object to a variable.
  •  Adding the object to a data structure, such as appending it to a list or setting it as a property on a class instance.
  •  Passing the object as an argument to a function.

To better understand this, we’ll use the Python REPL and the sys module to take a closer look. First, type python in your operating system’s terminal to open the Python REPL.

Import the sys module in the REPL, create a variable, and check its reference count:

>>> import sys
>>> a = 'my-string'
>>> sys.getrefcount(a)
2

In the code snippet above, there are two references to the object that a points to. The first is created when the variable is assigned, and the second when we pass the variable as an argument to the sys.getrefcount() function. The exact numbers you see can differ between Python versions. If you add the object to a data structure such as a list or dictionary, the number of references increases further.

The following code snippet illustrates this:

>>> import sys
>>> a = 'my-string'
>>> b = [a] # Make a list with a as an element.
>>> c = { 'key': a } # Create a dictionary with a as one of the values.
>>> sys.getrefcount(a)
4

As you can see, the number of references associated with the variable increases when the variable is added to a list or dictionary.
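
The count also goes back down when those references are removed. The sketch below measures changes relative to a baseline instead of absolute values, because the absolute numbers reported by sys.getrefcount depend on the Python version and on how the code is run. It uses a plain object() instead of a string so that string caching inside the interpreter does not affect the result; the variable names are illustrative only.

```python
import sys

value = object()                    # a fresh object nobody else refers to
baseline = sys.getrefcount(value)

holder_list = [value]               # one extra reference from the list
holder_dict = {"key": value}        # one extra reference from the dictionary
with_containers = sys.getrefcount(value)

del holder_list                     # the list is destroyed and drops its reference
holder_dict.clear()                 # the dictionary drops its reference
after_release = sys.getrefcount(value)

print(with_containers - baseline)   # extra references held by the containers
print(after_release - baseline)     # difference once they are released
```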

Now that we have seen how reference counting works, it is time to look at the generational garbage collector, Python’s second tool for memory management.

Generational garbage collector

In addition to reference counting, Python uses a generational garbage collector. The easiest way to understand why we need it is with an example. In the previous section, we saw that adding an object to a list or dictionary increases its reference count, but what happens if you add an object to itself? Consider the following code snippet:

>>> class MyClass(object):
...     pass
...
>>> a = MyClass()
>>> a.obj = a
>>> del a

In the example above, we defined a new class. Next, we created an instance of the class and assigned the instance to a property on itself. Finally, we deleted the variable a. After the deletion, the instance can no longer be reached from the Python program. However, Python has not released it from memory, because the instance still refers to itself and its reference count is therefore not zero.

We call this type of problem a “reference cycle.” Unfortunately, it cannot be solved by reference counting alone. This is precisely where the generational garbage collector comes into play; it is available through the gc module in the Python standard library.
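
To watch the collector reclaim such a cycle, you can combine a weak reference with gc.collect(). This is a sketch assuming CPython; the class name Node and the variable names are invented for the example. Automatic collection is paused during the demonstration so that the result does not depend on when the collector happens to run.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this demonstration."""


was_enabled = gc.isenabled()
gc.disable()                   # keep automatic collection out of the way

node = Node()
node.obj = node                # the instance now refers to itself
probe = weakref.ref(node)      # lets us check whether the instance still exists

del node
alive_after_del = probe() is not None      # True: the cycle keeps the count above zero

unreachable = gc.collect()                 # run a full collection by hand
alive_after_collect = probe() is not None  # False: the cycle was reclaimed

if was_enabled:
    gc.enable()                # restore the previous setting

print(alive_after_del, alive_after_collect, unreachable)
```

The value returned by gc.collect() is the number of unreachable objects it found, so it is at least 1 here.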

Key terms

There are two essential terms to know when discussing the generational garbage collector: generation and threshold.

The garbage collector keeps track of objects in memory. A new object starts its life in the first generation of the garbage collector. Python’s garbage collector has three generations; whenever an object survives a collection in its current generation, it moves to the next, older generation. For example, if Python runs a collection on the first generation and an object is still in use, that object is moved to the second generation.

For each generation, the gc module has a threshold number of objects. When the number of objects in a generation exceeds that threshold, the garbage collector runs a collection. Any object that survives is moved to the next generation.
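
The counter for the youngest generation can be seen climbing toward its threshold as container objects are created. The sketch below assumes a standard CPython build; it pauses automatic collection so that the counter is not reset partway through, and the exact numbers will differ from one run and one Python version to the next.

```python
import gc

was_enabled = gc.isenabled()
gc.disable()                             # pause automatic collection for the demo

before = gc.get_count()
containers = [[] for _ in range(1000)]   # lists are tracked by the collector
after = gc.get_count()

if was_enabled:
    gc.enable()                          # restore the previous setting

print("thresholds:   ", gc.get_threshold())
print("counts before:", before)
print("counts after: ", after)
```

With the collector enabled, a collection of the youngest generation would have started once the first counter exceeded the first threshold.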

Unlike reference counting, the generational garbage collector can be controlled by the programmer: you can trigger a collection manually or disable the collector altogether. To make this concrete, let’s use the gc module to see it in action.

Using the gc module

In a terminal, type python to open the Python REPL and import the gc module into your session. Call the get_threshold method to check the configured thresholds:

>>> import gc
>>> gc.get_threshold()
(700, 10, 10)

By default, Python sets a threshold of 700 for the newest generation and 10 for the two older generations.

With the get_count method, you can check the number of objects in each generation, as in the following code snippet:

>>> import gc
>>> gc.get_count()
(596, 2, 1)

As you can see, Python creates several objects by default before you run the program.

Using the gc.collect method, you can run the garbage collection process manually, as in the following code snippet:

>>> gc.get_count()
(595, 2, 1)
>>> gc.collect()
577
>>> gc.get_count()
(18, 0, 0)

Running a collection manually releases the objects that are no longer reachable. The value returned by gc.collect is the number of unreachable objects it found.
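
If you want to know how long a collection pauses your program, the gc module exposes gc.callbacks, a list of functions that are called at the start and end of every collection. The sketch below uses it to time a manual gc.collect() call. The PauseRecorder class is a hypothetical helper written for this example; gc.callbacks and the "generation" and "collected" keys of the info dictionary are part of the standard library.

```python
import gc
import time


class PauseRecorder:
    """Hypothetical helper that records how long each collection takes."""

    def __init__(self):
        self.pauses = []       # (generation, collected, seconds) tuples
        self._started = None

    def __call__(self, phase, info):
        # The collector calls this with phase "start" and then "stop".
        if phase == "start":
            self._started = time.perf_counter()
        elif phase == "stop" and self._started is not None:
            elapsed = time.perf_counter() - self._started
            self.pauses.append((info["generation"], info["collected"], elapsed))
            self._started = None


recorder = PauseRecorder()
gc.callbacks.append(recorder)
try:
    gc.collect()               # trigger a full collection by hand
finally:
    gc.callbacks.remove(recorder)

for generation, collected, seconds in recorder.pauses:
    print(f"generation {generation}: {collected} collected in {seconds:.6f}s")
```

Measurements like these are a reasonable first step before deciding that the garbage collector is responsible for a performance problem.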

You can also change the thresholds that trigger garbage collection with the set_threshold method in the gc module:

>>> import gc
>>> gc.get_threshold()
(700, 10, 10)
>>> gc.set_threshold(1000, 15, 15)
>>> gc.get_threshold()
(1000, 15, 15)

In the code snippet above, we raised each of the default thresholds. Raising the thresholds makes the garbage collector run less often, which reduces its workload and can improve performance. The downside is that unreachable objects stay in memory longer before they are collected. Now that we know how reference counting and the gc module work, it’s time to learn how to use this knowledge when writing Python programs.
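
If you do experiment with thresholds, it is safer to change them only around the code you are measuring and to restore the previous values afterwards. The following sketch wraps gc.set_threshold in a context manager; the name gc_thresholds is made up for this example and is not part of the gc module.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_thresholds(gen0, gen1, gen2):
    """Temporarily apply new collection thresholds (hypothetical helper)."""
    previous = gc.get_threshold()
    gc.set_threshold(gen0, gen1, gen2)
    try:
        yield
    finally:
        gc.set_threshold(*previous)   # always restore the original values


with gc_thresholds(1000, 15, 15):
    inside = gc.get_threshold()       # thresholds in effect inside the block

print("inside the block:", inside)
print("after the block: ", gc.get_threshold())
```

Restoring in a finally clause means the original thresholds come back even if the code inside the block raises an exception.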

What Python’s garbage collector means for the code you write

Now that we understand how memory is managed and how unused objects are collected, let’s move on to how we should use this information as Python developers.

General rule: don’t try to change the working pattern of the garbage collector.

Generally, you shouldn’t consider changing how Python’s garbage collection works. One of the critical benefits of Python is increasing developer productivity, as it tries to help developers focus on the application’s business logic by abstracting away the technical details.

Manual memory management makes sense only for specific projects. Suppose you run into performance limits that you think may be related to Python’s garbage collection. In that case, it’s usually best to improve your code first instead of changing the garbage collection process by hand. In most cases, rewriting the code and restructuring the objects it uses will give you the result you want. Manually forcing garbage collection to free memory may also produce unexpected results.

Disabling the garbage collector

In some projects, you may need to turn off automatic garbage collection and manage it manually. Keep in mind that reference counting is Python’s primary garbage collection mechanism and cannot be disabled. The only mechanism you can change is the generational garbage collector, which is controlled through the gc module.

An interesting real-world example is Instagram, which disabled the garbage collector. Instagram uses Django, the popular Python web framework, for its web application and runs multiple instances of the application on a single compute instance. These instances run under a master-child mechanism, in which each child shares memory with the master.

The Instagram development team noticed that shared memory dropped sharply soon after a child process was created.

Their investigation showed that the garbage collector was the cause. The team disabled the generational collector by setting the thresholds to zero for all generations, which made their web application 10% more efficient.
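
For reference, this is roughly what switching off the generational collector through its thresholds looks like. It is only an illustration of the gc calls involved, not Instagram’s actual code; the variable names are invented for the example.

```python
import gc

# Remember the current settings so they can be restored afterwards.
saved_thresholds = gc.get_threshold()

# Setting the first threshold to zero disables automatic collection.
gc.set_threshold(0)
threshold_when_off = gc.get_threshold()[0]

# While the collector is off, reference cycles are reclaimed only by an
# explicit gc.collect() call. Reference counting keeps working as usual.

gc.set_threshold(*saved_thresholds)
restored = gc.get_threshold() == saved_thresholds

print(threshold_when_off, restored)
```

The gc module also provides gc.disable() and gc.enable(), which turn automatic collection off and on without touching the thresholds.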

While the Instagram example is interesting, before following this path in your own projects, make sure that the performance problem really is caused by garbage collection. Instagram is a web-scale application that serves millions of users, so it can be worth their while to change default behavior and use non-standard mechanisms to squeeze out more performance. In most cases, Python’s standard behavior meets business needs.

Last word

Before you try to tune garbage collection in Python, research the problem thoroughly. To this end, use tools like Stackify’s Retrace to evaluate your app’s performance and pinpoint issues. Once you fully understand the problem, you can take the necessary steps to fix it.