Python is a fantastic programming language. It is also known for being pretty slow, mostly because of its enormous flexibility and dynamic features. For many applications and domains this is not a problem, given their performance requirements and the various optimization techniques available. It is less well known that Python object graphs (nested dictionaries of lists, tuples and primitive types) take a significant amount of memory. This can be a much more severe limiting factor, because of its effects on caching, virtual memory and multi-tenancy with other programs, and in general because it exhausts the available memory faster—and memory is a scarce and expensive resource.
It turns out that it is not trivial to figure out how much memory is actually consumed. In this article I’ll walk you through the intricacies of Python objects’ memory management and show how to measure the consumed memory accurately.
In this article I focus solely on CPython—the primary implementation of the Python programming language. The experiments and conclusions here don’t apply to other Python implementations like IronPython, Jython and PyPy.
Also, I ran the numbers on 64-bit Python 2.7. In Python 3 the numbers are sometimes a little different (especially for strings which are always Unicode), but the concepts are the same.
Hands-On Exploration of Python Memory Usage
First, let’s explore a little bit and get a concrete sense of the actual memory usage of Python objects.
The sys.getsizeof() Built-in Function
The standard library’s sys module provides the getsizeof() function. That function accepts an object (and an optional default), calls the object’s __sizeof__() method and returns the result, so you can make your own objects inspectable as well.
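For example, here is a minimal sketch of a class that reports the size of its internal list as part of its own size (the class name and its contents are made up for illustration):

import sys

class PointBag(object):
    """A toy container whose __sizeof__ also counts its internal list."""
    def __init__(self, points):
        self.points = list(points)

    def __sizeof__(self):
        # The object's own header plus the shallow size of the internal list.
        return object.__sizeof__(self) + sys.getsizeof(self.points)

bag = PointBag([(1.5, 2.5), (3.5, 4.5)])
print sys.getsizeof(bag)  # now includes the internal list's shallow size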
Measuring the Memory of Python Objects
Let’s start with some numeric types:
import sys

sys.getsizeof(5)
24
Interesting. An integer takes 24 bytes: on 64-bit CPython that is 8 bytes each for the reference count, the type pointer and the value itself.
sys.getsizeof(5.3)
24
Hmm… a float takes 24 bytes as well.
from decimal import Decimal
sys.getsizeof(Decimal(5.3))
80
Wow. 80 bytes! This really makes you think about whether you want to represent a large number of real numbers as floats or Decimals.
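To put that in perspective, here is a quick back-of-the-envelope calculation (the ten-million figure is just an arbitrary example):

ten_million = 10 * 1000 * 1000
print ten_million * 24   # floats:   240000000 bytes (~240 MB)
print ten_million * 80   # Decimals: 800000000 bytes (~800 MB)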
Let’s move on to strings and collections:
sys.getsizeof('')
37
sys.getsizeof('1')
38
sys.getsizeof('1234')
41
sys.getsizeof(u'')
50
sys.getsizeof(u'1')
52
sys.getsizeof(u'1234')
58
OK. An empty string takes 37 bytes, and each additional character adds another byte. That says a lot about the tradeoffs of keeping many short strings, where you pay the 37-byte overhead for each one, vs. a single long string, where you pay the overhead only once.
Unicode strings behave similarly, except the overhead is 50 bytes and each additional character adds 2 bytes. That’s something to consider if you use libraries that return Unicode strings, but your text can be represented as simple strings.
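For example, if a library hands you Unicode that is really plain ASCII, encoding it down to a byte string reclaims the per-character overhead. On the same build as above, the expected numbers are:

u = u'hello world'
s = u.encode('ascii')   # raises UnicodeEncodeError if it isn't ASCII-safe
sys.getsizeof(u)
72
sys.getsizeof(s)
48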
By the way, in Python 3, strings are always Unicode and the overhead is 49 bytes (they saved a byte somewhere). The bytes object has an overhead of only 33 bytes. If you have a program that processes a lot of short strings in memory and you care about its footprint, consider Python 3.
sys.getsizeof([])
72
sys.getsizeof([1])
80
sys.getsizeof([1, 2, 3, 4])
104
sys.getsizeof(['a long longlong string'])
80
What’s going on? An empty list takes 72 bytes, but each additional int adds just 8 bytes, even though the size of an int is 24 bytes. A list that contains a long string also takes just 80 bytes.
The answer is simple. The list doesn’t contain the int objects themselves. It just contains an 8-byte (on 64-bit versions of CPython) pointer to the actual int object. What that means is that the getsizeof() function doesn’t return the actual memory of the list and all the objects it contains, but only the memory of the list and the pointers to its objects. In the next section I’ll introduce the deep_getsizeof() function that addresses this issue.
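Before moving on, you can see the shallow behavior directly: the list’s reported size stays the same whether it holds a pointer to a tiny string or to a huge one (the one-million-character string here is just for demonstration):

sys.getsizeof(['x'])
80
sys.getsizeof(['x' * 1000000])
80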
sys.getsizeof(())
56
sys.getsizeof((1,))
64
sys.getsizeof((1, 2, 3, 4))
88
sys.getsizeof(('a long longlong string',))
64
The story is similar for tuples. The overhead of an empty tuple is 56 bytes vs. the 72 of a list. Again, this 16-byte difference per sequence is low-hanging fruit if you have a data structure with a lot of small, immutable sequences.
sys.getsizeof(set())
232
sys.getsizeof(set([1]))
232
sys.getsizeof(set([1, 2, 3, 4]))
232
sys.getsizeof({})
280
sys.getsizeof(dict(a=1))
280
sys.getsizeof(dict(a=1, b=2, c=3))
280
Sets and dictionaries ostensibly don’t grow at all when you add items, but note the enormous overhead.
The bottom line is that Python objects have a huge fixed overhead. If your data structure is composed of a large number of collection objects like strings, lists and dictionaries that contain a small number of items each, you pay a heavy toll.
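When the items are primitive and homogeneous, the standard library’s array module sidesteps this toll by storing raw machine values instead of full objects. Here is a hedged sketch (the exact savings depend on the platform; 'l' is a signed long, 8 bytes on 64-bit Linux):

import sys
from array import array

ints = range(1000, 11000)   # 10,000 distinct int objects (a plain list in Python 2)
packed = array('l', ints)   # raw machine words, no per-object headers

# Rough cost of the list version: list + pointers + 24 bytes per int.
print sys.getsizeof(ints) + sum(sys.getsizeof(n) for n in ints)
print sys.getsizeof(packed)   # roughly 8 bytes per item plus a small header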
The deep_getsizeof() Function
Now that I’ve scared you half to death and also demonstrated that sys.getsizeof() can only tell you how much memory a primitive object takes, let’s take a look at a more adequate solution. The deep_getsizeof() function drills down recursively and calculates the actual memory usage of a Python object graph.
from collections import Mapping, Container
from sys import getsizeof


def deep_getsizeof(o, ids):
    """Find the memory footprint of a Python object.

    This is a recursive function that drills down a Python object graph
    like a dictionary holding nested dictionaries with lists of lists
    and tuples and sets.

    The sys.getsizeof function does a shallow size only. It counts each
    object inside a container as a pointer, regardless of how big it
    really is.

    :param o: the object
    :param ids: a set of the ids of objects that were already counted
    :return: the deep size of the object in bytes
    """
    d = deep_getsizeof
    if id(o) in ids:
        return 0

    r = getsizeof(o)
    ids.add(id(o))

    if isinstance(o, str) or isinstance(o, unicode):
        return r

    if isinstance(o, Mapping):
        return r + sum(d(k, ids) + d(v, ids) for k, v in o.iteritems())

    if isinstance(o, Container):
        return r + sum(d(x, ids) for x in o)

    return r
There are several interesting aspects to this function. It takes into account objects that are referenced multiple times and counts them only once, by keeping track of object ids. The other interesting feature of the implementation is that it takes full advantage of the collections module’s abstract base classes. That allows the function to handle, very concisely, any collection that implements either the Mapping or Container base class, instead of dealing directly with myriad collection types like string, Unicode, bytes, list, tuple, dict, frozendict, OrderedDict, set, frozenset, etc.
Let’s see it in action:
x = '1234567'
deep_getsizeof(x, set())
44
A string of length 7 takes 44 bytes (37 bytes of overhead + 1 byte for each of the 7 characters).
deep_getsizeof([], set())
72
An empty list takes 72 bytes (just overhead).
deep_getsizeof([x], set())
124
A list that contains the string x takes 124 bytes (72 + 8 + 44).
deep_getsizeof([x, x, x, x, x], set())
156
A list that contains the string x 5 times takes 156 bytes (72 + 5 * 8 + 44).
The last example shows that deep_getsizeof() counts references to the same object (the x string) just once, but each reference’s pointer is counted.
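For contrast, if the five strings were distinct, each one would contribute its full 44 bytes; on the same build as above, the expected result is 72 + 5 * 8 + 5 * 44 = 332 bytes:

deep_getsizeof(['1234567', '2345678', '3456789', '4567890', '5678901'], set())
332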
Treats or Tricks
It turns out that CPython has several tricks up its sleeve, so the numbers you get from deep_getsizeof() don’t fully represent the memory usage of a Python program.
Reference Counting
Python manages memory using reference counting semantics. Once an object is not referenced anymore, its memory is deallocated. But as long as there is a reference, the object will not be deallocated. Things like cyclical references can bite you pretty hard.
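Here is a minimal sketch of the problem (the Node class is made up for illustration): two objects that point at each other keep each other’s reference count above zero, so only the cyclic garbage collector can reclaim them.

import gc

class Node(object):
    def __init__(self):
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a   # a cycle: each object keeps the other alive
del a
del b

# Reference counting alone cannot free the pair; the cycle detector can.
print gc.collect()        # the number of unreachable objects it found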
Small Objects
CPython manages small objects (less than 256 bytes) in special pools on 8-byte boundaries. There are pools for 1-8 bytes, 9-16 bytes, and all the way to 249-256 bytes. When an object of size 10 is allocated, it is allocated from the 16-byte pool for objects 9-16 bytes in size. So, even though it contains only 10 bytes of data, it will cost 16 bytes of memory. If you allocate 1,000,000 objects of size 10, you actually use 16,000,000 bytes and not 10,000,000 bytes as you may assume. This 60% overhead is obviously not trivial.
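The rounding the pools impose is easy to sketch as arithmetic: every request is rounded up to the next multiple of 8.

def pool_cost(requested):
    # Round up to the next multiple of 8, like the small-object pools do.
    return (requested + 7) // 8 * 8

print pool_cost(10)            # 16
print 1000000 * pool_cost(10)  # 16000000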
Integers
CPython keeps a global list of all the integers in the range [-5, 256]. This optimization strategy makes sense because small integers pop up all over the place, and given that each integer takes 24 bytes, it saves a lot of memory for a typical program.
It also means that CPython pre-allocates 262 * 24 = 6288 bytes for all these integers, even if you don’t use most of them. You can verify it by using the id() function, which gives the pointer to the actual object. If you call id(x) multiple times for any x in the range [-5, 256], you will get the same result every time (for the same integer). But if you try it for integers outside this range, each one will be different (a new object is created on the fly every time).
Here are a few examples within the range:
id(-3)
140251817361752
id(-3)
140251817361752
id(-3)
140251817361752
id(201)
140251817366736
id(201)
140251817366736
id(201)
140251817366736
Here are some examples outside the range:
id(301)
140251846945800
id(301)
140251846945776
id(-6)
140251846946960
id(-6)
140251846946936
Python Memory vs. System Memory
CPython is kind of possessive. In many cases, when objects in your program are not referenced anymore, their memory is not returned to the system (e.g. the small objects). This is good for your program if you allocate and deallocate many objects that belong to the same 8-byte pool, because Python doesn’t have to bother the system, which is relatively expensive. But it’s not so great if your program normally uses X bytes and under some temporary condition it uses 100 times as much (e.g. parsing and processing a big configuration file only when it starts).
Now, that 100X memory may be trapped uselessly in your program, never to be used again, denying the system the chance to allocate it to other programs. The irony is that if you use the multiprocessing module to run multiple instances of your program, you’ll severely limit the number of instances you can run on a given machine. One common workaround, sketched below, is to do the memory-hungry work in a short-lived child process.
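Here is a hedged sketch of that idea (parse_big_config and its return value are made-up stand-ins for the expensive step): the child process builds the huge temporary object graph, returns only the small distilled result, and gives all of its memory back to the OS when it exits.

from multiprocessing import Pool

def parse_big_config(path):
    # Imagine this builds an enormous temporary object graph
    # and boils it down to a small dictionary of settings.
    return {'threads': 4, 'log_level': 'INFO'}

if __name__ == '__main__':
    pool = Pool(1)
    config = pool.apply(parse_big_config, ('app.conf',))
    pool.close()
    pool.join()
    print config  # only the small result survives in the parent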
Memory Profiler
To gauge and measure the actual memory usage of your program, you can use the memory_profiler module. I played with it a little bit and I’m not sure I trust the results. Using it is very simple. You decorate a function (it could be the main() function) with the @profile decorator, and when the program exits, the memory profiler prints to standard output a handy report that shows the total and the changes in memory for every line. Here is a sample program I ran under the profiler:
from memory_profiler import profile

@profile
def main():
    a = []
    b = []
    c = []
    for i in range(100000):
        a.append(5)
    for i in range(100000):
        b.append(300)
    for i in range(100000):
        c.append('123456789012345678901234567890')
    del a
    del b
    del c

    print 'Done!'

if __name__ == '__main__':
    main()
Here is the output:
Line #    Mem usage    Increment   Line Contents
================================================
     3     22.9 MiB      0.0 MiB   @profile
     4                             def main():
     5     22.9 MiB      0.0 MiB       a = []
     6     22.9 MiB      0.0 MiB       b = []
     7     22.9 MiB      0.0 MiB       c = []
     8     27.1 MiB      4.2 MiB       for i in range(100000):
     9     27.1 MiB      0.0 MiB           a.append(5)
    10     27.5 MiB      0.4 MiB       for i in range(100000):
    11     27.5 MiB      0.0 MiB           b.append(300)
    12     28.3 MiB      0.8 MiB       for i in range(100000):
    13     28.3 MiB      0.0 MiB           c.append('123456789012345678901234567890')
    14     27.7 MiB     -0.6 MiB       del a
    15     27.9 MiB      0.2 MiB       del b
    16     27.3 MiB     -0.6 MiB       del c
    17
    18     27.3 MiB      0.0 MiB       print 'Done!'
As you can see, there is 22.9 MiB of memory overhead. The reason the memory doesn’t increase when appending integers both inside and outside the [-5, 256] range, and also when appending the string, is that a single object is used in all cases. It’s not clear why the first loop of range(100000) on line 8 adds 4.2 MiB while the second on line 10 adds just 0.4 MiB and the third loop on line 12 adds 0.8 MiB. Finally, when deleting the a, b and c lists, -0.6 MiB is released for a and c, but for b 0.2 MiB is added. I can’t make much sense of these results.
Conclusion
CPython uses a lot of memory for its objects. It uses various tricks and optimizations for memory management. By keeping track of your object’s memory usage and being aware of the memory management model, you can significantly reduce the memory footprint of your program.
by Gigi Sayfan via Envato Tuts+ Code