Python's Data Containers Comparison

Posted on Thu 24 August 2023 in python

Data classes were introduced in Python 3.7. They are simple classes that hold data as attributes, support type hints cleanly, and generate a default constructor for you. Good news: they can also be immutable (frozen). I know the dataclasses lib provides many more features, but what I described above covers most of my use cases and is good enough. Now I want to compare them with the older ways of holding data and accessing it by attribute: named tuples (or, even better, NamedTuple from typing) and, of course, a plain Python class with an __init__.

Note: I use CPython 3.10.11

What I need

This is what I want in terms of interface:

>>> yaser = Person(name="Yaser", age=26)
>>> print(yaser.age)
26

Also I need it to be compatible with mypy.

Class Implementations

I'll go for these:

  • Simple Python classes with __init__
  • Simple Python classes in combination with slots
  • Named tuples (from typing lib)
  • Data classes (from dataclasses lib)
  • Frozen data classes (We like immutability after all!)

from typing import NamedTuple
from dataclasses import dataclass


class NamedTuplePerson(NamedTuple):
    name: str
    age: int


@dataclass
class DataClassPerson:
    name: str
    age: int


@dataclass(frozen=True)
class FrozenDataClassPerson:
    name: str
    age: int


class RawClassPerson:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age


class SlotClassPerson:
    __slots__ = ("name", "age")

    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

All of these classes fit my use case, so let's see how they perform!
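As a quick sanity check before benchmarking, here is a minimal sketch showing that the containers expose the interface I asked for, and which of them reject attribute assignment. The helper is_immutable is a hypothetical name introduced only for this illustration; only three of the five classes are repeated to keep it short.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import NamedTuple


class NamedTuplePerson(NamedTuple):
    name: str
    age: int


@dataclass
class DataClassPerson:
    name: str
    age: int


@dataclass(frozen=True)
class FrozenDataClassPerson:
    name: str
    age: int


def is_immutable(person) -> bool:
    """Hypothetical helper: True if assigning to an existing attribute fails."""
    try:
        person.age = 27
    except (AttributeError, FrozenInstanceError):
        # Named tuples raise AttributeError; frozen data classes raise
        # FrozenInstanceError (itself a subclass of AttributeError).
        return True
    return False


for cls in (NamedTuplePerson, DataClassPerson, FrozenDataClassPerson):
    yaser = cls(name="Yaser", age=26)
    print(cls.__name__, yaser.age, is_immutable(yaser))
```

Only the plain data class lets the assignment through; the named tuple and the frozen data class both refuse it.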

Construction

Let's see how long it takes to instantiate these classes. This is the function that I'll run with timeit:

name, age = "a", 1

def test_construction(cls, size=1000):
    collection = [None] * size
    for i in range(size):
        collection[i] = cls(name=name, age=age)
    return collection

Yes, this measures more than construction alone, but I also want to capture the cost of keeping many instances in memory at once. Here's the result (lower is better):

NamedTuplePerson      => 5.113
DataClassPerson       => 5.347
FrozenDataClassPerson => 7.49
RawClassPerson        => 5.319
SlotClassPerson       => 4.514

Memory usage

Now I'll create a list of 100,000 instances and measure its size with pympler.asizeof:

from pympler.asizeof import asizeof


def test_size(cls):
    # used test_construction from the previous test
    return int(asizeof(test_construction(cls, size=100000)) / 1000)

The result:

NamedTuplePerson      => 6400 KB
DataClassPerson       => 16000 KB
FrozenDataClassPerson => 16000 KB
RawClassPerson        => 16000 KB
SlotClassPerson       => 5600 KB
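Dividing by 100,000, that is roughly 160 bytes per regular instance versus about 56 for the slotted one. A minimal sketch of where the gap probably comes from: a regular instance carries its own __dict__, while a slotted one does not. Note that sys.getsizeof is shallow, so this is only an approximation of what pympler reports, and the exact byte counts depend on the interpreter version.

```python
import sys


class RawClassPerson:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age


class SlotClassPerson:
    __slots__ = ("name", "age")

    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age


raw = RawClassPerson("a", 1)
slotted = SlotClassPerson("a", 1)

print(hasattr(raw, "__dict__"))      # True: attributes live in a per-instance dict
print(hasattr(slotted, "__dict__"))  # False: attributes live in fixed slots

# Shallow sizes only: the object itself plus, for the raw class, its dict.
raw_total = sys.getsizeof(raw) + sys.getsizeof(raw.__dict__)
slot_total = sys.getsizeof(slotted)
print(raw_total, slot_total)
```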

Conclusion

Well, classes that use slots are the most efficient in both instantiation time and RAM usage, but we know they are full of gotchas, and personally I don't like writing my attributes as strings in __slots__.
The second-best option is a named tuple, which is close to slots and good enough!
The other three use the same amount of memory (about 2.5 times as much as a named tuple) and are slightly slower to instantiate, except for frozen data classes, which are considerably slower!

Good luck.