Optimizations and benchmark¶
apischema is (a lot) faster than its known alternatives, thanks to advanced optimizations.
Note
The benchmark chart is truncated at a relative performance of 20x slower; full results are detailed in the results table below.
Precomputed (de)serialization methods¶
apischema precomputes (de)serialization methods depending on the (de)serialized type (and other parameters); type annotation processing is done during this precomputation. Methods are then cached using functools.lru_cache, so deserialize and serialize don't recompute them every time.
Note
The cache is automatically reset when global settings are modified, because they impact the generated methods.
However, while lru_cache is fast, using the methods directly is faster still, so apischema provides apischema.deserialization_method and apischema.serialization_method. These functions take the same parameters as deserialize/serialize, except for the data/object parameter to (de)serialize. Using the computed methods directly can increase performance by about 10%.
from dataclasses import dataclass

from apischema import deserialization_method, serialization_method


@dataclass
class Foo:
    bar: int


deserialize_foo = deserialization_method(Foo)
serialize_foo = serialization_method(Foo)
assert deserialize_foo({"bar": 0}) == Foo(0)
assert serialize_foo(Foo(0)) == {"bar": 0}
Warning
Methods computed before a settings modification will not be updated and will keep using the old settings. Be sure to configure your settings first.
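For example, a minimal sketch of this pitfall, reusing the no_copy setting described in the next section (the behavior shown follows from the note and warning above):

from apischema import deserialization_method, settings

ints = list(range(100))
# Method computed with the default settings (no_copy=True): the input list is reused
fast_ints = deserialization_method(list[int])
assert fast_ints(ints) is ints
# Modifying a global setting resets the cache, but already computed methods keep the old settings
settings.deserialization.no_copy = False
copying_ints = deserialization_method(list[int])
assert copying_ints(ints) is not ints
assert fast_ints(ints) is ints  # still uses no_copy=True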
Avoid unnecessary copies¶
As an example, when a list of integers is deserialized, json.load already returns a list of integers. The loaded data can thus be "reused", and deserialization becomes just a validation step. The same principle applies to serialization.
This is controlled by the settings apischema.settings.deserialization.no_copy/apischema.settings.serialization.no_copy, or by the no_copy parameter of the deserialize/serialize methods. The default behavior is to avoid these unnecessary copies, i.e. no_copy=True.
from timeit import timeit
from apischema import deserialize
ints = list(range(100))
assert deserialize(list[int], ints, no_copy=True) is ints # default
assert deserialize(list[int], ints, no_copy=False) is not ints
print(timeit("deserialize(list[int], ints, no_copy=True)", globals=globals()))
# 8.596703557006549
print(timeit("deserialize(list[int], ints, no_copy=False)", globals=globals()))
# 9.365363762015477
Serialization passthrough¶
JSON serialization libraries expect primitive data types (dict/list/str/etc.). A non-negligible part of the objects to be serialized are already primitive.
When type checking is disabled (the default), objects annotated with primitive types don't need to be transformed or checked; apischema can simply "pass through" them, resulting in an identity serialization method that just returns its argument.
Container types like list or dict are passed through only when the contained types are passed through too (and when no_copy=True).
Note
Enum subclasses which also inherit str/int are passed through as well.
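A minimal sketch of this behavior, assuming the defaults described above (check_type=False and no_copy=True):

from apischema import serialization_method

serialize_ints = serialization_method(list[int])
ints = list(range(3))
# int is primitive and the list content is passed through, so the list itself is returned
assert serialize_ints(ints) is ints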
Passthrough options¶
Some JSON serialization libraries natively support types like UUID or datetime, sometimes with a faster implementation than the apischema one; orjson, written in Rust, is a good example.
To take advantage of that, apischema provides the apischema.PassThroughOptions class to specify which types should be passed through, whether they are natively supported by JSON libraries or handled in a default fallback.
apischema.serialization_default can be used as the default fallback in combination with PassThroughOptions. It has to be instantiated with the same keyword parameters (aliaser, etc.) as serialization_method.
from collections.abc import Collection
from uuid import UUID, uuid4

from apischema import PassThroughOptions, serialization_method

uuids_method = serialization_method(
    Collection[UUID], pass_through=PassThroughOptions(collections=True, types={UUID})
)
uuids = [uuid4() for _ in range(5)]
assert uuids_method(uuids) is uuids
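As a hedged sketch of the fallback wiring (assuming orjson is installed; its native UUID support and the dumps(..., default=...) parameter are the only orjson features relied on):

from uuid import UUID, uuid4

import orjson

from apischema import PassThroughOptions, serialization_default, serialization_method

# UUIDs are passed through so orjson can serialize them natively; any other
# non-primitive object reaching orjson is delegated to the serialization_default fallback
method = serialization_method(list[UUID], pass_through=PassThroughOptions(types={UUID}))
default = serialization_default()
data = orjson.dumps(method([uuid4(), uuid4()]), default=default)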
Important
Passthrough optimization is greatly diminished when check_type=True.
PassThroughOptions has the following parameters:
any — pass through Any¶
collections — pass through collections¶
Standard collections list, tuple and dict are natively handled by JSON libraries, but set, for example, isn't. Moreover, standard abstract collections like Collection or Mapping, which are used a lot, are not guaranteed to have their runtime type supported (for instance a set annotated with Collection).
But most of the time, collection runtime types are list/dict, so the others can be handled by the default fallback.
Note
Set-like types will not be passed through.
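A minimal sketch of these semantics, assuming the behavior described above:

from collections.abc import Collection

from apischema import PassThroughOptions, serialization_method

method = serialization_method(
    Collection[int], pass_through=PassThroughOptions(collections=True)
)
ints = [0, 1, 2]
# list runtime type: natively supported by JSON libraries, so the object is reused
assert method(ints) is ints
# set-like runtime type: not passed through, serialized as a list instead
assert sorted(method({0, 1, 2})) == [0, 1, 2]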
dataclasses — pass through dataclasses¶
Some JSON libraries, like orjson, support dataclasses natively. However, because apischema has a lot of specific features (aliasing, flattened fields, conditional skipping, field ordering, etc.), only dataclasses that use none of these features, and whose fields are all passed through themselves, will be passed through too.
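A minimal sketch, assuming a dataclass that uses none of the features listed above:

from dataclasses import dataclass

from apischema import PassThroughOptions, serialization_method


@dataclass
class Point:
    x: int
    y: int


method = serialization_method(Point, pass_through=PassThroughOptions(dataclasses=True))
point = Point(0, 1)
# No aliasing, flattening, skipping, etc., and both fields are passed-through primitives,
# so the instance itself can be handed to a library like orjson
assert method(point) is point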
enums — pass through enum.Enum subclasses¶
tuple — pass through tuple¶
Even if tuple is often supported by JSON serializers, tuples are serialized as lists when this option is not enabled; this also makes test writing easier, for example (see the sketch after the note below).
Note
collections=True implies tuple=True.
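A minimal sketch of the difference, assuming the defaults described above:

from apischema import PassThroughOptions, serialization_method

as_list = serialization_method(tuple[int, int])
as_tuple = serialization_method(tuple[int, int], pass_through=PassThroughOptions(tuple=True))
pair = (0, 1)
assert as_list(pair) == [0, 1]  # serialized as a list when the option is not enabled
assert as_tuple(pair) is pair  # passed through when tuple=True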
types — pass through arbitrary types¶
Either a collection of types, or a predicate to determine if a type has to be passed through.
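For instance, a minimal sketch of both forms (assuming the predicate simply receives a type; UUID and datetime are only illustrative choices):

from datetime import datetime
from uuid import UUID

from apischema import PassThroughOptions

# An explicit collection of types...
explicit = PassThroughOptions(types={UUID, datetime})
# ...or a predicate called with a type and returning whether to pass it through
predicate = PassThroughOptions(types=lambda tp: tp in (UUID, datetime))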
Binary compilation using Cython¶
apischema uses Cython to compile the critical parts of the code, i.e. the (de)serialization methods.
However, apischema remains a pure Python library; it can work without the binary modules. The Cython source files (.pyx) are in fact generated from Python modules. This notably allows keeping the code simple, adding switch-case optimization to replace dynamic dispatch and avoiding big chains of elif in Python code.
Note
Compilation is disabled when using PyPy, because it's even faster with the bare Python code. That's another benefit of generating .pyx files: the Python sources are kept for PyPy.
Override dataclass constructors¶
Warning
This feature is still experimental and disabled by default. Test its impact on your code carefully before enabling it in production.
Dataclass constructor calls are the slowest part of deserialization, about 50% of its runtime! They are indeed pure Python functions and cannot be compiled.
In the case of a "normal" dataclass (no __slots__, __post_init__, or __init__/__new__/__setattr__ overriding), apischema can override the constructor with compilable code.
This feature can be toggled on/off globally using apischema.settings.deserialization.override_dataclass_constructors, as shown below.
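For example, enabling it globally (do this before computing any methods, as warned above):

from apischema import settings

# Experimental: override compatible dataclass constructors with compiled code
settings.deserialization.override_dataclass_constructors = True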
Discriminator¶
The OpenAPI discriminator makes union deserialization time more homogeneous: instead of trying each alternative in turn, the right one is selected directly from the discriminator field.
from dataclasses import dataclass
from timeit import timeit
from typing import Annotated, Union

from apischema import deserialization_method, discriminator


@dataclass
class Cat:
    love_dog: bool = False


@dataclass
class Dog:
    love_cat: bool = False


Pet = Union[Cat, Dog]
DiscriminatedPet = Annotated[Pet, discriminator("type")]

deserialize_union = deserialization_method(Pet)
deserialize_discriminated = deserialization_method(DiscriminatedPet)
##### Without discriminator
print(timeit('deserialize_union({"love_dog": False})', globals=globals()))
# Cat: 0.760085788
print(timeit('deserialize_union({"love_cat": False})', globals=globals()))
# Dog: 3.078876515 ≈ x4
##### With discriminator
print(timeit('deserialize_discriminated({"type": "Cat"})', globals=globals()))
# Cat: 1.244204702
print(timeit('deserialize_discriminated({"type": "Dog"})', globals=globals()))
# Dog: 1.234058598 ≈ same
Note
As the example shows, the discriminator brings its own additional cost, but it's completely worth it.
Benchmark¶
The benchmark code is located in the benchmark directory of the apischema repository. Performance is measured on two datasets: a simple one and a more complex one.
The benchmark is run by a GitHub Actions workflow on ubuntu-latest with Python 3.10.
Results are given relative to the fastest library, i.e. apischema; simple and complex results are detailed in the table, and the displayed result is the mean of both.
Relative execution time (lower is better)¶
| library | version | deserialization | serialization |
|---|---|---|---|
| apischema | 0.18.1 | / | / |
| pyserde | 0.9.6 | x2.3 (3.2/1.3) | x2.7 (3.5/1.9) |
| cattrs | 22.2.0 | x2.4 (3.1/1.6) | x2.7 (3.9/1.4) |
| mashumaro | 3.2 | x2.6 (3.2/1.9) | x1.2 (1.3/1.2) |
| pydantic | 1.10.2 | x10.5 (11.2/9.9) | x28.6 (36.6/20.6) |
| marshmallow | 3.19.0 | x20.0 (21.5/18.5) | x16.2 (20.2/12.3) |
| typedload | 2.20 | x9.9 (12.9/7.0) | x65.7 (63.1/68.3) |
Note
The benchmark uses the binary optimization, but even when running as a pure Python library, apischema still performs better than almost all of its competitors.