Skip to content

Performance and benchmark

apischema is faster than its known alternatives, thanks to advanced optimizations.

Precomputed (de)serialization methods

apischema precomputes (de)serialization methods depending on the (de)serialized type (and other parameters); type annotations processing is done in the precomputation. Methods are then cached using functools.lru_cache, so deserialize and serialize don't recompute them every time.

Note

The cache is automatically reset when global settings are modified, because it impacts the generated methods.

However, if lru_cache is fast, using the methods directly is faster, so apischema provides apischema.deserialization_method and apischema.serialization_method. These functions share the same parameters than deserialize/serialize, except the data/object parameter to (de)serialize. Using the computed methods directly can increase performances by 10%.

from dataclasses import dataclass

from apischema import deserialization_method, serialization_method


@dataclass
class Foo:
    bar: int


deserialize_foo = deserialization_method(Foo)
serialize_foo = serialization_method(Foo)

assert deserialize_foo({"bar": 0}) == Foo(0)
assert serialize_foo(Foo(0)) == {"bar": 0}

Warning

Methods computed before settings modification will not be updated and use the old settings. Be careful to set your settings first.

Serialization passthrough

Warning

This feature has been released on a provisional basis. It has also been partially rolled back (it was initially covering dataclasses) to simplify the code for the next version; in fact the feature will then maybe not be as much useful, as apischema performance will normally be improved significantly enough to offer its own JSON dump implementation.

JSON serialization libraries expect primitive data types (dict/list/str/etc.). A non-negligible part of objects to be serialized are primitive.

When type checking is disabled (this is default), objects annotated with primitive types doesn't need to be transformed or checked; apischema can simply "pass through" them, and it will result into an identity serialization method.

Container types like list or dict are passed through only when the contained types are passed through too.

from apischema import identity, serialization_method

assert serialization_method(list[int]) == identity

Note

Enum subclasses which also inherit str/int are also passed through

Passthrough options

Some JSON serialization libraries natively support types like UUID or datetime, sometimes with a faster implementation than the apischema one — orjson, written in Rust, is a good example.

To take advantage of that, apischema provides apischema.PassThroughOptions class to specify which type should be passed through, whether they are supported natively by JSON libraries (or handled in a default fallback).

apischema.serialization_default can be used as default fallback in combination to PassThroughOptions. It has to be instantiated with the same kwargs parameters (aliaser, etc.) than serialization_method.

from collections.abc import Collection
from uuid import UUID

from apischema import PassThroughOptions, serialization_method
from apischema.conversions import identity

uuids_method = serialization_method(
    Collection[UUID], pass_through=PassThroughOptions(collections=True, types={UUID})
)
assert uuids_method == identity

Important

Passthrough optimization always requires check_type to be False, in parameters or settings.

PassThroughOptions has the following parameters:

any — pass through Any

collections — pass through collections

Standard collections list, tuple and dict are natively handled by JSON libraries, but set, for example, isn't. Moreover, standard abstract collections like Collection or Mapping, which are used a lot, are not guaranteed to have their runtime type supported (having a set annotated with Collection for instance).

But, most of the time, collections runtime types are list/dict, so others can be handled in default fallback.

Note

Set-like type will not be passed through.

enums — pass through enums

tuple — pass through tuple

Even if tuple is often supported by JSON serializers, if this options is not enabled, tuples will be serialized as lists. It also allows easier test writing for example.

Note

collections=True implies tuple=True;

types — pass through arbitrary types

Either a collection of types, or a predicate to determine if type has to be passed through.

Passing through is not always faster

apischema is quite optimized and can perform better than using default fallback, as shown in the following example:

from json import dumps
from timeit import timeit
from uuid import UUID, uuid4

from apischema import PassThroughOptions, serialization_default, serialization_method

uuids = [uuid4() for i in range(10)]
serialize_uuids = serialization_method(list[UUID])
serialize_uuids2 = serialization_method(
    list[UUID], pass_through=PassThroughOptions(types={UUID})
)
default = serialization_default()
assert (
    dumps(serialize_uuids(uuids))
    == dumps(serialize_uuids2(uuids), default=default)
    == dumps(uuids, default=str)  # equivalent to previous one, but faster
)
print(timeit("dumps(serialize_uuids(uuids))", globals=globals()))
# 18.171754636 -> without passthrough
print(timeit("dumps(uuids, default=str)", globals=globals()))
# 21.188269333 -> with passthrough and faster default
print(timeit("dumps(serialize_uuids2(uuids), default=default)", globals=globals()))
# 24.494076885 -> with passthrough and default
That's why passthrough optimization should be used wisely.

Benchmark

Note

Benchmark presented is just Pydantic benchmark where apischema has been "inserted".

Below are the results of crude benchmark comparing apischema to pydantic and other validation libraries.

Package Version Relative Performance Mean deserialization time
apischema 0.16.0 40.2μs
pydantic 1.8.2 1.9x slower 77.8μs
attrs + cattrs 21.2.0 2.0x slower 79.7μs
valideer 0.4.2 2.6x slower 105.0μs
marshmallow 3.14.0 5.2x slower 210.5μs
voluptuous 0.12.2 5.9x slower 238.4μs
trafaret 2.1.0 6.3x slower 253.0μs
schematics 2.1.1 19.3x slower 775.4μs
django-rest-framework 3.12.4 23.4x slower 939.7μs
cerberus 1.3.4 42.8x slower 1717.6μs
Package Version Relative Performance Mean serialization time
apischema 0.15.3 18.0μs
pydantic 1.8.2 2.9x slower 51.3μs

Benchmarks were run with Python 3.9 (CPython) and the package versions listed above installed via pypi on macOs 11.2

Note

A few precisions have to be written about these results:

  • pydantic benchmark is biased by the implementation of datetime parsing for cattrs (see this post about it); in fact, if cattrs use a decently fast implementation, like the standard datetime.fromisoformat, cattrs becomes 3 times faster than pydantic, even faster than apischema. That being said, apischema is still claimed to be the fastest validation library of this benchmark because cattrs is not considered as a true validation library, essentially because of its fail-fast behavior. It's nevertheless a good (and fast) library, and its great performance has push apischema into optimizing its own performance a lot.
  • pydantic benchmark mixes valid with invalid data (around 50/50), which doesn't correspond to real case. It means that error handling is very (too much?) important in this benchmark, and libraries like cattrs which raise and end simply at the first error encountered have a big advantage. Using only valid data, apischema becomes even faster than cattrs.

FAQ

Why not ask directly for integration to pydantic benchmark?

Done, but rejected because "apischema doesn't have enough usage". Let's change that!