Performance and benchmark¶
apischema is faster than its known alternatives, thanks to advanced optimizations.
Precomputed (de)serialization methods¶
apischema precomputes (de)serialization methods depending on the (de)serialized type (and other parameters); type annotations processing is done in the precomputation. Methods are then cached using functools.lru_cache
, so deserialize
and serialize
don't recompute them every time.
Note
The cache is automatically reset when global settings are modified, because it impacts the generated methods.
However, if lru_cache
is fast, using the methods directly is faster, so apischema provides apischema.deserialization_method
and apischema.serialization_method
. These functions share the same parameters than deserialize
/serialize
, except the data/object parameter to (de)serialize. Using the computed methods directly can increase performances by 10%.
from dataclasses import dataclass
from apischema import deserialization_method, serialization_method
@dataclass
class Foo:
bar: int
deserialize_foo = deserialization_method(Foo)
serialize_foo = serialization_method(Foo)
assert deserialize_foo({"bar": 0}) == Foo(0)
assert serialize_foo(Foo(0)) == {"bar": 0}
Warning
Methods computed before settings modification will not be updated and use the old settings. Be careful to set your settings first.
Serialization passthrough¶
Warning
This feature has been released on a provisional basis. It has also been partially rolled back (it was initially covering dataclasses) to simplify the code for the next version; in fact the feature will then maybe not be as much useful, as apischema performance will normally be improved significantly enough to offer its own JSON dump implementation.
JSON serialization libraries expect primitive data types (dict
/list
/str
/etc.). A non-negligible part of objects to be serialized are primitive.
When type checking is disabled (this is default), objects annotated with primitive types doesn't need to be transformed or checked; apischema can simply "pass through" them, and it will result into an identity serialization method.
Container types like list
or dict
are passed through only when the contained types are passed through too.
from apischema import identity, serialization_method
assert serialization_method(list[int]) == identity
Note
Enum
subclasses which also inherit str
/int
are also passed through
Passthrough options¶
Some JSON serialization libraries natively support types like UUID
or datetime
, sometimes with a faster implementation than the apischema one — orjson, written in Rust, is a good example.
To take advantage of that, apischema provides apischema.PassThroughOptions
class to specify which type should be passed through, whether they are supported natively by JSON libraries (or handled in a default
fallback).
apischema.serialization_default
can be used as default
fallback in combination to PassThroughOptions
. It has to be instantiated with the same kwargs parameters (aliaser
, etc.) than serialization_method
.
from collections.abc import Collection
from uuid import UUID
from apischema import PassThroughOptions, serialization_method
from apischema.conversions import identity
uuids_method = serialization_method(
Collection[UUID], pass_through=PassThroughOptions(collections=True, types={UUID})
)
assert uuids_method == identity
Important
Passthrough optimization always requires check_type
to be False
, in parameters or settings.
PassThroughOptions
has the following parameters:
any
— pass through Any
¶
collections
— pass through collections¶
Standard collections list
, tuple
and dict
are natively handled by JSON libraries, but set
, for example, isn't. Moreover, standard abstract collections like Collection
or Mapping
, which are used a lot, are
not guaranteed to have their runtime type supported (having a set
annotated with
Collection
for instance).
But, most of the time, collections runtime types are list
/dict
, so others can be handled in default
fallback.
Note
Set-like type will not be passed through.
enums
— pass through enums¶
tuple
— pass through tuple
¶
Even if tuple
is often supported by JSON serializers, if this options is not enabled, tuples will be serialized as lists. It also allows easier test writing for example.
Note
collections=True
implies tuple=True
;
types
— pass through arbitrary types¶
Either a collection of types, or a predicate to determine if type has to be passed through.
Passing through is not always faster¶
apischema is quite optimized and can perform better than using default
fallback, as shown in the following example:
from json import dumps
from timeit import timeit
from uuid import UUID, uuid4
from apischema import PassThroughOptions, serialization_default, serialization_method
uuids = [uuid4() for i in range(10)]
serialize_uuids = serialization_method(list[UUID])
serialize_uuids2 = serialization_method(
list[UUID], pass_through=PassThroughOptions(types={UUID})
)
default = serialization_default()
assert (
dumps(serialize_uuids(uuids))
== dumps(serialize_uuids2(uuids), default=default)
== dumps(uuids, default=str) # equivalent to previous one, but faster
)
print(timeit("dumps(serialize_uuids(uuids))", globals=globals()))
# 18.171754636 -> without passthrough
print(timeit("dumps(uuids, default=str)", globals=globals()))
# 21.188269333 -> with passthrough and faster default
print(timeit("dumps(serialize_uuids2(uuids), default=default)", globals=globals()))
# 24.494076885 -> with passthrough and default
Benchmark¶
Note
Benchmark presented is just Pydantic benchmark where apischema has been "inserted".
Below are the results of crude benchmark comparing apischema to pydantic and other validation libraries.
Package | Version | Relative Performance | Mean deserialization time |
---|---|---|---|
apischema | 0.16.0 |
40.2μs | |
pydantic | 1.8.2 |
1.9x slower | 77.8μs |
attrs + cattrs | 21.2.0 |
2.0x slower | 79.7μs |
valideer | 0.4.2 |
2.6x slower | 105.0μs |
marshmallow | 3.14.0 |
5.2x slower | 210.5μs |
voluptuous | 0.12.2 |
5.9x slower | 238.4μs |
trafaret | 2.1.0 |
6.3x slower | 253.0μs |
schematics | 2.1.1 |
19.3x slower | 775.4μs |
django-rest-framework | 3.12.4 |
23.4x slower | 939.7μs |
cerberus | 1.3.4 |
42.8x slower | 1717.6μs |
Package | Version | Relative Performance | Mean serialization time |
---|---|---|---|
apischema | 0.15.3 |
18.0μs | |
pydantic | 1.8.2 |
2.9x slower | 51.3μs |
Benchmarks were run with Python 3.9 (CPython) and the package versions listed above installed via pypi on macOs 11.2
Note
A few precisions have to be written about these results:
- pydantic benchmark is biased by the implementation of
datetime
parsing for cattrs (see this post about it); in fact, if cattrs use a decently fast implementation, like the standarddatetime.fromisoformat
, cattrs becomes 3 times faster than pydantic, even faster than apischema. That being said, apischema is still claimed to be the fastest validation library of this benchmark because cattrs is not considered as a true validation library, essentially because of its fail-fast behavior. It's nevertheless a good (and fast) library, and its great performance has push apischema into optimizing its own performance a lot. - pydantic benchmark mixes valid with invalid data (around 50/50), which doesn't correspond to real case. It means that error handling is very (too much?) important in this benchmark, and libraries like cattrs which raise and end simply at the first error encountered have a big advantage. Using only valid data, apischema becomes even faster than cattrs.
FAQ¶
Why not ask directly for integration to pydantic benchmark?¶
Done, but rejected because "apischema doesn't have enough usage". Let's change that!