(De)serialization¶
apischema aims to help with deserialization/serialization of API data, mostly JSON.
Let's start again with the overview example:
from collections.abc import Collection
from dataclasses import dataclass, field
from uuid import UUID, uuid4

import pytest
from graphql import print_schema

from apischema import ValidationError, deserialize, serialize
from apischema.graphql import graphql_schema
from apischema.json_schema import deserialization_schema


# Define a schema with standard dataclasses
@dataclass
class Resource:
    id: UUID
    name: str
    tags: set[str] = field(default_factory=set)


# Get some data
uuid = uuid4()
data = {"id": str(uuid), "name": "wyfo", "tags": ["some_tag"]}
# Deserialize data
resource = deserialize(Resource, data)
assert resource == Resource(uuid, "wyfo", {"some_tag"})
# Serialize objects
assert serialize(Resource, resource) == data
# Validate during deserialization
with pytest.raises(ValidationError) as err:  # pytest checks exception is raised
    deserialize(Resource, {"id": "42", "name": "wyfo"})
assert err.value.errors == [
    {"loc": ["id"], "err": "badly formed hexadecimal UUID string"}
]
# Generate JSON Schema
assert deserialization_schema(Resource) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {
        "id": {"type": "string", "format": "uuid"},
        "name": {"type": "string"},
        "tags": {
            "type": "array",
            "items": {"type": "string"},
            "uniqueItems": True,
            "default": [],
        },
    },
    "required": ["id", "name"],
    "additionalProperties": False,
}


# Define GraphQL operations
def resources(tags: Collection[str] | None = None) -> Collection[Resource] | None:
    ...


# Generate GraphQL schema
schema = graphql_schema(query=[resources], id_types={UUID})
schema_str = """\
type Query {
  resources(tags: [String!]): [Resource!]
}

type Resource {
  id: ID!
  name: String!
  tags: [String!]!
}"""
assert print_schema(schema) == schema_str
Deserialization¶
apischema.deserialize deserializes Python types from JSON-like data: dict/list/str/int/float/bool/None — in short, what you get when you execute json.loads. Types can be dataclasses as well as list[int], NewTypes, or whatever you want (see conversions to extend deserialization support to every type you want).
from collections.abc import Collection, Mapping
from dataclasses import dataclass
from typing import NewType

from apischema import deserialize


@dataclass
class Foo:
    bar: str


MyInt = NewType("MyInt", int)

assert deserialize(Foo, {"bar": "bar"}) == Foo("bar")
assert deserialize(MyInt, 0) == MyInt(0) == 0
assert deserialize(Mapping[str, Collection[Foo]], {"key": [{"bar": "42"}]}) == {
    "key": [Foo("42")]
}
Deserialization performs a validation of data, based on typing annotations and other information (see schema and validation).
Deserialization passthrough¶
In some cases, e.g. when loading MessagePack with raw bytes inside, data can contain values whose types are not JSON primitives. These types can be allowed using the pass_through parameter; it must be a collection of classes, or a predicate. The behavior can also be set globally using apischema.settings.deserialization.pass_through.
Only non-JSON-primitive classes can be allowed, because apischema relies on an isinstance check to skip deserialization. That excludes NewType but also TypedDict.
from datetime import datetime, timedelta

from apischema import deserialize

start, end = datetime.now(), datetime.now() + timedelta(1)
assert deserialize(
    tuple[datetime, datetime], [start, end], pass_through={datetime}
) == (start, end)
# Passing through types can also be deserialized normally from JSON types
assert deserialize(
    tuple[datetime, datetime],
    [start.isoformat(), end.isoformat()],
    pass_through={datetime},
) == (start, end)
Note
The equivalent serialization feature is presented in the optimizations documentation.
Strictness¶
Coercion¶
apischema is strict by default: you ask for an integer, you have to receive an integer. However, in some cases, data has to be coerced, for example when parsing a configuration file. That can be done using the coerce parameter; when set to True, all primitive types will be coerced to the expected type of the data model, like the following:
import pytest

from apischema import ValidationError, deserialize

with pytest.raises(ValidationError):
    deserialize(bool, "ok")
assert deserialize(bool, "ok", coerce=True)
bool can be coerced from str with the following case-insensitive mapping:
| False | True |
|---|---|
| 0 | 1 |
| f | t |
| n | y |
| no | yes |
| false | true |
| off | on |
| ko | ok |
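For example, assuming the mapping above is applied case-insensitively, coercion of these strings looks like this (a minimal sketch):
from apischema import deserialize

# Case-insensitive string-to-bool coercion (per the mapping above)
assert deserialize(bool, "YES", coerce=True) is True
assert deserialize(bool, "off", coerce=True) is False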
The coerce parameter can also receive a coercion function, which will then be used instead of the default one.
from typing import TypeVar, cast

import pytest

from apischema import ValidationError, deserialize

T = TypeVar("T")


def coerce(cls: type[T], data) -> T:
    """Only coerce int to bool"""
    if cls is bool and isinstance(data, int):
        return cast(T, bool(data))
    else:
        return data


with pytest.raises(ValidationError):
    deserialize(bool, 0)
with pytest.raises(ValidationError):
    assert deserialize(bool, "ok", coerce=coerce)
assert deserialize(bool, 1, coerce=coerce)
Note
If the coercer result is not an instance of the class passed as argument, a ValidationError will be raised with an appropriate error message.
Warning
The coercer's first argument is a primitive JSON type (str/bool/int/float/list/dict/type(None)); it can be type(None), so returning cls(data) will fail in this case.
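A minimal sketch of a custom coercer guarding against this case (my_coercer is a hypothetical name):
# Hypothetical coercer guarding against cls being type(None),
# for which calling cls(data) would raise
def my_coercer(cls, data):
    if cls is type(None):
        return data  # let the normal validation report the error
    return cls(data)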
Additional properties¶
apischema is also strict about the number of fields received for an object. In JSON schema terms, apischema puts "additionalProperties": false by default (this can be configured by class with a properties field).
This behavior can be controlled with the additional_properties parameter. When set to True, it prevents the rejection of unexpected properties.
from dataclasses import dataclass

import pytest

from apischema import ValidationError, deserialize


@dataclass
class Foo:
    bar: str


data = {"bar": "bar", "other": 42}
with pytest.raises(ValidationError):
    deserialize(Foo, data)
assert deserialize(Foo, data, additional_properties=True) == Foo("bar")
Fall back on default¶
Validation errors can happen when deserializing an ill-formed field. However, if this field has a default value/factory, deserialization can fall back on this default; this is enabled by the fall_back_on_default parameter. This behavior can also be configured for each field using metadata.
from dataclasses import dataclass, field

import pytest

from apischema import ValidationError, deserialize
from apischema.metadata import fall_back_on_default


@dataclass
class Foo:
    bar: str = "bar"
    baz: str = field(default="baz", metadata=fall_back_on_default)


with pytest.raises(ValidationError):
    deserialize(Foo, {"bar": 0})
assert deserialize(Foo, {"bar": 0}, fall_back_on_default=True) == Foo()
assert deserialize(Foo, {"baz": 0}) == Foo()
Strictness configuration¶
apischema's global configuration is managed through the apischema.settings object.
It has, among others, three global variables, settings.additional_properties, settings.deserialization.coerce and settings.deserialization.fall_back_on_default, whose values are used as the default parameter values for deserialize; by default, additional_properties=False, coerce=False and fall_back_on_default=False.
Note
The additional_properties setting is not in settings.deserialization because it's also used in serialization.
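For example, a minimal sketch of overriding these defaults globally (the values chosen here are purely illustrative):
from apischema import settings

# Illustrative global overrides of the deserialization defaults
settings.additional_properties = True
settings.deserialization.coerce = True
settings.deserialization.fall_back_on_default = True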
A global coercion function can be set with settings.coercer, following this example:
import json

from apischema import ValidationError, settings

prev_coercer = settings.coercer


def coercer(cls, data):
    """In case of coercion failures, try to deserialize json data"""
    try:
        return prev_coercer(cls, data)
    except ValidationError as err:
        if not isinstance(data, str):
            raise
        try:
            return json.loads(data)
        except json.JSONDecodeError:
            raise err


settings.coercer = coercer
Fields set¶
Sometimes, it can be useful to know which fields have been set during deserialization, for example in the case of PATCH requests, to know which fields have been updated. It is also used in serialization to limit the serialized fields (see the next section).
Because apischema uses vanilla dataclasses, this feature is not enabled by default and must be set explicitly on a per-class basis. apischema provides a simple API to get/set this metadata.
from dataclasses import dataclass

from apischema import deserialize
from apischema.fields import (
    fields_set,
    is_set,
    set_fields,
    unset_fields,
    with_fields_set,
)


# This decorator enables the feature
@with_fields_set
@dataclass
class Foo:
    bar: int
    baz: str | None = None


# Retrieve the fields set
foo1 = Foo(0, None)
assert fields_set(foo1) == {"bar", "baz"}
foo2 = Foo(0)
assert fields_set(foo2) == {"bar"}
# Test fields individually (with autocompletion and refactoring)
assert is_set(foo1).baz
assert not is_set(foo2).baz
# Mark fields as set/unset
set_fields(foo2, "baz")
assert fields_set(foo2) == {"bar", "baz"}
unset_fields(foo2, "baz")
assert fields_set(foo2) == {"bar"}
set_fields(foo2, "baz", overwrite=True)
assert fields_set(foo2) == {"baz"}
# Field modifications are taken into account
foo2.bar = 0
assert fields_set(foo2) == {"bar", "baz"}
# Because deserialization uses the normal constructor, it works with this feature
foo3 = deserialize(Foo, {"bar": 0})
assert fields_set(foo3) == {"bar"}
Warning
The with_fields_set decorator MUST be put above the dataclass one. This is because both of them modify the __init__ method, but only the first is built to take the second into account.
Warning
dataclasses.replace works by setting all the fields of the replaced object. Because of this issue, apischema provides a small wrapper, apischema.dataclasses.replace.
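Here is a minimal sketch of the difference; it assumes the wrapper preserves the fields set of the original instance, while dataclasses.replace marks every field as set:
import dataclasses
from dataclasses import dataclass

from apischema.dataclasses import replace
from apischema.fields import fields_set, with_fields_set


@with_fields_set
@dataclass
class Foo:
    bar: int = 0
    baz: int = 0


foo = Foo(bar=1)
assert fields_set(foo) == {"bar"}
# dataclasses.replace passes every field to the constructor, so all fields get marked as set
assert fields_set(dataclasses.replace(foo, bar=2)) == {"bar", "baz"}
# the apischema wrapper is assumed to keep only the fields that were originally set
assert fields_set(replace(foo, bar=2)) == {"bar"}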
Serialization¶
apischema.serialize is used to serialize Python objects to JSON-like data. Contrary to apischema.deserialize, the Python type can be omitted; in this case, the object will be serialized with the typing.Any type, i.e. the class of the serialized object will be used.
from dataclasses import dataclass
from typing import Any

from apischema import serialize


@dataclass
class Foo:
    bar: str


assert serialize(Foo, Foo("baz")) == {"bar": "baz"}
assert serialize(tuple[int, int], (0, 1)) == [0, 1]
assert (
    serialize(Any, {"key": ("value", 42)})
    == serialize({"key": ("value", 42)})
    == {"key": ["value", 42]}
)
assert serialize(Foo("baz")) == {"bar": "baz"}
Note
Omitting the type with serialize can have unwanted side effects, as it loses any type annotations of the serialized object. In fact, generic specializations as well as PEP 593 annotations cannot be retrieved from an object instance; conversions can also be impacted.
That's why it's advisable to pass the type when it is available.
Type checking¶
Serialization can be configured using the check_type (defaults to False) and fall_back_on_any (defaults to False) parameters. If check_type is True, the type of the serialized object will be checked against the serialized type.
If it doesn't match, fall_back_on_any allows bypassing the serialized type and using typing.Any instead, i.e. the class of the serialized object.
The default values of these parameters can be modified through apischema.settings.serialization.check_type and apischema.settings.serialization.fall_back_on_any.
Note
apischema relies on typing annotations, and assumes that the code is well statically type-checked. That's why it doesn't add the overhead of type checking by default (it's more than 10% performance impact).
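A minimal sketch of these two parameters (the exact exception raised on a mismatch is assumed here, hence the broad pytest.raises):
import pytest

from apischema import serialize

# A mismatch between the object and the serialized type raises with check_type=True
with pytest.raises(Exception):
    serialize(int, "42", check_type=True)
# With fall_back_on_any, the object's own class is used instead
assert serialize(int, "42", check_type=True, fall_back_on_any=True) == "42"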
Serialized methods/properties¶
apischema can execute methods/properties during serialization and add the computed values alongside the other field values; just put the apischema.serialized decorator on top of the methods/properties you want to be serialized.
The function name is used unless an alias is given as decorator argument.
from dataclasses import dataclass

from apischema import serialize, serialized
from apischema.json_schema import serialization_schema


@dataclass
class Foo:
    @serialized
    @property
    def bar(self) -> int:
        return 0

    # Serialized methods can have default arguments
    @serialized
    def baz(self, some_arg_with_default: int = 1) -> int:
        return some_arg_with_default

    @serialized("aliased")
    @property
    def with_alias(self) -> int:
        return 2


# Serialized methods can also be defined outside the class,
# but the first parameter must be annotated
@serialized
def function(foo: Foo) -> int:
    return 3


assert serialize(Foo, Foo()) == {"bar": 0, "baz": 1, "aliased": 2, "function": 3}
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {
        "aliased": {"type": "integer"},
        "bar": {"type": "integer"},
        "baz": {"type": "integer"},
        "function": {"type": "integer"},
    },
    "required": ["bar", "baz", "aliased", "function"],
    "additionalProperties": False,
}
Note
Serialized methods must not have parameters without default values, as apischema needs to execute them without arguments.
Note
Overriding a serialized method in a subclass will also override the serialization of the subclass.
Error handling¶
Errors occurring in serialized methods can be caught in a dedicated error handler registered with the error_handler parameter. This function takes as parameters the exception, the object, and the alias of the serialized method; it can return a new value or raise the current or another exception — it can, for example, be used to log errors without failing the whole serialization.
The resulting serialization type will be a Union of the normal type and the error handling type; if the error handler always raises, use the typing.NoReturn annotation.
error_handler=None corresponds to a default handler which only returns None — the exception is thus discarded and the serialization type becomes Optional.
The error handler is only executed by the apischema serialization process; it's not added to the function, so the function can still be called normally and raise exceptions in the rest of your code.
from dataclasses import dataclass
from logging import getLogger
from typing import Any

from apischema import serialize, serialized
from apischema.json_schema import serialization_schema

logger = getLogger(__name__)


def log_error(error: Exception, obj: Any, alias: str) -> None:
    logger.error(
        "Serialization error in %s.%s", type(obj).__name__, alias, exc_info=error
    )
    return None


@dataclass
class Foo:
    @serialized(error_handler=log_error)
    def bar(self) -> int:
        raise RuntimeError("Some error")


assert serialize(Foo, Foo()) == {"bar": None}  # Logs "Serialization error in Foo.bar"
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {"bar": {"type": ["integer", "null"]}},
    "required": ["bar"],
    "additionalProperties": False,
}
Non-required serialized methods¶
Serialized methods (or their error handlers) can return apischema.Undefined, in which case the property will not be included in the serialization; accordingly, the property loses the required qualification in the JSON schema.
from dataclasses import dataclass

from apischema import Undefined, UndefinedType, serialize, serialized
from apischema.json_schema import serialization_schema


@dataclass
class Foo:
    @serialized
    def bar(self) -> int | UndefinedType:
        return Undefined


assert serialize(Foo, Foo()) == {}
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {"bar": {"type": "integer"}},
    "additionalProperties": False,
}
Generic serialized methods¶
Serialized methods of generic classes get the right type when their owning class is specialized.
from dataclasses import dataclass
from typing import Generic, TypeVar

from apischema import serialized
from apischema.json_schema import serialization_schema

T = TypeVar("T")
U = TypeVar("U")


@dataclass
class Foo(Generic[T]):
    @serialized
    def bar(self) -> T:
        ...


@serialized
def baz(foo: Foo[U]) -> U:
    ...


@dataclass
class FooInt(Foo[int]):
    ...


assert (
    serialization_schema(Foo[int])
    == serialization_schema(FooInt)
    == {
        "$schema": "http://json-schema.org/draft/2020-12/schema#",
        "type": "object",
        "properties": {"bar": {"type": "integer"}, "baz": {"type": "integer"}},
        "required": ["bar", "baz"],
        "additionalProperties": False,
    }
)
Exclude unset fields¶
When a class has a lot of optional fields, it can be convenient not to include all of them, to avoid a bunch of useless fields in your serialized data.
Using the fields set tracking feature presented above, serialize can exclude unset fields using its exclude_unset parameter or settings.serialization.exclude_unset (default is True).
from dataclasses import dataclass

from apischema import serialize
from apischema.fields import with_fields_set


# Decorator needed to benefit from the feature
@with_fields_set
@dataclass
class Foo:
    bar: int
    baz: str | None = None


assert serialize(Foo, Foo(0)) == {"bar": 0}
assert serialize(Foo, Foo(0), exclude_unset=False) == {"bar": 0, "baz": None}
Note
As written in the comment in the example, with_fields_set is necessary to benefit from the feature. If the dataclass doesn't use it, the feature will have no effect.
Sometimes, some fields must be serialized even when they have their default value; this behavior can be enforced using field metadata. With it, a field will be marked as set even if its default value is used at initialization.
from dataclasses import dataclass, field

from apischema import serialize
from apischema.fields import with_fields_set
from apischema.metadata import default_as_set


# Decorator needed to benefit from the feature
@with_fields_set
@dataclass
class Foo:
    bar: int | None = field(default=None, metadata=default_as_set)


assert serialize(Foo, Foo()) == {"bar": None}
assert serialize(Foo, Foo(0)) == {"bar": 0}
Note
This metadata only has an effect in combination with the with_fields_set decorator.
Exclude fields with default value or None¶
The apischema.skip field metadata already allows skipping a field's serialization depending on a condition, for example if the field is None or equal to its default value. However, it must be added on each concerned field, and that can be tedious when you want to set this behavior globally.
That's why apischema provides the two following settings:
settings.serialization.exclude_defaults
: whether fields which are equal to their default values should be excluded from serialization; default False

settings.serialization.exclude_none
: whether fields which are equal to None should be excluded from serialization; default False
These settings can also be set directly using serialize
parameters, like in the following example:
from dataclasses import dataclass

from apischema import serialize


@dataclass
class Foo:
    bar: int = 0
    baz: str | None = None


assert serialize(Foo, Foo(), exclude_defaults=True) == {}
assert serialize(Foo, Foo(), exclude_none=True) == {"bar": 0}
Field ordering¶
Usually, JSON object properties are unordered, but sometimes, order does matter. By default, fields are ordered according to their declaration; serialized methods are appended after the fields.
However, it's possible to change the ordering using apischema.order.
Class-level ordering¶
order can be used to decorate a class with the fields ordered as expected:
import json
from dataclasses import dataclass

from apischema import order, serialize


@order(["baz", "bar", "biz"])
@dataclass
class Foo:
    bar: int
    baz: int
    biz: str


assert json.dumps(serialize(Foo, Foo(0, 0, ""))) == '{"baz": 0, "bar": 0, "biz": ""}'
Field-level ordering¶
Each field has an order "value" (0 by default), and ordering is done by sorting fields using this value; if several fields have the same order value, they are sorted by their declaration order. For instance, assigning -1 to a field will put it before every other field, and 999 will surely put it at the end.
This order value is set using order, this time as a field metadata (or passed to the order argument of serialized methods/properties). It has the following overloaded signature:
order(value: int, /)
: set the order value of the field

order(*, after)
: ignore the order value and put the field after the given field/method/property

order(*, before)
: ignore the order value and put the field before the given field/method/property
Note
after
and before
can be raw strings, but also dataclass fields, methods or properties.
Also, order can again be used as a class decorator to override the ordering metadata, by passing this time a mapping of fields with their overridden order.
import json
from dataclasses import dataclass, field
from datetime import date

from apischema import order, serialize, serialized


@order({"trigram": order(-1)})
@dataclass
class User:
    firstname: str
    lastname: str
    address: str = field(metadata=order(after="birthdate"))
    birthdate: date = field()

    @serialized
    @property
    def trigram(self) -> str:
        return (self.firstname[0] + self.lastname[0] + self.lastname[-1]).lower()

    @serialized(order=order(before=birthdate))
    @property
    def age(self) -> int:
        age = date.today().year - self.birthdate.year
        if age > 0 and (date.today().month, date.today().day) < (
            self.birthdate.month,
            self.birthdate.day,
        ):
            age -= 1
        return age


user = User("Harry", "Potter", "London", date(1980, 7, 31))
dump = f"""{{
    "trigram": "hpr",
    "firstname": "Harry",
    "lastname": "Potter",
    "age": {user.age},
    "birthdate": "1980-07-31",
    "address": "London"
}}"""
assert json.dumps(serialize(User, user), indent=4) == dump
TypedDict additional properties¶
TypedDict can contain additional keys, which are not serialized by default. Setting the additional_properties parameter to True (or apischema.settings.additional_properties) will toggle on their serialization (without aliasing).
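A minimal sketch of this behavior, assuming extra keys are simply dropped by default:
from typing import TypedDict

from apischema import serialize


class Foo(TypedDict):
    bar: str


data = {"bar": "bar", "other": 42}  # "other" is not declared in the TypedDict
assert serialize(Foo, data) == {"bar": "bar"}
assert serialize(Foo, data, additional_properties=True) == {"bar": "bar", "other": 42}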
FAQ¶
Why isn't coercion the default behavior?¶
Because ill-formed data can be symptomatic of deeper issues, it has been decided that highlighting them would be better than hiding them. By the way, this is easily globally configurable.
Why isn't with_fields_set enabled by default?¶
It's true that this feature has the small cost of adding a decorator everywhere. However, keeping the dataclass decorator allows IDEs/linters/type checkers/etc. to handle the class as such, so there is no need to develop a plugin for them. Standard compliance can be worth the additional decorator. (And the little overhead can be avoided when it's not useful.)
Why isn't serialization type checking enabled by default?¶
Type checking has a runtime cost, which means poorer performance. Moreover, as explained in the performances section, it prevents the "passthrough" optimization. Lastly, code is supposed to be statically verified, and thus types already checked. (If something silly is done and leads to unsupported types being passed to the JSON library, an error will be raised anyway.)
Runtime type checking is more of a development feature, which could for example be enabled with apischema.settings.serialization.check_type = __debug__.
Why not use the json library default fallback parameter for serialization?¶
Some apischema features like conversions simply cannot be implemented with the default fallback. By the way, apischema can perform surprisingly better than using default.
However, default can be used in combination with the passthrough optimization when needed to improve performance.