
(De)serialization

apischema aims to help with deserialization/serialization of API data, mostly JSON.

Let's start again with the overview example:

from collections.abc import Collection
from dataclasses import dataclass, field
from uuid import UUID, uuid4

import pytest
from graphql import print_schema

from apischema import ValidationError, deserialize, serialize
from apischema.graphql import graphql_schema
from apischema.json_schema import deserialization_schema


# Define a schema with standard dataclasses
@dataclass
class Resource:
    id: UUID
    name: str
    tags: set[str] = field(default_factory=set)


# Get some data
uuid = uuid4()
data = {"id": str(uuid), "name": "wyfo", "tags": ["some_tag"]}
# Deserialize data
resource = deserialize(Resource, data)
assert resource == Resource(uuid, "wyfo", {"some_tag"})
# Serialize objects
assert serialize(Resource, resource) == data
# Validate during deserialization
with pytest.raises(ValidationError) as err:  # pytest checks exception is raised
    deserialize(Resource, {"id": "42", "name": "wyfo"})
assert err.value.errors == [
    {"loc": ["id"], "err": "badly formed hexadecimal UUID string"}
]
# Generate JSON Schema
assert deserialization_schema(Resource) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {
        "id": {"type": "string", "format": "uuid"},
        "name": {"type": "string"},
        "tags": {
            "type": "array",
            "items": {"type": "string"},
            "uniqueItems": True,
            "default": [],
        },
    },
    "required": ["id", "name"],
    "additionalProperties": False,
}


# Define GraphQL operations
def resources(tags: Collection[str] | None = None) -> Collection[Resource] | None:
    ...


# Generate GraphQL schema
schema = graphql_schema(query=[resources], id_types={UUID})
schema_str = """\
type Query {
  resources(tags: [String!]): [Resource!]
}

type Resource {
  id: ID!
  name: String!
  tags: [String!]!
}"""
assert print_schema(schema) == schema_str

Deserialization

apischema.deserialize deserializes Python types from JSON-like data: dict/list/str/int/float/bool/None — in short, what you get when you execute json.loads. Types can be dataclasses as well as list[int], NewTypes, or whatever you want (see conversions to extend deserialization support to any type).

from collections.abc import Collection, Mapping
from dataclasses import dataclass
from typing import NewType

from apischema import deserialize


@dataclass
class Foo:
    bar: str


MyInt = NewType("MyInt", int)

assert deserialize(Foo, {"bar": "bar"}) == Foo("bar")
assert deserialize(MyInt, 0) == MyInt(0) == 0
assert deserialize(Mapping[str, Collection[Foo]], {"key": [{"bar": "42"}]}) == {
    "key": [Foo("42")]
}

Deserialization validates the data, based on typing annotations and other information (see schema and validation).
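
For example, an element of the wrong type makes the whole deserialization fail:

import pytest

from apischema import ValidationError, deserialize

with pytest.raises(ValidationError):
    deserialize(list[int], [0, "not an int"])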

Deserialization passthrough

In some cases, e.g. MessagePack loading with raw bytes inside, some data will have types other than the JSON primitive ones. These types can be allowed using the pass_through parameter; it must be a collection of classes, or a predicate. The behavior can also be set globally using apischema.settings.deserialization.pass_through.

Only non-JSON-primitive classes can be allowed, because apischema relies on an isinstance check to skip deserialization. That excludes NewType as well as TypedDict.

from datetime import datetime, timedelta

from apischema import deserialize

start, end = datetime.now(), datetime.now() + timedelta(1)
assert deserialize(
    tuple[datetime, datetime], [start, end], pass_through={datetime}
) == (start, end)
# Passed-through types can also be deserialized normally from JSON types
assert deserialize(
    tuple[datetime, datetime],
    [start.isoformat(), end.isoformat()],
    pass_through={datetime},
) == (start, end)

Note

The equivalent serialization feature is presented in the optimizations documentation.

Strictness

Coercion

apischema is strict by default: if you ask for an integer, you have to receive an integer.

However, in some cases, data has to be coerced, for example when parsing a configuration file. That can be done using the coerce parameter; when set to True, all primitive types will be coerced to the expected type of the data model, like the following:

import pytest

from apischema import ValidationError, deserialize

with pytest.raises(ValidationError):
    deserialize(bool, "ok")
assert deserialize(bool, "ok", coerce=True)

bool can be coerced from str with the following case-insensitive mapping:

False | True
0     | 1
f     | t
n     | y
no    | yes
false | true
off   | on
ko    | ok
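
For instance, with the default coercer:

from apischema import deserialize

assert deserialize(bool, "yes", coerce=True) is True
assert deserialize(bool, "OFF", coerce=True) is False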

The coerce parameter can also receive a coercion function which will then be used instead of the default one.

from typing import TypeVar, cast

import pytest

from apischema import ValidationError, deserialize

T = TypeVar("T")


def coerce(cls: type[T], data) -> T:
    """Only coerce int to bool"""
    if cls is bool and isinstance(data, int):
        return cast(T, bool(data))
    else:
        return data


with pytest.raises(ValidationError):
    deserialize(bool, 0)
with pytest.raises(ValidationError):
    assert deserialize(bool, "ok", coerce=coerce)
assert deserialize(bool, 1, coerce=coerce)

Note

If the coercer's result is not an instance of the class passed as argument, a ValidationError will be raised with an appropriate error message.

Warning

The coercer's first argument is a primitive JSON type: str/bool/int/float/list/dict/type(None). Because it can be type(None), returning cls(data) would fail in that case.
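
A minimal sketch of a coercer guarding against this pitfall (lenient_coercer is a hypothetical example, not part of apischema):

from apischema import ValidationError


def lenient_coercer(cls, data):
    """Coerce data to cls, passing None through untouched."""
    # cls can be type(None), and type(None)(data) would raise TypeError
    if cls is type(None) or isinstance(data, cls):
        return data
    try:
        return cls(data)
    except (TypeError, ValueError):
        raise ValidationError(f"cannot coerce {data!r} to {cls.__name__}")

It can then be passed as the coerce parameter or assigned to settings.coercer.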

Additional properties

apischema is also strict about the fields received for an object. In JSON schema terms, apischema puts "additionalProperties": false by default (this can be configured per class with the properties field).

This behavior can be controlled with the additional_properties parameter; when set to True, unexpected properties are no longer rejected.

from dataclasses import dataclass

import pytest

from apischema import ValidationError, deserialize


@dataclass
class Foo:
    bar: str


data = {"bar": "bar", "other": 42}
with pytest.raises(ValidationError):
    deserialize(Foo, data)
assert deserialize(Foo, data, additional_properties=True) == Foo("bar")

Fall back on default

Validation errors can happen when deserializing an ill-formed field. However, if this field has a default value/factory, deserialization can fall back on this default; this is enabled by the fall_back_on_default parameter. This behavior can also be configured per field using metadata.

from dataclasses import dataclass, field

import pytest

from apischema import ValidationError, deserialize
from apischema.metadata import fall_back_on_default


@dataclass
class Foo:
    bar: str = "bar"
    baz: str = field(default="baz", metadata=fall_back_on_default)


with pytest.raises(ValidationError):
    deserialize(Foo, {"bar": 0})
assert deserialize(Foo, {"bar": 0}, fall_back_on_default=True) == Foo()
assert deserialize(Foo, {"baz": 0}) == Foo()

Strictness configuration

apischema's global configuration is managed through the apischema.settings object. It has, among others, three global variables: settings.additional_properties, settings.deserialization.coerce and settings.deserialization.fall_back_on_default. Their values are used as the default parameter values of deserialize; by default, additional_properties=False, coerce=False and fall_back_on_default=False.

Note

The additional_properties setting is not in settings.deserialization because it's also used in serialization.
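
For example, to change these defaults globally:

from apischema import settings

settings.additional_properties = True
settings.deserialization.coerce = True
settings.deserialization.fall_back_on_default = True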

The global coercion function can be set with settings.coercer, as in the following example:

import json
from apischema import ValidationError, settings

prev_coercer = settings.coercer

def coercer(cls, data):
    """In case of coercion failures, try to deserialize json data"""
    try:
        return prev_coercer(cls, data)
    except ValidationError as err:
        if not isinstance(data, str):
            raise
        try:
            return json.loads(data)
        except json.JSONDecodeError:
            raise err

settings.coercer = coercer

Fields set

Sometimes it can be useful to know which fields have been set by deserialization, for example in the case of PATCH requests, to know which fields have been updated. This information is also used in serialization to limit the serialized fields (see the next section).

Because apischema uses vanilla dataclasses, this feature is not enabled by default and must be set explicitly on a per-class basis. apischema provides a simple API to get/set this metadata.

from dataclasses import dataclass

from apischema import deserialize
from apischema.fields import (
    fields_set,
    is_set,
    set_fields,
    unset_fields,
    with_fields_set,
)


# This decorator enables the feature
@with_fields_set
@dataclass
class Foo:
    bar: int
    baz: str | None = None


# Retrieve fields set
foo1 = Foo(0, None)
assert fields_set(foo1) == {"bar", "baz"}
foo2 = Foo(0)
assert fields_set(foo2) == {"bar"}
# Test fields individually (with autocompletion and refactoring)
assert is_set(foo1).baz
assert not is_set(foo2).baz
# Mark fields as set/unset
set_fields(foo2, "baz")
assert fields_set(foo2) == {"bar", "baz"}
unset_fields(foo2, "baz")
assert fields_set(foo2) == {"bar"}
set_fields(foo2, "baz", overwrite=True)
assert fields_set(foo2) == {"baz"}
# Field modifications are taken into account
foo2.bar = 0
assert fields_set(foo2) == {"bar", "baz"}
# Because deserialization uses the normal constructor, it works with this feature
foo3 = deserialize(Foo, {"bar": 0})
assert fields_set(foo3) == {"bar"}

Warning

The with_fields_set decorator MUST be put above the dataclass one. This is because both of them modify the __init__ method, but only the first is built to take the second into account.

Warning

dataclasses.replace works by setting all the fields of the new object, so they would all be marked as set. Because of this issue, apischema provides a small wrapper, apischema.dataclasses.replace, which preserves the fields set.
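
A minimal sketch of the difference:

from dataclasses import dataclass, replace

from apischema.dataclasses import replace as apischema_replace
from apischema.fields import fields_set, with_fields_set


@with_fields_set
@dataclass
class Foo:
    bar: int
    baz: str | None = None


foo = Foo(0)
assert fields_set(foo) == {"bar"}
# vanilla replace passes every field to the constructor, so all are marked as set
assert fields_set(replace(foo, bar=1)) == {"bar", "baz"}
# the apischema wrapper preserves the fields set
assert fields_set(apischema_replace(foo, bar=1)) == {"bar"}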

Serialization

apischema.serialize is used to serialize Python objects to JSON-like data. Contrary to apischema.deserialize, the Python type can be omitted; in this case, the object will be serialized with a typing.Any type, i.e. the class of the serialized object will be used.

from dataclasses import dataclass
from typing import Any

from apischema import serialize


@dataclass
class Foo:
    bar: str


assert serialize(Foo, Foo("baz")) == {"bar": "baz"}
assert serialize(tuple[int, int], (0, 1)) == [0, 1]
assert (
    serialize(Any, {"key": ("value", 42)})
    == serialize({"key": ("value", 42)})
    == {"key": ["value", 42]}
)
assert serialize(Foo("baz")) == {"bar": "baz"}

Note

Omitting the type with serialize can have unwanted side effects, as any type annotation of the serialized object is lost. In fact, generic specialization as well as PEP 593 annotations cannot be retrieved from an object instance; conversions can also be impacted.

That's why it's advisable to pass the type when it is available.

Type checking

Serialization can be configured using the check_type (defaults to False) and fall_back_on_any (defaults to False) parameters. If check_type is True, the type of the serialized object is checked against the serialized type; if it doesn't match, fall_back_on_any allows falling back to typing.Any, i.e. using the class of the serialized object instead.

The default values of these parameters can be modified through apischema.settings.serialization.check_type and apischema.settings.serialization.fall_back_on_any.
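
A minimal sketch of these parameters (the TypeError below is an assumption about the exact exception raised, not documented behavior):

from dataclasses import dataclass

import pytest

from apischema import serialize


@dataclass
class Foo:
    bar: int


# without fall_back_on_any, a mismatched object is rejected
with pytest.raises(TypeError):  # assumption: the mismatch raises TypeError
    serialize(Foo, "not a Foo", check_type=True)
# with fall_back_on_any, the object's actual class (str) is used instead
assert serialize(Foo, "not a Foo", check_type=True, fall_back_on_any=True) == "not a Foo"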

Note

apischema relies on typing annotations, and assumes that the code is well statically type-checked. That's why it doesn't add the overhead of type checking by default (it adds a performance impact of more than 10%).

Serialized methods/properties

apischema can execute methods/properties during serialization and add the computed values alongside the other field values; just put the apischema.serialized decorator on top of the methods/properties you want to be serialized.

The method name is used, unless an alias is given as decorator argument.

from dataclasses import dataclass

from apischema import serialize, serialized
from apischema.json_schema import serialization_schema


@dataclass
class Foo:
    @serialized
    @property
    def bar(self) -> int:
        return 0

    # Serialized methods can have arguments with default values
    @serialized
    def baz(self, some_arg_with_default: int = 1) -> int:
        return some_arg_with_default

    @serialized("aliased")
    @property
    def with_alias(self) -> int:
        return 2


# Serialized methods can also be defined outside the class,
# but the first parameter must be annotated
@serialized
def function(foo: Foo) -> int:
    return 3


assert serialize(Foo, Foo()) == {"bar": 0, "baz": 1, "aliased": 2, "function": 3}
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {
        "aliased": {"type": "integer"},
        "bar": {"type": "integer"},
        "baz": {"type": "integer"},
        "function": {"type": "integer"},
    },
    "required": ["bar", "baz", "aliased", "function"],
    "additionalProperties": False,
}

Note

Serialized methods must not have parameters without defaults, as apischema needs to execute them without arguments.

Note

Overriding a serialized method in a subclass also overrides the serialization of the subclass.
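
For example, a minimal sketch of this behavior:

from dataclasses import dataclass

from apischema import serialize, serialized


@dataclass
class Base:
    @serialized
    def bar(self) -> int:
        return 0


@dataclass
class Sub(Base):
    def bar(self) -> int:
        return 1


assert serialize(Base, Base()) == {"bar": 0}
assert serialize(Sub, Sub()) == {"bar": 1}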

Error handling

Errors occurring in serialized methods can be caught by a dedicated error handler registered with the error_handler parameter. This function takes as parameters the exception, the object and the alias of the serialized method; it can return a new value, or raise the current or another exception. It can, for example, be used to log errors without failing the whole serialization.

The resulting serialization type will be a Union of the normal type and the error handling type; if the error handler always raises, use the typing.NoReturn annotation.

error_handler=None corresponds to a default handler which simply returns None; the exception is thus discarded and the serialization type becomes Optional.

The error handler is only executed by the apischema serialization process; it's not added to the function itself, which can still be called normally and raise exceptions in the rest of your code.

from dataclasses import dataclass
from logging import getLogger
from typing import Any

from apischema import serialize, serialized
from apischema.json_schema import serialization_schema

logger = getLogger(__name__)


def log_error(error: Exception, obj: Any, alias: str) -> None:
    logger.error(
        "Serialization error in %s.%s", type(obj).__name__, alias, exc_info=error
    )
    return None


@dataclass
class Foo:
    @serialized(error_handler=log_error)
    def bar(self) -> int:
        raise RuntimeError("Some error")


assert serialize(Foo, Foo()) == {"bar": None}  # Logs "Serialization error in Foo.bar"
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {"bar": {"type": ["integer", "null"]}},
    "required": ["bar"],
    "additionalProperties": False,
}

Non-required serialized methods

Serialized methods (or their error handlers) can return apischema.Undefined, in which case the property will not be included in the serialization; accordingly, the property loses its required qualification in the JSON schema.

from dataclasses import dataclass

from apischema import Undefined, UndefinedType, serialize, serialized
from apischema.json_schema import serialization_schema


@dataclass
class Foo:
    @serialized
    def bar(self) -> int | UndefinedType:
        return Undefined


assert serialize(Foo, Foo()) == {}
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {"bar": {"type": "integer"}},
    "additionalProperties": False,
}

Generic serialized methods

Serialized methods of generic classes get the right type when their owning class is specialized.

from dataclasses import dataclass
from typing import Generic, TypeVar

from apischema import serialized
from apischema.json_schema import serialization_schema

T = TypeVar("T")
U = TypeVar("U")


@dataclass
class Foo(Generic[T]):
    @serialized
    def bar(self) -> T:
        ...


@serialized
def baz(foo: Foo[U]) -> U:
    ...


@dataclass
class FooInt(Foo[int]):
    ...


assert (
    serialization_schema(Foo[int])
    == serialization_schema(FooInt)
    == {
        "$schema": "http://json-schema.org/draft/2020-12/schema#",
        "type": "object",
        "properties": {"bar": {"type": "integer"}, "baz": {"type": "integer"}},
        "required": ["bar", "baz"],
        "additionalProperties": False,
    }
)

Exclude unset fields

When a class has a lot of optional fields, it can be convenient not to include all of them, to avoid a lot of useless fields in the serialized data. Building on the fields-set tracking feature above, serialize can exclude unset fields using its exclude_unset parameter or settings.serialization.exclude_unset (the default is True).

from dataclasses import dataclass

from apischema import serialize
from apischema.fields import with_fields_set


# Decorator needed to benefit from the feature
@with_fields_set
@dataclass
class Foo:
    bar: int
    baz: str | None = None


assert serialize(Foo, Foo(0)) == {"bar": 0}
assert serialize(Foo, Foo(0), exclude_unset=False) == {"bar": 0, "baz": None}

Note

As noted in the comment of the example, with_fields_set is necessary to benefit from the feature. If the dataclass doesn't use it, the feature has no effect.

Sometimes, some fields must be serialized even with their default value; this behavior can be enforced using a field metadata. With it, a field is marked as set even if its default value is used at initialization.

from dataclasses import dataclass, field

from apischema import serialize
from apischema.fields import with_fields_set
from apischema.metadata import default_as_set


# Decorator needed to benefit from the feature
@with_fields_set
@dataclass
class Foo:
    bar: int | None = field(default=None, metadata=default_as_set)


assert serialize(Foo, Foo()) == {"bar": None}
assert serialize(Foo, Foo(0)) == {"bar": 0}

Note

This metadata only has an effect in combination with the with_fields_set decorator.

Exclude fields with default value or None

The apischema.skip field metadata already allows skipping the serialization of a field depending on a condition, for example if the field is None or equal to its default value. However, it must be added to each concerned field, which can be tedious when you want that behavior globally.

That's why apischema provides the two following settings:

  • settings.serialization.exclude_defaults: whether fields which are equal to their default values should be excluded from serialization; default False
  • settings.serialization.exclude_none: whether fields which are equal to None should be excluded from serialization; default False

These settings can also be passed directly as serialize parameters, as in the following example:

from dataclasses import dataclass

from apischema import serialize


@dataclass
class Foo:
    bar: int = 0
    baz: str | None = None


assert serialize(Foo, Foo(), exclude_defaults=True) == {}
assert serialize(Foo, Foo(), exclude_none=True) == {"bar": 0}

Field ordering

Usually, JSON object properties are unordered, but sometimes order does matter. By default, fields are ordered according to their declaration; serialized methods are appended after the fields.

However, it's possible to change the ordering using apischema.order.

Class-level ordering

order can be used to decorate a class with the fields ordered as expected:

import json
from dataclasses import dataclass

from apischema import order, serialize


@order(["baz", "bar", "biz"])
@dataclass
class Foo:
    bar: int
    baz: int
    biz: str


assert json.dumps(serialize(Foo, Foo(0, 0, ""))) == '{"baz": 0, "bar": 0, "biz": ""}'

Field-level ordering

Each field has an order "value" (0 by default), and ordering is done by sorting fields using this value; if several fields have the same order value, they are sorted by their declaration order. For instance, assigning -1 to a field will put it before every other field, and 999 will most likely put it at the end.

This order value is set using order, this time as a field metadata (or passed to the order argument of serialized methods/properties). It has the following overloaded signature:

  • order(value: int, /): set the order value of the field
  • order(*, after): ignore the order value and put the field after the given field/method/property
  • order(*, before): ignore the order value and put the field before the given field/method/property

Note

after and before can be raw strings, but also dataclass fields, methods or properties.

order can also be used again as a class decorator to override ordering metadata, this time by passing a mapping of fields to their overridden order.

import json
from dataclasses import dataclass, field
from datetime import date

from apischema import order, serialize, serialized


@order({"trigram": order(-1)})
@dataclass
class User:
    firstname: str
    lastname: str
    address: str = field(metadata=order(after="birthdate"))
    birthdate: date = field()

    @serialized
    @property
    def trigram(self) -> str:
        return (self.firstname[0] + self.lastname[0] + self.lastname[-1]).lower()

    @serialized(order=order(before=birthdate))
    @property
    def age(self) -> int:
        age = date.today().year - self.birthdate.year
        if age > 0 and (date.today().month, date.today().day) < (
            self.birthdate.month,
            self.birthdate.day,
        ):
            age -= 1
        return age


user = User("Harry", "Potter", "London", date(1980, 7, 31))
dump = f"""{{
    "trigram": "hpr",
    "firstname": "Harry",
    "lastname": "Potter",
    "age": {user.age},
    "birthdate": "1980-07-31",
    "address": "London"
}}"""
assert json.dumps(serialize(User, user), indent=4) == dump

TypedDict additional properties

A TypedDict can contain additional keys, which are not serialized by default. Setting the additional_properties parameter to True (or apischema.settings.additional_properties) enables their serialization (without aliasing).
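
A minimal sketch of this behavior:

from typing import TypedDict

from apischema import serialize


class Foo(TypedDict):
    bar: str


data = {"bar": "bar", "other": 42}
assert serialize(Foo, data) == {"bar": "bar"}
assert serialize(Foo, data, additional_properties=True) == data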

FAQ

Why isn't coercion the default behavior?

Because ill-formed data can be symptomatic of deeper issues, it was decided that highlighting them would be better than hiding them. In any case, this is easily configurable globally.

Why isn't with_fields_set enabled by default?

It's true that this feature has the small cost of adding a decorator everywhere. However, keeping the dataclass decorator allows IDEs/linters/type checkers/etc. to handle the class as such, so there is no need to develop a plugin for them. Standard compliance can be worth the additional decorator (and a little overhead can be avoided when the feature isn't needed).

Why isn't serialization type checking enabled by default?

Type checking has a runtime cost, which means poorer performance. Moreover, as explained in the performance section, it prevents the "passthrough" optimization. Lastly, code is supposed to be statically verified, and types thus already checked. (If something silly is done that ends up passing unsupported types to the JSON library, an error will be raised anyway.)

Runtime type checking is more of a development feature, which could for example be enabled with apischema.settings.serialization.check_type = __debug__.

Why not use json library default fallback parameter for serialization?

Some apischema features, like conversions, simply cannot be implemented with the default fallback. Besides, apischema can perform surprisingly better than using default.

However, default can be used in combination with the passthrough optimization when needed to improve performance.