(De)serialization¶
apischema aims to help APIs with the deserialization/serialization of data, mostly JSON.
Let's start again with the overview example:
from collections.abc import Collection
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4

from graphql import print_schema
from pytest import raises

from apischema import ValidationError, deserialize, serialize
from apischema.graphql import graphql_schema
from apischema.json_schema import deserialization_schema


# Define a schema with standard dataclasses
@dataclass
class Resource:
    id: UUID
    name: str
    tags: set[str] = field(default_factory=set)


# Get some data
uuid = uuid4()
data = {"id": str(uuid), "name": "wyfo", "tags": ["some_tag"]}
# Deserialize data
resource = deserialize(Resource, data)
assert resource == Resource(uuid, "wyfo", {"some_tag"})
# Serialize objects
assert serialize(Resource, resource) == data
# Validate during deserialization
with raises(ValidationError) as err:  # pytest checks exception is raised
    deserialize(Resource, {"id": "42", "name": "wyfo"})
assert serialize(err.value) == [  # ValidationError is serializable
    {"loc": ["id"], "err": ["badly formed hexadecimal UUID string"]}
]
# Generate JSON Schema
assert deserialization_schema(Resource) == {
    "$schema": "http://json-schema.org/draft/2019-09/schema#",
    "type": "object",
    "properties": {
        "id": {"type": "string", "format": "uuid"},
        "name": {"type": "string"},
        "tags": {
            "type": "array",
            "items": {"type": "string"},
            "uniqueItems": True,
            "default": [],
        },
    },
    "required": ["id", "name"],
    "additionalProperties": False,
}


# Define GraphQL operations
def resources(tags: Optional[Collection[str]] = None) -> Optional[Collection[Resource]]:
    ...


# Generate GraphQL schema
schema = graphql_schema(query=[resources], id_types={UUID})
schema_str = """\
type Query {
  resources(tags: [String!]): [Resource!]
}

type Resource {
  id: ID!
  name: String!
  tags: [String!]!
}
"""
assert print_schema(schema) == schema_str
Deserialization¶
apischema.deserialize deserializes Python types from JSON-like data: dict/list/str/int/float/bool/None — in short, what you get when you execute json.loads. Types can be dataclasses as well as list[int], NewTypes, or whatever you want (see conversions to extend deserialization support to every type you want).
from collections.abc import Collection, Mapping
from dataclasses import dataclass
from typing import NewType

from apischema import deserialize


@dataclass
class Foo:
    bar: str


MyInt = NewType("MyInt", int)

assert deserialize(Foo, {"bar": "bar"}) == Foo("bar")
assert deserialize(MyInt, 0) == MyInt(0) == 0
assert deserialize(Mapping[str, Collection[Foo]], {"key": [{"bar": "42"}]}) == {
    "key": (Foo("42"),)
}
Deserialization performs validation of the data, based on typing annotations and other information (see schema and validation).
Strictness¶
Coercion¶
apischema is strict by default: if you ask for an integer, you have to receive an integer. However, in some cases, data has to be coerced, for example when parsing a configuration file. That can be done using the coerce parameter; when set to True, all primitive types will be coerced to the expected type of the data model, like the following:
from pytest import raises
from apischema import ValidationError, deserialize
with raises(ValidationError):
    deserialize(bool, "ok")
assert deserialize(bool, "ok", coerce=True)
bool can be coerced from str with the following case-insensitive mapping:

| False | True |
|---|---|
| 0 | 1 |
| f | t |
| n | y |
| no | yes |
| false | true |
| off | on |
| ko | ok |
Note
bool coercion from str is just a global dict[str, bool] named apischema.data.coercion.STR_TO_BOOL, and it can be customized according to your needs (but keys have to be lowercase). There is also a global set[str] named apischema.data.coercion.STR_NONE_VALUES for None coercion.
The coerce parameter can also receive a coercion function, which will then be used instead of the default one.
from typing import TypeVar, cast

from pytest import raises

from apischema import ValidationError, deserialize

T = TypeVar("T")


def coerce(cls: type[T], data) -> T:
    """Only coerce int to bool"""
    if cls is bool and isinstance(data, int):
        return cast(T, bool(data))
    else:
        return data


with raises(ValidationError):
    deserialize(bool, 0)
with raises(ValidationError):
    deserialize(bool, "ok", coerce=coerce)
assert deserialize(bool, 1, coerce=coerce)
Note
If the coercer result is not an instance of the class passed as argument, a ValidationError will be raised with an appropriate error message.
Warning
The coercer's first argument is a primitive JSON type: str/bool/int/float/list/dict/type(None). Because it can be type(None), returning cls(data) will fail in this case.
Additional properties¶
apischema is also strict about the number of fields received for an object. In JSON schema terms, apischema puts "additionalProperties": false by default (this can be configured per class with the properties field).
This behavior can be controlled by the additional_properties parameter. When set to True, it prevents the rejection of unexpected properties.
from dataclasses import dataclass

from pytest import raises

from apischema import ValidationError, deserialize


@dataclass
class Foo:
    bar: str


data = {"bar": "bar", "other": 42}
with raises(ValidationError):
    deserialize(Foo, data)
assert deserialize(Foo, data, additional_properties=True) == Foo("bar")
Fall back on default¶
A validation error can occur when deserializing an ill-formed field. However, if the field has a default value/factory, deserialization can fall back on this default; this is enabled by the fall_back_on_default parameter. The behavior can also be configured for each field using metadata.
from dataclasses import dataclass, field

from pytest import raises

from apischema import ValidationError, deserialize
from apischema.metadata import fall_back_on_default


@dataclass
class Foo:
    bar: str = "bar"
    baz: str = field(default="baz", metadata=fall_back_on_default)


with raises(ValidationError):
    deserialize(Foo, {"bar": 0})
assert deserialize(Foo, {"bar": 0}, fall_back_on_default=True) == Foo()
assert deserialize(Foo, {"baz": 0}) == Foo()
Strictness configuration¶
apischema's global configuration is managed through the apischema.settings object.
It has, among others, three variables, settings.deserialization.additional_properties, settings.deserialization.coerce and settings.deserialization.fall_back_on_default, whose values are used as the default parameter values for deserialize; by default, additional_properties=False, coerce=False and fall_back_on_default=False.
The global coercion function can be set with settings.coercer, following this example:
import json

from apischema import ValidationError, settings

prev_coercer = settings.coercer


def coercer(cls, data):
    """In case of coercion failure, try to deserialize JSON data"""
    try:
        return prev_coercer(cls, data)
    except ValidationError as err:
        if not isinstance(data, str):
            raise
        try:
            return json.loads(data)
        except json.JSONDecodeError:
            raise err


settings.coercer = coercer
Fields set¶
Sometimes, it can be useful to know which fields have been set by deserialization, for example in the case of a PATCH request, to know which fields have been updated. This information is also used in serialization to limit the serialized fields (see next section).
Because apischema uses vanilla dataclasses, this feature is not enabled by default and must be set explicitly on a per-class basis. apischema provides a simple API to get/set this metadata.
from dataclasses import dataclass
from typing import Optional

from apischema import deserialize
from apischema.fields import (
    fields_set,
    is_set,
    set_fields,
    unset_fields,
    with_fields_set,
)


# This decorator enables the feature
@with_fields_set
@dataclass
class Foo:
    bar: int
    baz: Optional[str] = None


# Retrieve the fields set
foo1 = Foo(0, None)
assert fields_set(foo1) == {"bar", "baz"}
foo2 = Foo(0)
assert fields_set(foo2) == {"bar"}
# Test fields individually (with autocompletion and refactoring)
assert is_set(foo1).baz
assert not is_set(foo2).baz
# Mark fields as set/unset
set_fields(foo2, "baz")
assert fields_set(foo2) == {"bar", "baz"}
unset_fields(foo2, "baz")
assert fields_set(foo2) == {"bar"}
set_fields(foo2, "baz", overwrite=True)
assert fields_set(foo2) == {"baz"}
# Field modifications are taken into account
foo2.bar = 0
assert fields_set(foo2) == {"bar", "baz"}
# Because deserialization uses the normal constructor, it works with this feature
foo3 = deserialize(Foo, {"bar": 0})
assert fields_set(foo3) == {"bar"}
Warning
The with_fields_set decorator MUST be put above the dataclass one. This is because both of them modify the __init__ method, but only the first is built to take the second into account.
Warning
dataclasses.replace works by setting all the fields of the replaced object. Because of this issue, apischema provides a little wrapper, apischema.dataclasses.replace.
Serialization¶
apischema.serialize is used to serialize Python objects to JSON-like data. Contrary to apischema.deserialize, the Python type can be omitted; in this case, the object will be serialized with the typing.Any type, i.e. the class of the serialized object will be used.
from dataclasses import dataclass
from typing import Any

from apischema import serialize


@dataclass
class Foo:
    bar: str


assert serialize(Foo, Foo("baz")) == {"bar": "baz"}
assert serialize(tuple[int, int], (0, 1)) == [0, 1]
assert (
    serialize(Any, {"key": ("value", 42)})
    == serialize({"key": ("value", 42)})
    == {"key": ["value", 42]}
)
assert serialize(Foo("baz")) == {"bar": "baz"}
Note
Omitting the type with serialize can have unwanted side effects, as it loses any type annotations of the serialized object. In fact, generic specializations as well as PEP 593 annotations cannot be retrieved from an object instance; conversions can also be impacted. That's why it's advisable to pass the type when it is available.
Type checking¶
Serialization can be configured using the check_type (defaults to False) and fall_back_on_any (defaults to False) parameters. If check_type is True, the serialized object's type will be checked against the serialized type. If it doesn't match, fall_back_on_any allows bypassing the serialized type and using typing.Any instead, i.e. the class of the serialized object.
The default values of these parameters can be modified through apischema.settings.serialization.check_type and apischema.settings.serialization.fall_back_on_any.
Note
apischema relies on typing annotations and assumes that the code is well statically type-checked. That's why it doesn't add the overhead of type checking by default (it has more than a 10% performance impact).
Serialized methods/properties¶
apischema can execute methods/properties during serialization and add the computed values alongside the other field values; just put the apischema.serialized decorator on top of the methods/properties you want to be serialized.
The function name is used, unless an alias is given as decorator argument.
from dataclasses import dataclass

from apischema import serialize, serialized
from apischema.json_schema import serialization_schema


@dataclass
class Foo:
    @serialized
    @property
    def bar(self) -> int:
        return 0

    # Serialized methods can have default arguments
    @serialized
    def baz(self, some_arg_with_default: int = 1) -> int:
        return some_arg_with_default

    @serialized("aliased")
    @property
    def with_alias(self) -> int:
        return 2


# Serialized methods can also be defined outside the class,
# but the first parameter must be annotated
@serialized
def function(foo: Foo) -> int:
    return 3


assert serialize(Foo, Foo()) == {"bar": 0, "baz": 1, "aliased": 2, "function": 3}
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2019-09/schema#",
    "type": "object",
    "properties": {
        "aliased": {"type": "integer"},
        "bar": {"type": "integer"},
        "baz": {"type": "integer"},
        "function": {"type": "integer"},
    },
    "required": ["aliased", "bar", "baz", "function"],
    "additionalProperties": False,
}
Note
Serialized methods must not have parameters without default values, as apischema needs to execute them without arguments.
Note
Overriding a serialized method in a subclass also overrides the serialization of the subclass.
Error handling¶
Errors occurring in serialized methods can be caught in a dedicated error handler registered with the error_handler parameter. This function takes as parameters the exception, the object and the alias of the serialized method; it can return a new value or raise the current or another exception — it can for example be used to log errors without aborting the complete serialization.
The resulting serialization type will be a Union of the normal type and the error handling type; if the error handler always raises, use the typing.NoReturn annotation.
error_handler=None corresponds to a default handler which only returns None — the exception is thus discarded and the serialization type becomes Optional.
The error handler is only executed by the apischema serialization process; it's not added to the function, which can still be executed normally and raise an exception in the rest of your code.
from dataclasses import dataclass
from logging import getLogger
from typing import Any

from apischema import serialize, serialized
from apischema.json_schema import serialization_schema

logger = getLogger(__name__)


def log_error(error: Exception, obj: Any, alias: str) -> None:
    logger.error(
        "Serialization error in %s.%s", type(obj).__name__, alias, exc_info=error
    )
    return None


@dataclass
class Foo:
    @serialized(error_handler=log_error)
    def bar(self) -> int:
        raise RuntimeError("Some error")


assert serialize(Foo, Foo()) == {"bar": None}  # Logs "Serialization error in Foo.bar"
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2019-09/schema#",
    "type": "object",
    "properties": {"bar": {"type": ["integer", "null"]}},
    "required": ["bar"],
    "additionalProperties": False,
}
Non-required serialized methods¶
Serialized methods (or their error handler) can return apischema.Undefined, in which case the property will not be included in the serialization; accordingly, the property loses the required qualification in the JSON schema.
from dataclasses import dataclass
from typing import Union

from apischema import Undefined, UndefinedType, serialize, serialized
from apischema.json_schema import serialization_schema


@dataclass
class Foo:
    @serialized
    def bar(self) -> Union[int, UndefinedType]:
        return Undefined


assert serialize(Foo, Foo()) == {}
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2019-09/schema#",
    "type": "object",
    "properties": {"bar": {"type": "integer"}},
    "additionalProperties": False,
}
Generic serialized methods¶
Serialized methods of generic classes get the right type when their owning class is specialized.
Warning
serialized cannot decorate methods of Generic classes in Python 3.6; it has to be used outside of the class instead.
Exclude unset fields¶
When a class has a lot of optional fields, it can be convenient not to include all of them, to avoid a bunch of useless fields in your serialized data.
Using the fields-set tracking feature presented above, serialize can exclude unset fields using its exclude_unset parameter or settings.serialization.exclude_unset (default is True).
from dataclasses import dataclass
from typing import Optional

from apischema import serialize
from apischema.fields import with_fields_set


# Decorator needed to benefit from the feature
@with_fields_set
@dataclass
class Foo:
    bar: int
    baz: Optional[str] = None


assert serialize(Foo, Foo(0)) == {"bar": 0}
assert serialize(Foo, Foo(0), exclude_unset=False) == {"bar": 0, "baz": None}
Note
As noted in the example's comment, with_fields_set is necessary to benefit from the feature. If the dataclass doesn't use it, the feature will have no effect.
Sometimes, some fields must be serialized even with their default value; this behavior can be enforced using field metadata. With it, the field will be marked as set even if its default value is used at initialization.
from dataclasses import dataclass, field
from typing import Optional

from apischema import serialize
from apischema.fields import with_fields_set
from apischema.metadata import default_as_set


# Decorator needed to benefit from the feature
@with_fields_set
@dataclass
class Foo:
    bar: Optional[int] = field(default=None, metadata=default_as_set)


assert serialize(Foo, Foo()) == {"bar": None}
assert serialize(Foo, Foo(0)) == {"bar": 0}
Note
This metadata only has an effect in combination with the with_fields_set decorator.
Performances¶
apischema is among the fastest (if not the fastest) Python libraries in its domain. These performances are achieved by pre-computing (de)serialization methods depending on the (de)serialized type (and other parameters); all the type annotation processing is done in this pre-computation. The methods are then cached using functools.lru_cache, so deserialize and serialize don't recompute them every time.
However, even if lru_cache is fast, using the methods directly is faster, so apischema provides apischema.deserialization_method and apischema.serialization_method. These functions share the same parameters as deserialize/serialize, except for the data/object parameter to (de)serialize. Using the computed methods directly can increase performance by 10%.
from dataclasses import dataclass

from apischema import deserialization_method, serialization_method


@dataclass
class Foo:
    bar: int


deserialize_foo = deserialization_method(Foo)
serialize_foo = serialization_method(Foo)

assert deserialize_foo({"bar": 0}) == Foo(0)
assert serialize_foo(Foo(0)) == {"bar": 0}
Also, the apischema cache size can be modified using apischema.cache.set_size, and the cache can be reset using apischema.cache.reset (this happens automatically when apischema.settings is modified), but you should not need to do so.
FAQ¶
Why isn't coercion the default behavior?¶
Because ill-formed data can be symptomatic of deeper issues, it has been decided that highlighting it would be better than hiding it. Besides, this is easily configurable globally.
Why isn't the with_fields_set feature enabled by default?¶
It's true that this feature has the small cost of adding a decorator everywhere. However, keeping the dataclass decorator allows IDEs/linters/type checkers/etc. to handle the class as such, so there is no need to develop a plugin for them. Standards compliance can be worth the additional decorator. (And its small overhead can be avoided when not useful.)