(De)serialization¶
apischema aims to help APIs with the deserialization/serialization of data, mostly JSON.
Let's start again with the overview example:
from collections.abc import Collection
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4

from graphql import print_schema
from pytest import raises

from apischema import ValidationError, deserialize, serialize
from apischema.graphql import graphql_schema
from apischema.json_schema import deserialization_schema


# Define a schema with standard dataclasses
@dataclass
class Resource:
    id: UUID
    name: str
    tags: set[str] = field(default_factory=set)


# Get some data
uuid = uuid4()
data = {"id": str(uuid), "name": "wyfo", "tags": ["some_tag"]}
# Deserialize data
resource = deserialize(Resource, data)
assert resource == Resource(uuid, "wyfo", {"some_tag"})
# Serialize objects
assert serialize(Resource, resource) == data
# Validate during deserialization
with raises(ValidationError) as err:  # pytest checks exception is raised
    deserialize(Resource, {"id": "42", "name": "wyfo"})
assert serialize(err.value) == [  # ValidationError is serializable
    {"loc": ["id"], "err": ["badly formed hexadecimal UUID string"]}
]
# Generate JSON Schema
assert deserialization_schema(Resource) == {
    "$schema": "http://json-schema.org/draft/2019-09/schema#",
    "type": "object",
    "properties": {
        "id": {"type": "string", "format": "uuid"},
        "name": {"type": "string"},
        "tags": {
            "type": "array",
            "items": {"type": "string"},
            "uniqueItems": True,
            "default": [],
        },
    },
    "required": ["id", "name"],
    "additionalProperties": False,
}


# Define GraphQL operations
def resources(tags: Optional[Collection[str]] = None) -> Optional[Collection[Resource]]:
    ...


# Generate GraphQL schema
schema = graphql_schema(query=[resources], id_types={UUID})
schema_str = """\
type Query {
  resources(tags: [String!]): [Resource!]
}

type Resource {
  id: ID!
  name: String!
  tags: [String!]!
}
"""
assert print_schema(schema) == schema_str
Deserialization¶
apischema.deserialize deserializes Python types from JSON-like data: dict/list/str/int/float/bool/None — in short, what you get when you execute json.loads. Types can be dataclasses as well as list[int], NewTypes, or whatever you want (see conversions to extend deserialization support to every type you want).
from collections.abc import Collection, Mapping
from dataclasses import dataclass
from typing import NewType

from apischema import deserialize


@dataclass
class Foo:
    bar: str


MyInt = NewType("MyInt", int)

assert deserialize(Foo, {"bar": "bar"}) == Foo("bar")
assert deserialize(MyInt, 0) == MyInt(0) == 0
assert deserialize(Mapping[str, Collection[Foo]], {"key": [{"bar": "42"}]}) == {
    "key": (Foo("42"),)
}
Deserialization performs validation of the data, based on typing annotations and other information (see schema and validation).
Strictness¶
Coercion¶
apischema is strict by default: if you ask for an integer, you have to receive an integer. However, in some cases, data has to be coerced, for example when parsing a configuration file. That can be done using the coerce parameter; when set to True, all primitive types will be coerced to the expected type of the data model, like the following:
from pytest import raises
from apischema import ValidationError, deserialize
with raises(ValidationError):
    deserialize(bool, "ok")
assert deserialize(bool, "ok", coerce=True)
bool can be coerced from str with the following case-insensitive mapping:

| False | True |
|---|---|
| 0 | 1 |
| f | t |
| n | y |
| no | yes |
| false | true |
| off | on |
| ko | ok |
Note
bool coercion from str is just a global dict[str, bool] named apischema.data.coercion.STR_TO_BOOL, and it can be customized according to your needs (but keys have to be lowercase). There is also a global set[str] named apischema.data.coercion.STR_NONE_VALUES for None coercion.
The coerce parameter can also receive a coercion function, which will then be used instead of the default one.
from typing import TypeVar, cast

from pytest import raises

from apischema import ValidationError, deserialize

T = TypeVar("T")


def coerce(cls: type[T], data) -> T:
    """Only coerce int to bool"""
    if cls is bool and isinstance(data, int):
        return cast(T, bool(data))
    else:
        return data


with raises(ValidationError):
    deserialize(bool, 0)
with raises(ValidationError):
    deserialize(bool, "ok", coerce=coerce)
assert deserialize(bool, 1, coerce=coerce)
Note
If the coercer result is not an instance of the class passed as argument, a ValidationError will be raised with an appropriate error message.
Warning
The coercer's first argument is a primitive JSON type: str/bool/int/float/list/dict/type(None). Because it can be type(None), returning cls(data) will fail in this case.
Additional properties¶
apischema is also strict about the number of fields received for an object. In JSON schema terms, apischema puts "additionalProperties": false by default (this can be configured per class with the properties field).
This behavior can be controlled by the additional_properties parameter. When set to True, it prevents the rejection of unexpected properties.
from dataclasses import dataclass

from pytest import raises

from apischema import ValidationError, deserialize


@dataclass
class Foo:
    bar: str


data = {"bar": "bar", "other": 42}
with raises(ValidationError):
    deserialize(Foo, data)
assert deserialize(Foo, data, additional_properties=True) == Foo("bar")
Fall back on default¶
A validation error can occur when deserializing an ill-formed field. However, if the field has a default value/factory, deserialization can fall back on this default; this is enabled by the fall_back_on_default parameter. The behavior can also be configured for each field using metadata.
from dataclasses import dataclass, field

from pytest import raises

from apischema import ValidationError, deserialize
from apischema.metadata import fall_back_on_default


@dataclass
class Foo:
    bar: str = "bar"
    baz: str = field(default="baz", metadata=fall_back_on_default)


with raises(ValidationError):
    deserialize(Foo, {"bar": 0})
assert deserialize(Foo, {"bar": 0}, fall_back_on_default=True) == Foo()
assert deserialize(Foo, {"baz": 0}) == Foo()
Strictness configuration¶
apischema's global configuration is managed through the apischema.settings object.
It has, among others, three variables, settings.deserialization.additional_properties, settings.deserialization.coerce and settings.deserialization.fall_back_on_default, whose values are used as the default parameter values for deserialize; by default, additional_properties=False, coerce=False and fall_back_on_default=False.
The global coercion function can be set with settings.coercer, following this example:
import json

from apischema import ValidationError, settings

prev_coercer = settings.coercer


def coercer(cls, data):
    """In case of coercion failure, try to deserialize JSON data"""
    try:
        return prev_coercer(cls, data)
    except ValidationError as err:
        if not isinstance(data, str):
            raise
        try:
            return json.loads(data)
        except json.JSONDecodeError:
            raise err


settings.coercer = coercer
Fields set¶
Sometimes, it can be useful to know which fields have been set by deserialization, for example in the case of a PATCH request, to know which fields have been updated. This information is also used in serialization to limit the serialized fields (see next section).
Because apischema uses vanilla dataclasses, this feature is not enabled by default and must be set explicitly on a per-class basis. apischema provides a simple API to get/set this metadata.
from dataclasses import dataclass
from typing import Optional

from apischema import deserialize
from apischema.fields import (
    fields_set,
    is_set,
    set_fields,
    unset_fields,
    with_fields_set,
)


# This decorator enables the feature
@with_fields_set
@dataclass
class Foo:
    bar: int
    baz: Optional[str] = None


# Retrieve the fields set
foo1 = Foo(0, None)
assert fields_set(foo1) == {"bar", "baz"}
foo2 = Foo(0)
assert fields_set(foo2) == {"bar"}
# Test fields individually (with autocompletion and refactoring)
assert is_set(foo1).baz
assert not is_set(foo2).baz
# Mark fields as set/unset
set_fields(foo2, "baz")
assert fields_set(foo2) == {"bar", "baz"}
unset_fields(foo2, "baz")
assert fields_set(foo2) == {"bar"}
set_fields(foo2, "baz", overwrite=True)
assert fields_set(foo2) == {"baz"}
# Field modifications are taken into account
foo2.bar = 0
assert fields_set(foo2) == {"bar", "baz"}
# Because deserialization uses the normal constructor, it works with this feature
foo3 = deserialize(Foo, {"bar": 0})
assert fields_set(foo3) == {"bar"}
Warning
The with_fields_set decorator MUST be put above the dataclass one. This is because both of them modify the __init__ method, but only the first is built to take the second into account.
Warning
dataclasses.replace works by setting all the fields of the replaced object. Because of this issue, apischema provides a little wrapper, apischema.dataclasses.replace.
Serialization¶
apischema.serialize is used to serialize Python objects to JSON-like data. Contrary to apischema.deserialize, the Python type can be omitted; in this case, the object will be serialized with the typing.Any type, i.e. the class of the serialized object will be used.
from dataclasses import dataclass
from typing import Any

from apischema import serialize


@dataclass
class Foo:
    bar: str


assert serialize(Foo, Foo("baz")) == {"bar": "baz"}
assert serialize(tuple[int, int], (0, 1)) == [0, 1]
assert (
    serialize(Any, {"key": ("value", 42)})
    == serialize({"key": ("value", 42)})
    == {"key": ["value", 42]}
)
assert serialize(Foo("baz")) == {"bar": "baz"}
Note
Omitting the type with serialize can have unwanted side effects, as it loses any type annotations of the serialized object. In fact, generic specializations as well as PEP 593 annotations cannot be retrieved from an object instance; conversions can also be impacted. That's why it's advisable to pass the type when it is available.
Type checking¶
Serialization can be configured using the check_type (defaults to False) and fall_back_on_any (defaults to False) parameters. If check_type is True, the serialized object's type will be checked against the serialized type. If it doesn't match, fall_back_on_any allows bypassing the serialized type and using typing.Any instead, i.e. the class of the serialized object.
The default values of these parameters can be modified through apischema.settings.serialization.check_type and apischema.settings.serialization.fall_back_on_any.
Note
apischema relies on typing annotations and assumes that the code is well statically type-checked. That's why it doesn't add the overhead of type checking by default (it has more than a 10% performance impact).
Serialized methods/properties¶
apischema can execute methods/properties during serialization and add the computed values alongside the other field values; just put the apischema.serialized decorator on top of the methods/properties you want to be serialized.
The function name is used, unless an alias is given as decorator argument.
from dataclasses import dataclass

from apischema import serialize, serialized
from apischema.json_schema import serialization_schema


@dataclass
class Foo:
    @serialized
    @property
    def bar(self) -> int:
        return 0

    # Serialized methods can have default arguments
    @serialized
    def baz(self, some_arg_with_default: int = 1) -> int:
        return some_arg_with_default

    @serialized("aliased")
    @property
    def with_alias(self) -> int:
        return 2


# Serialized methods can also be defined outside the class,
# but the first parameter must be annotated
@serialized
def function(foo: Foo) -> int:
    return 3


assert serialize(Foo, Foo()) == {"bar": 0, "baz": 1, "aliased": 2, "function": 3}
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2019-09/schema#",
    "type": "object",
    "properties": {
        "aliased": {"type": "integer"},
        "bar": {"type": "integer"},
        "baz": {"type": "integer"},
        "function": {"type": "integer"},
    },
    "required": ["aliased", "bar", "baz", "function"],
    "additionalProperties": False,
}
Note
Serialized methods must not have parameters without default values, as apischema needs to execute them without arguments.
Note
Overriding a serialized method in a subclass also overrides the serialization of the subclass.
Error handling¶
Errors occurring in serialized methods can be caught in a dedicated error handler registered with the error_handler parameter. This function takes as parameters the exception, the object and the alias of the serialized method; it can return a new value or raise the current or another exception — it can for example be used to log errors without aborting the complete serialization.
The resulting serialization type will be a Union of the normal type and the error handling type; if the error handler always raises, use the typing.NoReturn annotation.
error_handler=None corresponds to a default handler which only returns None — the exception is thus discarded and the serialization type becomes Optional.
The error handler is only executed by the apischema serialization process; it's not added to the function, which can still be executed normally and raise an exception in the rest of your code.
from dataclasses import dataclass
from logging import getLogger
from typing import Any

from apischema import serialize, serialized
from apischema.json_schema import serialization_schema

logger = getLogger(__name__)


def log_error(error: Exception, obj: Any, alias: str) -> None:
    logger.error(
        "Serialization error in %s.%s", type(obj).__name__, alias, exc_info=error
    )
    return None


@dataclass
class Foo:
    @serialized(error_handler=log_error)
    def bar(self) -> int:
        raise RuntimeError("Some error")


assert serialize(Foo, Foo()) == {"bar": None}  # Logs "Serialization error in Foo.bar"
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2019-09/schema#",
    "type": "object",
    "properties": {"bar": {"type": ["integer", "null"]}},
    "required": ["bar"],
    "additionalProperties": False,
}
Non-required serialized methods¶
Serialized methods (or their error handler) can return apischema.Undefined, in which case the property will not be included in the serialization; accordingly, the property loses the required qualification in the JSON schema.
from dataclasses import dataclass
from typing import Union

from apischema import Undefined, UndefinedType, serialize, serialized
from apischema.json_schema import serialization_schema


@dataclass
class Foo:
    @serialized
    def bar(self) -> Union[int, UndefinedType]:
        return Undefined


assert serialize(Foo, Foo()) == {}
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2019-09/schema#",
    "type": "object",
    "properties": {"bar": {"type": "integer"}},
    "additionalProperties": False,
}
Generic serialized methods¶
Serialized methods of generic classes get the right type when their owning class is specialized.
Warning
serialized cannot decorate methods of Generic classes in Python 3.6; it has to be used outside of the class instead.
Exclude unset fields¶
When a class has a lot of optional fields, it can be convenient not to include all of them, to avoid a bunch of useless fields in your serialized data.
Using the fields-set tracking feature presented above, serialize can exclude unset fields using its exclude_unset parameter or settings.serialization.exclude_unset (default is True).
from dataclasses import dataclass
from typing import Optional

from apischema import serialize
from apischema.fields import with_fields_set


# Decorator needed to benefit from the feature
@with_fields_set
@dataclass
class Foo:
    bar: int
    baz: Optional[str] = None


assert serialize(Foo, Foo(0)) == {"bar": 0}
assert serialize(Foo, Foo(0), exclude_unset=False) == {"bar": 0, "baz": None}
Note
As noted in the example's comment, with_fields_set is necessary to benefit from the feature. If the dataclass doesn't use it, the feature will have no effect.
Sometimes, some fields must be serialized even with their default value; this behavior can be enforced using field metadata. With it, the field will be marked as set even if its default value is used at initialization.
from dataclasses import dataclass, field
from typing import Optional

from apischema import serialize
from apischema.fields import with_fields_set
from apischema.metadata import default_as_set


# Decorator needed to benefit from the feature
@with_fields_set
@dataclass
class Foo:
    bar: Optional[int] = field(default=None, metadata=default_as_set)


assert serialize(Foo, Foo()) == {"bar": None}
assert serialize(Foo, Foo(0)) == {"bar": 0}
Note
This metadata only has an effect in combination with the with_fields_set decorator.
Performances¶
apischema is among the fastest (if not the fastest) Python libraries in its domain. These performances are achieved by pre-computing (de)serialization methods depending on the (de)serialized type (and other parameters); all the type annotation processing is done in this pre-computation. The methods are then cached using functools.lru_cache, so deserialize and serialize don't recompute them every time.
However, even if lru_cache is fast, using the methods directly is faster, so apischema provides apischema.deserialization_method and apischema.serialization_method. These functions share the same parameters as deserialize/serialize, except for the data/object parameter to (de)serialize. Using the computed methods directly can increase performance by 10%.
from dataclasses import dataclass

from apischema import deserialization_method, serialization_method


@dataclass
class Foo:
    bar: int


deserialize_foo = deserialization_method(Foo)
serialize_foo = serialization_method(Foo)

assert deserialize_foo({"bar": 0}) == Foo(0)
assert serialize_foo(Foo(0)) == {"bar": 0}
Also, the apischema cache size can be modified using apischema.cache.set_size, and the cache can be reset using apischema.cache.reset (this happens automatically when apischema.settings is modified), but you should not need to do so.
FAQ¶
Why isn't coercion the default behavior?¶
Because ill-formed data can be symptomatic of deeper issues, it has been decided that highlighting it would be better than hiding it. Besides, this is easily configurable globally.
Why isn't the with_fields_set feature enabled by default?¶
It's true that this feature has the small cost of adding a decorator everywhere. However, keeping the dataclass decorator allows IDEs/linters/type checkers/etc. to handle the class as such, so there is no need to develop a plugin for them. Standards compliance can be worth the additional decorator. (And its small overhead can be avoided when not useful.)