Conversions – (de)serialization customization¶
apischema covers the majority of standard data types, but of course that's not enough, which is why it lets you add support for all your classes and the libraries you use.
Actually, apischema itself uses this conversion feature to provide basic support for standard library data types like UUID/datetime/etc. (see std_types.py).
ORM support can easily be achieved with this feature (see the SQLAlchemy example).
In fact, you can even add support for competitor libraries like Pydantic (see the Pydantic compatibility example).
Principle - apischema conversions¶
An apischema conversion is composed of a source type, let's call it Source, a target type Target, and a converter function with signature (Source) -> Target.
When a class (actually, a non-builtin class, so not int/list/etc.) is deserialized, apischema will check if there is a conversion where this type is the target. If there is, the source type of the conversion is deserialized, then the converter is applied to get an object of the expected type. Serialization works the same way, but inverted: look for a conversion where the type is the source, apply the converter, and get the target type.
Conversions are also handled in schema generation: for a deserialization schema, the source schema is merged into the target schema, while for a serialization schema, the target schema is merged into the source schema.
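This lookup-then-convert mechanism can be sketched in plain Python (a simplified illustration only, not apischema's actual implementation; the registry and helper names are made up):

```python
from dataclasses import dataclass

# Hypothetical registry: target type -> (source type, converter)
DESERIALIZERS: dict = {}

def register_deserializer(source, target, converter):
    DESERIALIZERS[target] = (source, converter)

def deserialize_sketch(tp, data):
    # If a conversion targets tp, deserialize its source first, then convert
    if tp in DESERIALIZERS:
        source, converter = DESERIALIZERS[tp]
        return converter(deserialize_sketch(source, data))
    # Builtin types are deserialized directly
    assert isinstance(data, tp), f"expected {tp.__name__}"
    return data

@dataclass
class Point:
    x: int
    y: int

register_deserializer(str, Point, lambda s: Point(*map(int, s.split(","))))
assert deserialize_sketch(Point, "1,2") == Point(1, 2)
```

The real library does the same composition, but driven by registered converter signatures instead of an explicit registry call.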
Register a conversion¶
Conversions are registered using apischema.deserializer/apischema.serializer for deserialization/serialization respectively.
When used as a function decorator, the Source/Target types are directly extracted from the conversion function signature.
serializer can also be called on methods/properties, in which case the Source type is inferred to be the owning type.
from dataclasses import dataclass
from apischema import deserialize, schema, serialize
from apischema.conversions import deserializer, serializer
from apischema.json_schema import deserialization_schema, serialization_schema
@schema(pattern=r"^#[0-9a-fA-F]{6}$")
@dataclass
class RGB:
red: int
green: int
blue: int
@serializer
@property
def hexa(self) -> str:
return f"#{self.red:02x}{self.green:02x}{self.blue:02x}"
# serializer can also be called with methods/properties outside of the class
# For example, `serializer(RGB.hexa)` would have the same effect as the decorator above
@deserializer
def from_hexa(hexa: str) -> RGB:
return RGB(int(hexa[1:3], 16), int(hexa[3:5], 16), int(hexa[5:7], 16))
assert deserialize(RGB, "#000000") == RGB(0, 0, 0)
assert serialize(RGB, RGB(0, 0, 42)) == "#00002a"
assert (
deserialization_schema(RGB)
== serialization_schema(RGB)
== {
"$schema": "http://json-schema.org/draft/2020-12/schema#",
"type": "string",
"pattern": "^#[0-9a-fA-F]{6}$",
}
)
Warning
(De)serializer methods cannot be used with typing.NamedTuple; in fact, apischema uses the __set_name__ magic method, but it is not called on NamedTuple subclass fields.
Multiple deserializers¶
Sometimes, you want several ways to deserialize a type. While it's possible to register a deserializer with a Union parameter, it's not very practical. That's why apischema makes it possible to register several deserializers for the same type. They are handled with a Union source type (ordered by deserializer registration), with the right deserializer selected according to the matching alternative.
from dataclasses import dataclass
from apischema import deserialize, deserializer
from apischema.json_schema import deserialization_schema
@dataclass
class Expression:
value: int
@deserializer
def evaluate_expression(expr: str) -> Expression:
return Expression(int(eval(expr)))
# Could be shortened to deserializer(Expression), because a class is callable too
@deserializer
def expression_from_value(value: int) -> Expression:
return Expression(value)
assert deserialization_schema(Expression) == {
"$schema": "http://json-schema.org/draft/2020-12/schema#",
"type": ["string", "integer"],
}
assert deserialize(Expression, 0) == deserialize(Expression, "1 - 1") == Expression(0)
On the other hand, serializer registration overwrites the previous registration, if any.
apischema.conversions.reset_deserializers/apischema.conversions.reset_serializers can be used to reset the (de)serializers of a type (even those of the standard types embedded in apischema).
Inheritance¶
All serializers are naturally inherited. In fact, with a conversion function (Source) -> Target, you can always pass a subtype of Source and get a Target in return.
Moreover, when the serializer is a method/property, overriding this method/property in a subclass also overrides the inherited serializer.
from apischema import serialize, serializer
class Foo:
pass
@serializer
def serialize_foo(foo: Foo) -> int:
return 0
class Foo2(Foo):
pass
# Serializer is inherited
assert serialize(Foo, Foo()) == serialize(Foo2, Foo2()) == 0
class Bar:
@serializer
def serialize(self) -> int:
return 0
class Bar2(Bar):
def serialize(self) -> int:
return 1
# Serializer is inherited and overridden
assert serialize(Bar, Bar()) == 0 != serialize(Bar2, Bar2()) == 1
Note
Inheritance can also be toggled off in specific cases, like in the Class as union of its subclasses example.
On the other hand, deserializers cannot be inherited, because the same Source passed to a conversion function (Source) -> Target will always give the same Target (not ensured to be the desired subtype).
Note
Pseudo-inheritance could be achieved by registering a conversion (using for example a classmethod) for each subclass in the __init_subclass__ method (or a metaclass), or by using __subclasses__; see example
Generic conversions¶
Generic conversions are supported out of the box.
from typing import Generic, TypeVar
import pytest
from apischema import ValidationError, deserialize, serialize
from apischema.conversions import deserializer, serializer
from apischema.json_schema import deserialization_schema, serialization_schema
T = TypeVar("T")
class Wrapper(Generic[T]):
def __init__(self, wrapped: T):
self.wrapped = wrapped
@serializer
def unwrap(self) -> T:
return self.wrapped
# Wrapper constructor can be used as a function too (so deserializer could work as decorator)
deserializer(Wrapper)
assert deserialize(Wrapper[list[int]], [0, 1]).wrapped == [0, 1]
with pytest.raises(ValidationError):
deserialize(Wrapper[int], "wrapped")
assert serialize(Wrapper[str], Wrapper("wrapped")) == "wrapped"
assert (
deserialization_schema(Wrapper[int])
== {"$schema": "http://json-schema.org/draft/2020-12/schema#", "type": "integer"}
== serialization_schema(Wrapper[int])
)
However, you're not allowed to register a conversion for a specialized generic type, like Foo[int].
Conversion object¶
In the previous examples, conversions were registered using only converter functions. However, registration can also be done by passing an apischema.conversions.Conversion instance. It allows specifying additional conversion metadata (see the next sections for examples) and making the converter source/target explicit when annotations are not available.
from base64 import b64decode
from apischema import deserialize, deserializer
from apischema.conversions import Conversion
deserializer(Conversion(b64decode, source=str, target=bytes))
# Roughly equivalent to:
# def decode_bytes(source: str) -> bytes:
# return b64decode(source)
# but saving a function call
assert deserialize(bytes, "Zm9v") == b"foo"
Dynamic conversions — select conversions at runtime¶
Whether or not a conversion is registered for a given type, conversions can also be provided at runtime, using the conversion parameter of deserialize/serialize/deserialization_schema/serialization_schema.
import os
import time
from dataclasses import dataclass
from datetime import datetime
from typing import Annotated
from apischema import deserialize, serialize
from apischema.metadata import conversion
# Set UTC timezone for example
os.environ["TZ"] = "UTC"
time.tzset()
def datetime_from_timestamp(timestamp: int) -> datetime:
return datetime.fromtimestamp(timestamp)
date = datetime(2017, 9, 2)
assert deserialize(datetime, 1504310400, conversion=datetime_from_timestamp) == date
@dataclass
class Foo:
bar: int
baz: int
def sum(self) -> int:
return self.bar + self.baz
@property
def diff(self) -> int:
return int(self.bar - self.baz)
assert serialize(Foo, Foo(0, 1)) == {"bar": 0, "baz": 1}
assert serialize(Foo, Foo(0, 1), conversion=Foo.sum) == 1
assert serialize(Foo, Foo(0, 1), conversion=Foo.diff) == -1
# conversions can be specified using Annotated
assert serialize(Annotated[Foo, conversion(serialization=Foo.sum)], Foo(0, 1)) == 1
Note
For definitions_schema, conversions can be added with their types by using a tuple, for example definitions_schema(serialization=[(list[Foo], foo_to_bar)]).
The conversion parameter can also take a tuple of conversions: when you have a Union, a tuple, or when you want several deserializations for the same type.
Dynamic conversions are local¶
Dynamic conversions are discarded after having been applied (or after a class without conversion has been encountered). For example, you can't apply a dynamic conversion directly to a dataclass field when calling serialize on an instance of this dataclass. The reasons for this design are detailed in the FAQ.
import os
import time
from dataclasses import dataclass
from datetime import datetime
from apischema import serialize
# Set UTC timezone for example
os.environ["TZ"] = "UTC"
time.tzset()
def to_timestamp(d: datetime) -> int:
return int(d.timestamp())
@dataclass
class Foo:
bar: datetime
# timestamp conversion is not applied on Foo field because it's discarded
# when encountering Foo
assert serialize(Foo, Foo(datetime(2019, 10, 13)), conversion=to_timestamp) == {
"bar": "2019-10-13T00:00:00"
}
# timestamp conversion is applied on every member of list
assert serialize(list[datetime], [datetime(1970, 1, 1)], conversion=to_timestamp) == [0]
Note
Dynamic conversion is not discarded when the encountered type is a container (list, dict, Collection, etc., or Union) or a registered conversion from/to a container; the dynamic conversion can then apply to the container elements.
Dynamic conversions interact with type_name
¶
Dynamic conversions are applied before looking for a ref registered with type_name.
from dataclasses import dataclass
from apischema import type_name
from apischema.json_schema import serialization_schema
@dataclass
class Foo:
pass
@dataclass
class Bar:
pass
def foo_to_bar(_: Foo) -> Bar:
return Bar()
type_name("Bars")(list[Bar])
assert serialization_schema(list[Foo], conversion=foo_to_bar, all_refs=True) == {
"$schema": "http://json-schema.org/draft/2020-12/schema#",
"$ref": "#/$defs/Bars",
"$defs": {
# Bars is present because `list[Foo]` is dynamically converted to `list[Bar]`
"Bars": {"type": "array", "items": {"$ref": "#/$defs/Bar"}},
"Bar": {"type": "object", "additionalProperties": False},
},
}
Bypass registered conversion¶
Using apischema.identity as a dynamic conversion allows you to bypass a registered conversion, i.e. to (de)serialize the given type as it would be if no conversion were registered.
from dataclasses import dataclass
from apischema import identity, serialize, serializer
from apischema.conversions import Conversion
@dataclass
class RGB:
red: int
green: int
blue: int
@serializer
@property
def hexa(self) -> str:
return f"#{self.red:02x}{self.green:02x}{self.blue:02x}"
assert serialize(RGB, RGB(0, 0, 0)) == "#000000"
# dynamic conversion used to bypass the registered one
assert serialize(RGB, RGB(0, 0, 0), conversion=identity) == {
"red": 0,
"green": 0,
"blue": 0,
}
# Expanded bypass form
assert serialize(
RGB, RGB(0, 0, 0), conversion=Conversion(identity, source=RGB, target=RGB)
) == {"red": 0, "green": 0, "blue": 0}
Note
For a more precise selection of the bypassed conversion, for a tuple or Union member for example, it's possible to pass the concerned class as both the source and the target of a conversion with the identity converter, as shown in the example.
Liskov substitution principle¶
LSP is taken into account when applying dynamic conversion: the serializer source can be a subclass of the actual class and the deserializer target can be a superclass of the actual class.
from dataclasses import dataclass
from apischema import deserialize, serialize
@dataclass
class Foo:
field: int
@dataclass
class Bar(Foo):
other: str
def foo_to_int(foo: Foo) -> int:
return foo.field
def bar_from_int(i: int) -> Bar:
return Bar(i, str(i))
assert serialize(Bar, Bar(0, ""), conversion=foo_to_int) == 0
assert deserialize(Foo, 0, conversion=bar_from_int) == Bar(0, "0")
Generic dynamic conversions¶
Generic dynamic conversions are supported out of the box. Also, contrary to registered conversions, partially specialized generics are allowed.
from collections.abc import Mapping, Sequence
from operator import itemgetter
from typing import TypeVar
from apischema import serialize
from apischema.json_schema import serialization_schema
T = TypeVar("T")
Priority = int
def sort_by_priority(values_with_priority: Mapping[T, Priority]) -> Sequence[T]:
return [k for k, _ in sorted(values_with_priority.items(), key=itemgetter(1))]
assert serialize(
dict[str, Priority], {"a": 1, "b": 0}, conversion=sort_by_priority
) == ["b", "a"]
assert serialization_schema(dict[str, Priority], conversion=sort_by_priority) == {
"$schema": "http://json-schema.org/draft/2020-12/schema#",
"type": "array",
"items": {"type": "string"},
}
Field conversions¶
It is possible to register a conversion for a particular dataclass field using the conversion metadata.
import os
import time
from dataclasses import dataclass, field
from datetime import datetime
from apischema import deserialize, serialize
from apischema.conversions import Conversion
from apischema.metadata import conversion
# Set UTC timezone for example
os.environ["TZ"] = "UTC"
time.tzset()
from_timestamp = Conversion(datetime.fromtimestamp, source=int, target=datetime)
def to_timestamp(d: datetime) -> int:
return int(d.timestamp())
@dataclass
class Foo:
some_date: datetime = field(metadata=conversion(from_timestamp, to_timestamp))
other_date: datetime
assert deserialize(Foo, {"some_date": 0, "other_date": "2019-10-13"}) == Foo(
datetime(1970, 1, 1), datetime(2019, 10, 13)
)
assert serialize(Foo, Foo(datetime(1970, 1, 1), datetime(2019, 10, 13))) == {
"some_date": 0,
"other_date": "2019-10-13T00:00:00",
}
Note
It's possible to pass a conversion only for deserialization or only for serialization.
Serialized method conversions¶
Serialized methods can also have dedicated conversions for their return value:
import os
import time
from dataclasses import dataclass
from datetime import datetime
from apischema import serialize, serialized
# Set UTC timezone for example
os.environ["TZ"] = "UTC"
time.tzset()
def to_timestamp(d: datetime) -> int:
return int(d.timestamp())
@dataclass
class Foo:
@serialized(conversion=to_timestamp)
def some_date(self) -> datetime:
return datetime(1970, 1, 1)
assert serialize(Foo, Foo()) == {"some_date": 0}
Default conversions¶
As with almost every default behavior in apischema, default conversions can be configured using apischema.settings.deserialization.default_conversion/apischema.settings.serialization.default_conversion. The initial values of these settings are the functions which retrieve the conversions registered with deserializer/serializer.
Other settings can be customized in the same way; for example, attrs classes can be supported by overriding settings.default_object_fields:
from typing import Sequence
import attrs
from apischema import deserialize, serialize, settings
from apischema.json_schema import deserialization_schema
from apischema.objects import ObjectField
prev_default_object_fields = settings.default_object_fields
def attrs_fields(cls: type) -> Sequence[ObjectField] | None:
if hasattr(cls, "__attrs_attrs__"):
return [
ObjectField(
a.name, a.type, required=a.default == attrs.NOTHING, default=a.default
)
for a in getattr(cls, "__attrs_attrs__")
]
else:
return prev_default_object_fields(cls)
settings.default_object_fields = attrs_fields
@attrs.define
class Foo:
bar: int
assert deserialize(Foo, {"bar": 0}) == Foo(0)
assert serialize(Foo, Foo(0)) == {"bar": 0}
assert deserialization_schema(Foo) == {
"$schema": "http://json-schema.org/draft/2020-12/schema#",
"type": "object",
"properties": {"bar": {"type": "integer"}},
"required": ["bar"],
"additionalProperties": False,
}
apischema functions (deserialize/serialize/deserialization_schema/serialization_schema/definitions_schema) also have a default_conversion parameter to dynamically modify default conversions. See the FAQ for the difference between the conversion and default_conversion parameters.
Sub-conversions¶
Sub-conversions are dynamic conversions applied on the result of a conversion.
from dataclasses import dataclass
from typing import Generic, TypeVar
from apischema.conversions import Conversion
from apischema.json_schema import serialization_schema
T = TypeVar("T")
class Query(Generic[T]):
...
def query_to_list(q: Query[T]) -> list[T]:
...
def query_to_scalar(q: Query[T]) -> T | None:
...
@dataclass
class FooModel:
bar: int
class Foo:
def serialize(self) -> FooModel:
...
assert serialization_schema(
Query[Foo], conversion=Conversion(query_to_list, sub_conversion=Foo.serialize)
) == {
# We get an array of Foo
"type": "array",
"items": {
"type": "object",
"properties": {"bar": {"type": "integer"}},
"required": ["bar"],
"additionalProperties": False,
},
"$schema": "http://json-schema.org/draft/2020-12/schema#",
}
Sub-conversions can also be used to bypass registered conversions or to define recursive conversions.
Lazy/recursive conversions¶
Conversions can be defined lazily, i.e. using a function returning a Conversion (a single one, or a tuple of them); this function must be wrapped in an apischema.conversions.LazyConversion instance.
This allows creating recursive conversions, or using a conversion object which can be modified after its definition (for example, a conversion for a base class modified by __init_subclass__).
It is used by apischema itself for the generated JSON schema: it is indeed recursive data, and the different versions are handled by a conversion with a lazy recursive sub-conversion.
from dataclasses import dataclass
from apischema import serialize
from apischema.conversions import Conversion, LazyConversion
@dataclass
class Foo:
elements: list["int | Foo"]
def foo_elements(foo: Foo) -> list[int | Foo]:
return foo.elements
# Recursive conversion pattern
tmp = None
conversion = Conversion(foo_elements, sub_conversion=LazyConversion(lambda: tmp))
tmp = conversion
assert serialize(Foo, Foo([0, Foo([1])]), conversion=conversion) == [0, [1]]
# Without the recursive sub-conversion, it would have been:
assert serialize(Foo, Foo([0, Foo([1])]), conversion=foo_elements) == [
0,
{"elements": [1]},
]
Lazy registered conversions¶
Lazy conversions can also be registered, but the deserialization target/serialization source has to be passed too.
from dataclasses import dataclass
from apischema import deserialize, deserializer, serialize, serializer
from apischema.conversions import Conversion
@dataclass
class Foo:
bar: int
deserializer(
lazy=lambda: Conversion(lambda bar: Foo(bar), source=int, target=Foo), target=Foo
)
serializer(
lazy=lambda: Conversion(lambda foo: foo.bar, source=Foo, target=int), source=Foo
)
assert deserialize(Foo, 0) == Foo(0)
assert serialize(Foo, Foo(0)) == 0
Conversion helpers¶
String conversions¶
A common pattern of conversion concerns classes that have a string constructor and a __str__ method, for example the standard types uuid.UUID, pathlib.Path, or ipaddress.IPv4Address. Using apischema.conversions.as_str will register a string deserializer from the constructor and a string serializer from the __str__ method. A ValueError raised by the constructor is caught and converted to a ValidationError.
import bson
import pytest
from apischema import Unsupported, deserialize, serialize
from apischema.conversions import as_str
with pytest.raises(Unsupported):
deserialize(bson.ObjectId, "0123456789ab0123456789ab")
with pytest.raises(Unsupported):
serialize(bson.ObjectId, bson.ObjectId("0123456789ab0123456789ab"))
as_str(bson.ObjectId)
assert deserialize(bson.ObjectId, "0123456789ab0123456789ab") == bson.ObjectId(
"0123456789ab0123456789ab"
)
assert (
serialize(bson.ObjectId, bson.ObjectId("0123456789ab0123456789ab"))
== "0123456789ab0123456789ab"
)
Note
The previously mentioned standard types are handled by apischema using as_str.
ValueError catching¶
Converters can be wrapped with apischema.conversions.catch_value_error in order to catch ValueError and reraise it as a ValidationError. It's notably used by as_str and other standard types.
Note
This wrapper is in fact inlined in deserialization, so it has better performance than writing the try-except in the converter code.
Use Enum names¶
Enum subclasses are (de)serialized using their values. However, you may want to use the enumeration names instead; that's why apischema provides apischema.conversions.as_names to decorate Enum subclasses.
from enum import Enum
from apischema import deserialize, serialize
from apischema.conversions import as_names
from apischema.json_schema import deserialization_schema, serialization_schema
@as_names
class MyEnum(Enum):
FOO = object()
BAR = object()
assert deserialize(MyEnum, "FOO") == MyEnum.FOO
assert serialize(MyEnum, MyEnum.FOO) == "FOO"
assert (
deserialization_schema(MyEnum)
== serialization_schema(MyEnum)
== {
"$schema": "http://json-schema.org/draft/2020-12/schema#",
"type": "string",
"enum": ["FOO", "BAR"],
}
)
Class as union of its subclasses¶
Object deserialization — transform function into a dataclass deserializer¶
apischema.objects.object_deserialization can convert a function into a new function taking a unique parameter: a dataclass whose fields are mapped from the original function's parameters.
It can be used, for example, to build a deserialization conversion from an alternative constructor.
from apischema import deserialize, deserializer, type_name
from apischema.json_schema import deserialization_schema
from apischema.objects import object_deserialization
def create_range(start: int, stop: int, step: int = 1) -> range:
return range(start, stop, step)
range_conv = object_deserialization(create_range, type_name("Range"))
# Conversion can be registered
deserializer(range_conv)
assert deserialize(range, {"start": 0, "stop": 10}) == range(0, 10)
assert deserialization_schema(range) == {
"$schema": "http://json-schema.org/draft/2020-12/schema#",
"type": "object",
"properties": {
"start": {"type": "integer"},
"stop": {"type": "integer"},
"step": {"type": "integer", "default": 1},
},
"required": ["start", "stop"],
"additionalProperties": False,
}
Note
Parameter metadata can be specified using typing.Annotated, or passed with the parameters_metadata parameter, a mapping with parameter names as keys and the mapped metadata as values.
Object serialization — select only a subset of fields¶
apischema.objects.object_serialization can be used to serialize only a subset of an object's fields and methods.
from dataclasses import dataclass
from typing import Any
from apischema import alias, serialize, type_name
from apischema.json_schema import JsonSchemaVersion, definitions_schema
from apischema.objects import get_field, object_serialization
@dataclass
class Data:
id: int
content: str
@property
def size(self) -> int:
return len(self.content)
def get_details(self) -> Any:
...
# Serialization fields can be a str/field or a function/method/property
size_only = object_serialization(
Data, [get_field(Data).id, Data.size], type_name("DataSize")
)
# ["id", Data.size] would also work
def complete_data():
return [
..., # shortcut to include all the fields
Data.size,
(Data.get_details, alias("details")), # add/override metadata using tuple
]
# Serialization fields computation can be deferred in a function
# The serialization name will then be defaulted to the function name
complete = object_serialization(Data, complete_data)
data = Data(0, "data")
assert serialize(Data, data, conversion=size_only) == {"id": 0, "size": 4}
assert serialize(Data, data, conversion=complete) == {
"id": 0,
"content": "data",
"size": 4,
"details": None, # because get_details return None in this example
}
assert definitions_schema(
serialization=[(Data, size_only), (Data, complete)],
version=JsonSchemaVersion.OPEN_API_3_0,
) == {
"DataSize": {
"type": "object",
"properties": {"id": {"type": "integer"}, "size": {"type": "integer"}},
"required": ["id", "size"],
"additionalProperties": False,
},
"CompleteData": {
"type": "object",
"properties": {
"id": {"type": "integer"},
"content": {"type": "string"},
"size": {"type": "integer"},
"details": {},
},
"required": ["id", "content", "size", "details"],
"additionalProperties": False,
},
}
FAQ¶
What's the difference between conversion and default_conversion parameters?¶
Dynamic conversions (the conversion parameter) exist to ensure consistency and reuse of the subschemas referenced (with a $ref) in the JSON/OpenAPI schema.
In fact, different global conversions (the default_conversion parameter) could lead to a field having different schemas depending on the global conversions in use, so a class could not be referenced consistently. Because dynamic conversions are local, they cannot mess with an object field's schema.
Schema generation uses the same default conversions for all definitions (each of which can have an associated dynamic conversion).
The default_conversion parameter allows having different (de)serialization contexts, for example to map date to string between frontend and backend, and to timestamp between backend services.