Data model¶
apischema handles every class/type you need.
By the way, it's done in an additive way, meaning that it doesn't affect your types.
PEP 585¶
With Python 3.9 and PEP 585, typing is substantially shaken up; all container types of the typing module are now deprecated.
apischema fully supports 3.9 and PEP 585, as shown in the different examples. However, typing containers can still be used, and must be when using an older Python version.
Dataclasses¶
Because the library aims to require minimal boilerplate, it's built on top of the standard library. Dataclasses are thus the core structure of the data model.
Dataclasses allow field customization beyond just a default value. In addition to the common parameters of dataclasses.field, customization is done with the metadata parameter; metadata can also be passed using PEP 593 typing.Annotated.
With some teasing of features presented later:
from dataclasses import dataclass, field
from typing import Annotated

from apischema import alias, schema
from apischema.metadata import required


@dataclass
class Foo:
    bar: int = field(
        default=0,
        metadata=alias("foo_bar") | schema(title="foo! bar!", min=0, max=42) | required,
    )
    baz: Annotated[
        int, alias("foo_baz"), schema(title="foo! baz!", min=0, max=32), required
    ] = 0
    # pipe `|` operator can also be used in Annotated
Note
Field's metadata are just an ordinary dict; apischema provides some functions to enrich these metadata with its own keys (alias("foo_bar") is roughly equivalent to {"_apischema_alias": "foo_bar"}) and use them when the time comes, but metadata are not reserved to apischema and other keys can be added.
Because PEP 584 is painfully missing before Python 3.9, apischema metadata use their own subclass of dict just to add the | operator for convenience in all Python versions.
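The idea described in this note can be sketched with the standard library alone. The `Metadata` class and the `alias_sketch`/`title_sketch` helpers below are hypothetical stand-ins, not apischema's actual implementation or key names; they only illustrate how a dict subclass can offer `|` on every Python version while still leaving room for keys from other libraries.

```python
from dataclasses import dataclass, field, fields


class Metadata(dict):
    """Hypothetical dict subclass that supports `|` even before Python 3.9."""

    def __or__(self, other):
        # Right-hand keys win, as with PEP 584 dict union
        return Metadata({**self, **other})

    def __ror__(self, other):
        return Metadata({**other, **self})


def alias_sketch(name: str) -> Metadata:
    # Hypothetical helper: stores the alias under a library-specific key
    return Metadata({"_sketch_alias": name})


def title_sketch(title: str) -> Metadata:
    # Hypothetical helper: stores a title under another library-specific key
    return Metadata({"_sketch_title": title})


@dataclass
class Foo:
    bar: int = field(
        default=0,
        # Keys from another library can be merged in with the same operator
        metadata=alias_sketch("foo_bar") | title_sketch("foo! bar!") | {"other_lib": True},
    )


bar_metadata = dict(fields(Foo)[0].metadata)
```

Because the result is still a plain mapping, `dataclasses.fields` exposes all merged keys side by side, whichever library added them.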
Dataclasses __post_init__ and field(init=False) are fully supported. Implications of using these features are documented in the relevant sections.
Warning
Before 3.8, InitVar performs type erasure, which is why apischema cannot retrieve the type information of init variables. To work around this, a field metadata init_var can be used to put back the type of the field (init_var also accepts stringified type annotations).
Dataclass-like types (attrs/SQLAlchemy/etc.) can also be supported with a few lines of code; see the "Dataclass-like types, aka object types" section below.
Standard library types¶
apischema natively handles most of the types provided by the standard library. They are sorted in the following categories:
Primitive¶
str, int, float, bool, None, subclasses of them
They correspond to JSON primitive types.
Collection¶
collections.abc.Collection (typing.Collection)
collections.abc.Sequence (typing.Sequence)
tuple (typing.Tuple)
collections.abc.MutableSequence (typing.MutableSequence)
list (typing.List)
collections.abc.Set (typing.AbstractSet)
collections.abc.MutableSet (typing.MutableSet)
frozenset (typing.FrozenSet)
set (typing.Set)
They correspond to JSON array and are serialized to list.
Mapping¶
collections.abc.Mapping (typing.Mapping)
collections.abc.MutableMapping (typing.MutableMapping)
dict (typing.Dict)
They correspond to JSON object and are serialized to dict.
Enumeration¶
enum.Enum subclasses, typing.Literal
Warning
Enum subclasses are (de)serialized using values, not names. apischema also provides a conversion to use names instead.
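The difference between values and names can be illustrated with the standard library alone. The helper functions below are hypothetical and only mimic the described behavior; they are not apischema's implementation or its name-based conversion.

```python
from enum import Enum


class Color(Enum):
    RED = "r"
    GREEN = "g"


def to_wire(member: Enum):
    # Value-based serialization: the member's value is what gets emitted
    return member.value


def from_wire(enum_cls, raw):
    # Value-based deserialization: calling the Enum class looks up by value
    return enum_cls(raw)


def from_wire_by_name(enum_cls, raw):
    # Name-based alternative: indexing the Enum class looks up by name
    return enum_cls[raw]


serialized = to_wire(Color.RED)
by_value = from_wire(Color, "g")
by_name = from_wire_by_name(Color, "RED")
```

With value-based handling, the payload contains "r" and "g"; the member names RED and GREEN never appear unless a name-based conversion is used.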
Typing facilities¶
typing.Optional/typing.Union (Optional[T] is strictly equivalent to Union[T, None]): Deserialization selects the first matching alternative; unsupported alternatives are ignored
tuple (typing.Tuple): Can be used as a collection as well as a true tuple, like tuple[str, int]
typing.NewType: Serialized according to its base type
typing.NamedTuple: Handled as an object type, roughly like a dataclass; fields metadata can be passed using Annotated
typing.TypedDict: Handled as an object type, but with a dictionary shape; fields metadata can be passed using Annotated
typing.Any: Untouched by deserialization, serialized according to the object runtime class
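A few of the statements in this list can be checked directly with the typing module. This is only a standard-library illustration of the underlying type facts (the `UserId` name is made up for the example); it does not show how apischema inspects types internally.

```python
from typing import NewType, Optional, Tuple, Union, get_args

# Optional[T] is the very same type as Union[T, None]
optional_is_union = Optional[int] == Union[int, None]
optional_args = get_args(Optional[int])

# A NewType keeps a reference to its base type, and its instances are
# plain instances of that base type at runtime
UserId = NewType("UserId", int)  # hypothetical example type
user_id = UserId(42)

# Tuple can describe a homogeneous collection or a fixed-shape "true" tuple
collection_args = get_args(Tuple[str, ...])
true_tuple_args = get_args(Tuple[str, int])
```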
Other standard library types¶
bytes: With str (de)serialization using base64 encoding
datetime.datetime, datetime.date, datetime.time: Supported only in 3.7+ with fromisoformat/isoformat
Decimal: With float (de)serialization
ipaddress.IPv4Address, ipaddress.IPv4Interface, ipaddress.IPv4Network, ipaddress.IPv6Address, ipaddress.IPv6Interface, ipaddress.IPv6Network, pathlib.Path, re.Pattern (typing.Pattern), uuid.UUID: With str (de)serialization
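The string forms mentioned in this list can be reproduced with the standard library. The sketch below only shows the round trips themselves; the use of the standard base64 alphabet is an assumption, since the text does not say which variant apischema uses.

```python
import base64
from datetime import datetime
from uuid import UUID

# bytes <-> str through base64 (standard alphabet assumed here)
raw = b"\x00\x01binary"
encoded = base64.b64encode(raw).decode("ascii")
decoded = base64.b64decode(encoded)

# datetime <-> str through isoformat/fromisoformat (Python 3.7+)
moment = datetime(2020, 1, 2, 3, 4, 5)
moment_text = moment.isoformat()
moment_back = datetime.fromisoformat(moment_text)

# UUID <-> str through str() and the UUID constructor
uid_text = "12345678-1234-5678-1234-567812345678"
uid = UUID(uid_text)
```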
Generic¶
typing.Generic can be used out of the box, as in the following example:
from dataclasses import dataclass
from typing import Generic, TypeVar

import pytest

from apischema import ValidationError, deserialize

T = TypeVar("T")


@dataclass
class Box(Generic[T]):
    content: T


assert deserialize(Box[str], {"content": "void"}) == Box("void")
with pytest.raises(ValidationError):
    deserialize(Box[str], {"content": 42})
Warning
Generic types don't have a default type name (used in JSON/GraphQL schema): should Group[Foo] be named GroupFoo, FooGroup, or something else? They therefore require by-class or default type_name assignment.
Recursive types, string annotations and PEP 563¶
Recursive classes can be typed as usual, with or without PEP 563. Here with string annotations:
from dataclasses import dataclass
from typing import Optional

from apischema import deserialize


@dataclass
class Node:
    value: int
    child: Optional["Node"] = None


assert deserialize(Node, {"value": 0, "child": {"value": 1}}) == Node(0, Node(1))
And here with PEP 563:
from __future__ import annotations

from dataclasses import dataclass

from apischema import deserialize


@dataclass
class Node:
    value: int
    child: Node | None = None


assert deserialize(Node, {"value": 0, "child": {"value": 1}}) == Node(0, Node(1))
Warning
To resolve annotations, apischema uses typing.get_type_hints; this doesn't work well when used on objects defined outside of the global scope.
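The limitation of typing.get_type_hints can be reproduced without apischema. In the sketch below (all names are made up for the example), a recursive dataclass is defined inside a function, so its own name is not visible in the module globals when the string annotation is resolved; passing the name explicitly through localns shows what information is missing.

```python
from dataclasses import dataclass
from typing import Optional, get_type_hints


def make_node_class():
    # The class name only exists in this function's local scope
    @dataclass
    class LocalNode:
        value: int
        child: Optional["LocalNode"] = None

    return LocalNode


node_cls = make_node_class()

try:
    get_type_hints(node_cls)
    resolved_without_help = True
except NameError:
    # "LocalNode" cannot be found in the module globals
    resolved_without_help = False

# Supplying the missing name by hand makes the resolution succeed
hints = get_type_hints(node_cls, localns={"LocalNode": node_cls})
```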
Warning (minor)
Currently, PEP 585 can behave surprisingly when used in less common ways; see bpo-41370.
null vs. undefined¶
Contrary to JavaScript, Python doesn't have an undefined equivalent (if we consider None to be the equivalent of null). But it can be useful to distinguish (especially when thinking about the HTTP PATCH method) between a null field and an undefined/absent field.
That's why apischema provides an Undefined constant (a single instance of the UndefinedType class) which can be used as a default value wherever this distinction is needed. In fact, default values are used when fields are absent, thus a default Undefined will mark the field as absent.
Dataclass/NamedTuple fields are ignored by serialization when Undefined.
from dataclasses import dataclass

from apischema import Undefined, UndefinedType, deserialize, serialize
from apischema.json_schema import deserialization_schema


@dataclass
class Foo:
    bar: int | UndefinedType = Undefined
    baz: int | UndefinedType | None = Undefined


assert deserialize(Foo, {"bar": 0, "baz": None}) == Foo(0, None)
assert deserialize(Foo, {}) == Foo(Undefined, Undefined)
assert serialize(Foo, Foo(Undefined, 42)) == {"baz": 42}
# Foo.bar and Foo.baz are not required
assert deserialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {"bar": {"type": "integer"}, "baz": {"type": ["integer", "null"]}},
    "additionalProperties": False,
}
Note
UndefinedType must only be used inside a Union, as it makes no sense as a standalone type. By the way, no suitable name was found to shorten Union[T, UndefinedType], but suggestions are welcome.
Note
Undefined is a falsy constant, i.e. bool(Undefined) is False.
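To make the PATCH motivation concrete, here is a standard-library sketch of the same idea. `MissingType`, `Missing` and `apply_patch` are hypothetical stand-ins written for this illustration; they are not apischema's Undefined or any part of its API.

```python
class MissingType:
    """Hypothetical single-instance, falsy marker for an absent value."""

    _instance = None

    def __new__(cls):
        # Always hand back the same instance so `is` comparisons work
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __bool__(self):
        return False

    def __repr__(self):
        return "Missing"


Missing = MissingType()


def apply_patch(stored: dict, *, nickname=Missing) -> dict:
    """Return a patched copy; an absent field is left alone, None resets it."""
    updated = dict(stored)
    if nickname is Missing:
        return updated
    updated["nickname"] = nickname
    return updated


user = {"name": "Ada", "nickname": "ada"}
untouched = apply_patch(user)
cleared = apply_patch(user, nickname=None)
```

Since both None and the marker are falsy, the distinction has to be made with an identity check (`is Missing`) rather than a truthiness test.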
Use None as if it was Undefined¶
Using None can be more convenient than Undefined as a placeholder for a missing value, but Optional types are translated to nullable fields.
That's why apischema provides the none_as_undefined metadata, allowing None to be handled as if it was Undefined: the type will not be nullable and the field will not be serialized if its value is None.
from dataclasses import dataclass, field

import pytest

from apischema import ValidationError, deserialize, serialize
from apischema.json_schema import deserialization_schema, serialization_schema
from apischema.metadata import none_as_undefined


@dataclass
class Foo:
    bar: str | None = field(default=None, metadata=none_as_undefined)


assert (
    deserialization_schema(Foo)
    == serialization_schema(Foo)
    == {
        "$schema": "http://json-schema.org/draft/2020-12/schema#",
        "type": "object",
        "properties": {"bar": {"type": "string"}},
        "additionalProperties": False,
    }
)
with pytest.raises(ValidationError):
    deserialize(Foo, {"bar": None})
assert serialize(Foo, Foo(None)) == {}
Annotated - PEP 593¶
PEP 593 is fully supported; annotations unknown to apischema are simply ignored.
Custom types¶
apischema can support almost all of your custom types in a few lines of code, using the conversion feature. However, it also provides a simple and direct way to support dataclass-like types, as presented below.
Otherwise, when apischema encounters a type that it doesn't support, an apischema.Unsupported exception will be raised.
Note
In the rare case when a union member should be ignored by apischema, it's possible to mark it as unsupported using Union[Foo, Annotated[Bar, Unsupported]].
Dataclass-like types, aka object types¶
Internally, apischema handles standard object types (dataclasses, named tuples and typed dictionaries) the same way, by mapping them to a set of apischema.objects.ObjectField, which has the following definition:
@dataclass(frozen=True)
class ObjectField:
    name: str  # field's name
    type: Any  # field's type
    required: bool = True  # if the field is required
    metadata: Mapping[str, Any] = field(default_factory=dict)  # field's metadata
    default: InitVar[Any] = ...  # field's default value
    default_factory: Optional[Callable[[], Any]] = None  # field's default factory
    kind: FieldKind = FieldKind.NORMAL  # NORMAL/READ_ONLY/WRITE_ONLY
Thus, support of dataclass-like types (attrs, SQLAlchemy traditional mappers, etc.) can be achieved by mapping the concerned class to its own list of ObjectFields; this is done using apischema.objects.set_object_fields.
from apischema import deserialize, serialize
from apischema.json_schema import deserialization_schema
from apischema.objects import ObjectField, set_object_fields


class Foo:
    def __init__(self, bar):
        self.bar = bar


set_object_fields(Foo, [ObjectField("bar", int)])
# Fields can also be passed in a factory
set_object_fields(Foo, lambda: [ObjectField("bar", int)])
foo = deserialize(Foo, {"bar": 0})
assert isinstance(foo, Foo) and foo.bar == 0
assert serialize(Foo, Foo(0)) == {"bar": 0}
assert deserialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {"bar": {"type": "integer"}},
    "required": ["bar"],
    "additionalProperties": False,
}
Another way to set object fields is to directly modify apischema default behavior, using apischema.settings.default_object_fields.
Note
set_object_fields/settings.default_object_fields can be used to override existing fields. Current fields can be retrieved using apischema.objects.object_fields.
from collections.abc import Sequence
from typing import Optional

from apischema import settings
from apischema.objects import ObjectField

# Keep a reference to the current default so it can be used as a fallback
previous_default_object_fields = settings.default_object_fields


def default_object_fields(cls) -> Optional[Sequence[ObjectField]]:
    return [...] if ... else previous_default_object_fields(cls)


settings.default_object_fields = default_object_fields
Note
Almost every default behavior of apischema can be customized using apischema.settings.
Examples of SQLAlchemy support and attrs support illustrate both methods (which could also be combined).
Skip field¶
Dataclass fields can be excluded from apischema processing by using apischema.metadata.skip in the field metadata. It can be parametrized with deserialization/serialization boolean parameters to skip a field only for the given operations.
from dataclasses import dataclass, field
from typing import Any

from apischema.json_schema import deserialization_schema, serialization_schema
from apischema.metadata import skip


@dataclass
class Foo:
    bar: Any
    deserialization_only: Any = field(metadata=skip(serialization=True))
    serialization_only: Any = field(default=None, metadata=skip(deserialization=True))
    baz: Any = field(default=None, metadata=skip)


assert deserialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {"bar": {}, "deserialization_only": {}},
    "required": ["bar", "deserialization_only"],
    "additionalProperties": False,
}
assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {"bar": {}, "serialization_only": {}},
    "required": ["bar", "serialization_only"],
    "additionalProperties": False,
}
Note
Fields skipped in deserialization should have a default value if the class is deserialized, because instantiating the class could fail otherwise.
Skip field serialization depending on condition¶
Fields can also be skipped when serializing, depending on the condition given by serialization_if, or when the field value is equal to its default value with serialization_default=True.
from dataclasses import dataclass, field
from typing import Any

from apischema import serialize
from apischema.metadata import skip


@dataclass
class Foo:
    bar: Any = field(metadata=skip(serialization_if=lambda x: not x))
    baz: Any = field(default_factory=list, metadata=skip(serialization_default=True))


assert serialize(Foo(False, [])) == {}
Composition over inheritance - composed dataclasses flattening¶
Dataclass fields which are themselves dataclasses can be "flattened" into the owning one by using the flatten metadata. Then, when the class is (de)serialized, "flattened" fields will be (de)serialized at the same level as the owning class.
from dataclasses import dataclass, field

from apischema import Undefined, UndefinedType, alias, deserialize, serialize
from apischema.fields import with_fields_set
from apischema.json_schema import deserialization_schema
from apischema.metadata import flatten


@dataclass
class JsonSchema:
    title: str | UndefinedType = Undefined
    description: str | UndefinedType = Undefined
    format: str | UndefinedType = Undefined
    ...


@with_fields_set
@dataclass
class RootJsonSchema:
    schema: str | UndefinedType = field(default=Undefined, metadata=alias("$schema"))
    defs: list[JsonSchema] = field(default_factory=list, metadata=alias("$defs"))
    # This field schema is flattened inside the owning one
    json_schema: JsonSchema = field(default_factory=JsonSchema, metadata=flatten)


data = {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "title": "flattened example",
}
root_schema = RootJsonSchema(
    schema="http://json-schema.org/draft/2020-12/schema#",
    json_schema=JsonSchema(title="flattened example"),
)
assert deserialize(RootJsonSchema, data) == root_schema
assert serialize(RootJsonSchema, root_schema) == data
assert deserialization_schema(RootJsonSchema) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$defs": {
        "JsonSchema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "description": {"type": "string"},
                "format": {"type": "string"},
            },
            "additionalProperties": False,
        }
    },
    # It results in allOf + unevaluatedProperties=False
    "allOf": [
        # RootJsonSchema (without JsonSchema)
        {
            "type": "object",
            "properties": {
                "$schema": {"type": "string"},
                "$defs": {
                    "type": "array",
                    "items": {"$ref": "#/$defs/JsonSchema"},
                    "default": [],
                },
            },
            "additionalProperties": False,
        },
        # JsonSchema
        {"$ref": "#/$defs/JsonSchema"},
    ],
    "unevaluatedProperties": False,
}
Note
Generated JSON schema uses the unevaluatedProperties keyword.
This feature is very convenient for building models by composing smaller components. Although some kind of reuse could also be achieved with inheritance, it can be less practical in code, because there is no easy way to build an instance of an inherited class from an instance of the superclass; you have to copy all the fields by hand. With composition (of flattened fields), on the other hand, it's easy to instantiate the class, since the smaller component is just one of its fields.
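The practical difference can be seen with plain dataclasses. The classes below are made up for this illustration and do not use apischema; they only contrast how an existing component instance is reused under inheritance and under composition.

```python
from dataclasses import dataclass, fields


@dataclass
class Address:
    street: str
    city: str


# Reuse through inheritance: Address fields become Customer fields
@dataclass
class CustomerByInheritance(Address):
    name: str


# Reuse through composition: the Address is just one field
@dataclass
class CustomerByComposition:
    name: str
    address: Address


address = Address("1 Main St", "Springfield")

# Inheritance: every field of the existing instance has to be copied over
inherited = CustomerByInheritance(
    **{f.name: getattr(address, f.name) for f in fields(address)},
    name="Ada",
)

# Composition: the existing instance is passed as is
composed = CustomerByComposition(name="Ada", address=address)
```

With flattening, the composed form can still be (de)serialized at a single level, so the choice of composition need not change the serialized shape.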
FAQ¶
Why isn't Iterable handled with other collection types?¶
Iterable could be handled (actually, it was at the beginning); however, this doesn't really make sense from a data point of view. Iterables are computation objects, they can be infinite, etc. They don't correspond to serialized data; Collection is way more appropriate in this context.
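The distinction is easy to check with the abstract base classes of the standard library; the variable names below are only for illustration.

```python
from collections.abc import Collection, Iterable
from itertools import count

# An endless counter is iterable, but it has no length and is never exhausted
endless = count()
endless_is_iterable = isinstance(endless, Iterable)
endless_is_collection = isinstance(endless, Collection)

# A generator is a computation too: it can only be consumed once
lazy = (n * n for n in range(3))
lazy_is_collection = isinstance(lazy, Collection)

# A list is sized, iterable and supports membership tests: it is real data
concrete_is_collection = isinstance([0, 1, 4], Collection)
```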
What happens if I override dataclass __init__?¶
apischema always assumes that dataclass __init__ can be called with all its fields as keyword arguments. If that's no longer the case after a modification of __init__ (that is, if an exception is raised when the constructor is called because of bad parameters), apischema then treats the class as not supported.
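The assumption can be illustrated with plain dataclasses. The classes and the `accepts_fields_as_kwargs` helper below are hypothetical; how apischema itself detects an incompatible constructor is not described here, so this only shows which overrides keep the keyword-argument contract and which break it.

```python
from dataclasses import dataclass


@dataclass
class Compatible:
    x: int
    y: int

    # Custom __init__ that still accepts every field as a keyword argument
    def __init__(self, x: int, y: int = 0):
        self.x, self.y = x, y


@dataclass
class Incompatible:
    x: int
    y: int

    # Custom __init__ whose parameters no longer match the fields
    def __init__(self, pair):
        self.x, self.y = pair


def accepts_fields_as_kwargs(cls, **values) -> bool:
    """Hypothetical check: can the class be built from its fields as kwargs?"""
    try:
        cls(**values)
    except TypeError:
        return False
    return True


compatible_ok = accepts_fields_as_kwargs(Compatible, x=1, y=2)
incompatible_ok = accepts_fields_as_kwargs(Incompatible, x=1, y=2)
```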