JSON schema¶
JSON schema generation¶
JSON schema can be generated from the data model. However, because of all the possible customizations, the schema can differ between deserialization and serialization. In common cases, deserialization_schema and serialization_schema will give the same result.
from dataclasses import dataclass

from apischema.json_schema import deserialization_schema, serialization_schema


@dataclass
class Foo:
    bar: str


assert deserialization_schema(Foo) == serialization_schema(Foo)
assert deserialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "additionalProperties": False,
    "properties": {"bar": {"type": "string"}},
    "required": ["bar"],
    "type": "object",
}
Field alias¶
Sometimes a dataclass field name clashes with a language keyword, or the property name is simply not convenient. Fortunately, a field can define an alias, which will be used in the schema and in deserialization/serialization.
from dataclasses import dataclass, field

from apischema import alias, deserialize, serialize
from apischema.json_schema import deserialization_schema


@dataclass
class Foo:
    class_: str = field(metadata=alias("class"))


assert deserialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "additionalProperties": False,
    "properties": {"class": {"type": "string"}},
    "required": ["class"],
    "type": "object",
}
assert deserialize(Foo, {"class": "bar"}) == Foo("bar")
assert serialize(Foo, Foo("bar")) == {"class": "bar"}
Alias all fields¶
Field aliasing can also be done at the class level by specifying an aliasing function. This aliaser is applied to the field alias if defined, or to the field name otherwise; it is not applied when override=False is specified.
from dataclasses import dataclass, field
from typing import Any

from apischema import alias
from apischema.json_schema import deserialization_schema


@alias(lambda s: f"foo_{s}")
@dataclass
class Foo:
    field1: Any
    field2: Any = field(metadata=alias(override=False))
    field3: Any = field(metadata=alias("field03"))
    field4: Any = field(metadata=alias("field04", override=False))


assert deserialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "additionalProperties": False,
    "properties": {"foo_field1": {}, "field2": {}, "foo_field03": {}, "field04": {}},
    "required": ["foo_field1", "field2", "foo_field03", "field04"],
    "type": "object",
}
Class-level aliasing can be used to define a camelCase API.
Dynamic aliasing and default aliaser¶
apischema operations deserialize/serialize/deserialization_schema/serialization_schema provide an aliaser parameter which is applied to every field processed in the operation. Similar to the strictness configuration, this parameter has a default value controlled by apischema.settings.aliaser. It can be used, for example, to make a whole application use camelCase. Actually, there is a shortcut for that:
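A sketch of that shortcut (assuming the settings.camel_case flag, which installs a camelCase aliaser globally):
from apischema import settings

# make every field alias camelCase by default (assumed shortcut flag)
settings.camel_case = True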
Otherwise, it's used the same way as settings.coercer.
Note
Dynamic aliaser ignores override=False
Schema annotations¶
Type annotations are not enough to express a complete schema, but apischema has a function for that: schema can be used both as a type decorator and as field metadata.
from dataclasses import dataclass, field
from typing import NewType

from apischema import schema
from apischema.json_schema import deserialization_schema

Tag = NewType("Tag", str)
schema(min_len=3, pattern=r"^\w*$", examples=["available", "EMEA"])(Tag)


@dataclass
class Resource:
    id: int
    tags: list[Tag] = field(
        default_factory=list,
        metadata=schema(
            description="regroup multiple resources", max_items=3, unique=True
        ),
    )


assert deserialization_schema(Resource) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "additionalProperties": False,
    "properties": {
        "id": {"type": "integer"},
        "tags": {
            "description": "regroup multiple resources",
            "items": {
                "examples": ["available", "EMEA"],
                "minLength": 3,
                "pattern": "^\\w*$",
                "type": "string",
            },
            "maxItems": 3,
            "type": "array",
            "uniqueItems": True,
            "default": [],
        },
    },
    "required": ["id"],
    "type": "object",
}
Note
Schemas are particularly useful with NewType. For example, if you use prefixed ids, you can use a NewType with a pattern schema to validate them, and benefit from more precise type checking.
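For instance, a small sketch of that idea (the UserId name and its prefix are made-up examples):
from typing import NewType

from apischema import schema

UserId = NewType("UserId", str)
schema(pattern=r"^user_[0-9a-f]+$", examples=["user_42abc"])(UserId)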
The following keys are available (some of them are shortened compared to the original JSON schema keywords, for code concision, and written in snake_case):
Key | JSON schema keyword | type restriction |
---|---|---|
title | / | / |
description | / | / |
default | / | / |
examples | / | / |
min | minimum | int |
max | maximum | int |
exc_min | exclusiveMinimum | int |
exc_max | exclusiveMaximum | int |
mult_of | multipleOf | int |
format | / | str |
media_type | contentMediaType | str |
encoding | contentEncoding | str |
min_len | minLength | str |
max_len | maxLength | str |
pattern | / | str |
min_items | minItems | list |
max_items | maxItems | list |
unique | / | list |
min_props | minProperties | dict |
max_props | maxProperties | dict |
Note
In the case of a field schema, the field default value will be serialized (if possible) in order to add the default keyword to the schema.
Constraints validation¶
JSON schema constrains the deserialized data; these constraints are naturally used for validation.
from dataclasses import dataclass, field
from typing import NewType

from pytest import raises

from apischema import ValidationError, deserialize, schema

Tag = NewType("Tag", str)
schema(min_len=3, pattern=r"^\w*$", examples=["available", "EMEA"])(Tag)


@dataclass
class Resource:
    id: int
    tags: list[Tag] = field(
        default_factory=list,
        metadata=schema(
            description="regroup multiple resources", max_items=3, unique=True
        ),
    )


with raises(ValidationError) as err:  # pytest check exception is raised
    deserialize(
        Resource, {"id": 42, "tags": ["tag", "duplicate", "duplicate", "bad&", "_"]}
    )
assert err.value.errors == [
    {"loc": ["tags"], "msg": "item count greater than 3 (maxItems)"},
    {"loc": ["tags"], "msg": "duplicate items (uniqueItems)"},
    {"loc": ["tags", 3], "msg": "not matching '^\\w*$' (pattern)"},
    {"loc": ["tags", 4], "msg": "string length lower than 3 (minLength)"},
]
Extra schema¶
schema has two other arguments, extra and override, which give finer control over the generated JSON schema. They can be used, for example, to build "strict" unions (using oneOf instead of anyOf).
from dataclasses import dataclass
from typing import Annotated, Any

from apischema import schema
from apischema.json_schema import deserialization_schema


# schema extra can be callable to modify the schema in place
def to_one_of(schema: dict[str, Any]):
    if "anyOf" in schema:
        schema["oneOf"] = schema.pop("anyOf")


OneOf = schema(extra=to_one_of)


# or extra can be a dictionary which will update the schema
@schema(
    extra={"$ref": "http://some-domain.org/path/to/schema.json#/$defs/Foo"},
    override=True,  # override apischema generated schema, using only extra
)
@dataclass
class Foo:
    bar: int


# Use Annotated with OneOf to make a "strict" Union
assert deserialization_schema(Annotated[Foo | int, OneOf]) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "oneOf": [  # oneOf instead of anyOf
        {"$ref": "http://some-domain.org/path/to/schema.json#/$defs/Foo"},
        {"type": "integer"},
    ],
}
Base schema¶
apischema.settings.base_schema can be used to define a "base schema" for the different kinds of objects: types, object fields, or (serialized) methods.
from dataclasses import dataclass, field
from typing import Any, Callable, get_origin

import docstring_parser

from apischema import schema, serialized, settings
from apischema.json_schema import serialization_schema
from apischema.schemas import Schema
from apischema.type_names import get_type_name


@dataclass
class Foo:
    """Foo class

    :var bar: bar attribute"""

    bar: str = field(metadata=schema(max_len=10))

    @serialized
    @property
    def baz(self) -> int:
        """baz method"""
        ...


def type_base_schema(tp: Any) -> Schema | None:
    if not hasattr(tp, "__doc__"):
        return None
    return schema(
        title=get_type_name(tp).json_schema,
        description=docstring_parser.parse(tp.__doc__).short_description,
    )


def field_base_schema(tp: Any, name: str, alias: str) -> Schema | None:
    title = alias.replace("_", " ").capitalize()
    tp = get_origin(tp) or tp  # tp can be generic
    for meta in docstring_parser.parse(tp.__doc__).meta:
        if meta.args == ["var", name]:
            return schema(title=title, description=meta.description)
    return schema(title=title)


def method_base_schema(tp: Any, method: Callable, alias: str) -> Schema | None:
    return schema(
        title=alias.replace("_", " ").capitalize(),
        description=docstring_parser.parse(method.__doc__).short_description,
    )


settings.base_schema.type = type_base_schema
settings.base_schema.field = field_base_schema
settings.base_schema.method = method_base_schema

assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "additionalProperties": False,
    "title": "Foo",
    "description": "Foo class",
    "properties": {
        "bar": {
            "description": "bar attribute",
            "title": "Bar",
            "type": "string",
            "maxLength": 10,
        },
        "baz": {"description": "baz method", "title": "Baz", "type": "integer"},
    },
    "required": ["bar", "baz"],
    "type": "object",
}
The base schema will be merged with the schema defined at the type/field/method level.
Required field with default value¶
By default, a dataclass/namedtuple field is tagged as required when it doesn't have a default value.
However, you may want a field to have a default value for convenience in your code, while still keeping it required. Think of a schema where a version field is fixed but required, for example JSON-RPC with "jsonrpc": "2.0". This is done with the required field metadata.
from dataclasses import dataclass, field

from pytest import raises

from apischema import ValidationError, deserialize
from apischema.metadata import required


@dataclass
class Foo:
    bar: int | None = field(default=None, metadata=required)


with raises(ValidationError) as err:
    deserialize(Foo, {})
assert err.value.errors == [{"loc": ["bar"], "msg": "missing property"}]
Additional properties / pattern properties¶
With Mapping¶
The schema of a Mapping/dict type is naturally translated to "additionalProperties": <schema of the value type>. However, when the schema of the key has a pattern, it results in "patternProperties": {<key pattern>: <schema of the value type>} instead.
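For illustration, a minimal sketch of both cases (the Version key type and its pattern are made up here; the expected shapes follow the rules stated above and are shown as comments rather than asserted):
from collections.abc import Mapping
from typing import NewType

from apischema import schema
from apischema.json_schema import deserialization_schema

# plain string keys: the value schema goes into "additionalProperties"
print(deserialization_schema(dict[str, int]))
# -> {..., "type": "object", "additionalProperties": {"type": "integer"}}

# keys with a pattern: the value schema goes into "patternProperties"
Version = NewType("Version", str)
schema(pattern=r"^v[0-9]+$")(Version)
print(deserialization_schema(Mapping[Version, int]))
# -> {..., "type": "object", "patternProperties": {"^v[0-9]+$": {"type": "integer"}}}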
With dataclass¶
additionalProperties/patternProperties can be added to dataclasses by using fields annotated with the properties metadata. Properties not mapped onto regular fields will be deserialized into these fields; they must have a Mapping type, or be deserializable from a Mapping, because they are instantiated with a mapping.
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Annotated

from apischema import deserialize, properties, schema
from apischema.json_schema import deserialization_schema


@dataclass
class Config:
    active: bool = True
    server_options: Mapping[str, bool] = field(
        default_factory=dict, metadata=properties(pattern=r"^server_")
    )
    client_options: Mapping[
        Annotated[str, schema(pattern=r"^client_")], bool  # noqa: F722
    ] = field(default_factory=dict, metadata=properties(...))
    options: Mapping[str, bool] = field(default_factory=dict, metadata=properties)


assert deserialize(
    Config,
    {"use_lightsaber": True, "server_auto_restart": False, "client_timeout": False},
) == Config(
    True,
    {"server_auto_restart": False},
    {"client_timeout": False},
    {"use_lightsaber": True},
)
assert deserialization_schema(Config) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {"active": {"type": "boolean", "default": True}},
    "additionalProperties": {"type": "boolean"},
    "patternProperties": {
        "^server_": {"type": "boolean"},
        "^client_": {"type": "boolean"},
    },
}
Note
Of course, a dataclass can only have a single properties field without a pattern, because it makes no sense to have several additionalProperties.
Property dependencies¶
apischema supports property dependencies for dataclasses through a class member. Dependencies are also used in validation.
from dataclasses import dataclass, field

from pytest import raises

from apischema import ValidationError, dependent_required, deserialize
from apischema.json_schema import deserialization_schema
from apischema.skip import NotNull


@dataclass
class Billing:
    name: str
    # Fields used in dependencies MUST be declared with `field`
    credit_card: NotNull[int] = field(default=None)
    billing_address: NotNull[str] = field(default=None)
    dependencies = dependent_required({credit_card: [billing_address]})
    # it can also be done outside the class with
    # dependent_required({"credit_card": ["billing_address"]}, owner=Billing)


assert deserialization_schema(Billing) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "additionalProperties": False,
    "dependentRequired": {"credit_card": ["billing_address"]},
    "properties": {
        "name": {"type": "string"},
        "credit_card": {"type": "integer"},
        "billing_address": {"type": "string"},
    },
    "required": ["name"],
    "type": "object",
}
with raises(ValidationError) as err:
    deserialize(Billing, {"name": "Anonymous", "credit_card": 1234_5678_9012_3456})
assert err.value.errors == [
    {
        "loc": ["billing_address"],
        "msg": "missing property (required by ['credit_card'])",
    }
]
Because bidirectional dependencies are a common idiom, apischema provides a shortcut notation; it's indeed possible to write dependent_required([credit_card, billing_address]), as sketched below.
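A minimal sketch of this shortcut, reusing the Billing fields from the example above:
from dataclasses import dataclass, field

from apischema import dependent_required
from apischema.skip import NotNull


@dataclass
class Billing:
    name: str
    credit_card: NotNull[int] = field(default=None)
    billing_address: NotNull[str] = field(default=None)
    # bidirectional shortcut: each field of the group requires the other
    dependencies = dependent_required([credit_card, billing_address])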
JSON schema reference¶
For complex schemas with type reuse, it's convenient to extract the definitions of schema components in order to reuse them; it's even mandatory for recursive types. JSON schema uses JSON pointers ("$ref") to refer to these definitions, and apischema handles this feature natively.
from dataclasses import dataclass
from typing import Optional

from apischema.json_schema import deserialization_schema


@dataclass
class Node:
    value: int
    child: Optional["Node"] = None


assert deserialization_schema(Node) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$ref": "#/$defs/Node",
    "$defs": {
        "Node": {
            "type": "object",
            "properties": {
                "value": {"type": "integer"},
                "child": {
                    "anyOf": [{"$ref": "#/$defs/Node"}, {"type": "null"}],
                    "default": None,
                },
            },
            "required": ["value"],
            "additionalProperties": False,
        }
    },
}
Use reference only for reused types¶
apischema can control the use of references through the boolean all_refs parameter of deserialization_schema/serialization_schema:
- all_refs=True -> every type with a reference will be put in the definitions and referenced with $ref;
- all_refs=False -> only types which are reused in the schema are put in the definitions.
The all_refs default value depends on the JSON schema version: it's False for JSON schema drafts but True for OpenAPI.
from dataclasses import dataclass

from apischema.json_schema import deserialization_schema


@dataclass
class Bar:
    baz: str


@dataclass
class Foo:
    bar1: Bar
    bar2: Bar


assert deserialization_schema(Foo, all_refs=False) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$defs": {
        "Bar": {
            "additionalProperties": False,
            "properties": {"baz": {"type": "string"}},
            "required": ["baz"],
            "type": "object",
        }
    },
    "additionalProperties": False,
    "properties": {"bar1": {"$ref": "#/$defs/Bar"}, "bar2": {"$ref": "#/$defs/Bar"}},
    "required": ["bar1", "bar2"],
    "type": "object",
}
assert deserialization_schema(Foo, all_refs=True) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$defs": {
        "Bar": {
            "additionalProperties": False,
            "properties": {"baz": {"type": "string"}},
            "required": ["baz"],
            "type": "object",
        },
        "Foo": {
            "additionalProperties": False,
            "properties": {
                "bar1": {"$ref": "#/$defs/Bar"},
                "bar2": {"$ref": "#/$defs/Bar"},
            },
            "required": ["bar1", "bar2"],
            "type": "object",
        },
    },
    "$ref": "#/$defs/Foo",
}
Set reference name¶
In the previous examples, types were referenced using their name. This is indeed the default behavior for every class/NewType (except the primitives int/str/bool/float).
It's possible to override the default reference name using apischema.type_name; passing None instead of a string will remove the reference, making the type unable to be referenced as a separate definition in the schema.
from dataclasses import dataclass
from typing import Annotated

from apischema import type_name
from apischema.json_schema import deserialization_schema


# Type name can be added as a decorator
@type_name("Resource")
@dataclass
class BaseResource:
    id: int
    # or using typing.Annotated
    tags: Annotated[set[str], type_name("ResourceTags")]


assert deserialization_schema(BaseResource, all_refs=True) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$defs": {
        "Resource": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "tags": {"$ref": "#/$defs/ResourceTags"},
            },
            "required": ["id", "tags"],
            "additionalProperties": False,
        },
        "ResourceTags": {
            "type": "array",
            "items": {"type": "string"},
            "uniqueItems": True,
        },
    },
    "$ref": "#/$defs/Resource",
}
Note
Builtin collections are interchangeable when a type_name is registered. For example, if a name is registered for list[Foo], this name will also be used for Sequence[Foo] or Collection[Foo].
Generic aliases can have a type name, but they need to be specialized; Foo[T, int] cannot have a type name, but Foo[str, int] can. However, generic classes can get a dynamic type name depending on their generic arguments, by passing a name factory to type_name:
from dataclasses import dataclass, field
from typing import Generic, TypeVar

from apischema import type_name
from apischema.json_schema import deserialization_schema
from apischema.metadata import flatten

T = TypeVar("T")


# Type name factory takes the type and its arguments as (positional) parameters
@type_name(lambda tp, arg: f"{arg.__name__}Resource")
@dataclass
class Resource(Generic[T]):
    id: int
    content: T = field(metadata=flatten)
    ...


@dataclass
class Foo:
    bar: str


assert deserialization_schema(Resource[Foo], all_refs=True) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$ref": "#/$defs/FooResource",
    "$defs": {
        "FooResource": {
            "type": "object",
            "allOf": [
                {
                    "type": "object",
                    "properties": {"id": {"type": "integer"}},
                    "required": ["id"],
                    "additionalProperties": False,
                },
                {"$ref": "#/$defs/Foo"},
            ],
            "unevaluatedProperties": False,
        },
        "Foo": {
            "type": "object",
            "properties": {"bar": {"type": "string"}},
            "required": ["bar"],
            "additionalProperties": False,
        },
    },
}
The default behavior can also be customized using apischema.settings.default_type_name.
Reference factory¶
In JSON schema, $ref looks like #/$defs/Foo, not just Foo. In fact, schema generation uses the ref given by type_name/default_type_name and passes it to a ref_factory function (a parameter of the schema generation functions) which converts it to its final form. Each JSON schema version comes with its own default ref_factory: for draft 2020-12 it prefixes the ref with #/$defs/, while for OpenAPI it prefixes it with #/components/schemas/.
from dataclasses import dataclass

from apischema.json_schema import deserialization_schema


@dataclass
class Foo:
    bar: int


def ref_factory(ref: str) -> str:
    return f"http://some-domain.org/path/to/{ref}.json#"


assert deserialization_schema(Foo, all_refs=True, ref_factory=ref_factory) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$ref": "http://some-domain.org/path/to/Foo.json#",
}
Note
When ref_factory is passed as an argument, definitions are not added to the generated schema. That's because ref_factory will likely change the location of the definitions, so there would be no point adding them at a wrong location. These definitions can of course be generated separately with definitions_schema.
Definitions schema¶
Definitions schemas can also be extracted using apischema.json_schema.definitions_schema. It takes two lists of types, deserialization and serialization (or tuples of type + dynamic conversion), and returns a dictionary of all the referenced schemas.
Note
This is especially useful when it comes to OpenAPI schema to generate the components section.
from dataclasses import dataclass

from apischema.json_schema import definitions_schema


@dataclass
class Bar:
    baz: int = 0


@dataclass
class Foo:
    bar: Bar


assert definitions_schema(deserialization=[list[Foo]], all_refs=True) == {
    "Foo": {
        "type": "object",
        "properties": {"bar": {"$ref": "#/$defs/Bar"}},
        "required": ["bar"],
        "additionalProperties": False,
    },
    "Bar": {
        "type": "object",
        "properties": {"baz": {"type": "integer", "default": 0}},
        "additionalProperties": False,
    },
}
JSON schema / OpenAPI version¶
JSON schema has several versions, and OpenAPI is treated as a JSON schema version. While apischema natively uses the latest one, draft 2020-12, it is possible to specify another schema version to be used for the generation.
from dataclasses import dataclass
from typing import Literal

from apischema.json_schema import (
    JsonSchemaVersion,
    definitions_schema,
    deserialization_schema,
)


@dataclass
class Bar:
    baz: int | None
    constant: Literal[0] = 0


@dataclass
class Foo:
    bar: Bar


assert deserialization_schema(Foo, all_refs=True) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$ref": "#/$defs/Foo",
    "$defs": {
        "Foo": {
            "type": "object",
            "properties": {"bar": {"$ref": "#/$defs/Bar"}},
            "required": ["bar"],
            "additionalProperties": False,
        },
        "Bar": {
            "type": "object",
            "properties": {
                "baz": {"type": ["integer", "null"]},
                "constant": {"type": "integer", "const": 0, "default": 0},
            },
            "required": ["baz"],
            "additionalProperties": False,
        },
    },
}
assert deserialization_schema(
    Foo, all_refs=True, version=JsonSchemaVersion.DRAFT_7
) == {
    "$schema": "http://json-schema.org/draft-07/schema#",
    # $ref is isolated in allOf + draft 7 prefix
    "allOf": [{"$ref": "#/definitions/Foo"}],
    "definitions": {  # not "$defs"
        "Foo": {
            "type": "object",
            "properties": {"bar": {"$ref": "#/definitions/Bar"}},
            "required": ["bar"],
            "additionalProperties": False,
        },
        "Bar": {
            "type": "object",
            "properties": {
                "baz": {"type": ["integer", "null"]},
                "constant": {"type": "integer", "const": 0, "default": 0},
            },
            "required": ["baz"],
            "additionalProperties": False,
        },
    },
}
assert deserialization_schema(Foo, version=JsonSchemaVersion.OPEN_API_3_1) == {
    # No definitions for OpenAPI, use definitions_schema for it
    "$ref": "#/components/schemas/Foo"  # OpenAPI prefix
}
assert definitions_schema(
    deserialization=[Foo], version=JsonSchemaVersion.OPEN_API_3_1
) == {
    "Foo": {
        "type": "object",
        "properties": {"bar": {"$ref": "#/components/schemas/Bar"}},
        "required": ["bar"],
        "additionalProperties": False,
    },
    "Bar": {
        "type": "object",
        "properties": {
            "baz": {"type": ["integer", "null"]},
            "constant": {"type": "integer", "const": 0, "default": 0},
        },
        "required": ["baz"],
        "additionalProperties": False,
    },
}
assert definitions_schema(
    deserialization=[Foo], version=JsonSchemaVersion.OPEN_API_3_0
) == {
    "Foo": {
        "type": "object",
        "properties": {"bar": {"$ref": "#/components/schemas/Bar"}},
        "required": ["bar"],
        "additionalProperties": False,
    },
    "Bar": {
        "type": "object",
        # "nullable" instead of "type": "null"
        "properties": {
            "baz": {"type": "integer", "nullable": True},
            "constant": {"type": "integer", "enum": [0], "default": 0},
        },
        "required": ["baz"],
        "additionalProperties": False,
    },
}
readOnly / writeOnly¶
Dataclass InitVar and field(init=False) fields will be flagged respectively with "writeOnly": true and "readOnly": true in the generated schema.
In the definitions schema, if a type appears in both deserialization and serialization, its properties are merged and the resulting schema then contains both readOnly and writeOnly properties. However, required is not merged, because it can't be (it would break validation if a non-init field were required); the deserialization required is kept because it's more important, as it can be used in validation (the OpenAPI 3.0 semantic which allows the merge has been dropped in 3.1, so supporting it has not been judged useful).
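As a minimal sketch of the flags described above (the Data class, its fields and __post_init__ are made up for illustration, and the output is printed rather than asserted):
from dataclasses import InitVar, dataclass, field

from apischema.json_schema import deserialization_schema, serialization_schema


@dataclass
class Data:
    raw: InitVar[str]  # init-only field
    processed: str = field(init=False)  # computed field, not part of __init__

    def __post_init__(self, raw: str):
        self.processed = raw.upper()


# Per the paragraph above, the InitVar field is expected to carry
# "writeOnly": true and the init=False field "readOnly": true.
print(deserialization_schema(Data))
print(serialization_schema(Data))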