
JSON schema

JSON schema generation

JSON schema can be generated from a data model. However, because of all the possible customizations, the schema can differ between deserialization and serialization. In common cases, deserialization_schema and serialization_schema give the same result.

from dataclasses import dataclass

from apischema.json_schema import deserialization_schema, serialization_schema


@dataclass
class Foo:
    bar: str


assert deserialization_schema(Foo) == serialization_schema(Foo)
assert deserialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "additionalProperties": False,
    "properties": {"bar": {"type": "string"}},
    "required": ["bar"],
    "type": "object",
}

Field alias

Sometimes dataclass field names clash with a language keyword, and sometimes the property name is just not convenient. Fortunately, a field can define an alias, which is used in the schema and in deserialization/serialization.

from dataclasses import dataclass, field

from apischema import alias, deserialize, serialize
from apischema.json_schema import deserialization_schema


@dataclass
class Foo:
    class_: str = field(metadata=alias("class"))


assert deserialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "additionalProperties": False,
    "properties": {"class": {"type": "string"}},
    "required": ["class"],
    "type": "object",
}
assert deserialize(Foo, {"class": "bar"}) == Foo("bar")
assert serialize(Foo, Foo("bar")) == {"class": "bar"}

Alias all fields

Field aliasing can also be done at class level by specifying an aliasing function. This aliaser is applied to the field alias if one is defined, otherwise to the field name; it is not applied if override=False is specified.

from dataclasses import dataclass, field
from typing import Any

from apischema import alias
from apischema.json_schema import deserialization_schema


@alias(lambda s: f"foo_{s}")
@dataclass
class Foo:
    field1: Any
    field2: Any = field(metadata=alias(override=False))
    field3: Any = field(metadata=alias("field03"))
    field4: Any = field(metadata=alias("field04", override=False))


assert deserialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "additionalProperties": False,
    "properties": {"foo_field1": {}, "field2": {}, "foo_field03": {}, "field04": {}},
    "required": ["foo_field1", "field2", "foo_field03", "field04"],
    "type": "object",
}
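The precedence rules above can be sketched in plain Python. The helper effective_property_name is hypothetical and not part of apischema; it only reproduces the property names of the example, assuming the alias replaces the name first and the class aliaser is applied afterwards unless override=False.

```python
from typing import Callable, Optional


def effective_property_name(
    name: str,
    aliaser: Callable[[str], str],
    alias: Optional[str] = None,
    override: bool = True,
) -> str:
    """Hypothetical helper mirroring the class-level aliasing rules."""
    # The field alias, when defined, replaces the field name
    base = alias if alias is not None else name
    # The class aliaser is applied unless the field opts out with override=False
    return aliaser(base) if override else base


def prefix(s: str) -> str:
    # Same aliasing function as the @alias example above
    return f"foo_{s}"


names = [
    effective_property_name("field1", prefix),
    effective_property_name("field2", prefix, override=False),
    effective_property_name("field3", prefix, alias="field03"),
    effective_property_name("field4", prefix, alias="field04", override=False),
]
```

The resulting list matches the "required" list of the generated schema.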

Class-level aliasing can be used to define a camelCase API.

Dynamic aliasing and default aliaser

apischema operations deserialize/serialize/deserialization_schema/serialization_schema provide an aliaser parameter, which is applied to every field processed by the operation.

Similar to strictness configuration, this parameter has a default value controlled by apischema.settings.aliaser.

It can be used, for example, to make a whole application use camelCase. Apart from that, it's used the same way as settings.coercer.

For camelCase, there is actually a shortcut:

from apischema import settings

settings.camel_case = True
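An aliaser is simply a function from str to str. Below is a minimal sketch of a snake_case to camelCase aliaser; to_camel_case is a hypothetical name written for illustration, and the conversion used by settings.camel_case may handle edge cases (leading underscores, for instance) differently.

```python
def to_camel_case(name: str) -> str:
    """Convert a snake_case identifier to camelCase (hypothetical aliaser)."""
    first, *rest = name.split("_")
    # Keep the first chunk as-is and capitalize the following ones
    return first + "".join(part.capitalize() for part in rest)
```

Such a function could then be assigned to apischema.settings.aliaser, as described above, instead of using the camel_case shortcut.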

Note

The dynamic aliaser ignores override=False.

Schema annotations

Type annotations are not enough to express a complete schema, but apischema has a function for that: schema can be used both as a type decorator and as field metadata.

from dataclasses import dataclass, field
from typing import NewType

from apischema import schema
from apischema.json_schema import deserialization_schema

Tag = NewType("Tag", str)
schema(min_len=3, pattern=r"^\w*$", examples=["available", "EMEA"])(Tag)


@dataclass
class Resource:
    id: int
    tags: list[Tag] = field(
        default_factory=list,
        metadata=schema(
            description="regroup multiple resources", max_items=3, unique=True
        ),
    )


assert deserialization_schema(Resource) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "additionalProperties": False,
    "properties": {
        "id": {"type": "integer"},
        "tags": {
            "description": "regroup multiple resources",
            "items": {
                "examples": ["available", "EMEA"],
                "minLength": 3,
                "pattern": "^\\w*$",
                "type": "string",
            },
            "maxItems": 3,
            "type": "array",
            "uniqueItems": True,
            "default": [],
        },
    },
    "required": ["id"],
    "type": "object",
}

Note

Schemas are particularly useful with NewType. For example, if you use prefixed ids, you can use a NewType with a pattern schema to validate them, and benefit from more precise type checking.

The following keys are available (they are sometimes shortened compared to the original JSON schema keywords, for concision and snake_case):

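| Key | JSON schema keyword | Type restriction |
| --- | --- | --- |
| title | title | none |
| description | description | none |
| default | default | none |
| examples | examples | none |
| min | minimum | int |
| max | maximum | int |
| exc_min | exclusiveMinimum | int |
| exc_max | exclusiveMaximum | int |
| mult_of | multipleOf | int |
| format | format | str |
| media_type | contentMediaType | str |
| encoding | contentEncoding | str |
| min_len | minLength | str |
| max_len | maxLength | str |
| pattern | pattern | str |
| min_items | minItems | list |
| max_items | maxItems | list |
| unique | uniqueItems | list |
| min_props | minProperties | dict |
| max_props | maxProperties | dict |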
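The key renaming in the table can be expressed as a simple lookup. The sketch below is hypothetical (to_keywords is not an apischema function); it assumes keys absent from the renaming map keep their name and that None means "not set".

```python
from typing import Any

# Renamed keys, taken from the table; other keys keep their name
KEYWORDS = {
    "min": "minimum",
    "max": "maximum",
    "exc_min": "exclusiveMinimum",
    "exc_max": "exclusiveMaximum",
    "mult_of": "multipleOf",
    "media_type": "contentMediaType",
    "encoding": "contentEncoding",
    "min_len": "minLength",
    "max_len": "maxLength",
    "min_items": "minItems",
    "max_items": "maxItems",
    "unique": "uniqueItems",
    "min_props": "minProperties",
    "max_props": "maxProperties",
}


def to_keywords(**kwargs: Any) -> dict[str, Any]:
    """Translate schema()-style arguments into JSON schema keywords (sketch)."""
    return {
        KEYWORDS.get(key, key): value
        for key, value in kwargs.items()
        if value is not None  # unset arguments produce no keyword
    }
```

Applied to the Tag schema above, it yields the keywords found in the "items" part of the generated schema.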

Note

For a field schema, the field's default value is serialized (if possible) to add the default keyword to the schema.

Constraints validation

JSON schema constrains the deserialized data; these constraints are naturally used for validation.

from dataclasses import dataclass, field
from typing import NewType

from pytest import raises

from apischema import ValidationError, deserialize, schema

Tag = NewType("Tag", str)
schema(min_len=3, pattern=r"^\w*$", examples=["available", "EMEA"])(Tag)


@dataclass
class Resource:
    id: int
    tags: list[Tag] = field(
        default_factory=list,
        metadata=schema(
            description="regroup multiple resources", max_items=3, unique=True
        ),
    )


with raises(ValidationError) as err:  # pytest checks that the exception is raised
    deserialize(
        Resource, {"id": 42, "tags": ["tag", "duplicate", "duplicate", "bad&", "_"]}
    )
assert err.value.errors == [
    {"loc": ["tags"], "msg": "item count greater than 3 (maxItems)"},
    {"loc": ["tags"], "msg": "duplicate items (uniqueItems)"},
    {"loc": ["tags", 3], "msg": "not matching '^\\w*$' (pattern)"},
    {"loc": ["tags", 4], "msg": "string length lower than 3 (minLength)"},
]
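To make the error locations explicit, here is a hypothetical stdlib re-implementation of the four checks declared on Resource.tags. It is not apischema's validator; it only reproduces the error format shown above, with array-level errors located at the list and item-level errors at the item index.

```python
import re
from typing import Any


def validate_tags(
    tags: list[str],
    max_items: int = 3,
    unique: bool = True,
    pattern: str = r"^\w*$",
    min_len: int = 3,
) -> list[dict[str, Any]]:
    """Hypothetical re-implementation of the constraints on Resource.tags."""
    errors: list[dict[str, Any]] = []
    # Array-level keywords are reported at the location of the list itself
    if len(tags) > max_items:
        msg = f"item count greater than {max_items} (maxItems)"
        errors.append({"loc": ["tags"], "msg": msg})
    if unique and len(set(tags)) != len(tags):
        errors.append({"loc": ["tags"], "msg": "duplicate items (uniqueItems)"})
    # Item-level keywords are reported at the index of the offending item
    for index, tag in enumerate(tags):
        # JSON schema patterns are not implicitly anchored, hence re.search
        if re.search(pattern, tag) is None:
            msg = f"not matching '{pattern}' (pattern)"
            errors.append({"loc": ["tags", index], "msg": msg})
        if len(tag) < min_len:
            msg = f"string length lower than {min_len} (minLength)"
            errors.append({"loc": ["tags", index], "msg": msg})
    return errors
```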

Extra schema

schema has two other arguments, extra and override, which give finer control over the generated JSON schema. They can be used, for example, to build "strict" unions (using oneOf instead of anyOf):

from dataclasses import dataclass
from typing import Annotated, Any

from apischema import schema
from apischema.json_schema import deserialization_schema


# schema extra can be callable to modify the schema in place
def to_one_of(schema: dict[str, Any]):
    if "anyOf" in schema:
        schema["oneOf"] = schema.pop("anyOf")


OneOf = schema(extra=to_one_of)


# or extra can be a dictionary which will update the schema
@schema(
    extra={"$ref": "http://some-domain.org/path/to/schema.json#/$defs/Foo"},
    override=True,  # override apischema generated schema, using only extra
)
@dataclass
class Foo:
    bar: int


# Use Annotated with OneOf to make a "strict" Union
assert deserialization_schema(Annotated[Foo | int, OneOf]) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "oneOf": [  # oneOf instead of anyOf
        {"$ref": "http://some-domain.org/path/to/schema.json#/$defs/Foo"},
        {"type": "integer"},
    ],
}
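The interplay of extra and override can be pictured with the following sketch. apply_extra is a hypothetical function, not apischema's implementation; it assumes a shallow merge, where a dict extra updates the schema, a callable extra mutates it in place, and override=True discards the generated part.

```python
from typing import Any, Callable, Optional, Union

Extra = Union[dict[str, Any], Callable[[dict[str, Any]], None]]


def apply_extra(
    generated: dict[str, Any], extra: Optional[Extra] = None, override: bool = False
) -> dict[str, Any]:
    # override=True drops the generated schema and keeps only the extra part
    result: dict[str, Any] = {} if override else dict(generated)
    if callable(extra):
        extra(result)  # a callable modifies the schema in place
    elif extra is not None:
        result.update(extra)  # a dict is merged into the schema
    return result


def to_one_of(schema: dict[str, Any]) -> None:
    if "anyOf" in schema:
        schema["oneOf"] = schema.pop("anyOf")
```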

Base schema

apischema.settings.base_schema can be used to define a "base schema" for the different kinds of objects: types, object fields, or (serialized) methods.

from dataclasses import dataclass, field
from typing import Any, Callable, get_origin

import docstring_parser

from apischema import schema, serialized, settings
from apischema.json_schema import serialization_schema
from apischema.schemas import Schema
from apischema.type_names import get_type_name


@dataclass
class Foo:
    """Foo class

    :var bar: bar attribute"""

    bar: str = field(metadata=schema(max_len=10))

    @serialized
    @property
    def baz(self) -> int:
        """baz method"""
        ...


def type_base_schema(tp: Any) -> Schema | None:
    if not hasattr(tp, "__doc__"):
        return None
    return schema(
        title=get_type_name(tp).json_schema,
        description=docstring_parser.parse(tp.__doc__).short_description,
    )


def field_base_schema(tp: Any, name: str, alias: str) -> Schema | None:
    title = alias.replace("_", " ").capitalize()
    tp = get_origin(tp) or tp  # tp can be generic
    for meta in docstring_parser.parse(tp.__doc__).meta:
        if meta.args == ["var", name]:
            return schema(title=title, description=meta.description)
    return schema(title=title)


def method_base_schema(tp: Any, method: Callable, alias: str) -> Schema | None:
    return schema(
        title=alias.replace("_", " ").capitalize(),
        description=docstring_parser.parse(method.__doc__).short_description,
    )


settings.base_schema.type = type_base_schema
settings.base_schema.field = field_base_schema
settings.base_schema.method = method_base_schema

assert serialization_schema(Foo) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "additionalProperties": False,
    "title": "Foo",
    "description": "Foo class",
    "properties": {
        "bar": {
            "description": "bar attribute",
            "title": "Bar",
            "type": "string",
            "maxLength": 10,
        },
        "baz": {"description": "baz method", "title": "Baz", "type": "integer"},
    },
    "required": ["bar", "baz"],
    "type": "object",
}

The base schema is merged with the schema defined at the type/field/method level.
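As a rough mental model, the merge can be seen as a dictionary union. In this hypothetical sketch, the type/field/method-level schema wins when both sides define the same keyword; that precedence is an assumption, not something stated above.

```python
from typing import Any, Optional


def merge_schemas(
    base: Optional[dict[str, Any]], specific: Optional[dict[str, Any]]
) -> dict[str, Any]:
    # Assumption: keywords from the type/field/method-level schema win on conflict
    return {**(base or {}), **(specific or {})}
```

For the bar field above, the base part brings title and description while the field schema brings maxLength.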

Required field with default value

By default, a dataclass/namedtuple field will be tagged required if it doesn't have a default value.

However, you may want a field to have a default value for convenience in your code while still making it required. Think of a schema where a version is fixed but required, for example JSON-RPC with "jsonrpc": "2.0". This is done with the required field metadata.

from dataclasses import dataclass, field

from pytest import raises

from apischema import ValidationError, deserialize
from apischema.metadata import required


@dataclass
class Foo:
    bar: int | None = field(default=None, metadata=required)


with raises(ValidationError) as err:
    deserialize(Foo, {})
assert err.value.errors == [{"loc": ["bar"], "msg": "missing property"}]
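The rule for the "required" list can be sketched with the standard dataclasses module. REQUIRED and required_properties are hypothetical stand-ins (REQUIRED is not apischema.metadata.required); the JSON-RPC request below illustrates a fixed but required version field.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Optional

# Hypothetical metadata marker, standing in for apischema.metadata.required
REQUIRED = {"required": True}


def required_properties(cls: type) -> list[str]:
    names = []
    for f in fields(cls):
        has_default = f.default is not MISSING or f.default_factory is not MISSING
        # Required if there is no default, or if explicitly flagged
        if not has_default or f.metadata.get("required", False):
            names.append(f.name)
    return names


@dataclass
class Request:
    method: str
    jsonrpc: str = field(default="2.0", metadata=REQUIRED)
    id: Optional[int] = None
```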

Additional properties / pattern properties

With Mapping

The schema of a Mapping/dict type is naturally translated to "additionalProperties": <schema of the value type>.

However, when the schema of the key has a pattern, it gives "patternProperties": {<key pattern>: <schema of the value type>}.
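The two cases can be summarized by a small hypothetical function; mapping_schema is not an apischema API, and the exact output of apischema for mappings may contain other keywords.

```python
from typing import Any, Optional


def mapping_schema(
    value_schema: dict[str, Any], key_pattern: Optional[str] = None
) -> dict[str, Any]:
    if key_pattern is None:
        # Any key is accepted, values follow the value type schema
        return {"type": "object", "additionalProperties": value_schema}
    # Only keys matching the pattern are described
    return {"type": "object", "patternProperties": {key_pattern: value_schema}}
```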

With dataclass

additionalProperties/patternProperties can be added to dataclasses by using fields annotated with properties metadata. Properties not mapped to regular fields are deserialized into these fields; they must have a Mapping type, or be deserializable from a Mapping, because they are instantiated with a mapping.

from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Annotated

from apischema import deserialize, properties, schema
from apischema.json_schema import deserialization_schema


@dataclass
class Config:
    active: bool = True
    server_options: Mapping[str, bool] = field(
        default_factory=dict, metadata=properties(pattern=r"^server_")
    )
    client_options: Mapping[
        Annotated[str, schema(pattern=r"^client_")], bool  # noqa: F722
    ] = field(default_factory=dict, metadata=properties(...))
    options: Mapping[str, bool] = field(default_factory=dict, metadata=properties)


assert deserialize(
    Config,
    {"use_lightsaber": True, "server_auto_restart": False, "client_timeout": False},
) == Config(
    True,
    {"server_auto_restart": False},
    {"client_timeout": False},
    {"use_lightsaber": True},
)
assert deserialization_schema(Config) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "type": "object",
    "properties": {"active": {"type": "boolean", "default": True}},
    "additionalProperties": {"type": "boolean"},
    "patternProperties": {
        "^server_": {"type": "boolean"},
        "^client_": {"type": "boolean"},
    },
}

Note

Of course, a dataclass can only have a single properties field without a pattern, because it makes no sense to have several additionalProperties.
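The routing of unmapped properties in the Config example can be sketched as follows. split_properties is hypothetical, and it assumes that pattern fields are tried in declaration order, the first match winning, before falling back to the single catch-all field.

```python
import re
from typing import Any, Optional


def split_properties(
    data: dict[str, Any],
    regular: set[str],
    pattern_fields: dict[str, str],
    catch_all: Optional[str] = None,
) -> dict[str, dict[str, Any]]:
    """Route unmapped properties to pattern fields, then to the catch-all field."""
    routed: dict[str, dict[str, Any]] = {name: {} for name in pattern_fields}
    if catch_all is not None:
        routed[catch_all] = {}
    for key, value in data.items():
        if key in regular:
            continue  # regular fields are handled elsewhere
        for name, pattern in pattern_fields.items():
            if re.search(pattern, key):
                routed[name][key] = value
                break
        else:  # no pattern matched
            if catch_all is None:
                raise ValueError(f"unexpected property {key!r}")
            routed[catch_all][key] = value
    return routed
```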

Property dependencies

apischema supports property dependencies for dataclasses through a class member. Dependencies are also used in validation.

from dataclasses import dataclass, field

from pytest import raises

from apischema import ValidationError, dependent_required, deserialize
from apischema.json_schema import deserialization_schema
from apischema.skip import NotNull


@dataclass
class Billing:
    name: str
    # Fields used in dependencies MUST be declared with `field`
    credit_card: NotNull[int] = field(default=None)
    billing_address: NotNull[str] = field(default=None)

    dependencies = dependent_required({credit_card: [billing_address]})


# it can also be done outside the class with
# dependent_required({"credit_card": ["billing_address"]}, owner=Billing)


assert deserialization_schema(Billing) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "additionalProperties": False,
    "dependentRequired": {"credit_card": ["billing_address"]},
    "properties": {
        "name": {"type": "string"},
        "credit_card": {"type": "integer"},
        "billing_address": {"type": "string"},
    },
    "required": ["name"],
    "type": "object",
}

with raises(ValidationError) as err:
    deserialize(Billing, {"name": "Anonymous", "credit_card": 1234_5678_9012_3456})
assert err.value.errors == [
    {
        "loc": ["billing_address"],
        "msg": "missing property (required by ['credit_card'])",
    }
]

Because bidirectional dependencies are a common idiom, apischema provides a shortcut notation: it's possible to write dependent_required([credit_card, billing_address]).
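The semantics of dependentRequired and of the bidirectional shortcut can be sketched in plain Python. Both functions are hypothetical; the expansion assumes that each property of the group requires all the others, and the error format mimics the one shown above.

```python
from typing import Any


def expand_bidirectional(group: list[str]) -> dict[str, list[str]]:
    """Each property of the group requires all the others."""
    return {name: [other for other in group if other != name] for name in group}


def check_dependent_required(
    data: dict[str, Any], dependencies: dict[str, list[str]]
) -> list[dict[str, Any]]:
    missing: dict[str, list[str]] = {}
    for prop, needed in dependencies.items():
        if prop not in data:
            continue  # a dependency only applies when its property is present
        for name in needed:
            if name not in data:
                missing.setdefault(name, []).append(prop)
    return [
        {"loc": [name], "msg": f"missing property (required by {sorted(owners)})"}
        for name, owners in missing.items()
    ]
```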

JSON schema reference

For complex schemas with type reuse, it's convenient to extract the definitions of schema components in order to reuse them; it's even mandatory for recursive types. JSON schema uses JSON pointers ("$ref") to refer to the definitions. apischema handles this feature natively.

from dataclasses import dataclass
from typing import Optional

from apischema.json_schema import deserialization_schema


@dataclass
class Node:
    value: int
    child: Optional["Node"] = None


assert deserialization_schema(Node) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$ref": "#/$defs/Node",
    "$defs": {
        "Node": {
            "type": "object",
            "properties": {
                "value": {"type": "integer"},
                "child": {
                    "anyOf": [{"$ref": "#/$defs/Node"}, {"type": "null"}],
                    "default": None,
                },
            },
            "required": ["value"],
            "additionalProperties": False,
        }
    },
}
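A consumer of such a schema follows "$ref" by resolving the JSON pointer against the root document. The resolver below is a minimal hypothetical sketch for local references only (no URI percent-decoding, no remote documents), using the RFC 6901 escaping rules.

```python
from typing import Any


def resolve_ref(root: dict[str, Any], ref: str) -> Any:
    """Resolve a local JSON pointer reference such as '#/$defs/Node'."""
    if ref == "#":
        return root
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are supported: {ref!r}")
    node: Any = root
    for token in ref[2:].split("/"):
        # RFC 6901 unescaping order: "~1" first, then "~0"
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node
```

Because Node refers to itself through "#/$defs/Node", resolving the child reference returns the very definition that contains it.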

Use reference only for reused types

apischema can control the use of references through the boolean all_refs parameter of deserialization_schema/serialization_schema:

  • all_refs=True -> all types with a reference will be put in the definitions and referenced with $ref;
  • all_refs=False -> only types which are reused in the schema are put in definitions

all_refs default value depends on the JSON schema version: it's False for JSON schema drafts but True for OpenAPI.

from dataclasses import dataclass

from apischema.json_schema import deserialization_schema


@dataclass
class Bar:
    baz: str


@dataclass
class Foo:
    bar1: Bar
    bar2: Bar


assert deserialization_schema(Foo, all_refs=False) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$defs": {
        "Bar": {
            "additionalProperties": False,
            "properties": {"baz": {"type": "string"}},
            "required": ["baz"],
            "type": "object",
        }
    },
    "additionalProperties": False,
    "properties": {"bar1": {"$ref": "#/$defs/Bar"}, "bar2": {"$ref": "#/$defs/Bar"}},
    "required": ["bar1", "bar2"],
    "type": "object",
}
assert deserialization_schema(Foo, all_refs=True) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$defs": {
        "Bar": {
            "additionalProperties": False,
            "properties": {"baz": {"type": "string"}},
            "required": ["baz"],
            "type": "object",
        },
        "Foo": {
            "additionalProperties": False,
            "properties": {
                "bar1": {"$ref": "#/$defs/Bar"},
                "bar2": {"$ref": "#/$defs/Bar"},
            },
            "required": ["bar1", "bar2"],
            "type": "object",
        },
    },
    "$ref": "#/$defs/Foo",
}
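The decision between the two modes can be sketched by counting how often each referenceable type is met while walking the schema. types_to_define is hypothetical, and it assumes a recursive type counts as reused because it is met again inside itself, which is consistent with Node getting a definition in the earlier example.

```python
from collections import Counter


def types_to_define(encountered: list[str], all_refs: bool) -> set[str]:
    """encountered: referenceable type names met during traversal, root included."""
    counts = Counter(encountered)
    if all_refs:
        return set(counts)  # every referenceable type gets a definition
    return {name for name, count in counts.items() if count > 1}
```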

Set reference name

In the previous examples, types were referenced using their name. This is indeed the default behavior for every class/NewType (except the primitive types int/str/bool/float).

It's possible to override the default reference name using apischema.type_name; passing None instead of a string will remove the reference, making the type unable to be referenced as a separate definition in the schema.

from dataclasses import dataclass
from typing import Annotated

from apischema import type_name
from apischema.json_schema import deserialization_schema


# Type name can be added as a decorator
@type_name("Resource")
@dataclass
class BaseResource:
    id: int
    # or using typing.Annotated
    tags: Annotated[set[str], type_name("ResourceTags")]


assert deserialization_schema(BaseResource, all_refs=True) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$defs": {
        "Resource": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "tags": {"$ref": "#/$defs/ResourceTags"},
            },
            "required": ["id", "tags"],
            "additionalProperties": False,
        },
        "ResourceTags": {
            "type": "array",
            "items": {"type": "string"},
            "uniqueItems": True,
        },
    },
    "$ref": "#/$defs/Resource",
}

Note

Builtin collections are interchangeable when a type_name is registered. For example, if a name is registered for list[Foo], this name will also be used for Sequence[Foo] or Collection[Foo].

Generic aliases can have a type name, but they need to be specialized: Foo[T, int] cannot have a type name, but Foo[str, int] can. However, generic classes can get a dynamic type name depending on their generic arguments, by passing a name factory to type_name:

from dataclasses import dataclass, field
from typing import Generic, TypeVar

from apischema import type_name
from apischema.json_schema import deserialization_schema
from apischema.metadata import flatten

T = TypeVar("T")

# Type name factory takes the type and its arguments as (positional) parameters
@type_name(lambda tp, arg: f"{arg.__name__}Resource")
@dataclass
class Resource(Generic[T]):
    id: int
    content: T = field(metadata=flatten)
    ...


@dataclass
class Foo:
    bar: str


assert deserialization_schema(Resource[Foo], all_refs=True) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$ref": "#/$defs/FooResource",
    "$defs": {
        "FooResource": {
            "type": "object",
            "allOf": [
                {
                    "type": "object",
                    "properties": {"id": {"type": "integer"}},
                    "required": ["id"],
                    "additionalProperties": False,
                },
                {"$ref": "#/$defs/Foo"},
            ],
            "unevaluatedProperties": False,
        },
        "Foo": {
            "type": "object",
            "properties": {"bar": {"type": "string"}},
            "required": ["bar"],
            "additionalProperties": False,
        },
    },
}
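How a name factory receives the generic origin and its arguments can be illustrated with typing.get_origin and typing.get_args. dynamic_type_name is a hypothetical helper, not apischema's mechanism; it returns no name for aliases that are not fully specialized (only top-level arguments are checked).

```python
from typing import Any, Callable, Generic, Optional, TypeVar, get_args, get_origin

T = TypeVar("T")


class Resource(Generic[T]):
    pass


class Foo:
    pass


def dynamic_type_name(alias: Any, factory: Callable[..., str]) -> Optional[str]:
    origin = get_origin(alias)
    if origin is None:
        return None  # not a parametrized generic
    args = get_args(alias)
    if any(isinstance(arg, TypeVar) for arg in args):
        return None  # unspecialized, e.g. Resource[T]: no type name
    # Same calling convention as the factory above: type, then its arguments
    return factory(origin, *args)
```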

The default behavior can also be customized using apischema.settings.default_type_name.

Reference factory

In JSON schema, $ref looks like #/$defs/Foo, not just Foo. In fact, schema generation uses the ref given by type_name/default_type_name and passes it to a ref_factory function (a parameter of the schema generation functions), which converts it to its final form. Each JSON schema version comes with its default ref_factory: for draft 2020-12, it prefixes the ref with #/$defs/, while for OpenAPI it prefixes it with #/components/schemas/.

from dataclasses import dataclass

from apischema.json_schema import deserialization_schema


@dataclass
class Foo:
    bar: int


def ref_factory(ref: str) -> str:
    return f"http://some-domain.org/path/to/{ref}.json#"


assert deserialization_schema(Foo, all_refs=True, ref_factory=ref_factory) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$ref": "http://some-domain.org/path/to/Foo.json#",
}

Note

When ref_factory is passed as an argument, definitions are not added to the generated schema. That's because ref_factory would surely change the location of the definitions, so there would be no point in adding them at a wrong location. These definitions can of course be generated separately with definitions_schema.
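The default ref factories essentially prepend a version-specific prefix. The sketch below uses the prefixes shown on this page; the dictionary keys are hypothetical labels, not members of apischema's JsonSchemaVersion.

```python
from typing import Callable

# Prefixes as shown on this page; keys are hypothetical labels
REF_PREFIXES = {
    "draft-2020-12": "#/$defs/",
    "draft-07": "#/definitions/",
    "openapi": "#/components/schemas/",
}


def prefix_ref_factory(prefix: str) -> Callable[[str], str]:
    """Build a ref_factory that prepends a fixed prefix to the type name."""

    def ref_factory(ref: str) -> str:
        return f"{prefix}{ref}"

    return ref_factory
```

A custom factory, like the URL-based one above, simply replaces this prefixing with any other str to str mapping.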

Definitions schema

Definitions schemas can also be extracted using apischema.json_schema.definitions_schema. It takes two lists of types, deserialization and serialization (an item can also be a tuple of a type and a dynamic conversion), and returns a dictionary of all referenced schemas.

Note

This is especially useful when it comes to OpenAPI schema to generate the components section.

from dataclasses import dataclass

from apischema.json_schema import definitions_schema


@dataclass
class Bar:
    baz: int = 0


@dataclass
class Foo:
    bar: Bar


assert definitions_schema(deserialization=[list[Foo]], all_refs=True) == {
    "Foo": {
        "type": "object",
        "properties": {"bar": {"$ref": "#/$defs/Bar"}},
        "required": ["bar"],
        "additionalProperties": False,
    },
    "Bar": {
        "type": "object",
        "properties": {"baz": {"type": "integer", "default": 0}},
        "additionalProperties": False,
    },
}
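When assembling such definitions into a document (an OpenAPI components section, for example), a quick consistency check is that every "$ref" points to an existing definition. collect_refs is a hypothetical helper written for illustration; it walks the schema recursively and collects the referenced names sharing a given prefix.

```python
from typing import Any


def collect_refs(schema: Any, prefix: str = "#/$defs/") -> set[str]:
    """Collect the definition names referenced anywhere in a schema."""
    found: set[str] = set()
    if isinstance(schema, dict):
        ref = schema.get("$ref")
        if isinstance(ref, str) and ref.startswith(prefix):
            found.add(ref[len(prefix):])
        for value in schema.values():
            found |= collect_refs(value, prefix)
    elif isinstance(schema, list):
        for item in schema:
            found |= collect_refs(item, prefix)
    return found
```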

JSON schema / OpenAPI version

JSON schema has several versions, and OpenAPI is treated as one of them. Although apischema natively uses the latest one, draft 2020-12, it is possible to specify the schema version used for generation.

from dataclasses import dataclass
from typing import Literal

from apischema.json_schema import (
    JsonSchemaVersion,
    definitions_schema,
    deserialization_schema,
)


@dataclass
class Bar:
    baz: int | None
    constant: Literal[0] = 0


@dataclass
class Foo:
    bar: Bar


assert deserialization_schema(Foo, all_refs=True) == {
    "$schema": "http://json-schema.org/draft/2020-12/schema#",
    "$ref": "#/$defs/Foo",
    "$defs": {
        "Foo": {
            "type": "object",
            "properties": {"bar": {"$ref": "#/$defs/Bar"}},
            "required": ["bar"],
            "additionalProperties": False,
        },
        "Bar": {
            "type": "object",
            "properties": {
                "baz": {"type": ["integer", "null"]},
                "constant": {"type": "integer", "const": 0, "default": 0},
            },
            "required": ["baz"],
            "additionalProperties": False,
        },
    },
}
assert deserialization_schema(
    Foo, all_refs=True, version=JsonSchemaVersion.DRAFT_7
) == {
    "$schema": "http://json-schema.org/draft-07/schema#",
    # $ref is isolated in allOf + draft 7 prefix
    "allOf": [{"$ref": "#/definitions/Foo"}],
    "definitions": {  # not "$defs"
        "Foo": {
            "type": "object",
            "properties": {"bar": {"$ref": "#/definitions/Bar"}},
            "required": ["bar"],
            "additionalProperties": False,
        },
        "Bar": {
            "type": "object",
            "properties": {
                "baz": {"type": ["integer", "null"]},
                "constant": {"type": "integer", "const": 0, "default": 0},
            },
            "required": ["baz"],
            "additionalProperties": False,
        },
    },
}
assert deserialization_schema(Foo, version=JsonSchemaVersion.OPEN_API_3_1) == {
    # No definitions for OpenAPI, use definitions_schema for it
    "$ref": "#/components/schemas/Foo"  # OpenAPI prefix
}
assert definitions_schema(
    deserialization=[Foo], version=JsonSchemaVersion.OPEN_API_3_1
) == {
    "Foo": {
        "type": "object",
        "properties": {"bar": {"$ref": "#/components/schemas/Bar"}},
        "required": ["bar"],
        "additionalProperties": False,
    },
    "Bar": {
        "type": "object",
        "properties": {
            "baz": {"type": ["integer", "null"]},
            "constant": {"type": "integer", "const": 0, "default": 0},
        },
        "required": ["baz"],
        "additionalProperties": False,
    },
}
assert definitions_schema(
    deserialization=[Foo], version=JsonSchemaVersion.OPEN_API_3_0
) == {
    "Foo": {
        "type": "object",
        "properties": {"bar": {"$ref": "#/components/schemas/Bar"}},
        "required": ["bar"],
        "additionalProperties": False,
    },
    "Bar": {
        "type": "object",
        # "nullable" instead of "type": "null"
        "properties": {
            "baz": {"type": "integer", "nullable": True},
            "constant": {"type": "integer", "enum": [0], "default": 0},
        },
        "required": ["baz"],
        "additionalProperties": False,
    },
}
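The two OpenAPI 3.0 differences visible above ("nullable" instead of a "null" type, and "enum" instead of "const") can be sketched as a recursive rewrite. to_openapi_30 is hypothetical; it ignores "$ref" prefixes and does not special-case properties that would themselves be named "type" or "const".

```python
from typing import Any


def to_openapi_30(schema: Any) -> Any:
    """Rewrite draft 2020-12 style keywords into their OpenAPI 3.0 form (sketch)."""
    if isinstance(schema, list):
        return [to_openapi_30(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    result = {key: to_openapi_30(value) for key, value in schema.items()}
    types = result.get("type")
    if isinstance(types, list) and "null" in types:
        others = [t for t in types if t != "null"]
        if len(others) == 1:  # ["integer", "null"] -> "integer" + nullable
            result["type"] = others[0]
            result["nullable"] = True
    if "const" in result:  # no "const" keyword in OpenAPI 3.0
        result["enum"] = [result.pop("const")]
    return result
```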

readOnly / writeOnly

Dataclass InitVar and field(init=False) fields are flagged with "writeOnly": true and "readOnly": true respectively in the generated schema.

In a definitions schema, if a type appears in both deserialization and serialization, properties are merged, and the resulting schema then contains both readOnly and writeOnly properties. However, required is not merged, because it can't be (it would break validation if some non-init field were required); the deserialization required is kept because it matters more, as it is used for validation. OpenAPI 3.0 semantics, which allowed this merge, were dropped in 3.1, so supporting them was not judged useful.
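The merge rule can be sketched as follows. merge_definitions is hypothetical, not apischema's code; it assumes the deserialization version of a property wins when both sides define it, and it keeps only the deserialization "required".

```python
from typing import Any


def merge_definitions(
    deserialization: dict[str, Any], serialization: dict[str, Any]
) -> dict[str, Any]:
    # Start from the deserialization schema, so its "required" is the one kept
    merged = dict(deserialization)
    # Union of properties; assumption: deserialization wins on shared names
    merged["properties"] = {
        **serialization.get("properties", {}),
        **deserialization.get("properties", {}),
    }
    return merged
```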