Validation

Validation is an important part of deserialization. By default, apischema validates data types according to typing annotations and schema constraints, but custom validators can also be added for more precise validation.

Deserialization and validation error

ValidationError is raised when validation fails. This exception contains all the information about the ill-formed part of the data. It can be formatted/serialized using its errors property.

from dataclasses import dataclass, field
from typing import NewType

from pytest import raises

from apischema import ValidationError, deserialize, schema

Tag = NewType("Tag", str)
schema(min_len=3, pattern=r"^\w*$", examples=["available", "EMEA"])(Tag)


@dataclass
class Resource:
    id: int
    tags: list[Tag] = field(
        default_factory=list,
        metadata=schema(
            description="regroup multiple resources", max_items=3, unique=True
        ),
    )


with raises(ValidationError) as err:  # pytest checks that the exception is raised
    deserialize(
        Resource, {"id": 42, "tags": ["tag", "duplicate", "duplicate", "bad&", "_"]}
    )
assert err.value.errors == [
    {"loc": ["tags"], "msg": "item count greater than 3 (maxItems)"},
    {"loc": ["tags"], "msg": "duplicate items (uniqueItems)"},
    {"loc": ["tags", 3], "msg": "not matching '^\\w*$' (pattern)"},
    {"loc": ["tags", 4], "msg": "string length lower than 3 (minLength)"},
]

As shown in the example, apischema does not stop at the first error encountered but tries to validate every part of the data.

Note

ValidationError can also be serialized using apischema.serialize (this will use errors internally).

Dataclass validators

Dataclass validation can be completed with custom validators. These are simple decorated methods executed during validation, after all fields have been deserialized.

from dataclasses import dataclass

from pytest import raises

from apischema import ValidationError, deserialize, validator


@dataclass
class PasswordForm:
    password: str
    confirmation: str

    @validator
    def password_match(self):
        # DO NOT use assert
        if self.password != self.confirmation:
            raise ValueError("password doesn't match its confirmation")


with raises(ValidationError) as err:
    deserialize(PasswordForm, {"password": "p455w0rd", "confirmation": "..."})
assert err.value.errors == [
    {"loc": [], "msg": "password doesn't match its confirmation"}
]

Warning

DO NOT use the assert statement to validate external data, ever. This statement is designed to be disabled when Python runs in optimized mode (see the documentation), so validation would be disabled too. This warning doesn't only concern apischema: assert is meant for internal assertions in debug/development environments. That's why apischema does not catch AssertionError as a validation error but reraises it, making deserialize fail.

Note

Validators are always executed in order of declaration.

Automatic dependency management

It makes no sense to execute a validator using a field that is ill-formed. Fortunately, apischema is able to compute validator dependencies — the fields used in the validator; a validator is executed only if all its dependencies are valid.

from dataclasses import dataclass

from pytest import raises

from apischema import ValidationError, deserialize, validator


@dataclass
class PasswordForm:
    password: str
    confirmation: str

    @validator
    def password_match(self):
        if self.password != self.confirmation:
            raise ValueError("password doesn't match its confirmation")


with raises(ValidationError) as err:
    deserialize(PasswordForm, {"password": "p455w0rd"})
assert err.value.errors == [
    # validator is not executed because confirmation is missing
    {"loc": ["confirmation"], "msg": "missing property"}
]

Note

Although a validator takes self as argument, it can be called during validation even when some fields of the class are invalid and the class has not actually been instantiated; the instance is mocked for validation with only the needed fields.

Raise more than one error with yield

Validation of a list field can require raising several exceptions, one for each bad element. With raise, this is not possible, because you can raise only once.

However, apischema provides a way of raising as many errors as needed by using yield. Moreover, with this syntax, it is possible to add a "path" (see below) to the error to pinpoint its location in the validated data. This path will be added to the loc key of the error.

from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network

from pytest import raises

from apischema import ValidationError, deserialize, validator
from apischema.objects import get_alias


@dataclass
class SubnetIps:
    subnet: IPv4Network
    ips: list[IPv4Address]

    @validator
    def check_ips_in_subnet(self):
        for index, ip in enumerate(self.ips):
            if ip not in self.subnet:
                # yield <error path>, <error message>
                yield (get_alias(self).ips, index), "ip not in subnet"


with raises(ValidationError) as err:
    deserialize(
        SubnetIps,
        {"subnet": "126.42.18.0/24", "ips": ["126.42.18.1", "126.42.19.0", "0.0.0.0"]},
    )
assert err.value.errors == [
    {"loc": ["ips", 1], "msg": "ip not in subnet"},
    {"loc": ["ips", 2], "msg": "ip not in subnet"},
]

Error path

In the example, the validator yields a tuple of an "error path" and the error message. Error path can be:

  • a field alias (obtained with apischema.objects.get_alias);
  • an integer, for list indices;
  • a raw string, for a dict key (or a field name);
  • an apischema.objects.AliasedStr, a string subclass which will be aliased by deserialization aliaser;
  • an iterable, e.g. a tuple, of these four components.

yield can also be used with only an error message.

Note

For a dataclass field error path, it's advised to use apischema.objects.get_alias instead of a raw string, because it takes potential aliasing into account and is better handled by IDEs (refactoring, cross-referencing, etc.)

Discard

If one of your validators fails because a field is corrupted, you may not want subsequent validators to be executed. The validator decorator provides a discard parameter to discard fields from the rest of the validation: every remaining validator that has a discarded field among its dependencies will not be executed.

from dataclasses import dataclass, field

from pytest import raises

from apischema import ValidationError, deserialize, validator
from apischema.objects import get_alias, get_field


@dataclass
class BoundedValues:
    # field must be assigned to be used, even with an empty `field()`
    bounds: tuple[int, int] = field()
    values: list[int]

    # validator("bounds") would also work, but it's not handled by IDE refactoring, etc.
    @validator(discard=bounds)
    def bounds_are_sorted(self):
        min_bound, max_bound = self.bounds
        if min_bound > max_bound:
            yield get_alias(self).bounds, "bounds are not sorted"

    @validator
    def values_dont_exceed_bounds(self):
        min_bound, max_bound = self.bounds
        for index, value in enumerate(self.values):
            if not min_bound <= value <= max_bound:
                yield (get_alias(self).values, index), "value exceeds bounds"


# Outside the class, fields can still be accessed in a "static" way, to avoid using raw strings
@validator(discard=get_field(BoundedValues).bounds)
def bounds_are_sorted_equivalent(bounded: BoundedValues):
    min_bound, max_bound = bounded.bounds
    if min_bound > max_bound:
        yield get_alias(bounded).bounds, "bounds are not sorted"


with raises(ValidationError) as err:
    deserialize(BoundedValues, {"bounds": [10, 0], "values": [-1, 2, 4]})
assert err.value.errors == [
    {"loc": ["bounds"], "msg": "bounds are not sorted"}
    # Without discard, there would have been another error:
    # {"loc": ["values", 1], "msg": "value exceeds bounds"}
]

You may notice in this example that apischema avoids using raw strings to identify fields. Every function of the library that takes field identifiers (apischema.validator, apischema.dependent_required, apischema.fields.set_fields, etc.) accepts them in three ways:

  • a field object, preferred inside the dataclass definition;
  • apischema.objects.get_field, to be used outside of the class definition — it works with NamedTuple too; the object returned is the apischema internal field representation, common to dataclass, NamedTuple and TypedDict;
  • a raw string, thus not handled by static tools like refactoring, but it works.

Field validators

At field level

Fields are validated according to their types and schema. But it's also possible to add validators to fields.

from dataclasses import dataclass, field

from pytest import raises

from apischema import ValidationError, deserialize
from apischema.metadata import validators


def check_no_duplicate_digits(n: int):
    if len(str(n)) != len(set(str(n))):
        raise ValueError("number has duplicate digits")


@dataclass
class Foo:
    bar: str = field(metadata=validators(check_no_duplicate_digits))


with raises(ValidationError) as err:
    deserialize(Foo, {"bar": "11"})
assert err.value.errors == [{"loc": ["bar"], "msg": "number has duplicate digits"}]

When validation fails for a field, it is discarded and cannot be used in class validators, as is the case when field schema validation fails.

Note

field_validator allows reusing the same validator for several fields. However, in this case, using a custom type (for example a NewType) with validators (see next section) can often be a better solution.

Using other fields

A common pattern is to validate a field using other fields' values. This can be achieved with the dataclass validators seen above; however, there is a shortcut for this use case:

from dataclasses import dataclass, field
from enum import Enum

from pytest import raises

from apischema import ValidationError, deserialize, validator
from apischema.objects import get_alias, get_field


class Parity(Enum):
    EVEN = "even"
    ODD = "odd"


@dataclass
class NumberWithParity:
    parity: Parity
    number: int = field()

    @validator(number)
    def check_parity(self):
        if (self.parity == Parity.EVEN) != (self.number % 2 == 0):
            yield "number doesn't respect parity"

    # A field validator is equivalent to a discard argument and all error paths prefixed
    # with the field alias
    @validator(discard=number)
    def check_parity_equivalent(self):
        if (self.parity == Parity.EVEN) != (self.number % 2 == 0):
            yield get_alias(self).number, "number doesn't respect parity"


@validator(get_field(NumberWithParity).number)
def check_parity_other_equivalent(number2: NumberWithParity):
    if (number2.parity == Parity.EVEN) != (number2.number % 2 == 0):
        yield "number doesn't respect parity"


with raises(ValidationError) as err:
    deserialize(NumberWithParity, {"parity": "even", "number": 1})
assert err.value.errors == [{"loc": ["number"], "msg": "number doesn't respect parity"}]

Validators inheritance

Validators are inherited just like other class members.

from dataclasses import dataclass

from pytest import raises

from apischema import ValidationError, deserialize, validator


@dataclass
class PasswordForm:
    password: str
    confirmation: str

    @validator
    def password_match(self):
        if self.password != self.confirmation:
            raise ValueError("password doesn't match its confirmation")


@dataclass
class CompleteForm(PasswordForm):
    username: str


with raises(ValidationError) as err:
    deserialize(
        CompleteForm,
        {"username": "wyfo", "password": "p455w0rd", "confirmation": "..."},
    )
assert err.value.errors == [
    {"loc": [], "msg": "password doesn't match its confirmation"}
]

Validator with InitVar

Dataclass InitVars are accessible in validators through parameters, the same way __post_init__ receives them. Only the needed ones have to be declared as parameters; they are then added to the validator's dependencies.

from dataclasses import InitVar, dataclass, field

from pytest import raises

from apischema import ValidationError, deserialize, validator
from apischema.metadata import init_var


@dataclass
class Foo:
    bar: InitVar[int] = field(metadata=init_var(int))

    @validator(bar)
    def validate(self, bar: int):
        if bar < 0:
            yield "negative"


with raises(ValidationError) as err:
    deserialize(Foo, {"bar": -1})
assert err.value.errors == [{"loc": ["bar"], "msg": "negative"}]

Validators are not run on default values

If all of a validator's dependencies are initialized with their default values, the validator is not run.

from dataclasses import dataclass, field

from apischema import deserialize, validator

validator_run = False


@dataclass
class Foo:
    bar: int = field(default=0)

    @validator(bar)
    def bar_is_not_negative(self):
        global validator_run
        validator_run = True
        if self.bar < 0:
            raise ValueError("negative")


deserialize(Foo, {})
assert not validator_run

Validators for every type

Validators can also be declared as regular functions, in which case the annotation of the first parameter is used to associate the validator with the validated type (the owner parameter can also be used); this allows adding a validator to any type.

Last but not least, validators can be embedded directly into Annotated arguments using validators metadata.

from typing import Annotated, NewType

from pytest import raises

from apischema import ValidationError, deserialize, validator
from apischema.metadata import validators

Palindrome = NewType("Palindrome", str)


@validator  # could also use @validator(owner=Palindrome)
def check_palindrome(s: Palindrome):
    for i in range(len(s) // 2):
        if s[i] != s[-1 - i]:
            raise ValueError("Not a palindrome")


assert deserialize(Palindrome, "tacocat") == "tacocat"
with raises(ValidationError) as err:
    deserialize(Palindrome, "palindrome")
assert err.value.errors == [{"loc": [], "msg": "Not a palindrome"}]

# Using Annotated
with raises(ValidationError) as err:
    deserialize(Annotated[str, validators(check_palindrome)], "palindrom")
assert err.value.errors == [{"loc": [], "msg": "Not a palindrome"}]

FAQ

How are validator dependencies computed?

ast.NodeVisitor and the Python black magic begins...

Why only validate at deserialization and not at instantiation?

apischema uses type annotations, so every object used can already be statically type-checked (with Mypy, PyCharm, etc.) at instantiation as well as at modification.

Why use validators for dataclasses instead of doing validation in __post_init__?

Actually, validation can be done entirely in __post_init__; there is no problem with that. However, validators offer one thing that cannot be achieved with __post_init__: they are run before __init__, so they can validate incomplete data. Moreover, they are only run during deserialization, so they add no overhead to normal class instantiation.