# Validation

Validation is an important part of deserialization. By default, apischema validates data types according to typing annotations and schema constraints, but custom validators can also be added for a more precise validation.
## Deserialization and validation error

`ValidationError` is raised when validation fails. This exception contains all the information about the ill-formed parts of the data; it can be formatted/serialized using its `errors` property.
```python
from dataclasses import dataclass, field
from typing import NewType

from pytest import raises

from apischema import ValidationError, deserialize, schema

Tag = NewType("Tag", str)
schema(min_len=3, pattern=r"^\w*$", examples=["available", "EMEA"])(Tag)


@dataclass
class Resource:
    id: int
    tags: list[Tag] = field(
        default_factory=list,
        metadata=schema(
            description="regroup multiple resources", max_items=3, unique=True
        ),
    )


with raises(ValidationError) as err:  # pytest check exception is raised
    deserialize(
        Resource, {"id": 42, "tags": ["tag", "duplicate", "duplicate", "bad&", "_"]}
    )
assert err.value.errors == [
    {"loc": ["tags"], "msg": "item count greater than 3 (maxItems)"},
    {"loc": ["tags"], "msg": "duplicate items (uniqueItems)"},
    {"loc": ["tags", 3], "msg": "not matching '^\\w*$' (pattern)"},
    {"loc": ["tags", 4], "msg": "string length lower than 3 (minLength)"},
]
```
As shown in the example, apischema does not stop at the first error encountered but tries to validate every part of the data.
Note

`ValidationError` can also be serialized using `apischema.serialize` (this will use `errors` internally).
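For instance, here is a minimal sketch of that; the exact error messages depend on the data, so only the two serialized forms are compared:

```python
from pytest import raises

from apischema import ValidationError, deserialize, serialize

with raises(ValidationError) as err:
    deserialize(list[int], [0, "not an int"])
# serializing the exception itself produces the same payload as its errors property
assert serialize(ValidationError, err.value) == err.value.errors
```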
## Dataclass validators

Dataclass validation can be completed by custom validators. These are simple decorated methods which are executed during validation, after all fields have been deserialized.
```python
from dataclasses import dataclass

from pytest import raises

from apischema import ValidationError, deserialize, validator


@dataclass
class PasswordForm:
    password: str
    confirmation: str

    @validator
    def password_match(self):
        # DO NOT use assert
        if self.password != self.confirmation:
            raise ValueError("password doesn't match its confirmation")


with raises(ValidationError) as err:
    deserialize(PasswordForm, {"password": "p455w0rd", "confirmation": "..."})
assert err.value.errors == [
    {"loc": [], "msg": "password doesn't match its confirmation"}
]
```
Warning

DO NOT use the `assert` statement to validate external data, ever. This statement is designed to be disabled when Python runs in optimized mode (see the Python documentation), so validation would be disabled too. This warning doesn't concern only apischema: `assert` is meant only for internal assertions in debug/development environments. That's why apischema does not catch `AssertionError` as a validation error but reraises it, making `deserialize` fail.
Note
Validators are always executed in order of declaration.
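A minimal sketch of this ordering, with a hypothetical `Ordered` class whose two validators both fail (errors from all executed validators are aggregated):

```python
from dataclasses import dataclass

from pytest import raises

from apischema import ValidationError, deserialize, validator


@dataclass
class Ordered:
    n: int

    @validator
    def first(self):
        if self.n < 0:
            yield "first error"

    @validator
    def second(self):
        if self.n < 0:
            yield "second error"


with raises(ValidationError) as err:
    deserialize(Ordered, {"n": -1})
# errors appear in the validators' declaration order
assert err.value.errors == [
    {"loc": [], "msg": "first error"},
    {"loc": [], "msg": "second error"},
]
```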
## Automatic dependency management

It makes no sense to execute a validator using a field that is ill-formed. Fortunately, apischema is able to compute validator dependencies, i.e. the fields used in the validator; a validator is executed only if all of its dependencies are valid.
```python
from dataclasses import dataclass

from pytest import raises

from apischema import ValidationError, deserialize, validator


@dataclass
class PasswordForm:
    password: str
    confirmation: str

    @validator
    def password_match(self):
        if self.password != self.confirmation:
            raise ValueError("password doesn't match its confirmation")


with raises(ValidationError) as err:
    deserialize(PasswordForm, {"password": "p455w0rd"})
assert err.value.errors == [
    # validator is not executed because confirmation is missing
    {"loc": ["confirmation"], "msg": "missing property"}
]
```
Note

Although a validator takes a `self` argument, it can be called during validation even when some fields of the class are invalid and the class is not actually instantiated; the instance is in fact mocked for validation, with only the needed fields.
## Raise more than one error with `yield`

Validating a list field can require raising several exceptions, one for each bad element. This is not possible with `raise`, because you can only raise once. However, apischema provides a way of raising as many errors as needed, by using `yield`. Moreover, with this syntax it is possible to add a "path" (see below) to the error, to locate it precisely in the validated data. This path will be added to the `loc` key of the error.
```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network

from pytest import raises

from apischema import ValidationError, deserialize, validator
from apischema.objects import get_alias


@dataclass
class SubnetIps:
    subnet: IPv4Network
    ips: list[IPv4Address]

    @validator
    def check_ips_in_subnet(self):
        for index, ip in enumerate(self.ips):
            if ip not in self.subnet:
                # yield <error path>, <error message>
                yield (get_alias(self).ips, index), "ip not in subnet"


with raises(ValidationError) as err:
    deserialize(
        SubnetIps,
        {"subnet": "126.42.18.0/24", "ips": ["126.42.18.1", "126.42.19.0", "0.0.0.0"]},
    )
assert err.value.errors == [
    {"loc": ["ips", 1], "msg": "ip not in subnet"},
    {"loc": ["ips", 2], "msg": "ip not in subnet"},
]
```
### Error path

In the example, the validator yields a tuple of an "error path" and the error message. An error path can be:

- a field alias (obtained with `apischema.objects.get_alias`);
- an integer, for list indices;
- a raw string, for a dict key (or a field);
- an `apischema.objects.AliasedStr`, a string subclass which will be aliased by the deserialization aliaser;
- an iterable, e.g. a tuple, of these four components.

`yield` can also be used with only an error message, as shown in the sketch below.
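Here is a minimal sketch combining two of these forms, a message-only `yield` and a path mixing a field alias with a raw dict key (`Config` is a hypothetical class):

```python
from dataclasses import dataclass

from pytest import raises

from apischema import ValidationError, deserialize, validator
from apischema.objects import get_alias


@dataclass
class Config:
    options: dict[str, int]

    @validator
    def check_options(self):
        if not self.options:
            # message only: the error is located at the class level (loc [])
            yield "no option provided"
        for key, value in self.options.items():
            if value < 0:
                # field alias + raw string dict key as the error path
                yield (get_alias(self).options, key), "negative value"


with raises(ValidationError) as err:
    deserialize(Config, {"options": {"timeout": -1, "retries": 2}})
assert err.value.errors == [{"loc": ["options", "timeout"], "msg": "negative value"}]
```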
Note

For dataclass field error paths, it's advised to use `apischema.objects.get_alias` instead of a raw string, because it takes potential aliasing into account and is better handled by IDEs (refactoring, cross-referencing, etc.).
## Discard

If one of your validators fails because a field is corrupted, you may not want subsequent validators to be executed. The `validator` decorator provides a `discard` parameter to discard fields from the remaining validation; all remaining validators having a discarded field in their dependencies will not be executed.
```python
from dataclasses import dataclass, field

from pytest import raises

from apischema import ValidationError, deserialize, validator
from apischema.objects import get_alias, get_field


@dataclass
class BoundedValues:
    # field must be assigned to be referenced, even with an empty `field()`
    bounds: tuple[int, int] = field()
    values: list[int]

    # validator("bounds") would also work, but it's not handled by IDE refactoring, etc.
    @validator(discard=bounds)
    def bounds_are_sorted(self):
        min_bound, max_bound = self.bounds
        if min_bound > max_bound:
            yield get_alias(self).bounds, "bounds are not sorted"

    @validator
    def values_dont_exceed_bounds(self):
        min_bound, max_bound = self.bounds
        for index, value in enumerate(self.values):
            if not min_bound <= value <= max_bound:
                yield (get_alias(self).values, index), "value exceeds bounds"


# Outside the class, fields can still be accessed in a "static" way, avoiding raw strings
@validator(discard=get_field(BoundedValues).bounds)
def bounds_are_sorted_equivalent(bounded: BoundedValues):
    min_bound, max_bound = bounded.bounds
    if min_bound > max_bound:
        yield get_alias(bounded).bounds, "bounds are not sorted"


with raises(ValidationError) as err:
    deserialize(BoundedValues, {"bounds": [10, 0], "values": [-1, 2, 4]})
assert err.value.errors == [
    {"loc": ["bounds"], "msg": "bounds are not sorted"}
    # Without discard, there would have been another error:
    # {"loc": ["values", 1], "msg": "value exceeds bounds"}
]
```
You can notice in this example that apischema tries to avoid using raw strings to identify fields. Every function of the library using field identifiers (`apischema.validator`, `apischema.dependent_required`, `apischema.fields.set_fields`, etc.) accepts them in three ways, illustrated in the sketch after this list:

- a field object, preferred inside the dataclass definition;
- `apischema.objects.get_field`, to be used outside of the class definition; it also works with `NamedTuple`, and the object returned is the apischema internal field representation, common to `dataclass`, `NamedTuple` and `TypedDict`;
- a raw string, thus not handled by static tools like refactoring, but it works.
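As a rough sketch of these three forms on a hypothetical `Measure` class (the raw-string form is shown last; whether it is accepted in every parameter position is best checked against the reference documentation):

```python
from dataclasses import dataclass, field

from apischema import validator
from apischema.objects import get_field


@dataclass
class Measure:
    value: int = field()  # assigned so the field object can be referenced below

    # 1. field object, inside the dataclass definition
    @validator(discard=value)
    def value_is_positive(self):
        if self.value < 0:
            yield "negative value"


# 2. get_field, outside of the class definition
@validator(discard=get_field(Measure).value)
def value_is_small(measure: Measure):
    if measure.value > 100:
        yield "value too big"


# 3. raw string: works, but invisible to refactoring tools
@validator(discard="value")
def value_is_even(measure: Measure):
    if measure.value % 2:
        yield "odd value"
```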
## Field validators

### At field level

Fields are validated according to their types and schemas, but it's also possible to add validators to individual fields.
```python
from dataclasses import dataclass, field

from pytest import raises

from apischema import ValidationError, deserialize
from apischema.metadata import validators


def check_no_duplicate_digits(n: int):
    if len(str(n)) != len(set(str(n))):
        raise ValueError("number has duplicate digits")


@dataclass
class Foo:
    bar: str = field(metadata=validators(check_no_duplicate_digits))


with raises(ValidationError) as err:
    deserialize(Foo, {"bar": "11"})
assert err.value.errors == [{"loc": ["bar"], "msg": "number has duplicate digits"}]
```
When validation fails for a field, it is discarded and cannot be used in class validators, as is the case when field schema validation fails.
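A minimal sketch of this behavior, with a hypothetical `Quantity` class: the failing field validator discards `amount`, so the class validator depending on it is skipped.

```python
from dataclasses import dataclass, field

from pytest import raises

from apischema import ValidationError, deserialize, validator
from apischema.metadata import validators


def positive(n: int):
    if n < 0:
        raise ValueError("negative")


@dataclass
class Quantity:
    amount: int = field(metadata=validators(positive))

    @validator
    def amount_is_even(self):  # depends on amount, skipped when it fails
        if self.amount % 2:
            yield "odd amount"


with raises(ValidationError) as err:
    deserialize(Quantity, {"amount": -1})
# only the field validator error is reported; amount_is_even was not run
assert err.value.errors == [{"loc": ["amount"], "msg": "negative"}]
```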
Note

`field_validator` allows reusing the same validator for several fields. However, in this case, using a custom type (for example a `NewType`) with validators (see the next section) can often be a better solution.
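A rough sketch of that alternative, using the mechanism described in Validators for every type below (`Digits` and `Lottery` are hypothetical names):

```python
from dataclasses import dataclass
from typing import NewType

from apischema import validator

Digits = NewType("Digits", int)


@validator  # the validated type is inferred from the parameter annotation
def check_no_duplicate_digits(n: Digits):
    if len(str(n)) != len(set(str(n))):
        raise ValueError("number has duplicate digits")


# the constraint is declared once and reused on as many fields as needed
@dataclass
class Lottery:
    first_draw: Digits
    second_draw: Digits
```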
### Using other fields

A common pattern is to validate a field using the values of other fields. This can be achieved with the dataclass validators seen above, but there is a shortcut for this use case:
```python
from dataclasses import dataclass, field
from enum import Enum

from pytest import raises

from apischema import ValidationError, deserialize, validator
from apischema.objects import get_alias, get_field


class Parity(Enum):
    EVEN = "even"
    ODD = "odd"


@dataclass
class NumberWithParity:
    parity: Parity
    number: int = field()

    @validator(number)
    def check_parity(self):
        if (self.parity == Parity.EVEN) != (self.number % 2 == 0):
            yield "number doesn't respect parity"

    # A field validator is equivalent to a discard argument and all error paths
    # prefixed with the field alias
    @validator(discard=number)
    def check_parity_equivalent(self):
        if (self.parity == Parity.EVEN) != (self.number % 2 == 0):
            yield get_alias(self).number, "number doesn't respect parity"


@validator(get_field(NumberWithParity).number)
def check_parity_other_equivalent(number2: NumberWithParity):
    if (number2.parity == Parity.EVEN) != (number2.number % 2 == 0):
        yield "number doesn't respect parity"


with raises(ValidationError) as err:
    deserialize(NumberWithParity, {"parity": "even", "number": 1})
assert err.value.errors == [{"loc": ["number"], "msg": "number doesn't respect parity"}]
```
## Validators inheritance
Validators are inherited just like other class fields.
```python
from dataclasses import dataclass

from pytest import raises

from apischema import ValidationError, deserialize, validator


@dataclass
class PasswordForm:
    password: str
    confirmation: str

    @validator
    def password_match(self):
        if self.password != self.confirmation:
            raise ValueError("password doesn't match its confirmation")


@dataclass
class CompleteForm(PasswordForm):
    username: str


with raises(ValidationError) as err:
    deserialize(
        CompleteForm,
        {"username": "wyfo", "password": "p455w0rd", "confirmation": "..."},
    )
assert err.value.errors == [
    {"loc": [], "msg": "password doesn't match its confirmation"}
]
```
## Validator with `InitVar`

Dataclass `InitVar`s are accessible in validators by using parameters, the same way `__post_init__` does. Only the needed fields have to be declared as parameters; they are then added to the validator's dependencies.
```python
from dataclasses import InitVar, dataclass, field

from pytest import raises

from apischema import ValidationError, deserialize, validator
from apischema.metadata import init_var


@dataclass
class Foo:
    bar: InitVar[int] = field(metadata=init_var(int))

    @validator(bar)
    def validate(self, bar: int):
        if bar < 0:
            yield "negative"


with raises(ValidationError) as err:
    deserialize(Foo, {"bar": -1})
assert err.value.errors == [{"loc": ["bar"], "msg": "negative"}]
```
## Validators are not run on default values

If all of a validator's dependencies are initialized with their default values, the validator is not run.
```python
from dataclasses import dataclass, field

from apischema import deserialize, validator

validator_run = False


@dataclass
class Foo:
    bar: int = field(default=0)

    @validator(bar)
    def check_bar(self):
        global validator_run
        validator_run = True
        if self.bar < 0:
            raise ValueError("negative")


# bar is not present, so it takes its default value and the validator is skipped
deserialize(Foo, {})
assert not validator_run
```
## Validators for every type

Validators can also be declared as regular functions, in which case the annotation of the first parameter is used to associate the validator with the validated type (you can also use the `owner` parameter); this allows adding a validator to every type.

Last but not least, validators can be embedded directly into `Annotated` arguments using the `validators` metadata.
```python
from typing import Annotated, NewType

from pytest import raises

from apischema import ValidationError, deserialize, validator
from apischema.metadata import validators

Palindrome = NewType("Palindrome", str)


@validator  # could also use @validator(owner=Palindrome)
def check_palindrome(s: Palindrome):
    for i in range(len(s) // 2):
        if s[i] != s[-1 - i]:
            raise ValueError("Not a palindrome")


assert deserialize(Palindrome, "tacocat") == "tacocat"
with raises(ValidationError) as err:
    deserialize(Palindrome, "palindrome")
assert err.value.errors == [{"loc": [], "msg": "Not a palindrome"}]

# Using Annotated
with raises(ValidationError) as err:
    deserialize(Annotated[str, validators(check_palindrome)], "palindrom")
assert err.value.errors == [{"loc": [], "msg": "Not a palindrome"}]
```
## FAQ

### How are validator dependencies computed?

`ast.NodeVisitor`, and the Python black magic begins...
### Why only validate at deserialization and not at instantiation?

apischema uses type annotations, so every object used can already be statically type-checked (with Mypy/PyCharm/etc.) at instantiation, as well as at modification.
### Why use validators for dataclasses instead of doing validation in `__post_init__`?

Actually, validation can be done entirely in `__post_init__`; there is no problem with that. However, validators offer one thing that cannot be achieved with `__post_init__`: they are run before `__init__`, so they can validate incomplete data. Moreover, they are only run during deserialization, so they add no overhead to normal class instantiation.