November 22, 2022
Summary: We created our own library for reliably handling timestamps in Python that adhere to the different formats and conventions of numerous marketplaces around the world.
One of our big challenges when processing orders or returns for customers and marketplaces all around the world is working with dates and times. Comparing, exchanging and storing timestamps is difficult, most notably because of differences between timezones, daylight saving time and timestamp formats. As a consequence of integrating with marketplaces all over the world, we deal with a large number of different datetime conventions. Additionally, we had some tables in Postgres with columns using the timestamp type instead of timestamptz. This is problematic because the timezone information of aware timestamps is lost when they are inserted into the database, so the application layer needs even more due diligence to ensure that all timestamps are converted to the same timezone before inserting.
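As a rough illustration of that due diligence, the sketch below converts an aware datetime to UTC before its timezone information is dropped for a timestamp column. The helper name to_utc_for_timestamp_column is hypothetical; it is not part of heliclockter or of our codebase.

```python
from datetime import datetime, timedelta, timezone


def to_utc_for_timestamp_column(dt: datetime) -> datetime:
    """Hypothetical helper: turn an aware datetime into naive UTC wall-clock time."""
    # A naive datetime carries no offset, so converting it would be a guess.
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError('refusing to store a naive datetime')
    # Convert to UTC first, then drop the tzinfo that a `timestamp` column cannot keep.
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


created_at = datetime(2022, 11, 22, 10, 0, tzinfo=timezone(timedelta(hours=2)))
stored = to_utc_for_timestamp_column(created_at)
```

Every code path that writes to such a column has to remember to do this, which is exactly the kind of convention that is easy to forget.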
Due to the above we were unable to consistently persist an ordered list of events. To illustrate: an order created in UTC+8 could be shown as if it was created later than an order that was actually created 6 hours later, but in a UTC+1 timezone. Or an order that was created at 10:00 UTC+2 could be shown as if it was created later than an order that was created at 11:00 UTC+2 from a different marketplace, because the marketplaces differed in the way they relayed the timestamps. This was confusing for our customers: with a high enough volume of orders or returns, newer ones could get lost in the overview because they were inserted deep in the list.
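The first scenario can be reproduced with the standard library alone. In this small sketch the dates and variable names are made up for illustration; dropping tzinfo mimics what happens when only the wall-clock time is stored.

```python
from datetime import datetime, timedelta, timezone

tz_plus_8 = timezone(timedelta(hours=8))
tz_plus_1 = timezone(timedelta(hours=1))

# Order A is created at 10:00 in UTC+8; order B six hours later, seen from UTC+1.
order_a = datetime(2022, 11, 22, 10, 0, tzinfo=tz_plus_8)
order_b = (order_a + timedelta(hours=6)).astimezone(tz_plus_1)

# Dropping the timezone keeps only the local wall-clock time.
naive_a = order_a.replace(tzinfo=None)
naive_b = order_b.replace(tzinfo=None)

# Aware comparison respects absolute time; naive comparison inverts the order.
aware_order_is_correct = order_a < order_b
naive_order_is_inverted = naive_a > naive_b
```

Order B carries a wall-clock time of 09:00, so without timezone information it sorts before order A even though it was created six hours later.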
We needed a consistent and robust way of converting between all the input timestamp conventions and our internal modeling, so that we can chronologically sort orders and returns based on their absolute creation time in UTC+0, irrespective of where in the world they were created and to what datetime standards. We needed the solution to be statically type checkable using mypy, to help prevent accidentally releasing code that is not timezone-aware.
Python has a built-in library, datetime, which is widely used to handle timestamps, but it does not enforce timezone-awareness. A TypeError is raised at runtime if a naive and an aware datetime object are compared, but there is no support for static type checking with mypy to differentiate the two. Arrow and pendulum are two well-known third-party libraries for dealing with datetimes in Python. While they offer significant advantages over the built-in datetime library, they were not suitable for us for two main reasons:
They assume UTC+0 by default when no timezone is given. We want to be more explicit about the assumptions and raise an error if a naive timestamp is being parsed without explicitly stating the assumed timezone.
We found one other module which tries to enforce timezone-aware datetimes, datetimeutc. As the name suggests, it enforces UTC+0 on all datetime objects, but it is too limited in scope to be useful in our case and it is not type checkable due to the absence of type hints. Additionally, it has not been updated since 2016 and is likely no longer maintained.
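The runtime-only nature of the built-in check mentioned above can be seen in a few lines. In this sketch the function is_earlier is a made-up example; a type checker accepts both calls because naive and aware values share the same datetime type.

```python
from datetime import datetime, timezone

naive = datetime(2022, 11, 22, 10, 0)
aware = datetime(2022, 11, 22, 10, 0, tzinfo=timezone.utc)


def is_earlier(a: datetime, b: datetime) -> bool:
    """Hypothetical helper; the annotations cannot express timezone-awareness."""
    return a < b


try:
    is_earlier(naive, aware)
    raised_type_error = False
except TypeError:
    # Ordering a naive and an aware datetime only fails once the code runs.
    raised_type_error = True
```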
The main goal was to enforce a robust interface between the input datetimes and our internal modeling with support for static type checking. Therefore, we created our own heliclockter module. We used NewType to allow for static type checking, as well as utilities to ensure the datetimes were in fact timezone-aware. For instance:
import datetime
from typing import NewType

# A `datetime_tz` is just a guaranteed timezone-aware `datetime.datetime`.
datetime_tz = NewType('datetime_tz', datetime.datetime)


def naive_datetime_to_datetime_tz(
    dt: datetime.datetime, timezone: datetime.tzinfo
) -> datetime_tz:
    """
    Converts a non timezone-aware `datetime.datetime` object into a `datetime_tz`
    object. Use this function whenever you have a naive `datetime.datetime`.
    """
    aware_dt = dt.replace(tzinfo=timezone)
    # `assert_aware_datetime` is one of the module's utilities (not shown here).
    assert_aware_datetime(aware_dt)
    return datetime_tz(aware_dt)
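To show what this buys in practice, here is a self-contained usage sketch. The body of assert_aware_datetime is an assumption, since its implementation is not shown in this post, and created_at_sort_key is a hypothetical consumer added only for illustration.

```python
import datetime
from typing import NewType

datetime_tz = NewType('datetime_tz', datetime.datetime)


def assert_aware_datetime(dt: datetime.datetime) -> None:
    # Assumed implementation: aware means tzinfo is set and yields an offset.
    assert dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None


def naive_datetime_to_datetime_tz(
    dt: datetime.datetime, timezone: datetime.tzinfo
) -> datetime_tz:
    aware_dt = dt.replace(tzinfo=timezone)
    assert_aware_datetime(aware_dt)
    return datetime_tz(aware_dt)


def created_at_sort_key(created_at: datetime_tz) -> float:
    """Hypothetical consumer that only accepts timezone-aware datetimes."""
    return created_at.timestamp()


naive = datetime.datetime(2022, 11, 22, 10, 0)
aware = naive_datetime_to_datetime_tz(naive, datetime.timezone.utc)
key = created_at_sort_key(aware)  # accepted by mypy
# created_at_sort_key(naive) would still run, but mypy reports an
# incompatible argument type because `naive` is a plain `datetime.datetime`.
```

At runtime a NewType is an identity function, so aware is still a plain datetime.datetime; the guarantee exists only for the type checker and the explicit assertion.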
datetime_local and datetime_utc were also declared and implemented to always enforce the local and UTC+0 timezone respectively. datetime_local was used to try to ensure that all datetimes inserted in the database were in the same timezone as the ones inserted previously. This later became less important as we migrated to timestamptz on all timestamp columns after we invented dbcritic and started using it on all of our databases.
Together with an import linter, which ensured that the built-in datetime module as well as arrow and pendulum could not be imported anywhere in the project, this solution worked quite well. The import linter mitigated human error by forcing all developers to add any missing functionality to the heliclockter module, and the NewType objects allowed static type checking with mypy.
The NewType solution worked well for our existing setup, but we wanted to start using pydantic to model external data and make use of the validation logic it offers. Therefore, we needed to rethink the solution to work with pydantic’s validators.
By subclassing datetime.datetime, we can implement __get_validators__() directly in the timezone-aware classes. To ensure the three classes, datetime_tz, datetime_utc and datetime_local, enforce their respective timezone constraints, __init__() has been added to do an additional assertion, such as:
import datetime as _datetime
from typing import ClassVar, Optional
from zoneinfo import ZoneInfo


class datetime_tz(_datetime.datetime):
    """
    A `datetime_tz` is just a guaranteed timezone-aware `datetime.datetime`.
    """

    assumed_timezone_for_timezone_naive_input: ClassVar[Optional[ZoneInfo]] = None

    def __init__(  # pylint: disable=unused-argument
        self,
        year: int,
        month: int,
        day: int,
        hour: int = 0,
        minute: int = 0,
        second: int = 0,
        microsecond: int = 0,
        tzinfo: Optional[_datetime.tzinfo] = None,
    ) -> None:
        # `datetime.datetime.__new__` has already built the instance, so
        # `self.tzinfo` is available by the time `__init__` runs.
        msg = f'{self.__class__} must have a timezone'
        assert tzinfo is not None and self.tzinfo is not None, msg
        tz_expected = self.assumed_timezone_for_timezone_naive_input or tzinfo
        msg = f'{self.__class__} got invalid timezone {self.tzinfo!r}'
        assert self.tzinfo == tz_expected, msg
        # `assert_aware_datetime` is defined elsewhere on the class (not shown).
        self.assert_aware_datetime(self)
This construction has two primary benefits:
It is statically type checkable: mypy can tell whether a datetime is in UTC+0 or any other timezone based on the exact class.
It is enforceable at runtime: an instance of datetime_utc cannot be created if the timezone is wrong.
The runtime enforceability also extends to pydantic models because we declare __get_validators__() in the datetime_tz class as:
@classmethod
def __get_validators__(cls) -> Iterator[Callable[[Any], Optional[datetime_tz]]]:
    yield cls._validate

@classmethod
def _validate(cls: Type[DateTimeTzT], v: Any) -> Optional[DateTimeTzT]:
    return cls.from_datetime(parse_datetime(v)) if v else None
Here, from_datetime is an extra utility that instantiates a timezone-aware subclass of datetime.datetime and asserts the timezone-awareness at runtime. Thus, when writing pydantic models like the following, we know that bar is always guaranteed to be a timezone-aware datetime object in the UTC+0 timezone at runtime:
from pydantic import BaseModel
from heliclockter import datetime_utc


class Foo(BaseModel):
    bar: datetime_utc
The pydantic validation logic is only added if pydantic is also available at runtime. Heliclockter can therefore be used even if you are not using pydantic.
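To make the validator hook more tangible without requiring pydantic, the sketch below imitates the idea with the standard library only. Everything in it is an assumption for illustration: utc_datetime_sketch is a stand-in for datetime_utc, datetime.fromisoformat replaces pydantic's parse_datetime, and run_validators only roughly mimics how a framework might call the yielded validators in turn.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Iterator, Optional


class utc_datetime_sketch(datetime):
    """Hypothetical stand-in for `datetime_utc`, standard library only."""

    @classmethod
    def __get_validators__(
        cls,
    ) -> Iterator[Callable[[Any], Optional['utc_datetime_sketch']]]:
        yield cls._validate

    @classmethod
    def _validate(cls, v: Any) -> Optional['utc_datetime_sketch']:
        if not v:
            return None
        # Stand-in parser: accepts ISO 8601 strings or datetime objects only.
        dt = datetime.fromisoformat(v) if isinstance(v, str) else v
        if dt.tzinfo is None:
            raise ValueError('naive datetimes are rejected')
        # Aware input is converted to UTC before the subclass is instantiated.
        dt = dt.astimezone(timezone.utc)
        return cls(
            dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second,
            dt.microsecond, tzinfo=dt.tzinfo,
        )


def run_validators(field_type: Any, value: Any) -> Any:
    """Hypothetical driver: pass the value through each yielded validator."""
    for validator in field_type.__get_validators__():
        value = validator(value)
    return value


bar = run_validators(utc_datetime_sketch, '2022-11-22T10:00:00+02:00')
```

The input string describes 10:00 in UTC+2, so the validated value represents the same instant as 08:00 in UTC+0.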
Additionally, manually creating objects in the correct timezone is much simpler than before. To create a new datetime_utc instance with the current time, we used to do the following:
datetime_utc(datetime_to_datetime_tz(datetime.datetime.now(datetime.timezone.utc)))
With the subclasses, we can now simply write:
datetime_utc.now()
This still ensures that the returned instance is in fact an instance of datetime_utc because now() from the datetime.datetime class has also been overridden to add some extra conversions and assertions:
@classmethod
def from_datetime(cls: Type[DateTimeTzT], dt: _datetime.datetime) -> DateTimeTzT:
    # Case: datetime is naive and there is no assumed timezone.
    if dt.tzinfo is None and cls.assumed_timezone_for_timezone_naive_input is None:
        raise DatetimeTzError(
            'Cannot create aware datetime from naive if no tz is assumed'
        )

    # Case: datetime is naive, but the timezone is assumed.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=cls.assumed_timezone_for_timezone_naive_input)

    # Case: datetime is aware and the timezone is assumed, enforce that timezone.
    elif (assumed_tz := cls.assumed_timezone_for_timezone_naive_input) is not None:
        dt = dt.astimezone(assumed_tz)

    cls.assert_aware_datetime(dt)
    return cls(
        year=dt.year,
        month=dt.month,
        day=dt.day,
        hour=dt.hour,
        minute=dt.minute,
        second=dt.second,
        microsecond=dt.microsecond,
        tzinfo=dt.tzinfo,  # type: ignore[arg-type]
    )

@classmethod
def now(cls: Type[DateTimeTzT], tz: Optional[_datetime.tzinfo] = None) -> DateTimeTzT:
    tz = cls.assumed_timezone_for_timezone_naive_input or tz
    if tz is None:
        raise DatetimeTzError(
            'Must override assumed_timezone_for_timezone_naive_input '
            'or give a timezone when calling now'
        )
    return cls.from_datetime(_datetime.datetime.now(tz))
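The two conversion branches in from_datetime rely on different standard library calls, and the difference matters. The following sketch uses made-up values to show it: replace only attaches a timezone to the existing wall-clock time, while astimezone converts to another timezone and keeps the instant unchanged.

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
plus_two = timezone(timedelta(hours=2))

# Naive input: the assumed timezone is attached, the clock reading stays 10:00.
naive = datetime(2022, 11, 22, 10, 0)
labelled = naive.replace(tzinfo=utc)

# Aware input: the instant is preserved, the clock reading becomes 08:00 in UTC.
aware = datetime(2022, 11, 22, 10, 0, tzinfo=plus_two)
converted = aware.astimezone(utc)
```

Using replace on an already aware datetime would silently shift the instant it represents, which is why the aware case goes through astimezone instead.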
Adding new classes that enforce a certain timezone is simple. All you have to do is declare the assumed_timezone_for_timezone_naive_input class variable and give the class a suitable name. For example, adding a class that enforces the ‘CET’ timezone would be done in the following way:
from zoneinfo import ZoneInfo
from heliclockter import datetime_tz


class datetime_cet(datetime_tz):
    """
    A `datetime_cet` is a `datetime_tz` guaranteed to be in the 'CET' timezone.
    """

    assumed_timezone_for_timezone_naive_input = ZoneInfo('CET')
At Channable we use this module in combination with an import linter to enforce that the built-in datetime and the third party libraries arrow and pendulum are not imported anywhere else. An example lint configuration looks like this:
[importlinter]
root_package = my_package
include_external_packages = True

[importlinter:contract:1]
name=The `datetime_tz` module should be used to handle dates and times.
type=forbidden
source_modules =
    my_package
forbidden_modules =
    datetime
    pendulum
    arrow
Working with datetimes in Python is hard, as static type checking of timezone information is lacking and input formats can vary widely depending on the origin of the timestamp. Our original implementation with NewType was no longer adequate once our use case changed with the introduction of pydantic to the codebase. Switching to inheriting from datetime.datetime, combined with building the pydantic validator into the class itself and prohibiting imports of the regular datetime, arrow or pendulum modules anywhere in the codebase, has greatly improved the consistency and reliability of all datetime behavior across the entire system.
The name heliclockter is a play on the words "clock" and "helicopter". The module aims to guide users and help them make little to no mistakes when handling datetimes, just like a helicopter parent strictly supervises their children.
We are also happy to announce that today we are releasing heliclockter as open source.