The Best Time is Now: Story of a Bug

Execution time was more than 10 years in the future (1840947419) (Wednesday, May 3, 2028 6:16:59 AM)

This was an exception message I encountered, while working on a project for my team, on May 4th, 2023. The exception complains the “Execution time” at Unix epoch 1840947419(May 3, 2028 in GMT), to be more than 10 years in the future. But is it only less than 5 years in future? If you are curious about why and would like to know how this was eventually solved, here is the story.

All the code was rewritten Python-ish because I cannot copy proprietary code. The original code is in a different programming language.

Confusing Future

There is a data schema defined like this:

class Foo(Base):
    __tablename__ = 'foo'
    name = Column(String)
    address = Column(String)

    def __init__(self, name, address):
        self.name = name
        self.address = address

And I was trying to add TTL to that:

class Foo(Base):
    __tablename__ = 'foo'
    name = Column(String)
    address = Column(String)
    ttl = configure_ttl_offset(datetime.timedelta(days = 365*5)) # ignore leap year

We have underlying framework to configure TTL, so what I had to do in data schema is just to call a configure_ttl_offset method. Super simple! I published the Diff(think about “Pull Request” on Github), and Continuous Integration later told me some tests failed. Sure, my responsibility to fix them since I was making changes, before I ship the Diff to production.

Digging a bit, I found where the tests failed:

class TestStuffUsingFoo:
    def test_some_scenario():
        foo = Foo(name='John Doe', address='123 Main St')
        user = User(role='Admin')
        foo_id = foo.id
        ... # some other stuff
        reloaded_foo = Foo.load(user, foo_id)

A little bit explanation about Foo.load(user, id) function: that’s a class method every data schema has: if user pass all the permission checks of the particular Foo object with the given id, then return the Foo object, otherwise throw exception.

When I added the TTL config to Foo’s schema, our underlying TTL framework added a new check to Foo.load(user, id) function: if TTL expired, then do not allow user to load it. The framework also does not allow setting TTL for too long: if the execution time(of eventually deleting the data) is more than 10 years in the future, then throw exception. The exception message I saw was from that.

Given I was sober and can calculate dates, I thought the TTL framework was probably drunk. I created a ticket for the framework owner and switched to something else.

Understanding Past

Several hours later, I got response from my coworker who happened to get the ticket(I am super grateful and greatly appreciate their help), and the TTL framework was proved not drunk, but they had some interesting code pointers for me:

The TTL framework would check the execution time like follows:

def check_if_ttl_too_big(foo: Foo, offset_days: int):
    offset = datetime.timedelta(days=offset_days)
    current_time = datetime.datetime.now()
    create_time = foo.created_at
    target_time = create_time + offset
    delta = target_time - current_time
    if delta.days > 365 * 10:
        error_msg = f"Execution time was more than 10 years in the future {target_time.timestamp()}"
        raise DeltaTooBigException(error_msg) # or return True if gracefully
    return False

And there was some unexpected code in the test I broke, during its setup:

class TestStuffUsingFoo:
    @pytest.fixture(autouse=True)
    def setup(self):
        self.mock_now = datetime.datetime(2018, 1, 1)
        with patch('datetime.datetime') as mock_datetime:
            mock_datetime.now.return_value = self.mock_now
            yield

Aha! The commonly used method to get current timestamp, datetime.datetime.now()(just as example, in reality that was a common method in our code base), was mocked to a hard coded date! So what happened when the TTL framework ran?

def check_if_ttl_too_big(foo: Foo, offset_days: int):
    offset = datetime.timedelta(days=offset_days) # 5 years
    current_time = datetime.datetime.now() # 2018-01-01
    create_time = foo.created_at # 2023-05-04
    target_time = create_time + offset # 2028-05-03
    delta = target_time - current_time # 10 years and 4 months
    if delta.days > 365 * 10: # Yes! I caught you sneaky coward
        error_msg = f"Execution time was more than 10 years in the future {target_time.timestamp()}"
        raise DeltaTooBigException(error_msg) # An engineer got confused
    return False # Not reached

Then I checked code history and found a Diff: on a later day in 2018, another engineer needed to write some test, to use a time in the past instead of the actual datetime.datetime.now(). It was easy and convenient to hard code a date that was surely before the day they were working, which turned out to be, January 1st.

So how do I fix this broken test then? How about I just change the hard-coded date to be January 1st, 2023? I bet you are yelling “this is a horrible idea” now, I totally agree with you.

In my opinion, when mocking a function in a unit test, we would better make the mocked function behave as closely as the function itself intended. In real world, we know “now” is something that never stops (no one has invented a time freezer/time stopper yet), so making datetime.datetime.now() always return a fixed time, letting a unit test forever stuck in the past, sounds ridiculous to me. Here is my fix:

class TestFoo:
    @pytest.fixture(autouse=True)
    def setup(self):
        now = datetime.datetime.now()
        mock_now = now - datetime.timedelta(days=90)
        with patch('datetime.datetime') as mock_datetime:
            mock_datetime.now.return_value = mock_now
            yield

The Best Time is Now

This bug let me feel lucky. Imagine any of the following conditions:

  • I set the TTL to be shorter, e.g. 3 years.
  • The previous engineer needed to write the test at a later time, e.g. some time in 2019.
  • TTL Framework is more lenient, e.g. disallow TTLs configured after 12 years.

Then I wouldn’t have broken this test now. On a different day, that test will be broken, for no obvious reasons, at which time the owner of the test, could have to spend way more time, could waste a lot of time to bisect recent changes with no avail, and could just give up and let the test broken.

And if I just modify the hard-coded date to be January 1st, 2023, I am planting a timed bomb for the test owner, to explode on January 1st, 2028.

So I did my part to do my coworker in future a favor, at the best time ever, datetime.datetime.now(), not mocked in real life, still mocked in that particular test.

Leave a comment