The Best Time is Now: Story of a Bug
Execution time was more than 10 years in the future (1840947419) (Wednesday, May 3, 2028 6:16:59 AM)
This was an exception message I encountered, while working on a project for my team, on May 4th, 2023. The exception complains the “Execution time” at Unix epoch 1840947419(May 3, 2028 in GMT), to be more than 10 years in the future. But is it only less than 5 years in future? If you are curious about why and would like to know how this was eventually solved, here is the story.
All the code was rewritten Python-ish because I cannot copy proprietary code. The original code is in a different programming language.
Confusing Future
There is a data schema defined like this:
And I was trying to add TTL to that:
We have underlying framework to configure TTL, so what I had to do in data schema is just to call a configure_ttl_offset
method. Super simple! I published the Diff(think about “Pull Request” on Github), and Continuous Integration later told me some tests failed. Sure, my responsibility to fix them since I was making changes, before I ship the Diff to production.
Digging a bit, I found where the tests failed:
A little bit explanation about Foo.load(user, id)
function: that’s a class method every data schema has: if user pass all the permission checks of the particular Foo object with the given id, then return the Foo object, otherwise throw exception.
When I added the TTL config to Foo’s schema, our underlying TTL framework added a new check to Foo.load(user, id)
function: if TTL expired, then do not allow user to load it. The framework also does not allow setting TTL for too long: if the execution time(of eventually deleting the data) is more than 10 years in the future, then throw exception. The exception message I saw was from that.
Given I was sober and can calculate dates, I thought the TTL framework was probably drunk. I created a ticket for the framework owner and switched to something else.
Understanding Past
Several hours later, I got response from my coworker who happened to get the ticket(I am super grateful and greatly appreciate their help), and the TTL framework was proved not drunk, but they had some interesting code pointers for me:
The TTL framework would check the execution time like follows:
And there was some unexpected code in the test I broke, during its setup:
Aha! The commonly used method to get current timestamp, datetime.datetime.now()
(just as example, in reality that was a common method in our code base), was mocked to a hard coded date! So what happened when the TTL framework ran?
Then I checked code history and found a Diff: on a later day in 2018, another engineer needed to write some test, to use a time in the past instead of the actual datetime.datetime.now()
. It was easy and convenient to hard code a date that was surely before the day they were working, which turned out to be, January 1st.
So how do I fix this broken test then? How about I just change the hard-coded date to be January 1st, 2023? I bet you are yelling “this is a horrible idea” now, I totally agree with you.
In my opinion, when mocking a function in a unit test, we would better make the mocked function behave as closely as the function itself intended. In real world, we know “now” is something that never stops (no one has invented a time freezer/time stopper yet), so making datetime.datetime.now()
always return a fixed time, letting a unit test forever stuck in the past, sounds ridiculous to me. Here is my fix:
The Best Time is Now
This bug let me feel lucky. Imagine any of the following conditions:
- I set the TTL to be shorter, e.g. 3 years.
- The previous engineer needed to write the test at a later time, e.g. some time in 2019.
- TTL Framework is more lenient, e.g. disallow TTLs configured after 12 years.
Then I wouldn’t have broken this test now. On a different day, that test will be broken, for no obvious reasons, at which time the owner of the test, could have to spend way more time, could waste a lot of time to bisect recent changes with no avail, and could just give up and let the test broken.
And if I just modify the hard-coded date to be January 1st, 2023, I am planting a timed bomb for the test owner, to explode on January 1st, 2028.
So I did my part to do my coworker in future a favor, at the best time ever, datetime.datetime.now()
, not mocked in real life, still mocked in that particular test.
Leave a comment