Friday, July 31, 2009

SC Engine: The Trouble With Testing

This is sort of an ad-hoc post, so I'm not really considering it part of my SC Engine series.

The Mercurial repository shows SC Engine at just under two months old. I guess it's time to look back at some of the things that I've tried and maybe think about stuff that worked vs. stuff that didn't work.

The big change in this project from other things I've done is the architecture. Although I'm sure my implementation was not the best, I really don't know how much I approve of a messaging system for everything. On the one hand, it did a great job of separating the different parts of the system, and on very large systems I could imagine it being useful when you have multiple subsystems running on different platforms. Scaling horizontally would also be extremely easy, although this wasn't really a concern for me. On the other hand, the big problem was that testing applications was not as easy as I had hoped.

The goal was simple: make testing apps as easy as passing in data and collecting the data back. Make sure that the data returned was expected. Rinse and repeat...


def test_some_app(self):
    msg = IncomingMsg(1, 2)
    results = self.app.on_msg(msg)
    expected = OutgoingMsg(1, 3)
    self.assertContains(results, expected)


The problem was with messages that had a TON of attributes and needed a few earlier messages to set them up. Suddenly, a test could look like...


def test_some_app(self):
    # First setup message.
    incoming_data_1 = {
        'data_1': 1,
        'data_2': 1,
        'data_3': 1,
        'data_4': 1,
        'data_5': 1,
        'data_6': 1,
        'data_7': 1,
    }

    # Second setup message.
    incoming_data_2 = {
        'data_1': 1,
        'data_2': 1,
        'data_3': 1,
        'data_4': 1,
        'data_5': 1,
    }

    incoming_msg_1 = IncomingMsg(**incoming_data_1)
    incoming_msg_2 = IncomingMsg(**incoming_data_2)

    self.app.on_msg(incoming_msg_1)
    self.app.on_msg(incoming_msg_2)

    # The message actually under test.
    incoming_data_3 = {
        'data_1': 1,
        'data_2': 1,
        'data_3': 1,
        'data_4': 1,
        'data_5': 1,
    }
    incoming_msg_3 = IncomingMsg(**incoming_data_3)

    results = self.app.on_msg(incoming_msg_3)

    expected_outgoing_data = {
        'data_1': 1,
        'data_2': 1,
        'data_3': 1,
        'data_4': 1,
        'data_5': 1,
    }

    expected_outgoing_msg = OutgoingMsg(**expected_outgoing_data)
    self.assertContains(results, expected_outgoing_msg)


Now, sometimes I only needed data_1 and data_2, and I couldn't care less what the rest of the incoming data was. However, I often needed to make sure that, say, data_3, data_4, and data_5 on the outgoing message were the same. Either way, there seemed to be a theme of the code being less logic and more naming of attributes. Basically, every app needed to get all the attributes and specify what they were in its own way. This wasn't just a problem in tests. The actual applications themselves were, at times, reeking of attribute dictionaries with no real logic. It made the code pretty ugly.
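If "the same" is read as "copied unchanged from the incoming message", one way to check only the attributes a test cares about is a small comparison helper. This is a minimal sketch, not code from SC Engine: `mismatched_fields` is a hypothetical name, and `SimpleNamespace` objects stand in for real message instances.

```python
from types import SimpleNamespace


def mismatched_fields(incoming, outgoing, names):
    """Return the attribute names whose values differ between two messages."""
    return [name for name in names
            if getattr(incoming, name) != getattr(outgoing, name)]


# SimpleNamespace objects stand in for real message instances here.
incoming = SimpleNamespace(data_1=1, data_2=2, data_3=3, data_4=4, data_5=5)
outgoing = SimpleNamespace(data_1=9, data_2=9, data_3=3, data_4=4, data_5=5)

# Only the pass-through attributes are checked; data_1 and data_2 are ignored.
passthrough_diff = mismatched_fields(
    incoming, outgoing, ['data_3', 'data_4', 'data_5'])
```

Inside a test case this would reduce to a single assertion that the returned list is empty, which also names the offending attributes when it fails.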

In the end, I started writing fewer unit tests. At first, I thought that this was OK. I didn't necessarily want to write unit tests for every class as I was writing it, since sometimes you really need to write the objects and see how they work together before you have a clear idea of the best way to implement them. Later, I realized it was really because the tests were a pain. For some objects that was understandable, but apps were pretty straightforward in terms of their interface (it's a bunch of methods with a single argument: the msg).

One potential solution would be to split messages up. In fact, I did this in some cases. However, in terms of testing, this would mean going from a few large messages to many smaller ones, so the test would be just as long.

Another potential solution was to write helper objects to construct the messages, but in tests where all the attributes were indeed important, this wouldn't really help. Plus, often the attributes might not be important for the specific test. This probably would just end up being more work anyway.
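For reference, here is a minimal sketch of what such a helper could look like, assuming messages can be modeled as dataclasses. The `IncomingMsg` class below is a hypothetical stand-in with made-up fields, and `make_incoming` and `DEFAULT_INCOMING` are illustrative names rather than anything from SC Engine.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IncomingMsg:
    # Hypothetical stand-in for a real message class with many attributes.
    data_1: int = 0
    data_2: int = 0
    data_3: int = 0
    data_4: int = 0
    data_5: int = 0


# One fully populated message shared by all tests.
DEFAULT_INCOMING = IncomingMsg(data_1=1, data_2=1, data_3=1, data_4=1, data_5=1)


def make_incoming(**overrides):
    """Return a copy of the default message with only the named fields changed."""
    return replace(DEFAULT_INCOMING, **overrides)


# A test now spells out only the attribute it actually cares about.
msg = make_incoming(data_2=7)
```

This shortens tests where most attributes are irrelevant. As noted above, it does little for tests where every attribute matters, since each one still has to be written out as an override.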

At this point, I'm starting to see my problem in the following manner: even the simplest applications are trying to do three things:

1.) Take the data that was passed to the app via messages and convert it into some locally-defined data structure.
2.) Run operations on the data structures to arrive at results.
3.) Convert results into messages to send out.

In trying to create tests for my app to make sure that #2 was being done correctly, I would inevitably end up writing extra code for EACH test to ensure that #1 and #3 were being done as well.

Perhaps I should start thinking of these as separate stages altogether. I can keep much of the infrastructure, but place objects on top that split the handling of a single message into the three distinct parts.
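A rough sketch of that split, under some loud assumptions: the message classes are hypothetical two-field dataclasses, the names `decode`, `compute`, and `encode` are invented for illustration, and the arithmetic in `compute` is an arbitrary placeholder for real app logic.

```python
from dataclasses import dataclass


@dataclass
class IncomingMsg:
    # Hypothetical message with two fields.
    data_1: int
    data_2: int


@dataclass
class OutgoingMsg:
    data_1: int
    data_2: int


def decode(msg):
    """Stage 1: convert a message into local data (here, a plain tuple)."""
    return (msg.data_1, msg.data_2)


def compute(first, second):
    """Stage 2: the actual logic, with no messages involved (placeholder math)."""
    return (first, first + second)


def encode(result):
    """Stage 3: convert results into the outgoing messages."""
    first, total = result
    return [OutgoingMsg(first, total)]


class App:
    def on_msg(self, msg):
        # The handler only wires the three stages together.
        return encode(compute(*decode(msg)))


results = App().on_msg(IncomingMsg(1, 2))
```

With this layout, the logic in stage 2 can be tested with plain values, and the attribute-heavy conversions in stages 1 and 3 only need to be tested once each instead of in every test.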

Edit:

Alternatively, I could aim for a situation where each message carries only enough data to identify the important info, and the rest of the data is stored in a database somewhere outside of the application. Basically, the message would only contain ids.
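As a minimal sketch of the id-only idea, assuming a dict-backed store standing in for the external database: `RecordChanged`, `InMemoryStore`, and the record layout are all hypothetical names chosen for illustration, and the addition in `on_msg` is placeholder logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordChanged:
    # Hypothetical message that carries nothing but an id.
    record_id: int


class InMemoryStore:
    """Stand-in for the external database: a dict keyed by id."""

    def __init__(self, records):
        self._records = dict(records)

    def get(self, record_id):
        return self._records[record_id]


class App:
    def __init__(self, store):
        self.store = store

    def on_msg(self, msg):
        # Look up the full record by id instead of reading it off the message.
        record = self.store.get(msg.record_id)
        return record['data_1'] + record['data_2']


store = InMemoryStore({42: {'data_1': 1, 'data_2': 2}})
app = App(store)
result = app.on_msg(RecordChanged(record_id=42))
```

The trade-off is that the attribute setup moves out of the messages and into the store, so tests would still need fixture data there, though it could be shared across tests.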
