Wednesday, June 24, 2009

Subclassing JSONEncoder and JSONDecoder

Today I was trying to write code to add the ability to encode datetime and timedelta objects as JSON. The idea is that two Python applications would talk to each other over HTTP requests, with JSON as the content. It seemed simple, but I didn't realize that the decoder works a bit differently from the encoder, and that difference will probably make you rethink how you write the encoder.

The json documentation shows an example of overriding the JSONEncoder:


import json

class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, complex):
            return [obj.real, obj.imag]
        return json.JSONEncoder.default(self, obj)


Using this, I wrote my own encoder for datetimes...


from datetime import datetime, timedelta
from json import JSONEncoder

class DateTimeAwareJSONEncoder(JSONEncoder):
    """
    Converts a python object, where datetime and timedelta objects are converted
    into strings that can be decoded using the DateTimeAwareJSONDecoder.
    """
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.strftime('dt(%Y-%m-%dT%H:%M:%SZ)')
        elif isinstance(obj, timedelta):
            # Flatten the timedelta into a whole number of milliseconds.
            milliseconds = obj.microseconds // 1000
            milliseconds += obj.seconds * 1000
            milliseconds += obj.days * 24 * 60 * 60 * 1000

            return 'td(%d)' % (milliseconds)
        else:
            return JSONEncoder.default(self, obj)



The default method is called with every object the encoder doesn't already know how to serialize, so whenever it finds a datetime or timedelta, it uses my custom code to convert it into a string that I should be able to pick out using a regular expression or some other means when decoding. Here's what I tried...
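A quick way to see when default actually gets invoked is to record what it receives. This is only a minimal sketch, written for Python 3; RecordingEncoder and its seen list are hypothetical names made up for the illustration.

```python
import json
from datetime import datetime


class RecordingEncoder(json.JSONEncoder):
    """Hypothetical encoder that remembers every object handed to default()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def default(self, obj):
        self.seen.append(obj)
        if isinstance(obj, datetime):
            return obj.strftime('dt(%Y-%m-%dT%H:%M:%SZ)')
        return super().default(obj)


encoder = RecordingEncoder()
when = datetime(2009, 4, 1, 23, 51, 23)
encoded = encoder.encode({'name': 'example', 'count': 3, 'when': when})

# Only the datetime reached default(); the dict, the string and the int
# were serialized by the encoder on its own.
print(encoder.seen)
print(encoded)
```

The strings, numbers, and the dict itself never pass through default, which is why the method only needs to handle the extra types.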


import re
from datetime import datetime, timedelta
from json import JSONDecoder

datetime_regex = re.compile(r'"dt\((\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z\)"')
timedelta_regex = re.compile(r'"td\((\d+)\)"')

class DateTimeAwareJSONDecoder(JSONDecoder):
    """
    Converts a json string, where datetime and timedelta objects were converted
    into strings using the DateTimeAwareJSONEncoder, into a python object.
    """
    def decode(self, obj):
        dt_result = datetime_regex.match(obj)

        if dt_result:
            year, month, day, hour, minute, second = map(int, dt_result.groups())
            return datetime(year, month, day, hour, minute, second)

        td_result = timedelta_regex.match(obj)
        if td_result:
            milliseconds = int(td_result.group(1))
            return timedelta(milliseconds=milliseconds)

        return super(DateTimeAwareJSONDecoder, self).decode(obj)


This seemed to work, and it would so long as the data I input was just the datetime object...


decoder = DateTimeAwareJSONDecoder()

# Worked: returns a datetime
decoder.decode('"dt(2009-04-01T23:51:23Z)"')

# Didn't work: the "dt(...)" values come back as plain strings
decoder.decode('["dt(2009-04-01T23:51:23Z)"]')
decoder.decode('{"a": "dt(2009-04-01T23:51:23Z)"}')


It turns out that the "decode" method on the JSONDecoder does not work like the "default" method on the JSONEncoder. As soon as I passed the call to the JSONDecoder's "decode" method, it went ahead and decoded the whole thing before passing it back.

For a while I was still pretty sure this was how I should do it, and that I just wasn't using the "decode" method right, until I read an article about the json encoder and decoder by Doug Hellmann. There he showed how you could create a decoder a bit differently. The code looks like this (some of it changed to make it more readable):


class MyDecoder(json.JSONDecoder):
    def __init__(self):
        json.JSONDecoder.__init__(self, object_hook=self.dict_to_object)

    def dict_to_object(self, d):
        if '__class__' in d:
            class_name = d.pop('__class__')
            module_name = d.pop('__module__')
            module = __import__(module_name)
            class_ = getattr(module, class_name)
            # Python 2 only: keyword names must be byte strings, not unicode
            # (on Python 3, use the keys as they are).
            args = dict( (key.encode('ascii'), value) for key, value in d.items())
            inst = class_(**args)
        else:
            inst = d
        return inst


The important thing to see here is that rather than overriding a method, you're actually passing a method as a parameter to the parent constructor. This dict_to_object works a bit more like the "default" method, in that whenever the decoder finishes decoding an object, this method is called to give you a chance to convert it. What's important to realize, however, is that it will only get called if the value in question is a JSON object. That sounded as if it would be okay with what I had, but remember that in JSON, the word "object" has a specific meaning: a set of key/value pairs written between curly braces. A string value is not an object, it's a string value. As such, it will get decoded from the JSON string to a Python string, and your object hook method will not get to touch it.
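To make that concrete, here is a small sketch (Python 3) that records every call to an object hook. The names recording_hook and calls are made up for the illustration; the object_hook parameter itself is part of the standard json module.

```python
import json

calls = []


def recording_hook(d):
    """Hypothetical hook: remember each dict the decoder hands us."""
    calls.append(d)
    return d


decoder = json.JSONDecoder(object_hook=recording_hook)

# A bare string and a list of strings contain no JSON objects,
# so the hook is never called for them.
decoder.decode('"dt(2009-04-01T23:51:23Z)"')
decoder.decode('["dt(2009-04-01T23:51:23Z)"]')
calls_after_strings = list(calls)

# JSON objects do trigger the hook, innermost object first.
decoder.decode('{"a": {"b": 1}}')

print(calls_after_strings)
print(calls)
```

Nested objects are handed to the hook from the inside out, so by the time the outer dict arrives, any inner values have already been converted.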

The solution, then, is to drop the idea of converting the datetime and timedelta objects into strings, and instead convert them into objects. Here is the code...


from datetime import datetime, timedelta
from json import JSONDecoder, JSONEncoder

class DateTimeAwareJSONEncoder(JSONEncoder):
    """
    Converts a python object, where datetime and timedelta objects are converted
    into objects that can be decoded using the DateTimeAwareJSONDecoder.
    """
    def default(self, obj):
        if isinstance(obj, datetime):
            return {
                '__type__' : 'datetime',
                'year' : obj.year,
                'month' : obj.month,
                'day' : obj.day,
                'hour' : obj.hour,
                'minute' : obj.minute,
                'second' : obj.second,
                'microsecond' : obj.microsecond,
            }

        elif isinstance(obj, timedelta):
            return {
                '__type__' : 'timedelta',
                'days' : obj.days,
                'seconds' : obj.seconds,
                'microseconds' : obj.microseconds,
            }

        else:
            return JSONEncoder.default(self, obj)

class DateTimeAwareJSONDecoder(JSONDecoder):
    """
    Converts a json string, where datetime and timedelta objects were converted
    into objects using the DateTimeAwareJSONEncoder, back into a python object.
    """

    def __init__(self):
        JSONDecoder.__init__(self, object_hook=self.dict_to_object)

    def dict_to_object(self, d):
        if '__type__' not in d:
            return d

        type_name = d.pop('__type__')
        if type_name == 'datetime':
            return datetime(**d)
        elif type_name == 'timedelta':
            return timedelta(**d)
        else:
            # Oops... better put this back together.
            d['__type__'] = type_name
            return d


I convert the datetime and timedelta to objects, using the '__type__' attribute so that I can pick them out of a crowd more easily. Doing it this way makes a lot more sense than going with strings, regular expressions, and the like.
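The same '__type__' idea also works without subclassing, because json.dumps accepts a default function and json.loads accepts an object_hook function. The round trip below is only a sketch under that assumption, written for Python 3; encode_extra and decode_extra are hypothetical helper names, not part of the classes above.

```python
import json
from datetime import datetime, timedelta


def encode_extra(obj):
    """Hypothetical helper: turn datetime/timedelta into tagged dicts."""
    if isinstance(obj, datetime):
        return {'__type__': 'datetime',
                'year': obj.year, 'month': obj.month, 'day': obj.day,
                'hour': obj.hour, 'minute': obj.minute,
                'second': obj.second, 'microsecond': obj.microsecond}
    if isinstance(obj, timedelta):
        return {'__type__': 'timedelta',
                'days': obj.days, 'seconds': obj.seconds,
                'microseconds': obj.microseconds}
    raise TypeError('%r is not JSON serializable' % (obj,))


def decode_extra(d):
    """Hypothetical helper: rebuild tagged dicts, leave other dicts alone."""
    type_name = d.get('__type__')
    if type_name not in ('datetime', 'timedelta'):
        return d
    # Copy the fields without the tag so they can be used as keyword arguments.
    fields = dict((key, value) for key, value in d.items() if key != '__type__')
    if type_name == 'datetime':
        return datetime(**fields)
    return timedelta(**fields)


original = {
    'start': datetime(2009, 6, 24, 10, 30, 0),
    'length': timedelta(days=1, seconds=90),
}

text = json.dumps(original, default=encode_extra)
restored = json.loads(text, object_hook=decode_extra)

print(restored == original)
```

Whether to subclass or pass functions is mostly a matter of taste; the subclasses are handy when the encoder and decoder get passed around as a pair.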

Monday, June 22, 2009

The "right way" to write the first test for an object

You've probably come across someone who is teaching unit testing through a book or web page saying something along the following lines: "After writing a test, you should do the bare minimum to make sure the test passes."

Imagine that you're writing a simple calculator class. So many times, someone might say that if the first test is...


def test_can_add(self):
    c = Calculator()
    r = c.add(2,3)
    self.assertEqual(r, 5)


then your calculator class should look like...


class Calculator(object):
    def add(self, a, b):
        return 5


On the one hand, yes, just having "return 5" makes your test pass. Of course, "bare minimum" depends on what criterion you're using. "return 5" is less than "return a + b" in terms of character length, right? In complexity too: "5" is simpler than the operation "a + b".

The thing is, your calculator program can't add, although you have a test that says "test_can_add". I always found it funny that I could do this again...


def test_can_still_add(self):
    c = Calculator()
    r = c.add(6,3)

    self.assertEqual(r, 9)


Ok, how about...


class Calculator(object):
    def add(self, a, b):
        if a == 6:
            return 9
        return 5


Now, this isn't right, obviously. You're not actually building your class to do what it should do, you're just making it pass the tests. What's more, it's more complex than just...


class Calculator(object):
    def add(self, a, b):
        return a + b


I know this seems like drivel, but how you write this first test shows what you think the point of unit testing is. If you really think the first thing should be "return 5", then your test shouldn't be called "can_add", it should be "can_add_two_and_three", and you have your work cut out ahead of you to finish writing this calculator class. Some people will do something like this...


def test_can_add(self):
    c = Calculator()
    r1 = c.add(2,3)
    r2 = c.add(3,4)

    self.assertEqual(r1, 5)
    self.assertEqual(r2, 7)


I guess this is acceptable. You can no longer just "return 5". Of course, along with the "do the simplest thing that works" mantra there is the other misunderstood mantra of "one assertion per test". Some people might think that the above breaks this second rule. I don't think it does: you're making the same assertion in both cases of 2+3 and 3+4 (that the object can add correctly), you're just doing it with different data.

However, I still wouldn't do it this way. If every time you write a test, you need to write two versions of the same test, you've doubled your code. And really, I don't see the point, other than to feel good because you think you're doing things the Right Way.

The main reason you might see code with multiple exercises of the same test (such as the example above, exercising the add method twice) is for edge cases. I'd prefer to keep edge cases to their own tests. For example, I would have one test for testing division, and another testing that dividing by zero raises an exception.
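As a rough sketch of keeping the edge case in its own test (Python 3, standard unittest module): the divide method here is hypothetical, since the calculator so far only has add, and it simply relies on Python raising ZeroDivisionError.

```python
import unittest


class Calculator(object):
    def add(self, a, b):
        return a + b

    def divide(self, a, b):
        # Hypothetical method for this sketch; Python itself raises
        # ZeroDivisionError when b is zero.
        return a / b


class CalculatorDivisionTest(unittest.TestCase):
    def test_can_divide(self):
        c = Calculator()
        self.assertEqual(c.divide(6, 3), 2)

    def test_divide_by_zero_raises(self):
        c = Calculator()
        with self.assertRaises(ZeroDivisionError):
            c.divide(1, 0)
```

Each test exercises the method once, and if the zero case ever breaks, the name of the failing test says exactly which behavior went wrong.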

Of course, nothing is absolute, (except that (and that (and that...))), so go with whatever keeps things simple and still makes your tests worthwhile.

Saturday, June 20, 2009

Changing console keyboard layout under Ubuntu

Here's my contribution to the "where's the file located" pool of googleable blog posts.

If you look for where to change keyboard mappings for Ubuntu, you'll find answers related to xmodmap or something in the gnome taskbar. I'm working on a computer where I installed Ubuntu server edition, and want to change things on the console (no X here).

In the past, the distribution stored it under /usr/share/keymaps, and I could quickly switch my capslock and escape keys like a good little vim user. In Ubuntu, this directory was nowhere to be found. However, I did find that the important file was located under /etc/console/boottime.kmap.gz. Standard "be careful/make backups" disclaimers apply.

Wednesday, June 10, 2009

Connecting from virtualbox guest to host

Seems simple enough, but I'm not sure why it took me so long to figure it out. I thought it would be a complex thing.

While making a website, using paste or just some small script, I can start up a test web server and access the website I'm working on by going to localhost on port 8000:

http://localhost:8000/

I also wanted to connect to this using my virtualbox guest images (three of them, each having a different version of Internet Explorer), but obviously on those machines, localhost is local to Windows XP, and the test server is on the Linux box. Up until now, I just worked around this issue by putting the site on another computer on my home network. But obviously this didn't work if I was away from home...

Virtualbox uses NAT by default, so my Linux host was sort of working like a router. Thus, the answer was to do the same thing you would do to connect to a router: use the gateway address.

So, in my case, the gateway address for the Windows guest network interface was 10.0.2.2, so I could connect to the local computer using...

http://10.0.2.2:8000/

Tuesday, June 9, 2009

Thinkpad powers up but blank display

Had a bit of a scare today (well, technically, yesterday) when my 6-month-old Thinkpad T400 suddenly came down with a case of being mostly-dead. My dual monitors went black, although the device was still on. To make a long story short, I tried every combination I could think of to get the screen to turn on (with the battery, without the battery, on the docking station, off but connected to a monitor on the VGA port), thinking I might have accidentally changed the "default display" option in the BIOS while switching from intel->ati->intel.

I could tell that the computer was booting up quite fine. After sitting awhile, I was able to make a backup of my home directory by waiting long enough to think that the desktop was up, alt+space to open tilda (a nice drop-down xterm-replacement), creating a backup tarball and scp-ing it to my ftp server.

(Note: I'm about to describe opening up part of a laptop. Trying to do something like this could damage your hardware, void your warranty, or kick your puppy. I'm not what you call a "trained professional", but while in college I did service and repair computers as a part-time job, so I've had experience in this area. Just use your noodle.)

After that, I found the T400 hardware replacement manual, and found how to access the backup battery next to the touchpad under the wrist rest. I disconnected the battery and turned on the computer. It went through a few rounds of boot-up/shutdown before finally trying to POST, but the display was working. I'm not sure exactly what the problem was, but it was fixed by pulling the battery and, in effect, resetting the BIOS options back to the factory default.

I reconnected everything, although I needed to be careful with the screws. According to the hardware replacement manual:

Loose screws can cause a reliability problem. In the ThinkPad computer, this
problem is addressed with special nylon-coated screws that have the following
characteristics:

  • They maintain tight connections.

  • They do not easily come loose, even with shock or vibration.

  • They are harder to tighten.

  • Each one should be used only once.



So, suffice it to say, I'm going to be looking to order some replacement screws, and if they're not too expensive, a torque screwdriver. I'd probably be ok with just using the old ones, since I don't rattle my laptop around too much, but I doubt the screws cost much anyway.

Edit:

Turns out, after playing with it some more, that the issue shows up when I have the discrete (ATI) card activated (using the setting in the BIOS). I can now use the integrated card, but get no output on the screen if I try switching over to the ATI card. Looks like I'll have to send this guy in at some point.