Tuesday, July 7, 2009

Python virtual console race condition

Here's a fun drinking game you can play. Every time I introduce a bug because I forget to close() something, take a drink.

My last entry talked about how paramiko was giving me fits because I wasn't closing the connection. Today I had another problem, this time using amqp-lib. I had two virtual consoles open: one running a process that read from the queue in a tight loop and printed out what it was doing, and another where I ran a script to send messages to the queue. I started noticing that the script, which sends 9 messages, would complete without error, but when I switched over to the first terminal, only some of the messages had actually been acted on. Sometimes it was one, sometimes four or five, sometimes all of them. I opened a third console to look at the queues in RabbitMQ and saw that no messages were left. Either the consumer was taking messages off the queue and not acting on them correctly, or they were never making it to the queue in the first place.

Sometimes I could run the script a few times in a row and every message would get through. Finally, I found a way to reproduce it: switch to the other terminal before the script sending the messages finishes.

Here's what I imagine was happening:

When I switched from one console to the other on my (slow, old, reserved for use as a server) laptop, the slow ass virtual console no longer had to render the print statements I was debugging the script with. Because the script ran faster, it was able to finish before all the data had actually been sent over the socket to RabbitMQ. When the script finished, the garbage collector forced the connection closed, even though it was still working. By adding code to close the connection manually, I gave the connection the extra split second it needed to finish sending everything and shut down cleanly.
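
For what it's worth, here is a rough sketch of the "always close it yourself" habit in Python. It does not use the real amqp-lib API: BufferedConnection, publish() and send_messages() are made-up stand-ins that just hold messages in memory until close() flushes them, which is roughly what I'm guessing the socket was doing. The only real piece is contextlib.closing from the standard library, which calls close() on the way out of the with block no matter what.

```python
from contextlib import closing


class BufferedConnection:
    """Hypothetical stand-in for a broker connection (NOT the amqp-lib API).

    Messages sit in memory until close() flushes them, mimicking data
    that is still in a socket buffer when the script exits.
    """

    def __init__(self):
        self.pending = []    # written by the script, not yet "on the wire"
        self.delivered = []  # what the broker actually received
        self.closed = False

    def publish(self, body):
        if self.closed:
            raise RuntimeError("connection is closed")
        self.pending.append(body)

    def close(self):
        # An orderly close flushes everything before tearing down.
        self.delivered.extend(self.pending)
        self.pending.clear()
        self.closed = True


def send_messages(conn, count=9):
    # closing() guarantees conn.close() runs, even if publish() raises.
    with closing(conn):
        for i in range(count):
            conn.publish("message %d" % i)


conn = BufferedConnection()
send_messages(conn)
print(len(conn.delivered))  # 9
```

With a real client library the shape would be the same: wrap the connection in closing() (or a try/finally) so the close happens explicitly instead of whenever the interpreter gets around to tearing things down.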
