Take Two Programming: sqlalchemy

Showing posts with label sqlalchemy. Show all posts

Monday, September 7, 2009

Running SQLAlchemy scripts off of pylon's paste configurations

I've come across a situation where I want to be able to use cron to launch scripts that automate some tasks (fetch info from 3rd party, scramble their data into my database format, write to my database). I want to be able to use the models I've already created in SQLAlchemy, along with the configuration of SQLAlchemy in my paste configuration files (development.ini, production.ini) in these scripts.

I don't need all of the bells and whistles of pylons, just the SQLAlchemy stuff. Luckily, there are some convenience functions that seem to have been made to deal with a situation just like this one. Here's the code:


import sys 
from paste.deploy import appconfig
from sqlalchemy import engine_from_config
from myapp.model import init_model

def setup_environment():
    if len(sys.argv) != 2:
        print 'Usage: Need to specify config file.'
        sys.exit(1)

    config_filename = sys.argv[1]
    config = appconfig('config:%s' % config_filename, relative_to='.')
    engine = engine_from_config(config)
    init_model(engine)

Of course, replace "myapp" with the name of your application.

Getting the configuration filename from the command-line arguments could of course be put somewhere else, so I'll just skip talking about that. The two main functions are appconfig and engine_from_config.

appconfig is from paste.deploy, which reads the configuration file and returns a dictionary-like object. engine_from_config is from SQLAlchemy and takes a dictionary-like object, retrieves SQLAlchemy-specific configurations, and uses the configurations to create an engine object.

Monday, August 10, 2009

Reusable SQLAlchemy Models

Recently looking at django, I took a liking to their idea of reusable apps. One thing that was common with most of these apps were some model that they would include and which you would call a function to automatically set up a relation to one of your models, thus having a "reusable model". An example would be any object that you'd like to have comments on automatically having a Comment model generated and relationships to your custom-built object automatically set up.

Out of mere curiosity, I wondered how difficult it would be to create a "reusable model" in SQLAlchemy. My end goal was to be able to do something like this...


@commentable
class Post(Base):
    __tablename__ = 'posts'
    id = sa.Column(sa.Integer, primary_key=True)
    text = sa.Column(sa.String)

A class that was decorated as commentable would automatically have a relation defined to contain multiple comments. After trying out a few ideas, I wrote a test for what I would want the end result to look like...


class TestModels(SATestCase):
    def test_make_comment(self):
        p = Post()
        p.text = 'SQLAlchemy is amazing!'

        text = 'I agree!'
        name = 'Mark'
        url = 'http://www.sqlalchemy.org/'
        
        c = Post.Comment()
        c.text = text
        c.name = name
        c.url = url 
        p.add_comment(c)
        Session.add(p)

        p = self.reload(p)

        self.assertEquals(len(p.comments), 1)
        c = p.comments[0]
        self.assertEquals(c.text, text)
        self.assertEquals(c.name, name)
        self.assertEquals(c.url, url)

(As a little note, self.reload() is a helper function that will force a complete reload of the session and return the objects you passed in after the session is opened again)

First off, a little bit about how the table structure would work. AFAIK, django stores comments in one large table and has what you might call a discriminator field to determine the object type that the row relates to. In my case, every type of commentable object (Post, NewItem, etc.) would instead get it's own comment class (PostComment, NewsItemComment) as well as their own table (post_comments, news_items_comments). No real reason to do it this way, I just thought it'd be easier.

In the end, it was actually pretty easy. Here is what the initial results look like...


class BaseComment(object):
    pass

def build_comment_model(clazz):
    class_table_name = str(class_mapper(clazz).local_table)
    metadata = clazz.metadata

    comment_class_name = clazz.__name__ + 'Comment'
    comment_class = type(comment_class_name, (BaseComment,), {}) 

    comment_table_name = class_table_name + '_comments'
    comment_table = sa.Table(comment_table_name, metadata,
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column(class_table_name + '_id',
            sa.Integer,
            sa.ForeignKey(class_table_name + '.id')),

        sa.Column('text', sa.String),
        sa.Column('name', sa.String(100)),
        sa.Column('url', sa.String(255)),
    )   

    mapper(comment_class, comment_table)

    return comment_class, comment_table

def commentable(clazz):
    comment_class, comment_table = build_comment_model(clazz)

    clazz.Comment = comment_class
    setattr(clazz, 'comments', relation(comment_class))

    def add_comment(self, comment):
        self.comments.append(comment)

    setattr(clazz, 'add_comment', add_comment)

    return clazz

First off, we have the BaseComment class. This is just an empty class, but you could easily see there being logic on here.

In the build_comment_model() function, you see where the creation of the table's metadata takes place. It'll first create a new class that will be used for the new comment model (in the case of a Post model being commentable, it's creating a PostComment class that inherits from BaseComment). Using SQLAlchemy information that we find on the mapper of the class we're decorating, we can determine things such as the name of the table so that we can create our own table (the Post model's table name is "posts", so we'll create a "posts_comments" table).

Finally, the commentable() function finishes by setting the Comment attribute on the model class we're decorating, as well as adding the relation and a class method. The Comment attribute allows us to easily get at the new class that we created, the relation allows us to easily work with the items, and the add_comment is an example of how we can also add extra methods to the model.

Currently, there are some pitfalls:

1.) It assumes that the class has been mapped before running. Because mapping of a class must happen after the class is instantiated, you couldn't use the commentable() function as a decorator for a non-declarative style mapping, since it wouldn't be able to find the mapper. The solution would be to just call the commentable function after the class is mapped. I guess it's not really a pitfall per se, although I haven't actually tried it, so I'm just guessing that it works :)

2.) Right now, it assumes that the foreign key for the Comment object would be a single column that corresponds to the column named "id" on whatever model it's decorating. I still need to work on this, since you might have a different name, different primary key type, or even a multi-column primary key. Basically when the new comment table's metadata is being laid out, it needs to look at the decorated model's primary keys to determine how it should construct it's foreign keys, rather than just jumping to the conclusions that it does now.

Wednesday, July 22, 2009

SC Engine: Part 8 - The Site

- Introduction
- Part One: System Overview
- Part Two: System Overview: Messages and Applications
- Part Three: Screen Scraping
- Part Four: YouTube Parsing
- Part Five: Linking the Video to the Game
- Part Six: Messaging Middleware
- Part Seven: The Console
- Part Eight: The Site

In my final planned post for SC Engine, I'll talk about how the data gets from moving around through all these apps onto the web site.

One application that is really no different than any other in terms of how it is made is called 'web.site_interface'. Here's what it looks like...

Web Site App

You'll notice that the app is simply named "App". I recently did a refactoring that allowed me to create apps based on the module name to make bringing together all the apps very simple. Anyway, in the application's builder method, we see a good example of the application getting a configuration (the password to the site). The 'poster' method is a function that's used to send data using an HTTP post to the web site at the specific location. The entire body of the request is json, and consists of the data and a password. This password, combined with SSL, should give me enough protection to allow updates to the site without worrying about a third party easily swooping in and running whatever commands they want.

On the other side, pylons is waiting in it's update controller...

Update Controller

In case you were wondering, this is similar to how the SC Console works, as well. Really, the controller just converts the json into a dictionary of values, and passes it on to the updater which does the heavy lifting...

Updater

In terms of updates, that's all there is to it.

One of the more interesting points is the url generation, such as the following:

http://sc.markhildreth.webfactional.com/ShinhanBankProleague0809/Round1/Week1/Day1/HiteSPARKYZ-v-HwaseungOz/Set1

Currently, the url routes are a bit hackish...


    # CUSTOM ROUTES HERE
    map.connect('/', controller='main', action='index')
    map.connect('/uploader_info', controller='main', action='uploader_info')
    map.connect('/contact', controller='main', action='contact')
    map.connect('/contact_submit', controller='main', action='contact_submit')
    map.connect('/contact_submitted', controller='main', action='contact_submitted')

    # Uploads
    map.connect('/update/{action}', controller='update')

    # Ajax retrieval of commentary
    map.connect('/commentary/{game_id}/{author}', controller='commentary', action='view')

    # Matches
    map.connect('/{league}/{stage_0}/{stage_1}/{stage_2}/{team_one}-v-{team_two}',
            controller='league', action='match')
    map.connect('/{league}/{stage_0}/{stage_1}/{team_one}-v-{team_two}',
            controller='league', action='match')
    map.connect('/{league}/{stage_0}/{team_one}-v-{team_two}',
            controller='league', action='match')
    map.connect('/{league}/{team_one}-v-{team_two}',
            controller='league', action='match')

    # Match set
    map.connect('/{league}/{stage_0}/{stage_1}/{stage_2}/{team_one}-v-{team_two}/Set{game_number}',
            controller='league', action='view')
    map.connect('/{league}/{stage_0}/{stage_1}/{team_one}-v-{team_two}/Set{game_number}',
            controller='league', action='view')
    map.connect('/{league}/{stage_0}/{team_one}-v-{team_two}/Set{game_number}',
            controller='league', action='view')
    map.connect('/{league}/{team_one}-v-{team_two}/Set{game_number}',
            controller='league', action='view')

    # Stages
    map.connect('/{league}/{stage_0}/{stage_1}/{stage_2}', controller='league', action='stage')
    map.connect('/{league}/{stage_0}/{stage_1}', controller='league', action='stage')
    map.connect('/{league}/{stage_0}', controller='league', action='stage')
    map.connect('/{league}', controller='league', action='stage')

Eventually, I'm going to need to do matches other than team_one vs. team_two (some leagues have matches that contain four players in a group, and the url would need to have something like "GroupA"), and I figure that when that time comes I'll rewrite how the rule matching works. But, it works for now, and I can see it in one page, so I'm not too concerned with it (a year ago I would've spent another day getting these routes down to five lines of code, one example of how I think I've matured as a programmer recently).

Right now, the SQLAlchemy code that actually retrieves the data is very simple. It suffers from N+1 Select issues, and at some point will need to be rewritten for performance reasons. I won't show any of it because it really is just a matter of get some data by the id and then keep fetching child data. It'll end up with six or more database calls for each page, and at some point I'll want to reduce that number to one. One thought was to use an application to denormalize the data and upload the data already denormalized, but it probably seems like a bunch of work for what could probably be solved with a few minutes of tweaking the SQLAlchemy queries.