skwpspace yan pritzker’s home on the web

hello, i'm yan

This blog is about startups, blogging, Ruby On Rails, virtualization and cloud computing, photography, customer service, marketing, ux and design, git, and lots more.

Contact

Reach me at yan at pritzker.ws

Posted
11 June 2008 @ 3am

Tagged
background, code, rails, ruby, thoughts, threads

Long running Threads in Rails and metaprogramming fun

Disclaimer: This post contains evil (but highly fun!) code. Proceed at your own peril…

I was recently designing an application that needed to execute some long running requests against an external host. If you’ve ever tried doing something like this in Rails, you’ll have found that your Mongrels block while waiting for the request to complete, bringing the experience for all other users to a halt.

I wanted to dispatch my long running request, return to the user, and then poll for results using AJAX. There are many ways to do background tasks in Rails, most of which require running an out of process background server with which you will communicate over some sort of queue or memcached. There’s BackgroundRb, Bj, workling, and so on, but this seemed overkill for my problem.

After reading a post on using Ruby Threads, I decided to be brave and try this approach. I implemented a simple action that spawns a thread and then returns whatever result is currently cached, whether or not the fresh one is ready. This action is polled via AJAX, and by the next poll the result will be up to date. The pseudocode looks something like this:

def long_running_action
  # Spawn a background thread that refreshes the cached results
  precache_the_results

  # Raises DataNotAvailableException
  # if the file is missing or unreadable
  results = read_cached_results

rescue DataNotAvailableException
  # Tells the page to fire another AJAX request
  # a couple of seconds later to check for results again
  flash[:update_right_away] = true
ensure
  respond_to do |wants|
    # render an RJS update with the results
  end
end

def precache_the_results
  # Not joined, so the thread outlives the request
  Thread.new {
    expensive_action_outputs_to('file.txt')
  }
end

Because I didn’t join the Thread to the request thread, it lives on after the request completes, which is just what I needed. Since the code inside my Thread is a call to an external provider and doesn’t write to the database, I am not concerned with ActiveRecord threading issues.
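The same idea can be tried outside Rails. The following is a minimal sketch in plain Ruby, not the code from the application: the names `DataNotAvailable`, `precache_results`, `read_cached_results` and `poll` are hypothetical stand-ins, and the slow external call is passed in as a block.

```ruby
require "tmpdir"

# Hypothetical stand-in for the post's DataNotAvailableException.
class DataNotAvailable < StandardError; end

# Start the slow work on a thread that is never joined by the caller,
# so the caller returns immediately. The block's return value is
# written to cache_path once the work finishes.
def precache_results(cache_path, &work)
  Thread.new { File.write(cache_path, work.call) }
end

# Raise if no background thread has produced the file yet.
def read_cached_results(cache_path)
  File.read(cache_path)
rescue Errno::ENOENT
  raise DataNotAvailable, "no cached results at #{cache_path}"
end

# One "poll": kick off a refresh, then return whatever is cached right now.
# Returns [results_or_nil, poll_again, worker_thread].
def poll(cache_path, &work)
  worker = precache_results(cache_path, &work)
  [read_cached_results(cache_path), false, worker]
rescue DataNotAvailable
  [nil, true, worker]
end

# Example use, with sleep standing in for the slow external request:
#   path = File.join(Dir.mktmpdir, "results.txt")
#   results, poll_again, _worker = poll(path) { sleep 2; "started" }
#   # first call: results is nil and poll_again is true
```

The worker thread is returned only so that a script or test can wait for it; a request handler would simply ignore it.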

The only problem with this approach is that in development mode, Rails likes to reload your classes on every request. If your thread runs past the request lifetime, the class that’s running it may be unloaded while it’s running, wreaking all sorts of havoc. But Ruby gives us the power to be truly evil. What if I just prevent Threads from doing what they want to in development mode? Turns out I can!

if ENV['RAILS_ENV'] == 'development'
  class Thread
    def initialize(&block)
      block.call
    end
  end
end

This code is defined in the class where I’m doing the magic. Do NOT just slap this into your environment.rb as you’ll horribly break the Rails startup logic. There’s probably a slightly smarter and safer way to do this by using a Factory pattern to create the threads and explicitly specifying the implementation you want. But this is my party and I’ll monkeypatch if I want to.
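For comparison, here is one way the factory idea might look. This is a sketch under assumptions, not code from the application: `ThreadedRunner`, `InlineRunner` and `BackgroundRunner` are hypothetical names, and the environment name is passed in explicitly instead of patching `Thread`.

```ruby
# Runs the block on a real background thread.
class ThreadedRunner
  def run(&block)
    Thread.new(&block)
  end
end

# Runs the block immediately on the calling thread.
class InlineRunner
  # Finished work; #join is a no-op so callers can treat it like a Thread.
  Done = Struct.new(:value) do
    def join
      self
    end
  end

  def run
    Done.new(yield)
  end
end

# Hypothetical factory: pick the implementation from the environment name.
module BackgroundRunner
  def self.for(env)
    env == "development" ? InlineRunner.new : ThreadedRunner.new
  end
end

# Example use (expensive_action is a placeholder):
#   runner = BackgroundRunner.for(ENV["RAILS_ENV"])
#   runner.run { expensive_action }
```

Both runners return an object that responds to `join` and `value`, so the calling code stays the same in every environment and the core `Thread` class is left untouched.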

So... comments, suggestions, complaints? Is this going to die horribly in production? I guess we’ll have to see!


4 Comments

Posted by
RoR Tuesday June 17, 2008
17 June 2008 @ 1pm

[...] Long running Threads in Rails and metaprogramming fun. You know, I should just link to Yan’s blog and be done with it. [...]


Posted by
Ilya Grigorik
3 August 2008 @ 5pm

Yan, that’s an interesting approach. ;) One thing I still don’t fully understand: how are you polling for updates once the thread is spawned? Do you have another action synchronizing on the file you’re writing to?


Posted by
Yan
3 August 2008 @ 5pm

It’s basically ‘lazy polling’. The way it works is that there is an ajax poll on the front end that invokes the action. The first time the action is invoked, there is no data yet and a thread is spawned to populate. The next time it is polled, it reads the data from the first poll, and spawns another background thread to update. So essentially every time the frontend polls it is getting data that is approximately ‘one request old’. Does that make sense?

Also, I should note that in Rails 2.1 you can probably make this a lot cleaner by using the new cache read/write mechanism. Here I manually write to the file. Right now there is no mutex around the file writing, which means technically I can have a race condition that causes the file to have ‘incorrect’ data, but in my case there is really no such thing as incorrect data. Basically I’m using this technique to poll Amazon EC2 for launched instance state, so at some point Amazon will start returning ‘started’, and that will be the final state and it will not really change. I am not sure if I am opening the file up to corruption if two threads try to write to it at the same time, so a mutex would probably be a good idea in the future.
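A rough sketch of what that mutex might look like, assuming a single Ruby process. `CachedResultFile` is a hypothetical helper, not part of the original code. Writers take a `Mutex`, write to a temporary file, and rename it over the cache file, so a reader should see either the old contents or the new ones.

```ruby
# Hypothetical helper: serializes writers and publishes each result
# by renaming a temp file over the cache file.
class CachedResultFile
  def initialize(path)
    @path = path
    @lock = Mutex.new
  end

  def write(data)
    @lock.synchronize do
      tmp = "#{@path}.tmp"
      File.write(tmp, data)
      # rename replaces the target in one step on POSIX filesystems,
      # so readers do not observe a half-written file
      File.rename(tmp, @path)
    end
  end

  # Returns the cached contents, or nil if nothing has been written yet.
  def read
    File.read(@path)
  rescue Errno::ENOENT
    nil
  end
end
```

A `Mutex` only coordinates threads inside one process. With several Mongrels writing the same file, each process would have its own lock, so something like a per-process temp file name or a file lock would be needed as well.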


Posted by
Ilya Grigorik
3 August 2008 @ 6pm

Ah, gotcha - asynchronous RPC. In fact, we’re using a similar model for one of our services at AideRSS: when you query for a feed_id, if we don’t have it in our system, we spawn a worker, report ‘progress:0’ and return. Any subsequent call to the same endpoint will start reporting the progress. For synchronization between workers, we use a memcached server (the worker is on a different machine). It’s been working great so far!

