As discussed here, if I’m going to be building a World of Warcraft playing robot, I need to have systems in place to do 3 things:
- Watch the screen and make note of certain events that happen
- Think about those events when they happen and figure out how to respond to them
- Do the things necessary for that response
So let’s talk about watching. There is a lot happening on the screen in WoW – it’s actually a really gorgeous game with some interesting art direction choices that allowed the game to age at least fairly gracefully and also be pretty performant on shitty hardware. Unfortunately, for our purposes, all that pretty stuff is really annoying and gets in the way.
When fighting a dragon, I, a human player, might need a lot of eyecandy to make me forget that really all I’m doing is pressing a number of keys in sequence while a bunch of other people are pressing a number of keys in sequence while a computer generates random numbers to tell us whether or not we have successfully managed to press the right keys fast and well enough to get a favorable outcome (christ this is depressing when put this way). But the eyecandy isn’t relevant for our hypothetical WoW playing robot. The robot cares not for neat sound effects, interesting visuals, or any of the concessions required for meatbags. So let’s ignore all the shiny and just go for the relevant bits of info.
Let’s make this super simple: If I am in combat and I am hit and take damage, I need to know how much damage I took and whether or not I need to worry about it. If I’m just taking a little damage slowly, I tend to ignore it. If I take a LOT of damage in a chunk, I probably need to scream for help or activate an ability that will prevent me from dying. For things in between, maybe I just need to use a potion or something. For our robot, I think all it really needs to know is my current health percentage. If it’s above 75%, don’t worry. If it’s between 25% and 75%, do something to heal myself, but don’t panic. If it’s below 25%, press my Oh, Shit! button and hope for the best.
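Those three bands are simple enough to write down. Here's a minimal sketch in Python; the names `Response` and `respond_to_health` are made up for illustration, and putting exactly 25% and exactly 75% in the "heal" band is my assumption, since the rule of thumb doesn't say which side the boundaries fall on.

```python
from enum import Enum


class Response(Enum):
    IGNORE = "ignore"  # above 75%: don't worry
    HEAL = "heal"      # 25% to 75%: heal, but don't panic
    PANIC = "panic"    # below 25%: press the Oh, Shit! button


def respond_to_health(health_pct: float) -> Response:
    """Map a health percentage (0-100) to one of the three responses."""
    if not 0 <= health_pct <= 100:
        raise ValueError(f"health percentage out of range: {health_pct}")
    if health_pct > 75:
        return Response.IGNORE
    if health_pct >= 25:  # assumption: both boundary values count as "heal"
        return Response.HEAL
    return Response.PANIC
```

So `respond_to_health(90)` comes back as `Response.IGNORE` and `respond_to_health(10)` as `Response.PANIC`.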
So how can I tell where my health is? Easy: as a human, I look at my health bar, do a lot of subconscious math, and realize oh, hey, my bar is smaller than I want it to be. With one of the add-ons I use, my bar is green until it goes below a certain point, yellow below that, and then red if I’m about to kick the bucket. As a human, it’s really easy for me to read this information because we are (generally) pretty good at picking up colors and stuff like that, and we can do so in a wide range of lighting situations. Computers, unfortunately, still kind of suck at that. What computers are good at is looking at very precisely formatted things that follow certain standards and then figuring out whether they mean anything.
In this case, I’m thinking of the ubiquitous (and often pretty useless) QR code. If I can have my WoW game throw up a QR code that the robot can read, containing such helpful information as “you just got your ass beat, time to panic,” the robot doesn’t need to worry about shades of green and stuff like that. And QR codes could be used for a lot of different events – generate one that contains information about anything you particularly care about watching and your robot can see it and figure out what to do about it.
So our Watcher subsystem would need:
- A way to generate a QR code and put it on the screen when things happen in WoW
- A camera looking at the screen constantly and registering QR codes
- Some way of knowing whether a QR code was new or had previously been seen (timestamp)
- A way to pass this information on to the Thinker subsystem
Generate QR Codes
Someone already came up with a WoW addon for generating QR codes – it’s called WoWQR, and it generates a QR code for linking to various external things like Armory or WoWHead or whatever. So, I could make an addon that listens for game events that I care about – health, combat stuff, whatever – and then more or less uses the WoWQR method of generating a QR code. This would require me to learn how to make a WoW addon and how to script in Lua, which should be fun. From looking at the WoW addon API I can tell that there’s all kinds of information available to addons – health, position, facing, and so on – so I should be able to get or infer all kinds of information that I can then put into my codes. Nothing here that isn’t available to a human!
My addon will need to:
- Listen for changes to values I care about (health, mana, position, ability cooldown, special ability proc)
- Format those changes into some kind of string that is meaningful (probably JSON) and include a timestamp
- Encode that string into a QR code and display it
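To make the second bullet concrete, here's a rough sketch of what one of those strings might look like. It's in Python only to show the shape; the real addon would have to build the same string in Lua. The field names `ts`, `event` and `values` are placeholders I made up, not anything WoW or WoWQR defines.

```python
import json
import time
from typing import Optional


def build_payload(event: str, values: dict, now: Optional[float] = None) -> str:
    """Serialize one game event into a compact JSON string with a timestamp."""
    payload = {
        "ts": time.time() if now is None else now,  # hypothetical field name
        "event": event,                             # hypothetical field name
        "values": values,                           # hypothetical field name
    }
    # Compact separators keep the string short; a shorter string needs a
    # smaller, less dense QR code, which should be easier for a camera to read.
    return json.dumps(payload, separators=(",", ":"), sort_keys=True)


print(build_payload("health", {"pct": 42}, now=1000.5))
```

Keeping the string short matters because the amount of data decides how dense the QR code has to be, and a dense code is harder to pick up from a webcam pointed at a monitor.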
The main consideration is whether the addon can generate and display a QR code fast enough in combat. Writing performant Lua will be key here.
Read QR Codes
Webcams are cheap and easy enough to work with. I’m thinking that I don’t want to have a gigantic PC sitting around being the brains of my robot. A Raspberry Pi with a camera should be sufficient to look at the screen, recognize a QR code, and decode the information it contains. I’ve literally never even touched a Pi, so that would be entirely new to me, but really cool to learn as well. I’m familiar with a number of programming languages already, but have never used a “functional” programming language, so maybe I’ll go with Haskell or Lisp (which some people say isn’t functional, but I don’t know any better) for writing that part of the software. There should be existing libraries for these languages for capturing images from video and for decoding QR codes; if there aren’t, I know there are for Python, and while Python isn’t new to me, it’ll do in this case.
My camera app thingy will need to:
- Look at the screen, say, 10x per second and see if there’s anything resembling a QR code
- If it finds one, decode it and then compare the timestamp to the timestamp of the last QR code it got
- If the timestamp is different then this is a new event and it needs to pass along the information to the Thinker
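The compare-and-forward part of that list can be sketched without any camera at all. This is a minimal Python version that assumes the decoded QR text is a JSON object with a timestamp field called `ts` (my placeholder name) and that some QR library has already turned the image into a string. `Watcher` and `forward` are hypothetical names too.

```python
import json
from typing import Callable, Optional


class Watcher:
    """Forward a decoded QR message only when its timestamp has changed."""

    def __init__(self, forward: Callable[[dict], None]) -> None:
        self.forward = forward                # hypothetical hand-off to the Thinker
        self.last_ts: Optional[float] = None  # timestamp of the last forwarded code

    def handle_decoded(self, text: str) -> bool:
        """Process one decoded QR string; return True if it was forwarded."""
        try:
            message = json.loads(text)
            ts = message["ts"]                # assumed field name
        except (ValueError, KeyError, TypeError):
            return False                      # garbled or partial read: skip it
        if ts == self.last_ts:
            return False                      # same code still on screen
        self.last_ts = ts
        self.forward(message)
        return True


seen = []
watcher = Watcher(seen.append)
watcher.handle_decoded('{"ts": 1, "event": "health"}')  # new, forwarded
watcher.handle_decoded('{"ts": 1, "event": "health"}')  # repeat, ignored
```

Comparing for "different" rather than "newer" matches the bullet above and means the Watcher doesn't care whether the game clock and the Pi clock agree with each other.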
The only thing I’m really worried about here is whether I’ll be able to detect, decode and pass along the message to Thinker quickly enough. 10x per second might be a little aggressive – can I get by with 5x? I also don’t know how I feel about the Watcher being the one to decide whether what it saw is new – I think I’m OK with that idea, but it might change.
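The arithmetic behind that worry: at 10x per second the whole capture, detect, decode and forward cycle has 100 ms to finish, and at 5x it has 200 ms. A small sketch for checking a real pipeline against that budget, once one exists, might look like this (`frame_budget_ms` and `measure_ms` are names I made up, and `step` stands in for whatever the capture-and-decode function turns out to be):

```python
import time
from typing import Callable


def frame_budget_ms(polls_per_second: float) -> float:
    """Time available for one look at the screen, in milliseconds."""
    return 1000.0 / polls_per_second


def measure_ms(step: Callable[[], object], runs: int = 20) -> float:
    """Average wall-clock time of one call to step, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        step()
    return (time.perf_counter() - start) * 1000.0 / runs


print(frame_budget_ms(10))  # 100.0
print(frame_budget_ms(5))   # 200.0
```

If `measure_ms(step)` on the Pi comes in under `frame_budget_ms(10)`, 10x per second is fine; if not, 5x is the fallback.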
Pass information to Thinker
Since I’ll be using a Pi, it should be easy to have multiple services running on it. I don’t yet know what my Thinker service will look like, but I’m going to probably go with some kind of RESTful service because that seems pretty easy to make. Or maybe I could look at a messaging queue system to publish events and subscribe to them or something. This requires more thought, but fortunately isn’t necessary to go into at this point since this is just about the watcher.
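Just to have something to point at, here's the smallest publish-and-subscribe hand-off I can think of, using Python's standard `queue` module inside one process. This is a stand-in, not a decision: a REST endpoint or a real message queue would replace it, and `EventChannel`, `publish` and `next_event` are names invented for the sketch.

```python
import queue
from typing import Optional


class EventChannel:
    """In-process stand-in for whatever ends up connecting Watcher and Thinker."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[dict]" = queue.Queue()  # unbounded FIFO

    def publish(self, message: dict) -> None:
        """Called by the Watcher; an unbounded queue never blocks on put."""
        self._queue.put(message)

    def next_event(self, timeout: float = 0.1) -> Optional[dict]:
        """Called by the Thinker; returns None if nothing arrived in time."""
        try:
            return self._queue.get(timeout=timeout)
        except queue.Empty:
            return None


channel = EventChannel()
channel.publish({"ts": 1, "event": "health"})
print(channel.next_event())
```

The point of the shape is that the Watcher only ever calls `publish` and never waits on the Thinker, which is the same property I'd want from a REST call or a message broker.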
This is the 5,000-foot overview of my Watcher. The last thing I need to do is give it a name. And since it’s just watching and not doing a damn thing about what it sees other than passing along the info, I shall name it Uatu.