Sunday, 30 August 2009

in which I actually write some software

BioCalc main screen

Biostats Calculator appeared on the iTunes App Store the other day, after an approval process that felt like it took years (but actually was only a couple of weeks). It's my first iPhone app so it's kind of exciting, and some have been sold already (Hi, Colombia!), which is really exciting in the paying-my-rent kind of way.

The application performs various biostatistical significance tests - relative risks, t-tests, chi-squared etc - and a couple of other calculations. (Full details are on the website). It aims to provide the most common statistical calculations a medical researcher might need, without having to go back to their computer and deal with a full statistics package. Also, it's totally awesome and you should buy it and tell your friends.

</plug>

Other notes:

  • Objective C is awesome. Garbage collection is disabled on the iPhone, which I thought would bother me but is actually hardly a problem at all. It really gives you a sense of how simple the language as an extension of C - basically just objc_msgSend and a little bit of preprocessor magic. And knowing how the OO constructs work allows me to feel completely comfortable mixing in plain C instead when it's more natural. (I'm still not sure how autorelease works. I'm pretty sure it's magic.)
  • Apple did a really good job on Cocoa Touch. The simple things seem to just work, and the hard things aren't all that hard. Except where they're impossible by design, per the next point.
  • They're serious about their Human interface guidelines. The first submitted version of the application used an 'add contact' button icon (rendered as a blue plus-sign) to add data to a data set, and they rejected it for misusing a standard icon. I took a screenshot of the button, tinted the image green, and they had no problem with that.
  • The approval process is frustating. They have an actual human test your application, make sure it doesn't crash, follows guidelines, and does what it says on the box. I think this is a really good thing that drives up application quality and platform consistency, but it means waiting a couple of weeks between finishing the app and it hitting the store. Plus you get to repeat the whole process when you didn't read the HIG and used the wrong button icon. Ahem.
  • Despite not being quite as streamlined a process as I would have assumed (lots of forms to fill out, certificates to download, having to get a credit card in my name to open an account, etc) the App Store is an amazing service. You develop the software, describe it, and decide how much it should cost, and that's it - you don't have to work out how to sell it, just what the next app will do.

Thursday, 14 May 2009

Enjoy - a JoyToKey clone for mac

JoyToKey lets you map joysticks to keyboard buttons to control programs that aren't joystick-aware. Gamepad Companion is a similar thing for Mac but it's nagware and it annoyed me. Also I wanted the ability to automatically switch configurations depending on the active application. So: Enjoy.

Somewhat related: half-working hack to get the tilt-sensor on a generic PS3 guitar hero controller to work with Frets on Fire.

Tuesday, 3 March 2009

poker paradox?

hold'em in a nutshell

Each player has 2 secret cards, and later 5 shared cards are revealed. Players bet on who has the best combination of cards both before and after the shared cards are revealed. Most secret cards are not worth betting on, good players will play between 10% and 30% of their hands.

three common principles

  • Deception (fundamental theorem of poker): if your opponent has less (or wrong) information, you make more money than if they have more information
  • Hand strength: all else equal, you make more money with higher cards than lower cards. (E.g. 9♣8♣ vs 8♣7♣)
  • Style: a range of styles can be equally profitable, for example you may be able to maximise your winnings by playing anything from 10% to 30% of hands.

the problem

These principles seem axiomatic (except perhaps the third, but empirically it seems true). But they conflict:

  • Take two optimal strategies: a tight 10%-of-hands game "A" and a loose 30% game "B".
  • At the start of each hand, flip a coin, and play either strategy "A" or "B" accordingly. Call this compound strategy "C".
  • A player who's played against either A or B for long enough knows they play 10% or 30% of hands respectively, and they're winning so it's probably the best 10% or 30% of hands. (Strictly, a best 10%. Hand strength is only a partial order, it doesn't compare 9♣8♣ to 2♢2♠).
  • A player who's played against C knows that he plays 20% of hands. If the player is very observant he might know about the coin-flipping, but if the coin is hidden he doesn't know whether he's playing against A or B for a particular hand. In either case, he's playing against an optimal strategy with less information than last time, so when the coin comes up heads C makes more money than A, and when it comes up tails C makes more money than B. So C is more profitable than A and B (which were supposed to be equal and optimal).
  • Strategy C involves sometimes playing hands in the 20-30th percentile, and sometimes not playing hands in the 10-20th percentile. According to hand strength, we'd make more money by replacing some worse hands we're playing with some better ones we're not playing. This gives a strategy "D" where we flip a coin play the top 20% of hands according to strategy A or B.

So D > C > A and B, but A and B were supposedly optimal.

resolutions

The first weakness I can see is the assumption that it's always incorrect to throw away a more profitable and play a less profitable one. A hand like 5♡7♡ that rarely wins may be profitable if played occasionally because when it does win your opponent will discount the possibility that you have it. However you have to fold it most of the time, even though playing it would be profitable for that individual hand, to maintain this value.

I'm surprised that the benefit from playing a probabilistic strategy is big enough to counteract the 'paradox'. Most analysis online tends to focus on opponent ranges - the range of cards they might have based on their actions in the hand - and it's assumed in calculations that the opponent would play their cards that way every time. Against good opponents, the edges of the ranges will be significantly fuzzy, and not accounting for this makes for flawed analysis that loses money (fundamental theorem again).

The second weakness (thanks Kyle!) is the assumption that C gives our opponent less information than A or B. This is intuitively true if A and B are deterministic, in the same way that a coin-flip is more uncertain than 'heads' or 'tails' individually; but if they A and B are probabilistic then C may be no more uncertain - the average of two random numbers is not 'more random'.

Monday, 26 January 2009

guessing games 1

One of the first programs I ever wrote was a number guessing game:

I'M THINKING OF A NUMBER BETWEEN 1 AND 20
> 7
HIGHER
> 15
HIGHER
> 18
LOWER
> 17
YOU WIN

(The Commodore 64's green was brighter of course, but I wouldn't do that to you). The best strategy for beating the game, of course, is to pick the middle number of the range first in order to break the range in half. If that didn't happen to be the right number, you have a new possible range that's half the size, and you repeat the process. This is binary search at its purest. It's the best strategy in two ways: it always finishes in at most 5 guesses (worst-case performance), and the average number of guesses needed if you pick randomly is 3.7 (average-case performance).

I tried to write a program to do the guessing, but I couldn't work out how to let the user type "higher" or "lower" instead of numbers, and I gave up.

twenty questions

It's the same game, except instead of a number you can pick anything, and the guesser is allowed twenty yes-or-no questions to try to work out what it is.

I'M THINKING OF A SOMETHING
> IS IT ORGANIC?
YES
> IS IT ALIVE?
YES
> IS IT A PLANT OR AN ANIMAL?
YES
> LOL, IS IT A PLANT XOR AN ANIMAL?
NO
> IS IT ROBERT PLANT?
YOU WIN

Obviously it wouldn't be an easy program to write, but in theory it's possible if we formalise the game. First we need a dictionary, a list of all the possible objects the computer can be thinking about. There can only be a finite number if we want the game to be winnable.

Next we need a set of questions. Every yes-or-no question breaks the dictionary up into two sets: things that answer 'yes' and things that answer 'no'. We can characterise all questions by the sets of things that answer 'yes' (the objects they accept). Two questions that accept the same objects are really the same question.

Suppose our dictionary is {barrel, cat, dog, ball, fish, box, building}. The question "is it an animal?" accepts the objects {cat, dog, fish}. It's the same question, for our purposes, as "is it a cat or a dog or a fish" or "does it not begin with b".

If we allow any question (that is, any set of objects) to be asked, then the game becomes trivial: we can reduce it to the number guessing game. Just number all the objects, break them into ranges, and ask questions like: "Is it barrel or cat or dog". This is the optimal strategy, which implies that you can't always win at twenty questions if there are more than a million possible answers. But it makes for long and boring questions, so in the informal game we might say you're only allowed twenty words per question. In the formal game we'll say you're provided with a list of questions you're allowed to ask.

So our dictionary might be: {barrel, cat, dog, ball, fish, box, building}. And our questions might be:

  • {cat, dog, fish} ("is it an animal?")
  • {cat, dog, ball, box} ("can you play with it?")
  • {barrel, ball, box, building} ("does it begin with b?")
  • {barrel} ("is it a barrel?")
  • {cat} ("is it a cat?)
  • etc...

The "is it a X" questions aren't strictly required, you could say you 'win' as soon as you know exactly what the object is. They do guarantee it's possible to win though.

A strategy for 20 questions is a tree flowchart like the one for the guess-the-numbers game. Here's one for a poor strategy: just guessing.

The height of the tree is 6 which indicates it may take us six guesses to work out the object in the worst case. This next one is better (height 3). In fact it's optimal: since it is as good as binary search we know there is no better strategy. Usually the optimal strategy won't be as good as binary search.

The question is: how do we efficiently calculate the optimal strategy? Either on the fly - i.e. given the information we have so far, determine the next question to ask - or by pre-computing the entire flowchart.

More formally: given a set S and a collection of subsets T ⊆ ℘(S), construct a binary tree with leaves S and internal vertices T of minimal height, such that every left-descendant of t is in t, and no right-descendant is.

An obvious heuristic, reminiscent of binary search, is to always ask a question that splits the remaining possibilities as close to evenly as possible. While we might expect that this 'greedy algorithm' is optimal, it isn't due to interactions between questions. In the diagram below, question A divides the space of objects neatly in half, but whatever question we ask next we can be left with around a third of the objects afterward. Starting with questions B and C guarantees only a little more than a quarter of the possibilities remain after two questions.

A naive but approach is to recursively find the best left and right subtrees for the subsets induced by each first question, and choose the question that gives minimal max-height. Ultimately this runs takes exponential time since we find the tree for all 2n subsets of the dictionary. It has the feel of a hard problem, I wonder if exponential time is the best possible.

Sunday, 16 November 2008

and you shall know a webcomic by the questions he (frequently) answers

Amazing Super Powers is still funny when sober? Signs point to yes.

Tuesday, 11 November 2008

on envelopes

The two-envelopes problem, via xkcd. I'm not goint to re-hash it here, because it makes me dizzy, and the wikipedia page is fine. It's a 'paradox' of probability/game theory - like Monty Hall problem but without goats, cars, or a convincing solution.

There doesn't seem to be any dispute that if you can take a 50-50 bet that will either double or halve your money, then you should - your average profit is 25%. On the other hand, looking in your envelope doesn't give you any useful information, and switching without looking can't possibly earn you money in the long run. So what's going on?

Clearly the odds bet isn't 50-50! And the problem is that in order to fill the envelopes, we have to randomly pick an amount of money $X to put in the small envelope. But what does random mean? We can't have a uniform distribution over an infinite range - it doesn't make sense. You're trying to pick a number at random between 1 and infinity, whatever you pick will be in the bottom 1%.

Once we look in the envelope and see $Y, we know X = Y or X = Y/2. It's only a 50-50 bet if those possibilities are equally likely, and there aren't any probability distributions that have P(Y) = P(Y/2) for all Y.

So the game, as stated, can never be played. Any real randomizer for the game is going to give the player worse than 50-50 odds in some cases, and they won't make profit by switching in the long run.

Monday, 10 November 2008

that other election

So New Zealand just had our election. My take on the results: disappointing, not surprising, and not the end of the world.

The process was pretty uneventful compared to certain recent elections. Most of the difference can, I think, be attributed to a less polarised political culture; but part is the proportional representation system that prevents some votes being worth much more than others according to which part of the country you live in. The importance of a few 'swing' states (Ohio, Florida, et al) makes it tempting to try to tip the scales, and the rush of attention, money, and pressure to them provides the opportunity. If every vote is worth the same, election fever is spread out and doesn't boil over. (You like that metaphor? I mixed it myself).

mmp is broken

But we don't quite use proportional representation. New Zealand's system gives everyone a vote for a party, and a vote for a local MP. The party votes decide the proportional makeup of the parties in parliament. The electorate votes decide who your 'local MP' is. This is a carry-over from our old system where a parliament was made up of all the local MPs.

The upshot is that if the Banana party wins half the votes, they get 60 seats (out of 120) to fill with bananas. If Alice, Bob, and Carol from the Banana party all won local seats, then they have to be included. The other 57 seats are filled from a party list.

But there's a situation called 'overhang' that happens when a party wins too many electoral seats. Suppose the Banana party wins 65 electorates - more seats than their proportional fair share. What happens is 5 extra seats are created, so parliament has 125 instead of 120. All the electorate MPs are seated, and of course they don't get any list MPs. The proportionality has been distorted - the party with 50% of the vote got 52% of the seats.

This is pretty much impossible, there are only 70 electorate seats and to win that many without getting more than 50% of the vote isn't going to happen. But it can happen with smaller parties, in particular those with regional appeal. In our last election in 2005, we had an overhang for the first time. The Maori party won 4 electorate seats, but only had 3 seats worth of party vote.

and getting worse

This time around they got 5 electorate seats with basically the same party vote, creating an overhang of two seats. So far, so slightly odd. Now I'm going to do something very odd and assume both politicians and voters are rational.

Suppose you're a Maori party supporter. You know they tend to win more electorate seats than their party share. If you give them your party vote, you're not going to increase the number of seats they get, you're just going to decrease the size of their overhang by one. This increases their power a little (5/121 seats beats 5/122) but not much. Suppose they have a naturally allied party that doesn't have an overhang. You're better off giving that party your party vote, to better increase the size of your bloc and therefore the chances Maori make it into a coalition government.

Now suppose you're the Maori party. You realise this vote-splitting is better for you too, so you put out a TV ad directing supporters to vote that way. The Maori party vote drops to almost zero, they have the same number of seats, and more powerful allies.

If you're the allied party, you realise that running local candidates helps your profile a bit, but doesn't actually increase your share of parliamentary power. The Maori party winning electoral seats does help you, though, because it increases your voting bloc. So you stop running electoral candidates against them, and possibly encourage your popular local candidates to switch parties.

So given two allied parties, once one achieves an overhang, rational behavior leads to one becoming completely electorate-based, and one completely party-based. We have two single-seat parties in parliament at the moment, although both got enough party votes to 'cover' their electoral seat and avoid overhang. I think at least Jim Anderton may explicitly aim for an overhang next time, effectively running as an independent.

major parties?

Could a major party split in half to achieve the same thing? There are some practical problems:

  • You can't give list seats to people if they lose their electorate, if they're not in the same party. You'd probably just want to split off the party's 'safe' seats.
  • People will see it as cheating, and react negatively. A minor party has less choice, and more dedicated supporters, so can get away with gaming the system. A major party is likely to suffer blowback. However once one party succeeds with it, the others need to follow suit to be competitive.
  • You'd lose some support just due to the confusion - we're assuming rational and informed voters, but not everyone's going to split their vote in the desired way. For example, they might vote for the 'electorate' party.

If a major party did this the rules would almost certainly be changed quickly. The temptation is great, though - splitting your supporters' electorate votes to an overhang party increases its power by around 57%, and not taking advantage of this leaves votes on the table.

solutions?

Wikipedia discusses forms of MMP without overhang seats. The only one that sounds unreasonable is a system used in some German states, where extra party seats are allocated to restore proportionality. Supposing a party won 0.1% of the vote and 1 electorate seat (which would no longer be strategically clever, but can easily happen). To restore proportionality you need to inflate parliament to 1000 members!

My personal opinion is that sacrificing proportionality to provide strong voices to particular groups is misguided. If parliament can protect against injustice to minorities in general, then it can protect against injustice to certain parts of the country. (And if a parliament doesn't protect minorities from mob rule, it needs to be redesigned). If the need for advocacy is there and local governments aren't strong enough, perhaps the electorate vote could be changed to non-voting advocates to parliament.