Wednesday, December 5, 2012

Simplicity in the world

To my mind, one of the most puzzling aspects of science is the success of the reasoning principle known as Occam's Razor. This is the notion that when two competing theories explain the known facts equally well, the simpler theory is generally correct.

As a rule of thumb Occam's Razor helps us wade through the infinite number of potential theories that might be put forward to explain any given phenomenon. To demonstrate this I will use a trivial example that is not particularly deeply scientific.

When trying to come up with a principle to describe the observation that the sun comes over the horizon every 24 hours, we could generate an infinite set of theories as follows:

1) The earth rotates at a constant speed such that the sun appears on the horizon at regular intervals.

Then we may add an infinite set of exceptions.

2) The earth rotates at a constant speed such that the sun appears on the horizon at regular intervals. Except on Thursday the 6th December 2018, when the earth will stop rotating for 24 hours and then resume.

3) The earth rotates at a constant speed such that the sun appears on the horizon at regular intervals. Except on Thursday the 6th December 2018, when the earth will stop rotating for 24 hours and then rotate backwards.

4) The earth rotates at a constant speed such that the sun appears on the horizon at regular intervals. Except after the New York Yankees have won the World Series 100 times in a row, when it will slow down to half its speed.

Etc, etc.

Finding a set of theories that all equally explain the given evidence is easy. In this case we could decide between any two of these theories through empirical means: because they make slightly different predictions, we just wait for the predicted outcomes to diverge. However, as there are in fact an infinite number of these alternate theories, in practice it is not possible to test them all. Instead, we rely on the rule of thumb known as Occam's Razor to remove all alternative theories.

In the realm of data mining, Occam's Razor turns out to be incredibly practical. If you can fit multiple models to a data set with approximately equal error, then the simplest model will more often than not produce the best predictions. This principle has been critical in the design of many modern machine learning algorithms.
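As a rough illustration of that rule of thumb, here is a minimal sketch that picks the simplest model among those whose held-out error is approximately equal to the best. The `Candidate` type, the `pick_simplest` helper, the use of parameter count as a complexity measure, the 5% tolerance and all of the numbers are assumptions made up for this example, not part of any particular library.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    n_params: int      # crude stand-in for model complexity (an assumption)
    val_error: float   # error measured on held-out data


def pick_simplest(candidates, tolerance=0.05):
    """Return the least complex candidate whose error is within
    `tolerance` (relative) of the best error among all candidates."""
    best = min(c.val_error for c in candidates)
    close_enough = [c for c in candidates if c.val_error <= best * (1 + tolerance)]
    # Among near-equal models prefer fewer parameters, then lower error.
    return min(close_enough, key=lambda c: (c.n_params, c.val_error))


# Illustrative, invented numbers: three models fit about equally well.
candidates = [
    Candidate("linear", 2, 0.101),
    Candidate("cubic", 4, 0.100),
    Candidate("degree-15 polynomial", 16, 0.099),
    Candidate("constant", 1, 0.300),
]
chosen = pick_simplest(candidates)
print(chosen.name)
```

With a tolerance of zero the helper simply returns the lowest-error model, so the tolerance is what encodes "approximately equal error".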

Interestingly, the predictive power of simplicity extends beyond this. As Google research director Peter Norvig discusses in the presentation below, we are finding that the critical factor in solving many modern computer science problems is data volume. As our data sets grow in size, we see that the best predictive models come not from painstakingly building custom models for certain data, but from just using an array of simple models and letting the data speak.



Read more about this issue in the paper The Unreasonable Effectiveness of Data.

Saturday, November 10, 2012

Python Please

I have friends who swear by Python.

They rarely program with any other language, and I understand in theory the appeal. You have a language that forces you to write nicely formatted code just to make it work. You do away with redundant structure-imposing syntax like braces and semicolons. No longer do you have to waste time formatting someone else's crappy code before you can work with it.

What I can't understand is how they manage to live with its horrendous approach to string processing. Python is a late-bound, dynamically typed, interpreted language with an approach to string processing that belongs in C or Assembly language. At that, all the purists are going to cry:

"Just because you don't understand encodings!!!"

Sigh. Yes. Every time I have to work with Python I go and reread Joel's fantastic article: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), just to make sure I haven't missed something. Every time, I confirm that I am not a complete idiot, and I come back to wrestle with Python and try to work out where in the process of passing a string around I went wrong.

The problem is partly that I use Python predominantly to scrape webpages. This means that I am always loading up badly formatted text with incomplete or missing meta-data. So to be fair, maybe programmers who do not engage in this process never see the problems I see. But it is not only me: look at this thread on Stack Overflow to see how ridiculous the situation is.

I want to propose something to Python enthusiasts. Just say you are right, and the problems are entirely mine (real Python programmers love having to monitor string encodings continuously). Ok sure, then:

Why not have a mode for the language that will just force all strings to be a single encoding, say UTF-8 ?

The chorus will yell back, we do. You just do: X-Y-Z

Well, people, I have tried all of those X-Y-Zs and they do not work. Perhaps again it is something to do with my approach. I use a bunch of libraries to process the data, maybe the urllib library, or Beautiful Soup, which I use to parse things. I don't know, I am not an expert, and I shouldn't need to be just to parse strings reliably.

I don't understand why it just doesn't work. I have never wasted as much time dealing with string encoding problems in any other language as I have with Python.
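For reference, one common shape for such an X-Y-Z is to decode bytes exactly once, where the data enters the program, and encode to UTF-8 only on the way out. The sketch below assumes Python 3, where text (`str`) and raw data (`bytes`) are separate types; the helper names `to_text` and `to_utf8` are made up for this example, and it is not a claim that this cures the scraping problems described above.

```python
def to_text(raw, declared=None):
    """Decode scraped bytes to str once, at the boundary of the program.

    `declared` is whatever encoding the page or HTTP header claims,
    which may be missing or simply wrong.
    """
    if isinstance(raw, str):
        return raw  # already text, nothing to do
    for encoding in (declared, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown declared encoding: try the next candidate.
            continue
    # Last resort: never raise, mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


def to_utf8(text):
    """Encode text to UTF-8 bytes when writing it back out."""
    return text.encode("utf-8")


page = "café".encode("latin-1")            # bytes as a badly labelled page might send them
print(to_text(page, declared="latin-1"))   # decoded with the declared encoding
print(to_text(page))                       # no declaration: falls back to replacement
```

The design choice is that everything between the two helpers only ever handles `str`, so the encoding question is asked in two places rather than everywhere a string is passed around.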

It should not be so hard. It really shouldn't.


Monday, September 10, 2012

Publishing Apps in the Lucky Country

I am in the process of developing a number of Titanium mobile apps for iOS that will have in-app payment processing. It turns out that the programming part is relatively easy; it is the administration that makes it difficult.

I was merrily coding away, and discovered only when I went to test the in-app payments that Apple would not allow me to do that until all my banking and tax info was sorted out. I can't finish the product until I have all the admin done. Ok, fair enough.

I start jumping through the hoops, and I discover that as an app developer with a business based in Australia I am forced to register for GST before I can sell content. Again, this seems a little stupid, but I will have to do it eventually, unless the product is a complete flop. So I start to register my business for GST.

Apple redirects me to the ABR website. However, the ABR page indicates that it can no longer be done there. https://abr.gov.au/ABRWeb/Homepage.aspx?NavGraph=Home&View=Home&pid=71

That page directs me to http://help.abr.gov.au/BC/Index/Apply_for_PAYG/How_do_I_apply_for_GST_or_PAYG_withholding_/

Which tells me I need to register for the new Australian Business Security Credential Software, AUSkey.

I went through the ridiculous process of installing their AUSkey software into Firefox on my Ubuntu machine (after their instructions for both Firefox and Safari on a Mac failed to work).

To get it running I need to install the Oracle JRE, which I discovered (post-installation) has a serious outstanding security problem. So much for Australia's cutting edge business security.

Eventually I get the AUSkey installation to work and log in.

I then try to log into the Business Portal so I can register for GST. Lo and behold, error pages.

For the next half an hour, whenever I try to log in I get presented with:
--------------------------------------------
The system is temporarily unavailable. (Error Number: A918.18)
The system has encountered an unexpected error.
If the problem persists, please contact the ATO Technical Helpdesk on:
Phone 1300 139 373
E-mail technical.help@ato.gov.au
--------------------------------------------

Eventually I am allowed in.

What do I find ?

Nowhere in the Business Portal is there a menu item or option for "Register for GST" or "PAYG." Absolutely nothing. So I have to send an email asking for help. Now I wait until they decide to reply.

Ahhh Australia, the lucky country!

On days like today I want to register a business in a tax haven country and forget about where I came from. The Australian government is like DRM for media: every time you come up against it nothing works properly and you wonder why you bother paying for things.

Thursday, August 23, 2012

App Distribution Key File Backup

I have been developing apps using Appcelerator's Titanium Studio for the past year. Unlike some of the other multiple OS distribution methods, Appcelerator allows you to generate native controls and widgets across multiple platforms. They have a strong user community and the stability of the apps produced is excellent (contrary to the negative reviews found online).

In my standard backup process I create regular copies of the code base for my apps. However, it has come to my attention that for the sake of security you need to back up more than just your code.

When you distribute apps to the app stores, they need to be signed using the private keys associated with your development certificates. If these keys are lost then you will be unable to publish updates to your apps.

So if you develop for Android and iOS as I do, here are a couple of links that give you the critical information.

iOS
Read this post on backing up your private key and follow the instructions at the bottom.

Android
For Android releases to Google Play, you need to make a backup of the keystore you created with keytool. This is the Android guide to application signing. You will have used the keytool command to create a key for your application before distributing, as described in this how-to. You simply make a backup of the keystore file whose contents are listed when you run the command
keytool -list -v -keystore /XXXXXXX
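If you want to fold this into a scripted backup, a minimal sketch might look like the following. The `backup_keystore` helper is made up for this example, and any paths you pass to it are your own: it copies the keystore into a backup folder and confirms the copy is byte-for-byte identical by comparing SHA-256 checksums.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_keystore(keystore, backup_dir):
    """Copy the keystore file into backup_dir and verify the copy."""
    keystore = Path(keystore)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / keystore.name
    shutil.copy2(keystore, target)  # copy2 also preserves timestamps
    if sha256_of(keystore) != sha256_of(target):
        raise IOError("backup does not match the original keystore")
    return target
```

Keep the backup somewhere other than the development machine, and remember that the keystore is useless without its passwords, so those need to be recorded safely too.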

Saturday, August 11, 2012

How much do you 'Like' me ?

A guy I don't know recently sent me a Facebook friend request. Usually I just ignore these requests. I am liberal with the definition of friend, but I have to have met someone once before I accept a request on Facebook. This time it was a guy who knew several of my friends and was working on an interesting project. So against my usual rules I accepted and sent him a quick message telling him that I liked the project and I had interest in collaborating if the opportunity arose.

A day later, instead of replying to the message, he sent me an invitation to 'Like' his page. A day after that he sent me an invite to be friends with his second FB account. This gets under my skin. It is practically an open admission that he has no interest in social networking other than using people to 'Like' stuff.

These kinds of experiences are not uncommon. On Twitter I constantly encounter writers whose only tweets are plugs for their own books. In spite of all the raving going on about the economic miracle of 'Social', all I see is people using it for spam.

The best advice I have ever heard about 'Social' is be yourself and talk about something other than what you are selling. Maybe people will find you interesting enough to look into what you do.

Excuse me now, I have someone I need to unfriend.

Saturday, July 7, 2012

What happened to America

America was once an amazing country. It was populated by optimistic, hard working people, who dreamed of better ways of doing business and living life. This was the reason people flocked to get into the country, and other countries changed policies to try and keep up with the pace of American innovation.

Somehow last century America lost its edge. When I say that, I don't mean that America has lost its position as a world leader in any particular sphere of human endeavour, what I mean is that America now trundles along on autopilot. The present successes of America are due to the momentum generated in the middle of the twentieth century, rather than being the product of good governance in the present day.

America has an enormous domestic economy, and a global economic position largely based on its reserves of intellectual property. Yet the resources that it uses to maintain that position are partly externally derived. The US attracts many of the best and brightest students from abroad to come and study and work. More so than any country in the world, America derives a proportion of its intellectual capital from these foreign reserves.

In recent times the world has witnessed the extreme volatility of American political life. The country routinely has elections that hinge on anachronistic social concerns like abortion or gay rights. They have recently come out of a period where they elected one of the most illiterate and openly moronic leaders in living memory, and they turned it around to elect the first president of African ethnicity, who is the most eloquent politician in living memory. At the same time America has been responsible for an enormous global financial crisis, largely due to their insanely ridiculous banking policies regarding the regulation of home loans and the creation of derivative investments.

In spite of all of this, large proportions of their population cling ideologically to the notion that market regulations are destructive communist devices. Somehow, an international crisis is not enough empirical evidence for some people to adjust their theories of good governance.

At present many Americans are rallying around a political movement with slogans like "take our country back". From whom you might ask, and the answer is likely to be a confusing mish mash of cartoon politics and implied racism.

As we watch this from the outside it is hard to understand how it happened. How did America go from being the center of the world's envy and inspiration to being a country ruled by ideology and prejudice? How was a century of economic advantage squandered to create a nation of religious zealots addicted to credit and fast food? Why is the current political debate in the US not
"What are the best fiscal stimulus methods?"
but instead
"Should gay people be allowed to marry?"

More importantly, can they ever escape this quagmire ?

According to some sources, the stars of the developing economies (Brazil, China and India) have just reached the point where their collective domestic economies have overtaken the developed world. This means a lot for America: it means that its position as the economic superpower is nearly over. The relative economic stability enjoyed by Americans will change as the currency becomes subject to the same fluctuations as those of other nations. It means their debts might be called in, and funding will continue to become harder and harder to find.

Perhaps more importantly, it means that fewer foreign students are going to view an American education as a ticket to success. Opportunities will grow inside their countries of origin and the instability of America will seem less appealing. When this happens America will find that the stream of new intellectual property it relies on to be globally competitive is suddenly less bountiful.

This will be America's great long term challenge for the coming century. America needs to improve its internal education system so that it can sustain its intellectual property based position in the global marketplace without a stream of foreign students working in labs around the country.

They just have to solve their debt problem first.

Saturday, June 16, 2012

The social cost of con artists

Several years ago in Barcelona airport I got conned. A man with a perfect English accent walked up beside me while I was walking between terminals. He told me his hard luck story about having arrived at the wrong airport and not having the money to get to the right airport (Girona). The bus was leaving in half an hour and he needed to get on it. The same thing had happened to me a year earlier, so I was immediately sympathetic to his plight. I gave him the money and continued on my way.

As I walked and turned it over in my mind, several aspects of his story began to unravel. I turned around and headed back. Sure enough I saw the guy wandering through the terminal casually, nowhere near the bus stop. I walked along the outside of the terminal watching him through the glass, then I stepped inside, determined to confront him. He went out of my sight for a moment as I rounded one of the information booths. When I arrived on the other side he was gone. I searched and waited but he didn't reappear; some sixth sense had told him I was after him and he had fled.

As I reflect on this incident now I am aware of the fact that I am considerably more suspicious of anyone who asks me for help. Perhaps you think I am a fool, and that this was a lesson I needed to learn. My thought is that I would like to live in a world where we have compassion for each other, and gladly give help when it is asked for.

The social cost of con artists is a great deal more than mere money. If money had been stolen from me I would have been angry, and I might have become more vigilant in my security, but I would not have become hardened with suspicion against people who ask for my help.

The legacy left by con artists around the world is a society where people care less, and view people in need with suspicion and hostility. For this reason I consider them the most despicable humans that have ever lived.

Friday, May 25, 2012

Embedding UTF-8 Subtitles

I recently spent hours trying to work out how to get hard coded subtitles into a video file so I could upload it onto YouTube.

The video in question was made with Cinelerra on Ubuntu. This program is very powerful, but an absolute pain to use. Annoyingly I was unable to get it to recognise UTF-8 chars in the subtitle text I wanted to embed, so I had to look elsewhere.

Instead, I created an SRT subtitle file with the Linux program Subtitle Editor. I tried several programs, Avidemux and others, to embed this into the video. All of them failed to recognise the extended character set.

In the end I had to use Handbrake to embed the subtitles. However, it was not all easy sailing. Handbrake would embed my SRT file happily, but only as a soft subtitle (meaning it was an option dependent on the user's preferences). Unfortunately, when I uploaded the file to YouTube, I could not include the subtitle file.

So, back to the drawing board.

Strangely, although Handbrake allows hard encoding through the checkbox 'Burned in', this option was disabled for SRT files. It is only allowed for SSA files. I went back to Subtitle Editor and saved the file in SSA format. Then I discovered that Handbrake would not let me open an SSA file to embed it.

After an afternoon of frustration I discovered I could use Handbrake to create an MKV file of my movie, and then use mkvmerge to embed the SSA file in the MKV movie. I could then open the new MKV file in Handbrake, at which point I was able to select the SSA subtitles and choose the 'Burned in' option to make them a permanent part of the video.
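The mkvmerge step in the middle is the only part of that workflow that is easy to script. As a minimal sketch, assuming mkvmerge is installed and on the PATH, it could be wrapped like this; the helper names and any file names you pass in are made up for this example, and the two Handbrake steps on either side are still done by hand.

```python
import subprocess


def mkvmerge_command(video_mkv, subtitles_ssa, output_mkv):
    """Build the mkvmerge call that muxes a subtitle file into an MKV.

    mkvmerge takes the output file after -o, followed by the input files.
    """
    return ["mkvmerge", "-o", str(output_mkv), str(video_mkv), str(subtitles_ssa)]


def mux_subtitles(video_mkv, subtitles_ssa, output_mkv):
    """Run mkvmerge and return its exit code (0 means success)."""
    result = subprocess.run(mkvmerge_command(video_mkv, subtitles_ssa, output_mkv))
    return result.returncode
```

The resulting file is what gets opened in Handbrake for the final 'Burned in' pass.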

If this does not make you want to jump out of a window, then you have a higher tolerance for absurdity than I do.

Wednesday, May 16, 2012

Search on Facebook, if you enjoy frustration

If you had a large successful internet company whose only asset was the attention spans of its users, you would think that you might put some effort into making sure that people could find what they were looking for. Not so with our benevolent masters at Facebook.

At Facebook, search is an afterthought, just a text box to help people find other people, but not much else.

I have lost count of the number of times that I have seen an interesting post and thought "I will come back to that." When I do try to come back to it, it is lost thousands of posts back in my feed, and if I don't remember who posted it then I have no chance of finding it. A fully functioning search would be really handy here: but no, the search function appears to be completely disconnected from my feed. "Hang on!" you should be screaming, isn't that what Facebook is supposed to offer, an internet that is improved by our social network?

Nope, not so with search.

Nor with events. If you search for an event with certain key words, there was a time when Facebook only returned events that had passed. This appears to have changed, but nonetheless, if you don't accept an invitation to an event immediately, good luck finding it later. Yesterday I had two Facebook tabs open. In one of them I had an event I was interested in going to; in the other tab I tried the Facebook event search using the EXACT name of the event, and nothing appeared in the search results.

What makes this all the more puzzling is that the company that is the absolute king of search has been developing a product to rival Facebook. At the same time Facebook has done nothing to improve its challenge to Google: Social Search.

The only thing that seems to make sense is that perhaps Facebook is relying on Bing to fill that gap for them, to make their feed of frivolity useful through search. If they don't, then I would not want to be one of the people putting out 100 billion dollars for Facebook this week.

Friday, February 24, 2012

Named after a pirate

My name is John Hawkins

If you think that sounds like a pirate's name, you are right.

Sir John Hawkins originally worked for the queen stealing gold from the Spanish and Portuguese. One day he took off with a boat load for himself. So the legend goes.

So I and all the other John Hawkinses out there who aspire to fame through the humble Google page rank seem to be forever humbled by this historical figure.

I propose that we all band together and create a unified John Hawkins identity that can combat our representation on Google as a pirate. Except for the John Hawkins that is an obnoxious right wing blogger, he should be forced to change his name to Jack.