We were having a discussion about what jobs would get killed off by ‘AI’ first.
I find a lot of the articles about AI taking jobs rather odd and uninformed. By 'AI' they generally mean 'machine learning' - and calling that AI is a bit rich, as it has yet to make the two critical leaps:
- It’s not yet very good at getting from ‘I can classify cats correctly’ to ‘I can provide you with a meaningful model of how to classify cats that you can act upon’
- It can’t discuss its model of cats with other systems, or debate and reason about it and possible improvements. When Alexa and Siri start arguing with each other about the best way for you to get to the airport on time - then worry.
There are, IMHO, four things that determine whether a human job can usefully be done by machine learning.
The first is simple - what happens when it breaks? If there is a defined, safe, simple behaviour for ‘wtf, I don’t know’, then it’s much easier to automate. It’s why we have had self-driving production trains for years on systems like the Docklands Light Railway, but no serious self-driving cars. The ‘help, I’ve gone wrong’ response for a light railway vehicle is to brake at a precalculated rate and stop ASAP without hurting the people inside. The ‘help, I’ve gone wrong’ response for a car is seriously complicated, and one humans often get wrong. Car accidents are full of ‘if only I had done xyz’.
The second is that the task has to be reasonably predictable and have lots of labelled training data. If it’s not predictable then you lose (human or otherwise). The more complex it gets, the more data you need (and current systems need far more than humans do, and are fragile). That also plays into the first problem: if you have a complex system where ‘help’ is not an acceptable response, then you need a hell of a lot of data. Not good for self-driving cars, which have to deal with bizarre rare events like deer jumping over fences, people climbing out of manholes and tornadoes - none of which feature prominently in data sets. Does a Google self-driving car understand a tornado? I’d love to know.
The third is context. A system can have a lot of inputs that are not obvious and require additional information to process. A human finding that a line of cones blocks the path from their driveway to the road is likely to have the contextual data to conclude that, for example, drunk students have been at work. In a system with very little context, life is a lot easier.
The most critical of all, though, is what is known in systems theory as variety: the total number of different states you have to manage. A system that can properly manage something has (we believe) to have at least as many states as the system it manages. This is ‘Ashby’s law’, although ‘law’ might be the wrong word for it, given that in the general arm-waving systems context there isn’t a mathematical proof for it.
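Ashby’s law does have a quantitative flavour in the information-theoretic treatment: roughly, the best any regulator can do is divide the disturbance variety by its own response variety, so only a regulator with as many distinct responses as there are distinct disturbances can pin the outcome down to a single goal state. A toy sketch of that arithmetic (my own illustration and function name, not anything from the systems literature):

```python
# Toy illustration of Ashby's law of requisite variety (hypothetical sketch):
# with R distinct responses facing D distinct disturbances, the smallest
# number of distinct outcomes any regulator can force is ceil(D / R).
# Only when R >= D can it hold the outcome to a single goal state.

def min_outcome_variety(n_disturbances: int, n_responses: int) -> int:
    """Smallest number of distinct outcomes a regulator can achieve,
    given counts of distinct disturbances and distinct responses."""
    if n_responses < 1:
        raise ValueError("a regulator needs at least one response")
    return -(-n_disturbances // n_responses)  # ceiling division

# A light-rail controller with a matching response for each of its
# 3 disturbance types can hold the line to one outcome:
print(min_outcome_variety(3, 3))   # -> 1
# A shop security system facing 40 kinds of trouble with only 5 canned
# responses leaves at least 8 distinct outcomes uncorrected:
print(min_outcome_variety(40, 5))  # -> 8
```

The numbers are made up; the point is only the shape of the inequality - when the world has more states than you have answers, some of those states go unmanaged.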
It’s why an IT department can handle lost passwords but falls apart when someone phones up to complain that the printer is telling them to subscribe to YouTube videos. It’s why the US tax system is so complicated, and it leads to a whole pile of other fun things (such as never being able to understand yourself entirely). It’s also the other half of why a drunk student can outwit a self-driving car.
Variety is a huge challenge for machine systems. It’s why burger-flipping robots are easier than serving robots, and why security may well be the last job in a shop to be automated. Automatic shelf stocking - not too hard, though there are challenges. Automatic sales - usually easy. Dealing with eight drunk people swaying around the shop, taking and drinking cans… difficult. Security folks may not be well paid, but they have to deal with an enormous amount of variety and context.
Whilst it’s not usually phrased in AI terms, we actually know a hell of a lot about variety, the systems that cope with it, and how to structure them - through systems modelling, management cybernetics and the like, going back to work in the early 1970s by folks like Stafford Beer (who is as interesting as his name) on viable system models: all the feedback loops and arrangements you need to make something that is actually functional and adaptable.
Back, however, to ‘what will machine learning kill off first’ (and not in the sense of running people over in automated cars). We need something that has:
- a ‘meh’ failure case
- a large amount of training data, preferably well labelled, online and easily bulk fed to the learning end of the system
- as little need for complex contextual information as possible
- not too much complex variety and state
It would also be nice if people would rate the output for free to help improve the model.
There are two obvious candidates. The first - cat pictures - doesn’t have enough commercial value, so while it would be funny to create a site that posts an infinite stream of made-up cat pictures every Saturday, it’s probably an academic or for-laughs project.
The second is photographic porn (not video - far too much context and variety in the physics models). There is a vast amount of training data, with lots of labels and rating information, and relatively low context and variety. The failure case is ‘wtf, reload’, and a lot of the training is already being done - for filters.
That, therefore, was my guess for the debate: the obvious early deployment of machine learning is a non-internet-connected, unfirewallable app that produces still pornography on demand - without having to employ any models (except mathematical ones).