Good riddance to 2020

Christmas is very nearly here, and a very welcome thing it is, too. After a streak of mild and rainy days, our snow is largely gone, and frankly it’s depressingly dark right now, so a bit of Christmas cheer is just the thing to wash away the dust and grime of this mess of a year. The December solstice was yesterday, so technically the days are growing longer already, but of course it’s going to take a good while before that actually becomes noticeable.

Things seem to be looking up on the COVID front as well, with new cases on the decline in Oulu and the start of vaccinations just around the corner. I’ve been voluntarily living under lockdown-like conditions for a few weeks now: no band rehearsals, no coworker lunches (except on Teams), no pints in pubs, only going out for exercise and shopping and keeping the latter to a minimum. I hope this is enough for me to spend Christmas with my parents relatively safely; it’s going to be a very small gathering, but at least I won’t have to eat my homemade Christmas pudding all by myself, which might just be the death of me. 

This blog post will be the last work thing I do before I sign off for the year. I was going to do that yesterday, but decided to take care of a couple more teaching-related tasks today in order to have a slightly cleaner slate to start with when I return to work. There will still be plenty of carry-over from 2020 to keep me busy in January 2021; most urgently, there’s a funding application to finish and submit once we get the consortium negotiations wrapped up, as well as an article manuscript to revise and submit. I got the rejection notification a couple of weeks ago, but haven’t had the energy to do much about it apart from talking to my co-author about what our next target should be. 

Improving the manuscript is a bit of a problem, because the biggest thing to improve would be the evaluation, but the KDD-CHASER project is well and truly over now and I’ve moved on to other things, so running another live experiment is not a feasible option. We will therefore just have to make do with the results we have and try to bolster the paper in other areas, maybe also change its angle and/or scope somewhat. I should at least be able to beef up the discussion of the data management and knowledge representation aspect of the system, although I haven’t made much tangible progress on the underlying ontology since leaving Dublin. 

I have been working on a new domain ontology, though, in the project that’s paying most of my salary at the moment. Ontologies are fun! There’s something deeply satisfying about designing the most elegant set of axioms you can come up with to describe the particular bit of the universe you’re looking at, and about the way new incontrovertible facts emerge when you feed those axioms into a reasoner. I enjoy the challenge of expressing as much logic as I can in OWL instead of, say, Python, and there’s still plenty of stuff for me to learn; I haven’t even touched SPARQL yet, for instance. Granted, I haven’t found a use case for it either, but I have indicated that I would be willing to design a new study course on ontologies and the semantic web, so I may soon have an excuse…
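For anyone who hasn’t tried this workflow, here’s a minimal sketch of the axioms-plus-reasoner idea in Python with the owlready2 library – a toy ontology invented purely for illustration, not the one I’m actually building, and it assumes Java is installed so that the bundled HermiT reasoner can run:

```python
# Toy example: declare a few axioms, then let the reasoner derive new facts.
from owlready2 import Thing, get_ontology, sync_reasoner

onto = get_ontology("http://example.org/teaching.owl")  # hypothetical IRI

with onto:
    class Person(Thing): pass
    class Course(Thing): pass
    class teaches(Person >> Course): pass            # object property with domain and range
    class Teacher(Person):
        # Defined class: any Person who teaches at least one Course
        equivalent_to = [Person & teaches.some(Course)]

    alice = Person("alice")
    alice.teaches = [Course("ontology_course")]

sync_reasoner()     # run the HermiT reasoner over the asserted axioms
print(alice.is_a)   # now includes teaching.Teacher
```

The nice part is that Teacher is never asserted anywhere – it simply follows from the axioms, which is exactly the kind of logic I would otherwise end up writing as if-statements in Python.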

Another thing to be happy about is my new employment contract, which is a good deal longer than the ones I’m used to, although still for a fixed term. On the flip side, I guess this makes me less free to execute sudden career moves, but I’d say that’s more of a theoretical problem than a practical one, given that I’m not a big fan of drastic changes in my life and anyway these things tend to be negotiable. In any case, it’s a nice change to be able to make plans that extend beyond the end of next year! 

Well, that’s all for 2020 then. Stay safe and have a happy holiday period – hope we’ll start to see a glimmer of normality again in 2021. 

Summing up the AI summit

The end of the year is approaching fast, with Christmas now barely two weeks away, but I managed to fit in one more virtual event to top off this year of virtual events: the Tortoise Global AI Summit. To be quite honest, I wasn’t actually planning to attend – didn’t even know it was happening – but a colleague messaged me the previous day, suggesting that it might be relevant to my interests and also that the top brass would appreciate some kind of executive summary for the benefit of the Faculty. Despite the short notice I had most of the day free from other engagements, and since the agenda did indeed look interesting, I decided to register and check it out – hope this blog post is close enough to what the Dean had in mind! 

I liked the format of the event, a series of panel discussions rather than a series of presentations. Even the opening keynote with Oxford’s Sir Nigel Shadbolt was organised as a one-on-one chat between Sir Nigel and Tortoise’s James Harding, which felt more natural in an online environment than the traditional “one person speaks, everyone else listens, Q&A afterward” style. Something that worked particularly well was the parallel discussion on the chat, to which anyone attending the event could contribute and from which the moderators would from time to time pick questions or comments to be discussed with the main speakers. Overall, I was left with the feeling that this is the way forward with virtual events: design the format around the strengths of online instead of trying to replicate the format of an offline event using tools that are not (yet) all that great for such a purpose. 

The keynote set the tone for the rest of the event, bringing up a number of themes that would be discussed further in the upcoming sessions: the hype around AI versus the reality, transparency of AI algorithms and AI-based decision making, AI education – fostering AI talent in potential future professionals and data/algorithm literacy in the general populace – and the need for data architectures designed to respect the ethical rights of data subjects. Unhealthy concentrations of power, and how to avoid them, was a topic that resonated with the audience, and it shouldn’t be too hard to think of a few examples of such concentrations. The carbon footprint of running AI software was brought up on the chat. Perhaps my favourite bit of the session was Sir Nigel’s point that there is a need for institutional and regulatory innovations, which he illustrated by mentioning the limited company as a historical example of such an innovation. Such innovations are perhaps more easily overlooked than scientific and technological ones, but one can hardly deny that they, too, have changed the world and will continue to do so.

The world according to Tortoise

The second session was about the new edition of the Tortoise Global AI Index, which ranks 62 countries of the world on their strength in AI capacity, defined as comprising the three pillars of implementation, innovation and investment. These are further divided into the seven sub-pillars of talent, infrastructure, operating environment, research, development, government strategy and commercial, and the overall score of each country is based on a total of 143 individual indicators. The scores are normalised such that the top country gets an overall score of 100, and it’s no big surprise that said country is the United States, as it was last year when the index was launched. China and the United Kingdom similarly retain their places as no. 2 and no. 3, respectively. China has closed some of the gap with the US but is still quite far behind with a score of 62, while the UK, sitting at around 40, has lost some of its edge over the challengers. Canada, Israel, Germany, the Netherlands, South Korea, France and Singapore complete the top 10. 
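As an aside, the “top country gets 100” part is just top-anchored normalisation. A trivial sketch of the idea is below, with made-up composite scores – the actual report aggregates 143 weighted indicators, which is considerably more involved:

```python
# Toy illustration of top-anchored normalisation; the raw scores are invented.
raw = {"United States": 8.7, "China": 5.4, "United Kingdom": 3.5}

top = max(raw.values())
normalised = {country: round(100 * score / top, 1) for country, score in raw.items()}
print(normalised)   # the leader lands exactly on 100, everyone else is scored relative to it
```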

Finland is just out of the top 10 but rising, up three places from 14th to 11th. According to the index, Finland’s particular forte is government strategy, comprising indicators such as the existence of a national AI strategy signed by a senior member of government and the amount of dedicated spending aimed at building AI capacity. In this particular category Finland is ranked 5th in the world. Research (9th) and operating environment (11th) can also be counted among Finland’s strengths, and all of its other subrankings (talent – 16th, commercial – 19th, infrastructure – 21st, development – 22nd) are solidly above the median as well. Interestingly, the US, while being ranked 1st in four categories and in the top 10 for all but one, is only 44th on operating environment. The most heavily weighted indicator here is the level of data protection legislation, giving countries covered by the GDPR a bit of an edge; 7 of the top 10 in this category are indeed EU countries, but there is also, for instance, China in 6th place, so commitment to privacy is clearly not the whole story. 

There was some good discussion on the methodology of the AI index, such as the selection of indicators. For example, one could question the rather heavy bias toward LinkedIn as a source of indicators for AI talent. Another interesting point raised was that while we tend to consider academics mainly in terms of their affiliation, it might also be instructive to look at their nationality. Indeed, the hows and whys of the compilation of the index would easily make for a dedicated blog post, or even a series of posts, but I’ll leave it for others to produce a proper critique. For those who are interested, a methodology report is available online. 

From the Global AI Index the conversation transitioned smoothly into the next session on the geopolitics of AI, where one of the themes discussed was whether countries should be viewed as competing against one another in AI, or whether AI should rather be seen as an area of international collaboration for the benefit of citizens everywhere. Is there an AI race, like there once was a space race? Is mastery of AI a strategic consideration? Benedict Evans advocated the position that to talk about AI strategy is to adopt the wrong level of abstraction, and that AI (or rather machine learning) is just a particular way of creating software that in about ten years’ time will be like relational databases are today: so ubiquitous and mundane that we hardly pay any attention to it. This was in stark contrast to the view put forward at the beginning of the session that AI is a general-purpose technology akin to electricity, with comparable potential to revolutionise society. The session was largely dominated by this dialectic, but there was still time for other themes as well, such as the nature of AI clusters in a world where geographically limited technology clusters are becoming an outdated concept, and the role of so-called digital plumbing in providing the essential foundation for the success of today’s corporate AI power players.

Winners and losers

The next session, titled “AI’s ugly underbelly”, started by taking a look at an oft-forgotten part of the AI workforce, the people who label data so that it can be used to train machine learning models. It’s been estimated that data labelling accounts for 25% of the total project time in an ML project, but the labellers are, from the perspective of the company running the project, an anonymous mass employed through crowdsourcing platforms such as MTurk. In academic research the labellers are often found closer to home – the job is likely to be done by your students and/or yourself, and when crowdsourcing is used, people may well be willing to volunteer for the sake of contributing to science, such as in the case of the Zooniverse projects. In business it’s a different story, and there is some money to be made by labelling data for companies, but not a lot; it’s an unskilled job that obeys the logic of the gig economy, where the individual worker must buy their own equipment and has very little in the way of job security or career prospects. 

The subtitle of this session was “winners and losers of the workforce”, the winners of course being the highly skilled professionals who are in increasingly high demand and therefore increasingly highly paid. There was a good deal of discussion on the gender imbalance among such people, reflecting a similar imbalance in the distribution of the sort of hard (STEM) skills usually associated with tech jobs. In labelling the gap is apparently much narrower, in some countries even nonexistent. It was argued that relevant soft skills and potential AI talent are distributed considerably more evenly, and that companies trying to find people for AI-related roles may want to look beyond the traditional recruiting pathways for such roles. A minor point that I found thought-provoking was that recruiting is one of the application domains of AI, so the AI of today is involved in selecting the people who will build the AI of tomorrow – and we know, of course, that AI can be biased. One of the speakers brought up the analogy that training an AI is like training a dog in that the training may appear to be a success, but you cannot be sure of what it is that you’ve actually trained it to respond to. 

There was more talk about AI bias in the “AI you can trust” session, starting with what we mean by the term in the first place. We can all surely agree that AI should be fair, but can we agree on what kind of fairness we want – does it involve positive discrimination, for example? Bias in datasets is a relatively straightforward concept, but beyond that things get less tidy and more ambiguous. There is also the question of how we can trust that an AI is not biased, provided that we can agree on the definition; a suggested solution is to have algorithms audited by a third party, which could provide a way to strike a balance between the right of individuals to know what kind of decision-making processes they are being subjected to and the right of organisations to keep their algorithms confidential. An idea that I found particularly interesting, put forth by Carissa Véliz of the Institute for Ethics in AI, was that algorithms should be made to undergo a randomised controlled trial before they are allowed to make decisions that have a serious, potentially even ruinous, effect on people’s lives. 

Data protection was, of course, another big topic in this session. That personal data should be handled responsibly is again something we can all agree on, but there was a good deal of debate on the proper way to regulate companies to ensure that they are willing and able to shoulder that responsibility. Should they be told how to behave in a top-down manner, or is it better to adopt a bottom-up strategy and empower individuals to look after their own interests when it comes to privacy? Is self-regulation an option? The data subject rights guaranteed by the GDPR represent the bottom-up approach and are, in my opinion, a major step in the right direction, but it’s also a matter of having effective means to enforce those rights, and here, I feel, there is still a lot of work to be done. The GDPR, of course, only covers the countries of the EU and the EEA, and it was suggested that perhaps we need an international organisation for the harmonisation of data protection, a “UN of data” – a tall order for sure, but one worth considering.

Grand finale

The final session, titled “AI: the breakthroughs that will shape your life”, included several callbacks to themes discussed in previous sessions, such as the growth of the carbon footprint of AI as the computational cost of new breakthroughs continues to increase – doubling almost every 3 months according to an OpenAI statistic. The summit took place just days after the announcement of a great advance achieved by DeepMind’s AlphaFold AI on the protein folding problem in computational biochemistry, already mentioned at the beginning of the first session and discussed further here. While it was pointed out that the DeepMind solution is not necessarily the be-all and end-all it has been hailed as, it certainly serves to demonstrate that the technology is good for tackling serious scientific problems and not just for mastering board games. The subject of crowdsourcing came up again in this context, as the approach has been applied to the folding problem with some success in the form of Folding@home, where the home computers of volunteers are used to run distributed computations, as well as Foldit, a puzzle video game that essentially harnesses the volunteers’ brains to do the computations.

There was some debate on the place of humans in a society increasingly permeated by AI systems, particularly on where we want to draw the line on AI autonomy and whether new jobs created by AI will be enough to compensate for old ones replaced by AI. Somewhat ironically, data labeller is a job created by AI that may already be on its way to being made obsolete by advances in AI techniques that do not require large quantities of labelled data for training. One of the speakers, Connecterra founder Yasir Khokhar, talked about the role of AI in solving the problem of feeding the world, reminding me of Risto Miikkulainen’s keynote talk at CEC 2019, in which he presented agriculture as one of the application domains of creative AI through evolutionary computation. OpenAI’s GPT-3 was then brought up as another example of a recent breakthrough, leading to a discussion on how we tend to anthropomorphise our Siris and Alexas and to ascribe human thought processes to entities that merely exhibit some semblance of them. There was a callback to AI ethics here when someone asked whether we have the right to know when we are interacting with an AI – if we’re concerned about AI transparency, then arguably being aware that there is an AI is the most basic level of it. Of things that are still in the future, the impact of quantum computing on AI was discussed, as were the age-old themes of artificial general intelligence and rogue AI as existential risk, but in the time available it wasn’t feasible to come to any real conclusions. 

Inevitably, it got harder to stay alert and focused as the afternoon wore on, and I also missed the beginning of one session because I had to attend another (albeit very brief) meeting, but even so, I managed to gather a good amount of interesting ideas and information over the course of the day. I’m particularly happy that I got a lot of material on the social implications of AI that we should be able to use when developing our upcoming AI ethics course, since so far I haven’t been too clear about which specific topics related to this aspect of AI we could discuss in the lectures. And not a moment too soon, I might add – we’re due to start teaching that course in March, so it’s time to get cracking on the preparations!