Words and music

The proceedings of Tethics 2021 are now available for your viewing pleasure at ceur-ws.org. This means that both of the papers I presented during my two-conference streak in October are now (finally!) officially published! Although I’ve mentioned the papers in my blog posts a few times, I don’t think I’ve really talked about what’s in them in any detail. Since they were published at more or less the same time, I thought I’d be efficient/lazy and deal with both of them in a single post. 

At Tethics I presented a paper titled “Teaching AI Ethics to Engineering Students: Reflections on Syllabus Design and Teaching Methods”, written by myself and Anna Rohunen, who teaches the AI ethics course with me. As the title suggests, we reflect in the paper on what we took away from the course, addressing the two big questions of what to teach when teaching AI ethics and how to teach it. In the literature you can find plenty of ideas on both but no consensus, and in a sense we’re not really helping matters since our main contribution is that we’re throwing a few more ideas into the mix. 

Perhaps the most important idea that we put forward in the paper is that the syllabus of a standalone AI ethics course should be balanced on two axes: the philosophy-technology axis and the practice-theory axis. The former means that it’s necessary to strike a balance between topics that furnish the students with ethical analysis and argumentation skills (the philosophy) and those that help them understand how ethics and values are relevant to the capabilities and applications of AI (the technology). The latter means that there should also be a balance between topics that are immediately applicable in the real world (the practice) and those that are harder to apply but more likely to remain relevant even as the world changes (the theory). 

The paper goes on to define four categories of course topics based on the four quadrants of a coordinate system formed by combining the two axes. In the philosophy/theory quadrant we have a category called Timeless Foundations, comprising ethics topics that remain relatively stable over time, such as metaethics and the theories of normative ethics. In the philosophy/practice quadrant, the Practical Guidance category consists of applied ethics topics that AI researchers and practitioners can use, such as computer ethics, data ethics and AI ethics principles. In the technology/practice quadrant, the Here and Now category covers topics related to AI today, such as the history and nature of AI and the ethical issues that the AI community is currently dealing with. Finally, the technology/theory quadrant forms the category Beyond the Horizon, comprising more futuristic AI topics such as artificial general intelligence and superintelligence. 

A way to apply this categorisation in practice is to collect possible course topics in each category, visualise them by placing them in a figure with the two orthogonal axes, and draw a bubble to represent the intended scope of the course. A reasonable way to start is a rough circle centred somewhere in the Here and Now quadrant, resulting in a practically oriented syllabus that you can stretch towards the corners of the figure if time allows and you want to include, say, a more comprehensive overview of general ethics. The paper discusses how you can use the overall shape of the bubble and the visualisation of affinities between topics to assess things such as whether the proposed syllabus is appropriately balanced and what additional topics you might consider including. 
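
If you wanted to mock up such a figure in code, it could look something like the sketch below – a minimal matplotlib example with made-up topic placements and a placeholder scope bubble, purely for illustration and not taken from the paper:

```python
# Minimal sketch of the two-axis topic map; topic names and coordinates are placeholders.
import matplotlib.pyplot as plt

# x axis: philosophy (-1) ... technology (+1); y axis: theory (-1) ... practice (+1)
topics = {
    "Normative ethics": (-0.7, -0.6),      # Timeless Foundations
    "AI ethics principles": (-0.5, 0.6),   # Practical Guidance
    "Current ethical issues in AI": (0.6, 0.7),  # Here and Now
    "Superintelligence": (0.7, -0.7),      # Beyond the Horizon
}

fig, ax = plt.subplots(figsize=(6, 6))
for name, (x, y) in topics.items():
    ax.plot(x, y, "o", color="tab:blue")
    ax.annotate(name, (x, y), textcoords="offset points", xytext=(5, 5))

# Intended scope of the course: a rough circle in the Here and Now quadrant,
# which can be stretched towards the other corners if time allows.
scope = plt.Circle((0.4, 0.5), 0.55, color="tab:orange", alpha=0.2)
ax.add_patch(scope)

ax.axhline(0, color="grey", linewidth=0.8)
ax.axvline(0, color="grey", linewidth=0.8)
ax.set_xlim(-1, 1)
ax.set_ylim(-1, 1)
ax.set_xlabel("philosophy  <->  technology")
ax.set_ylabel("theory  <->  practice")
plt.show()
```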

On teaching practices the paper offers some observations on what worked well for us and what didn’t. Solidly in the former category is using applications that are controversial and/or close to the students’ everyday lives as case studies; this we found to be a good way to engage the students’ interest and to introduce them to philosophical concepts by showing how they manifest themselves in real-world uses of AI. The discussion in the Zoom chat during a lecture dedicated to controversial AI applications was particularly lively, but alas, our other attempts at inspiring debates among the students were not so successful. Online teaching in general we found to be a bit of a double-edged sword: a classroom environment probably would have been better for the student interaction aspect, but on the other hand, with online lectures it was no hassle at all to include presentations, demos and tutorials by guest experts in the course programme. 

The other paper, titled “Ontology-based Framework for Integration of Time Series Data: Application in Predictive Analytics on Data Center Monitoring Metrics”, was written by myself and Jaakko Suutala and presented at KEOD 2021. The work was done in the ArctiqDC research project and came about as a spin-off of sorts, a sidetrack of an effort to develop machine learning models for forecasting and optimisation of data centre resource usage. I wasn’t the one working on the models, but I took care of the data engineering side of things. That wasn’t entirely trivial: the required data was kept in two different time series databases, and only for a limited time, so the ML person needed an API that they could use to retrieve data from both databases in batches and store it locally, accumulating a dataset large enough to enable training of sufficiently accurate models. 

Initially, I wrote separate APIs for each database, with some shortcut functions for the most frequently needed queries, but after that I started thinking that a more generic solution might be a reasonably interesting research question in itself. What inspired this thought was the observation that while there’s no universal query language like SQL for time series databases, semantically speaking there isn’t much difference in how the query APIs of different databases work. I saw here an opportunity to dust off the old ontology editor and use it to capture the essential semantics. Basically, I ended up creating a query language where each query is represented by an individual of an ontology class and the data to be retrieved is specified by setting the properties of this individual. 

To implement the language, I wrote yet another Python API using a rather clever package called Owlready2. What I particularly like about it is that it treats ontology classes as Python classes and allows you to add methods to them; the API uses this to implement the logic of translating a semantic, system-independent representation of a query into the appropriate system-specific one. The user of the API doesn’t need to be aware of the details: they just specify what data they want, and the API then determines which query processor should handle the query. The query processor outputs an object that can be sent to the REST API of the remote database as the payload of an HTTP request, and when the database server returns a response, the query processor again takes over, extracting the query result from the HTTP response and packaging it as an individual of another ontology class. 
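
To give a flavour of what this looks like in code, here is a heavily simplified sketch of the idea – my own illustrative mock-up rather than the actual ontology or API from the paper, with made-up class and property names and a single hard-coded backend instead of a proper query processor:

```python
# Hypothetical sketch: a query is an individual of an ontology class, and because
# Owlready2 maps ontology classes to Python classes, translation logic can live in a method.
from owlready2 import get_ontology, Thing, DataProperty, FunctionalProperty

onto = get_ontology("http://example.org/tsquery.owl")  # placeholder IRI

with onto:
    class TimeSeriesQuery(Thing):
        def to_influxdb_payload(self):
            # Translate the system-independent query into (roughly) InfluxDB's query
            # language; a real implementation would dispatch to the right query
            # processor instead of hard-coding one backend.
            return {
                "q": f'SELECT "{self.field}" FROM "{self.measurement}" '
                     f"WHERE time >= '{self.start_time}' AND time < '{self.end_time}'"
            }

    class measurement(DataProperty, FunctionalProperty):
        domain = [TimeSeriesQuery]; range = [str]

    class field(DataProperty, FunctionalProperty):
        domain = [TimeSeriesQuery]; range = [str]

    class start_time(DataProperty, FunctionalProperty):
        domain = [TimeSeriesQuery]; range = [str]

    class end_time(DataProperty, FunctionalProperty):
        domain = [TimeSeriesQuery]; range = [str]

# The data to retrieve is specified simply by setting the individual's properties.
q = TimeSeriesQuery("q1", measurement="servers", field="cpu_load",
                    start_time="2021-01-01T00:00:00Z",
                    end_time="2021-01-02T00:00:00Z")
print(q.to_influxdb_payload())
```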

Another thing I love besides ontologies is software frameworks with abstract classes that you can write your own implementations of, and sure enough, there’s an element of that here as well: the API is designed so that support for another database system can be added without touching any of the existing code, simply by implementing an interface provided by the API. It’s hardly a universal solution – it’s still pretty closely bound to a specific application domain – but that’s something I can hopefully work on in the future. The ArctiqDC project was wrapped up in November, but the framework feels like it could be something to build on, not just a one-off thing. 
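
To illustrate the kind of extension point I mean (again, my own sketch rather than the interface from the paper), think of an abstract query processor that each supported database system implements and registers:

```python
# Illustrative sketch of a plug-in style extension interface: new database systems
# are supported by subclassing the abstract query processor and registering it.
from abc import ABC, abstractmethod

class QueryProcessor(ABC):
    """Translates semantic queries to a system-specific form and back."""

    @abstractmethod
    def to_payload(self, query) -> dict:
        """Turn a system-independent query individual into an HTTP request payload."""

    @abstractmethod
    def parse_response(self, response_json: dict):
        """Extract the query result from the database's HTTP response."""

_PROCESSORS: dict[str, QueryProcessor] = {}

def register_processor(system_name: str, processor: QueryProcessor) -> None:
    # Adding support for a new database touches no existing code: just register it.
    _PROCESSORS[system_name] = processor

def get_processor(system_name: str) -> QueryProcessor:
    return _PROCESSORS[system_name]
```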

In other news, the choir I’m in is rehearsing Rachmaninoff’s All-Night Vigil together with two other local choirs for a concert in April. It’s an interesting new experience for me, in more than one way – not only was I previously unfamiliar with the piece, I had also never sung in Church Slavonic before! It turns out that the hours and hours I spent learning Russian in my school years are finally paying off, albeit in a fairly small way: the text has quite a few familiar words in it, I can read it more or less fluently without relying on the transliteration, and the pronunciation comes to me pretty naturally even though my ability to form coherent Russian sentences is almost completely gone by now. It’s still a challenge, of course, but also a beautiful piece of music, and I’m already looking forward to performing it in concert – assuming, of course, that we do get to go ahead with the performance. Because of tightened COVID restrictions, we won’t be able to start our regular spring term until February at the earliest, so I’m not taking anything for granted at this point… 

I’m an ethicist, get me out of here

Summer seems to have impeccable timing this year: on Friday I came back from my vacation and immediately the temperature dropped by about ten degrees and it started raining. Certainly helped me feel less bad about spending the day indoors! Until then, July had been so consistently hot and sunny that it was almost enough to make you forget what a more typical Finnish summer looks like. Today in Oulu it’s +15°C and raining again, but the weather should get nicer towards the weekend, which is fortunate since I have some tickets booked for outdoor concerts. 

“Officially”, I was still on vacation all week last week – not that it makes much of a difference, since for now I’m still working from home; the university is currently not explicitly recommending remote work, but the city of Oulu is, and anyway all of my closest colleagues are still on vacation, so there doesn’t seem to be much point in going to the campus since I wouldn’t find anyone there to socialise with. Besides, given the most recent news about the development of the COVID situation, it may be best to wait until after the university’s response team has convened to see if there’s any update to the instructions currently in effect. 

The reason why I worked on Friday – I could get used to a one-day work week, by the way – is a happy one: a paper of mine got accepted to the 13th International Conference on Knowledge Engineering and Ontology Development, and the camera-ready version of the manuscript was due on July 30. The version submitted for review was ten pages long and was accepted as a short paper, which technically meant that the final version should have been two pages shorter, but I used the loophole of paying extra page charges and ended up adding a page so I could meaningfully address some of the reviewers’ suggestions. 

Already at the very beginning of my vacation I had received the pleasant news that another paper had been accepted to the Conference on Technology Ethics, so that’s a double whammy for the month of July! In fact, not only was the manuscript accepted – it received all “strong accept” ratings from the reviewers, which is surely a career first for me. What’s particularly exciting is that while all of the details are still TBA, it looks like the conference is going to be organised as an actual physical event in the city of Turku, which means that I may get to go on my first conference trip since 2019! I would certainly appreciate the opportunity to visit Turku, since it’s a city I’m way too unfamiliar with, having been there only once for a couple of days for work. 

I’m giving my next lecture on AI ethics this Thursday, with two more to follow later in August, as part of a 10 ECTS set of courses in learning analytics. There seems to be no escaping the topic for me anymore, but I don’t exactly mind; it’s actually kind of cool that I’ve managed to carve out a cosy little niche for myself as a local go-to guy for things related to computing and ethics. Really the only problem is that I don’t always get to spend as much time thinking about ethics as I’d like to, since there are always other things vying for my attention. Generally those other things are where the bulk of my salary comes from, so then I feel guilty about neglecting them – but at the same time I’m increasingly feeling that the ethics stuff may be more significant in the long run than my contributions to more “profitable” areas of research.

Last spring term, during the AI ethics course, I was unhappy about it eating up so much of my time, and indeed for a while I barely had time for anything else. It didn’t help matters that the course kept spilling into what should have been my free time, but if you look at the big picture, you could say with some justification that it’s not the ethics eating up time from everything else but the other way around. Now I just need to find someone who’s willing to pay me a full salary for philosophising all day long…

Good riddance to 2020

Christmas is very nearly here, and a very welcome thing it is, too. After a streak of mild and rainy days, our snow is largely gone, and frankly it’s depressingly dark right now, so a bit of Christmas cheer is just the thing to wash away the dust and grime of this mess of a year. The December solstice was yesterday, so technically the days are growing longer already, but of course it’s going to take a good while before that becomes actually noticeable. 

Things seem to be looking up on the COVID front as well, with new cases on the decline in Oulu and the start of vaccinations just around the corner. I’ve been voluntarily living under lockdown-like conditions for a few weeks now: no band rehearsals, no coworker lunches (except on Teams), no pints in pubs, only going out for exercise and shopping and keeping the latter to a minimum. I hope this is enough for me to spend Christmas with my parents relatively safely; it’s going to be a very small gathering, but at least I won’t have to eat my homemade Christmas pudding all by myself, which might just be the death of me. 

This blog post will be the last work thing I do before I sign off for the year. I was going to do that yesterday, but decided to take care of a couple more teaching-related tasks today in order to have a slightly cleaner slate to start with when I return to work. There will still be plenty of carry-over from 2020 to keep me busy in January 2021; most urgently, there’s a funding application to finish and submit once we get the consortium negotiations wrapped up, as well as an article manuscript to revise and submit. I got the rejection notification a couple of weeks ago, but haven’t had the energy to do much about it apart from talking to my co-author about what our next target should be. 

Improving the manuscript is a bit of a problem, because the biggest thing to improve would be the evaluation, but the KDD-CHASER project is well and truly over now and I’ve moved on to other things, so running another live experiment is not a feasible option. We will therefore just have to make do with the results we have and try to bolster the paper in other areas, maybe also change its angle and/or scope somewhat. I should at least be able to beef up the discussion of the data management and knowledge representation aspect of the system, although I haven’t made much tangible progress on the underlying ontology since leaving Dublin. 

I have been working on a new domain ontology, though, in the project that’s paying most of my salary at the moment. Ontologies are fun! There’s something deeply satisfying about designing the most elegant set of axioms you can come up with to describe the particular bit of the universe you’re looking at, and about the way new incontrovertible facts emerge when you feed those axioms into a reasoner. I enjoy the challenge of expressing as much logic as I can in OWL instead of, say, Python, and there’s still plenty of stuff for me to learn; I haven’t even touched SPARQL yet, for instance. Granted, I haven’t found a use case for it either, but I have indicated that I would be willing to design a new study course on ontologies and the semantic web, so I may soon have an excuse… 
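
To show the kind of thing I mean by facts emerging from axioms, here’s a toy example with an entirely made-up domain (and nothing to do with the ontology I’m actually working on), assuming Owlready2 and a working Java installation for the HermiT reasoner:

```python
# Toy example: feed a few axioms to a reasoner and watch it infer a new fact.
# The domain is made up purely for illustration; sync_reasoner() needs Java (HermiT).
from owlready2 import get_ontology, Thing, ObjectProperty, sync_reasoner

onto = get_ontology("http://example.org/toy.owl")  # placeholder IRI

with onto:
    class Server(Thing): pass
    class CoolingUnit(Thing): pass

    class cools(ObjectProperty):
        domain = [CoolingUnit]; range = [Server]

    # Defined class: a cooling unit that cools at least one server.
    class ActiveCooler(CoolingUnit):
        equivalent_to = [CoolingUnit & cools.some(Server)]

    s1 = Server("rack1_node1")
    c1 = CoolingUnit("crac1", cools=[s1])

sync_reasoner()  # after reasoning, crac1 is classified as an ActiveCooler
print(ActiveCooler.instances())
```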

Another thing to be happy about is my new employment contract, which is a good deal longer than the ones I’m used to, although still for a fixed term. On the flip side, I guess this makes me less free to execute sudden career moves, but I’d say that’s more of a theoretical problem than a practical one, given that I’m not a big fan of drastic changes in my life and anyway these things tend to be negotiable. In any case, it’s a nice change to be able to make plans that extend beyond the end of next year! 

Well, that’s all for 2020 then. Stay safe and have a happy holiday period – hope we’ll start to see a glimmer of normality again in 2021. 

First blood

Time to look at the first results from my project! Well, not quite – the first results are in a literature survey I did immediately after starting the project and made into a journal manuscript. I’m currently waiting for the first round of reviews to come in, but in the meantime I’ve been busy developing my ideas about collaborative knowledge discovery into something a bit more concrete. In particular, I’ve been thinking about one of the potential obstacles to effective collaboration from the data owner’s perspective: privacy.

In the aftermath of the much publicised Facebook–Cambridge Analytica scandal, one would at least hope that people are becoming more wary about sharing their personal data online. On the other hand, with the General Data Protection Regulation in full effect since 25 May, a huge number of people are now covered by a piece of legislation that grants them an extensive set of personal data control rights and has the power to hurt even really big players (like Facebook) if they don’t respect those rights. Of course, it’s still up to the people to actually exercise their rights, which may or may not happen, but after all the GDPR news, emails and “we use cookies” notices on websites, they should be at least vaguely aware that they have them.

The increased awareness of threats to privacy online and the assertion of individuals, rather than corporations, as the owners of their personal data are welcome developments, and I like to think that what I’m trying to accomplish is well aligned with these themes. After all, the collaborative knowledge discovery platform I’m building is intended to empower individual data owners: to help them extract knowledge from their own data for their own benefit. This does not make the privacy issue a trivial one, however – in fact, I wouldn’t be surprised if it turned out that people are more uneasy about sharing a small portion of their data with an individual analyst focusing on their case specifically than about using an online service that grabs and mines all the data it can but does so in a completely impersonal manner. The platform will need to address this issue somehow lest it end up defeating its own purpose.

The angle from which I decided to approach the problem involves using a domain ontology and a semantic reasoner, which are technologies that I had been interested in for quite some time but hadn’t really done anything with. As I was doing the literature survey, I became increasingly convinced that an underlying ontology would be one of the key building blocks of the new platform, but it was also clear to me that I would need to start by modelling some individual aspect of collaboration as a proof of concept, so that I would fail fast if it came to that. If I started working top-down to produce a comprehensive representation of the entire domain, in the worst case I might take ages to discover nothing but that it wasn’t a very viable approach after all.

All this came together somewhat serendipitously when I found out that the 2nd International Workshop on Personal Analytics and Privacy (PAP 2018), held in conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2018) in Dublin, had an open call for papers. The submission deadline was coming up in a month – enough time to put together some tentative results, though nothing hugely impressive – and coincided rather nicely with the date when I was planning to fly to Finland for my summer holidays. In about two weeks I had the first version of the manuscript ready, leaving another two for revisions.

The ontology I designed is based on the idea of a data owner and a data analyst (or possibly any number of either) using the collaborative knowledge discovery platform to negotiate the terms of their collaboration. Each uses the platform to specify requirements, but from opposing perspectives: the data analyst specifies analysis tasks, which require certain data items as input, while the data owner specifies privacy constraints, which prevent certain data items from being released to the data analyst. The data owners, data analysts, data items, analysis tasks and privacy constraints are all registered as individuals in the ontology and linked with one another such that a reasoner is able to use this information to detect conflicts, that is, situations where a data item is required for a data analysis task but not released by the data owner.
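
A stripped-down sketch of the conflict detection idea might look something like the following – the class and property names here are illustrative rather than taken from the actual ontology, and the example again assumes Owlready2 with a Java-based reasoner:

```python
# Hypothetical sketch of conflict detection: a conflict is a data item that some
# analysis task requires but some privacy constraint prevents from being released.
from owlready2 import get_ontology, Thing, sync_reasoner

onto = get_ontology("http://example.org/ckd.owl")  # placeholder IRI

with onto:
    class DataItem(Thing): pass
    class AnalysisTask(Thing): pass
    class PrivacyConstraint(Thing): pass

    class requires(AnalysisTask >> DataItem): pass
    class prevents_release_of(PrivacyConstraint >> DataItem): pass
    class required_by(DataItem >> AnalysisTask):
        inverse_property = requires
    class withheld_by(DataItem >> PrivacyConstraint):
        inverse_property = prevents_release_of

    # Defined class: the reasoner classifies any data item that is both required
    # by a task and withheld by a constraint as a conflicting item.
    class ConflictingDataItem(DataItem):
        equivalent_to = [DataItem
                         & required_by.some(AnalysisTask)
                         & withheld_by.some(PrivacyConstraint)]

    heart_rate = DataItem("heart_rate")
    location = DataItem("location_history")
    task = AnalysisTask("sleep_quality_analysis", requires=[heart_rate, location])
    constraint = PrivacyConstraint("no_location_sharing",
                                   prevents_release_of=[location])

sync_reasoner()  # needs Java; location_history gets classified as a conflict
print(ConflictingDataItem.instances())
```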

To resolve such conflicts, the data owner and the data analyst may, for example, agree that the analyst receives a version of the dataset from which the most sensitive information has been removed. Removing information reduces the utility of the data, but does not necessarily make it completely useless; finding a balance where the data owner’s privacy preferences are satisfied while the data analyst still gets enough material to work with is the essence of the negotiation process. The ontology is meant to support this process by not just pointing out conflicts, but by suggesting possible resolutions based on recorded knowledge about the utility effects of different methods of transforming data to make it less sensitive.

For the PAP workshop paper, I only had time to design the logic of conflict detection in any detail, and there also was no time to test the ontology in a real-world scenario or even a plausible approximation of one. It therefore hardly seems unfair that although the paper was accepted for a short oral presentation at the workshop, it was not accepted for inclusion in the post-proceedings. Obviously it would have been nicer to get a proper publication out of it, but I decided to go ahead and give the presentation anyway – ECML-PKDD is the sort of conference I might have gone to even if I didn’t have anything to present, and since the venue is a 25-minute walk away from my house, the only cost was the registration fee, which I could easily afford from the rather generous allowance for sundry expenses that came with the MSCA fellowship.

Croke Park may seem like an unlikely place to have a conference, but it is in fact a conference centre as well as a stadium, and seems to work perfectly well as a venue for an academic event – meeting spaces, catering and all. Besides Croke Park, we had Mansion House for the welcome reception and Taylor’s Three Rock for the conference banquet, so can’t complain about the locations. The regular programme was quite heavy on algorithms, which isn’t really my number one area of interest, but I did manage to catch some interesting application-oriented papers and software demos. What I enjoyed the most, however, were the keynote talks by Corinna Cortes, Misha Bilenko and Aris Gionis; there were two others that I’m sure I also would have found very interesting but was unable to attend, because there was a rather important deadline coming up and so I had to zig-zag between Croke Park and DCU to make sure I got everything finished on time.

My own talk went reasonably well, I felt, with an audience of about twenty and some useful discussion afterwards on how I might go about modelling and quantifying the concept of utility reduction. On the last day of the conference, which was today, I went to another workshop, the 3rd Workshop on Data Science for Social Good (SoGood 2018), with presentations on how machine learning and data mining techniques can be used to address societal issues such as homelessness and corruption. I especially enjoyed the last one, if enjoy is the right word – it dealt with efforts to combat human trafficking by means of data science, certainly a worthy cause if ever there was one, but also rife with difficulties from the scarcity of good input data to the nigh-impossibility of devising an ethically justifiable experiment when there are literally lives at stake. Plenty of food for thought there, and a fine way to finish off this week of conference activities; on Monday it’s back to business as usual.