Tips for MSCA hopefuls

This year’s call for MSCA Individual Fellowship applications closed recently, with a grand total of 9,830 applications received – apparently a record number for MSCA and in fact for Horizon 2020 in general. Good luck to everyone who submitted! Soon after I started my own fellowship at the Insight Centre, one of the people who had helped me prepare my proposal invited me to a seminar on MSCA to speak about my experiences as a successful candidate. The presentation I gave there was quite well received, so I thought I’d share my little tips and tricks on the blog as well, even though the timing is arguably not the greatest, given that the next call won’t open until sometime in the spring.

First of all, if you’re considering applying but have some doubts, I heartily recommend that you go through with it. Although MSCA fellowships are technically H2020 projects, which may sound a bit frightening, the proposal process is actually quite lightweight: the research plan is limited to ten pages, and the budget is a simple function of which country you’re going to, how many months you will spend there and whether you have a family. The same goes for how the projects are managed, so you don’t need to worry about ending up spending an inordinate portion of your precious research time cranking out deliverables instead of generating results. So, without further ado, here are my top 5 tips for would-be MSCA fellows:
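To illustrate how simple the budget structure is, here is a rough Python model of it. The rate names and values are hypothetical placeholders, not the official MSCA unit costs, and representing the host country as a correction coefficient applied to the living allowance is an assumption of this sketch; check the current work programme for the real figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyRates:
    """Per-person-month rates in EUR. The example values below are
    hypothetical placeholders, not the official MSCA unit costs."""
    living: float         # living allowance before the country correction
    mobility: float       # paid to every fellow
    family: float         # paid only to fellows with a family
    institutional: float  # research and management costs paid to the host


def fellowship_budget(rates: MonthlyRates, country_coefficient: float,
                      months: int, has_family: bool) -> float:
    """Total budget as a function of host country, duration and family status."""
    # Assumption: the country only affects the living allowance, via a coefficient.
    monthly = rates.living * country_coefficient + rates.mobility + rates.institutional
    if has_family:
        monthly += rates.family
    return monthly * months


# Hypothetical rates and coefficient, chosen only to keep the arithmetic easy to follow.
rates = MonthlyRates(living=4000.0, mobility=500.0, family=500.0, institutional=1000.0)
print(fellowship_budget(rates, country_coefficient=1.0, months=24, has_family=False))
print(fellowship_budget(rates, country_coefficient=1.0, months=24, has_family=True))
```

The point is not the numbers but the shape: three inputs, one multiplication by the number of months, and no line-by-line cost justification.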

1. Find the right host

I’ve already mentioned in a previous post that it boosts your chances considerably if the strengths of your prospective host complement yours. It certainly doesn’t hurt if there’s someone at the host institution – ideally, your prospective supervisor – whom you already know and have developed a rapport with, but you shouldn’t get too hung up on that particular point; what really matters from the reviewers’ point of view is whether the place where you are proposing to carry out your project is the best possible environment for that project. Consider what the host can offer you in terms of training, research infrastructure and potential collaborators, and make sure your proposal makes a persuasive case on these points. Also, keep in mind that two-way knowledge transfer between the researcher and the host is expected, so it’s not just about what you can get from the host – it’s also about what you can bring to the host.

2. Get all the help you can

The most important part of the proposal is the actual research to be done – objectives, methodology, etc. – and that’s all up to you (plus, to a certain extent, your supervisor of course, but they’re likely to have quite a few things on their plate besides this). For everything else, however, don’t hesitate to take advantage of any support that the host institution can offer you in preparing the proposal. The odds are that there are people there who have done this sort of thing before and know what reviewers look for in a proposal in terms of facts and figures, hosting arrangements, available research services and so forth. They may also have access to external experts and offer to send your proposal to them for feedback, and I think it goes without saying that you should accept such an offer. What I found particularly useful were ideas on how to communicate my research results to non-academic audiences. My first instinct (and I’m pretty sure I’m not alone in this) is to just write papers for journals and conferences and let others worry about public relations, and this cost me some crucial points when I applied for the first time.

3. It’s all about you

This is another thing I touched upon in that earlier post: MSCA fellowships are unusual, if not unique, in that their impact is measured in terms of the career development of the fellow as a European researcher. Therefore, you should consider starting not with the question “What do I want to study?” but with “What do I want to be?” The answer won’t give you your research topic, but it will affect the way you go about choosing one and developing a plan around it. Do you want to work in academia or industry? In what sort of role? Or maybe you’re interested in starting your own company? Whatever your target is, state it clearly in the proposal and make sure that everything else in the proposal – research activities, training, etc. – is aligned with that target. If you’re not quite sure what you want and would prefer to keep your options open, pick a career goal anyway and pretend that you do know that’s where you’re headed; there’s nothing wrong with changing your mind later, but it doesn’t look good if you don’t seem to have any sort of long-term vision of your career. Of course, if you’ve come up with a work plan that can support multiple career paths equally well, it shouldn’t hurt to point this out in the proposal.

4. Give details generously

This is really a more general formulation of the previous point: it can be tempting to keep things a bit vague, but every bit of vagueness makes your proposal that much less convincing – and remember, the bar is high and the competition fierce, so every little bit counts. This goes for your career objectives, but also for other things; for example, when describing how you plan to disseminate the results of your research, try to come up with tentative titles for the papers you’re going to write and to identify specific journals and conferences where you will aim to publish those papers. If you can name some likely co-authors, even better, and it’s also good to consider how you will measure the impact of your dissemination and communication activities (e.g., number of paper citations, number of people reached). Likewise, in your implementation plan, provide as much detail as you can (without breaking the page limit) on things such as work breakdown, timetables, deliverables and milestones; in the real world, you won’t be expected to follow that plan to the letter, but you do need to demonstrate in the proposal that there’s a clear path from where you are now to where you want to be at the end of the project.

5. Focus your efforts right

Having only ten pages to explain your research plan in full detail is a blessing but also a curse, because rationing out those ten pages among the things you want to say may prove quite a challenge. To get an idea of where you should be concentrating your best efforts, always keep in mind the three evaluation criteria and their weights in your overall score: excellence counts for 50%, impact for 30% and implementation for 20%, so a good rule of thumb is to allot 5, 3 and 2 pages to the corresponding proposal sections, respectively. However, if you’re working on a revision of a proposal that didn’t get funding the first time around, you also need to consider your previous evaluation scores, because the law of diminishing returns applies as your score for a given criterion approaches 5. So, if you did very well on excellence but not quite as well on the other two criteria, you’re likely to get a bigger increase in your total score for the same amount of effort if you focus on impact and implementation, even though excellence weighs as much as the other two combined. You’ll definitely want to improve any criterion score lower than 4, and the verbal feedback in the evaluation report should give you a pretty good idea of how you can do that.
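To make the arithmetic behind this advice concrete, here is a small Python sketch. It assumes each criterion is scored on a scale from 0 to 5 and that the total is the weighted sum using the weights above; the scores for the previous attempt are made up. It only captures the ceiling part of the argument: a criterion scored close to 5 has little room left to contribute, however much effort you put into it.

```python
# Weights of the three evaluation criteria, as stated above.
WEIGHTS = {"excellence": 0.5, "impact": 0.3, "implementation": 0.2}
MAX_SCORE = 5.0
PAGE_LIMIT = 10


def total_score(scores):
    """Weighted total, on the same 0 to 5 scale as the individual criteria."""
    return sum(WEIGHTS[c] * s for c, s in scores.items())


def headroom(scores):
    """Largest possible gain in the total from improving each criterion alone,
    i.e. weight * (maximum score - current score)."""
    return {c: WEIGHTS[c] * (MAX_SCORE - s) for c, s in scores.items()}


def page_allotment():
    """Rule of thumb: split the page limit in proportion to the weights."""
    return {c: w * PAGE_LIMIT for c, w in WEIGHTS.items()}


# Made-up scores from a hypothetical unsuccessful first attempt.
previous = {"excellence": 4.8, "impact": 4.0, "implementation": 4.2}
print(round(total_score(previous), 2))
print({c: round(h, 2) for c, h in headroom(previous).items()})
print(page_allotment())
```

With these example scores, even a perfect excellence section could add at most 0.1 to the total, whereas impact alone has up to 0.3 to give and implementation up to 0.16.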

So that’s it! I hope you found these tips useful and will come back to them when it’s time to start preparing an application for the next MSCA IF call.

First blood

Time to look at the first results from my project! Well, not quite – the first results actually come from a literature survey I did immediately after starting the project and turned into a journal manuscript. I’m currently waiting for the first round of reviews to come in, but in the meantime I’ve been busy developing my ideas about collaborative knowledge discovery into something a bit more concrete. In particular, I’ve been thinking about one of the potential obstacles to effective collaboration from the data owner’s perspective: privacy.

In the aftermath of the much-publicised Facebook–Cambridge Analytica scandal, one would at least hope that people are becoming more wary about sharing their personal data online. On the other hand, with the General Data Protection Regulation in full effect since 25 May, a huge number of people are now covered by a piece of legislation that grants them an extensive set of personal data control rights and has the power to hurt even really big players (like Facebook) if they don’t respect those rights. Of course, it’s still up to people to actually exercise their rights, which may or may not happen, but after all the GDPR news, emails and “we use cookies” notices on websites, they should be at least vaguely aware that they have them.

The increased awareness of threats to privacy online and the assertion of individuals, rather than corporations, as the owners of their personal data are welcome developments, and I like to think that what I’m trying to accomplish is well aligned with these themes. After all, the collaborative knowledge discovery platform I’m building is intended to empower individual data owners: to help them extract knowledge from their own data for their own benefit. This does not make the privacy issue a trivial one, however – in fact, I wouldn’t be surprised if it turned out that people are more uneasy about sharing a small portion of their data with an individual analyst focusing on their case specifically than about using an online service that grabs and mines all the data it can but does so in a completely impersonal manner. The platform will need to address this issue somehow lest it end up defeating its own purpose.

The angle from which I decided to approach the problem involves a domain ontology and a semantic reasoner, technologies that I had been interested in for quite some time but hadn’t really done anything with. As I was doing the literature survey, I became increasingly convinced that an underlying ontology would be one of the key building blocks of the new platform, but it was also clear to me that I would need to start by modelling some individual aspect of collaboration as a proof of concept, so that I would fail fast if it came to that. If I started working top-down to produce a comprehensive representation of the entire domain, in the worst case I might spend ages only to discover that it wasn’t a viable approach after all.

All this came together somewhat serendipitously when I found out that the 2nd International Workshop on Personal Analytics and Privacy (PAP 2018), held in conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2018) in Dublin, had an open call for papers. The submission deadline was coming up in a month – enough time to put together some tentative results, though nothing hugely impressive – and coincided rather nicely with the date when I was planning to fly to Finland for my summer holidays. In about two weeks I had the first version of the manuscript ready, with another two left over for revisions.

The ontology I designed is based on the idea of a data owner and a data analyst (or possibly any number of either) using the collaborative knowledge discovery platform to negotiate the terms of their collaboration. Each uses the platform to specify requirements, but from opposing perspectives: the data analyst specifies analysis tasks, which require certain data items as input, while the data owner specifies privacy constraints, which prevent certain data items from being released to the data analyst. The data owners, data analysts, data items, analysis tasks and privacy constraints are all registered as individuals in the ontology and linked with one another such that a reasoner is able to use this information to detect conflicts, that is, situations where a data item is required for a data analysis task but not released by the data owner.
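The conflict-detection logic can be illustrated without any ontology tooling. The following is a plain Python stand-in, not the ontology or reasoner used in the project: the class names, fields and data items are hypothetical, and the explicit loops play the role of the inferences a reasoner would draw from the links between individuals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisTask:
    """Specified by a data analyst; requires certain data items as input."""
    name: str
    analyst: str
    required_items: frozenset


@dataclass(frozen=True)
class PrivacyConstraint:
    """Specified by a data owner; prevents certain data items from being released."""
    owner: str
    withheld_items: frozenset


def detect_conflicts(tasks, constraints):
    """Return (task, item, owner) triples for every data item that a task
    requires but an owner's privacy constraint withholds."""
    conflicts = []
    for task in tasks:
        for constraint in constraints:
            # Items that are both required by the task and blocked by the constraint.
            for item in sorted(task.required_items & constraint.withheld_items):
                conflicts.append((task.name, item, constraint.owner))
    return conflicts


# Hypothetical example: one analyst, one owner, three data items.
tasks = [AnalysisTask("activity_profile", "analyst_1",
                      frozenset({"heart_rate", "gps_location"}))]
constraints = [PrivacyConstraint("owner_1", frozenset({"gps_location", "sleep_log"}))]

for task_name, item, owner in detect_conflicts(tasks, constraints):
    print(f"Conflict: {task_name} requires {item}, which {owner} does not release")
```

In an actual ontology, a conflict would more likely be an inferred class or relationship that the reasoner derives from the registered individuals, rather than something computed by hand-written iteration.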

To resolve such conflicts, the data owner and the data analyst may, for example, agree that the analyst receives a version of the dataset from which the most sensitive information has been removed. Removing information reduces the utility of the data, but does not necessarily make it completely useless; finding a balance where the data owner’s privacy preferences are satisfied while the data analyst still gets enough material to work with is the essence of the negotiation process. The ontology is meant to support this process not just by pointing out conflicts but also by suggesting possible resolutions, based on recorded knowledge about the utility effects of different methods of transforming data to make it less sensitive.
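The resolution-suggesting part is described here as the intended direction rather than something already built, so the following Python sketch is entirely hypothetical: the transformation names and utility numbers are made up, and reducing the utility effect of a transformation to a single number between 0 and 1 is a simplifying assumption (how to model and quantify utility reduction is still an open question). It shows one way the idea could work: keep only the transformations the owner accepts, discard those that leave the analyst too little to work with, and rank the rest by utility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    """A way of making one data item less sensitive, with a recorded utility estimate."""
    name: str
    item: str
    utility_retained: float  # hypothetical: 1.0 = as useful as the original, 0.0 = useless


def suggest_resolutions(item, transformations, owner_accepts, min_utility):
    """Return transformations of `item` that the owner accepts and that retain at
    least `min_utility`, ordered from most to least useful for the analyst."""
    candidates = [
        t for t in transformations
        if t.item == item and t.name in owner_accepts and t.utility_retained >= min_utility
    ]
    return sorted(candidates, key=lambda t: t.utility_retained, reverse=True)


# Hypothetical recorded knowledge about transforming a location data item.
known = [
    Transformation("add_noise_1km", "gps_location", 0.7),
    Transformation("coarsen_to_city", "gps_location", 0.6),
    Transformation("remove", "gps_location", 0.0),
]

# The owner tolerates city-level locations or removal, but not noisy coordinates;
# the analyst needs at least half of the original utility.
for t in suggest_resolutions("gps_location", known, {"coarsen_to_city", "remove"}, 0.5):
    print(t.name, t.utility_retained)
```

In practice the utility of a transformed item would presumably depend on the analysis task as well, so a fuller model would attach the estimates to task and transformation pairs rather than to transformations alone.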

For the PAP workshop paper, I only had time to design the logic of conflict detection in any detail, and there was also no time to test the ontology in a real-world scenario or even a plausible approximation of one. It therefore hardly seems unfair that although the paper was accepted for a short oral presentation at the workshop, it was not accepted for inclusion in the post-proceedings. Obviously it would have been nicer to get a proper publication out of it, but I decided to go ahead and give the presentation anyway – ECML-PKDD is the sort of conference I might have gone to even if I didn’t have anything to present, and since the venue is a 25-minute walk from my house, the only cost was the registration fee, which I could easily cover from the rather generous allowance for sundry expenses that came with the MSCA fellowship.

Croke Park may seem like an unlikely place to have a conference, but it is in fact a conference centre as well as a stadium, and it seems to work perfectly well as a venue for an academic event – meeting spaces, catering and all. Besides Croke Park, we had Mansion House for the welcome reception and Taylor’s Three Rock for the conference banquet, so I can’t complain about the locations. The regular programme was quite heavy on algorithms, which isn’t really my number one area of interest, but I did manage to catch some interesting application-oriented papers and software demos. What I enjoyed the most, however, were the keynote talks by Corinna Cortes, Misha Bilenko and Aris Gionis; there were two others that I’m sure I would also have found very interesting but was unable to attend, because a rather important deadline was coming up and I had to zig-zag between Croke Park and DCU to make sure I got everything finished on time.

My own talk went reasonably well, I felt, with an audience of about twenty and some useful discussion afterwards on how I might go about modelling and quantifying the concept of utility reduction. On the last day of the conference, which was today, I went to another workshop, the 3rd Workshop on Data Science for Social Good (SoGood 2018), with presentations on how machine learning and data mining techniques can be used to address societal issues such as homelessness and corruption. I especially enjoyed the last one, if “enjoy” is the right word – it dealt with efforts to combat human trafficking by means of data science, certainly a worthy cause if ever there was one, but also rife with difficulties, from the scarcity of good input data to the nigh-impossibility of devising an ethically justifiable experiment when there are literally lives at stake. Plenty of food for thought there, and a fine way to finish off this week of conference activities; on Monday it’s back to business as usual.