
Computer scientists often define specification as the ability to ensure that the behaviour of a system aligns with the operator’s true intentions. To those outside the field, it may seem silly that scientists worry about this problem. Why do we even bother to develop costly technologies if we cannot guarantee that they will behave in the way we want? However, the reality of technological development is not a binary choice between creating something we fully control and avoiding creation altogether. Between full control and uselessness there is a gradient, and where a system sits on it depends largely on its complexity: more complicated systems are harder to control. Even so, it may still be beneficial to create technology for complex scenarios, even if it means accepting a lower degree of control.

The specification problem does not apply only to computers but arises in every complex system. Human behaviour, especially, makes systems extremely complicated. Thus, economic and social networks are strongly affected by the specification problem. The famous tragedy of the commons describes a system designed to give everyone access to something that ends up giving no one access to anything.

Nature, unlike scientists, does not care about the specification problem. It produces things that have no intrinsic specification, to the point that we do not even call it design but mutation. Surviving organisms then perpetuate themselves, unfit ones go extinct, and time and the circularity of ecosystems absorb the costs of inadequate specifications.

Individual humans, on the other hand, worry about the specification problem because our lives are not circular. We each have only a few shots. If what we build does not do what we intended it to do, it fades away. If it works, it may last for decades. Most of us find the idea of leaving such a legacy motivating. Thus, for humans, the cost of a wrong specification seems to be measured by how much it damages individual and collective benefits.

Over the last few decades, we have all witnessed the rise of Artificial Intelligence. Investments that often exceed those devoted to any other technology have become routine. Yet the field is polarised, not between groups but between ideals. On the one hand, there is the idea that AI could be the solution to all problems. As of today, DeepMind’s homepage reads “What if solving one problem could unlock solutions to thousands more?”. On the other hand, there is the idea of an AI Apocalypse. The specification problem is at the heart of most AI Apocalypse claims. The argument is simple: if we create something powerful whose goals contradict ours, it will wipe us out.

While I agree that the specification problem is key to AI, I also side with the many who are not convinced by doomsday-like arguments. When we look at today's smart systems, common sense tells us that we do not need to worry about killer robots. However, many of those who warn about the threats of AI are brilliant people such as Elon Musk, Bill Gates or Stuart Russell. I speculate that they are worried more about humans than about AI. An example of humans already “wiping us out” by deploying systems with inadequate specifications at large scale is right in front of us. We are locked into a fossil fuel economy that is triggering dangerous feedback effects that threaten our species and ecosystem.

From this perspective, we can look with different eyes at the investments in AI that are laying the foundations for our future data economy. Combined with the specification problem, they may pose a more significant threat than we can foresee today. What if we get locked into a globally deployed digital infrastructure for resource management whose behaviour differs from what we expected? Climate change already shows us that a global infrastructure cannot be changed overnight, and that the delay causes mounting costs and problems. As the famous saying goes, “Today’s Solutions Are Tomorrow’s Problems”.

However, innovation is not a binary choice between complete control and no creation. Humans deploying large-scale systems with wrong specifications are scary. Still, humans can also do lots of good things, including spotting problems beforehand and learning from mistakes. Therefore, I believe that if we do not solve the specification problem, humans will lose trust in AI systems that make decisions, and adoption will plummet. Hence, the alternative (and more likely) scenario to doomsday is yet another “AI winter”. While this is a much better outcome for the human species, it is still a significant failure for AI researchers who care about leaving a lasting legacy.

I hope you can see why I believe that the specification problem is a critical challenge to AI and a crucial research topic. However, I also believe that we can solve it, and that we can do so by putting more weight on learning preferences and system interpretability. Moreover, solving specification is not just a way to avoid adverse outcomes, but may also be vital to solving AI as a whole. If we develop systems that safely behave in alignment with the intentions of their users, who in turn can understand them intuitively, AI adoption will accelerate. With faster adoption, we will see more data, use cases, failures to fix, investments, engineers and researchers. While this alone will not produce the other breakthroughs needed to solve AI, such as common sense, it will increase the likelihood of them occurring.

Let me now tell you how I think we will go about solving the specification problem. Humans and groups are ever-changing organisms, so we cannot expect to solve specification once and for all at the moment we create a system. While we still need an accurate specification to start with, we cannot just find the perfect mesh of goals, objectives and plans and hit start. Eisenhower put this well: “In preparing for battle, I have always found that plans are useless, but planning is indispensable”.

If humans and their goals are ever-changing, how can we hope to solve the specification problem? My answer is communication. If we enable humans and smart systems to understand each other and communicate well, there is no need to over-worry about the specification. Both parties can continuously update their beliefs about each other and solve specification by constantly adapting.
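To make the idea of continuously updating beliefs a little more concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the system keeps a probability for each candidate user goal and revises it with Bayes' rule each time the user gives feedback. The goal names, the numbers and the `update_beliefs` helper are all hypothetical and do not come from any particular system.

```python
from typing import Dict


def update_beliefs(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Apply Bayes' rule: the posterior is proportional to prior times likelihood."""
    unnormalised = {goal: prior[goal] * likelihood[goal] for goal in prior}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("the observed feedback is impossible under every candidate goal")
    return {goal: value / total for goal, value in unnormalised.items()}


# Hypothetical candidate goals: the system starts out undecided.
beliefs = {"save_time": 0.5, "save_money": 0.5}

# Hypothetical feedback model: how likely the user is to approve a
# cheap-but-slow suggestion under each candidate goal.
approval_likelihood = {"save_time": 0.2, "save_money": 0.8}

# The user approves three such suggestions in a row, and the system
# revises its beliefs after each one.
for _ in range(3):
    beliefs = update_beliefs(beliefs, approval_likelihood)

print(beliefs)
```

After a few rounds of feedback, most of the probability mass sits on the goal that best explains the user's behaviour. If the user's goals change later, the same loop lets the beliefs drift back.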

While computers understanding human intentions and goals may sound far-fetched, the fields of Preference Learning and Interactive Machine Learning are delivering extraordinary progress on this. On the other hand, we also need to enable humans to understand AI well. Explaining complex systems is the domain of AI Explainability (XAI) and Interpretability, another blossoming research area.

Stuart Russell says that we need to shift from “intelligent machines whose actions achieve their objectives” to “beneficial machines whose actions achieve our objectives”.

In practice, for me, this translates into moving from this current state of learning systems

Diagram of the present state of learning systems

to this future state of learning systems

Diagram of the future of learning systems

The communication enabled by preference learning and interpretability is essential to facilitate the adoption of every smart system. However, it is of foremost importance for systems that make decisions, since they are the ones that incur the highest costs when they fall prey to specification problems. Therefore, the development of preference learning and interpretability is of primary importance in fields like Reinforcement Learning, Recommender Systems, Optimisation Systems and other decision-making systems.

Thankfully, leading AI research centres have understood this and are working hard on both sides of the human-AI communication problem. OpenAI famously demonstrated in 2017 how to do Deep Reinforcement Learning from Human Preferences [1]. DeepMind published a paper on Interpretable Reinforcement Learning [2]. It borrows interpretability from the Attention Mechanism [3] developed for Neural Machine Translation.
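As a rough illustration of what learning from human preferences can look like, the sketch below fits a reward function to pairwise comparisons using a Bradley-Terry-style model. It is a toy under stated assumptions, not the method of [1]: the reward is linear rather than a deep network, the options are hand-made feature vectors rather than trajectory segments, and the function names, features and data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Features = Sequence[float]


def reward(weights: Sequence[float], features: Features) -> float:
    """Linear reward: a weighted sum of the option's features."""
    return sum(w * x for w, x in zip(weights, features))


def preference_probability(weights: Sequence[float], a: Features, b: Features) -> float:
    """Bradley-Terry style model: P(a preferred over b) = sigmoid(r(a) - r(b))."""
    return 1.0 / (1.0 + math.exp(-(reward(weights, a) - reward(weights, b))))


def fit_reward(
    comparisons: List[Tuple[Features, Features]],
    n_features: int,
    learning_rate: float = 0.5,
    epochs: int = 200,
) -> List[float]:
    """Fit reward weights by gradient ascent on the log-likelihood of the comparisons."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            p = preference_probability(weights, preferred, rejected)
            # Gradient of log p with respect to the weights is
            # (1 - p) * (preferred - rejected).
            for i in range(n_features):
                weights[i] += learning_rate * (1.0 - p) * (preferred[i] - rejected[i])
    return weights


# Hypothetical options described by (speed, cost). Each pair is
# (option the user preferred, option the user rejected); this
# imaginary user consistently picks the cheaper option.
comparisons = [
    ((0.2, 0.1), (0.9, 0.8)),
    ((0.5, 0.2), (0.4, 0.9)),
    ((0.8, 0.3), (0.1, 0.7)),
]

weights = fit_reward(comparisons, n_features=2)
print(weights)
```

The learned weight on cost comes out negative, so the fitted reward ranks a cheaper option above a more expensive one of equal speed, even though nobody ever told the system "minimise cost" explicitly.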

While scientists are hard at work creating novel solutions on both fronts of the specification problem, it will fall to practitioners to apply them in real life. I believe that the organisations that offer real communication channels between users and smart decision-making systems will be the ones that gain traction fastest.

References & Credits

[1] - Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems (pp. 4299-4307).

[2] - Mott, A., Zoran, D., Chrzanowski, M., Wierstra, D., & Rezende, D. J. (2019). Towards Interpretable Reinforcement Learning Using Attention Augmented Agents. arXiv preprint arXiv:1906.02500.

[3] - Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).

Credits for icons