EA - AI alignment researchers may have a comparative advantage in reducing s-risks by Lukas Gloor

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI alignment researchers may have a comparative advantage in reducing s-risks, published by Lukas Gloor on February 15, 2023 on The Effective Altruism Forum.

I believe AI alignment researchers might be uniquely well-positioned to make a difference to s-risks. In particular, I think this of alignment researchers with a keen interest in “macrostrategy.” By that, I mean ones who habitually engage in big-picture thinking related to the most pressing problems (like AI alignment and strategy), form mental models of how the future might unfold, and think through their work’s paths to impact. (There’s also a researcher profile where a person specializes in a specific problem area so much that they no longer have much interest in interdisciplinary work and issues of strategy – those researchers aren’t the target audience of this post.)

Of course, having the motivation to work on a specific topic is a significant component of having a comparative advantage (or lack thereof). Whether AI alignment researchers find themselves motivated to invest a portion of their time/attention into s-risk reduction will depend on several factors, including:

- Their opportunity costs
- Whether they think the work is sufficiently tractable
- Whether s-risks matter enough (compared to other practical priorities) given their normative views
- Whether they agree that they may have a community-wide comparative advantage

Further below, I will say a few more things about these bullet points. In short, I believe that, for people with the right set of skills, reducing AI-related s-risks will become sufficiently tractable (if it isn’t already) once we know more about what transformative AI will look like. (The rest depends on individual choices about prioritization.)

Summary

- Suffering risks (or “s-risks”) are risks of events that bring about suffering in cosmically significant amounts. (“Significant” relative to our current expectation over future suffering.)
- (This post will focus on “directly AI-related s-risks,” as opposed to things like “future humans don’t exhibit sufficient concern for other sentient minds.”)
- Early efforts to research s-risks were motivated in a peculiar way – morally “suffering-focused” EAs started working on s-risks not because they seemed particularly likely or tractable, but because of the theoretical potential for s-risks to vastly overshadow more immediate sources of suffering.
- Consequently, it seems a priori plausible that the people who’ve prioritized s-risks thus far don’t have much of a comparative advantage for researching object-level interventions against s-risks (apart from their high motivation inspired by their normative views).
- Indeed, this seems to be the case: I argue below that the most promising (object-level) ways to reduce s-risks often involve reasoning about the architectures or training processes of transformative AI systems, which requires skills that (at least historically) the s-risk community has not been specializing in all that much.[1]
- Taking a step back, one challenge for s-risk reduction is that s-risks would happen so far in the future that we have only the most brittle of reasons to assume that we can foreseeably affect things for the better.
- Nonetheless, I believe we can tractably reduce s-risks by focusing on levers that stay identifiable across a broad range of possible futures. In particular, we can focus on the propensity of agents to preserve themselves and pursue their goals in a wide range of environments. By focusing our efforts on shaping the next generation(s) of influential agents (e.g., our AI successors), we can address some of the most significant risk factors for s-risks.[2] In particular:
  - Install design principles like hyperexistential separation into the goal/decision architectures of transformative AI systems.
  - Shape AI training env...
