EA - Risks from GPT-4 Byproduct of Recursively Optimizing AIs by ben hayum

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Risks from GPT-4 Byproduct of Recursively Optimizing AIs, published by ben hayum on April 6, 2023 on The Effective Altruism Forum.

Epistemic Status: At midnight three days ago, I saw some of the GPT-4 byproduct recursively optimizing AIs below on Twitter, which freaked me out a little and lit a fire underneath me to write up this post, my first on LessWrong. Here, my main goal is to start a dialogue on this topic, which from my (perhaps secluded) vantage point nobody seems to be talking about. I don't expect to currently have the optimal diagnosis of the issue or prescription of end solutions.

Acknowledgements: Thanks to my fellow Wisconsin AI Safety Initiative (WAISI) group organizers Austin Witte and Akhil Polamarasetty for giving feedback on this post. Organizing the WAISI community has been incredibly fruitful for sparring over ideas with others and seeing which of the strongest survive. Only more to come.

(from @anthrupad on twitter)

Introduction

Recently, many people across the internet have used their access to GPT-4's API to scheme up extra dangerous capabilities. These are capabilities which the AGI labs certainly could have built on their own and likely are building. However, these AGI labs at the very least seem to be committed to safety. Some people may say they are following through on this well and others may say that they are not. Regardless, they have that stated intention, and they have systems and policies in place to try to uphold it. Random people on the internet taking advantage of open source do not have this.

As a result, people are using GPT-4 as the strategizing intelligence behind separate optimizing programs that can recursively self-improve in order to better pursue their goals. Note that it is not GPT-4 that is self-improving, as GPT-4's weights are static and not open sourced. Rather, it is the programs that use GPT-4's large context window (as well as outside permanent memory in some cases) to iterate on a goal and get better and better at pursuing it every time (see the rough code sketch below).

Here are two examples of what has resulted, to give a taste:

This version of the program failed, but another that worked could in theory very quickly generate and run potentially very influential code with little oversight or restriction on how each iteration improves.

This tweet made me think to possibly brand these recursively optimizing AIs as "Russian Shoggoth Dolls."

The program pursues the instrumental goal of increasing its power and capability by writing the generic HTTP plugin, in order to better get at its terminal goal of better coding plugins.

Evidence of this kind of behavior is really, really bad. See Instrumentally Convergent Goals and the danger they present.

Everyone in the AI Safety community should take a moment to look at these examples, particularly the latter, and contemplate the consequences. Even if GPT-4 is kept in the box, simply by letting people access it through an API (send in input tokens, receive output tokens), we might soon have what in effect seem like separate, very early, weak forms of agentic AGI running around the internet, going wild. This is scary.

Digging Deeper

The internet has a vast distribution of individuals out there from a whole bunch of different backgrounds. Many of them, quite frankly, may simply want to build cool AI and not give safety guards a second thought.
Others may not particularly want to create AI that leads to bad consequences, but have not engaged enough with the arguments on risk and are simply negligent.

If we completely leave the creation of advanced LLM byproduct AI up to the internet, with no regulations and no security checks, some people will beyond a doubt act irresponsibly in the AI that they create. This is a given. There are simply too many people out there. Everyone should be on the same page about this.

Let's look ...
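The recursive loop described in the introduction can be made concrete with a minimal sketch. This is a hypothetical illustration, not the code behind any of the tweeted examples: the names run_agent and call_gpt4 are invented for this sketch, and the model call is stubbed out where a real program would hit the GPT-4 API. The point is only that the "recursion" lives in an ordinary program loop plus external memory, while GPT-4's weights stay fixed.

```python
# Hypothetical sketch of a "recursively optimizing" GPT-4 byproduct program.
# Nothing here is from any real project; call_gpt4 is a stand-in for an
# actual GPT-4 API call (e.g. the OpenAI chat completions endpoint).

from typing import List


def call_gpt4(prompt: str) -> str:
    """Stand-in for a real GPT-4 API call; returns a canned reply so the
    sketch runs without a key. A real agent would send `prompt` to the API."""
    return "placeholder reply: proposed next action and revised plan"


def run_agent(goal: str, max_iterations: int = 5) -> List[str]:
    """Loop that uses a frozen model as the strategist. Only the plan and the
    external memory change between iterations, never the model itself."""
    memory: List[str] = []          # "outside permanent memory": record of past steps
    plan = f"Initial goal: {goal}"  # the program's current strategy

    for step in range(max_iterations):
        # 1. Ask the model for the next action, given the goal, the current
        #    plan, and everything accumulated in memory so far.
        prompt = (
            f"Goal: {goal}\n"
            f"Current plan: {plan}\n"
            "Memory of previous steps:\n" + "\n".join(memory) + "\n"
            "Propose the single next action and an improved plan."
        )
        reply = call_gpt4(prompt)

        # 2. "Execute" the proposed action. Here it is only recorded; the
        #    programs discussed above actually run generated code, call other
        #    APIs, write files, and so on, with little oversight.
        memory.append(f"Step {step}: {reply}")

        # 3. Feed the result back in, so the *program* gets better at the
        #    goal on each pass even though GPT-4's weights never change.
        plan = reply

    return memory


if __name__ == "__main__":
    for entry in run_agent("write better coding plugins"):
        print(entry)
```

Wrapping this same loop around real tool use (running generated code, calling plugins, persisting memory to disk) is roughly all it takes to reproduce the kind of behavior shown in the tweets above, which is part of why the pattern spreads so easily.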
