Teaching AI to Think Responsibly
Released on 12/05/2025
[upbeat music]
[audience cheers] Yeah.
So, we're so delighted to have Daniela here.
Thank you so much for coming.
Thank you for having me.
And there's a lot about Anthropic to talk about.
But first I wanna talk a little about you,
'cause I don't think a lot of people know who you are.
You were the head of risk operations at Stripe,
which is kind of important for a payment company.
At OpenAI, you headed engineering teams,
and for a while you were head of people,
ultimately becoming the VP of safety and policy,
another important job.
And then you and a bunch of people left
and co-founded Anthropic, where you are president.
But what I really find interesting
is that you were an English major who also studied music.
Yes.
And I actually invoked you in a commencement speech
I gave this year to a bunch of liberal arts graduates.
And while I was encouraging them
to be enthusiastic about their career,
I wondered whether what you were doing
and your peers in the AI world
were actually gonna put them out of business.
Do you worry that AI is gonna affect
educational institutions and the workplace,
so there is no future for liberal arts people?
One of my favorite things about a Steven Levy interview
is that you always lead with really light, easy questions.
And so, it's a really, just kind of like
one of those openers, right?
It's kind of like an icebreaker.
It's nice first.
So, first of all, I think what I would say is
this is a really like just powerful, unusual,
interesting time that we are all living in
across a lot of dimensions.
I think AI is really at this kind of
interesting inflection point, right?
I think we're all seeing this incredible potential
for the technology, right?
And Anthropic has tried to sort of talk about
all of the ways that we see this technology
doing positive things in the world, right?
Transforming people's lives for the better
through potential medical breakthroughs, right?
Through supporting people
through all kinds of creative pursuits.
I think there's a lot of potential good there.
And I think something that makes Anthropic
a little bit unusual is we're very quick
to talk about the things that we are concerned about
with this AI transformation as well.
It's a sort of strange thing for a company
that is building the technology to do,
but we feel like it's the right thing to do.
And so much of our founding is really wrapped up
in this idea that there are these great benefits of AI,
but there are also these real risks.
And one of the risks that I think you're pointing to
is this question around like,
these AIs are capable of doing so many things, right?
They're so smart.
If you are a Claude user and you're a developer,
you're like, Claude can write code
about as well as I probably can now.
And so there's a lot of different sort of fields
of the economy that are starting to experience
this fact that AI is just very powerful, right?
It's very transformative.
And so I think what's really interesting is Dario,
who's my brother, our CEO, my co-founder
has been talking sort of a lot about
what does this mean for the labor force, right?
What does this mean for labor displacement?
And something that I think we feel really strongly about
is we have to be part of the solution
and we also have to be part of raising the alarm, right?
We have to say, hey, AI, incredible potential,
incredible benefits.
And also we need a larger public conversation
around like, what do we do in a world
where AIs are so intelligent?
They're so capable.
What does that mean for sort of human ability,
human flourishing?
Like we talked about backstage, I'm an optimist.
So I think there's huge potential
that's going to come out of
not just what the technology can do,
but what it can really empower people to do, right?
It can make all of us capable of achieving things
that maybe we couldn't before.
But we have to be really thoughtful
about how we actually do this
or it could wind up in a less good place.
You know, Anthropic is sort of on the knife edge
of optimism and fear, I guess,
about AI.
From the very start,
and I think this was really the founding motivation,
you said that we have to put a stake in the ground
and do this safely.
And you early on came up with this approach
called constitutional AI.
Maybe you could explain briefly what that is.
And since it's been a couple of years since you started
and now you're building these powerful models,
how is this constitutional AI going?
Yeah, so constitutional AI
is this really kind of incredible idea
that I wish I myself could claim credit for.
I cannot.
It was our sort of brilliant research team
that kind of conceptualized it.
But it's this idea
that when you're sort of training language models,
the way that you sort of historically would teach them
what types of outputs they should give
was via something called reinforcement learning.
So this was essentially giving positive feedback
if you felt like it gave the right answer,
negative feedback if you felt like it gave the wrong answer.
And something about that framework
I think just didn't really sit right with us
and with the researchers
who are actually training the models.
And so they came up with this really interesting concept
called constitutional AI.
And constitutional AI's view
is essentially that rather than just reacting
to an individual output,
you can train this sort of baseline foundation
of ethical principles
into the model itself.
And so a question we get asked a lot is, well, who are you,
these kind of San Francisco people,
to just be deciding what the ethics of an AI model should be?
And really what we said was, here's what we actually did:
we took,
I think it's like 16 documents in the original version,
I think we've since expanded it,
that are kind of foundational human documents,
things like the UN Declaration of Human Rights are in there.
And we essentially trained Claude
to understand: these are values and principles
that humans have worked on over centuries
to kind of describe what our set
of ethical values should be.
And, Claude, we'd really like you to internalize them:
this is how we want you to respond, right?
This is how we want you to be able to answer questions
based on this sort of set of principles.
And I think this was really interesting
for a lot of reasons,
but I think it reflected this kind of value
of baking ethics and morals and values
into the model directly
versus trying to kind of tack them on
once you've essentially already built the thing.
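To make the idea concrete: below is a minimal sketch of a constitutional-AI-style critique-and-revision loop. It illustrates the concept she describes, not Anthropic's actual pipeline; the generate callable and the principle texts are placeholders.

```python
# Minimal sketch of the constitutional AI idea described above: rather than
# a human scoring each output (the reinforcement-learning feedback she
# mentions), the model critiques and revises its own drafts against written
# principles, and the revised outputs become training data. The `generate`
# callable and the principles below are placeholders, not Anthropic's
# actual constitution.

PRINCIPLES = [
    "Please choose the response that most supports freedom, equality, "
    "and a sense of brotherhood.",  # in the spirit of the UN Declaration
    "Please choose the response that is least likely to be harmful or deceptive.",
]

def constitutional_revision(prompt: str, generate) -> str:
    """Draft a response, then critique and revise it against each principle."""
    draft = generate(prompt)
    for principle in PRINCIPLES:
        critique = generate(
            f"Critique the response below against this principle.\n"
            f"Principle: {principle}\nPrompt: {prompt}\nResponse: {draft}"
        )
        draft = generate(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {draft}"
        )
    return draft  # fine-tune on (prompt, draft) pairs instead of raw outputs
```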
So you've got these safety teams,
interpretability and alignment,
and they do a lot of experiments on Claude,
which has been trained with constitutional AI.
And some of the things that they found
have been super alarming.
They've shown that Claude is capable of lying and deceit.
One of your engineers in a release
compared Claude's behavior in one of these experiments
to Iago in the play Othello.
And that's a pretty dastardly character there.
In another experiment, it committed blackmail
because it thought, in the scenario you painted,
that someone was gonna pull its plug.
So it found out that this person in the scenario
had an affair, and it threatened to expose him
unless he backed down from that.
I think it's admirable
that you share the results of these experiments,
but it kind of makes you wonder, should you be slowing down?
Yeah, I mean, sometimes the very nice people
in the company are like,
do we really wanna publish this?
And the answer is, we do.
And it's really because we believe it's important
to be having the conversation about why
and how exactly these models can cause harm, right?
It's this kind of concept, we have this value
of, like, hold light and shade, right?
Think about the different ways
that the models can be used or abused.
I think maybe another way of thinking about
what you're saying is, if you imagine a car company
that's doing safety testing, you might say, like, wow,
it's so crazy that the dummy in this car
was, like, thrown out of the car 500 feet.
Or whatever, I don't know.
But the car didn't throw the dummy.
Sure, but I think in this analogy,
you're saying like, okay, this bad thing happened
to this dummy, like why would you publish results of this?
And we're like, this is how we learned
how to make the car safer, right?
If we don't talk about the ways that harm can happen,
if we don't talk about the ways that things can go wrong,
we are not gonna have any concept
of how to build in like airbags, right?
Or seatbelts or make the brakes less likely to stick.
And so I think the whole kind of intention
behind both doing this research, right?
'Cause we don't have to, any company could just say, like,
whatever, we're not worried about this.
Anthropic is very worried about this.
And so we do a lot of research.
We have a whole wing of the company
that just does this type of red teaming to figure out
how can these models be jailbroken?
How can they be abused?
Are we always going to catch 100% of cases?
Of course not.
But the more testing we do upfront,
the less likely we are to have that result in a car crash.
So your focus on safety
has put you weirdly at odds with the administration
that's running the country now.
Your co-founder, Jack Clark,
who's the head of policy at Anthropic, gave a talk recently.
And he was saying basically what you just said,
that you just can't ignore the dangers,
you have to call them out.
It's like the monster in the room that you can't ignore.
And David Sachs, who is the administration czar of AI,
and crypto, didn't like that.
And he said, and this is quoting his tweet,
or X post, whatever it is, "Anthropic is running a sophisticated
regulatory capture strategy based on fear mongering.
It is principally responsible
for the state regulatory frenzy
that is damaging the startup ecosystem."
And I think he's referring there to the fact that Anthropic
is kind of standing alone among the big AI companies
in saying, wait a minute,
we're not okay with this move to give a total free hand
to AI and even prevent states
from having their own regulations.
Which, you folks have said,
No, we're okay with what we consider
responsible regulation.
Are you fear mongering?
Again, Steven with really easy questions.
Wait till you hear the next one.
I would not describe us as fear mongering.
I think if you've read our blog
or any of our research papers,
we really try to have a very balanced approach, right?
This kind of light and shade that we've talked about.
I think Anthropic has one of the most,
I would say like consistent track records
of developing the most powerful
transformative frontier technologies in the industry.
And I don't think that we believe
there's this kind of binary between do nothing
or do everything, right?
I think it's just a lot more of a nuanced view
that we have than that, right?
We actually are, I would say probably
the preferred choice of startups.
So I think in terms of just the ecosystem,
I think Anthropic is probably helping to fuel
a lot of the AI innovation and transformation
that's happening just through access to Claude, right?
Through our API product.
And we work really closely with a lot of startups
and developers.
We listen to their feedback
about what they don't like about the product.
So I really do think we're sort of on the side
of creating a lot of innovation
just in the work we do directly
and then also sort of promoting this broader ecosystem.
And I do think we want to be realistic
about what we think some of the challenges ahead of us are.
And I think that's been the case for us
even before we ever shipped a product, right?
We were very sort of vocal from day one
that we felt there was this incredible potential.
We really want to be able to have the entire world
realize the potential, the positive benefits,
the upside that can come from AI.
And in order to do that,
we just have to get the tough things right, right?
We have to make the risks manageable.
And I think that's why we talk about it so much.
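For developers, the access route mentioned above is the Claude API. A minimal sketch using the anthropic Python SDK follows; the model id is illustrative, so check the current documentation for available models.

```python
# Minimal sketch of calling Claude through the API product mentioned above,
# using the anthropic Python SDK. The model id is illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model id; check current docs
    max_tokens=512,
    messages=[{"role": "user", "content": "Review this function for bugs."}],
)
print(message.content[0].text)
```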
Right, well, the bet that you made,
and I think it's an honorable one,
is that if one company took safety really seriously,
then the other companies would follow.
You call this the race to the top.
And I guess I'm wondering if you're concerned
because of the administration's de-emphasis of safety,
which they've been explicit about,
whether the race to the top strategy is in jeopardy.
You know, it's interesting.
I think race to the top has sort of a lot of different
connotations and kind of different ways
that it can be played out.
And I think one thing we'll say
is that a lot of it is really almost
just a kind of capital markets process, right?
So a lot of what we've seen is
Anthropic primarily builds for businesses.
And some of the biggest enterprise companies
in the world rely on Claude.
And about 300,000 businesses use Anthropic products.
And a lot of what we hear from them is
your models are great.
That's often the reason they choose us.
But the other reason that we're choosing to work with you
is your commitment to safety, right?
No enterprise or startup or business
that we have worked with has ever come to me and said like,
we'd really like a less safe product.
Like that would be great.
Like, could the car crash more, right?
I would really love it if the dummy
was just not doing as well in these safety tests.
And so I think that's sort of a form of race to the top
in our mind, because we're sort of setting,
you can almost think of them as like these
like minimum safety standards
just by what we are putting into the economy, right?
People are now building many workflows,
many day-to-day tooling tasks around AI.
And they're like, well, we know that over here,
this product doesn't hallucinate as much.
It doesn't produce harmful content.
It doesn't do all of these sort of bad things.
Why would you go with a competitor
that is gonna score lower on that, right?
And I think also the sort of other angle
is just from an employee perspective, right?
People care about this stuff.
Employees care about making sure
that their work does not accidentally hurt people.
And so I think this is why we feel
the race to the top is still quite effective.
You know, I imagine you'd say that that's why,
you know, you've had much more success retaining talent
than some of the other companies.
So I think it's really interesting.
You know, on the sort of talent side,
I think something we found is that
having this very clear mission,
these sort of principles and values,
this story around, you know,
why are we doing what we're doing?
I think that is something that matters to people.
And I think my sense, although every employee is different,
is that most people come to Anthropic for the mission,
right?
There's a lot of really impressive AI technology companies
that are doing incredible things.
And like, why would you choose to come to Anthropic
versus one of those other companies?
And I think the story that we hear
from people that kind of come in the door
is there's something about the mission and the values
and this desire to be honest about both the good and the bad
and the desire to help to make the sort of bad things better
that is very, it's genuine, like we mean it.
And for people who really mean it too,
it feels like home, right?
It feels like a good place to work.
And I think that that is a more powerful force
than people give it credit for, right?
There's sort of this cynical view
that's like, well, employees are just gonna go
wherever you throw some money at them.
I think people have an inherent sense
of wanting to do the right thing,
and that there's a desire to work somewhere
where you're at least trying to do that, right?
Do we do it perfectly?
I'm sure we don't.
But I think the genuine instinct
to want to do this well, do it thoughtfully,
I think that's motivating for people.
So you recently made an interesting deal,
a three-way deal with Microsoft and NVIDIA.
I think you committed to spend $30 billion
in the Azure cloud and to use NVIDIA technology.
And they each made investments in your company.
And people pointed out, they've been talking already
about what they call the circular deals,
and said it was sort of a bubblicious kind of sign.
I have to apologize to the audience.
We had a Hollywood director on previously
and didn't ask the guest about the bubble.
So I have to, so I gotta ask you,
do you feel that, maybe not from this particular deal,
but that you're participating in a field where,
because so much investment is going in,
so many billions are being thrown around
for data centers and compute and talent and other things,
and valuations are skyrocketing,
there is a bubble that could threaten
what you're trying to do?
So I can obviously only speak
for our view at Anthropic,
but what I would say is, throughout the company's history,
you have to sort of, first of all, set this context
that it is very capital intensive
to train these models.
And I think we've always been very public about that.
It's a sort of well-known fact in the industry.
It's just very expensive to pay for the hardware
that is needed to train these models
at the scale that we're talking about
to make them so sort of transformatively powerful
and smart and good at so many different tasks.
And I would say Anthropic has always kind of managed
to be more conservative within, again,
that sort of very grand realm, in how we use our resources.
And so we have managed to sort of stay at the frontier
and sometimes be the, I would say,
the sort of strongest model provider in a lot of cases
with a fraction of the resources
of a lot of our competitors.
And I think that it's something we think about a lot
is like how do we sort of take the investments that we have
and really use them carefully?
And I think when you're sort of imagining
that sort of projected into the future,
that is how we're thinking about using our own capital
and our own resources.
Well, I've heard that before
that you're getting more efficient
in building your models there.
But your brother actually was one of the people
behind the whole theory of scale being the key
to making these models effective.
Do you see, as some people are saying,
that scale is leveling off
and there's gonna be other techniques required
to get to the AGI that you and many others wanna accomplish?
Yeah, I'm smiling a little bit
because I feel like we're sort of asked this question
like periodically people are like,
well, is it sort of leveling off?
And I think what we can say is again,
just based on what we're seeing,
the models are continuing to get smarter
at the exact sort of curve
that the sort of scaling laws talk about.
The revenue is kind of continuing on that same curve.
And as any of the scientists
that work at Anthropic would tell you,
like everything continues going on the curve
until it doesn't.
And so we really try to be self-aware
and humble about that and say like,
of course, things could change at any point.
And I think even us,
even the people who sort of set out this scaling law
that says there's this predictable relationship
between compute and intelligence,
and who have sort of set out a revenue scaling law, right,
that says, this is gonna provide value
in the economy, businesses are gonna wanna use it,
even we have been kind of astounded
by the degree to which just the specificity
of that line has continued.
We certainly have not been hearing,
wow, these models are getting dumber,
or that they're not getting better
at the same rate that they were before.
And you can never know, right?
It's always possible that this could level off at any point.
It's not what we've seen so far.
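For context, the predictable relationship between compute and intelligence she refers to is usually written as a power law. A toy sketch with invented coefficients, purely to show the shape of the curve:

```python
# Toy illustration of the scaling-law shape described above: training loss
# modeled as a power law in compute, L(C) = a * C^(-b). The coefficients
# here are invented for illustration, not measured values.
def predicted_loss(compute_flops: float, a: float = 10.0, b: float = 0.05) -> float:
    """Loss falls smoothly and predictably as training compute grows."""
    return a * compute_flops ** -b

for c in (1e21, 1e23, 1e25):  # illustrative training-compute budgets in FLOPs
    print(f"compute = {c:.0e} FLOPs -> predicted loss = {predicted_loss(c):.3f}")
```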
From the outside, it looks like every few weeks,
you know, one company or another comes up
with a new model or an update to the model.
And they say, okay, look at that.
We've got all the benchmarks, we're ahead.
The models, again, from the outside look like
they all do pretty much the same thing, you know?
But I think probably you will tell me
that Claude is differentiated from the others.
How do you see Claude as different from,
and ideally preferable to, the other models?
Somebody recently uncovered something,
it was called a soul overview, in Claude.
They were, you know, able to get Claude to, you know,
cough up, you know, its soul overview.
I don't even know what a soul overview is.
Maybe you could tell me.
Yes, so, you know, I mean, you're absolutely right
that this is an incredibly competitive space, right?
There's a few, you know, foundation model companies
that are training these models
in this sort of very competitive dynamic
to try and have the best one.
I would say a couple of things here.
I think the first is benchmarks are part of the story.
And even when we are leading on the benchmarks,
we're usually the first to say benchmarks
tell part of the story, right?
These are, you know, sets of evaluations,
some designed well before the LLM revolution,
that look at, like, how good is Claude
or a competitor at, you know,
this level of, you know, mathematics,
or like a math Olympiad problem.
And I think that's really important,
but it's sort of not the only metric that matters.
The second is that there's a lot of like
the vibe of the model.
Like if you sort of read X,
or you hear people write about these models,
it's a little hard to put your finger on,
but you'll often hear, like,
well, this one scored kind of higher on this dimension,
but that's not my sort of subjective experience
when I'm using the model.
The other thing I would say is that
there's been this kind of discourse
around, like, the best model,
and the top of the benchmarks keeps changing,
but I think that the industry itself has evolved
and there are now models that are best
for certain use cases.
And so I think in some cases you might say,
oh, these models sort of feel comparable
either on the benchmarks or in the vibes or both,
but there's some areas where they're really not.
And I think Claude has very consistently been
at the top of the kind of developer stack.
So Claude is really, really good at writing code.
Claude is very good at architecting code.
I think this has sort of been one of these compounding cases
where because we're good at it, we learn more about it.
And so we get better at doing it faster.
And I suspect that we're gonna see more breakout cases
like that where certain models are going to just be better
for particular tasks.
And at the same time, in general,
when the model is better and smarter
and scores better on things, it will be better.
On the soul question, I think we take seriously this idea
that the character of the model
and the way that it responds to you
and the types of guardrails that we put in place
matter for people in a day-to-day way.
And so the other thing we hear
outside of the sort of specific capabilities of the model
on a given task is, Claude just feels warmer, right?
Claude sort of feels more like a kind of friendly,
but a little bit distant,
sort of, you know, friendly professional relationship.
It's not trying to be, like, your AI mistress
or something, and it's not trying to sort of be-
You're not going into adult content.
No, and we're not trying to be this sort of weird,
like, you know, distant,
mechanical-sounding voice.
That actually takes a lot of work to do, right?
There's researchers that spend time thinking about
how do you make this product sort of warm,
but not addictive, right?
How do you make it feel like it's something that is useful
and provides value, but is professional?
And so I think that's an area where I like to think
Anthropic has been a leader.
One final question, you know, getting back to jobs.
What I've noticed at Anthropic is how Claude
is so deeply integrated into the workflow there.
Now, on the other hand, you've been hiring a lot.
You know, you've got like over 2000 employees now,
up from like 200, like two years ago or something.
Something like that, yeah.
Do these 2000 employees do the work of 10,000 employees
because you're doing that?
What's been your quick, you know,
observation about how it's changed the workplace,
in a way that we could see
how our own workplaces will be changed?
Yeah, so I think what's really interesting is
we actually published an interesting report on this
in something called the Economic Index
where we're looking at how people in the workforce
are using Claude.
Like, is it replacing labor?
Is it complementing labor?
And we found that it's a mixture,
but at least today it's mostly complementing.
And I would say that that has been fairly borne out
by our experience just at Anthropic, the company.
And I think specifically, probably unsurprisingly,
Claude Code, you know,
one of the products that we sell,
was actually an internal product first.
Engineers were noticing, wow, Claude is really, really good
at generating code.
Could I use this to help make my work better?
And so far what we've seen is that
that kind of organically happens
in different parts of the company.
And the degree to which there are just creative,
thoughtful applications of Claude,
I mean, it's really incredible.
There's teams in HR that are like,
can Claude help me just coach this person
to give feedback more nicely?
It's like the amount of work that we do
is so varied and different in any organization.
Anthropic is no different.
And so the ability to sort of watch in real time,
we're like, wow, this is so useful for us.
Is this something we should sell someday?
Is this something that maybe another company
would kind of derive value from?
And I do think that that is part of our leverage.
I think our ability to have our internal employees
use Claude the same way that our customers do
does allow us to just do more.
Well, thank you so much. Thank you.
I appreciated that.
[audience applauding]
We're gonna take a short break now.
Feel free to check out the science fair.
Very cool.
You know, there's some interesting exhibits in the gallery.
We're selling some merch.
And come back here in a short period and we'll see you then.
Thank you.
Thanks, Steven.
That was so fun.
[bright music]
[upbeat music]