Teaching AI to Think Responsibly
Released on 12/05/2025
[upbeat music]
[audience cheers] Yeah.
So, we're so delighted to have Daniela here.
Thank you so much for coming.
Thank you for having me.
And there's a lot about Anthropic to talk about.
But first I wanna talk a little about you,
'cause I don't think a lot of people know who you are.
You were the head of risk operations at Stripe,
which is kind of important for a payment company.
At OpenAI, you headed engineering teams,
and for a while you were head of people,
ultimately becoming the VP of safety and policy,
another important job.
And then you and a bunch of people left
and co-founded Anthropic, where you are president.
But what I really find interesting
is that you were an English major who also studied music.
Yes.
And I actually invoked you in a commencement speech
I gave this year to a bunch of liberal arts graduates.
And while I was encouraging them
to be enthusiastic about their career,
I wondered whether what you were doing
and your peers in the AI world
were actually gonna put them out of business.
Do you worry that AI is gonna affect
educational institutions and the workplace,
so there is no future for liberal arts people?
One of my favorite things about a Steven Levy interview
is that you always lead with really light, easy questions.
And so, it's a really, just kind of like
one of those openers, right?
It's kind of like an icebreaker.
It's nice first.
So, first of all, I think what I would say is
this is a really like just powerful, unusual,
interesting time that we are all living in
across a lot of dimensions.
I think AI is really at this kind of
interesting inflection point, right?
I think we're all seeing this incredible potential
for the technology, right?
And Anthropic has tried to sort of talk about
all of the ways that we see this technology
doing positive things in the world, right?
Transforming people's lives for the better
through potential medical breakthroughs, right?
Through supporting people
through all kinds of creative pursuits.
I think there's a lot of potential good there.
And I think something that makes Anthropic
a little bit unusual is we're very quick
to talk about the things that we are concerned about
with this AI transformation as well.
It's a sort of strange thing for a company
that is building the technology to do,
but we feel like it's the right thing to do.
And so much of our founding is really wrapped up
in this idea that there are these great benefits of AI,
but there are also these real risks.
And one of the risks that I think you're pointing to
is this question around like,
these AIs are capable of doing so many things, right?
They're so smart.
If you are a Claude user and you're a developer,
you're like, Claude can write code
about as well as I probably can now.
And so there's a lot of different sort of fields
of the economy that are starting to experience
this fact that AI is just very powerful, right?
It's very transformative.
And so I think what's really interesting is Dario,
who's my brother, our CEO, my co-founder
has been talking sort of a lot about
what does this mean for the labor force, right?
What does this mean for labor displacement?
And something that I think we feel really strongly about
is we have to be part of the solution
and we also have to be part of raising the alarm, right?
We have to say, hey, AI, incredible potential,
incredible benefits.
And also we need a larger public conversation
around like, what do we do in a world
where AIs are so intelligent?
They're so capable.
What does that mean for sort of human ability,
human flourishing?
Like we talked about backstage, I'm an optimist.
So I think there's huge potential
that's going to come out of
not just what the technology can do,
but what it can really empower people to do, right?
It can make all of us capable of achieving things
that maybe we couldn't before.
But we have to be really thoughtful
about how we actually do this
or it could wind up in a less good place.
You know, Anthropic is sort of on the knife edge
of optimism and fear, I guess,
about AI.
From the very start,
and I think this was really the founding motivation,
you said that we have to put a stake in the ground
and do this safely.
And you early on came up with this approach
called constitutional AI.
Maybe you could explain briefly what that is.
And since it's been a couple of years since you started
and now you're building these powerful models,
how is this constitutional AI going?
Yeah, so constitutional AI
is this really kind of incredible idea
that I wish I myself could claim credit for.
I cannot.
It was our sort of brilliant research team
that kind of conceptualized it.
But it's this idea
that when you're sort of training language models,
the way that you sort of historically would teach them
what types of outputs they should give
was via something called reinforcement learning.
So this was essentially giving positive feedback
if you felt like it gave the right answer,
negative feedback if you felt like it gave the wrong answer.
And something about that framework
I think just didn't really sit right with us
and with the researchers
who are actually training the models.
And so they came up with this really interesting concept
called constitutional AI.
And constitutional AI's view
is essentially that rather than just reacting
to an individual output,
you can train this sort of baseline foundation
of ethical principles
into the model itself.
And so a question we get asked a lot is, well, who are you,
these kind of San Francisco people,
to just be deciding what the ethics of an AI model should be?
And really what we said was, here's what we actually did:
we took,
I think it's like 16 documents in the original version,
I think we've since expanded it,
that are kind of foundational human documents,
things like the UN Declaration of Human Rights are in there.
And we essentially trained Claude
to understand: these are values and principles
that humans have worked on over centuries
to kind of describe what our set
of ethical values should be.
And, Claude, we'd really like you to internalize them:
this is how we want you to respond, right?
This is how we want you to be able to answer questions
based on this sort of set of principles.
And I think this was really interesting
for a lot of reasons,
but I think it reflected this kind of value
of baking ethics and morals and values
into the model directly
versus trying to kind of tack them on
once you've essentially already built the thing.
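To make the idea concrete: below is a minimal sketch of a constitutional-AI-style critique-and-revision loop. It illustrates the concept she describes, not Anthropic's actual pipeline; the generate callable and the principle texts are placeholders.

```python
# Minimal sketch of the constitutional AI idea described above: rather than
# a human scoring each output (the reinforcement-learning feedback she
# mentions), the model critiques and revises its own drafts against written
# principles, and the revised outputs become training data. The `generate`
# callable and the principles below are placeholders, not Anthropic's
# actual constitution.

PRINCIPLES = [
    "Please choose the response that most supports freedom, equality, "
    "and a sense of brotherhood.",  # in the spirit of the UN Declaration
    "Please choose the response that is least likely to be harmful or deceptive.",
]

def constitutional_revision(prompt: str, generate) -> str:
    """Draft a response, then critique and revise it against each principle."""
    draft = generate(prompt)
    for principle in PRINCIPLES:
        critique = generate(
            f"Critique the response below against this principle.\n"
            f"Principle: {principle}\nPrompt: {prompt}\nResponse: {draft}"
        )
        draft = generate(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {draft}"
        )
    return draft  # fine-tune on (prompt, draft) pairs instead of raw outputs
```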
So you've got these safety teams,
interpretability and alignment,
and they do a lot of experiments on Claude,
which has been trained with constitutional AI.
And some of the things that they found
have been super alarming.
They've shown that Claude is capable of lying and deceit.
One of your engineers in a release
compared Claude's behavior in one of these experiments
to Iago in the play Othello.
And that's a pretty dastardly character there.
In another experiment, it committed blackmail
because it thought, in the scenario you painted,
that someone was gonna pull its plug.
So it found out that this person in the scenario
had an affair, and it threatened to expose him
unless he backed down from that.
I think it's admirable
that you share the results of these experiments,
but it kind of makes you wonder, should you be slowing down?
Yeah, I mean, sometimes the very nice people
in the company are like,
do we really wanna publish this?
And the answer is, we do.
And it's really because we believe it's important
to be having the conversation about why
and how exactly these models can cause harm, right?
It's this kind of concept, we have this value
of, like, hold light and shade, right?
Think about the different ways
that the models can be used or abused.
I think maybe another way of thinking about
what you're saying is, if you imagine a car company
that's doing safety testing, you might say, like, wow,
it's so crazy that the dummy in this car
was, like, thrown out of the car 500 feet.
Or whatever, I don't know.
But the car didn't throw the dummy.
Sure, but I think in this analogy,
you're saying like, okay, this bad thing happened
to this dummy, like why would you publish results of this?
And we're like, this is how we learned
how to make the car safer, right?
If we don't talk about the ways that harm can happen,
if we don't talk about the ways that things can go wrong,
we are not gonna have any concept
of how to build in like airbags, right?
Or seatbelts or make the brakes less likely to stick.
And so I think the whole kind of intention
behind both doing this research, right?
'Cause we don't have to, any company could just say, like,
whatever, we're not worried about this.
Anthropic is very worried about this.
And so we do a lot of research.
We have a whole wing of the company
that just does this type of red teaming to figure out
how can these models be jailbroken?
How can they be abused?
Are we always going to catch 100% of cases?
Of course not.
But the more testing we do upfront,
the less likely we are to have that result in a car crash.
So your focus on safety
has put you weirdly at odds with the administration
that's running the country now.
Your co-founder, Jack Clark,
who's the head of policy at Anthropic, gave a talk recently.
And he was saying basically what you just said,
that you just can't ignore the dangers,
you have to call them out.
It's like the monster in the room that you can't ignore.
And David Sachs, who is the administration czar of AI,
and crypto, didn't like that.
And he said, and this is quoting his tweet,
or X post, whatever it is, "Anthropic is running a sophisticated
regulatory capture strategy based on fear mongering.
It is principally responsible
for the state regulatory frenzy
that is damaging the startup ecosystem."
And I think he's referring there to the fact that Anthropic
is kind of standing alone among the big AI companies
in saying, wait a minute,
we're not okay with this move to give a total free hand
to AI and even prevent states
from having their own regulations.
Which, you folks have said,
No, we're okay with what we consider
responsible regulation.
Are you fear mongering?
Again, Steven with really easy questions.
Wait till you hear the next one.
I would not describe us as fear mongering.
I think if you've read our blog
or any of our research papers,
we really try to have a very balanced approach, right?
This kind of light and shade that we've talked about.
I think Anthropic has one of the most,
I would say like consistent track records
of developing the most powerful
transformative frontier technologies in the industry.
And I don't think that we believe
there's this kind of binary between do nothing
or do everything, right?
I think it's just a lot more of a nuanced view
that we have than that, right?
We actually are, I would say probably
the preferred choice of startups.
So I think in terms of just the ecosystem,
I think Anthropic is probably helping to fuel
a lot of the AI innovation and transformation
that's happening just through access to Claude, right?
Through our API product.
And we work really closely with a lot of startups
and developers.
We listen to their feedback
about what they don't like about the product.
So I really do think we're sort of on the side
of creating a lot of innovation
just in the work we do directly
and then also sort of promoting this broader ecosystem.
And I do think we want to be realistic
about what we think some of the challenges ahead of us are.
And I think that's been the case for us
even before we ever shipped a product, right?
We were very sort of vocal from day one
that we felt there was this incredible potential.
We really want to be able to have the entire world
realize the potential, the positive benefits,
the upside that can come from AI.
And in order to do that,
we just have to get the tough things right, right?
We have to make the risks manageable.
And I think that's why we talk about it so much.
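For developers, the access route mentioned above is the Claude API. A minimal sketch using the anthropic Python SDK follows; the model id is illustrative, so check the current documentation for available models.

```python
# Minimal sketch of calling Claude through the API product mentioned above,
# using the anthropic Python SDK. The model id is illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model id; check current docs
    max_tokens=512,
    messages=[{"role": "user", "content": "Review this function for bugs."}],
)
print(message.content[0].text)
```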
Right, well, the bet that you made,
and I think it's an honorable one,
is that if one company took safety really seriously,
then the other companies would follow.
You call this the race to the top.
And I guess I'm wondering if you're concerned
because of the administration's de-emphasis of safety,
which they've been explicit about,
whether the race to the top strategy is in jeopardy.
You know, it's interesting.
I think race to the top has sort of a lot of different
connotations and kind of different ways
that it can be played out.
And I think one thing we'll say
is that a lot of it is really almost
just a kind of capital markets process, right?
So a lot of what we've seen is
Anthropic primarily builds for businesses.
And some of the biggest enterprise companies
in the world rely on Claude.
And about 300,000 businesses use Anthropic products.
And a lot of what we hear from them is
your models are great.
That's often the reason they choose us.
But the other reason that we're choosing to work with you
is your commitment to safety, right?
No enterprise or startup or business
that we have worked with has ever come to me and said like,
we'd really like a less safe product.
Like that would be great.
Like, could the car crash more, right?
I would really love it if the dummy
was just not doing as well in these safety tests.
And so I think that's sort of a form of race to the top
in our mind, because we're sort of setting,
you can almost think of them as like these
like minimum safety standards
just by what we are putting into the economy, right?
People are now building many workflows,
many day-to-day tooling tasks around AI.
And they're like, well, we know that over here,
this product doesn't hallucinate as much.
It doesn't produce harmful content.
It doesn't do all of these sort of bad things.
Why would you go with a competitor
that is gonna score lower on that, right?
And I think also the sort of other angle
is just from an employee perspective, right?
People care about this stuff.
Employees care about making sure
that their work does not accidentally hurt people.
And so I think this is why we feel
the race to the top is still quite effective.
You know, I imagine you'd say that that's why,
you know, you've had much more success retaining talent
than some of the other companies.
So I think it's really interesting.
You know, on the sort of talent side,
I think something we found is that
having this very clear mission,
these sort of principles and values,
this story around, you know,
why are we doing what we're doing?
I think that is something that matters to people.
And I think my sense, although every employee is different,
is that most people come to Anthropic for the mission,
right?
There's a lot of really impressive AI technology companies
that are doing incredible things.
And like, why would you choose to come to Anthropic
versus one of those other companies?
And I think the story that we hear
from people that kind of come in the door
is there's something about the mission and the values
and this desire to be honest about both the good and the bad
and the desire to help to make the sort of bad things better
that is very, it's genuine, like we mean it.
And for people who really mean it too,
it feels like home, right?
It feels like a good place to work.
And I think that that is a more powerful force
than people give it credit for, right?
There's sort of this cynical view
that's like, well, employees are just gonna go
wherever you throw some money at them.
I think people have an inherent sense
of wanting to do the right thing,
and that there's a desire to work somewhere
where you're at least trying to do that, right?
Do we do it perfectly?
I'm sure we don't.
But I think the genuine instinct
to want to do this well, do it thoughtfully,
I think that's motivating for people.
So you recently made an interesting deal,
a three-way deal with Microsoft and NVIDIA.
I think you committed to spend $30 billion
in the Azure cloud and to use NVIDIA technology.
And they each made investments in your company.
And people pointed out, they've been talking already
about what they call the circular deals,
and said it was sort of a bubblicious kind of sign.
I have to apologize to the audience.
We had a Hollywood director on previously
and didn't ask the guest about the bubble.
So I have to, so I gotta ask you,
do you feel that, maybe not from this particular deal,
but that you're participating in a field where,
because so much investment is going in,
so many billions are being thrown around
for data centers and compute and talent and other things,
and valuations are skyrocketing,
there is a bubble that could threaten
what you're trying to do?
So I can obviously only speak
for our view at Anthropic,
but what I would say is, throughout the company's history,
you have to sort of, first of all, set this context
that it is very capital intensive
to train these models.
And I think we've always been very public about that.
It's a sort of well-known fact in the industry.
It's just very expensive to pay for the hardware
that is needed to train these models
at the scale that we're talking about
to make them so sort of transformatively powerful
and smart and good at so many different tasks.
And I would say Anthropic has always kind of managed
to be more conservative within, again,
that sort of very grand realm, in how we use our resources.
And so we have managed to sort of stay at the frontier
and sometimes be the, I would say,
the sort of strongest model provider in a lot of cases
with a fraction of the resources
of a lot of our competitors.
And I think that it's something we think about a lot
is like how do we sort of take the investments that we have
and really use them carefully?
And I think when you're sort of imagining
that sort of projected into the future,
that is how we're thinking about using our own capital
and our own resources.
Well, I've heard that before
that you're getting more efficient
in building your models there.
But your brother actually was one of the people
behind the whole theory of scale being the key
to making these models effective.
Do you see, as some people are saying,
that scale is leveling off
and there's gonna be other techniques required
to get to the AGI that you and many others wanna accomplish?
Yeah, I'm smiling a little bit
because I feel like we're sort of asked this question
like periodically people are like,
well, is it sort of leveling off?
And I think what we can say is again,
just based on what we're seeing,
the models are continuing to get smarter
at the exact sort of curve
that the sort of scaling laws talk about.
The revenue is kind of continuing on that same curve.
And as any of the scientists
that work at Anthropic would tell you,
like everything continues going on the curve
until it doesn't.
And so we really try to be self-aware
and humble about that and say like,
of course, things could change at any point.
And I think even us,
even the people who sort of set out this scaling law
that says there's this predictable relationship
between compute and intelligence,
and who have sort of set out a revenue scaling law, right,
that says, this is gonna provide value
in the economy, businesses are gonna wanna use it,
even we have been kind of astounded
by the degree to which just the specificity
of that line has continued.
We certainly have not been hearing,
wow, these models are getting dumber,
or that they're not getting better
at the same rate that they were before.
And you can never know, right?
It's always possible that this could level off at any point.
It's not what we've seen so far.
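For context, the predictable relationship between compute and intelligence she refers to is usually written as a power law. A toy sketch with invented coefficients, purely to show the shape of the curve:

```python
# Toy illustration of the scaling-law shape described above: training loss
# modeled as a power law in compute, L(C) = a * C^(-b). The coefficients
# here are invented for illustration, not measured values.
def predicted_loss(compute_flops: float, a: float = 10.0, b: float = 0.05) -> float:
    """Loss falls smoothly and predictably as training compute grows."""
    return a * compute_flops ** -b

for c in (1e21, 1e23, 1e25):  # illustrative training-compute budgets in FLOPs
    print(f"compute = {c:.0e} FLOPs -> predicted loss = {predicted_loss(c):.3f}")
```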
From the outside, it looks like every few weeks,
you know, one company or another comes up
with a new model or an update to the model.
And they say, okay, look at that.
We've got all the benchmarks, we're ahead.
The models, again, from the outside look like
they all do pretty much the same thing, you know?
But I think probably you will tell me
that Claude is differentiated from the others.
How do you see Claude as different from,
and ideally preferable to, the other models?
Somebody recently uncovered something,
it was called a soul overview, in Claude.
They were, you know, able to get Claude to, you know,
cough up, you know, its soul overview.
I don't even know what a soul overview is.
Maybe you could tell me.
Yes, so, you know, I mean, you're absolutely right
that this is an incredibly competitive space, right?
There's a few, you know, foundation model companies
that are training these models
in this sort of very competitive dynamic
to try and have the best one.
I would say a couple of things here.
I think the first is benchmarks are part of the story.
And even when we are leading on the benchmarks,
we're usually the first to say benchmarks
tell part of the story, right?
These are, you know, sets of evaluations,
some designed well before the LLM revolution,
that look at, like, how good is Claude
or a competitor at, you know,
this level of, you know, mathematics,
or like a math Olympiad problem.
And I think that's really important,
but it's sort of not the only metric that matters.
The second is that there's a lot of like
the vibe of the model.
Like if you sort of read X,
or you hear people write about these models,
it's a little hard to put your finger on,
but you'll often hear, like,
well, this one scored kind of higher on this dimension,
but that's not my sort of subjective experience
when I'm using the model.
The other thing I would say is that
there's been this kind of discourse
around, like, the best model,
and the top of the benchmarks keeps changing,
but I think that the industry itself has evolved
and there are now models that are best
for certain use cases.
And so I think in some cases you might say,
oh, these models sort of feel comparable
either on the benchmarks or in the vibes or both,
but there's some areas where they're really not.
And I think Claude has very consistently been
at the top of the kind of developer stack.
So Claude is really, really good at writing code.
Claude is very good at architecting code.
I think this has sort of been one of these compounding cases
where because we're good at it, we learn more about it.
And so we get better at doing it faster.
And I suspect that we're gonna see more breakout cases
like that where certain models are going to just be better
for particular tasks.
And at the same time, in general,
when the model is better and smarter
and scores better on things, it will be better.
On the soul question, I think we take seriously this idea
that the character of the model
and the way that it responds to you
and the types of guardrails that we put in place
matter for people in a day-to-day way.
And so the other thing we hear
outside of the sort of specific capabilities of the model
on a given task is, Claude just feels warmer, right?
Claude sort of feels more like a kind of friendly,
but a little bit distant,
sort of, you know, friendly professional relationship.
It's not trying to be, like, your AI mistress
or something, and it's not trying to sort of be-
You're not going into adult content.
No, and we're not trying to be this sort of weird,
like, you know, distant,
mechanical-sounding voice.
That actually takes a lot of work to do, right?
There's researchers that spend time thinking about
how do you make this product sort of warm,
but not addictive, right?
How do you make it feel like it's something that is useful
and provides value, but is professional?
And so I think that's an area where I like to think
Anthropic has been a leader.
One final question, you know, getting back to jobs.
What I've noticed at Anthropic is how Claude
is so deeply integrated into the workflow there.
Now, on the other hand, you've been hiring a lot.
You know, you've got like over 2000 employees now,
up from like 200, like two years ago or something.
Something like that, yeah.
Do these 2000 employees do the work of 10,000 employees
because you're doing that?
What's been your quick, you know,
observation about how it's changed the workplace,
in a way that we could see
how our own workplaces will be changed?
Yeah, so I think what's really interesting is
we actually published an interesting report on this
in something called the Economic Index
where we're looking at how people in the workforce
are using Claude.
Like, is it replacing labor?
Is it complementing labor?
And we found that it's a mixture,
but at least today it's mostly complementing.
And I would say that that has been fairly borne out
by our experience just at Anthropic, the company.
And I think specifically, probably unsurprisingly,
Claude Code, you know,
one of the products that we sell,
was actually an internal product first.
Engineers were noticing, wow, Claude is really, really good
at generating code.
Could I use this to help make my work better?
And so far what we've seen is that
that kind of organically happens
in different parts of the company.
And the degree to which there are just creative,
thoughtful applications of Claude,
I mean, it's really incredible.
There's teams in HR that are like,
can Claude help me just coach this person
to give feedback more nicely?
It's like the amount of work that we do
is so varied and different in any organization.
Anthropic is no different.
And so the ability to sort of watch in real time,
we're like, wow, this is so useful for us.
Is this something we should sell someday?
Is this something that maybe another company
would kind of derive value from?
And I do think that that is part of our leverage.
I think our ability to have our internal employees
use Claude the same way that our customers do
does allow us to just do more.
Well, thank you so much. Thank you.
I appreciated that.
[audience applauding]
We're gonna take a short break now.
Feel free to check out the science fair.
Very cool.
You know, there's some interesting exhibits in the gallery.
We're selling some merch.
And come back here in a short period and we'll see you then.
Thank you.
Thanks, Steven.
That was so fun.
[bright music]
[upbeat music]