explainlikeimfive

ELI5 Why does everyone use AWS, and what actually happens when it goes down?

Every time there's an AWS outage, half the internet seems to go offline. Why is there such a heavy dependence on it, and can anything be done to reduce that?

https://www.reddit.com/r/explainlikeimfive/comments/1lugjb8/eli5_why_does_everyone_use_aws_and_what_actually/
Reddit

Discussion

GalFisk

When the internet was small, you could put up a website on your home computer, and it could handle the dozens to hundreds of visits it'd get. If your website got very popular, you bought a dedicated server to run it, and a fast internet connection. As the web grew, businesses sprang up that would do this for you. Very popular websites got several servers around the world to spread traffic and mitigate delays and outages. Nowadays this is really big business, and companies like Amazon will host websites for millions of customers in their huge server parks around the globe.

Which works great until there's an outage that brings them all down. Such systems have lots of redundancy, so it very rarely happens, but it's very hard to make a system with no single point of failure. It can be quite interesting to read the post-failure analyses of such events, as it's often a chain of errors that leads to the ultimate downtime.

1 day ago
afurtivesquirrel

It can be quite interesting to read the post-failure analyses of such events, as it's often a chain of errors that leads to the ultimate downtime.

I work closely with guys who work in tech resilience and it's fascinating.

It's also amazing how many things fail and the answer is "we....didn't even know we had that shit in our stack."

1 day ago
Ahielia

It's also amazing how many things fail and the answer is "we....didn't even know we had that shit in our stack."

Or the similar "this computer does nothing, let's turn it off....1 minute later oh fuck wait why is this mission critical production line not working any more?!"

1 day ago
M-Noremac

"this computer does nothing, let's turn it off....1 minute later oh fuck wait why is this mission critical production line not working any more?!"

"Welp, let's just turn it back on and pretend we were never here... time for lunch break?"

1 day ago
OIlberger

Don’t they call that “the scream test”?

11 hours ago
MilkIlluminati

Yep. The only way to identify if there are any users of some old piece of garbage is to get rid of it and see who complains. If you just email blast everyone asking the question, you'll get like a 30% reply rate of "no", with nobody actually thinking about the question.

8 hours ago
Existential_Racoon

Worse: you decommission a legacy DC VM for a random domain you don't own. No documentation, no one knows what it is or what it does, etc.

Turns out the Finance dept uses it once a year to run critical reporting software, on some legacy app that was allegedly replaced years ago and integrated into your current setup.

8 hours ago
Ahielia

Oh yes, that is definitely lots worse. And if you're lucky then you have a backup or the actual hardware server that was used and you can boot it up to finish the task and then migrate properly, if you don't... Well you gotta learn how to code the legacy app real quick

7 hours ago
MilkIlluminati

why is this mission critical production line not working any more?!"

(There's a forgotten single-line .bat script that moves one file that gets called over the network on that machine by some other script)

8 hours ago
ary31415

the answer is "we....didn't even know we had that shit in our stack."

https://xkcd.com/2347/

1 day ago
trueppp

More real than we would like to think. There are more than a couple such libraries and programs.

1 day ago
zvekl

ImageMagick?

1 day ago
ary31415

Some open source image manipulation tool that's used in a lot of other open source things

1 day ago
zvekl

I know, just didn't think it was as critical as gd

1 day ago
ary31415

Eh I think he was being a little facetious

1 day ago
Whaty0urname

Sounds like NASA tbh but if AWS goes down I have to imagine it costs even more than a space mission at this point.

1 day ago
B1LLZFAN

Well, considering it would cost about $50B to land on the moon and AWS made $107B in revenue in 2024, I think it's safe to say they are on par with each other.

1 day ago
dan_dares

Whisper to them 'Chaos monkey' and watch them sweat

12 hours ago
dalittle

Netflix has a system called Chaos Monkey that intentionally breaks things in their live system constantly (which seems nuts) to make sure they never go down.

https://blog.codinghorror.com/working-with-the-chaos-monkey/

As for AWS, I'll add that it is way cheaper for a company to use it (or another service) than to buy all the hardware and run it. And you can scale resources up and down to save money. We built a system and then ran 40+ servers to load data for months. Once all the data was loaded, we just turned all those servers off. We also found out we needed bigger database servers, which we changed with a config file in an hour. If we had bought all that hardware, we would have been stuck with servers we no longer needed, and it probably would have taken months to upgrade the database servers.

1 day ago
Internet-of-cruft

Cloud is great when you have ephemeral or even fluid hardware needs.

Most companies have a very poor grasp on what they need and end up with oversized hardware that they don't take advantage of discounted reservations on.

Moving your servers up to the cloud looks cheap, but with poorly forklifted infrastructure (read: the overwhelming majority of what's out there), it can significantly outpace the cost of running on-premises.

If you're thinking cloud native (like your organization is), this doesn't apply. Most companies don't think this way.

19 hours ago
Lurcher99

Five nines isn't cheap.

1 day ago
fang_xianfu

Even Google only provides three to four nines for most services.

1 day ago
Squossifrage

I'm working on a startup right now that will provide services MUCH cheaper than any of the existing cloud providers, with the caveat that we only guarantee one nine. And by "one nine" I mean 9%, not 90%.

"MAINTENANCE NOTICE: Servers in your cluster will be pulled down for maintenance at 2:00AM CDT on Sunday, July 12, 2025. We expect the outage to only last until approximately 4:00 CST on Sunday, November 30th, 2025. Should unanticipated issues require additional downtime, you will be notified via certified mail."

1 day ago
falconzord

If it's cheap enough, people will use it for non-critical workloads.

1 day ago
FaisalCyber

Yeah, but in that case it must beat the EC2 spot price, which is already much cheaper and has little downtime (only a couple of minutes at most) if you configure it to auto-heal.

1 day ago
falconzord

I don't think they give any guarantees for spot availability

1 day ago
[deleted]

[deleted]

1 day ago
Beetin

3 nines is 99.9% uptime, which comes out to about 1 work day of outages per year.

4 nines is 99.99% uptime, so a bit less than an hour of downtime per year.

5 nines is 99.999% uptime, which is about 5 minutes of downtime a year.

3 nines is very workable for online services (you can still have 1-2 fairly significant outages a year).
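A quick way to check the nines arithmetic in this subthread is to multiply the unavailability (10^-n for n nines) by the number of seconds in a year. A minimal sketch, assuming a 365-day year; the function names are just for illustration:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year, ignores leap years


def allowed_downtime_seconds(nines: int) -> float:
    """Downtime per year permitted by an availability of `nines` nines (3 -> 99.9%)."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines -> 0.001
    return SECONDS_PER_YEAR * unavailability


def format_duration(seconds: float) -> str:
    """Render a duration in the largest convenient unit."""
    if seconds >= 3600:
        return f"{seconds / 3600:.1f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.1f} minutes"
    return f"{seconds:.1f} seconds"


for n in range(3, 8):
    print(f"{n} nines: {format_duration(allowed_downtime_seconds(n))} per year")
```

This gives about 8.8 hours for three nines, 52.6 minutes for four, 5.3 minutes for five and 31.5 seconds for six, in line with the corrections further down the thread.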

1 day ago
slups

Thanks, I had no idea what the hell they meant by nines until your comment!

1 day ago
trueppp

3 nines can lose you a small fortune depending on the company. Ex: if a store chain's POS system is down for a day, we are talking hundreds of thousands to millions in lost sales.

1 day ago
zurkog

3 nines is about 5 minutes per year

Closer to 9 hours. Five nines (99.999%) is just over 5 minutes per year. I unfortunately have to know these things as part of my job. :-(

Six and seven nines is where you get into seconds-per-year downtime.

1 day ago
StoneyBolonied

What kind of industry requires 6 or 7 nines?

1 day ago
zurkog

What kind of industry requires 6 or 7 nines?

Finance (stock market, and credit card processing), SCADA for a nuclear reactor, air traffic control and I'm guessing national security / defense systems although I've never worked on those directly.

21 CFR 820 doesn't list uptime in "nines" but it shoots for the equivalent of between six and nine nines. You don't want your pacemaker going down for a system update.

1 day ago
Insight42

Exactly.

Shit where being down even 5 minutes is potentially catastrophic will require that.

22 hours ago
gSTrS8XRwqIV5AUh4hwI

When the internet was small, you could put up a website on your home computer, and it could handle the dozens to hundreds of visits it'd get.

You can still do this just fine. Much better than in the past, in fact. Gigabit fiber is a common thing to have at home nowadays, some places even have 10 Gb/s fiber at reasonable prices (like, reasonable for normal home use), and you can trivially serve millions to hundreds of millions of visitors on such a connection if you aren't serving tons of video content.
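To put rough numbers on the bandwidth side of that claim, here is a back-of-the-envelope sketch. It assumes a hypothetical 1 MB page and counts bandwidth only, ignoring protocol overhead, CPU limits, and the fact that traffic peaks rather than arriving evenly:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def page_views_per_day(link_gbps: float, page_size_mb: float, utilization: float = 1.0) -> float:
    """Bandwidth-only upper bound on page views per day for a given uplink.

    `utilization` lets you leave headroom instead of assuming a saturated link.
    """
    bytes_per_second = link_gbps * 1e9 / 8 * utilization  # Gb/s -> bytes/s
    return bytes_per_second * SECONDS_PER_DAY / (page_size_mb * 1e6)


# Hypothetical 1 MB page (HTML, CSS and images combined)
for gbps in (1, 10):
    print(f"{gbps} Gb/s: ~{page_views_per_day(gbps, 1.0) / 1e6:.1f} million page views/day")
```

That works out to roughly 10.8 million page views a day on gigabit and 108 million on 10 Gb/s at full utilization; heavier pages or video shrink those numbers quickly.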

1 day ago
rocketmonkee

Don't most home Internet plans have caveats that forbid commercial use of their home plans? And while business plans do exist, I'm curious how many people are going to run a website from a home server that can handle hundreds of millions of visitors. While the connection itself might be 10 Gbps, a site that popular running on a PC in the closet is just asking for problems with security, downtime, resiliency, etc.

1 day ago
zxyzyxz

Hacker News runs on a single server, serving hundreds of millions of requests a month. This person, well known in the AI space, serves 200 million requests a month. It's viable, just not a lot of people do it. The vast majority of startups are not getting anywhere near this scale.
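For a sense of scale, a monthly request count converts to a fairly modest per-second rate. A minimal sketch, assuming a 30-day month; the 10x peak-to-average ratio is a made-up illustration, not a measured figure:

```python
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 60 * 60


def average_rps(requests_per_month: float) -> float:
    """Average requests per second, assuming a 30-day month."""
    return requests_per_month / SECONDS_PER_30_DAY_MONTH


avg = average_rps(200_000_000)
peak = avg * 10  # hypothetical peak-to-average ratio of 10x
print(f"average ~{avg:.0f} req/s, assumed peak ~{peak:.0f} req/s")
```

200 million requests a month averages out to roughly 77 requests per second, which is well within reach of a single well-tuned server.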

1 day ago
trueppp

Hacker News looks like a 1996 website. It's the only reason they can serve that many requests.

2 hours ago
zxyzyxz

That was just an example, there are lots of sites that run on a single server, https://nomads.com/ for example that doesn't look like it's from the 90s.

2 hours ago
gSTrS8XRwqIV5AUh4hwI

Don't most home Internet plans have caveats that forbid commercial use of their home plans?

Such clauses would be unenforceable here in Germany (or rather, the EU), because of net neutrality rules.

Also, websites don't need to be commercial.

And while business plans do exist, I'm curious how many people are going to run a website from a home server that can handle hundreds of millions of visitors.

Well, probably not many, but also, that wasn't really the point I was trying to make. The point is that the vast majority of websites have far fewer than a hundred million visitors in a day, and therefore, a lot of them could be run from a home internet connection just fine.

So, if you are Reddit ... maybe don't move things to a home internet connection. But if you, say, run a small-ish hobby forum that has a few thousand regular visitors, there really is no reason why you couldn't run that on a home internet connection. You need to be pretty big before that becomes the limiting factor.

1 day ago
smokingcrater

You sure about that? Very first German ISP I checked very explicitly says the operation of a server is prohibited. (Under section 5 customer responsibilities.) (Had to rely on Google translate, my German is.... rusty.)

It also explicitly says no commercial usage, and they closely monitor and will automatically bill you at the commercial rate if they suspect it.

So to your point, even running a home non-commercial web server is against the terms of service and could get you a warning and eventually get you booted.

Your ISP might be different, but not likely. Check the fine print.

https://www.wilhelm-tel.de/agb#c398

23 hours ago
GNUr000t

Home hosting has been better than ever with Fibre internet. The reason people can't host without a rack of servers isn't the network, it's the bloated horseshit that modern webapps run on.

I have a static website that gets thousands of hits a day and for a brief period last year got thousands of hits an hour. The $4/month VPS it was on didn't even reach 5% sustained CPU utilization.

11 hours ago
JRDruchii

but it's very hard to make a system with no single point of failure.

That's part of what makes the internet so fascinating to me: just how much infrastructure would need to be compromised for the entire thing to go offline.

1 day ago
Internet-of-cruft

There's an important part missing here: Big companies like Amazon are designed so there are many regions in the world and locations in each region where you can have stuff hosted for you.

Most of the major outages are because one location, or even one region, goes offline (power utility fails, someone/something cut the fiber lines providing Internet to the facilities, etc.)

When that fails, everyone running all their stuff in that one location (or region) goes down.

How do you fix it? By spending lots more money to have more copies of your stuff in other regions in the world.

That's expensive, and most companies using Amazon Web Services choose to not do so as a cost saving measure.

20 hours ago
jericon

Several servers is a vast understatement.

Many companies run hundreds of thousands of servers.

1 day ago
atbths

Yes, but that was a really solid ELI5, simplified but providing enough detail to answer the question.

1 day ago
philmarcracken

Wasn't there a stir back around the first DDoS attacks, when Amazon's network suffered one and actually managed to fulfill all the requests, so the attack kinda 'failed'? That might have been quite the unintended marketing.

10 hours ago
True_to_you

Because it's up most of the time, and the capital to start up such an endeavor is out of reach for 99.9 percent of companies. Data centers require years of planning and development to scale up to the size of something like AWS. You have to build extremely large buildings, be able to cool thousands of servers, have the infrastructure to support your operations, and have the workforce to roll all this out, develop the software to run it, and keep it secure. All of this while it doesn't make you money for years. It's really hard to deploy.

1 day ago
invisible_handjob

It's far more typical for companies that don't want to use cloud resources to rack up hardware in other people's datacenters than it is to build their own.

But it's still difficult, because you now have a whole lot more to maintain: you need to buy hardware and depreciate it over its lifespan, and if you grow faster than expected or proverbially get on the front page of reddit or whatever, it takes months to get new hardware, not a couple seconds to spin up new AWS resources.

1 day ago
tejanaqkilica

This. At my company we rent space in other people's datacenters and put our hardware inside. It's good because it's cheaper than AWS, but it's very inflexible, so for stuff that needs to scale up and down very quickly we use cloud resources (Azure instead of AWS, but still).

1 day ago
jericon

Lots of companies will even set up their own cloud, using stuff like OpenStack, hosted in other people's data centers.

1 day ago
trueppp

You're still limited in scaling. Which is one of the big advantages of AWS. Adding compute is trivial.

1 day ago
2called_chaos

I'd wager the vast majority of projects running on AWS would never run into actual scaling issues on more traditional setups. It's very good for very dynamic workloads however (such as game server instances). But I'm positive I could handle all requests of the ASUS client with a single machine (obviously a bad idea for redundancy and latency but I'm talking workload)

1 day ago
trueppp

It's scale up AND down. For example, we have a retail chain as an MSP client. Their Point of Sale software (Odoo) was self-hosted, and they would have huge slowdowns during peak seasons (e.g. Black Friday and Boxing Day). It was cheaper for them to shift to AWS hosting, as they could scale down outside of opening hours and scale up dynamically during peak times.

Uptime was also dramatically improved over self hosting. AWS ended up way cheaper than either colo, or getting their datacenter up to snuff for peak loads.
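The "scale up AND down" argument comes down to paying for peak capacity all month versus paying only for what each period needs. A minimal sketch with entirely hypothetical hours, instance counts and hourly rate; it deliberately uses the same hourly rate for both models, which is exactly the assumption other commenters in this thread dispute:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Period:
    hours: float    # hours per month spent at this load level
    instances: int  # instances needed to serve that load


def fixed_capacity_cost(periods: List[Period], hourly_rate: float) -> float:
    """Cost of keeping peak capacity running for the whole month."""
    peak = max(p.instances for p in periods)
    return peak * sum(p.hours for p in periods) * hourly_rate


def scheduled_cost(periods: List[Period], hourly_rate: float) -> float:
    """Cost when capacity follows the load (scale down overnight, up for peaks)."""
    return sum(p.instances * p.hours * hourly_rate for p in periods)


# Hypothetical 730-hour month for a retail POS backend
month = [
    Period(hours=390, instances=1),  # stores closed
    Period(hours=300, instances=4),  # normal opening hours
    Period(hours=40, instances=12),  # peak days, e.g. Black Friday
]
RATE = 0.20  # hypothetical $ per instance-hour, same for both models
print(f"fixed: ${fixed_capacity_cost(month, RATE):.2f}, scheduled: ${scheduled_cost(month, RATE):.2f}")
```

With these made-up numbers the peak-sized setup costs about four times as much, because for most of the month it is running capacity nobody is using.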

1 day ago
[deleted]

[removed]

1 day ago
trueppp

Everything has its place. On-Prem, Colo, Cloud compute etc. You just need to evaluate your needs and choose the correct solution for your budget/needs.

A lot of companies didn't do their transition to the cloud correctly and simply migrated their on-prem VMs to the cloud with no changes.

For example, running a VM as a Domain Controller or a SQL Server instance makes no sense. It's way more expensive than using the native cloud counterparts.

1 day ago
mithoron

If you own the compute instead of renting it there's no benefit to scaling down.

1 day ago
trueppp

Of course, but you still need to size the compute for maximum load with all related costs (Admin, power, disaster recovery etc).

Just having geographically separated compute gets expensive fast.

1 day ago
mithoron

Admin and DR don't really scale with compute, they scale with complexity, and power scales up and down with use naturally. Compute is just hardware and that's one of the cheaper pieces of a datacenter. Yeah, you could save money if you were able to buy cheaper servers, but it's not an amount that's going to mean the life or death of a company except in the narrowest of edge cases.

Geographic spread is more of a sales point for cloud in my mind than the mythical scaling monster people always talk about.

1 day ago
aplarsen

I'm a small-time solopreneur with a couple thousand clients, and I was surprised at how quickly I needed autoscaling to handle dynamic workloads. I went from standing up a single box in EC2 and running a Python script on it to containerizing workloads with ECS and Fargate in a couple of months. It was a wild time.

20 hours ago
Floppie7th

OpenStack, Kubernetes, OpenShift, etc. are all great for distributing workloads, especially workloads that need dynamic scaling, but they don't help you scale up your overall total compute (or storage, etc.) when you need to physically acquire and rack machines.

1 day ago
harmar21

Yup, we do the same thing. We have a few racks at two different datacenters. However, at the end of the month we do A LOT of processing, and we had tons of hardware just to handle those 10 hours or so of processing, while it sat idle the rest of the time. So we moved that processing to the cloud, kept the rest on prem, and saved a ton of money on colo and hardware costs.
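The reason the burst was worth moving is utilization: hardware sized for a roughly 10-hour monthly batch is idle almost all of the time. A minimal sketch, assuming an average month of 730 hours:

```python
HOURS_PER_MONTH = 8760 / 12  # 730 h in an average month


def utilization(busy_hours: float, hours: float = HOURS_PER_MONTH) -> float:
    """Fraction of the month that peak-sized hardware is actually busy."""
    return busy_hours / hours


u = utilization(10)  # the ~10-hour end-of-month batch described above
print(f"busy {u:.1%} of the month, idle {1 - u:.1%}")
```

Owned hardware that is busy about 1.4% of the time is a natural candidate for renting by the hour, while the steady baseline stays on prem.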

1 day ago
seifyk

I think this is the most common approach for medium-sized companies. We do this in a state-scale health system.

1 day ago
ProkopiyKozlowski

One advantage of cloud services over internal hosting is predictability of costs.

If you own your hardware you have to manage a lot of budgeting tasks related to it - procurement, repair/replacement and upgrades. Failures are unpredictable, so you must earmark a certain amount of funds for it. If you use a cloud service on the other hand - that's just a single recurring payment that is extremely easy to budget for. And if you need to expand your business - you just switch to a more expensive plan that suits your needs, no headache.

1 day ago
VoilaVoilaWashington

But also, if you need to shrink your business you can downgrade. Maybe not like "oh revenue is down 30% cut 30% of server space", but "we switched to a new service provider which actually means we're not hosting this data ourselves in the same way anymore." If you have your own data centre, not much you can do. AWS? It's a phone call.

1 day ago
Mezutelni

Unless you forget about that one Lambda function that somehow gave you a bill with seven zeros.

1 day ago
Canaduck1

If you're a big company, you already had hardware/datacenter costing down to a science.

We've still got our datacenters (and for the time being, that probably won't change; mainframe is still a thing), but we have been moving a lot of stuff to AWS and Azure. It's been a costing nightmare.

(both in terms of predictability and total cost.)

1 day ago
VexingRaven

One advantage of cloud services over internal hosting is predictability of costs.

Huh?? Where in the heck did you come up with this idea? Internal hosting is the most predictable cost there is. You have XYZ hardware, it's going to cost you the same amount of electricity every month. You pay the same people the same cost every month to maintain them. You pay the same amount of rent every month for your rack space. There's no source of unpredictability.

Cloud services by their very nature can vary wildly in cost from month to month.

1 day ago
VicisSubsisto

No source of unpredictability? You sound like my company's design engineers. "This part does not need to be accessible for service or field-replaceable. It will not break."

1 day ago
gSTrS8XRwqIV5AUh4hwI

Yeah, the big advantage of spending 10 times as much as you'd spend with your own hardware so that you can avoid the uncertainty of not knowing whether you might need to spend half of your AWS bill rather than a tenth of your AWS bill next month because a server suddenly broke.

Because obviously, it would be really hard to budget the same amount and then have 80% of it left over at the end of the year.

1 day ago
Squossifrage

Hardware is rarely the most expensive part. Spending the money on AWS instead of your own hardware might reduce the expense and inflexibility of ten $100K admin jobs inside a $2 million building.

Does AWS make sense for replacing a Windows file server for a 10 man office? Almost certainly not.

Does it make sense for 1,000 endpoints across 2 campuses 1,000 miles apart in support of a web-facing product/service? Maybe.

1 day ago
VexingRaven

If you had 10 people and your own datacenter running things, you're certainly at the scale where you're spending millions every month on AWS. I'm not convinced that's going to be cheaper. AWS still needs to hire those same people and build the same datacenter, but now you're paying for their profit margins. It doesn't make sense.

There are reasons to use cloud. Cost is never one of them. I have never seen a cloud implementation that was cheaper than on-prem, and every time I try and run some numbers to come up with one, I fail.

1 day ago
trueppp

But it does not make sense for that 10-man office to be running an on-prem Exchange server or AD server when they can be using O365 and Entra ID.

2 hours ago
gSTrS8XRwqIV5AUh4hwI

Hardware is rarely the most expensive part. Spending the money on AWS instead of your own hardware might reduce the expense and inflexibility of ten $100K admin jobs inside a $2 million building.

It might ... it just doesn't.

1 day ago
TicRoll

This, 1000x over.

For any company in a co-located facility, for God's sake, STAY THERE. AWS will NEVER be cheaper.

And what is it with people thinking you need some massive staff for hardware in a data center? I had 25 racks filled at one point and averaged one data center visit a year, scheduled in advance. And I did always know what the bill was because server costs are fairly constant, server capacity grew continuously, and I could size clusters to minimize licensing costs. And if you're messing with Oracle or Microsoft products, you know that licensing costs dwarf all other costs.

1 day ago
Kraligor

Yep. There's a reason why companies are going back to on prem. Cloud pricing has gone up to a point where on prem is the cheaper option for many use cases.

1 day ago
Polantaris

and if you grow faster than expected or proverbially get on the front page of reddit or whatever, it takes months to get new hardware, not a couple seconds to spin up new AWS resources

This happened to the MMO Final Fantasy XIV. When the Endwalker expansion released, they had such a massive player influx that their servers just...couldn't handle it. They made a bunch of backend changes, but all that did was relieve some pressure during less demanding times; you were still often looking at login queues of >10,000 per server.

It took them nearly 18 months to upgrade all of their distinct data centers (they have three or four).

If it were architected for cloud, horizontal and/or vertical scaling exist to solve this problem with the click of a button. It's not always so simple, but it wouldn't have taken them 18 months.
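Horizontal scaling usually starts from a simple capacity calculation: how many identical instances does the peak need, with some headroom left over. A minimal sketch with hypothetical player counts and per-instance capacity; real game servers hold a lot of state, which is part of why, as noted above, it's not always so simple:

```python
import math


def instances_needed(peak_load: int, capacity_per_instance: int, headroom: float = 0.2) -> int:
    """Identical instances needed to carry `peak_load` while keeping `headroom`
    (a fraction of each instance's capacity) spare for spikes."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = capacity_per_instance * (1 - headroom)
    return math.ceil(peak_load / usable)


# Hypothetical numbers: 50,000 concurrent players, 5,000 players per instance
print(instances_needed(50_000, 5_000))  # 13 with 20% headroom
```

In a cloud setup that number can be raised for launch week and lowered again afterwards; with owned hardware it has to be bought, shipped and racked first.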

1 day ago
sfo2

I worked at a startup where we did both. The CTO was obsessed with having his own data center and supercomputer, so they spent a bunch of money on that.

After several years, the hardware was obsolete, and it would go down sometimes, and didn’t scale, and one of our tech support guys would have to physically go there. By that point, we’d been using AWS for some things, so we slowly just migrated to AWS. There is still a $150k Nvidia supercomputer sitting in a closet somewhere now.

1 day ago
tuckfrump69

On-prem solutions often have as many outages as AWS, or more

and when something goes down you can't just blame amazon and have to spend resources to fix it

1 day ago
Kraligor

On-prem solutions often have as many outages as AWS, or more

Granted, MS isn't AWS, but there are basically ALWAYS issues with M365 and AAD. That wasn't the case with on-prem Exchange and domain servers. Just looking at the service health center now, there are 2 incidents and 15 advisories. And that's normal. That wasn't normal a decade ago. Service quality has gone down the drain with the advent of the cloud craze.

I know, it's a different sort of cloud, and we rarely have issues with our AWS instances, but still. People don't even consider on-prem solutions anymore (well, they're slowly coming back around to it now, thankfully).

1 day ago
that_baddest_dude

My dad worked IT in Dallas and he referred to one of these shared data center things as "the colo", as in colocation

1 day ago
MyClevrUsername

I’m a sysadmin at a small company. This is exactly what we do. We rent a couple racks at a datacenter.

1 day ago
Lyress

deprecate*

1 day ago
invisible_handjob

no, depreciate, the financial term, as in spread the cost over the lifetime. Not deprecate as in retire

19 hours ago
Lyress

TIL

16 hours ago
MrHedgehogMan

On top of that, say you have a cool solution to a problem that you just built and you want to increase capacity.

If you were doing it the old-fashioned way, you'd have to buy more servers, find a home for them, wire them in, and then spend however many hours setting them up.

In AWS if you wanted to scale out your solution you could do it in an afternoon.

Then you find out that you scaled out too far. All those servers you bought? A waste of money. But in AWS you can just wind back what you deployed.

1 day ago
UnsignedRealityCheck

Data centers require years of planning and development to scale up to the size of something like AWS.

Having managed a really small data center, what I can tell you is that scaling up is not hard - it's next to impossible unless you plan for it to be big from the start. If you buy hardware 'just to get rolling' and then have to scale horizontally, you will most likely have to start from scratch. If your switches, storage and the like are planned to host (let's say) 20 blades, with 6 switches for failover and redundant storage, you simply cannot bolt more shit on the side and have it all work together seamlessly.

This is where AWS absolutely dominates. You can just send it and scale to "this is stupid" without much effort.

1 day ago
afurtivesquirrel

Honestly this is even true at a homelab level.

You want one thing, eventually you want two things, and it suddenly needs a full redesign.

AWS literally is just "one more, please"

1 day ago
smb275

eventually you want two things

Don't I fucking know it...

1 day ago
hapnstat

There’s another part to the answer I think is important. A modern web page makes requests to potentially dozens of servers / domains. If any of them in the critical path have issues, your site is gonna hang or be generally hosed. This used to be an even bigger issue when Akamai had a larger piece of the pie.
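The reason dozens of critical-path dependencies hurt is that availabilities multiply: the page only works if every one of them is up. A minimal sketch, assuming a hypothetical 20 dependencies at three nines each and independent failures (real outages are often correlated, so treat this as a rough illustration):

```python
import math
from typing import Iterable


def serial_availability(availabilities: Iterable[float]) -> float:
    """Availability of a page that needs *every* listed dependency to be up.
    Assumes failures are independent."""
    return math.prod(availabilities)


deps = [0.999] * 20  # hypothetical: 20 critical-path services at three nines each
a = serial_availability(deps)
print(f"page availability ~{a:.4f}, ~{(1 - a) * 365:.1f} days down per year")
```

Twenty individually solid three-nines services combine to roughly 98% availability, or about a week of broken page per year, which is why trimming the critical path matters as much as hardening any single piece.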

1 day ago
permalink_save

Most of the time meaning pretty much all the time. AWS uses a four-nines SLA, which means a bit less than an hour of downtime per year before they have to give credits. SLAs typically leave a lot of wiggle room so providers rarely have to refund services, so actual downtime might be closer to minutes a year for them. Meaningful outages are incredibly rare in hosting these days.

1 day ago
VoilaVoilaWashington

And that includes maintenance outages. So you get a notification that on Thursday July 27th at 4 am, it's going down for a half hour. Depending on how critical your data is, you can plan around it or just let your users know.

1 day ago
Halgy

Also, even if you build your own data center, it will also have downtime, and almost certainly more than AWS. You'd just go from having an outage when everyone else does, to having an outage when no one else is.

1 day ago
ir_auditor

Basically, there are 3 large cloud computing companies globally: Amazon AWS, Microsoft Azure, and Google cloud.

If a company wants to run an application, they can run it on their own servers and infrastructure or just rent capacity from one of those 3. Currently, for many use cases, it is much simpler to host things in the cloud than to set up your own infrastructure. The reason is that hosting an application is, in most cases, much more than just hosting an application. You need an app server, a database server, load balancers, backups, firewalls, all kinds of microservices doing things in the background, failovers... That makes the infrastructure complex, which makes a good business case for those cloud providers, as they can provide much of that very effectively and reliably.

But since there are only 3 such companies that seem to dominate the market, if one of them fails, a large part of the world will notice.

1 day ago
Lucky-Elk-1234

It’s also easily scalable. So when your company grows and you need more processing power or data storage, you don’t need to buy a bigger building and more servers. You just log on to AWS and upgrade your plan with them, relatively cheaply and easily.

1 day ago
mslass

Easy? Yes. Cheap? No.

1 day ago
Lucky-Elk-1234

Relatively cheap. As in you don’t need to buy/rent a new building to scale up.

1 day ago
aplarsen

Another thing to add is the services. Oh my, the services. Think about something as beautiful as SQS. You just send it messages, and it handles so much of what is needed to handle the concept of queueing. That's all abstracted out, and it just...works.
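To make "handles so much of what is needed to handle the concept of queueing" concrete, here is a toy in-memory model of one piece of it: a received message is hidden for a visibility timeout and comes back if the consumer never deletes it, giving at-least-once delivery. This is a simplified illustration of the semantics, not the SQS API; the class and method names are hypothetical:

```python
import itertools
import time
from typing import Callable, Dict, Optional, Tuple


class ToyQueue:
    """In-memory toy model of SQS-style delivery semantics, NOT the SQS API.

    A received message becomes invisible for `visibility_timeout` seconds.
    If the consumer doesn't delete it in time, it becomes visible again and
    is redelivered, which gives at-least-once delivery.
    """

    def __init__(self, visibility_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.visibility_timeout = visibility_timeout
        self.clock = clock
        self._messages: Dict[int, Tuple[str, float]] = {}  # id -> (body, visible_at)
        self._ids = itertools.count()

    def send(self, body: str) -> int:
        msg_id = next(self._ids)
        self._messages[msg_id] = (body, float("-inf"))  # visible immediately
        return msg_id

    def receive(self) -> Optional[Tuple[int, str]]:
        now = self.clock()
        for msg_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                # Hide it from other consumers until the timeout expires.
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id: int) -> None:
        """Acknowledge successful processing so the message is never redelivered."""
        self._messages.pop(msg_id, None)
```

A worker calls `receive()`, does its job, then `delete()`; if it crashes halfway, the message simply reappears for another worker. Real SQS adds durability, scaling and extras such as dead-letter queues on top, without you running any of it.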

20 hours ago
samanime

Also, most large companies will use MORE than one of those 3. Usually at least 2 of them, since the odds of two of them having major issues at the same time are very, very slim.

Granted, that is expensive, so many more medium sized companies do not do that.
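The "very, very slim" odds can be sketched as a parallel-redundancy calculation: the service is only down if every copy is down at once. This assumes independent failures and a perfect failover mechanism, both of which flatter the result; shared dependencies or the failover itself breaking will pull the real number down:

```python
def redundant_availability(a: float, copies: int) -> float:
    """Probability that at least one of `copies` deployments is up, given each
    is up with probability `a` and failures are independent."""
    return 1 - (1 - a) ** copies


for n in (1, 2, 3):
    print(n, f"{redundant_availability(0.999, n):.9f}")
```

Under those assumptions, two providers at three nines each give roughly six nines combined, which is why the extra cost can be worth it for large companies.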

10 hours ago
UnkleRinkus

AWS outages are fairly rare and generally fairly contained, and good teams implement strategies to manage the impact of any outages. The company I worked for used AWS. We show 99.98% uptime over the last year, and I don't recall any outages that were due to AWS service unavailability. We switched over to different AZs once IIRC, but there was no customer impact.

1 day ago
Elegant-Magician7322

If you are in 2 AZs and you don’t achieve 99.99% uptime, AWS is supposed to credit money back to you, according to their SLA.
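SLA credits are typically tiered on monthly uptime percentage. A minimal sketch of that calculation; the tiers below are an assumption modelled loosely on AWS's published EC2 region-level SLA, so check the current SLA text before relying on them, and the 30-day month is also an assumption:

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes

# Assumed credit tiers: (minimum monthly uptime %, credit as % of the monthly bill)
CREDIT_TIERS = [(99.99, 0), (99.0, 10), (95.0, 30), (0.0, 100)]


def monthly_uptime_percent(downtime_minutes: float,
                           month_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Uptime for the month as a percentage."""
    return 100 * (1 - downtime_minutes / month_minutes)


def service_credit_percent(uptime: float) -> int:
    """Credit percentage for the first tier whose threshold the uptime meets."""
    for threshold, credit in CREDIT_TIERS:
        if uptime >= threshold:
            return credit
    return 100


for minutes in (1, 10, 500):
    up = monthly_uptime_percent(minutes)
    print(f"{minutes} min down -> {up:.3f}% uptime -> {service_credit_percent(up)}% credit")
```

Note that 99.99% of a 30-day month allows only about 4.3 minutes of downtime, and even a large outage usually refunds a fraction of the bill, not the business you lost.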

1 day ago
SharkBaitDLS

They said their outages weren’t due to AWS infra. 

1 day ago
MacAllansPolsevogn

If you are in 2 AZs,

That is the issue. People assume that they can just magically use AWS and get the uptime and the low prices. The reality is that using AWS correctly is a major skill, which few people have, and doing things correctly also makes it more expensive. Ever seen the "Hosting WordPress on AWS" reference architecture? That's not cheap.

AWS provides you with everything you need to build resilient systems, but it will cost you, and most either won't pay that price or don't know about it. So instead everything runs in the US-EAST-1 region, and then AWS is no more stable than plenty of other hosting providers, except that the damage is that much bigger when US-EAST-1 has an outage.

1 day ago
dekacube

Multi-cloud is also a thing people do.

1 day ago
UnkleRinkus

Amazon.com is just never down, and I'll bet a nice dinner that they aren't replicating to Azure and GCP.

My conservative customers spend lots of money on multi-cloud. It sounds really good at first glance, but my last ten years in the ecosystem haven't convinced me of its value.

1 day ago
fang_xianfu

Amazon.com also has the luxury of things like eventual consistency that aren't suitable for every use case. If the order you "placed" doesn't actually charge your card or get picked for six hours because the backend is fucked, that's a huge logistical challenge for Amazon but doesn't affect you as a user at all.
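To make the eventual-consistency point concrete, here is a toy sketch of a store front that confirms an order immediately and only charges and picks it later, when a background job can reach the backend. `ToyOrderSystem`, `place_order` and `reconcile` are hypothetical names; this illustrates the general pattern and is not a claim about how Amazon.com is actually built.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyOrderSystem:
    """Hypothetical store front: accept orders now, charge and pick them later."""
    backend_up: bool = True
    pending: deque = field(default_factory=deque)
    processed: list = field(default_factory=list)

    def place_order(self, order_id: str) -> str:
        # The customer gets a confirmation even if the backend is down.
        self.pending.append(order_id)
        return f"Order {order_id} placed"

    def reconcile(self) -> int:
        """Background job: drain the backlog only while the backend works."""
        if not self.backend_up:
            return 0
        count = 0
        while self.pending:
            # In a real system this is where the card is charged and the item picked.
            self.processed.append(self.pending.popleft())
            count += 1
        return count
```

While the backend is broken, orders simply pile up in `pending`; the customer only notices if the delay gets long enough to matter, which is exactly why this trick doesn't suit every use case.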

1 day ago
afurtivesquirrel

Depends on your industry.

Five nines is a minimum for us. An hour's unscheduled downtime a year costs us millions and gets us a letter from the regulator.

AWS doesn't even offer five 9s on compute. You'd have to go multi cloud or keep on prem.
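To put those nines in perspective, here is a back-of-the-envelope calculation of how much downtime each availability target allows per year. It assumes a 365.25-day year and ignores details such as measurement periods and exclusions, which real SLA documents define in their own terms; the function names are just illustrative.

```python
# Assumption: a 365.25-day year, measured as one continuous period.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum downtime per year allowed by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_downtime(minutes_down: float) -> float:
    """Availability achieved if you are down `minutes_down` minutes a year."""
    return 1.0 - minutes_down / MINUTES_PER_YEAR


for label, target in [("99.98%", 0.9998), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(target):.1f} min/year")

print(f"one hour down: {availability_from_downtime(60):.5%}")
```

By this arithmetic five nines leaves a budget of only about 5 minutes a year, and a single hour of unscheduled downtime works out to roughly 99.989%.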

1 day ago
dekacube

Yeah, where I work is single cloud with failover AZs as well, never been an issue.

But I think one of the motivations for us moving from ECS to EKS was that it would make multicloud easier.

1 day ago
cbftw

We deploy to 3 AZs in our primary region and have a warm DR deployment in a second region, in the unlikely event that the entire primary region goes down.

1 day ago
rcunn87

We deploy in 3 regions each with 3 AZs and are active-active-active. Within 30 minutes of noticing a problem in a region we can evacuate all traffic from that region to the other two regions. It took years to get to that point and it's hard to build everything in this fashion. We also can evacuate service by service but I feel like that's less interesting than jumping regions.
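The "evacuate all traffic from that region" step can be pictured as rewriting routing weights. Here is a minimal sketch, assuming traffic is split by per-region weights (for example via weighted DNS records or a global load balancer; the commenter doesn't say which mechanism they use). `evacuate` is a hypothetical helper and the region names are the ones mentioned further down the thread.

```python
from __future__ import annotations


def evacuate(weights: dict[str, float], bad_region: str) -> dict[str, float]:
    """Send zero traffic to `bad_region` and spread its share over the
    remaining regions in proportion to their current weights."""
    remaining = {r: w for r, w in weights.items() if r != bad_region}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no healthy region left to take the traffic")
    new_weights = {r: w / total for r, w in remaining.items()}
    new_weights[bad_region] = 0.0
    return new_weights


# Active-active-active: each region normally takes a third of the traffic.
normal = {"us-east-1": 1 / 3, "us-east-2": 1 / 3, "us-west-2": 1 / 3}
after = evacuate(normal, "us-east-1")
for region, share in after.items():
    print(f"{region}: {share:.0%}")
```

Note what the arithmetic implies: each surviving region goes from a third to half of all traffic, i.e. 1.5x its normal load, so this only works if every region keeps that much spare capacity.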

1 day ago
cbftw

Is one of your regions us-east-1? We've never had a service issue that impacted business in us-east-2.

1 day ago
rcunn87

us-east-1, us-east-2, us-west-2

You forgot a 'yet' at the end of that sentence. It will happen, and you can either go down for the day and be okay with that, or have infrastructure/services that can handle traffic migrating quickly. I think most of the time, taking the outage is okay for a lot of companies.

A few Decembers ago there was a region outage in us-east-1, then the following week there was a region outage in us-west-2. We kept taking orders through both, whereas competitor 1 went down in the first week and competitor 2 went down in the second week.

1 day ago
swinging_on_peoria

Amazon.com follows the recommended practices for using AWS, not every AWS customer does that.

1 day ago
pixel_of_moral_decay

You don’t do multi cloud for high availability, you do it for portability.

If you can run on multiple clouds, you can migrate away, which gives you a lot of negotiating power with cloud providers, especially if you're large.

The last thing you really want is lock-in with a single vendor. Just look at what happened to everyone who built their infrastructure around VMware when Broadcom came along. There's no rule that prevents Amazon from deciding at contract renewal to raise the bill 10-20X and just letting people squirm. It's working for Broadcom, and no, the government isn't stepping in.

I'd go as far as to argue that anyone with a single-cloud strategy should have to disclose that to investors at least annually, along with the usual business disclosures. It's that big of a problem.

18 hours ago
mslass

Not many; the best features of each cloud are highly vendor-specific.

1 day ago
MedusasSexyLegHair

Right, and we don't want to spend another year re-optimizing all our systems for a different cloud provider and retweaking all of their configuration and settings.

Let alone end up with some critical part that's only on one cloud and not the other.

Or having to pay two unpredictable cloud bills.

And what's one thing worse than vendor lock-in? Being locked in to multiple vendors.

1 day ago
deathanatos

During a recent GCP outage, Downdetector's page for AWS spiked. This led quite a few commentators to (incorrectly) believe that GCP & AWS were having simultaneous outages. AFAICT, it was just people not realizing that the other things that were down were hosted on GCP, and incorrectly blaming AWS. (Downdetector's "information" is sourced from user reports …) The set of people who thought AWS was having an outage included our GCP rep.

18 hours ago
CanadaNinja

It's a very powerful web hosting service, but it's not the only one. Amazon's web hosting is AWS, but there's also Microsoft's Azure and Google Cloud. Because these are specialized vendors, it's cheaper and more efficient than setting up your own servers and having to manage them yourself.

The main way I know companies mitigate it is by paying for 2 or 3 of the big vendors so if one fails you can have the others still working, and then it's just load balancing and maybe some temporary slowness, rather than service failure.

You can also try to host your own servers, (remember when people were demanding dedicated servers for MW2?) but it's really not worth it these days.

1 day ago
rlt0w

I'm unreasonably triggered by you calling it a web hosting provider. Also, it's probably more common for people to use multiple AWS regions than multiple cloud providers. If they are mixing Azure and AWS, it's generally for AD services from Azure. I rarely see folks using compute resources in Azure.

1 day ago
jericon

I wouldn’t say that cloud is cheaper. Instead… It’s a more predictable cost.

1 day ago
merelyadoptedthedark

Cloud is really expensive. A lot of companies are moving back to on prem to save money.

For my company it's the second biggest cost after payroll.

1 day ago
jubza

It's cheaper for smaller companies; full on-prem can be really expensive.

1 day ago
pixel_of_moral_decay

It depends on your business.

The general rule of thumb: if you have constant workloads, on-prem will be cheaper; for spiky loads, cloud is cheaper.

For most, the best formula is hybrid. Run your baseline on-prem and use the cloud for flex capacity.

You pay a premium for that instant on demand capacity.
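The rule of thumb above can be checked with a toy cost model. Everything here is a hypothetical assumption for illustration: owned capacity costs 1.0 per server-hour whether it is used or not (amortized hardware, power, staff), rented cloud capacity costs 3.0 per server-hour but only while it runs, and the spike is a one-week, Black-Friday-style peak.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760


def yearly_cost(baseline: int, peak: int, peak_hours: int,
                onprem_rate: float, cloud_rate: float) -> dict:
    """Compare three strategies under a deliberately simple cost model."""
    burst = peak - baseline
    # On-prem only: you must own enough servers for the peak, all year long.
    onprem_only = peak * HOURS_PER_YEAR * onprem_rate
    # Cloud only: pay for the baseline all year plus the burst while it lasts.
    cloud_only = (baseline * HOURS_PER_YEAR + burst * peak_hours) * cloud_rate
    # Hybrid: own the baseline, rent only the burst.
    hybrid = baseline * HOURS_PER_YEAR * onprem_rate + burst * peak_hours * cloud_rate
    return {"on-prem only": onprem_only, "cloud only": cloud_only, "hybrid": hybrid}


spiky = yearly_cost(baseline=10, peak=50, peak_hours=7 * 24, onprem_rate=1.0, cloud_rate=3.0)
flat = yearly_cost(baseline=10, peak=10, peak_hours=0, onprem_rate=1.0, cloud_rate=3.0)
for name, costs in [("spiky", spiky), ("flat", flat)]:
    print(name, {k: round(v) for k, v in costs.items()})
```

With these made-up numbers, cloud-only beats on-prem-only for the spiky workload, on-prem wins for the flat one, and hybrid is never worse than either. Real prices, and real staffing costs, can shift the crossover a lot.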

18 hours ago
fang_xianfu

Not just more predictable but smoother and simpler. It stops you having huge capital expenditures when you need to upgrade. When I worked with on-prem Hadoop data lakes there was a 6-9 month lead time for hardware and we would order hundreds of terabytes of RAM-worth of machines at a time. It was a huge financial pain in the ass.

1 day ago
Eokokok

I think this is one of the biggest misconceptions of the Excel-balancing generation of managers out there: what you wrote should be stated as 'it is cheaper OR more efficient than setting up your own servers'.

You cannot have both, and once you start going through the SLAs, you realize that you either pay more in hard cash or in quality of service. But given that outsourcing is great for balancing the books, said vendors grew fast.

Especially considering the fact that most companies, even tech ones, tend to accumulate insane tech debt in their infrastructure regardless. So if you are already in a situation where your underfunded IT barely works, you will gladly push it outside; it can't be worse than what you had...

1 day ago
Swiddt

You are confusing multiple concepts and using technical terminology incorrectly, which makes your comment wrong. In addition to what the others are saying:

Your last sentence about MW2 has nothing to do with the discussion whatsoever. MW2 had peer-to-peer lobbies, which basically means one of the clients was used as the server.

1 day ago
bakerzdosen

This has been answered but I’ll take a crack at it.

Building a datacenter is complex, expensive, and difficult. Managing one is also pretty complex.

Even though once you reach a certain point, it’s almost always more cost effective to build and manage your own “private cloud,” many companies choose not to do it.

One of the main reasons is flexibility.

This is pretty ELI5, but say you run a massive Black Friday special and your site gets hammered for 7 days. You’ve gotta prepare for that and have the infrastructure to support it, otherwise customers will get frustrated and will go elsewhere.

The thing is, if you only need that infrastructure once a year, you’re wasting money by having it just sit there doing nothing 51 weeks out of the year. So, instead, you use AWS and those 51 weeks out of the year you use a small fraction. Then, that one week you ramp up your presence in AWS to accommodate your customer needs. When it’s done, you go back to your small footprint. In that way, AWS can save you money.

But if your needs are pretty flat all year round, it makes more financial sense to have your own datacenter(s). Not all companies have the technical expertise to do that, though, and some don’t want to (or can’t) hire someone who does.

Sometimes it’s a capex vs opex issue. This part isn’t exactly ELI5, but suffice it to say, some CFOs prefer to minimize their capex (capital expenditures: the things you buy and own, like computer equipment) by relying on opex (operational expenditures, like essentially “renting” computers from AWS). There are reasons for doing things both ways, but that accounting preference is another reason to go with AWS.

And lastly, sometimes c-level executives just want to be “buzzword compliant.” They heard AWS was somehow cutting edge or necessary to be… something so the edict comes from the top of the company to move “all in on the cloud.” Unfortunately they don’t usually do a full cost analysis on things before handing down such an edict and end up spending a LOT more than they anticipated.

AWS is great for a lot of things, but it’s not the solution for everything. Most large companies tend to have a more hybrid approach putting things in AWS when it makes sense and keeping them in their own private cloud when that makes more sense as well.

1 day ago
Elegant-Magician7322

Prior to switching to AWS, my previous company maintained its own data centers in different areas.

In order to have the 99.99% uptime to match AWS’s SLA, there needed to be people available both remotely, and physically at the data centers 24/7. There were beds in the data center facilities.

The locations chosen for the data centers had to be well thought out. Besides land cost, they had to be in areas where you don’t have to worry much about natural disasters, such as earthquakes, fires, etc.

It is more cost effective to use AWS (or GCP, Azure, Oracle Cloud, etc). You pay them for the uptime.

It took a few years for the company to move out of its own data centers to AWS, but uptime has been more reliable than before.

1 day ago
jericon

The geographic factor is one reason that Arizona is extremely popular for data centers.

Other main hubs for data centers are located along major Internet backbones, such as Ashburn, Virginia, which is where most of the oceanic trunks enter North America.

1 day ago
jericon

AWS actually started as a way for Amazon to capitalize on their unused server capacity the “rest of the year”.

1 day ago
dos8s

Ok, correct me if I'm wrong, but I believe Amazon actually got into being a cloud provider because of the opposite of what you mentioned: outside of their busy seasons (like Black Friday), their infrastructure sat underutilized, so they decided to "rent" that unused capacity.

1 day ago
bakerzdosen

I don’t know if that’s true or just a rumor (I’ve assumed it is true but I can’t back it up with actual data.)

My point was less about “THIS is how you save money by using AWS” and more about “there are some use cases out there where you can save money by using AWS.”

But usually (as in the industry-wide generally accepted number) it costs about 3x more than running your own.

18 hours ago
fang_xianfu

You have some good answers.

I work for a modern cloud-first bank so our dependency on the cloud vendors is really important to us. Our bank has to keep functioning in all kinds of nightmare scenarios so people can keep getting paid, buy food, get around, etc.

Our solution to this is extremely simple to describe but very hard to do: we made copies of all the core functions of the bank, one that runs in Google Cloud and one that runs in Microsoft Cloud, as well as AWS. If the main version ever fails, we can temporarily move a limited subset of our services over to another company's cloud, with very limited impact on customers. This system actually got tested a couple of weeks ago when there was a Google Cloud outage for a few hours, and it worked great.

This was a huge project that took a massive amount of planning and work to pull off though, it's not something most businesses would do. Outages of that scale are rare and temporary and most businesses are ok with simply not running during the outage, that's a risk they accept. We can't do that because we breach certain regulations and compliance requirements if we aren't available for customers to use our services 24/7, so it was worth spending the time to do this.

I previously worked in the video games industry, and we worked very similarly: we ran an online game similar to Counter-Strike, and when a big patch or something was coming along that would bring a bunch of people back to the game, we would prepare to expand server capacity into several different public clouds to distribute the work as smoothly as possible.

1 day ago
Zesher_

It's very versatile and they have a ton of data centers. There are other good options, but it's become a standard for a lot of companies.

I used to work at Amazon, and on my second day AWS went down, it was interesting following all the internal conversations on how they were dealing with it. Then my friends joked that I did something to take it down.

1 day ago
[deleted]

[removed]

1 day ago
explainlikeimfive-ModTeam

Please read this entire message


Your comment has been removed for the following reason(s):

  • Top level comments (i.e. comments that are direct replies to the main thread) are reserved for explanations to the OP or follow up on topic questions (Rule 3).

Off-topic discussion is not allowed at the top level at all, and discouraged elsewhere in the thread.


If you would like this removal reviewed, please read the detailed rules first. If you believe it was removed erroneously, explain why using this form and we will review your submission.

1 day ago
WarPenguin1

AWS stands for Amazon Web Services. The reason AWS is so popular has to do with why it was created in the first place.

Amazon is an online retailer. That means there are times when they get way more traffic than normal. Traditional retail hires temporary workers for these times.

Amazon needs a large number of servers for a limited amount of time. These servers make Amazon a lot of money for a small investment, but they are not needed all of the time. Why not rent out these servers when Amazon doesn't need them?

Amazon can do the work of building a large number of servers, and they are then willing to rent them out for a relatively small amount of money. This is the reason AWS is so popular.

1 day ago
Karatekk2

Amazon Web Services provide many of the services that the web uses to function. Servers, db, auth, etc. Google and Microsoft offer alternatives.

1 day ago
aerothorn

All websites need servers. Once upon a time, this would be a single computer, dedicated just to hosting the website. As websites got bigger and more data intensive, they needed multiple computers: this got expensive.

Then virtualization came along, which was a way that one physical computer could "divvy up" its resources to act as many different servers in one, each hosting different sites or services.
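The "divvy up" idea can be sketched as a placement problem: each physical host has a fixed number of CPUs and amount of RAM, and virtual machines are carved out of it until it is full. This toy uses simple first-fit placement chosen purely for illustration; `Host` and `place` are hypothetical names, and real cloud schedulers are far more sophisticated (and may overcommit CPU, which this sketch does not).

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """One physical machine whose CPUs and RAM get divided among VMs."""
    name: str
    cpus: int
    ram_gb: int
    vms: list = field(default_factory=list)  # entries: (vm_name, cpus, ram_gb)

    def free(self) -> tuple:
        used_cpus = sum(c for _, c, _ in self.vms)
        used_ram = sum(r for _, _, r in self.vms)
        return self.cpus - used_cpus, self.ram_gb - used_ram


def place(hosts: list, vm_name: str, cpus: int, ram_gb: int):
    """First-fit: put the VM on the first host with enough spare CPU and RAM."""
    for host in hosts:
        free_cpus, free_ram = host.free()
        if cpus <= free_cpus and ram_gb <= free_ram:
            host.vms.append((vm_name, cpus, ram_gb))
            return host.name
    return None  # no room anywhere: time to buy (or rent) another machine


hosts = [Host("rack1-a", cpus=16, ram_gb=64), Host("rack1-b", cpus=16, ram_gb=64)]
for i in range(5):
    print(f"vm{i} ->", place(hosts, f"vm{i}", cpus=8, ram_gb=32))
```

Two 16-CPU machines hold four 8-CPU virtual servers here, and the fifth has nowhere to go. At AWS scale the same arithmetic happens across enormous fleets, which is part of what makes it cheap.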

AWS built virtualization at a massive scale, with massive data centers all over the world. This scale made it both cheap and relatively reliable (the redundancy of multiple data centers).

The reason everyone uses AWS is that everything else is more expensive, and you don't get better results or performance. And at this point, it's also like IBM of yore: nobody ever got fired for choosing AWS.

How to reduce the single point of failure is a bit too complex for ELI5, but people would need a reason to use something other than Amazon (or Azure).

1 day ago
jericon

AWS started as a way for Amazon to sell their extra server capacity when they weren’t using it.

Then it became the largest source of their profit. Honestly, AWS brings in more operating profit for Amazon than their e-commerce business does.

1 day ago
bert93

Everything else is not more expensive.

In fact racking up your own hardware in a DC can be much, much cheaper.

https://world.hey.com/dhh/the-big-cloud-exit-faq-20274010

The big cloud providers are actually quite expensive. The reasons they've become so popular are scalability and the many services that bolt on top for better management and deployment.

Another issue is that if you go down the route of having your own hardware, you need staff to manage it and you need to keep that internal knowledge readily available among staff through churn.

1 day ago
aerothorn

I am factoring in the expense of scale, making/buying services, and staffing. This is an ELI5 answer, not an in-depth exploration of the topic.

1 day ago
gSTrS8XRwqIV5AUh4hwI

The reason everyone uses AWS is that everything else is more expensive

Haha ... what?

1 day ago
Kian-Tremayne

Lots of companies (not absolutely everyone) use AWS because that way they’re renting Amazon’s computers to run their software instead of having to buy and run their own. It’s the same as renting a couple of floors in an office building instead of having your own building and all of the headaches involved in maintaining it.

As for reducing the impact - Amazon do a lot of that already by making AWS resilient, which is the word for “keeps working when part of it fails”. However, the customers need to build their software on AWS to take advantage of that resilience, and if they’re really paranoid they could build things so they also use one of AWS’ competitors or their own systems in parallel. However, the more effort you put into making stuff resilient the more complicated and expensive it gets. So even people who do design for resilience (and not everyone does) can only take it so far. Welcome to the world of design, where everything is a trade off.

1 day ago
Mayoday_Im_in_love

As an aside I am very impressed by what is free to hobbyists. These are real storage devices, processors, memory, internet connections using real resources like electricity and bandwidth fees.

Oracle are more than happy to give me three virtual machines for nothing with no apparent end date. There are also very generous database services built on top of these, again for free. GitHub and Cloudflare even make static websites free and accessible.

I appreciate it's a freemium model and there is money to be made at an enterprise level but hobbyists have never had it so good.

1 day ago
TbonerT

Everyone depends on it because it was built to handle the absolute crush of Cyber Monday, and later Black Friday. If it can handle those, it has significant spare capacity at all other times.

1 day ago
TornadoFS

There are plenty of server-hosting solutions out there, what AWS offers that others don't is a full suite of services on top of it. Things like authentication, different types of databases, telemetry, monitoring, etc, etc.

The only companies that come close to the range of services AWS has are Google Cloud and Microsoft Azure. GCP and Azure are better in some of their services and worse in others, so it is not really as simple as saying AWS is better, either.

1 day ago
permalink_save

Real answer is, well, not everyone uses it, but it's the same problem regardless of provider. I can't speak for them, but it should be similar to us. An outage doesn't mean literally the whole thing is down. That, um, really can't happen. An outage can mean the control plane is down, which means you can't provision new servers and such, but it won't interrupt running services. An outage could mean something like a network oopsie, but that would only affect the one MZR (regional datacenter). Everything is split up enough that most of it can just keep going. Also, larger customers use multiple MZRs, and some use multiple providers.

Okay, when I said it really can't happen, it kind of can, and it has for us. Our backbone fucked up, bad, which basically cut us off from the internet upstream. I don't remember how broad the impact was (like whether it affected our services outside of the US), but it was the largest outage I have seen yet.

If you want to see what a catastrophic mass outage looks like, you should be asking what happens when Cloudflare has an outage, which does happen.

1 day ago
Themris

"In the cloud computing market, Amazon Web Services (AWS) holds the largest market share, followed by Microsoft Azure and Google Cloud Platform (GCP). Specifically, AWS leads with 30-33% market share, Azure holds 20-23%, and Google Cloud has around 10-12%"

Just worth pointing out that there is healthy competition.

1 day ago
Miliean

WAAAAAY back in the day, in the 90s, as Bezos was building Amazon, every internet company had to have their own servers running their service. Since he was running an internet service, that meant that he had to own servers.

One of the genius things about Bezos is that he has a tendency to look at things his company is already doing and figure out how he can resell those things to other companies. That's why small companies can list their own products on Amazon, then use Amazon's warehouse and fulfillment processes. So it's Amazon front to back; they just don't own the inventory.

AWS is basically doing the same thing but with servers. Amazon got big early and that meant that they had to be able to build and maintain data centers. But once you have a few datacenters, why not just add more and expand. So he did, and started selling space in those data centers to other people. That service eventually became AWS.

Today if you want to be an internet company, you don't need to own any servers at all (and people mostly don't). AWS provides that "service" to you. They rent you servers, in their datacenter for your internet company. You do with them what you will (within the rules) and they charge you a monthly fee.

They are very good at this, and as a result most internet companies use their services.

1 day ago
AV1869

When you watch a YouTube video, that video isn’t stored on your laptop; it’s stored remotely, on a server. When you click on it in your web browser, the browser makes a request to stream it to you, and the server sends little bits of it at a time over the internet. Since there are hundreds of millions of YouTube videos and websites and whatnot, these all have to be stored somewhere. That’s where cloud providers like AWS come in. They take care of the business of hosting the video, which entails storing it and being able to provide you access to it when you want. AWS is one of the big providers for these servers, and there are many others like Google Cloud, Oracle, and Azure.

Sometimes things can go wrong: the server is down, or some service that it depends on is down, etc. As others have said, it’s pretty rare for this to happen, but it does happen sometimes.

The example of streaming a YouTube video seems simple, but the question of why we need these service providers really comes up at scale. What about when you’re watching a Twitch stream? In this case, the streamer has to upload their content to the server, and the server has to distribute it to thousands of viewers simultaneously. That requires a lot of effort; imagine trying to individually send a text message to thousands of people. The server handles all of this for Twitch, so in theory all Twitch has to do as a website is upload the streamer’s video feed to the server when they go live, and deliver that same content from the server when a viewer clicks on their stream. It would require a lot of effort and money for Twitch to develop their own service that does this, so they just hire a cloud provider such as AWS to do it. It’s kind of like how, when you order something from a small online store, they use a shipping company such as UPS to get it to you instead of developing an entire freight network of their own. There’s a lot more to that process, but that’s the gist of it.

1 day ago
f0gax

When you want to put a website up for people to use, you need to put it on a computer that is connected to the Internet.

You can do that one of two ways (broadly): either buy your own computer and connect it to the Internet or put it on someone else's computer that's already connected to the Internet.

The second option is what we typically call "the cloud" these days. A number of companies operate clouds that people and businesses can subscribe to. Amazon has AWS, Microsoft has Azure, and Google's is called GCP (Google Cloud Platform). There are also others of varying sizes, but those are three of the larger operations.

Early on in the history of cloud, the number of players was smaller. AWS was one of those. So organizations that wanted to be "in the cloud" would have started there. Because of that, there is a lot of knowledge around how to use the platform. As well as the platform itself being mature and feature-rich.

So you end up with a high number of organizations that have placed some, most, or all of their public-facing online presence in AWS. Thus, if AWS has an outage, those other orgs have an outage.

1 day ago
rlt0w

Those that go offline are those that didn't engineer their service for proper redundancy in AWS. A region can have issues (us-east-1 especially) but there are multiple regions each with multiple availability zones. If they've engineered their service correctly, it could still be served in any of the availability zones.

1 day ago
drlongtrl

For an individual person, AWS outages, just like Azure or GCloud outages, are a big deal because just so much is down at once.

For the companies using the service, though, their uptimes are actually insanely good, MUCH better than what they or any small local business would be able to deliver.

1 day ago
JCS3

Think of the internet as roads and the websites and services you visit as buildings on those roads.

Because Amazon wants to be the destination that people go to when they need to buy something, Amazon has spent a lot of money building large buildings on the internet with large roads going to them. They also haven’t just built one building; they have built hundreds around the world so that everyone can quickly and easily get to one of their buildings.

Amazon realized that in addition to selling things on the internet, there might be other businesses that wanted to operate on the internet and have the same large and widespread network that they had built, so Amazon made the decision to offer web services (AWS). Essentially renting out space in one or more of their large buildings on the internet.

AWS then became a very important network for a lot of the internet. So in the rare instance that it has a problem, a lot of websites don’t work.

As for what can be done about it: Amazon is highly motivated to not have problems, so they invest in keeping their services up and running. Other businesses that rent space from Amazon could also rent space from other providers.

1 day ago
frank-sarno

For me, it was the ability to start up a project quickly without the high upfront cost of infrastructure. I didn't need to build out an entire environment but could use the AWS services on a per-usage based cost.

However, ongoing AWS costs are typically much more than an on-premise shop. The costs can quickly add up as users consume services but don't get rid of them.

Different parts of AWS can go down. Sometimes it's a service such as DNS or even connectivity issues. There have been a couple instances in a few years where they pushed software that broke things.

1 day ago
Few_Junket_1838

Well, provided you have comprehensive backup strategies that replicate your data so that it is always accessible even if one of the copies in one of the storage locations cannot be accessed, then nothing really happens to you as a user, as you still have access to your data.

I found this useful: github backup best practices

1 day ago
needchr

The benefits that lure people in seem to be all of the automation, the ease of scaling, the low cost of entry, and the big one, inertia.

However, there are a lot of downsides, such as unpredictable costs, and it can get super expensive very quickly.

Not everyone uses them; I expect you would also find a bunch of sites go down if Cloudflare has a major outage.

I still host my content in a datacentre, and only in the last couple of years started dabbling with Cloudflare as a CDN for that content.

Which brings me to my last point: datacentres haven't really moved with the times. The standard port is still only a gigabit, and some datacentres in 2025 still either give only 100 Mbit ports or cap gigabit ports below the port speed. There are datacentres still leasing out Haswell-era quad-core Intels on 10-year-old spindles and capping the port to 250 Mbit outbound.

1 day ago
SaintTimothy

Not everyone, as some folks have pointed out there are 3 big players (Amazon, Google, Microsoft) in the cloud hosting business.

What happens when one of them goes down, down? A whole lot of companies you use experience outages.

Someone probably has a more appropriate example from when Azure went down a couple weeks ago. This was the first that came to mind: when Dyn, one of the major DNS providers, went down.

That's a heck of a list (under affected services) https://en.m.wikipedia.org/wiki/DDoS_attacks_on_Dyn

1 day ago
VietOne

You want to make and sell candy. You need a way to make the candy and a way to sell it.

To make the candy you need a building, machines, workers, power, etc.

To sell the candy, you need a building, workers, shelves, etc.

You can take the time and find everything you need, or you can contact AWS and they can lend you their machines and workers.

Instead of the months or years of time and effort to start up your candy business, you can do it in days/weeks with AWS. AWS even has templates to get you started fast. You just fill in the blanks and press Build.

When it goes down, it's usually a Domain Name System (DNS) problem.

What this means is that every location has an easy name you can use. You wouldn't want to repeatedly tell someone that they should send deliveries to 3876 Jefferson St SW suite 154 Dawson, AK 73646-8376 so instead you decide that you and everyone else will call it "Dawson Store". You keep this friendly name to an address in an address book. To make it easier you put it online so everyone else can know.

But what happens when the address book can't be read anymore? Then it becomes difficult to know what the address was when people only know the friendly name.
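The address-book analogy maps neatly onto a few lines of code. This is a toy resolver, not real DNS: `ToyResolver` and `AddressBookUnavailable` are hypothetical names, the hostname uses the reserved `.example` domain, and the IP comes from the 192.0.2.0/24 documentation range. (A real Python program would ask the system resolver, e.g. through `socket.getaddrinfo`.) The point it shows is that the store itself can be perfectly healthy while nobody can find it.

```python
class AddressBookUnavailable(Exception):
    """Raised when the name lookup service itself is down."""


class ToyResolver:
    """Hypothetical 'address book' mapping friendly names to addresses."""

    def __init__(self, records: dict):
        self.records = dict(records)
        self.online = True

    def resolve(self, name: str) -> str:
        if not self.online:
            # The server at the address may be fine; we just can't look it up.
            raise AddressBookUnavailable(f"can't look up {name!r}")
        return self.records[name]  # KeyError for names that were never registered


resolver = ToyResolver({"dawson-store.example": "192.0.2.10"})
print(resolver.resolve("dawson-store.example"))

resolver.online = False  # the address book goes down
try:
    resolver.resolve("dawson-store.example")
except AddressBookUnavailable as err:
    print("lookup failed:", err)
```

This is also why DNS problems look so dramatic: every service that needs a lookup fails at once, even though the servers behind the names never went down.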

1 day ago
aegrotatio

When S3 does go down it's usually because the us-east-1 region is having trouble.

So much of AWS relies on us-east-1 being up and running perfectly. Until recently the S3 endpoint had to use us-east-1 behind the scenes no matter what region you store your data in. Same with the AWS Console--it ran in us-east-1 only until very recently.

Many other services depend on us-east-1 being up. It's almost hypocritical.

1 day ago
my_beer

Most of the comments here are just about compute, but cloud platforms offer a lot more than just compute. Cloud platforms provide a load of services that you could implement yourself and run on your own hardware, but it is much easier to make it someone else's problem.
You want a secure, scalable reliable login system, sure you can build one (or use something open source), host it, update it, make sure it is secure, fix it when it breaks etc. but it is a hell of a lot easier just to use the one your cloud platform has.

1 day ago
dastardly740

A lot have got into why the heavy dependence. I want to mention that it takes a pretty significant problem to take out a single AWS availability zone (like a sub-region) let alone an entire AWS region (multiple availability zones). I can't think of a case where multiple regions have been down simultaneously.

That losing a region takes down half the internet is a sign that those businesses are not using the redundancy capabilities AWS provides to replicate to other regions. That might make financial sense, given the cost of that level of redundancy versus how often regional outages happen and how long they last. In addition, there is a bit of historical bias at AWS that I'm not sure has been mitigated yet.

When AWS first started, a lot of customers put their applications in the US-East region. And not just US-East, but the original availability zones in US-East. This left US-East the most capacity-constrained region, because as those customers grew they would just provision more resources there. And unless you plan for it from the beginning, it can be fairly difficult to move your application even to another availability zone, let alone another region. Way back, I read an outage report where a configuration screw-up by an AWS employee was the root cause of a regional outage in US-East. Interestingly, the same mistake probably would not have caused an outage in any other region, or would only have caused a slowdown, but because US-East had minimal idle capacity, it cascaded into a full failure. They had to physically transport hardware from another region to get enough capacity in place to fix the problem and get everyone back up.

That one resulted in one of the "half the internet is down" incidents, because so many had started and grown in US-East and had not done the additional work to become less dependent on a single overloaded AWS region.

1 day ago
Loki-L

AWS, like the rest of Amazon, is so successful because of economies of scale.

They can be cheap because they are so big because everyone uses them because they are so cheap.

It is a sort of virtuous cycle.

The same also applies to reliability. If you are big, it's much easier to be more reliable, which leads to more customers, which makes you bigger.

The same also goes for other companies making products specifically meant to work with and on AWS: they build for it because AWS is what everyone is using, and that ends up being another reason to use it.

To a limited degree, the same is also true for IT workers who can work with AWS, although most of them can retrain fairly easily between cloud platforms. But filling job openings with workers who have AWS experience is easier since so many companies use it.

Only a few companies are anywhere near as big.

Microsoft and Google have their own competing offerings, and they are not small businesses either. Plus, you get regional companies that thrive because they can, for example, claim not to be US-based, like OVH in Europe.

But AWS leads the field, which means it is the top choice when a company picks where to host its new website, which means all those websites go down together when AWS does encounter a problem.

1 day ago
SilasTalbot

It is "infrastructure as a service".

Instead of buying computers, installing software, getting network connections to them, a firewall, updating the software regularly, etc. you can just rent that stuff from Amazon.

There are many benefits to this. One is that you can instantly scale up: if you suddenly need twice as many servers, all you do is press a button.

This is one of the reasons Zoom took off during the pandemic. They were a cloud-first company, so when global demand for video conferencing suddenly jumped 20x, they were built around being able to simply say "I want 20x more servers, please" and have it all happen automatically. So they were there and able to meet the demand when other providers failed or just couldn't move fast enough.

Another way to think about it: if you are a race car driver, do you want to spend all your time in the garage building your race car? What do you want to build, versus get off the shelf? Do you want to make your own rubber for the wheels? Go mine the iron ore for your chassis? Clearly, no. You want to focus on what you're good at, and have some of that lower level work already handled for you. You want to be out there driving!

AWS is like a customizable stock car. You do have to put all the parts together, but you can order exactly the parts you want, they come with instruction manuals, and there are a lot of trained experts out there (AWS cloud engineers) who can put them together for you in the garage real fast.

So your company can just write a check, and focus on the value add parts and not spend as much time and effort on the underlying parts.

1 day ago
trouphaz

One thing I'd like to add is that AWS and other cloud providers have features that should significantly reduce the impact of issues like this. They have multiple availability zones in the same region, and then separate regions. My company uses a lot of cloud resources as well as having 2 massive data centers. We have LOTS of problems with resiliency because application teams do not take advantage of the tools provided. Doing it properly means having at least 2x the capacity needed to run their workloads, split across separate locations, and then having a proper load distribution solution out front that'll redirect traffic to whichever location is available.
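A minimal Python sketch of that "load distribution solution out front", assuming each location exposes a simple up/down health flag. The `Site` class, `pick_site` function and region names are hypothetical; real setups use things like health-checked DNS records or load balancers. It shows only the core rule: prefer the primary, and fall back to whichever location is healthy.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    healthy: bool  # assumed to come from some external health check

def pick_site(sites: list[Site]) -> str:
    """Return the first healthy site in priority order, or raise if none are up."""
    for site in sites:
        if site.healthy:
            return site.name
    raise RuntimeError("no healthy site available: full outage")

sites = [Site("region-a", healthy=True), Site("region-b", healthy=True)]
print(pick_site(sites))  # region-a

sites[0].healthy = False  # region-a has an outage
print(pick_site(sites))  # region-b
```

Redirecting traffic only helps if the surviving location has enough spare capacity to absorb it.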

The teams that set things up "correctly", with their application hosted in more than one location and proper load distribution, often fall into a different trap: they run active/active, so load is handled on both sites... and then run both sites at more than 50% utilization, so if one site goes offline, the other site doesn't have the capacity to run everything.
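The arithmetic behind that trap, as a small Python sketch. It assumes sites of equal capacity and that a failed site's load spreads evenly over the survivors (the function name and numbers are illustrative). With two sites each at utilization u, losing one pushes the survivor to 2u, so anything above 50% per site overloads it.

```python
def survives_one_site_loss(utilizations: list[float]) -> bool:
    """Check whether the remaining sites can absorb the load if any single site fails.

    Each entry is one site's utilization as a fraction of its own capacity
    (0.6 means 60% busy). Assumes all sites have equal capacity and that
    load redistributes evenly across the surviving sites.
    """
    total_load = sum(utilizations)            # measured in "site-capacities"
    remaining_capacity = len(utilizations) - 1
    return total_load <= remaining_capacity

print(survives_one_site_loss([0.45, 0.45]))  # True: survivor ends up at 90%
print(survives_one_site_loss([0.60, 0.60]))  # False: survivor would need 120%
```

With three equal sites, 60% each does pass the check (1.8 site-capacities of load against 2 surviving sites), which is one reason spreading across more locations makes the required headroom cheaper.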

Redundancy and resiliency are tough issues to sort out.

1 day ago
RepFilms

Marketing. Amazon has great marketing. Executives don't trust their own staff. They don't understand IT issues. They do understand the corp speak of Amazon marketing executives. AWS has convinced executives to dump IT staff and internal hardware to put their faith in Amazon.

1 day ago
boring_accountant

Making your servers and computers work is complicated. AWS makes this much simpler and more sophisticated (although much more expensive). Lots of people use it, so when it goes down, everyone goes down.

1 day ago
ElvisAndretti

I was a DBA for a company for eight years. With onsite servers we had no unplanned outages. None. Never. In our first year with AWS we had three. I still do not understand why they would want that.

13 hours ago
AnOtherGuy1234567

Back in the day (the '90s), if you ran a large website and needed a database, as Amazon did, you used Oracle. However, it's very hard to migrate away from Oracle, and its licensing terms are incredibly complicated and expensive. So from the late '90s, LAMP (Linux, Apache, MySQL, and PHP/Perl/Python) increased in popularity, eventually becoming predominant for virtually all companies that hadn't got tied into Oracle or SAP in the '90s. Amazon spent years trying to get off Oracle, with Larry Ellison, the boss of Oracle, repeatedly telling Jeff Bezos, the head of Amazon, that he'd never be able to do it. Amazon initially set up AWS for its own internal use in order to get off Oracle, and then opened it up to other companies.

AWS and other cloud providers, such as Microsoft's Azure, promise companies lower costs and higher reliability than on-premises solutions, with payments made on a regular basis rather than many millions in one go for new servers, which have to be replaced every X years. Some admins prefer cloud providers because when something goes wrong with their own servers, they have to pull their hair out until it's fixed. But with cloud providers, if AWS/Office 365/Azure goes down and they confirm that it's a global issue, then there's nothing they can do except update their intranet, send out an email if that still works, and update their voicemail/on-hold message. Then have a drink. That is far less stressful and requires less experience and training, and IT needs more continuous training than possibly any other profession, as what you learned at university was probably obsolete when they were teaching it, let alone 20 years later.

13 hours ago
Devify

AWS (and other cloud providers) provide flexibility and a lot of redundancy.

You can run a website on your computer, but if your power goes out, it's down. If you suddenly get 1k visitors, it won't be able to handle that many people and will go down.

So companies started using data centres where they buy a server (basically a fancy computer) but the data centre has things like backup power supplies so if power goes out, there's a backup generator and your website doesn't go down. The issue with using a general data centre is that you still own the server, so if there is an issue with it or you have a sudden increase in visitors, it might still go down until you can get it replaced or get an additional server installed.

So then places like AWS started offering infrastructure as a service. They have their own data centres with things like backup power. But instead of you owning the servers, AWS owns and manages the servers and you basically rent them. So if there is an issue with the server itself, they just move all your stuff to a different server that's working fine. It's generally more expensive than running your own servers in a data centre. But it means if you suddenly need 5 servers, you can have 5 ready to go within a couple of minutes rather than days or weeks.

13 hours ago