
Running Spark on Amazon Web Services (AWS)

Image: Running Apache Spark on AWS. Source: Acast-Tech, https://medium.com/acast-tech/

When you search the web for ways of running Apache Spark on AWS infrastructure, you are most likely to be redirected to the documentation of the AWS EMR (Elastic MapReduce) service, which is Amazon's Hadoop distribution suited to run in the AWS cloud environment. It's quite an easy way to deploy your data pipelines, but sometimes bootstrapping a huge cluster to perform a simple ad-hoc analysis is a cumbersome task. They say:

"to a man with a hammer everything looks like a nail" :)

and we fell into this trap with EMR once.

The article below describes two other ways of running Apache Spark jobs on AWS-managed infrastructure - AWS Glue and AWS Fargate - that we use on our clients' data warehousing projects. It covers the key differences between these methods in terms of flexibility and pricing, and shows why there is no place for a "one service fits all" approach in the AWS world.

Check it out!

big data
spark
AWS
Amazon Web Services
18 December 2019

