Use-cases/Project
2 min read

Running Spark on Amazon Web Services (AWS)

running apache spark on aws
Source: Acast-Tech https://medium.com/acast-tech/

When you search thought the net looking for methods of running Apache Spark on AWS infrastructure you are most likely to be redirected to the documentation of AWS EMR (Elastic Map Reduce) service, which is Amazon's Hadoop distribution suited to run in AWS cloud environment. It's quite an easy way to deploy your data pipelines, but sometimes bootstrapping a huge cluster to perform simple ad-hoc analysis it's a cumbersome task. They say:

"to a man with a hammer everything looks like a nail" :)

and we felt into this trap with EMR once.

The article below describes two other ways of running Apache Spark jobs on AWS-managed infrastructure - AWS Glue and AWS Fargate - that we use on our clients' data warehousing projects. You will find there the key differences between these methods when it comes to flexibility and pricing, showing why there is no place for "one service fits all" approach in AWS world.

Check out!

big data
spark
AWS
Amazon Web Services
18 December 2019

Want more? Check our articles

highly available airflow cluster aws notext
Tutorial

Highly available Airflow cluster in Amazon AWS

These days, companies getting into Big Data are granted to compose their set of technologies from a huge variety of available solutions. Even though…

Read more
dynamodb aws jedraszewski getindata big data blog
Tutorial

Amazon DynamoDB - single table design

DynamoDB is a fully-managed NoSQL key-value database which delivers single-digit performance at any scale. However, to achieve this kind of…

Read more
wp stream blogingobszar roboczy 1 4x 100
Whitepaper

White Paper: Stream Processing Explained

Stream Processing In this White Paper we cover topic such as characteristic of streaming, the challegnges of stream processing, information about open…

Read more
16kxTuxGkZjskytKJLQKsJg
Tech News

Two BI companies in play. Tableau acquired by Salesforce and Looker by Google.

The two recently announced acquisitions by Google and Salesforce in the thriving business analytics market appear to be strategic moves to remain…

Read more
lean big data 1
Tutorial

Lean Big Data - How to avoid wasting money with Big Data technologies and get some ROI

During my 6-year Hadoop adventure, I had an opportunity to work with Big Data technologies at several companies ranging from fast-growing startups (e…

Read more
cloud computing insurance
Tutorial

Cloud computing standard for the insurance industry

On June 16, 2021, the Polish Insurance Association published the Cloud computing standard for the insurance industry. It is a set of rules for the…

Read more

Contact us

Interested in our solutions?
Contact us!

Together, we will select the best Big Data solutions for your organization and build a project that will have a real impact on your organization.

The administrator of your personal data is GetInData Sp. z o.o. Sp.k with its registered seat in Warsaw (02-508), 39/20 Pulawska St. Your data is processed for the purpose of provision of electronic services in accordance with the  Terms & Conditions. For more information on personal data processing and your rights please see  Privacy Policy.

By submitting this form, you agree to our Terms & Conditions and Privacy Policy