Internship: Exploring Near Real-Time Data Processing at Scale with Apache Flink and Apache Spark

Internship (Student position) in Pully
  • Job Identification: 2295
  • Posting Date: 21.02.2025
  • Job Schedule: Full time
  • Company: ELCA Informatique SA

About Us

We are ELCA, one of the largest Swiss IT tribes with over 2,300 experts. We are multicultural, with offices in Switzerland, Spain, France, Vietnam and Mauritius. Since 1968, our team of engineers, business analysts, software architects, designers and consultants has provided tailor-made and standardized solutions to support the digital transformation of major public administrations and private companies in Switzerland. Our activity spans multiple fields of leading-edge technologies such as AI, Machine & Deep Learning, BI/BD, RPA, Blockchain, IoT and Cybersecurity.

Job Description

This internship offers an in-depth exploration and comparison of two leading stream processing frameworks — Apache Flink and Apache Spark — within the context of near real-time data processing.

The intern will gain hands-on experience designing and implementing near real-time data pipelines using Apache Kafka as the messaging backbone, and processing data streams with both Flink and Spark.
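Kafka's role as the messaging backbone rests on a few core concepts: topics split into partitions, producers that append keyed messages, and consumers that poll from an offset they track themselves. The toy in-memory model below is only an illustration of those concepts (all class and variable names are made up for this sketch); a real pipeline would use an actual Kafka client library and broker.

```python
class ToyTopic:
    """Toy stand-in for a Kafka topic: partitioned, append-only logs.

    Illustration only -- real Kafka runs as a distributed broker and
    is accessed through a client library, not a local data structure.
    """

    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Keyed messages land in a deterministic partition, so all
        # events for one key stay ordered relative to each other.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

    def consume(self, partition, offset):
        # A consumer reads a partition from an offset it tracks itself,
        # which is what allows replay and independent consumer groups.
        return self.partitions[partition][offset:]

topic = ToyTopic()
p = topic.produce("user-42", {"action": "click"})
topic.produce("user-42", {"action": "purchase"})
events = topic.consume(p, 0)  # both events for the key, in production order
```

Because both Flink and Spark consume from Kafka through this same partition/offset model, the choice of partition key directly shapes how parallel and how ordered the downstream stream processing can be.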

The project will include the development of practical use cases involving realistic data sources, such as event streams from databases or web activity logs.

The final deliverable will consist of performance benchmarks, scalability assessments, and recommendations outlining the strengths and limitations of each framework across different data streaming scenarios.

Objectives

  • Understand the fundamental concepts of Kafka, Flink, and Spark, including their architecture and use cases.
  • Implement a pipeline to process streaming data from a single source using Kafka and Flink/Spark, gain insights about the technologies and test optimizations.
  • Build a second pipeline with a more complex setup: Database → Debezium → Kafka → Flink/Spark → Operational and Analytical Queries.
  • Handle multiple tables and implement watermarking to ensure synchronized data processing.
  • Compare Flink and Spark based on performance, ease of use, and suitability for specific use cases.
  • Document findings and propose guidelines for choosing between the two frameworks.
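The watermarking mentioned above can be pictured with a framework-agnostic sketch: the watermark trails the maximum event time observed so far by an allowed lateness, and events whose timestamps fall at or behind the watermark are treated as late. The function and variable names below are hypothetical; Flink's and Spark's real watermark APIs differ from this simplified model.

```python
from datetime import datetime, timedelta

def split_on_watermark(events, allowed_lateness):
    """Minimal event-time watermark sketch (not Flink's or Spark's API).

    The watermark is the maximum event time seen so far minus
    `allowed_lateness`; events at or behind the watermark count as late.
    """
    max_event_time = None
    on_time, late = [], []
    for ts, payload in events:
        if max_event_time is None or ts > max_event_time:
            max_event_time = ts
        watermark = max_event_time - allowed_lateness
        if ts <= watermark:
            late.append((ts, payload))   # arrived too far behind the stream
        else:
            on_time.append((ts, payload))
    return on_time, late

t0 = datetime(2025, 1, 1, 12, 0, 0)
events = [
    (t0, "a"),
    (t0 + timedelta(seconds=30), "b"),
    (t0 - timedelta(seconds=30), "c"),  # out-of-order arrival
]
on_time, late = split_on_watermark(events, timedelta(seconds=10))
```

In the multi-table Debezium setup, a watermark of this kind is what lets the processor decide when it has seen "enough" of each table's change stream to emit synchronized, consistent results.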

Our offer

  • A dynamic, collaborative work environment with a highly motivated, multicultural team across international sites
  • Various internal coding events (Hackathon, Brownbags), see our technical blog
  • Monthly after-work events organized at each location

Skills required

Core Skills:

  • Basics of data engineering and distributed systems.
  • Knowledge of SQL and database concepts (e.g., relational databases, transactions).
  • Understanding of streaming concepts and data pipelines.

Technical Skills:

  • Familiarity with Docker and containerized environments.
  • Knowledge of Kafka and concepts like producers, consumers, topics, and partitions.
  • Programming skills in Python, Java or Scala.
  • Understanding of event-driven architectures.
  • Exposure to cloud platforms (e.g., AWS, Azure, or GCP) is an advantage.

Other Skills:

  • Analytical thinking and problem-solving skills.
  • Ability to learn new tools and technologies quickly.
  • Interest in benchmarking and performance evaluation.
Get in touch! Our recruiting team looks forward to meeting you!
Published on 24.04.2025.