New Internship: Building Real-Time Data Pipelines with StreamSets

Written by Matthias Vallaey | Oct 3, 2016 12:50:35 PM

Big Industries is the foremost one-stop advanced systems integration partner for Hadoop and NoSQL in the Benelux. Our client base covers telco, financial services, the public sector, transport and logistics, media and pharmaceuticals, and we are continuously looking for ways to reinvent, improve and refine our service offering.

Big Industries is therefore looking for motivated interns to help us prepare and build high-quality reference architectures that serve as accelerator templates for customer projects.

The objective of the internship is to create reusable architecture blueprints that let us build successful, robust, high-quality customer implementations quickly and repeatably.

Each of the following design blueprints should be developed, deployed and secured, and demonstration integrations should be built for it. Each solution should then be functionally tested, stress tested and soak tested, validated, and documented (including constraints, limitations and lessons learned):

  • Hadoop- and Hive-based data warehouse cluster with near-real-time continuous data ingestion using StreamSets
  • Hadoop- and Impala/Kudu-based data warehouse cluster with near-real-time continuous data ingestion using StreamSets
  • Cassandra-based data warehouse cluster with near-real-time continuous data ingestion using StreamSets

Interns are expected to be self-starters who can manage a small project and who have an appetite for BI, data integration and architecture. They will gain exposure to industry-leading enterprise and open source technologies for data integration, data warehousing and data visualization.