Senior Software/Data Engineer


Full time • London

The role

At DeepCrawl we solve big data problems. We crawl very large websites, producing many terabytes of data, and even more once we process it. We then run queries at scale to provide world-class insights to our customers, who need to manage their SEO.

You will be joining an experienced team of engineers to create testable, monitored, efficient and documented big data solutions.

Our Stack

AWS Athena / Presto SQL, AWS Glue, AWS EMR, Apache Spark, Apache Livy, AWS DynamoDB, AWS Step Functions, AWS S3, AWS Lambda, Hive, Elasticsearch, Love

What you’ll get to do:

  1. Collaborate with algorithm and platform engineers to offer best-in-class architecture solutions, set standards, and help others keep to them
  2. Contribute to the team that implements our new serverless web crawling solution
  3. Be the go-to person for all things AWS, including Kinesis, Firehose, Step Functions, Lambda, DynamoDB, and general SAM/orchestration solutions
  4. Design and develop serverless applications with Node.js, for both prototyping and production code, using TypeScript and a TDD methodology
  5. Produce solution diagrams (C4 or similar), data pipeline diagrams, documentation, and release and maintenance strategies
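To give a flavour of the serverless, TDD-friendly TypeScript work described above, here is a minimal sketch of a testable handler core for a crawl task. The event shape, function name, and depth limit are all hypothetical, for illustration only; they are not DeepCrawl's actual schema or code.

```typescript
// Hypothetical event shape for a crawl task (illustrative, not DeepCrawl's schema).
interface CrawlEvent {
  url: string;
  depth: number;
}

interface CrawlResult {
  url: string;
  accepted: boolean;
  reason?: string;
}

const MAX_DEPTH = 10; // illustrative limit, chosen for the example

// A pure, easily unit-testable handler core, in the spirit of TDD:
// validate the event before any I/O would take place.
export function handleCrawlEvent(event: CrawlEvent): CrawlResult {
  if (!/^https?:\/\//.test(event.url)) {
    return { url: event.url, accepted: false, reason: "unsupported scheme" };
  }
  if (event.depth > MAX_DEPTH) {
    return { url: event.url, accepted: false, reason: "depth limit exceeded" };
  }
  return { url: event.url, accepted: true };
}
```

Keeping the validation logic pure like this lets it be unit-tested without mocking any AWS services; the Lambda wrapper around it would only translate the incoming event and invoke downstream I/O.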

What you’ll bring:

  1. Mastery of architecting solutions using AWS infrastructure services
  2. Strong knowledge of crawling concepts such as page levels, web performance metrics, web architecture, and W3C standards, and of scraping tools such as Puppeteer and the Chrome DevTools Protocol
  3. Considerable experience with, and passion for, designing and developing serverless applications with Node.js, for both prototyping and production code, using TypeScript or Python and a TDD methodology
  4. Extensive experience working with high-volume data processing requirements including Big Data/event streaming and async messaging architectures and CQRS
  5. Scripting and automation expertise (Python, Bash, etc.)
  6. Proficiency in writing applications that stream large volumes of data using Kinesis/Firehose
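On the Kinesis point above: the `PutRecords` API accepts at most 500 records per call, so producers typically chunk their buffers before sending. A minimal, SDK-free sketch of that batching logic in TypeScript (the helper name is ours, not an AWS API):

```typescript
// Kinesis PutRecords accepts at most 500 records per call, so a producer
// must split its buffer into batches. Illustrative helper, not an AWS API.
export function chunkRecords<T>(records: T[], maxPerBatch = 500): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < records.length; i += maxPerBatch) {
    batches.push(records.slice(i, i + maxPerBatch));
  }
  return batches;
}
```

A real producer would also need to handle per-record failures reported in the `PutRecords` response and retry those records, which this sketch deliberately leaves out.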

We’d especially love it if you have:

  1. Dabbled with HashiCorp Terraform (IaC) and/or AWS CloudFormation
  2. Worked with Spark and Scala and/or PySpark
  3. Imported large datasets into Elasticsearch and optimized queries for serving that data
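On the Elasticsearch point: large imports typically go through the `_bulk` endpoint, which expects newline-delimited JSON of alternating action and document lines, with a trailing newline. A sketch of building such a payload in TypeScript (the document shape and index name are hypothetical):

```typescript
// Hypothetical document shape for a crawled page.
interface PageDoc {
  url: string;
  statusCode: number;
}

// Build an NDJSON body for the Elasticsearch _bulk endpoint: each
// document line is preceded by an action line naming the target index.
export function buildBulkBody(docs: PageDoc[], index = "crawl-pages"): string {
  const lines: string[] = [];
  for (const doc of docs) {
    lines.push(JSON.stringify({ index: { _index: index } }));
    lines.push(JSON.stringify(doc));
  }
  return lines.join("\n") + "\n"; // _bulk requires a trailing newline
}
```

The resulting string would be POSTed to `/_bulk` with the `application/x-ndjson` content type; at terabyte scale one would stream batches of these bodies rather than build one giant payload.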

Apply for Role