Senior Data Engineer (Databricks)

Remote
Full Time
Mid Level
Position Overview 

At Hypersonix, we are building the leading Generative AI Platform for Commerce. Our flagship GenAI product, Competitor + Pricing AI, scrapes the product catalogs of our Enterprise customers and their competitors, and uses retrieval-augmented generation (RAG) to identify the nearest competitive match for each of our customers' products, enabling intelligent pricing strategies that were previously impossible to achieve.

We are seeing strong growth in our Enterprise product and are building an end-to-end product on Databricks for Shopify store owners, with a focus on agentic workflows that automate critical business processes: pricing and promotion strategies, inventory management, and competitive intelligence.
We are seeking an experienced Senior Data Engineer to design, build, and optimize scalable data pipelines and infrastructure. The ideal candidate will have deep expertise in Databricks and modern data engineering practices, with a strong focus on building robust, production-grade data solutions that drive business value while maintaining cost efficiency. 

Key Responsibilities: 

Data Platform Development 

Design and implement enterprise-scale data pipelines using Databricks on AWS, leveraging both cluster-based and serverless compute paradigms 

Architect and maintain medallion-architecture (Bronze/Silver/Gold) data lakes and lakehouses (a Bronze-to-Silver sketch follows this list)

Develop and optimize Delta Lake tables for ACID transactions and efficient data management 

Build and maintain real-time and batch data processing workflows 
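
For illustration, a minimal PySpark sketch of the Bronze-to-Silver step in a medallion pipeline. The table names, columns, and cleaning rules are hypothetical; on Databricks the spark session is provided by the runtime.

    from pyspark.sql import functions as F

    # Read the raw Bronze table, apply basic cleaning, and publish to Silver.
    bronze = spark.table("bronze.raw_product_events")

    silver = (
        bronze
        .dropDuplicates(["event_id"])                 # idempotent re-processing
        .filter(F.col("product_id").isNotNull())      # enforce a basic contract
        .withColumn("processed_at", F.current_timestamp())
    )

    # Delta provides the ACID guarantee: readers never observe a half-written table.
    silver.write.format("delta").mode("overwrite").saveAsTable("silver.product_events")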

Engineering Excellence 

Create reusable, modular data transformation logic using DBT to ensure data quality and consistency across the organization 

Develop complex Python applications for data ingestion, transformation, and orchestration 

Write optimized SQL queries and implement performance tuning strategies for large-scale datasets 

Implement comprehensive data quality checks, testing frameworks, and monitoring solutions (a minimal quality-gate sketch follows this list)
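
As an illustration of the kind of quality gate meant above, a minimal sketch that blocks publication of a Gold table when checks fail; the tables, columns, and checks are hypothetical.

    from pyspark.sql import functions as F

    df = spark.table("silver.product_events")

    # Hypothetical checks: no NULL prices, no duplicate business keys.
    null_prices = df.filter(F.col("price").isNull()).count()
    dup_keys = df.count() - df.dropDuplicates(["event_id"]).count()

    if null_prices or dup_keys:
        # Fail loudly rather than publish bad data downstream.
        raise ValueError(
            f"Quality gate failed: {null_prices} NULL prices, {dup_keys} duplicate event_ids"
        )

    df.write.format("delta").mode("overwrite").saveAsTable("gold.product_events")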

Cost Management & Optimization 

Monitor and analyze Databricks DBU (Databricks Unit) consumption and cloud infrastructure costs (a system-table sketch follows this list)

Implement cost optimization strategies including cluster right-sizing, autoscaling configurations, and spot instance usage 

Optimize job scheduling to leverage off-peak pricing and minimize idle cluster time 

Establish cost allocation tags and chargeback models for different teams and projects 

Conduct regular cost reviews and provide recommendations for efficiency improvements 
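
As a sketch of the system-table approach to cost analysis, the query below surfaces the top DBU-consuming jobs over the last 30 days. It assumes Unity Catalog with read access to system.billing.usage; the exact schema can vary by workspace, so treat the columns as indicative.

    # Run from a Databricks notebook or job, where spark is provided.
    top_consumers = spark.sql("""
        SELECT usage_metadata.job_id AS job_id,
               sku_name,
               SUM(usage_quantity)   AS dbus
        FROM   system.billing.usage
        WHERE  usage_date >= date_sub(current_date(), 30)
        GROUP  BY usage_metadata.job_id, sku_name
        ORDER  BY dbus DESC
        LIMIT  20
    """)
    top_consumers.show(truncate=False)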

DevOps & Infrastructure 

Design and implement CI/CD pipelines for automated testing, deployment, and rollback of data artifacts (a deployment-gate sketch follows this list)

Configure and optimize Databricks clusters, job scheduling, and workspace management 

Implement version control best practices using Git and collaborative development workflows 
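
One way a CI step might gate a deployment, sketched with the Databricks SDK for Python (databricks-sdk): trigger a smoke-test job and fail the pipeline if the run fails. The job ID is hypothetical, and authentication via environment variables is an assumption about the CI setup.

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()              # reads DATABRICKS_HOST / DATABRICKS_TOKEN from env
    run = w.jobs.run_now(job_id=123)   # hypothetical smoke-test job ID
    result = run.result()              # blocks until the run finishes; raises on failure
    print(f"Smoke test finished with state: {result.state.result_state}")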

Collaboration & Leadership 

Partner with data analysts, data scientists, and business stakeholders to understand requirements and deliver solutions 

Mentor junior engineers and promote best practices in data engineering 

Document technical designs, data lineage, and operational procedures 

Participate in code reviews and contribute to team knowledge sharing 

Required Qualifications: 

Technical Skills 

5+ years of experience in data engineering roles 

Expert-level proficiency in Databricks (Unity Catalog, Delta Live Tables, Workflows, SQL Warehouses); a minimal Delta Live Tables sketch follows this list

Strong understanding of cluster configuration, optimization, and serverless SQL compute 

Advanced SQL skills including query optimization, indexing strategies, and performance tuning 

Production experience with DBT (models, tests, documentation, macros, packages) 

Proficient in Python for data engineering (PySpark, pandas, data validation libraries) 

Hands-on experience with Git workflows (branching strategies, pull requests, code reviews) 

Proven track record of implementing CI/CD pipelines (Jenkins, GitLab CI)

Working knowledge of Snowflake architecture and migration patterns 
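
For context on the Delta Live Tables item above, a minimal sketch of a DLT pipeline definition with a declarative quality expectation. This code runs inside a DLT pipeline rather than a plain notebook; the table and column names are hypothetical.

    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Cleaned product prices")
    @dlt.expect_or_drop("valid_price", "price > 0")   # drop rows that violate the expectation
    def silver_prices():
        return (
            dlt.read_stream("bronze_prices")
               .withColumn("processed_at", F.current_timestamp())
        )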

Additional Technical Skills: 

Experience with Apache Spark and PySpark optimization techniques such as caching, partitioning, and broadcast joins (a broadcast-join sketch follows this list)

Understanding of data modeling concepts (dimensional modeling, data vault, normalization) 

Knowledge of orchestration tools (Airflow, Databricks Workflows) 

Familiarity with cloud platforms (AWS, Azure, or GCP) and their data services 

Experience with data governance, security frameworks, and SOC 2 compliance

Ability to use Databricks system tables and monitoring tools for cost analysis 
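
To make the broadcast-join item above concrete, a minimal sketch: broadcasting the small dimension table avoids shuffling the large fact table across the cluster. Table names are hypothetical.

    from pyspark.sql import functions as F

    facts = spark.table("gold.order_lines")    # large fact table
    dims = spark.table("gold.product_dim")     # small dimension table

    enriched = facts.join(F.broadcast(dims), on="product_id", how="left")
    enriched.explain()  # the plan should show BroadcastHashJoin, not SortMergeJoin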

Professional Skills: 

Strong problem-solving abilities and analytical thinking 

Excellent communication skills with technical and non-technical audiences 

Ability to work independently and drive projects to completion 

Experience with Agile/Scrum methodologies 

Cost-conscious mindset with ability to balance performance and budget constraints 

Preferred Qualifications: 

Databricks certifications (Data Engineer Associate/Professional, Platform Administrator) 

Knowledge of data quality frameworks (Great Expectations, Deequ) 

Experience with container technologies (Docker, Kubernetes) 

Familiarity with Lakehouse architecture patterns and best practices 

Experience migrating from traditional data warehouses to modern data platforms 

Nice to Have: 

Familiarity with the e-commerce and retail domain

Experience with reverse ETL and data activation tools 

Knowledge of data catalog and metadata management tools 

Track record of achieving significant cost reductions (20%+) in cloud data platforms 

Experience with data mesh or data fabric architectural patterns 

Familiarity with Power BI, Tableau, or other BI tools 