Data Engineering Interview Questions
Something a little different for this post…
I was asked to put together a set of interview questions that could be used on a cohort of data engineering candidates with varying levels of experience.
This seemed like a good thing to blog so the wider community can review it and give feedback. Please also accept the obvious caveat: the point of the questions was, and is, not to have technical debates on what the perfect state of a given platform could or should be, but simply to tease out the differences in candidate capabilities, including where experience can be used to identify possibly misleading or vague questions. There is some overlap in content and answers.
Also, to award bonus points if those candidates follow my blog posts!
General and Theoretical questions
What does SQL stand for?
What does MDX stand for?
What does DAX stand for?
What does IaaS mean?
What does PaaS mean?
What does NoSQL mean?
What is meant by the term ‘serverless’?
What is the primary use case for an OLTP database?
What is the primary use case for an OLAP database?
What’s the difference between an OLTP and an OLAP database?
In database transaction processing, what does ACID stand for?
Why are ACID-compliant transactions important for data processing workloads?
What is meant by a heap table?
Explain the main concepts for the creation of a Kimball star schema data model.
How would you decide what grain to apply to a data warehouse fact table?
What is the difference between scaling up compute and scaling out?
What is the difference between a Lambda and Kappa architecture?
What are the key characteristics of a landing zone architecture?
What is the difference between Data Mesh and Data Fabric?
What is the role of the semantic layer in an analytics platform?
Explain what is meant by predicate push down.
In cloud data processing why is it important to decouple compute and storage?
When can ingested data be described as real-time?
What is meant by stream processing?
What is meant by data egress vs ingress in the context of a cloud platform?
What are the five V’s used to characterise big data?
What is big data?
Azure Technology Related Questions
In SQL Databases what is the difference between a clustered and non-clustered index?
What is the role of a DataFrame in Apache Spark?
What is the difference between an Azure Data Factory Web Activity and a Webhook Activity, and why would you use one over the other?
When does an Azure Function App become durable?
How is a Spark Application handled and executed by a Databricks cluster?
What is the difference between a Databricks job cluster and an interactive cluster?
What is meant by a cluster’s time to live (TTL)?
What is a Spark Session in Synapse Analytics?
What is the difference between a Data Lake and Delta Lake?
What is the underlying file system used by most Data Lake product offerings?
How is a Delta Lake entity represented on disk in the storage layer?
For an analytical data platform how would you support and implement disaster recovery requirements?
What endpoints could be used to handle a real-time data feed or messages?
What is the difference between a Private Endpoint and a Service Endpoint?
What resources can be used to orchestrate data processing in Azure?
What is the difference between a Resource Group and a Subscription in Azure?
What is the difference between a Service Principal and a Managed Identity?
When should we use Power BI Premium data models vs Azure Analysis Services?
What is the difference between a SQL Database and a Synapse Dedicated SQL Pool (formerly known as SQL Data Warehouse)?
If you found this useful please let me know. I’d like to support the next generation of data engineers as much as possible.
Many thanks for reading.