Azure Data Engineer - Interview Preparation


Standard set of questions for requirement gathering:

Below are some of the standard questions asked during the RFP / requirement-gathering phase. You should design the solution as per the answers you get, and you can add more questions specific to the use case.

Data:
1) How many sources will data be ingested from?
2) What types of data do you want to load into the data lake?
3) What is the size/volume of data from each source?
4) What is the complexity/quality of the data?
5) Is historical data in scope? If yes, what is its size, and how many years does it cover?
6) What is the frequency of data loads?
7) What is the expected data growth per month, quarter, or year?
8) How much historical data do you want to keep in the data lake?
9) What are the data archival requirements?
10) For API sources, what will be the authentication type?

Non-functional:
1) Do you want logical grouping of data, e.g. region-wise, subject-area-wise, etc.?
2) At what level do you want to implement security?
3) Do you want to encrypt the data?
4) Is there any PII / sensitive data to be ingested?
5) Are there any SLAs agreed for ingesting data through to the consumption layer?
6) How many users will access the data in parallel?

How to talk in an interview:

  • In an interview, always start at a high level with the design of your data load application. Many candidates make the mistake of opening with low-level technical detail, such as "we load a text file into SQL Server via Azure Data Factory." Keep it measured. A few examples:
            1) Our client's requirement was to ingest data from heterogeneous sources into Azure Data Lake.
            2) We designed three layers: Landing / Processed / Consumption. Landing holds data arriving from the source as-is, in raw format; after transformation, we keep the data in the Processed layer, and so on (see the first sketch after this list).

  • Do not talk at length about security or infrastructure if you do not know them in detail.
  • Microsoft adds new features every week. Do not dwell on any limitation of an Azure service, because by the time you appear for the interview, Microsoft may have removed that limitation. For example, SFTP as a sink was not supported in ADF earlier but is supported now; your design should change with such updates.
  • Earlier you could be the master of a single tool, e.g. SSIS. In the cloud world, you must know multiple tools, e.g. ADF and Databricks plus SQL DW (Synapse Analytics). All of them can read, process, and store data, so take a wise decision while designing the solution, considering the cost, efficiency, and capabilities of each tool. For example, SQL DB can process only structured data, while Databricks can process any type of data. In an interview, when justifying the chosen tool (even if it was chosen by the architect, not by you), you must know why that particular tool was used.
  • Keep a note of every challenge you come across while working on an Azure service; it shows you have actually worked on an Azure project. For example, there is limited support for some native Python libraries (e.g. pandas) on an Azure Databricks High Concurrency cluster.
  • Since cloud platforms are still relatively new, most interviewers will focus on the design of your project rather than your technical skills. Show your technical skills as per the expected role: if the requirement is for a developer and not an architect, focus on what you contributed to the project, e.g. created n ADF pipelines connecting to sources and ingesting n GBs/TBs of data, improved performance by configuring parallelism, partitioned data by datetime, and so on (see the partitioning sketch below).
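
To make the layered design concrete, below is a minimal PySpark sketch of the Landing-to-Processed hop described in the first bullet. It is only a sketch under assumptions: the storage account name (datalake01), the container names, and the dataset paths are hypothetical placeholders, and the cluster is assumed to already have access to the lake configured.

    # Minimal sketch: Landing -> Processed hop on Databricks (PySpark).
    # "datalake01", the container names, and the dataset paths below are
    # hypothetical placeholders -- substitute your own lake layout.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    landing_path = "abfss://landing@datalake01.dfs.core.windows.net/sales/orders/"
    processed_path = "abfss://processed@datalake01.dfs.core.windows.net/sales/orders/"

    # Landing keeps source files as-is (raw CSV here); Processed holds
    # cleansed data in a columnar format (Parquet).
    raw = spark.read.option("header", "true").csv(landing_path)

    cleansed = (
        raw.dropDuplicates()
           .withColumn("load_date", F.current_date())
    )

    cleansed.write.mode("append").parquet(processed_path)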
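
The last bullet mentions improving performance with datetime partitioning; below is a hedged PySpark sketch of that idea. The events dataset and its event_ts timestamp column are assumptions for illustration: partition columns are derived from the timestamp so that downstream queries filtering on year/month/day scan only the matching folders (partition pruning).

    # Sketch: partitioning data by datetime so date-filtered queries
    # benefit from partition pruning. "events" and "event_ts" are
    # hypothetical names used only for illustration.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    events = spark.read.parquet(
        "abfss://processed@datalake01.dfs.core.windows.net/events/"
    )

    # Derive partition columns from the event timestamp, then write
    # partitioned Parquet to the consumption layer.
    (
        events
        .withColumn("year", F.year("event_ts"))
        .withColumn("month", F.month("event_ts"))
        .withColumn("day", F.dayofmonth("event_ts"))
        .write.mode("overwrite")
        .partitionBy("year", "month", "day")
        .parquet("abfss://consumption@datalake01.dfs.core.windows.net/events/")
    )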

Certification and Sample Tests:
Prepare for certifications like DP-200 and DP-201. This will boost your confidence, as you will come across a variety of topics and use cases in Azure.

Sample tests:

  • DP-200 test on Mindhub
  • Azure Data Engineering test on Udemy