Databricks DE Associate Certification

Chanukya Pekala
2 min read · Dec 22, 2022

Ever since I joined my new company, I have been leveraging Azure Databricks heavily for data processing and storage. I have been working with Databricks for some time now, so I decided to give the exam a try.

Studying for the certification helped me understand how wide the scope of Databricks really is. Most of the concepts are generally available, so we can put them to use and deploy them to production.

  • Some of the interesting concepts are Delta Lake, Unity Catalog, Delta Live Tables, and Auto Loader; they make life easier as a DE, especially when you have already seen some life in the Hadoop ecosystem.
  • Alongside DE, there is also plenty of scope for ML/AI by leveraging MLflow.
  • For BI there is Databricks SQL, aimed at BI specialists/analysts who are happy to use a SQL interface; with just a SQL Warehouse, they can connect to any of your data objects.
  • Automatic alerting to email/webhooks on query anomalies or dashboard spikes is truly useful when jobs are in production.
  • Some technologies that complement Databricks are Terraform, GitHub Actions, and Airflow, at least for putting things into production safely.
  • Unity Catalog redefines the governance aspect, handling logging, querying, and auditing across the whole hierarchy, from the metastore at the top down to the lowest object level.
  • Delta Live Tables gives a declarative approach to creating pipelines, especially pipelines with data quality built in. It is a territory that is growing big and could be the next big thing, or already is, maybe! (See the sketch after this list.)
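
To make the Delta Live Tables and Auto Loader ideas concrete, here is a minimal sketch of a two-table DLT pipeline in Python. The landing path, schema location, and table names are hypothetical placeholders, and the quality rule is just one example of a DLT expectation.

```python
import dlt
from pyspark.sql import functions as F

# Bronze: ingest raw JSON files incrementally with Auto Loader (cloudFiles).
# `spark` is provided by the DLT runtime in a Databricks notebook.
@dlt.table(comment="Raw orders ingested incrementally with Auto Loader")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/mnt/landing/_schemas/orders")  # hypothetical path
        .load("/mnt/landing/orders/")  # hypothetical landing path
    )

# Silver: a declarative quality gate; rows violating the expectation are dropped.
@dlt.table(comment="Cleaned orders with a basic quality expectation")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")
        .withColumn("ingested_at", F.current_timestamp())
    )
```

You declare what each table should contain, and DLT works out the dependency graph, incremental processing, and quality enforcement for you.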

Deployments

  • I think there are several ways out here; the options are multiple.
  • With Python, we can package code as egg or wheel (whl) files.
  • With Java/Scala, you can build a JAR and copy it to the cluster.
  • Dependencies can be added as additional packages/libraries on the cluster.
  • The Jobs API makes it easy to run workloads in different styles: notebook tasks, JAR tasks, Python wheel files, etc. (a minimal sketch follows this list).
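
As an illustration, here is a sketch of creating a job with a Python wheel task through the Jobs API 2.1. The workspace URL, token, package name, and wheel path are all hypothetical placeholders; in practice you would keep the token in a secret store.

```python
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical workspace URL
TOKEN = "dapi..."  # placeholder personal access token

# A minimal Jobs API 2.1 job spec with one Python wheel task on a new job cluster.
job_spec = {
    "name": "orders-pipeline",
    "tasks": [
        {
            "task_key": "ingest",
            "python_wheel_task": {
                "package_name": "orders_pipeline",  # hypothetical package
                "entry_point": "main",
            },
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
            "libraries": [
                {"whl": "dbfs:/FileStore/wheels/orders_pipeline-0.1.0-py3-none-any.whl"}
            ],
        }
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```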

MLflow

  • You can create and track model runs: parameters, metrics, and artifacts.
  • The Model Registry is very helpful when it comes to MLOps and the life cycle of the model (a small sketch below).
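
A minimal sketch of both ideas, assuming a simple scikit-learn model; the run name and registered model name are hypothetical.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Track a run: log parameters, metrics, and the model artifact.
with mlflow.start_run(run_name="baseline") as run:
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, artifact_path="model")

# Promote the logged model into the Model Registry (name is hypothetical).
mlflow.register_model(f"runs:/{run.info.run_id}/model", "iris_classifier")
```

On Databricks, the experiment UI then shows runs side by side, and the registry tracks model versions and stage transitions.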

Certification Details

  • The Academy videos are helpful, along with practice labs: creating a cluster, an endpoint/SQL warehouse, a DLT pipeline, jobs, tasks, streaming jobs, and data analysis with dashboards.
  • Practice tests from Udemy are recommended to get the hang of the questioning style.

If you have experience with Hadoop/Spark/Databricks and SQL, it's an easy ride.

My credentials for the certification are attached here.
