Delta Live Tables (DLT)

This post walks through building a simple Delta Live Tables (DLT) pipeline that ingests files with Auto Loader.

First, import the relevant libraries:

# import relevant libraries
import dlt
from pyspark.sql.functions import *
from pyspark.sql.types import *

If you are ingesting CSV files, the data does not carry a schema, so we have to define one manually. For Parquet this is not needed, because the schema is embedded in the file.

schema = StructType([
    StructField("cust_name", StringType()),
    StructField("cust_id", IntegerType())
])

Next, ingest the data using Auto Loader. Specify the cloud files format and any other details as options:

cloud_file_option = {"cloudFiles.format": "csv"}

Then create the Delta Live Table from the incoming data in the storage account location or catalog:

@dlt.table(
    name="dim_customer",          # provide this same name in the function below
    description="provide a description"
)
def dim_customer():
    df = (
        spark.readStream
        .format("cloudFiles")
        .options(**cloud_file_option)
        .schema(schema)
        .load("folder path where the files are stored")
    )
    # add an audit column with the processing time
    df = df.withColumn("Processed_Time", date_format(current_timestamp(), "yyyyMMdd"))
    return df
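
The dim_customer function above only runs inside a DLT pipeline, so it helps to sanity-check the schema and the Processed_Time column first with a plain batch read. The following is a minimal sketch of that check; the folder path and the "header" option are assumptions for illustration, not taken from the post.

from pyspark.sql import SparkSession
from pyspark.sql.functions import date_format, current_timestamp
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("dlt_local_check").getOrCreate()

# same schema as in the DLT table definition
schema = StructType([
    StructField("cust_name", StringType()),
    StructField("cust_id", IntegerType())
])

# plain batch read of the CSV folder, without Auto Loader or DLT
df = (
    spark.read
    .format("csv")
    .option("header", "true")        # assumption: files have a header row
    .schema(schema)
    .load("/tmp/customer_csv")       # hypothetical local folder
)

# verify the audit column behaves as expected
df = df.withColumn("Processed_Time", date_format(current_timestamp(), "yyyyMMdd"))
df.show()

Once the transformation looks right here, the same logic can be moved into the @dlt.table function and attached to a DLT pipeline.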