Powered By Blogger

Tuesday, November 29, 2022

Writing empty dataframe spark

 import spark.implicits._


val data=Seq()

val columns = Seq("firstname","lastname","country","state")

import spark.implicits._

val df = data.toDF(columns:_*)



println("emptydataframe write succeed")

val df = spark.emptyDataFrame

val path = "/Users/basan/Documents/sparktest/emptyFolder2"

df.write.format("orc").save(path)


Thursday, November 10, 2022

CDAP connect to external DB

 Add below properties in cdap-site.xml


<property>

    <name>data.storage.implementation</name>

    <value>postgresql</value>

    <description>

      PG as metadata store

    </description>

  </property>


  <property>

    <name>data.storage.sql.jdbc.connection.url</name>

    <value>jdbc:postgresql://localhost:5432/basanversion3</value>

    <description>

      PG JDBC details

    </description>

  </property>


  <property>

    <name>data.storage.sql.jdbc.driver.name</name>

    <value>org.postgresql.Driver</value>

    <description>

      PG jdbc driver

    </description>

  </property>


  <property>

    <name>data.storage.sql.jdbc.driver.external</name>

    <value>true</value>

    <description>

      Indicates whether the JDBC driver has to be loaded from an external directory.

      If true, then the JDBC driver directory has to be specified using

      "data.storage.sql.jdbc.driver.directory".

      If false, then the JDBC driver is present in the CDAP classpath.

      This config can only be used when the storage implementation is postgresql.

    </description>

  </property>



  <property>

    <name>data.storage.sql.jdbc.driver.directory</name>

    <value>/Users/basan/Documents/CDAP/git/postgres</value>

    <description>

      The base directory for storing JDBC driver jars.

      Sub-directory with the name that matches with the value of "data.storage.implementation" setting

      will be searched for the corresponding JDBC driver and

      dependencies jars to connect to the configured sql instance.

      The JDBC driver class to load has to be specified using "data.storage.sql.jdbc.driver.name".

      This config can only be used when the storage implementation is postgresql.

    </description>

  </property>



Create the db in postgres with the name basanversion3 

Place the jar postgresql-42.2.24.jar  in the location /Users/basan/Documents/CDAP/git/postgres

Tuesday, November 8, 2022

Pivot of Column

 


val orderData = Seq(("ordernumber1", "Shipt", 3000 , "Texas"),
("ordernumber1", "DriveUp", 4600, "Florida"),
("ordernumber2", "ShipToHome", 4100 , "California"),
("ordernumber3", "Shipt", 3000, "Pennsylvania"),
("ordernumber4", "DriveUp", 5000, "Texas"),
("ordernumber5", "ShipToHome", 3300, "Florida"),
("ordernumber6", "ShipToHome", 3900, "California"),
("ordernumber7", "ShipToHome", 7000, "Pennsylvania"),
("ordernumber8", "DriveUp", 4100, "Texas"))

val df = orderData.toDF("orderNumber", "orderType", "amount" , "state" )

+------------+----------+------+------------+
|orderNumber |orderType |amount|state |
+------------+----------+------+------------+
|ordernumber1|Shipt |3000 |Texas |
|ordernumber1|DriveUp |4600 |Florida |
|ordernumber2|ShipToHome|4100 |California |
|ordernumber3|Shipt |3000 |Pennsylvania|
|ordernumber4|DriveUp |5000 |Texas |
|ordernumber5|ShipToHome|3300 |Florida |
|ordernumber6|ShipToHome|3900 |California |
|ordernumber7|ShipToHome|7000 |Pennsylvania|
|ordernumber8|DriveUp |4100 |Texas |
+------------+----------+------+------------+



val df2 = df.groupBy("orderType").pivot("state").agg(max("amount") as "maxAmount")
df2.show(false)

+----------+----------+-------+------------+-----+
|orderType |California|Florida|Pennsylvania|Texas|
+----------+----------+-------+------------+-----+
|ShipToHome|4100 |3300 |7000 |null |
|Shipt |null |null |3000 |3000 |
|DriveUp |null |4600 |null |5000 |
+----------+----------+-------+------------+-----+


val df2 = df.groupBy("orderType").pivot("state").agg(max("amount") as "maxAmount" , min("amount") as "minAmount")

+----------+--------------------+--------------------+-----------------+-----------------+----------------------+----------------------+---------------+---------------+
|orderType |California_maxAmount|California_minAmount|Florida_maxAmount|Florida_minAmount|Pennsylvania_maxAmount|Pennsylvania_minAmount|Texas_maxAmount|Texas_minAmount|
+----------+--------------------+--------------------+-----------------+-----------------+----------------------+----------------------+---------------+---------------+
|ShipToHome|4100 |3900 |3300 |3300 |7000 |7000 |null |null |
|Shipt |null |null |null |null |3000 |3000 |3000 |3000 |
|DriveUp |null |null |4600 |4600 |null |null |5000 |4100 |
+----------+--------------------+--------------------+-----------------+-----------------+----------------------+----------------------+---------------+---------------+