Create the Text Format table
//Getting size of the underlaying folders
hadoop fs -du -h /user/hive/warehouse/trendytech.db
hadoop fs -ls /user/hive/warehouse/trendytech.db/country_bucket
For each partition there will be one folder created
hadoop fs -ls /user/hive/warehouse/trendytech.db/country_bucket/country=estonia
will list all the files under the partitioned folder
CREATE TABLE IF NOT EXISTS country_input(
longitide float,
lattitude float,
number int,
street string,
unit string,
city string,
district string,
region string,
postcode int,
id string,
hash string,
country string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE
//loading the table from the file located in the path
load data local inpath '/Users/basan/datasetnew' into table country_input
select * from country_input where country='belgium' and street='Rue Ketels';
//Create table in ORC FORMAT
CREATE TABLE IF NOT EXISTS country_input_orc(
longitide float,
lattitude float,
number int,
street string,
unit string,
city string,
district string,
region string,
postcode int,
id string,
hash string,
country string
)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS orc
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' is not needed in ORC file format.
For populating into ORC table from the text file
insert into table country_input_orc select * from country_input
/user/hive/warehouse - location for ORC tables
ORC tables will occupy less disk space and selection will be much faster compared to TEXTFORMAT file.
If we want to still make the queries faster then we have to use partitioning and bucketing. For each partition there will be a folder created, and
for each bucket there will be a file created under the partitioned folder. Once we define the size of bucket we cannot change, if needs to be changed then
we have to create fresh table and import data from old table
CREATE TABLE IF NOT EXISTS country_bucket(
longitide float,
lattitude float,
number int,
street string,
unit string,
city string,
district string,
region string,
postcode int,
id string,
hash string,
country string
)PARTITIONED BY (country string) CLUSTERED BY (street)
into 32 buckets
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE
We have created table with country partitioning and 32 buckets
//Getting size of the underlaying folders
hadoop fs -du -h /user/hive/warehouse/trendytech.db
hadoop fs -ls /user/hive/warehouse/trendytech.db/country_bucket
For each partition there will be one folder created
hadoop fs -ls /user/hive/warehouse/trendytech.db/country_bucket/country=estonia
will list all the files under the partitioned folder
//Getting size of the underlaying folders
hadoop fs -du -h /user/hive/warehouse/trendytech.db
hadoop fs -ls /user/hive/warehouse/trendytech.db/country_bucket
For each partition there will be one folder created
hadoop fs -ls /user/hive/warehouse/trendytech.db/country_bucket/country=estonia
will list all the files under the partitioned folder
No comments:
Post a Comment