Apache Spark SQL / Hive: Create External Table based on File in HDFS
Mike Houngbadji

Publish Date: Jul 13 '23

To create an external table based on a file located in HDFS, we'll proceed as follows:

  • Upload the file/folder to HDFS:
hadoop fs -put /local/source/location /hdfs/destination/location
  • Create the table using the SQL below (a Hive equivalent is sketched after this list):
CREATE TABLE sample_table(
        key STRING,
        data STRING)
USING CSV  -- Match this to the format of your source files
OPTIONS ('delimiter'=',',  -- Only needed for delimited files
        'path'='hdfs:///hdfs/destination/location')  -- Pointing the table at a path makes it external (unmanaged)
  • We can now query our table:
SELECT *
FROM sample_table
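Since the title also mentions Hive: if you create the table from the Hive side instead of Spark SQL, the EXTERNAL keyword is spelled out explicitly. Here's a rough equivalent, assuming the same comma-delimited layout and that the HDFS path is a directory holding the CSV files:

CREATE EXTERNAL TABLE sample_table(
        key STRING,
        data STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs:///hdfs/destination/location'

To double-check that the table really is external and points at the right location, DESCRIBE FORMATTED works in both Spark SQL and Hive:

DESCRIBE FORMATTED sample_table

The output should include the table type (EXTERNAL) and the storage location, among other details.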

References:
Spark SQL Documentation - Create Table

PS:
I wrote this partly so I can retrieve the solution faster myself.
