This article is part of the data lake set up tutorial. Sqoop is a tool developed by Apache which allows you to transfer data between an RDBMS and HADOOP. In this tutorial we will see how to set up SQOOP 1.4.7 on Windows 10.
If you do not know what HADOOP is or how it is set up then you can begin here.

Download the SQOOP package
You need to download the following package from it’s official download page; “sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz”. Download it from the following link.
Unpacking
Now unpack this ‘.tar’ file in your destination directory. In my case the destination directory is ‘D:/Data_lake/’. I simply copy the .tar file into this directory and run the following command in the command prompt.
tar -xvzf 'sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz'
Code language: JavaScript (javascript)
You can use any other software to unpack the files.
It might take a few seconds to unpack.
Setting Environment Variables
We need to add this to the path in System Variables. Search for environment variables in your windows search bar and click on ‘edit the system environment variables’. From their click on “Environment Variables…”. Next click on ‘New…’ under the ‘user variables’ tab.


Name this new variable: “SQOOP_HOME”
The path will be the path to where you unpacked the Sqoop tar files. See below screenshot for my path.

Configure SQOOP

In order to configure SQOOP, you will need Git Bash. Open the Git Bash terminal and execute the following commands.
**Please note you will need to install Git for this**
cd $SQOOP_HOME\\bin<br>./configure-sqoop
Code language: HTML, XML (xml)
Some warnings may appear in the terminal window but ignore those for now.
Verify Installation
Open command prompt and run the following command.
%SQOOP_HOME%\bin\sqoop.cmd version
Code language: CSS (css)

This command should execute without any error (like shown in above screenshot).
SQOOP is now ready to work.
Connecting SQOOP with SQLServer database
Sqoop uses a jdbc connector to make a connection between a RDBMS and Hadoop.
There are two ways to connect to SQL Server:
- Using Username and Password. For this you need to create a User on your SQL Server. Your connection string would be something like.
"jdbc:sqlserver://localhost:1433;user=<username>;password=<password>"
Code language: HTML, XML (xml)
and your entire sqoop command would be:
%SQOOP_HOME%\bin\sqoop.cmd list-databases --connect "jdbc:sqlserver://localhost:1433;user=<username>;password=<password>"
Code language: HTML, XML (xml)
The above command would list all the databases present in your server.
2. Using Windows Authentication. This method only works if your SQL Server resides on the same machine on which you are running SQOOP and you have the right permissions.
Your connection string would be:
"jdbc:sqlserver://localhost:1433;integratedSecurity=true"
Code language: JSON / JSON with Comments (json)
And similarly the SQOOP connect statement would be.
%SQOOP_HOME%\bin\sqoop.cmd list-databases --connect "jdbc:sqlserver://localhost:1433;integratedSecurity=true"
Code language: PHP (php)
Once connected to the RDBMS via the connection string, there are a lot of functions in sqoop which you can explore. We see all those in a separate blog post. Stay Tuned!!
Possible errors
- Failed to login to user.
In this case double check your username and password. If those are correct then run your command prompt as administrator (permission issues).
2. Missing sqljdbc_auth.dll file.
FIX: Go to the following page and download the sqlserver jdbc connector. Unzip the file and go to the directory.
/sqljdbc_6.0/enu/auth/x64(or x86 depending on your sytem)/sqljdbc_auth.dll
Copy this DLL file to the following path.
%HADOOP_HOME%\bin\
3. Sqoop Import Failed “Java.Lang.Classnotfoundexception: Org.Apache.Commons.Lang.Stringutils”
FIX: Download the commons-lang-2.6.jar file from the following page. Paste this file to the following path.
%SQOOP_HOME%\lib