3 minutes
Connect Python Azure SQL DB using Pandas
The other day I wanted to connect pandas
to Azure SQL DB and boy took me longer than I wanted.
In the end I solved my problem. Although, connecting to DBs always involves more complex things like DB drives and OS dependencies the part I struggled the most was with the connection string 😅.
Microsoft, in its documentation, which by the way is good, gives good example and steps on setting up the environnement and how to connect through pyodbc
.
Nevertheless, I wanted to connect using sqlalchemy
.
Both works fine with pandas
but I specifically wanted to connect with sqlalchemy
for reasons I will get into later in this post.
I run my code in my Windows working computer always in WSL (Ubuntu) so I will go step by step on what you need to get your environment running and also a starting code to get you connected.
steps are
- Prepare the environment
- Connect
1. Prepare the environnement
OS dependencies
SQL server driver.
To install in linux the driver “Microsoft ODBC 18” just follow this instructions
for Ubuntu here is the script
if ! [[ "18.04 20.04 21.04" == *"$(lsb_release -rs)"* ]];
then
echo "Ubuntu $(lsb_release -rs) is not currently supported.";
exit;
fi
sudo su
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list > /etc/apt/sources.list.d/mssql-release.list
exit
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18
# optional: for bcp and sqlcmd
sudo ACCEPT_EULA=Y apt-get install -y mssql-tools18
echo 'export PATH="$PATH:/opt/mssql-tools18/bin"' >> ~/.bashrc
source ~/.bashrc
# optional: for unixODBC development headers
sudo apt-get install -y unixodbc-dev
Python environment
for the examples below install pyodbc
sqlalchemy
and pandas
pip install pyodbc SQLAlchemy pandas
2. Connect
Now comes the easy… part… the code
Connecting using pyodbc
import os
import pandas as pd
import pyodbc
server = '<my-server-name>.database.windows.net'
database = '<my-db-name>'
driver= '{ODBC Driver 18 for SQL Server}'
username = os.getenv('SQL_AZURE_USERNAME')
password = os.getenv('SQL_AZURE_PASSWORD')
connection_string = f'DRIVER={driver};SERVER=tcp:{server};PORT=1433;DATABASE={database};UID={username};PWD={password}'
# query system views
with pyodbc.connect(connection_string) as conn:
query = '''
SELECT
[name]
,[database_id]
FROM [sys].[databases];
'''
df = pd.read_sql(sql=query, con=conn)
print(df.head())
This will print to the output the first 10 rows of the query result of the data frame.
It works fine but pandas
will issue the following warning
UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy
Connecting using sqlalchemy
The documentation in SQLAlchemy does not goes great length explaining the connection string with SQL Server DB and after a few stack overflow posts here is a working code.
import os
import pandas as pd
from sqlalchemy import create_engine
server = '<my-server-name>.database.windows.net'
database = '<my-db-name>'
driver= '{ODBC Driver 18 for SQL Server}'
username = os.getenv('SQL_AZURE_USERNAME')
password = os.getenv('SQL_AZURE_PASSWORD')
odbc_params = f'DRIVER={driver};SERVER=tcp:{server};PORT=1433;DATABASE={database};UID={username};PWD={password}'
connection_string = f'mssql+pyodbc:///?odbc_connect={odbc_params}'
engine = create_engine(connection_string)
# query sys.databases view
query = '''
SELECT
[name]
,[database_id]
FROM [sys].[databases];
'''
df = pd.read_sql(query, engine)
print(df.head())
engine.dispose()
It is very similar. You just need to “attach” the same pyodbc string after 'mssql+pyodbc:///?odbc_connect='
for more detail on sql and pandas
the
pandas
documentation is very good.
this is it.
If I had this post on my google search my life would be simpler but I would have learned less 😃