Uncategorized

There are many situations where you might like to get data from a database and into a pandas data frame. The simplest way to is to initialize a dataframe via the pandas read_sql_query method. The ORM of choice for pandas is SQLAlchemy. By installing a few more packages, you can query Redshift data and read that into a dataframe with just a few lines of of Python code.

The Setup

This post assumes you have a number of Python packages already installed, such as pandas, numpy, sqlalchemy and iPython. I’ve used the Python distribution Anacoda, which can be downloaded here. It comes packed full of all the libraries you might need for data analysis.

The Dialects documentation for SQLAlchemy mentions that Redshift is supported through another Python package; this package also depends on a Postgresql driver. So, the additional packages needed for connecting to Redshift are redshift-sqlalchemy and psycopg2. If you already have Anaconda, you can install psycopg2 quickly using conda. For getting redshift-sqlalchemy installed, here are some docs on using conda/pip in conjunction with each other. TBH I just used downloaded the package and did this in my terminal:

Of course, I did this in my default environment. Everything seems to work, but the Anaconda docs are very useful if you have multiple Python environments on your machine.

On To The Data

Start iPython in your terminal (or Notebook if you prefer). Note that the connection string follows the pattern ‘flavor+DBAPI://username:password@host:port/database’. Make sure to change the connection string below to your own!

If we look at the “test” object we see:

Read more

Honestly, the hardest part is authenticating. I’m assuming you’ve already created a project in Google’s Developer Console, turned on the GA API, setup a client_secret credential, and requested access to the real-time API beta:

  1. Copy and paste the URL that is printed from authorize_url
  2. Click “Accept” when prompted; you will be redirected to a page with a long URL
  3. Paste line 25 into your console
  4. Now, input that long URL
  5. Boom, you should be on your way

Make sure to change the hard-coded paths in here to wherever you store you Google Analytics account information and the client secrets you got from the developer console.

A few to-do items:

  • This is a really manual process for getting auth tokens…obviously there is a better way
  • I’d much rather use Python’s requests library
  • Need to parse the blob of JSON that is returned

Here are some pages I found helpful:

Read more