What follows a movie about databases?

The SQL.

…I don’t get it.

It’s pronounced SEE-QUILL. Like “sequel.”

And it is…

a language for talking to databases.

Well…I vaguely understand what that means, but I’m a visual person.

Ok. I got it.

A database is like an empty shelf. We’re going to go with a wine shelf. Because people like wine.

I don’t.

Well, that’s good for you. Anyway, you use a shelf to store things until you need them. A database works the same way. There are different types of databases. SQL is the language you use with relational databases, which store data in tables that look like an Excel spreadsheet. Or an empty shelf.

empty wine shelf

How am I supposed to remember that?

Just remember an empty shelf is what you’ll have when your relations come over and drink all your wine. As with any wine shelf, you can only fit one bottle of wine in any given space.

wine bottles

Unless you have bottles of ice wine. In that case you are very lucky.

Relational databases work the same way. Each slot (where a row meets a column) holds one piece of data, and to find it you use a key or search by column or row. Here is how you would get a delicious red wine from our “shelf” database:

SELECT wine FROM shelf
WHERE taste = 'delicious'
AND dryness = 'dry'
AND color = 'red';

As you can see, it’s pretty easy to understand. If only we could search for our favorite bottle of wine in real life as easily…
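If you want to try the query yourself, here is a minimal sketch using Python’s built-in sqlite3 module. The shelf table, its columns, and the bottles in it are all made up to match the example; the slot column is the key, so each slot can hold only one bottle.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical "shelf" table. Each row is one slot on the shelf; the
# PRIMARY KEY on slot means a slot can hold only one bottle.
conn.execute(
    """
    CREATE TABLE shelf (
        slot    INTEGER PRIMARY KEY,
        wine    TEXT,
        taste   TEXT,
        dryness TEXT,
        color   TEXT
    )
    """
)

# Made-up bottles, for illustration only.
bottles = [
    (1, "Cabernet Sauvignon", "delicious", "dry", "red"),
    (2, "Moscato", "delicious", "sweet", "white"),
    (3, "Boxed Merlot", "questionable", "dry", "red"),
]
conn.executemany("INSERT INTO shelf VALUES (?, ?, ?, ?, ?)", bottles)

# The query from the post. SQL string literals use single quotes.
rows = conn.execute(
    """
    SELECT wine FROM shelf
    WHERE taste = 'delicious'
      AND dryness = 'dry'
      AND color = 'red'
    """
).fetchall()
print(rows)  # [('Cabernet Sauvignon',)]

# Putting a second bottle in slot 1 fails: one bottle per slot.
slot_taken = False
try:
    conn.execute(
        "INSERT INTO shelf VALUES (1, 'Pinot Noir', 'delicious', 'dry', 'red')"
    )
except sqlite3.IntegrityError as err:
    slot_taken = True
    print("Slot already taken:", err)
```

Only the Cabernet matches all three conditions: the Moscato is sweet and white, and the Merlot fails the taste test.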

So why should I care?

Although NoSQL databases (another type we will get to later) are rising in popularity, SQL still reigns: 79 percent of databases are relational databases. So next time you upload some cat photos or sign up for that website you know you will never use, think about where that data is going.

You have my attention.

If you are interested in learning more about SQL or *gasp* actually learning it, here’s a tutorial. And have fun uploading those cat photos.

Thank you.


Streamgraphs are pretty, but can you understand them?

Take a look at this:




Is it a digital representation of marble art?
It kind of looks like it…
but actually it’s part of Nicolas Garcia Belmonte’s streamgraph showing the number of tweets during the 2012 European football tournament.





Click to read more about it here.


A what?

What is a streamgraph? Basically, it’s a stacked area graph (a fancy line graph, usually with a lot of colors, for displaying a whole lot of quantities at once). See the example below.



This fun visualization by Jure Leskovec, Lars Backstrom and Jon Kleinberg (check them out here) is a flashback to the 2008 presidential campaign. It shows the rise and fall in popularity of memes during that time.


Then you take that graph, center it on a horizontal axis instead of stacking it up from the bottom, and you have this:



This is the most famous example of a streamgraph: a chart by a team at the New York Times showing the ebb and flow of box office revenue. Plus, it’s pretty.
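The “center it on an axis” step takes only a few lines. This is a minimal sketch assuming the simplest symmetric baseline (what D3 calls stackOffsetSilhouette): at each time step, the stack starts half its total height below zero. The function name and data are hypothetical, and the NYT chart may well use a more elaborate baseline that also reduces wiggle.

```python
def stream_layers(series):
    """Return (bottom, top) boundaries for each layer of a streamgraph.

    series: list of layers, each a list of non-negative values, one per
    time step. All layers must have the same length.
    """
    n_steps = len(series[0])
    layers = [[] for _ in series]
    for t in range(n_steps):
        total = sum(layer[t] for layer in series)
        # Symmetric baseline: start half the total height below the axis,
        # so the stack is centered on y = 0 instead of sitting on it.
        y = -total / 2
        for i, layer in enumerate(series):
            bottom, y = y, y + layer[t]
            layers[i].append((bottom, y))
    return layers


# Hypothetical data: three quantities over four time steps.
data = [
    [1, 2, 3, 2],
    [2, 2, 1, 0],
    [1, 0, 2, 2],
]
result = stream_layers(data)
for i, bounds in enumerate(result):
    print(f"layer {i}: {bounds}")
```

Replacing `-total / 2` with `0` gives an ordinary stacked area graph like the meme chart; the only difference between the two is where each column of the stack begins.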


And there’s always a “but”

Streamgraphs are beautiful, but they have come under fire for their readability. Especially when you come across ones like this:


***Eric Rodenbeck (the creator) made these as a prototype and was planning to change the colors. Also, please check out his blog here.




Eric Rodenbeck



This streamgraph highlights not only the importance of color choice but also a potential problem with streamgraphs themselves. With that much visual information, how do people know what to look at? Only when you slice it down can you decipher what each color refers to.



Streamgraph 2




So do streamgraphs offer a good way to convey information? It depends on two factors: the target audience (how much time people are willing to spend interacting with your visualization) and the amount of information the creator wants to show (if you’re doing a time series analysis with a lot of quantities, a streamgraph may be the way to go).


Streamgraphs have immense potential in their digital form, as long as all that information doesn’t get lost in the stream.
I’m sorry, I had to. Also, check out more reading here.


Delving into the world of MongoDB

Today I played with MongoDB a bit, and I’m planning to build some things in the next week.

It was a busy day, so I haven’t accomplished much so far.

I’m using Udacity’s Data Wrangling with MongoDB. It’s the first course I’ve taken on their platform, so I’ll write a little review afterwards.

That’s it for today. Here’s to a more productive day tomorrow!




The purpose of this blog is:
A: A purely technical blog dedicated to publishing insights about data science.
B: A blog where I discuss interesting articles that I find.
C: Daily musings about my life
D: An undecided mix of all of the above

Unsurprisingly, if you chose “all of the above,” you are correct. (If it’s an option, there’s a 70 percent chance it’s the correct answer.)
This is also a place for me to be honest about my successes and my struggles. This is a never-ending journey for me, as I’m sure it also is for many of you. This is also the first blog I’ve regularly kept in a very long time, so it will take some practice to get the hang of writing again.

“Maybe stories are just data with a soul.”

-Brené Brown

I look forward to writing these stories together.