MongoDB Pipeline

MongoDB’s Aggregation Pipeline: make your life easier… or at least less difficult

The following is from Avinash Kaza’s article: Business Intelligence Platform: Tutorial Using MongoDB Aggregation Pipeline

Found here:


Using data to answer interesting questions is what researchers are busy doing in today’s data-driven world. Given huge volumes of data, the challenge of processing and analyzing it is a big one, particularly for statisticians or data analysts who do not have the time to invest in learning business intelligence platforms or the technologies provided by the Hadoop ecosystem, Spark, or NoSQL databases that would help them analyze terabytes of data in minutes.

The norm today is for researchers or statisticians to build their models on subsets of data in analytics packages like R, MATLAB, or Octave, and then give the formulas and data processing steps to IT teams who then build production analytics solutions.

One problem with this approach is that if the researcher realizes something new after running his model on all of the data in production, the process has to be repeated all over again.

What if the researcher could work with a MongoDB developer and run his analysis on all of the production data and use it as his exploratory dataset, without having to learn any new technology or complex programming languages, or even SQL?

If we use MongoDB’s Aggregation Pipeline and MEAN effectively we can achieve this in a reasonably short time. Through this article and the code that is available here in this GitHub repository, we would like to show how easy it is to achieve this.
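To give a rough idea of what an aggregation pipeline looks like, here is a minimal sketch. It is not taken from Avinash’s tutorial: the field names (status, region, amount) and the collection name (orders) are made up for illustration. The stage operators $match, $group and $sort are standard MongoDB ones.

```javascript
// Hypothetical pipeline: the field names below are illustrative
// assumptions, not fields from the tutorial's dataset.
const pipeline = [
  // Stage 1: keep only completed orders.
  { $match: { status: "complete" } },
  // Stage 2: group by region and add up the amounts in each group.
  { $group: { _id: "$region", total: { $sum: "$amount" } } },
  // Stage 3: show the largest totals first.
  { $sort: { total: -1 } },
];

// In the MongoDB shell this array would be passed to aggregate(), e.g.
//   db.orders.aggregate(pipeline)
// where `orders` is a hypothetical collection name.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> "));
```

Each stage takes the documents produced by the stage before it, which is why the order of the array matters.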


For more, check out the entirety of Avinash’s tutorial.





SQL vs NoSQL. Part Three.

In previous posts I talked about SQL and NoSQL, and I want to go into a little more detail (while keeping it simple) about what makes them different.

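Scalability>>> Think making big things small. SQL databases usually scale vertically: the data typically lives on one server, so growing means buying a bigger machine (expensive!). NoSQL databases scale horizontally: the data is spread across many servers (many servers == ok).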
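Here is a toy sketch of the “many servers” idea: each piece of data is assigned to a server based on its key. The server names and the hash function are invented for illustration. Real NoSQL databases handle this for you and do it far more cleverly.

```javascript
// Hypothetical server names; a real cluster manages this list itself.
const servers = ["server-0", "server-1", "server-2"];

// Toy hash: add up the character codes of the key.
function toyHash(key) {
  let sum = 0;
  for (const ch of key) {
    sum += ch.codePointAt(0);
  }
  return sum;
}

// Pick the server responsible for a given key.
function serverFor(key) {
  return servers[toyHash(key) % servers.length];
}

console.log(serverFor("merlot"), serverFor("riesling"));
```

The same key always lands on the same server, so the data can be found again without searching every machine.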

Schema>>> Technically, schema means a representation of some model. In programming land, it refers to the structure of a database. Because you can’t see a database (at least I hope you can’t), you have to think about how that structure is represented. In SQL, the schema is fixed: columns must be decided ahead of time, and you have to put data in every column. Remember that wine shelf? You can’t really add a new column to your shelf after you’ve built it… it will probably look like all the images when you google “shelf fail.”

Shelf Fail

I don’t know why, but this shelf is kind of cute.

Also, you have to put a bottle in every slot. Someone’s going to be a happy wine collector.

NoSQL deals with schema in a very different way. It just says “Nope.” and walks away. You can add (or leave out) anything you want, anytime you want. Now that’s flexibility.

Data>>> Finally, let’s get to the data. In SQL, each column in a row holds exactly one value. For example, a row containing information about a bottle of wine might have “Year”, “Location”, “Winery”, etc. You can’t have two years for a bottle of wine, or two locations. In NoSQL, that’s A-OK. You can have two wineries (maybe it was a collaboration?) or no wineries. If that’s what you want.
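To make that concrete, here is a small sketch of what such flexible documents could look like. The wine names and field names are invented for illustration, and this is plain JavaScript rather than a real database.

```javascript
// Hypothetical wine documents; every name and field here is illustrative.
const wines = [
  { name: "House Red", year: 2012, wineries: ["Alpha Cellars"] },
  // A collaboration: two wineries in one document.
  { name: "Joint Blend", year: 2014, wineries: ["Alpha Cellars", "Beta Vineyards"] },
  // No winery recorded at all, plus an extra field the others lack.
  { name: "Mystery Bottle", notes: "gift, label missing" },
];

// Count the wineries on a document, treating a missing field as empty.
function wineryCount(doc) {
  return Array.isArray(doc.wineries) ? doc.wineries.length : 0;
}

const counts = wines.map(wineryCount);
console.log(counts);
```

The flip side of this flexibility is that code reading the data has to cope with missing fields, which is what wineryCount does here.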

More reading.

Next post I’ll be going into more detail about NoSQL and specifically MongoDB.



What follows a really bad movie about databases?


This is SQL.
wine bottles

This is NoSQL.
wine bottles pile

As discussed in my previous post, a SQL database is a relational (tabular) database, one that looks like an Excel sheet or an empty shelf. NoSQL is its evil twin sister.

The lovable evil twin sister.

People like NoSQL for its flexibility. You can only fit one wine bottle per shelf slot using SQL, but with NoSQL you can throw those wine bottles in a pile and it’s A-OK.

Also some people believe it is faster… but there are a lot of mixed opinions on this…

Let’s look for our favorite red wine again…

db.shelf.find( { "taste": "delicious", "dryness": "dry", "color": "red" } )

A lot shorter, isn’t it?
You don’t find your wine by looking in its proper cubby in its designated row and column; you search by keys.
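If you are curious what “searching by keys” boils down to, here is a rough sketch in plain JavaScript. It only imitates simple equality matching on a made-up pile of bottles. It is not how MongoDB implements find(), and findByKeys is a hypothetical helper name.

```javascript
// Hypothetical pile of wine documents.
const shelfDocs = [
  { name: "Bottle A", taste: "delicious", dryness: "dry", color: "red" },
  { name: "Bottle B", taste: "delicious", dryness: "sweet", color: "white" },
  { name: "Bottle C", taste: "okay", dryness: "dry", color: "red" },
];

// Return the documents whose fields equal every key/value pair in the query.
function findByKeys(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([key, value]) => doc[key] === value)
  );
}

const matches = findByKeys(shelfDocs, {
  taste: "delicious",
  dryness: "dry",
  color: "red",
});
console.log(matches.map((doc) => doc.name));
```

Only the bottle that matches all three keys comes back; where it sits in the pile does not matter.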

As you can imagine, there are pros and cons to this. A lot of companies are still reluctant to embrace NoSQL. (See other article)

Here’s some more reading. If you have some time to kill.

Take care out there. There’s wild Pokemon.

What follows a movie about databases?

the SQL.

…I don’t get it.

It’s pronounced SEE-QUILL. Like “sequel.”

And it is…

a database.

Well…I vaguely understand what that means, but I’m a visual person.

Ok. I got it.

A database is like an empty shelf. We’re going to go with a wine shelf. Because people like wine.

I don’t.

Well, that’s good for you. Anyway, you use a shelf to store things until you need them. A database works the same way. There are different types of databases, and SQL is the language used to talk to relational databases. Which means the data looks like an Excel sheet. Or an empty shelf.

empty wine shelf

How am I supposed to remember that?

Just remember an empty shelf is what you’ll have when your relations come over and drink all your wine. As with any wine shelf, you can only fit one bottle of wine in any given space.

wine bottles

Unless you have bottles of ice wine. In that case you are very lucky.

SQL databases work the same way. You have one piece of data in each slot, and to find them you will use a key, or search by column or row. Here is how you would get a delicious red wine from our “shelf” database:

SELECT wine FROM shelf
WHERE taste = 'delicious'
AND dryness = 'dry'
AND color = 'red';

As you can see, it’s pretty easy to understand. If only we could search for our favorite bottle of wine in real life as easily…

So why should I care?

Despite the fact that NoSQL databases (another type we will get to later) are rising in popularity, SQL still reigns. 79 percent of databases are relational databases. So next time you upload some cat photos or sign up for that website you know you will never use, think about where that data is going.

You have my attention.

If you are interested in learning more about SQL or *gasp* actually learning it, here’s a tutorial. And have fun uploading those cat photos.

Thank you.

Delving into the world of MongoDB

Today I played with MongoDB a bit, and I am planning on building some things in the next week.

It was a busy day, so I haven’t accomplished much so far.

I’m using Udacity’s Data Wrangling with MongoDB. It’s the first course I’ve taken on their platform, so I’ll write a little review afterwards.

That’s it for today. Here’s to a more productive day tomorrow!