Insert a pandas DataFrame into MongoDB






Pandas is one of the most commonly used open-source Python libraries for data manipulation. It provides high-performance data manipulation and analysis through its data structures.


Here we use pandas to insert a DataFrame into MongoDB. I use the yfinance library to download a dataset for a ticker from Yahoo Finance and save that data in MongoDB.

First, we import all the necessary libraries:

from pandas_datareader import data as pdr
import pandas as pd
from pymongo import MongoClient
import yfinance as yf
yf.pdr_override()

Next, we download the data from Yahoo Finance:

# download dataframe
data1 = pdr.get_data_yahoo("SPY", start="2017-01-01", end="2017-01-15")

Now, we make a connection to MongoDB.

Here I connect to a local MongoDB server on port 27017 and create a database named `finance`. MongoDB stores documents in collections (roughly the equivalent of tables), and I name mine `mycollection`.

# Step 1: Connect to MongoDB - Note: Change connection string as needed
myclient = MongoClient("mongodb://localhost:27017/")
mydb = myclient["finance"]
mycol = mydb["mycollection"]

Now that you have the data (data1), you can insert it into the MongoDB database. Before that, you have to convert the DataFrame into a list of dictionaries, one per row. Also, the Date column is set as the index of the DataFrame, so you have to reset the index before inserting; otherwise the dates are left out of the records.

# Step 2: Insert Data into DB
data1.reset_index(inplace=True) # Reset index so Date becomes a regular column
data_dict = data1.to_dict("records") # Convert to a list of dicts, one per row
mycol.insert_one({"index":"SPY","data":data_dict}) # Insert into DB
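To see what the conversion in Step 2 produces without downloading anything, here is a minimal sketch on a tiny hand-made DataFrame. The prices are made up for illustration and the variable names (`sample`, `flat`, `records`) are not part of the code above.

```python
import pandas as pd

# Hypothetical prices, made up for illustration only
sample = pd.DataFrame(
    {"Open": [100.0, 101.5], "Close": [101.0, 102.25]},
    index=pd.DatetimeIndex(["2017-01-03", "2017-01-04"], name="Date"),
)

# reset_index() turns the "Date" index into an ordinary column
flat = sample.reset_index()

# to_dict("records") returns a list with one dict per row
records = flat.to_dict("records")

print(list(flat.columns))
print(len(records), sorted(records[0].keys()))
```

Each dict in `records` has the keys "Date", "Open" and "Close", which is the shape stored under the "data" key of the MongoDB document.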

With the above code, the data has been saved in MongoDB. You can open your MongoDB UI and check the data; it appears like below:
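Storing everything in one document is fine for a small date range, but MongoDB limits a single document to 16 MB, so a very large DataFrame may not fit. One possible alternative, sketched here under that assumption, is to store one document per row with `insert_many`. The helper names `frame_to_row_documents` and `insert_rows` are hypothetical, and the function expects the DataFrame as downloaded, with Date still set as the index.

```python
import pandas as pd


def frame_to_row_documents(frame, ticker):
    """Return one document per row, each tagged with the ticker."""
    flat = frame.reset_index()  # Date index becomes a regular column
    # The ticker is added last so it always ends up under the "index" key
    return [{**row, "index": ticker} for row in flat.to_dict("records")]


def insert_rows(collection, frame, ticker):
    """Insert one document per row; return how many were sent."""
    docs = frame_to_row_documents(frame, ticker)
    if docs:  # pymongo's insert_many rejects an empty list
        collection.insert_many(docs)
    return len(docs)
```

With the collection from Step 1 this could be called as `insert_rows(mycol, data1, "SPY")`, provided data1 has not already been reset in place. Reading the rows back would then use `find({"index": "SPY"})` instead of `find_one()`.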



Now, the question is how to load the data from MongoDB back into a pandas DataFrame.

We get the data from MongoDB using find_one(), then convert it into a DataFrame using pandas. After that, I set “Date” as the index and display it on the screen.

# Step 3: Get data from DB
data_from_db = mycol.find_one({"index":"SPY"})
output_dataframe = pd.DataFrame(data_from_db["data"])
output_dataframe.set_index("Date",inplace=True)
print(output_dataframe)
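Note that find_one() returns None when no document matches the filter, so indexing the result directly would fail in that case. The sketch below wraps Step 3 in a small function with that check. The name `document_to_frame` is hypothetical, and `stored_doc` is a hand-made stand-in for what find_one() would return, with made-up values.

```python
from datetime import datetime

import pandas as pd


def document_to_frame(doc):
    """Rebuild a Date-indexed DataFrame from a stored document."""
    if doc is None:  # find_one() found nothing for the filter
        raise LookupError("no matching document in the collection")
    frame = pd.DataFrame(doc["data"])
    return frame.set_index("Date")


# Stand-in for mycol.find_one({"index": "SPY"}); values are made up
stored_doc = {
    "index": "SPY",
    "data": [
        {"Date": datetime(2017, 1, 3), "Close": 101.0},
        {"Date": datetime(2017, 1, 4), "Close": 102.25},
    ],
}

restored = document_to_frame(stored_doc)
print(restored)
```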

The output looks like this:










