Insert pandas dataframe into Mongodb
Pandas is the most commonly used open-source Python library for data manipulation. It provides high-performance data manipulation and analysis through its powerful data structures, chiefly the DataFrame.
Here we use the pandas library to insert a DataFrame into MongoDB. I use the yfinance library to download Yahoo Finance data for a ticker and then save that data into MongoDB.
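First, we import all the necessary libraries:
from pandas_datareader import data as pdr # pandas-datareader's remote data API
import pandas as pd
from pymongo import MongoClient # official MongoDB driver for Python
import yfinance as yf
yf.pdr_override() # make pdr.get_data_yahoo() fetch its data through yfinance
Next, we download daily data for the SPY ticker from Yahoo Finance: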
# download dataframe
data1 = pdr.get_data_yahoo("SPY", start="2017-01-01", end="2017-01-15")
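Now, we connect to MongoDB.
Here I am connecting to a local MongoDB server on the default port 27017 and then creating a database named `finance`. MongoDB stores documents in collections (the counterpart of a table), and I name mine `mycollection`. MongoDB creates the database and the collection lazily, when the first document is inserted.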
# Step 1: Connect to MongoDB - Note: Change connection string as needed
myclient = MongoClient("mongodb://localhost:27017/")
mydb = myclient["finance"]
mycol = mydb["mycollection"]
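If your server requires authentication, the username and password go into the connection string, and characters such as `@`, `:` or `/` inside them must be percent-escaped or the URI will be parsed incorrectly. The sketch below uses a hypothetical helper, `build_mongo_uri`, and made-up credentials; the escaping itself is done by the standard library's `urllib.parse.quote_plus`, which is the approach the PyMongo documentation suggests.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host="localhost", port=27017):
    """Build a mongodb:// connection string with percent-escaped credentials (hypothetical helper)."""
    # quote_plus escapes '@', ':' and '/' so they are not read as URI delimiters.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"


# Made-up credentials, for illustration only.
uri = build_mongo_uri("analyst", "p@ss:word")
print(uri)
```

The resulting string can then be passed to `MongoClient` in place of the plain localhost URI.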
Now that you have the data (data1), you can insert it into the MongoDB database. Before that, however, you have to convert the DataFrame into a list of dictionaries, one per row. Also, the Date column is the index of the DataFrame, and to_dict("records") leaves the index out, so you have to reset the index first to turn Date back into a regular column.
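To see why the reset matters, here is a small offline sketch. It builds a made-up two-row DataFrame (the prices are invented, not real SPY data) with a `Date` index shaped like `data1`, and compares `to_dict("records")` before and after `reset_index()`.

```python
import pandas as pd

# Invented prices standing in for the Yahoo Finance download; like data1,
# the rows are indexed by a DatetimeIndex named "Date".
frame = pd.DataFrame(
    {"Close": [225.0, 226.5], "Volume": [100, 200]},
    index=pd.DatetimeIndex(["2017-01-03", "2017-01-04"], name="Date"),
)

# The "records" orientation ignores the index, so the dates are lost here.
without_reset = frame.to_dict("records")

# reset_index() returns a copy in which "Date" is an ordinary column,
# so every record keeps its date.
with_reset = frame.reset_index().to_dict("records")

print(without_reset)
print(with_reset)
```

The dates come out as pandas `Timestamp` objects; `Timestamp` is a subclass of `datetime.datetime`, which PyMongo stores as a BSON date.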
# Step 2: Insert Data into DB
data1.reset_index(inplace=True) # move Date from the index into a regular column
data_dict = data1.to_dict("records") # convert to a list of dicts, one per row
mycol.insert_one({"index":"SPY","data":data_dict}) # insert the whole table as one document
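Storing the whole table in one document works well for short date ranges, but a single MongoDB document is capped at 16 MB, and running the script twice stores a second copy. One alternative, sketched here under stated assumptions, is to store one document per row tagged with a `ticker` field. The function names and the `ticker` field are hypothetical; `collection` is assumed to be a PyMongo collection such as `mycol`, whose `delete_many` and `insert_many` methods are used.

```python
import pandas as pd


def dataframe_to_row_documents(ticker, df):
    """Turn a DataFrame with a Date index into one dict per row, tagged with the ticker."""
    # reset_index() returns a copy, so the caller's DataFrame is left unchanged.
    records = df.reset_index().to_dict("records")
    for record in records:
        record["ticker"] = ticker
    return records


def save_rows(collection, ticker, df):
    """Replace any rows already stored for `ticker` with the rows in `df`."""
    # Remove earlier rows for this ticker so re-running the script does not duplicate data.
    collection.delete_many({"ticker": ticker})
    docs = dataframe_to_row_documents(ticker, df)
    if docs:  # insert_many rejects an empty list
        collection.insert_many(docs)
    return len(docs)
```

With this layout the data can be read back with `pd.DataFrame(list(mycol.find({"ticker": "SPY"}, {"_id": 0}))).set_index("Date")`, and MongoDB can filter rows by date on the server instead of returning the whole table.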
With the code above, the data is saved in MongoDB. You can log in to your MongoDB UI to check it; the stored document looks like this:
Now, the question is: how do we load the data from MongoDB back into a pandas DataFrame?
We fetch the document from MongoDB with find_one(), then convert its data field into a DataFrame with pandas. After that, I set "Date" as the index and print the result.
# Step 3: Get data from DB
data_from_db = mycol.find_one({"index":"SPY"})
output_dataframe = pd.DataFrame(data_from_db["data"])
output_dataframe.set_index("Date",inplace=True)
print(output_dataframe)
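The load step above can be wrapped into a small reusable function. This is a sketch with hypothetical names (`document_to_dataframe`, `load_ticker`); `collection` is assumed to be a PyMongo collection such as `mycol`, and documents are assumed to have the `{"index": ..., "data": [...]}` shape used in Step 2. It also handles the case where `find_one()` returns `None` because nothing matched.

```python
import pandas as pd


def document_to_dataframe(document, index_col="Date"):
    """Rebuild a DataFrame from a document shaped like {"index": ..., "data": [records]}."""
    if document is None:
        # find_one() returns None when no document matches the filter.
        raise LookupError("no matching document found")
    frame = pd.DataFrame(document["data"])
    # set_index returns a new DataFrame, so no inplace flag is needed.
    return frame.set_index(index_col)


def load_ticker(collection, ticker):
    """Fetch the stored document for `ticker` and convert it back into a DataFrame."""
    return document_to_dataframe(collection.find_one({"index": ticker}))
```

With the collection from Step 1, `print(load_ticker(mycol, "SPY"))` should print the same table as the code above.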
The output looks like this: