This article details my experience extracting and transforming spreadsheet data into a staging area and then storing it in a database. I went further to enable downloads and the presentation of a dashboard.
One of my clients needed a Business Intelligence dashboard to analyze their data and make decisions on how to improve.
I had the data in a Microsoft Excel sheet, and normally, I would build some automations (macros) in Excel that would transform the data and prepare it for loading into Microsoft Power BI for dashboard building.
The transformations ended up being quite tricky (for me): I had to strip out some data from within cells and derive some unique references. I also wanted to let my users extract the data they needed and explore it using tables.
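As a rough illustration of that kind of cell-level cleanup, here is a minimal pandas sketch. The column names, the `REF-` code format, and the `" | "` separator are all made up for the example; the real sheet's layout differs.

```python
import pandas as pd

# Hypothetical raw data: each cell mixes a reference code and a description.
raw = pd.DataFrame({
    "entry": [
        "REF-1042 | Widget order",
        "REF-1042 | Widget order",
        "REF-2071 | Gadget order",
    ],
})

# Strip the reference code out of the combined cell...
raw["ref"] = raw["entry"].str.extract(r"(REF-\d+)")
raw["description"] = raw["entry"].str.split("|").str[1].str.strip()

# ...then keep one row per unique reference.
unique_refs = raw.drop_duplicates(subset="ref").reset_index(drop=True)
```

The same `str.extract` / `drop_duplicates` pattern covers most "pull a key out of a messy cell" jobs, whatever the actual delimiter is.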
I looked at Snowflake, but it was way more than I needed, as the sheets only had about 5k rows in them.
After a bit of research, I decided Python with the NumPy and pandas libraries was the best fit for the data transformations (I love dataframes now; much less awkward than the old arrays), and that the Streamlit library was best for presenting the dashboards. The one downside was that I wasn't going to get the authentication layer I needed out of the box, so I would have to use the Flask library for that. More on that in another post.
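The dataframes-over-arrays point deserves a quick illustration. With a bare NumPy array you juggle positional indices; a labeled DataFrame reads like the question you are asking. The column names and values here are invented for the comparison:

```python
import numpy as np
import pandas as pd

# With a bare NumPy array, you must remember which column is which.
arr = np.array([[10, 1], [20, 1], [30, 2]])  # col 0 = amount, col 1 = region id
totals_by_region = {r: arr[arr[:, 1] == r, 0].sum() for r in np.unique(arr[:, 1])}

# The same summary with a labeled DataFrame is self-describing.
df = pd.DataFrame(arr, columns=["amount", "region"])
summary = df.groupby("region")["amount"].sum()
```

Both produce the same totals, but the DataFrame version survives column reordering and is far easier to read back months later.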
As I was working, I decided to store my transformed data in a database so I could retrieve the summarized data and present it using the available Streamlit components. I also exported the processed data to .csv files that users could download from the Streamlit tables.
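A minimal sketch of that store-and-export step, using SQLite as a stand-in for the database; the table name, column names, and values are illustrative, not from the real project:

```python
import sqlite3
import pandas as pd

# Transformed data ready for the dashboard (illustrative values).
summary = pd.DataFrame({"region": ["North", "South"], "total": [120, 95]})

# Store it in a database so the dashboard can query the summarized data.
conn = sqlite3.connect(":memory:")  # a file path like "dashboard.db" in practice
summary.to_sql("sales_summary", conn, if_exists="replace", index=False)

# Also produce .csv output so users can download the processed data.
csv_bytes = summary.to_csv(index=False).encode("utf-8")

# The dashboard can then read the stored data back for display.
loaded = pd.read_sql("SELECT * FROM sales_summary", conn)
conn.close()
```

On the Streamlit side, `st.dataframe(loaded)` handles the table display and `st.download_button` can serve the CSV bytes for download.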
I've got more to share about this whole process, but for now you'll have to come back another time to see how I made it all work together.
My starting point was this video from Sasha.