06/09/2021
Big data analysis is growing fast, with more data arriving every second and rising demand to get something meaningful out of it. If you deal with huge volumes of data, you have probably realized that pandas alone might not be able to handle the processing. Spark comes to our rescue in such situations. Apache Spark is an open-source unified analytics engine for large-scale data processing. The project has advertised workloads running up to 100x faster than Hadoop MapReduce when data is held in memory. It can be used interactively from the Scala, Python, R, and SQL shells.
We have collated a list of common analysis operations done on datasets, in pandas as well as Spark, so that you don't have to google them every time. Instead, you can refer to this summarized version. We have also created a short demo using the "Placement dataset of a university" taken from Kaggle.
Please have a look at our blog and let us know your thoughts. We would love to hear about your experiences with Spark. Happy Learning!!
https://pythonminiprogramseries.blogspot.com/2021/09/placement-data-analysis-using-spark.html
21/01/2021
We are here again with the much-awaited part 4 of the tutorial "Data Analysis and Model Building of health insurance dataset". Thank you for reaching out to us with your lovely responses. We are thrilled to share that the code repository has been forked 4 times and cloned 20+ times. In this last tutorial, we discuss how to try and test 3 different models on our dataset. Let us know in the comments below how this tutorial has helped you. Also, if you need tutorials on any specific topic in Python or Data Science, let us know. We will try our best to make them for you.
Please visit our blog to know more. Happy learning!!
Mini Program 17 - Health Insurance Data Analysis & Model building using Python - Part 4
After exploratory data analysis and building hypothesis, we move to predictive model building stage where we try and test many models on the...
17/01/2021
In industry-level projects, we often have to deliver quality work within limited resources, cost being an important factor. Budget constraints push us towards cost-friendly configurations and towards reducing the execution time of our applications. As developers, this drives us to look for an optimised version of our code that can meet the technical constraints.
Please have a look at my article on how to measure code performance, so that we optimize the parts that actually need it. Let me know in the comment section below about your experiences and challenges with the performance of applications that you have developed or worked with.
Python Code Performance Measurement - Measure the right metric to optimize better!
This article was published as a part of the Data Science Blogathon. Introduction Performance optimization is an important concern in any data science project. Since most of the project runs on cloud platforms, there is always a cost factor associated with computational resources. We aim to write pro...
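To give a flavour of what measuring before optimizing can look like, here is a minimal sketch using Python's standard `timeit` module. The two functions and the `best_time` helper are hypothetical names made up for this illustration; they are not taken from the article, which may measure other metrics as well.

```python
import timeit


def sum_with_loop(n):
    """Sum of squares using an explicit loop (hypothetical example workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_with_builtin(n):
    """Same result using the built-in sum() and a generator expression."""
    return sum(i * i for i in range(n))


def best_time(func, n, repeat=5, number=100):
    """Return the best average time per call, in seconds.

    timeit.repeat() runs `number` calls, `repeat` times over. Taking the
    minimum reduces the noise caused by other processes on the machine.
    """
    timings = timeit.repeat(lambda: func(n), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    for func in (sum_with_loop, sum_with_builtin):
        seconds = best_time(func, 10_000)
        print(f"{func.__name__}: {seconds * 1e6:.1f} microseconds per call")
```

The numbers will differ from machine to machine, so compare the two variants on the same system rather than trusting absolute values.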
30/11/2020
I have connected with lots of data science enthusiasts over the years who want to do something in data science. However, even after doing MOOCs or online courses, they are still lost when it comes to analyzing something out of the box or taking random datasets from a public library for analysis. I have been asked many times how to start a project on your own, especially when you do not have a guide on how to start and what to do next. I have always maintained my stand: 'You cannot achieve something big in a single day. Everything takes time.' So will your data science project. Plan to spend 2 weeks of disciplined effort on an end-to-end implementation of your idea. If you are nervous about taking your first step, here is a document that you can read and follow to take up your first project.
Nervous about your first data science project! Here are 6 easy steps to get started!
This article was published as a part of the Data Science Blogathon. Introduction We all have been hearing about the buzz word -
27/11/2020
Keyword extraction is a text analysis technique. It helps make text concise and saves us time. One popular concept for keyword extraction is the TF-IDF score. Check the article below for a simplified Python implementation of the concept.
Words that matter! A Simple Guide to Keyword Extraction in Python
This article was published as a part of the Data Science Blogathon. Introduction Unstructured data contains a plethora of information. It is like energy when harnessed, will create high value for its stakeholders. A lot of work is already being done in this area by various companies. There is no dou...
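For readers who want a quick feel for the idea, here is a minimal sketch of TF-IDF keyword scoring in plain Python. It assumes the textbook formula (term frequency times log(N / document frequency)) and a very naive tokenizer; the function names `tokenize` and `top_keywords` are hypothetical, and the article's own implementation may well differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only (naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(documents, doc_index, k=3):
    """Return the k highest-scoring TF-IDF terms of one document.

    Assumes the textbook definition:
        tf  = count of the term in the document / number of tokens in it
        idf = log(number of documents / documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    tokens = tokenized[doc_index]
    scores = {}
    for term, count in Counter(tokens).items():
        tf = count / len(tokens)
        idf = math.log(n_docs / doc_freq[term])
        scores[term] = tf * idf

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "the cat chased the dog",
    ]
    print(top_keywords(docs, 0))
```

A word such as "the", which appears in every document, gets an IDF of zero and drops to the bottom, while a word unique to one document rises to the top.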
02/11/2020
Thank you all for the amazing response to the first part of the tutorial series on 'Data Analysis and Model building on health insurance dataset'. So, here we are with part 2 of the tutorial, where we discuss in detail a few anomalies in the data and the steps taken to clean it. We also show exploratory data analysis on the numerical variables to uncover hidden patterns in the data.
You can read the details on the blog and watch the video to learn about the implementation. If you liked what we built, subscribe to our channel and like the video. Happy learning!!
Mini Program 15 - Health Insurance Data Analysis & Model building using Python - Part 2
Data Cleaning and Exploratory Data Analysis are crucial parts in any data science project. It is rightly said that if we give junk input to ...
29/10/2020
Hi Friends,
We have been away for quite some time now due to personal reasons. We are back again with exciting mini projects in Python. We have integrated the development of the projects with the data science domain for more fun.
Also, we are now on YouTube - https://www.youtube.com/channel/UCqPzucQFiyeNBZJxto4ASGg?view_as=subscriber
Please show some support by subscribing to the channel, and like the video if you enjoyed watching what we built.
Watch the video till the end to learn more about data analysis using Python in easy-to-understand language. Happy Learning!!
Tech-o-phile - A mini project hub
27/10/2020
With data growing so fast in today's world, especially unstructured data, gathering it and converting it into a meaningful format becomes imperative. A lot can be done in the web-scraping area. Here's an article on Analytics Vidhya which details the steps using Selenium. Enjoy and happy learning!
Beginners Guide to Web Scraping Using Selenium in Python!
Introduction By 2025, the world's data will grow to 175 Zettabytes - IDC The overall amount of data is growing and so is the unstructured data. It is estimated that about 80% of data in the universe constitutes unstructured data. Unstructured data is the data that doesn't fit into any data model. Th...
14/11/2019
Have you used BeautifulSoup to scrape websites? Well, we have come across another package in Python that can do the same thing: Selenium. Try it out and see for yourself.
Follow us for more such exciting content. Check out the video on our blog and the GitHub link for the complete code illustration.
Mini Program 12 - Web-Scraping using Selenium
Do you know how easy it is to scrape a website? I am sure most of us have used beautifulsoup to web-scrape. But this time, we came ac...
16/10/2019
We recently had to read a huge text file and used the opportunity to learn more about Python. We have written a simple script that splits a text file into the desired number of parts.
Follow us for more such exciting content. Check out the video on our blog and the GitHub link for the complete code illustration.
Mini Program 11 - Creating File Splitter in Python
Have you ever faced the wrath of large files which are not possible to read. Yes! It's so annoying. Recently came across this situati...
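As a rough idea of how such a splitter can work, here is a minimal sketch that splits a text file by line count. It is not the code from the blog post: the function name `split_text_file`, the `_partN` naming of the output files, and the choice to split on line boundaries are all assumptions made for this illustration.

```python
from pathlib import Path


def split_text_file(source_path, num_splits, output_dir=None):
    """Split a text file into `num_splits` parts of nearly equal line count.

    The file is read line by line, so it never has to fit in memory.
    Output names like `data_part1.txt` are a hypothetical convention.
    Returns the list of paths that were written.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")

    source = Path(source_path)
    out_dir = Path(output_dir) if output_dir else source.parent
    out_dir.mkdir(parents=True, exist_ok=True)

    # First pass: count the lines without loading the whole file.
    with source.open("r", encoding="utf-8") as handle:
        total_lines = sum(1 for _ in handle)

    # The first `remainder` parts get one extra line each.
    lines_per_part, remainder = divmod(total_lines, num_splits)

    written = []
    with source.open("r", encoding="utf-8") as handle:
        for part in range(num_splits):
            size = lines_per_part + (1 if part < remainder else 0)
            part_path = out_dir / f"{source.stem}_part{part + 1}{source.suffix}"
            with part_path.open("w", encoding="utf-8") as out:
                for _ in range(size):
                    out.write(handle.readline())
            written.append(part_path)
    return written
```

For example, `split_text_file("big_log.txt", 4)` would write four files next to a (hypothetical) `big_log.txt`. If the file has fewer lines than the requested number of splits, the trailing parts are simply empty.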
02/10/2019
Hey there!! How is it going with Python? Knowddy and Hairbrow have brought something very exciting for you to catch up with Python.
In Google Assistant, just type "Check your Python Knowledge" and there is a surprise waiting for you. Just play along!!
Congratulations, Prachi, on getting your Google Flashcard published.