
Quitting a well-paying career to follow a passion is one of life's difficult decisions. As humans, we are tied to many things that discourage us from switching careers and keep us feeling secure on our current path. But to follow a dream, one needs to step out of one's comfort zone and explore. My journey to pursue my interest in and passion for data science started at the end of 2016. Until then, I had never heard of the field.

An artist must not train only his eyes but also his soul

The above…

This is a simple guide to using boto3 to perform various operations on an S3 bucket.

How to use PYTHON to get data, upload data, and search for an object?

If these questions puzzle you, then this post is for you. Is there an easy way to do this? When I first started using Amazon S3 with Python, I found several different approaches, which confused me. Here I present the easiest and most straightforward way, the one I use now.

What do you need?

You need something called “Boto3”. What is it?

Boto3 is a Python SDK/library…
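The three operations named above (uploading data, getting data, and searching for an object) can be sketched as follows. This is a minimal illustration, not necessarily the approach the full post takes: the helper names are made up for this example, the bucket and key names in the comments are hypothetical, and credentials are assumed to be configured already (for example through environment variables or `~/.aws/credentials`).

```python
def make_client():
    """Create an S3 client. boto3 is imported here so the helpers below
    can also be exercised with a stand-in client object."""
    import boto3
    return boto3.client("s3")


def upload(s3, local_path, bucket, key):
    """Upload a local file to s3://bucket/key."""
    s3.upload_file(local_path, bucket, key)


def download(s3, bucket, key, local_path):
    """Download s3://bucket/key to a local file."""
    s3.download_file(bucket, key, local_path)


def find_keys(s3, bucket, prefix=""):
    """Return every object key in the bucket that starts with prefix.

    A paginator is used because a single list_objects_v2 call
    returns at most 1000 keys.
    """
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Example usage (hypothetical bucket and file names):
#   s3 = make_client()
#   upload(s3, "report.csv", "my-bucket", "reports/report.csv")
#   print(find_keys(s3, "my-bucket", prefix="reports/"))
#   download(s3, "my-bucket", "reports/report.csv", "copy.csv")
```

Passing the client in as an argument, rather than creating it inside each helper, keeps the functions easy to test and lets one client be reused across calls.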

This post is for data science enthusiasts with general questions about the field and their fit in it.

I see posts across Facebook groups where beginners eager to work in data science ask questions like,

This guide walks through the full pipeline, from setting up data production to creating and evaluating models. For illustration, the MovieLens dataset is used, with Kafka as the producer.

Using PySpark, a machine learning model is built with the alternating least squares (ALS) method, and its performance is compared with deep learning models built using the TensorFlow framework in Databricks.

The dataset can be found on Kaggle here.

Intro to Collaborative Filtering

Collaborative filtering (CF) is a popular recommendation algorithm that bases its predictions and recommendations on the ratings or behavior of other users in the system. In simple words, if…

This post discusses my journey as an intern at a startup. Joining a startup transformed me into a professional coder. Until this internship at the startup “Sensego, Paris”, I could write any programming logic in Python or apply machine learning algorithms to datasets found on the internet. That is an easy task. A production environment is completely different. The datasets are quite complicated, and because the code is written to be part of bigger modules, it needs to be very efficient, mostly leveraging…

What are B0 and B1? These model parameters are sometimes referred to as theta0 and theta1. B0 represents the intercept and B1 represents the slope of the regression line.

We all know that the regression line is given by Y = B0 + B1·X.

To understand how Y is expressed as a function of X with these model parameters, and how the best-fit line is selected, this post derives the formulas for B0 and B1 step by step.
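As a point of reference, the standard least-squares result for a single predictor is B1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and B0 = ȳ − B1·x̄. The sketch below computes both from raw data. The function name `fit_line` and the sample numbers are made up for illustration and are not taken from the post.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b1 = s_xy / s_xx           # slope
    b0 = mean_y - b1 * mean_x  # intercept: the line passes through the means
    return b0, b1


# Illustrative data generated from y = 2 + 3x, so the fit recovers
# an intercept of 2 and a slope of 3.
b0, b1 = fit_line([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)
```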

Consider some problems as shown below, where the best regression line is selected with B0 = 19.969 and…

Regular expressions, or regex as they are fancily called, are one of the most important topics to know as a data scientist. Knowledge of regular expressions pays off while programming.

Data scientists use regular expressions in fields such as natural language processing (text mining, for example) and computer vision to extract part of a sentence or strip away unwanted words. Regular expressions form the basis of text mining, which is now mostly regarded as part of NLP.

An example of where regular expressions are used is in “Data science applied to Ad industry”. With each advertisement we see…
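Both uses mentioned above, extracting part of a sentence and stripping away chosen words, can be sketched with Python's built-in `re` module. The sample sentence, the date pattern, and the helper names are invented for this illustration.

```python
import re

# Hypothetical sample text, invented for this example.
SAMPLE = "Order 1234 shipped to Paris on 2021-03-15, thanks!"


def extract_dates(text):
    """Return every YYYY-MM-DD style date found in the text."""
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)


def strip_words(text, words):
    """Remove whole-word, case-insensitive occurrences of the given words."""
    # re.escape guards against words that contain regex metacharacters.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the extra whitespace left behind by the removals.
    return re.sub(r"\s+", " ", cleaned).strip()


print(extract_dates(SAMPLE))
print(strip_words("The cat sat on the mat", ["the", "on"]))
```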


A cheerful, enthusiastic data science professional, open to working with a dynamic team to share and work in harmony.
