
Posts

Showing posts from November, 2022

Standard Normal Distribution with examples using Python

Easy Guide to using f-strings in Python with examples

f-strings were introduced in Python 3.6. String formatting has been made much simpler with f-strings. They are of the form f" " or f' '. Let's look at some examples to understand this better.

How to print a variable using f-strings?

name = "James Bond"
drink = "martini"
serving_type = "shaken and not stirred"
f"My name is {name} and I like my {drink} {serving_type}"
---> My name is James Bond and I like my martini shaken and not stirred

f-strings use { } to evaluate any expression and format it as a string. The above example evaluates each of the variables within the curly braces and formats it as a string.

How to use mathematical expressions with f-strings?

f"2x2 = {2*2}"
---> 2x2 = 4
f"Square of 8 is {8**2}"
---> Square of 8 is 64

Practically anything can be put inside the { } as long as the expression eval…
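The snippets in the excerpt above can be collected into one small runnable script (Python 3.6+):

```python
# The excerpt's examples, collected into one runnable script (Python 3.6+)
name = "James Bond"
drink = "martini"
serving_type = "shaken and not stirred"

# { } evaluates any expression and formats the result as a string
intro = f"My name is {name} and I like my {drink} {serving_type}"
print(intro)  # -> My name is James Bond and I like my martini shaken and not stirred

# Arbitrary expressions work inside the braces, not just variable names
print(f"2x2 = {2*2}")            # -> 2x2 = 4
print(f"Square of 8 is {8**2}")  # -> Square of 8 is 64
```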

How to format float and numerical values using f-strings

f-strings have made life easier for Python developers. They are of the form f" {} ", where the curly brackets are optional. The advantage of the curly brackets is that expressions can be evaluated within f-strings at runtime and printed to the screen. We no longer need to cast variables to strings. f-strings have been around since Python 3.6.

How to restrict a float variable to n decimal positions?

pi = 22/7
f"Value of pi up to 2 decimal positions is {pi:0.2f}"
---> Value of pi up to 2 decimal positions is 3.14
f"Value of pi up to 3 decimal positions is {pi:0.3f}"
---> Value of pi up to 3 decimal positions is 3.143
f"Value of pi up to 4 decimal positions is {pi:0.4f}"
---> Value of pi up to 4 decimal positions is 3.1429
f"Value of pi with no decimal position is {pi:0.0f}"
---> Value of pi with no decimal position is 3

How to format a fraction as a percentage using f-strings?

perc = 0.555553…
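A runnable sketch of the format specs described above. The excerpt cuts off at the percentage example, so the last two lines show Python's standard `%` format spec (which multiplies by 100 and appends a percent sign) rather than the post's own continuation:

```python
pi = 22 / 7

# :0.nf restricts a float to n decimal places
print(f"Value of pi up to 2 decimal positions is {pi:0.2f}")  # -> ... 3.14
print(f"Value of pi up to 3 decimal positions is {pi:0.3f}")  # -> ... 3.143
print(f"Value of pi with no decimal position is {pi:0.0f}")   # -> ... 3

# The % spec multiplies by 100, rounds, and appends a percent sign
perc = 0.555553
print(f"{perc:.2%}")  # -> 55.56%
```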

How to adopt Embeddings for Categorical features in Tabular Data using PyTorch's nn.Embedding( )-- Part 2

In the previous post, we set up the context for using embeddings for categorical features. In this post, we will figure out how to create these embeddings and combine them with other continuous features to build a neural network model.

Dataset Download

We will utilize the UCI machine learning repo, which has a dataset on credit card default for customers in Taiwan. This dataset is also available on Kaggle. Metadata about this dataset is available on the respective websites. To follow this post, it is recommended to download the dataset from Kaggle. Most of the features are self-explanatory.

Embedding Creation

A few definitions first. Levels in a categorical feature are the unique values available for that feature. For example, MARRIAGE has levels 0, 1, 2, 3. Each level of a categorical feature is represented by a vector of numbers. So, if you stack up all the levels together and all the vectors together, you can imagine levels to be a colum…
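A minimal sketch of the lookup table described above, using PyTorch's nn.Embedding. The MARRIAGE feature and its 4 levels come from the excerpt; the embedding dimension, batch values, and the continuous column are illustrative assumptions, not taken from the post:

```python
import torch
import torch.nn as nn

# MARRIAGE has 4 levels (0, 1, 2, 3); map each level to a learnable
# 2-dimensional vector. Stacking the vectors gives a (4 x 2) lookup table.
emb = nn.Embedding(num_embeddings=4, embedding_dim=2)

# A batch of MARRIAGE values for three customers (illustrative)
marriage = torch.tensor([0, 2, 1])
marriage_vectors = emb(marriage)  # shape: (3, 2)

# Concatenate with a continuous feature column (values are illustrative)
continuous = torch.tensor([[0.5], [1.2], [-0.3]])
x = torch.cat([marriage_vectors, continuous], dim=1)  # shape: (3, 3)
print(x.shape)  # -> torch.Size([3, 3])
```

The concatenated tensor `x` is what would be fed into the first linear layer of the network.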

How to adopt Embeddings for Categorical features in Tabular Data using PyTorch's nn.Embedding( )-- Part 1

In this post, we will talk about using embeddings for categorical features using PyTorch. The post is broken down into the following parts:

1. Dataset Download
2. Data Understanding
3. Data Preprocessing
4. Embedding Creation
5. Define Dataset and Dataloaders in PyTorch
6. Neural Network Definition in PyTorch
7. The Training Loop
8. Model Validation

The idea of using embeddings for categorical features was first mooted during a Kaggle contest, and a paper was also published on it. In the context of NLP and word embeddings, we represent each word in an n-dimensional vector space. In a similar way, we can represent any categorical feature in an n-dimensional vector space as well.

1. Dataset Download

We will utilize the UCI machine learning repo, which has a dataset on credit card default for customers in Taiwan. This dataset is also av…
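The excerpt doesn't say how large the n-dimensional vector space per feature should be. One common rule of thumb, used by the fastai library (an assumption added here, not taken from the post), derives the embedding dimension from the number of levels:

```python
# fastai's rule-of-thumb embedding size, shown as an illustrative heuristic.
# It is not part of PyTorch itself and is not mentioned in the post.
def embedding_dim(num_levels: int) -> int:
    return min(600, round(1.6 * num_levels ** 0.56))

# Small categorical features get small embeddings
print(embedding_dim(4))  # e.g. a 4-level feature like MARRIAGE -> 3
print(embedding_dim(7))  # a 7-level feature -> 5
```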

How to use Stratified Sampling using scikit-learn and pandas

Many times, when working on classification problems in data science, we blindly perform random sampling and split the dataset into train and test sets. For the uninitiated, random sampling is a technique where all observations have an equal probability of being selected. What could possibly go wrong with random sampling when creating models? Let's say you wanted to predict which customers might default on their credit card bills. If the whole dataset has a 65% male and 35% female distribution, but your test set happens to have 55% females, your model validation will not be robust. You would have to believe that the future male-to-female distribution will also stay close to 65:35 to trust models validated this way. So, the whole point of stratified sampling is that you want your test set to mimic your train set in terms of distribution, to make your model validation as robust as possible. Scenarios in which you want your categorical features to have a s…
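The 65:35 scenario above can be sketched with scikit-learn's `train_test_split` and its `stratify` parameter. The `gender` column and the synthetic DataFrame are illustrative, not from the post:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data mirroring the example: 65% male, 35% female
df = pd.DataFrame({
    "gender": ["M"] * 65 + ["F"] * 35,
    "defaulted": [0, 1] * 50,
})

# stratify=df["gender"] preserves the 65:35 ratio in both splits
train, test = train_test_split(
    df, test_size=0.2, stratify=df["gender"], random_state=42
)
print(test["gender"].value_counts())  # -> 13 M, 7 F (still 65:35)
```

Without `stratify`, a purely random 20-row test set could easily drift away from the population's 65:35 ratio.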