Skip to main content

Posts

Showing posts with the label embeddings for categorical features

Standard Normal Distribution with examples using Python

How to adopt Embeddings for Categorical features in Tabular Data using PyTorch's nn.Embedding( )-- Part 2

In the previous post , we set up the context to utilize embeddings for categorical features. In this post, we will figure out how to create these embeddings and combine them with other continuous features to build a neural network model. Dataset Download We will utilize the UCI machine learning repo which has a dataset on credit card default for customers in Taiwan. This dataset is also available in Kaggle . Metadata about this dataset is available on the respective websites. To follow this post, it is recommended to download the dataset from Kaggle. Most of the features are self explanatory. Embedding Creation A few definitions first. Levels in a categorical feature represent unique values available for that categorical feature. For e.g. MARRIAGE has levels 0,1,2,3. Each level of a categorical feature is represented by a vector of numbers. So, if you stack up all the levels together and all the vectors together, you can imagine levels to be a colum...

How to adopt Embeddings for Categorical features in Tabular Data using PyTorch's nn.Embedding( )-- Part 1

How to adopt Embeddings for Categorical features in Tabular Data using PyTorch's nn.Embedding( )-- Part 1 In this post, we will talk about using embeddings for categorical features using PyTorch. This post will be broken down into following parts. Dataset Download Data Understanding Data Preprocessing Embedding Creation Define Dataset and Dataloaders in PyTorch Neural Network definition in PyTorch The Training Loop Model Validation The idea about using Embeddings from Categorical Features was first mooted during a Kaggle contest and a paper was also published on this. In the context of NLP and word embeddings, we represent each word in an n dimesnional vector space. In a similar way, we can represent any categorical feature in an n dimesnional vector space as well. 1. Dataset Download We will utilize the UCI machine learning repo which has a dataset on credit card default for customers in Taiwan. This dataset is also av...