Machine Learning Model 1: Relative Name Frequency Function

Step 1: Preprocessing US Social Security Names Data

Step 2: Loading the List of Unique First Names from the Sunshine List

Step 3: Create Function

High-Level Explanation: The function takes the preprocessed US Social Security names data from above, which holds each unique name with its male and female frequencies, and uses it to calculate the probability that each name in the input list (the second parameter) is male or female.
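
The idea described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual function: the names `build_frequency_table` and `predict_gender`, the `(name, sex, count)` record layout, and the return shape are all hypothetical. It also assumes that 'U' marks a name absent from the Social Security data and 'EV' marks an even male/female split; the labels appear later in this notebook, but their exact meaning is not defined here.

```python
from collections import defaultdict


def build_frequency_table(records):
    """Aggregate (name, sex, count) records into {name: {"M": n, "F": n}}.

    The record layout is an assumption made for this sketch.
    """
    table = defaultdict(lambda: {"M": 0, "F": 0})
    for name, sex, count in records:
        table[name.lower()][sex] += count
    return dict(table)


def predict_gender(freq_table, names):
    """Return {name: (label, p_male, p_female)} for each input name.

    Labels: "M", "F", "EV" (assumed: even split), "U" (assumed: name not found).
    """
    results = {}
    for name in names:
        counts = freq_table.get(name.lower())
        total = 0 if counts is None else counts["M"] + counts["F"]
        if total == 0:
            # No usable frequency data for this name.
            results[name] = ("U", None, None)
            continue
        p_male = counts["M"] / total
        p_female = counts["F"] / total
        if p_male > p_female:
            label = "M"
        elif p_female > p_male:
            label = "F"
        else:
            label = "EV"
        results[name] = (label, p_male, p_female)
    return results


# Toy data, made up purely for illustration.
records = [
    ("Mary", "F", 90), ("Mary", "M", 10),
    ("Jordan", "M", 50), ("Jordan", "F", 50),
    ("John", "M", 100),
]
freq_table = build_frequency_table(records)
predictions = predict_gender(freq_table, ["Mary", "Jordan", "John", "Zzyx"])
```

Keeping the frequency table as the first argument and the list of names as the second mirrors the parameter order mentioned in the explanation above.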

Step 4: Run Function

Machine Learning Model 2: Naive Bayes Classifier via NLTK Library

Step 1: Preprocessing US Social Security Names Data

(To be used as training data for the NLTK classifier)

Step 2: Preprocessing Sunshine List Unique First Names Data to be Used in NLTK Model

Only the names that received a 'U' or 'EV' gender prediction from the relative name frequency model will be passed to the NLTK model.
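
A minimal sketch of that filtering step, assuming the first model's output is available as a plain `{name: label}` dictionary. The function name `select_for_classifier` and the dictionary layout are hypothetical; the notebook may well store these predictions differently (for example, in a DataFrame).

```python
def select_for_classifier(predictions, labels=("U", "EV")):
    """Return the names whose first-model label is in `labels`, in input order."""
    return [name for name, label in predictions.items() if label in labels]


# Made-up example predictions from the relative name frequency model.
first_model_output = {"Mary": "F", "Jordan": "EV", "John": "M", "Zzyx": "U"}
unresolved_names = select_for_classifier(first_model_output)
```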

Step 3: Instantiating and Training NLTK Naive Bayes Classification Model Instance
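
For orientation, training an NLTK Naive Bayes classifier on labelled names might look something like the sketch below. `nltk.NaiveBayesClassifier.train` and `classify` are real NLTK calls, but the feature extractor `gender_features`, the choice of features (first letter, last letter, last two letters), and the tiny training list are assumptions made for illustration; the features actually used in this notebook are not stated here.

```python
import nltk


def gender_features(name):
    """Turn a name into a feature dict (hypothetical feature choice)."""
    name = name.lower()
    return {
        "first_letter": name[0],
        "last_letter": name[-1],
        "last_two": name[-2:],
    }


# Made-up labelled names standing in for the preprocessed training data.
training_names = [
    ("Mary", "F"), ("Anna", "F"), ("Emma", "F"), ("Olivia", "F"),
    ("John", "M"), ("Robert", "M"), ("David", "M"), ("Kevin", "M"),
]

# NLTK expects a list of (feature_dict, label) pairs.
train_set = [(gender_features(name), label) for name, label in training_names]
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Classify a name that was not in the training list.
predicted = classifier.classify(gender_features("Sophia"))
```

With a training set this small the prediction carries little weight; the point is only the shape of the data passed to `train` and `classify`.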

Step 4: Naive Bayes Classification Model Gender Predictions

Notes:

Step 4a: Preprocessing Output from Above

Note:

Step 5: Concatenation

Notes:

Step 6: Gender Predictions to Database

Notes: