NOTE: this cleansing file takes approximately 20 min to execute. You will be prompted at the end to enter in your PGAdmin password in order to execute the creation of the SQL tables.

Sunshine List raw data import

Create consolidated SunShine List file

Add new columns for data cleansing

Data Cleansing : 'Sector' column

The sector column will be cleansed to standardize and consolidate the sector names for visualization purposes. In addition, it will correct for inconsistent naming conventions from year to year for certain sectors.

Data Cleansing: 'Last Name' column

Data Cleansing: 'Employer' column

The Employer is a cleansed column of the Sunshine Employer column for the purpose of standardizing and consolidation of employers for visualization purposes. In addition, it will correct for inconsistent naming conventions from year to year for certain employers.

Populate City based on Employer column

The city column has been created using the employer column and job title where city has been specifically noted. Many employers such as hospitals, school boards and agencies have been looked up for city headquarter information.

Data Cleansing: 'cleansed_job_title' column

Exploring the Data for Cleansing

reviewing first name character counts

Reviewing single initial First Names - can they be matched with a full name using historical data? --on hold due to time constrainst; rows to be deleted where only inital used as first name

Data Cleansing

Update Final First Name column based on clean_first_name and clean_alt_first_name columns

Drop rows

Create concatenation of Last, first name

Creating Unique First Names for Machine Learning Model

Reorder columns and rename

Connect Pandas and SQL