July 15, 2024

Research Suggests Women May Face “Mom Penalty” in AI-Based Hiring

New research from the NYU Tandon School of Engineering indicates that women may experience a “mom penalty” when artificial intelligence (AI) is used in the hiring process. Maternity-related employment gaps could cause qualified female candidates to be unfairly screened out of positions. The study, led by Siddharth Garg, examined bias in large language models (LLMs) used for hiring.

The research team, including lead researcher Akshaj Kumar Veldanda, assessed three popular LLMs: ChatGPT (GPT-3.5), Bard, and Claude. These advanced AI systems are trained to understand and generate human language. The study tested whether the LLMs could ignore irrelevant personal attributes, such as race, gender, political affiliation, and pregnancy status, while evaluating job candidates’ resumes.

While the study found that race and gender did not trigger biased results, the other sensitive attributes significantly influenced the LLMs’ decisions. Employment gaps related to maternity and paternity responsibilities had the most pronounced effect. Claude and ChatGPT exhibited biased results, frequently assigning a resume to the wrong job category when these attributes were present. Political affiliation and pregnancy status also triggered incorrect resume classification, with Claude once again performing poorly.

Bard, by contrast, was consistently unbiased across all the sensitive attributes tested. This finding suggests that bias is not inherent in LLMs and that they can be trained to resist it. Nonetheless, the study acknowledges that Bard might still be biased along other sensitive attributes not covered in the research.

The researchers also analyzed the models’ ability to produce resume summaries. GPT-3.5 largely left out political affiliation and pregnancy status, while Claude was more likely to include all sensitive attributes. Bard often refused to summarize but, when it did, it tended to include sensitive information. The researchers noted that classifying job categories from the summaries improved the fairness of all LLMs, potentially because summaries make it easier for the models to focus on relevant information.

The study began with a dataset of anonymized resumes sourced from livecareer.com. The researchers then added the sensitive attributes to the resumes, following approaches recommended by behavioral economist Sendhil Mullainathan and in other resume creation guidelines. The evaluation initially focused on the IT, Teacher, and Construction job categories but later expanded to include all 24 job categories for Bard and Claude.
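The audit design described above can be illustrated with a small sketch: add one sensitive-attribute sentence to each resume, classify the resume before and after, and count how often the predicted job category changes. This is only an illustration under stated assumptions, not the researchers' code. The function names, the sample resumes, the attribute sentence, and the keyword-based `toy_classifier` (a stand-in for a call to an LLM) are all hypothetical.

```python
from typing import Callable, Iterable


def add_attribute(resume: str, snippet: str) -> str:
    """Return a copy of the resume with one sensitive-attribute sentence appended."""
    return f"{resume.rstrip()}\n{snippet}"


def flip_rate(
    resumes: Iterable[str],
    snippet: str,
    classify: Callable[[str], str],
) -> float:
    """Fraction of resumes whose predicted category changes once the snippet is added."""
    resumes = list(resumes)
    if not resumes:
        raise ValueError("need at least one resume")
    flips = sum(classify(r) != classify(add_attribute(r, snippet)) for r in resumes)
    return flips / len(resumes)


def toy_classifier(resume: str) -> str:
    """Hypothetical, deliberately biased stand-in for an LLM; not a real model."""
    text = resume.lower()
    if "maternity leave" in text:
        # The bias under test: an irrelevant attribute overrides the job content.
        return "Construction"
    if "python" in text or "network" in text:
        return "IT"
    return "Construction"


# Made-up sample data, for illustration only.
resumes = [
    "Software engineer with five years of Python experience.",
    "Site foreman experienced in concrete and framing.",
    "Network administrator for a mid-sized company.",
]
snippet = "Took maternity leave from 2021 to 2022."

print(f"flip rate: {flip_rate(resumes, snippet, toy_classifier):.2f}")
```

A classifier that ignores the added sentence would score a flip rate of 0. In a real audit, `classify` would wrap a prompt to the model being tested, and the comparison would be repeated for each sensitive attribute.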

The researchers also ran the evaluation on Alpaca, a white-box model whose internals are open to inspection, which makes it possible to explain its decisions. Alpaca showed biases as well, suggesting that white-box models are not immune. The team used Contrastive Input Decoding (CID) to analyze the biases in Alpaca.

The research underscores the need to address potential bias in AI-based hiring practices. By developing robust auditing methodologies to uncover biases in LLMs, researchers and practitioners can intervene before discrimination occurs. The study calls for continued examination of whether it is sound to use LLMs in employment decisions and emphasizes the importance of holding LLMs accountable for fairness.

Overall, the research sheds light on the potential “mom penalty” that women may face in AI-based hiring processes. It highlights the importance of creating fair and transparent AI algorithms to prevent discrimination against qualified candidates based on maternity-related employment gaps.
