
February 21, 2024

A Paradigm Shift to Data as the Core Product

In the journey toward refining our product, one significant hurdle was the broad scope of functionality being tested in parallel, which complicated communication and expectation management. Working closely with the business teams, we tackled this multifaceted challenge and streamlined the advertisement administration process to improve both accuracy and efficiency. The culmination of these efforts was a distinct and innovative product from the Inspira Group’s Data Science team: a comprehensive text processing infrastructure.

This architectural overview and detailed description of the infrastructure’s components were contributed by Borko Rastovic, Inspira Group Senior Data Engineer.

The infrastructure is a network of services organized into flexible, clearly defined pipelines. The steps are tightly coupled: the output of one phase becomes the input of the next, and all intermediate information is aggregated into the final result. The system handles many different input types, including documents, HTML, and images, which reflects its broad applicability.
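To make the pipeline idea concrete, here is a minimal sketch of how such a chain of steps could be composed. The `Pipeline` helper and the step functions are illustrative assumptions, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative sketch: each step receives the accumulated context,
# adds its own output, and passes everything on to the next step.
Step = Callable[[dict], dict]

@dataclass
class Pipeline:
    steps: list[Step]  # ordered chain of processing steps

    def run(self, payload: dict) -> dict:
        context = dict(payload)
        for step in self.steps:
            # each phase enriches the shared context, so later steps see earlier results
            context.update(step(context))
        return context

# Hypothetical steps; the real services would call the extraction, NLP and LLM modules.
def extract_text(ctx: dict) -> dict:
    return {"text": ctx["raw_input"].strip()}

def detect_language(ctx: dict) -> dict:
    # placeholder heuristic standing in for a real language-detection service
    return {"language": "sr" if any(ch in ctx["text"] for ch in "čćšžđ") else "en"}

if __name__ == "__main__":
    pipeline = Pipeline(steps=[extract_text, detect_language])
    print(pipeline.run({"raw_input": "  Tražimo iskusnog programera...  "}))
```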

Architectural Overview

The infrastructure’s architecture is underpinned by three fundamental components, each serving a critical role in the processing ecosystem:

  • Extraction, Processing, and Storage Services: These services form the backbone of the system, handling the selection of documents for processing and the extraction of text from various formats such as documents, images, and HTML.
  • Basic Text Processing Service (NLP): This component is equipped with an array of over ten tools, including language detection, diacritization, transliteration, normalization, anonymization, sentence splitting, lemmatization, vectorization, and many more. These tools are indispensable across all our pipelines, providing the necessary foundation for advanced text analysis and processing.
  • LLM Processing Service (LLM Module): Acting as a wrapper for the OpenAI API (or any alternative LLM platform), this module is designed for efficiency and scalability. It exposes one prompt per endpoint and integrates seamlessly with the NLP service. This component is crucial for monitoring costs per endpoint and for validating the reliability and accuracy of the output (a minimal sketch of such a wrapper follows after this list).
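As an illustration of the “one prompt per endpoint” idea, here is a minimal sketch assuming FastAPI and the official OpenAI Python client; the endpoint name, model choice, prompt, and per-token prices are hypothetical and not taken from the actual service.

```python
# Minimal sketch of a "one prompt per endpoint" LLM wrapper.
# Assumes `pip install fastapi openai uvicorn`; endpoint, model and prices are illustrative.
from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical per-token prices (USD), used only for cost monitoring in this sketch.
PRICE_PER_INPUT_TOKEN = 0.15 / 1_000_000
PRICE_PER_OUTPUT_TOKEN = 0.60 / 1_000_000

class AdText(BaseModel):
    text: str

@app.post("/detect-discrimination")          # one endpoint == one prompt
def detect_discrimination(ad: AdText) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",                 # illustrative model choice
        messages=[
            {"role": "system", "content": "Flag discriminatory wording in job ads. Answer YES or NO."},
            {"role": "user", "content": ad.text},
        ],
    )
    usage = response.usage
    cost = (usage.prompt_tokens * PRICE_PER_INPUT_TOKEN
            + usage.completion_tokens * PRICE_PER_OUTPUT_TOKEN)
    # Per-endpoint cost can be logged and aggregated here for monitoring.
    return {"answer": response.choices[0].message.content,
            "estimated_cost_usd": round(cost, 6)}
```

Keeping one prompt behind each endpoint makes it straightforward to attribute token spend and validation failures to a single, well-defined use case.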

Bonus: Introducing the Prompt Batch Tester – A Catalyst for Development

Amidst the development of the Job Formatter, a pivotal tool emerged to address a critical need within our development teams: the “Prompt Batch Tester.” This tool was conceived to empower product development teams, enabling them to efficiently test prompts on larger datasets. This capability is crucial for the refinement and optimization of AI-driven features and functionalities across our product suite.

Key Features of the Prompt Batch Tester

  • Prompt Customization: Product managers can craft and define specific prompts, select the right model, and configure its parameters to best suit their project needs.
  • Dataset Upload: The tool supports the upload of extensive datasets, enabling a robust testing environment that closely mimics real-world application scenarios.
  • Testing at Scale: By simulating how a prompt performs on larger volumes of data, the tool offers valuable insight into its practical application and utility.
  • Cost Forecasting: An essential feature of the Prompt Batch Tester is its ability to project the total operational costs of deploying such a product in a live environment. This aids in strategic planning and budget allocation.
  • Response Time Evaluation: Understanding the response time is critical for assessing the scalability and user experience of the product. This tool provides precise metrics, helping teams optimize performance and efficiency (a simplified sketch of such a batch run follows after this list).
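To illustrate what a batch run might look like under the hood, here is a simplified sketch assuming the OpenAI Python client; the dataset format, model, and per-token prices are assumptions for illustration, not the actual tool.

```python
# Simplified sketch of a prompt batch run with cost and latency tracking.
# Assumes `pip install openai`; model, dataset layout and prices are illustrative.
import csv
import time
from openai import OpenAI

client = OpenAI()
PRICE_IN, PRICE_OUT = 0.15 / 1e6, 0.60 / 1e6  # hypothetical USD per token

def run_batch(prompt: str, dataset_path: str, model: str = "gpt-4o-mini") -> dict:
    total_cost, latencies, results = 0.0, [], []
    with open(dataset_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # expects a column named "text" (assumption)
            start = time.perf_counter()
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "system", "content": prompt},
                          {"role": "user", "content": row["text"]}],
            )
            latencies.append(time.perf_counter() - start)
            total_cost += (resp.usage.prompt_tokens * PRICE_IN
                           + resp.usage.completion_tokens * PRICE_OUT)
            results.append(resp.choices[0].message.content)
    return {
        "results": results,
        "estimated_total_cost_usd": round(total_cost, 4),
        "avg_latency_s": round(sum(latencies) / max(len(latencies), 1), 2),
    }
```

Aggregating token costs and latencies per run is what turns a simple batch script into a forecasting tool for live deployment.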

Figure 1 – “Prompt batch tester” user interface

Conclusion

Our innovative journey at Inspira Group has led us from the first prototypes to a robust AI-driven text processing infrastructure, revolutionizing the efficiency and accuracy of job ad moderation. Through the development of advanced tools like the “Prompt Batch Tester,” we have embraced challenges, iterated solutions, and unlocked new potential.

In collaboration with the Infostud team, we’ve also achieved significant milestones beyond job ad moderation. Our development of an AI-powered CV parser has simplified the process for job seekers to craft distinctive profiles on our platforms. Moreover, we’re demystifying the interview preparation process with an innovative AI tool, making daunting tasks more manageable and less stressful for candidates.

As we look ahead, we are excited to collaborate with forward-thinkers and innovators. Together, let’s continue to break new ground and redefine what is possible in the evolving landscape of technology and work.

Author

Srđan Mijušković, Senior Product Manager at Inspira Group



February 21, 2024

Opening Insights: This article is structured in two main parts: the first delves into the nuances of product development, and the second outlines the architectural solutions and the envisioned future of the product. Given the comprehensive nature of our discussion, we encourage you to peruse the entire article for a thorough understanding.

In the following sections, I will unfold the narrative of our product’s evolution, tracing the path from early-stage prototyping to the polished end-product. The introduction of the GPT-3 API in the early months of 2023 heralded a shift towards enhancing the efficiency of job ad moderation on our platform, poslovi.infostud.com. Here, we manage the publication and moderation of hundreds of job advertisements each week, a process traditionally performed manually. Although meticulous, this approach is susceptible to inaccuracies and delays.

Confronted with many moderation guidelines, both explicit and implicit, alongside the transformative potential of generative AI models, we went beyond rigidly set KPIs and objectives. Our approach was characterized by an openness to experimentation, acknowledging the risk of failure while striving to maximize the likelihood of success through rapid prototyping. This method encompasses several key phases:

  • Prototyping: Crafting high-fidelity prototypes that deliver practical solutions.
  • Feedback: Gathering and interpreting feedback from users to inform development.
  • Improvements: Refining the prototype in response to insights gained from user feedback.

While our aims were intentionally not defined as SMART goals, they were nonetheless focused and ambitious:

  • Speed: Accelerating the moderation process for each job advertisement.
  • Quality: Elevating the overall quality of job advertisements post-moderation.

This introduction sets the stage for a deeper exploration into the iterative development process and architectural innovation, underscoring our commitment to pushing the boundaries of what is possible with AI in the realm of job ad moderation.

V1 Prototype: Foundational Steps and Preliminary Concept Evaluation

The start of our first prototype, designated V1, was driven by the ambition to evaluate a foundational concept. The ultimate aim was to iteratively refine this concept into a robust final product. Our preliminary strategy aimed to equip administrators with insights into potentially contentious sections of job advertisements while concurrently extracting a diverse array of metadata from these ads. For the implementation of this prototype, we identified Streamlit as the optimal platform, given its flexibility and user-friendly interface (a simplified sketch of such a prototype follows after the lists below).

The prototype was designed to detect:

  • The existence of any form of discrimination within the advertisement’s text
  • The inclusion of contact details such as email addresses, salary information, or phone numbers
  • The presence of grammatical inaccuracies
  • Whether the advertisement’s text inadvertently contained multiple job listings

Included in the prototype’s functionality was the ability to extract:

  • The seniority level of the advertised position
  • Relevant IT-related tags
  • The geographical location of the job, pinpointing specific cities
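As an illustration of what a Streamlit prototype along these lines could look like, here is a minimal sketch; the prompt, model, and field names are assumptions for illustration and not the actual V1 implementation.

```python
# Minimal Streamlit sketch of an LLM-assisted job-ad check (illustrative, not the actual V1).
# Assumes `pip install streamlit openai`; run with `streamlit run app.py`.
import json
import streamlit as st
from openai import OpenAI

client = OpenAI()

CHECK_PROMPT = (
    "Analyse the job advertisement and return JSON with the keys: "
    "discrimination (bool), contact_details (bool), grammar_issues (bool), "
    "multiple_listings (bool), seniority (str), it_tags (list of str), city (str)."
)

st.title("Job ad moderation prototype (sketch)")
ad_text = st.text_area("Paste the job advertisement text")

if st.button("Analyse") and ad_text:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "system", "content": CHECK_PROMPT},
                  {"role": "user", "content": ad_text}],
        response_format={"type": "json_object"},
    )
    # Display the detected issues and extracted metadata side by side.
    st.json(json.loads(resp.choices[0].message.content))
```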

The testing phase of the V1 prototype yielded encouraging results, demonstrating a high degree of accuracy in its operational performance. This initial success laid a solid foundation for later development stages, confirming the viability of our approach and the potential for further refinement and expansion of the prototype’s capabilities.

V2 Prototype: Broadening Functionality and Elevating User Experience

Building upon the insights gained from the first prototype, we embarked on the development of the second iteration, prototype version 2 (V2), intending to significantly enhance both the content quality and the visual appeal of advertisements. This ambition was realized through focused improvements in the user interface (UI) and user experience (UX), ensuring a more intuitive and efficient interaction for administrators.

UI and UX Improvements

  • Introduction of a feature enabling the live streaming of processed data into the application, without the need for manual text input (a small sketch of this idea follows below).
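One way such live streaming could be approximated in Streamlit is with a placeholder that is refreshed as processed ads arrive; the `fetch_processed_ads` function below is a hypothetical stand-in for whatever feed the real application used.

```python
# Illustrative sketch: push processed ads into the UI without manual text input.
# Assumes `pip install streamlit`; `fetch_processed_ads()` is a hypothetical data source.
import time
import streamlit as st

def fetch_processed_ads() -> list[dict]:
    # Placeholder for the real feed (e.g. a queue, database poll, or API call).
    return [{"id": 1, "title": "Backend Developer", "status": "processed"}]

placeholder = st.empty()           # container that gets overwritten on every refresh
while True:
    with placeholder.container():  # redraw the latest batch of processed ads
        for ad in fetch_processed_ads():
            st.write(f"#{ad['id']} {ad['title']} - {ad['status']}")
    time.sleep(5)                  # simple polling interval
```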

Functional Enhancements

  • Acknowledging the considerable time administrators dedicate to formatting HTML versions of job ads, V2 aimed to automate this process. By employing a set of predefined rules, the prototype could transform ad text into HTML. This included the automatic identification of job descriptions, candidate profiles, benefits, etc. and their allocation to specified styles. For instance, the system was designed to recognize essential qualifications and skills from the ad text, correct any grammatical errors, remove discriminatory language, apply H2 headers in blue, and format benefits as bullet points.
  • To address issues such as grammatical mistakes and discriminatory language in both Serbian and English, we broadened the prototype’s scope of functionality.
  • A concerted effort was made to refine the ETL (Extract, Transform, Load) process, acknowledging the varied formats in which employers post job advertisements on poslovi.infostud.com, including PDF, DOCX, images, and others. We intensified our efforts in processing and converting these formats into text in near real time before sending the text for further processing (a simplified sketch of such format-aware extraction follows after this list). The next sections will delve deeper into how this project catalyzed further investment in text processing infrastructure development.
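As an illustration of how text can be pulled out of these different formats, here is a simplified sketch using common open-source libraries (pypdf, python-docx, pytesseract); the library choices are assumptions for illustration, not necessarily what the production ETL uses.

```python
# Illustrative sketch: extract plain text from PDF, DOCX and image uploads.
# Assumes `pip install pypdf python-docx pytesseract pillow` and a local Tesseract install.
from pathlib import Path

from pypdf import PdfReader
from docx import Document
from PIL import Image
import pytesseract

def extract_text(path: str) -> str:
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        reader = PdfReader(path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        return "\n".join(p.text for p in Document(path).paragraphs)
    if suffix in {".png", ".jpg", ".jpeg"}:
        return pytesseract.image_to_string(Image.open(path))  # OCR for image-based ads
    raise ValueError(f"Unsupported format: {suffix}")
```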

This phase of development was aimed not only at enhancing the practical aspects of the platform but also at enriching the overall experience for both administrators and end-users, setting a precedent for continuous improvement and innovation.

V2.1 Prototype: Refinement and Advanced Change Detection

In response to the administrative challenge of spotting textual modifications in job advertisements through visual inspection alone, we introduced a feature in version 2.1 that enables precise tracking of changes. This was achieved by integrating a diff algorithm (a minimal sketch of this idea follows below), laying a robust foundation and setting clear expectations for the final, production-ready iteration of our product.
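The article does not name the specific diff algorithm used; as a minimal sketch under that assumption, Python’s standard difflib can highlight what changed between the original and moderated ad text.

```python
# Minimal sketch of text change detection; uses only the Python standard library.
# difflib is a stand-in for whatever diff algorithm the product actually integrates.
import difflib

def diff_ad_versions(original: str, moderated: str) -> str:
    """Return a unified, line-by-line diff of the two ad versions."""
    return "\n".join(difflib.unified_diff(
        original.splitlines(), moderated.splitlines(),
        fromfile="original_ad", tofile="moderated_ad", lineterm="",
    ))

before = "We are looking for a young developer.\nSalary: 2000 EUR."
after = "We are looking for a developer.\nSalary information available on request."
print(diff_ad_versions(before, after))
```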

In the evaluation of version 2, we decided to stop the automatic formatting of job ads for two main reasons:

  • Employer Branding Integrity: Automatic formatting could alter the original layout provided by employers, affecting their brand presentation. By dropping it, we ensure the employer’s intended formatting is preserved.
  • Resource Efficiency: The benefits of automatic formatting did not justify the significant resources needed for its maintenance and improvement.

Nonetheless, we kept and further developed all other functionalities from version 2.1 for inclusion in version 3, poised to be the definitive version of our product.

The enhanced product enables direct annotation, within the administrative editor, of sections of text that may present issues. Administrators are empowered to either approve or dismiss the suggested modifications. Internally, we affectionately refer to this feature as “Grammarly on steroids 😊,” a nod to its enhanced content-editing prowess tailored to our specific application.

Figure 1 – Implemented solution

As we close the first chapter of our adventure into the world of AI-powered job ad moderation at Inspira Group, we find ourselves on the brink of a deeper exploration. Our focus shifts to the core of our innovation journey: data as our primary product. This chapter has shed light on our progress in improving moderation processes and hints at a major shift in direction.

Read the continuation of our story here.

Author:

Srđan Mijušković, Senior Product Manager at Inspira Group