How v3 of Jobgeek's recommendation algorithm works with an in-depth breakdown from Miro Keimiöniemi.

The Jobgeek recommendation algorithm is the heart and soul of the platform. If you were to ask "Who is Jobgeek?", the answer would be the recommender system. It determines all the jobs you see as well as a large part of what your profile looks like. You may have seen the "Recommendations confidence" bar in the upper right corner of your profile page, which refers to how confident we are about finding your dream job based on your criteria if it is on the market at the moment. This is a detailed, slightly technical deep dive into how it really works.

How it Works:

The Jobgeek recommendation algorithm has now gone through three major iterations, the latest and greatest of which was released yesterday.

The initial development version relied on older, simpler natural language processing (NLP) techniques: TF-IDF weighting over bag-of-words representations built by tokenizing and stemming, combined with tried-and-tested keyword matching for languages, skills and descriptions.
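To make the first version's approach concrete, here is a minimal, dependency-free sketch of TF-IDF over stemmed bag-of-words vectors compared with cosine similarity. It is an illustration only, not the original Jobgeek code: the suffix-stripping "stemmer", the smoothed IDF formula and all function names are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-letters and apply a crude suffix-stripping stemmer.

    The stemmer is a stand-in (assumption); a real system would use e.g. Porter stemming.
    """
    stemmed = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        for suffix in ("ing", "ers", "er", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stemmed.append(tok)
    return stemmed


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    bags = [Counter(tokenize(d)) for d in docs]
    n_docs = len(bags)
    doc_freq = Counter(term for bag in bags for term in bag)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in bag.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


profile = "Python developer building data pipelines"
jobs = [
    "Senior Python developer for data pipelines",
    "Pastry chef for a busy bakery",
]
vectors = tfidf_vectors([profile] + jobs)
scores = [cosine(vectors[0], v) for v in vectors[1:]]
```

In this toy example the first job shares several stemmed terms with the profile and scores well above the second, which shares none. It also shows the limitation of this approach: only literal word overlap counts, so synonyms and related concepts are invisible to it.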

The second version, used for the launch of the app, was a massive upgrade. It used representative text vector embeddings of AI-enriched profiles and jobs, which allowed for efficient querying of tens of thousands of jobs using Approximate Nearest Neighbor (ANN) search with a cosine distance metric over a Hierarchical Navigable Small World (HNSW) vector index. The resulting sample was then ranked by a multiplicative compound score that combined overall semantic similarity with skills, seniority, language and location match dimensions. From this ranking, the 30 highest-scoring jobs were saved for the user to see until a re-computation was triggered in the background when 10 or fewer jobs remained.
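The ranking step of the second version can be sketched roughly as follows. The example assumes the candidates have already been retrieved by the ANN search, which is left out here. The `Candidate` class, the plain unweighted product and the function names are assumptions for illustration; only the five dimensions, the top 30 cut-off and the threshold of 10 come from the description above.

```python
from dataclasses import dataclass
from math import prod

TOP_K = 30             # jobs saved per computation (from the text)
REFILL_THRESHOLD = 10  # recompute when this many or fewer jobs remain (from the text)


@dataclass
class Candidate:
    """One job returned by the vector search, with per-dimension scores in [0, 1]."""
    job_id: str
    semantic: float
    skills: float
    seniority: float
    language: float
    location: float

    def compound_score(self) -> float:
        # Multiplicative combination: an unweighted product is an assumption.
        return prod((self.semantic, self.skills, self.seniority,
                     self.language, self.location))


def rank_candidates(candidates: list[Candidate], top_k: int = TOP_K) -> list[Candidate]:
    """Order candidates by compound score and keep the best top_k."""
    return sorted(candidates, key=lambda c: c.compound_score(), reverse=True)[:top_k]


def needs_recomputation(remaining_jobs: int, threshold: int = REFILL_THRESHOLD) -> bool:
    """True when the saved list has shrunk enough to trigger a background refresh."""
    return remaining_jobs <= threshold


candidates = [
    Candidate("job-a", 0.90, 0.80, 1.0, 1.0, 1.0),
    Candidate("job-b", 0.95, 0.90, 1.0, 0.1, 1.0),  # strong overall, weak language match
    Candidate("job-c", 0.70, 0.70, 1.0, 1.0, 1.0),
]
saved = rank_candidates(candidates)
ranking = [c.job_id for c in saved]
```

A property of multiplying the scores is that a single weak dimension drags the whole result down: `job-b` has the highest semantic and skills scores but ends up last because of its poor language match.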

v3 Improvements:

The third version is again a major improvement. It builds on the v2 recommendation algorithm and adopts a modular design that computes up to 12 match dimensions:

- Semantic similarity (cosine similarity between vector embeddings of specific, carefully selected profile and job ad attributes)

- Skills match (a multidimensional score assessing the coverage, i.e. proportion of matching skills, rarity of the matched skills as a proxy for the uniqueness of the profile with respect to the job, and the raw number of skills matched as a proxy for how impressive the skills coverage match should be considered)

- Language match (set overlap with a sharp language level gradient taking into account the user's proficiency in each required language)

- Seniority match (leaky filter score allowing directly adjacent seniority levels to pass through with a slight preference for a higher-than-specified seniority rather than lower only in the absence of an exact match)

- Location match (a hierarchical score with adjustments for working modes, increasing with location resolution)

- Salary match (proportional distance between minimum salary preference and potential maximum salary for the job)

- Working mode match (binary exact match with partial compatibility for similar working modes)

- Employment type match (binary exact match)

- Contract type match (binary exact match)

- Starting date match (linear penalty for days before specified starting date)

- Education match (binary exact match between lowest job education requirement and highest education in profile)

- Experience match (proportional distance between required experience in months and the user's most extensive industry-specific experience in an industry sufficiently similar to the matched job role industry)
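A few of the dimensions listed above could be sketched like this. These are hedged illustrations of the described behavior, not the production scoring functions: every function name, the numeric seniority levels, the adjacent-level scores (0.6 and 0.5) and the 90-day penalty horizon are hypothetical values picked for the example.

```python
from datetime import date
from typing import Optional


def skills_coverage(profile_skills: set[str], job_skills: set[str]) -> float:
    """Coverage part of the skills match: share of required skills found in the profile."""
    if not job_skills:
        return 1.0  # nothing to distinguish on, so default to 1
    return len(profile_skills & job_skills) / len(job_skills)


def seniority_match(specified_level: int, job_level: int) -> float:
    """Leaky filter: exact match scores 1, directly adjacent levels leak through.

    A job one level above the specified seniority is slightly preferred over
    one level below (0.6 vs 0.5 are hypothetical values).
    """
    diff = job_level - specified_level
    if diff == 0:
        return 1.0
    if diff == 1:
        return 0.6
    if diff == -1:
        return 0.5
    return 0.0


def salary_match(min_preference: Optional[float], job_max_salary: Optional[float]) -> float:
    """Proportional distance between the minimum preference and the job's maximum salary."""
    if not min_preference or job_max_salary is None:
        return 1.0  # missing data defaults to 1
    if job_max_salary >= min_preference:
        return 1.0
    return max(0.0, job_max_salary / min_preference)


def starting_date_match(preferred_start: Optional[date], job_start: Optional[date],
                        horizon_days: int = 90) -> float:
    """Linear penalty for each day the job starts before the preferred starting date."""
    if preferred_start is None or job_start is None:
        return 1.0
    days_early = (preferred_start - job_start).days
    if days_early <= 0:
        return 1.0
    return max(0.0, 1.0 - days_early / horizon_days)
```

For example, `salary_match(4000, 3000)` returns 0.75, and a job starting 30 days before the preferred date scores about 0.67 with the assumed 90-day horizon. Keeping each dimension as its own small function is one way to read the "modular design" mentioned above: dimensions can be added, tuned or switched off independently.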

These are abstracted to four overall categories:

- Suitability score that captures likely user interest and perceived relevance from the user's point of view based on semantic similarity of the profile and the job as well as multiple different skills-related metrics in a carefully tuned ratio.

- Preferences score that captures all user-specified requirements or wishes for the characteristics and qualities of the job.

- Qualifications score that considers the user's industry-specific experience and education background assessing the relevance of the user profile from a recruiter's point of view.

- Relevance score that acts as a leaky hard filter for language and seniority, ensuring that only immediately actionable job advertisements are shown for as long as they exist, while recommendations do not run out entirely once they no longer do.

All the scores range from 0 to 1 for interpretability and default to 1 in the absence of data or specified preferences, so that only data clearly distinguishing job postings from one another is considered. This means that the overall quality of the job postings is not yet assessed, the rationale being that users should find the best potential jobs for them regardless of how well the postings are written.
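The grouping and the default-to-1 rule can be illustrated with a short sketch. Note the assumptions: the text does not say how dimensions are combined within a category or how categories form a final score, so the plain mean and the product below are placeholders, and the assignment of the six preference dimensions to the Preferences category is inferred from the descriptions rather than stated.

```python
from math import prod
from statistics import fmean
from typing import Optional

# Dimension-to-category mapping (the "preferences" members are an inference).
CATEGORIES = {
    "suitability": ("semantic", "skills"),
    "preferences": ("location", "salary", "working_mode",
                    "employment_type", "contract_type", "starting_date"),
    "qualifications": ("education", "experience"),
    "relevance": ("language", "seniority"),
}


def category_scores(dimensions: dict[str, Optional[float]]) -> dict[str, float]:
    """Aggregate dimension scores per category; missing dimensions default to 1."""
    scores = {}
    for category, names in CATEGORIES.items():
        values = [1.0 if dimensions.get(name) is None else dimensions[name]
                  for name in names]
        scores[category] = fmean(values)  # plain mean is an assumption
    return scores


def overall_score(dimensions: dict[str, Optional[float]]) -> float:
    """Combine category scores into one number (the product is an assumption)."""
    return prod(category_scores(dimensions).values())


# Only three dimensions are known; everything else defaults to 1.
example = {"semantic": 0.8, "skills": 0.6, "language": 0.5}
by_category = category_scores(example)
total = overall_score(example)
```

Because unknown dimensions count as 1, a job with no salary or starting date information is neither rewarded nor punished for it: only the dimensions with actual data move the score.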

Data Quality:

Much of the recommendation algorithm's performance comes down to the data quality, which we are carefully guarding by only using the best sources. However, we also enrich both our profile and job data using Large Language Models (LLMs) to provide more surface area for the algorithm to grip onto when considering the different match dimensions.

We translate and structure the job data by extracting and inferring key pieces of information such as the role industry, job level and the various requirements, from languages and skills to education and experience. We then summarize the key points and, where the job description leaves details out, we try to estimate them. The potential salary range is one example: an inferred range is always marked with the "Estimate" label on the job card to make it explicit that the value is a guess, while still providing a reasonable starting point for negotiations. This allows us to be really flexible with our data sources, as any unstructured job description can be processed, though we try to find as much structured data as possible.

For the user profile, we parse either the LinkedIn profile or the CV that we are provided and then make optimistic but critical inferences about the job titles, industries and locations that the user might be interested in, along with additional skills and preferences that they might have. All of this is used by the recommendation algorithm. This is done to make the sign-up as quick and frictionless as possible so that users can, if they like, immediately start swiping jobs in the hope that even the first one impresses them enough to apply.

Maintaining Control:

The user is also given a lot of control over their recommendations through how they modify their profile. As mentioned above, all of the inferences, including the suggested job titles, industries, locations and skills, are used by the recommendation algorithm according to the confidence that it has in them to curate the best selection of jobs. However, each job title, industry, location or skill added by the user themselves is weighted more heavily, so that those are prioritized over the guesses for as long as there are enough relevant jobs. It can therefore improve the recommendations a lot both to add the preferences that are most interesting and to discard those that are not.
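One way to picture this weighting is sketched below. It is a simplified illustration under stated assumptions: the `ProfileItem` class, the fixed weight of 2.0 for user-added items and the exact-title comparison are all hypothetical, and the real system may weight and match items quite differently.

```python
from dataclasses import dataclass

USER_ADDED_WEIGHT = 2.0  # hypothetical: user-added items outweigh any inferred item


@dataclass
class ProfileItem:
    """A job title, industry, location or skill on the profile."""
    value: str
    user_added: bool
    confidence: float = 1.0  # confidence of the inference, in [0, 1]

    def weight(self) -> float:
        # Inferred items count according to their confidence,
        # user-added items get a fixed, heavier weight.
        return USER_ADDED_WEIGHT if self.user_added else self.confidence


def weighted_title_match(items: list[ProfileItem], job_title: str) -> float:
    """Share of the total profile weight carried by items matching the job title."""
    total = sum(item.weight() for item in items)
    if total == 0:
        return 1.0  # no preferences at all, so default to 1
    matched = sum(item.weight() for item in items
                  if item.value.lower() == job_title.lower())
    return matched / total


titles = [
    ProfileItem("Data Engineer", user_added=True),
    ProfileItem("Data Analyst", user_added=False, confidence=0.6),
]
engineer_score = weighted_title_match(titles, "data engineer")
analyst_score = weighted_title_match(titles, "data analyst")
```

Here the title the user added dominates the inferred one, and discarding an unwanted inferred title simply removes it from the list, which shifts even more weight to what the user actually wants.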

Summary:

The Jobgeek recommendation algorithm v3 is a highly sophisticated, interpretable content-based recommendation system operating on AI-enriched data to provide you with the best, individually personalized recommendations for the jobs that you are uniquely qualified for. Try it out at app.jobgeek.ai.

Miro Keimiöniemi,

Chief Data Officer
