Gain a competitive advantage using vectors to analyse unstructured data

  • Learn about the big picture regarding vectors
  • Understand the different business use cases for vector technology
  • Take action and get started with vectors today

Data exists everywhere. It is generated every second, and with every action. Capturing and analysing this unstructured data has been a monumental challenge that few businesses have been able to meet. Until now.

Vector technology is a revolutionary emerging technology: a subset of artificial intelligence with a basis in advanced mathematical concepts. Whilst it has been around in some form for decades, only recently has the barrier to entry been lowered.

Vectors are mathematical representations of qualitative information. They store the meaning of unstructured data in a language that machines can understand.

Vectors can be obtained from neural networks for almost any type of data.
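To make "meaning in a language that machines can understand" a little more concrete, here is a minimal Python sketch of comparing vectors with cosine similarity, one common similarity measure. The three-dimensional vectors and their names are made up for illustration; real vectors produced by a neural network typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-written vectors (not the output of a real model).
invoice = [0.9, 0.1, 0.0]
receipt = [0.8, 0.2, 0.1]
holiday_photo = [0.0, 0.2, 0.9]

# Items with similar meaning should score closer to 1.0.
print(cosine_similarity(invoice, receipt))
print(cosine_similarity(invoice, holiday_photo))
```

The closer the score is to 1, the more similar the two items are considered to be.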

Why are vectors important for businesses?

Relevance and similarity problems are present in all facets of business. Consider the following:

  • Identifying common patterns in large qualitative datasets
  • Predicting risk based on historical patterns
  • Recommending similar actions, content or products based on criteria that are hidden to humans
  • Searching for deeper meaning and insight within a dataset

Vector technology is the cutting-edge solution to similarity and relevance problems that your business should consider.

Document searching & classification

Businesses and end users all need to find relevant documents in a timely fashion, e.g. text files, books and PDFs. With large numbers of documents, searching for the relevant ones can be an onerous task.

Vector technology solves this by vectorising the content held within these documents and storing it as document vectors.

These vectors can then be searched and their similarity compared via clustering. This method automatically groups similar documents together based on the meaning of their content.
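The search half of this idea can be sketched in a few lines of Python. This is an illustration only: the file names and vectors are hypothetical, and in practice both the document vectors and the query vector would come from the same neural network.

```python
import math

def normalise(vector):
    """Scale a non-zero vector to length 1 so a dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]

def search(query_vector, document_vectors, top_k=2):
    """Return the names of the top_k documents most similar to the query."""
    q = normalise(query_vector)
    scored = []
    for name, vector in document_vectors.items():
        d = normalise(vector)
        score = sum(a * b for a, b in zip(q, d))
        scored.append((score, name))
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Hypothetical document vectors (assumed to be pre-computed by a model).
documents = {
    "refund_policy.pdf": [0.9, 0.1, 0.1],
    "returns_faq.txt": [0.8, 0.3, 0.0],
    "office_party.txt": [0.1, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # e.g. the vector for "how do I get my money back?"
results = search(query, documents)
print(results)
```

The two refund-related documents are returned even though the query shares no keywords with their file names, which is the point of searching by meaning.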

Example document classification use cases include:

  • Semantic search
  • Personalised recommendation in document management systems
  • Personalised book recommendations
  • Support ticket classification
  • Tech & SaaS knowledge bases
  • Classification of articles and content

Sports & eSports player coaching

Sports and eSports have been revolutionised since the creation of sabermetrics and its evolution through Moneyball and beyond.

Unstructured data can produce insights that significantly improve performance by allowing teams and players to review their previous game statistics and evaluate how they played.

Data points on player actions and game statistics are prime candidates for vectorisation. These vectorised datasets will contain information about playstyle, successful player moves, tendency to take risks throughout the game and more.

Vector similarity can work out how closely player attributes or actions match ideal patterns (e.g. those of other pro players), by creating vector clusters to explain correlation.

After in-depth analysis of players’ in-game actions and engagement, actionable, personalised recommendations can be provided to players on how to improve.

Payment & Credit Card Fraud Detection

Online payment fraud is an international problem costing businesses $20 billion a year. How much of this affects your business?

As e-commerce grows, so too, unfortunately, does payment fraud, with defrauding methods becoming increasingly sophisticated.

Payment and credit card fraud detection is a complex problem to solve, and vector-based technology is one of the best methods for making predictions about fraudulent payment transactions at scale. This is achieved by analysing previous examples of payment usage, profile data and overall fraud patterns.

All of this data, covering transaction details, customer profile details and customer engagement, can be vectorised securely.

Knowing which patterns resulted in fraud helps to predict future fraud risk. Fraud vectors can be clustered so that suspicious new transactions are identified when they fall into the same fraud cluster.
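As a rough illustration of "falling into a fraud cluster", the Python sketch below treats a single cluster as a centre point plus a radius and flags any new transaction vector that lands inside it. The two-dimensional vectors are invented, and using one cluster with a radius taken from the farthest known example is a simplifying assumption; a real system would use many clusters learned from far richer data.

```python
import math

def centroid(vectors):
    """Average a list of equal-length vectors into a single centre point."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def is_suspicious(transaction, fraud_centre, radius):
    """Flag a transaction whose vector lies within the fraud cluster's radius."""
    return math.dist(transaction, fraud_centre) <= radius

# Hypothetical vectors for transactions already confirmed as fraud.
known_fraud = [[0.9, 0.8], [1.0, 0.9], [0.8, 1.0]]
fraud_centre = centroid(known_fraud)

# Assumption: the cluster reaches as far as its most distant known member.
radius = max(math.dist(v, fraud_centre) for v in known_fraud)

print(is_suspicious([0.85, 0.95], fraud_centre, radius))  # close to past fraud
print(is_suspicious([0.10, 0.20], fraud_centre, radius))  # far from past fraud
```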

Risk identification and analysis

Businesses can better understand their risk exposure through unstructured data. However, without the right analytics this data remains dormant.

We can build risk profiles of different scenarios from historic data with a high degree of accuracy and confidence.

Input data relating to previous negative outcomes can be vectorised, stored and then interpreted.

Low-, medium- and high-risk profiles can be defined with clustering, and new potential risks are flagged if they fall into a previously identified risk cluster.
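One simple way to picture this is nearest-centroid assignment: each risk cluster is summarised by its centre, and a new case takes the label of the closest centre. The Python sketch below assumes the three cluster centres have already been found; the numbers are placeholders rather than real risk data.

```python
import math

# Hypothetical cluster centres, assumed to come from clustering historic data.
risk_centroids = {
    "low": [0.1, 0.2],
    "medium": [0.5, 0.5],
    "high": [0.9, 0.8],
}

def risk_tier(vector, centroids):
    """Return the label of the cluster centre closest to the given vector."""
    return min(centroids, key=lambda tier: math.dist(vector, centroids[tier]))

new_project = [0.85, 0.9]  # vector for a new, unseen scenario
print(risk_tier(new_project, risk_centroids))
```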

Interventions for risk mitigation can then be recommended and subsequently analysed.

Example use cases include risk management for construction and other projects, and dropout risk analysis for students and employees.

User / customer feedback analysis

Analysing feedback from users and customers takes a lot of effort, so manual analysis is not feasible at scale.

Vector technology enables companies to automatically make sense of qualitative feedback through sentiment analysis, natural language summarisation and topic modelling.

Natural language can be vectorised for deep analysis of the meaning behind words, sentences and paragraphs. Vector clusters will indicate the most representative topics and sentiment, and the insights in the text can be searched semantically.

Example use cases include:

  • Analysis of user research notes
  • Customer review analysis
  • Analysis of UI / UX testing
  • Analysis of video recordings
  • Research into customer insights through live chats & emails to support

Social media campaign analysis

Analysing metrics is a marketing best practice. However, when it comes to qualitative marketing data such as images and comments, traditional analysis tools don’t provide a worthwhile solution.

Data points from social media such as post caption, description, posted picture / video, comments, likes, impressions and other engagement data can all be vectorised easily.

These vectors are then clustered to reveal patterns about low, medium and high conversion rates.

Example use cases include the analysis of thousands of Reddit, LinkedIn, Twitter and Instagram posts to surface the best-converting pictures and captions.

Customer segmentation

Customer segmentation is the process of dividing current or potential customers into groups based on certain characteristics. When these characteristics involve large volumes of unstructured, qualitative data signals, this is a very challenging task.

Customer characteristics such as profile and engagement data can be vectorised. Once these vectors are clustered, each customer cluster will include those personas that are the most similar and relevant to each other.
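For readers who want to see what clustering looks like in practice, here is a deliberately tiny k-means sketch in plain Python. k-means is only one of several clustering algorithms and is used here as an assumption, not as the method any particular platform uses. The customer vectors are invented two-dimensional values, and the starting centroids are simply the first k points so that the result is repeatable.

```python
import math

def kmeans(points, k, iterations=10):
    """Tiny k-means: return a cluster index (0..k-1) for each point."""
    # Deterministic start: use the first k points as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical customer vectors, e.g. derived from profile and engagement data.
customers = [
    [0.90, 0.10], [0.10, 0.90], [0.85, 0.15],
    [0.20, 0.80], [0.95, 0.05], [0.15, 0.85],
]
segments = kmeans(customers, k=2)
print(segments)
```

Customers with the same index belong to the same segment and can then be profiled or targeted together.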

Example use cases include segmenting current customers and building churn models to work out who is most likely to abandon the service, and analysing potential customer segments to know who to target at the right time, with the right messaging.

Image categorisation

Images are a common type of unstructured data that is hard to analyse. Image categorisation and classification group the most similar images in a given dataset. These categories can be used to create collections of images that are highly relevant to each other. Images can then be recommended based on other similar images.

Images can be vectorised without labelling them or using metadata properties. Image vectors can then be clustered based on common features and characteristics. Images that are most similar to each other are placed in the same cluster.

Example use cases include systems that enable search by image (e.g. Taobao image search), image recommendation systems (e.g. Pinterest), or quality control based on tens of thousands of product photos taken by mystery shoppers.

Duplicate detection

Duplicate detection is a crucial capability in asset management and data storage systems.

In some cases, users want to find duplicates that are exactly the same, whilst in other cases they want to find assets that are almost identical.

Detecting duplicates in unstructured datasets (e.g. images, free text) is an impossible task for traditional BI tools.

Unstructured data can be vectorised to search for similarity. Vectors at exactly the same position represent duplicates. Near-duplicates can be found by clustering vectors and choosing a similarity threshold (in effect, the cluster size) to suit the use case.
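The difference between exact and near-duplicates can be sketched in Python with a distance threshold: a threshold of zero only matches vectors at the same position, while a small positive threshold also catches assets that are almost identical. The asset names, vectors and the 0.05 threshold are hypothetical, and comparing every pair like this only suits small datasets.

```python
import math
from itertools import combinations

def find_duplicates(vectors, threshold=0.0):
    """Return pairs of names whose vectors are within `threshold` of each other."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(vectors.items(), 2):
        if math.dist(a, b) <= threshold:
            pairs.append((name_a, name_b))
    return pairs

# Hypothetical asset vectors (assumed to be pre-computed by a model).
assets = {
    "logo.png": [0.2, 0.7, 0.1],
    "logo_copy.png": [0.2, 0.7, 0.1],
    "logo_resized.png": [0.21, 0.69, 0.1],
    "banner.png": [0.9, 0.1, 0.4],
}

exact = find_duplicates(assets)                 # identical vectors only
near = find_duplicates(assets, threshold=0.05)  # also almost-identical assets
print(exact)
print(near)
```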

Example use cases include:

  • Duplicate detection in digital asset management to filter out redundant assets
  • Application lifecycle management where detecting requirement duplication is important for regulatory approval

How can your business take advantage of vectors?

To take advantage of vectors, your data scientists, analysts, practitioners or machine learning engineers will need a platform that enables data experimentation through vectors.

This is where RelevanceAI comes into play.

Book your platform demo here with our vector experts and learn how you can take the next steps.

Alternatively, with knowledge of Python and Jupyter notebooks, you can create an account and get started today.

Benedek Zajkas
March 4, 2022
Find out how your business can glean insights through unstructured data with vectors

Book a demo with our experts today.