Stripe’s Credit Card Payment Fraud Detection Case Study at a glance
Stripe, a leading provider of online payments infrastructure, was the 3rd largest private company globally by valuation as of February 2022.
With advancements in machine learning capabilities, Stripe significantly expanded its business offerings to support payments (including those made by credit card) and broader business operations.
Online credit card & payment fraud is estimated to cost online businesses more than $20 billion a year, with a staggering 18% growth year-on-year. As e-commerce grows, so too does payment fraud.
Payment and credit card fraud detection is a challenging, complex problem to solve. Stripe uses state-of-the-art vector-based technology to predict fraudulent payments from previous examples of data signals and their associated outcomes.
Stripe’s vector-based credit card fraud detection matters
The purpose of this case study is to provide an insider perspective to data scientists, decision-makers, and practitioners about the benefits of a vector solution in payments fraud detection.
Through the use case of one of the most successful payments infrastructure companies globally, we argue that vector-based systems are state-of-the-art solutions to fraud detection.
Online payments fraud is estimated to cost online businesses more than $20 billion a year, with a staggering 18% growth year on year. As e-commerce grows, so does payments fraud. It is not only a huge problem but also a very challenging one to solve.
Vector-based clustering is a state-of-the-art solution that large companies with sufficient resources – such as Stripe – utilize to combat fraud.
Today, however, this technology is available for any organisation, independent of its size and whether it possesses internal machine learning capabilities.
Stripe is the leading online payments & credit card gateway infrastructure.
Stripe, founded in 2009, was the 3rd largest private company globally as of February 2022, with a valuation of $95 billion.
Stripe processes billions of dollars each year for businesses of all sizes. Companies use Stripe’s software and APIs to accept payments, send payouts, and manage their businesses online.
With advancements in its machine learning capabilities, Stripe significantly expanded its business offerings to support payments and more comprehensive business operations.
Stripe Radar utilizes state-of-the-art machine learning to detect payment fraud.
Stripe Radar is a vector-based clustering solution built to identify and handle online credit card fraud for Stripe clients. Its goal is to assess whether a card’s use was legitimate or fraudulent, based on specific patterns.
Stripe Radar clusters and assigns risk scores to each online payment, then flags payments beyond a certain risk threshold.
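The score-then-flag step described above can be illustrated with a minimal sketch. It is not Stripe Radar's actual implementation: it assumes each payment already carries a risk score between 0 and 1, and the `Payment` class, the `flag_payments` helper, and the 0.75 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Payment:
    payment_id: str      # hypothetical identifier
    risk_score: float    # assumed to lie in [0, 1]; higher means riskier


def flag_payments(payments, threshold=0.75):
    """Return the IDs of payments whose risk score is strictly above the threshold."""
    return [p.payment_id for p in payments if p.risk_score > threshold]


# Made-up example payments.
payments = [
    Payment("pay_001", 0.12),
    Payment("pay_002", 0.91),
    Payment("pay_003", 0.75),  # exactly at the threshold, so not flagged
]

flagged = flag_payments(payments)
print(flagged)
```

In this sketch, only payments strictly beyond the threshold are flagged; where the threshold sits is a policy choice for the business.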
Online payment fraud explained.
Online payment fraud (aka online credit card fraud) happens when a credit card transaction goes through even though the card owner did not authorize it, for example when a fraudster uses stolen credit card details.
Fraudsters acquire credit card details via scam payment pages or other means, then use these details to purchase goods online, impersonating the cardholder. The online seller then processes the payment and sends the goods to the fraudster. The cardholder notices the charges and submits a chargeback claim.
Online payment fraud causes significant losses.
In cases of online payment fraud, businesses are liable for paying the money back to the card owner (chargeback), incurring the cost of lost goods and additional operational costs in the form of fraud investigation, fighting disputes, network fees, and customer churn.
Global businesses lose about 1.8% of revenue each year due to online payment fraud.
Fraud is scary and frustrating to consumers, as their card and personal details can always be stolen. They also need to deal with the chargeback claim, block the credit card, and change their credentials.
In addition to this, e-commerce sales are projected to reach $7.4 trillion in 2025 globally, with a 10% compound annual growth rate. Due to this rapid acceleration, there is a significant increase in online payment fraud every year ($20 billion in 2021, 18% annual growth rate).
As consumer payments increasingly move online and more consumers make payments and store card details online, fraudsters are following.
Challenges in fraud detection that Stripe set out to solve
A data science and business problem
Detecting fraud in a highly automated and reliable manner is a challenging task. Businesses need to produce good machine learning models by implementing the right technology with suitable algorithms. We propose that a vector-based solution is the state-of-the-art technology to handle fraud with machine learning.
Fraud detection is not only a data science but also a business problem. Businesses need to pick the right policy to decide how much potential fraud to block as there is a trade-off between different approaches. They must be able to change and customize policies and rules to fine-tune protection.
Trade-offs in payment fraud detection
Businesses need not only to catch fraudulent payments (avoiding false negatives) but also to avoid blocking legitimate ones (false positives).
A false negative in fraud detection is when the system does not detect and block a fraudulent transaction even though it should. As noted above, this is a huge issue for online businesses: in such cases, they lose money, goods and reputation.
An additional difficulty with fraud detection is that the system should “not only prevent but also allow.” It should keep false positives low as well.
A false positive in fraud detection is when the system labels a transaction as fraudulent even though it is legitimate, eventually blocking a customer.
By blocking purchases by legitimate customers, businesses suffer immediate revenue loss and reputation damage. Surveys show that approximately 33% of US consumers don’t return to retailers if their legitimate online payments get blocked.
There is a trade-off between false positives and false negatives. We can reduce false negatives by blocking more transactions, resulting in increased false positives. On the other hand, we can reduce false positives by allowing more transactions, which increases the frequency of false negatives.
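The trade-off can be made concrete with a small sketch that counts both error types at several blocking thresholds. The scores and labels below are made-up toy data, and `confusion_counts` is a hypothetical helper; the sketch assumes a transaction is blocked when its risk score is at or above the threshold.

```python
def confusion_counts(scores, labels, threshold):
    """Count (false positives, false negatives) when blocking scores >= threshold.

    labels: True for a fraudulent transaction, False for a legitimate one.
    """
    false_positives = sum(
        1 for s, is_fraud in zip(scores, labels) if s >= threshold and not is_fraud
    )
    false_negatives = sum(
        1 for s, is_fraud in zip(scores, labels) if s < threshold and is_fraud
    )
    return false_positives, false_negatives


# Made-up toy data: risk scores and whether each transaction was really fraud.
scores = [0.05, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.95]
labels = [False, False, False, True, False, True, True, True]

for threshold in (0.3, 0.5, 0.7):
    fp, fn = confusion_counts(scores, labels, threshold)
    print(f"threshold={threshold:.1f} false_positives={fp} false_negatives={fn}")
```

On this toy data, raising the threshold from 0.3 to 0.7 moves false positives from 2 to 0 while false negatives rise from 0 to 2, which is the trade-off a business has to settle through its policy.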
Fraud in 2022 is more complex & sophisticated than ever.
Nowadays, it’s even more challenging to combat payment fraud, as sophisticated fraudsters always find new techniques to carry it out. A few ways fraudsters improve their processes include:
- Acquiring personal IDs and personal documents and creating accounts under these synthetic IDs
- Creating multiple instances of virtual machines
- Changing the virtual location to where the stolen card is registered
- Trying to act like real customers, deleting items and waiting at checkout before purchasing
Requirements of real-time computations and real-world application
As fraud trends change quickly and fraudsters find new ways to commit payment fraud, the vector infrastructure used to keep both false negatives and false positives low needs to adapt in real time.
Stripe developed a particular vector infrastructure to retrain models with the latest data. There is also a need to fine-tune vector encoding models, testing which attributes are most relevant.
The Stripe vector-based fraud detection solution
Unstructured data is encoded into vectors.
Stripe Radar is built on the Stripe network, using all available Stripe data. Therefore, encoding models that generate vectors are trained with hundreds of millions of data points and consider thousands of signals to detect and prevent fraud.
Stripe processes payments from 195 countries, with an 89% chance that the card number has been used before on the Stripe network.
When cardholders claim they didn’t authorise the transaction, data is generated by credit card issuers in the form of so-called TC40 and SAFE reports. These reports contain transaction details, merchant details, information about the cardholder claiming the fraud, and banking information of the merchant and those involved in the transaction.
These data points are vectorised and stored in embeddings. Vectors are then clustered, so their similarity and relevance to each other can be compared and used to surface fraud patterns. These signals make it possible to predict fraud with high confidence.
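How comparing vectors can surface fraud patterns is sketched below. This is an illustration under stated assumptions, not Stripe's method: the 3-dimensional vectors are made-up stand-ins for real embeddings, and a simple nearest-centroid comparison using cosine similarity stands in for whatever clustering is actually used. The helpers `cosine_similarity`, `centroid`, and `closer_to_fraud` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Made-up embeddings of past payments with known outcomes (assumption).
known_fraud = [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9], [1.0, 0.0, 0.7]]
known_legit = [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1], [0.0, 1.0, 0.3]]

fraud_centroid = centroid(known_fraud)
legit_centroid = centroid(known_legit)


def closer_to_fraud(vector):
    """True if the vector is more similar to the fraud centroid than the legit one."""
    return cosine_similarity(vector, fraud_centroid) > cosine_similarity(
        vector, legit_centroid
    )


new_payment = [0.85, 0.15, 0.75]
print(closer_to_fraud(new_payment))
```

A production system would use far higher-dimensional vectors and many clusters, but the principle is the same: a new payment is judged by how close it sits to previously observed examples.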
Leveraging vectors for more efficient training
Stripe’s vector-based solution uses various data signals to predict the likelihood that a charge will result in a fraud dispute. With vectors, Stripe makes predictions based on previous examples of data signals and the outcomes associated with them.
A vector-based solution is an effective way to learn higher-level concepts without explicit training. For example, fraud patterns are often unevenly distributed geographically. Using vectors and vector clusters, it is relatively easy to identify the same pattern in Brazil, even if it first appeared in the U.S. only recently, all without further training.