Authored by Thomas Yoon

At Tinder, the quality of your profile photos is paramount. However, asking users to manually sift through their entire camera roll to select images that best represent them can be an overwhelming task. Users often spend a great deal of time searching for that one perfect shot, only to remain uncertain if their chosen photos will generate the desired responses.
This challenge has significant implications. Many users end up putting minimal effort into crafting their profiles, often uploading only a couple of images or photos that don’t effectively showcase their personality. Such shortcomings in photo selection can result in profiles that feel less genuine and engaging, which affects the overall user experience and key engagement metrics.
To address these issues, we envisioned a radical improvement in the profile creation process. Leveraging on-device AI, our solution analyzes the user’s photo library, identifies the most compelling images, and intelligently recommends them for their Tinder profile — all while operating securely on the user’s device. Advances in on-device AI and modern smartphone capabilities have empowered our team to transform a previously daunting task into an efficient, intelligent, and effortless experience that elevates the authenticity of every profile.
Tinder’s AI Photo Selector
Our engineering team broke down this complex feature into several key components. This endeavor leveraged third-party libraries along with significant contributions from Tinder’s in-house ML team and data scientists, all working together to optimize and elevate accuracy to its highest potential.
Primary Challenges
As we began thinking about what an AI photo selector should be for Tinder users, we identified several key questions to answer before writing any code.
- Capturing the User’s Face: How can we reliably extract the user’s face as the definitive reference for scanning camera roll assets?
- State Management: How do we handle the different states while models are downloading, a selfie is being captured (or existing profile photos are analyzed), and pre-processing occurs at various intervals?
- Efficient Processing: With thousands of assets, how do we process them concurrently and in a resource-efficient manner?
- Face Detection: For each asset, how can we accurately determine whether it contains a face?
- User Face Recognition: If a face is detected, how do we verify that it actually belongs to the user?
- Photo Optimization: How can we assess if a particular photo has the highest probability of success?
- Content Safety: How do we ensure that the photo recommended is safe to display?
- Performance Analytics Integrity: How do we ensure that analytics are accurately captured as assets are processed?
Capturing the User’s Face
Initially, the AI Photo Selector prompted users to take a selfie.

To handle camera access and image capture, we utilized Apple’s AVFoundation, which provided all the necessary components. To work with that single captured image, we introduced a new component, VisionFaceDetectionService, designed to answer two critical questions:
1. Identifying the number of faces in each image
Apple’s Vision framework makes this straightforward via VNDetectFaceRectanglesRequest, which detects the total number of faces in a CGImage.
let handler = VNImageRequestHandler(cgImage: cgImage, orientation: cgOrientation)
do {
    // Detect face rectangles and ensure there's only one face in the image.
    let faceRectanglesRequest = VNDetectFaceRectanglesRequest()
    try handler.perform([faceRectanglesRequest])
    if let faceRectanglesResult = faceRectanglesRequest.results {
        if faceRectanglesResult.isEmpty {
            return .success(.detectedNoFace)
        } else if faceRectanglesResult.count > 1 {
            return .success(.detectedMultipleFaces)
        }
    } else {
        return .success(.detectedNoFace)
    }
}
This approach allowed us to filter out selfies containing multiple faces. Moreover, Apple’s API conveniently supplies the bounding box position and size for each detected face.
let faceBoundingBoxRatio = faceRectanglesRequest.results?.first?.boundingBox
2. Normalizing the face crop
Simply cropping the face based on the bounding box was insufficient in cases where the selfie might be taken at an angle. To address this, we developed a “normalized crop” implementation. This technique aligns the face in a standard position by applying affine transformations so that the eyes, nose, and mouth match a known reference (e.g., a 112×112 coordinate space). The process involves:
- Declaring standard coordinates for eyes, nose, and mouth in a 112×112 grid.
- Detecting the face bounding box using Apple’s Vision framework.
- Computing an affine transform that maps the selfie’s face onto the normalized reference.
- Employing a least squares solution (Ax = B) to minimize errors and accurately align source points with the destination coordinates.

/// The following coordinates represent the desired positions of the landmarks of the face in a 112x112 image.
static let landmarkDest: [CGPoint] = [
    CGPoint(x: 38.2946, y: 51.6963), // Left eye
    CGPoint(x: 73.5318, y: 51.5014), // Right eye
    CGPoint(x: 56.0252, y: 71.7366), // Nose tip
    CGPoint(x: 41.5493, y: 92.3655), // Left lip corner
    CGPoint(x: 70.7299, y: 92.2041) // Right lip corner
]
// Extract facial landmarks.
let faceLandmarksRequest = VNDetectFaceLandmarksRequest()
faceLandmarksRequest.revision = 3
try handler.perform([faceLandmarksRequest])
guard faceLandmarksRequest.results?.count == 1 else {
    return .success(.detectedMultipleLandmarks)
}
guard let landmarks = faceLandmarksRequest.results?.first?.landmarks else {
    return .success(.detectedNoLandmarks)
}
// Obtain the coordinates from the landmarks.
let landmarkPoints = convertLandmarksToPoints(landmarks: landmarks, imageSize: image.size)
// Crop the face from the image and apply affine transformation
let faceImage = normCrop(image: image, landmarks: landmarkPoints)
By using normalized cropping, we can reliably extract a reference face from any asset, regardless of angle — thereby enabling accurate comparisons when searching a user’s camera roll for potential profile photos.
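The normCrop implementation itself isn’t shown above. As a rough sketch of the least squares step it relies on, the snippet below solves for a 2D similarity transform (scale, rotation, translation) that maps the detected landmarks onto the 112×112 reference points via the normal equations. The function name, the use of simd, and the solver details are illustrative assumptions rather than Tinder’s actual code.
import simd
import CoreGraphics

/// Hypothetical sketch: solve for a similarity transform that maps detected
/// landmarks onto the 112x112 reference landmarks by least squares (Ax = B),
/// returned as a CGAffineTransform. Not the production implementation.
func solveAlignmentTransform(from source: [CGPoint], to destination: [CGPoint]) -> CGAffineTransform? {
    guard source.count == destination.count, source.count >= 2 else { return nil }
    // Unknowns x = [a, b, tx, ty] for the mapping:
    //   dx = a * sx - b * sy + tx
    //   dy = b * sx + a * sy + ty
    // Accumulate the normal equations (AᵀA) x = AᵀB.
    var ata = simd_double4x4() // zero matrix
    var atb = simd_double4(repeating: 0)
    for (s, d) in zip(source, destination) {
        let rows: [(simd_double4, Double)] = [
            (simd_double4(Double(s.x), -Double(s.y), 1, 0), Double(d.x)),
            (simd_double4(Double(s.y), Double(s.x), 0, 1), Double(d.y))
        ]
        for (row, target) in rows {
            // Outer product row * rowᵀ accumulated into AᵀA.
            ata = ata + simd_double4x4(rows: [row * row.x, row * row.y, row * row.z, row * row.w])
            atb = atb + row * target
        }
    }
    let x = ata.inverse * atb // degenerate landmark sets produce non-finite values
    guard x.x.isFinite, x.y.isFinite, x.z.isFinite, x.w.isFinite else { return nil }
    return CGAffineTransform(a: CGFloat(x.x), b: CGFloat(x.y),
                             c: CGFloat(-x.y), d: CGFloat(x.x),
                             tx: CGFloat(x.z), ty: CGFloat(x.w))
}
Applying the resulting transform and cropping to 112×112 yields the normalized face used for all downstream comparisons.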
Addressing funnel drop-offs
A key product concern arose around users who might decline to take a selfie, potentially causing significant funnel drop-offs. Not all users are comfortable capturing a selfie on demand. They might be in a public space or simply uneasy about sharing a photo. To mitigate this, engineering introduced a fallback mechanism called BasePhotoInference. Rather than insisting on a new selfie, we scan any existing profile photos for a single recognizable face with a suitable bounding region. When found, we ask for the user’s consent to confirm that it matches their face.

let visionProcessed = visionFaceDetectionService.runModel(
    onImageData: data,
    isReferenceSelfie: true
)
switch visionProcessed {
case let .success(visionRunModelResult):
    switch visionRunModelResult {
    case let .identifiedAndCropped(visionProcessedImage):
        if let imageData = visionProcessedImage.imageData {
            visionImageData[index] = imageData
        }
        if visionProcessedImage.faceBoundingBoxRatio > largestFaceRatio {
            largestFaceRatio = visionProcessedImage.faceBoundingBoxRatio
            indexWithLargestFaceRatio = index
        }
This approach had a substantial impact on reducing funnel friction — sometimes, the best process is none at all. Instead of coaxing users into a dedicated selfie, we seamlessly leverage their existing images. The VisionFaceDetectionService again plays a crucial role here, detecting whether there is exactly one face present in each profile photo. Once confirmed, we can proceed with the normalized crop flow and offer an optimized photo selection without burdening the user with extra steps.
State Management
Before processing can begin, multiple tasks run in parallel:
- Model Preparation: Downloading and decrypting multiple TensorFlow Lite models, then loading and compiling them to initialize the interpreters.
- Third-Party SDK Setup: Downloading a separate ML model from a third-party SDK for facial similarity checks, verifying licenses, and initializing SDK manager classes.
- Selfie Detection: Using Apple’s Vision framework to detect a single face in the user’s selfie, then applying resizing and normalized cropping.
- Fallback Approach: If a selfie is not provided, scanning existing profile photos for a single face that can serve as a base reference. For new installs, the system must first confirm these nine photos are properly downloaded.
- User Trigger: Finally, the feature needs to monitor when the user actually taps “Let’s do it” to trigger the entire process.
Given these parallel tasks, Apple’s Combine framework was a natural fit. In Tinder’s Node architecture, the feature’s Context and corresponding State object offer a centralized place for orchestration. We declare the State as a @Published property, allowing property changes to be observed. By chaining operators such as .dropFirst(), .compactMap, and .filter, we ensure the processing flow only proceeds once all prerequisites, which act like synchronized “light switches”, are met.
/// The State instance
@Published internal private(set) var state: PhotoSelectorNavigationState
var currentStep: PhotoSelectorNavigationStep
var selfieImageData: Data?
var preProcessedImageData: PreProcessedFaceImageData?
var modelsDownloadState: ModelsDownloadState = .incomplete
var externalModelsDownloadState: ModelsDownloadState = .incomplete
var profileImagesState: ProfileImagesState = .notDownloaded
var canProcessImagesUsingSelfie: Bool {
    externalModelsDownloadState == .succeeded &&
    modelsDownloadState == .succeeded &&
    selfieImageData != nil &&
    currentStep == .selfieCapture
}
var canProcessImagesUsingProfilePhoto: Bool {
    externalModelsDownloadState == .succeeded &&
    modelsDownloadState == .succeeded &&
    currentStep == .basePhotoInference &&
    preProcessedImageData != nil
}
For instance, when the models finish downloading, we update a modelsDownloadState, which then triggers downstream Combine pipelines.
internal func fetchSelectorModels() {
    fetchModels()
        .result()
        .receive(on: UIScheduler.shared)
        .sink { [weak self] result in
            guard let self else { return }
            switch result {
            case .success:
                state.modelsDownloadState = .succeeded
            case .failure:
                state.modelsDownloadState = .failed
            }
        }.store(in: &cancellables)
}
Filtering for conditions like canProcessImagesUsingSelfie or canProcessImagesUsingProfilePhoto ensures we only proceed when all required states are true. From there, we retrieve the image data, apply normalization or cropping, and pass it through .receive and .sink.
private func observeProcessingViaBasePhoto() {
    $state
        .dropFirst()
        .filter { $0.canProcessImagesUsingProfilePhoto }
        .compactMap { $0.preProcessedImageData }
        .receive(on: UIScheduler.shared)
        .sink { [weak self] preProcessedImageData in
            // Attach processing step
        }.store(in: &cancellables)
}

private func observeProcessingViaSelfie() {
    $state
        .dropFirst()
        .filter { $0.canProcessImagesUsingSelfie }
        .compactMap { $0.selfieImageData }
        .flatMap { [normCropImageData, preProcessImage] in
            Just($0)
                .flatMap { normCropImageData(imageData: $0) }
                .flatMap { preProcessImage(imageData: $0) }
                .result()
        }
        .receive(on: UIScheduler.shared)
        .sink { [weak self] result in
            // Attach processing step
        }.store(in: &cancellables)
}
This setup centralizes state transitions in one place, making the codebase more readable, maintainable, and easier to adapt as new product requirements arise.
Concurrent and efficient image processing
Concurrent image processing required careful architectural planning to ensure both performance and reliability. Engineering chose to implement an OperationQueue with a clearly defined maximum number of concurrent operations, allowing for precise control and straightforward adjustments to optimize the user experience. Additionally, the ability to cancel operations on-demand was critical, coupled with a timeout feature to prevent the device from hanging indefinitely during intensive processing.
The timeout mechanism works by automatically canceling all ongoing operations within the OperationQueue if processing exceeds a predefined duration. Otherwise, processing is structured as follows:
For instance, processing 1,000 photos involves adding 1,000 asynchronous operations to the OperationQueue, followed by one additional BlockOperation. This final operation explicitly depends on the completion of all previous operations, resulting in a total of 1,001 operations in the queue. Through extensive instrumentation and profiling, engineering identified a maximum of 8 concurrent operations as the ideal threshold — balancing optimal utilization of device capabilities while avoiding battery drain or UI thread stuttering.
internal func setUpAssetProcessOperations(for assets: [PHAsset]) -> [AssetProcessOperation] {
    // Loop through assets and prepare an AssetProcessOperation for each.
    let operations: [AssetProcessOperation] = (0..<assets.count).compactMap { index in
        guard let interpreter = self.interpreter else { return nil }
        let operation = AssetProcessOperation(
            visionFaceDetectionService: visionFaceDetectionService,
            faceRecognitionService: faceRecognitionService,
            interpreter: interpreter,
            imageFactory: imageAssetsFactory,
            asset: assets[index],
            baseEmbedding: baseEmbedding
        )
        return operation
    }
    return operations
}
internal func enqueueFlushOperation(
    operations: [AssetProcessOperation],
    promise: @escaping (Result<ProcessedAssets, Error>) -> Void
) {
    // Set up flushing results operation to depend on all queued operations.
    let operation = BlockOperation()
    operation.completionBlock = { [unowned operation, weak self] in
        guard let self else { return }
        let processedAssets = processFlushOperation(
            isCancelled: operation.isCancelled,
            isTimedOut: didTimeOut,
            operations: operations
        )
        promise(.success(processedAssets))
    }
    operations.forEach { operation.addDependency($0) }
    Log.debug("[Photo Selector] Operations start")
    // Queue all operations + flushing operation.
    operationQueue.addOperations(operations + [operation], waitUntilFinished: false)
}
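The queue configuration and timeout described above aren’t shown in these snippets. As a minimal sketch of one way to wire them up (the class name, timeout value, and the way didTimeOut is stored are illustrative assumptions, not the production code), the queue can be capped at eight concurrent operations and cancelled wholesale once a deadline passes:
import Foundation

// Minimal sketch, assuming a coordinator object owns the queue: cap concurrency
// at 8 and cancel all remaining operations if processing exceeds a deadline.
final class AssetProcessingCoordinator {
    private let operationQueue: OperationQueue = {
        let queue = OperationQueue()
        queue.maxConcurrentOperationCount = 8
        queue.qualityOfService = .userInitiated
        return queue
    }()
    private(set) var didTimeOut = false

    func startTimeout(after seconds: TimeInterval = 120) {
        DispatchQueue.main.asyncAfter(deadline: .now() + seconds) { [weak self] in
            guard let self, self.operationQueue.operationCount > 0 else { return }
            // Record the timeout so the flush operation can report it, then
            // cancel whatever is still pending or executing.
            self.didTimeOut = true
            self.operationQueue.cancelAllOperations()
        }
    }
}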
Each individual operation leverages the previously mentioned VisionFaceDetectionService to detect a face. Next, it uses a third-party facial-recognition SDK (FaceMeSDK) to verify that the detected face matches the user’s base reference photo. If the similarity surpasses a defined threshold, Tinder’s own TensorFlow Lite model then infers the photo’s probability of receiving likes. Only images that successfully complete all of these stages are instantiated as ProcessedImage objects carrying their expected likes-probability score.
override func execute(finish: @escaping Closure) {
    guard let visionProcessed = self.performFaceDetection(resizedImage) else {
        finishOperation("[AssetProcessOperation] Vision did not detect a face.")
        return
    }
    faceDetected?()
    guard let imageData = visionProcessed.imageData,
          let recognized = self.performFaceRecognition(imageData),
          let similarityScore = self.faceRecognitionService.computeSimilarity(
              self.baseEmbedding, recognized), self.faceRecognitionService.isSimilar(similarityScore)
    else {
        finishOperation("[AssetProcessOperation] Vision did not provide imageData.")
        return
    }
    faceIdentified?()
    guard let score = self.performScoreInference(resizedImage) else {
        finishOperation("[AssetProcessOperation] Scoring Failed.")
        return
    }
    scored?()
    processedImage = .init(
let interpreter = try getInterpreter()
let inputTensor = try interpreter.input(at: 0)
// Crops the image to the biggest square in the center and scales it down to model dimensions.
// Remove the alpha component from the image buffer to get the RGB data.
let shape = imageInputShape
let scaledSize = CGSize(width: shape.width, height: shape.height)
guard let scaledPixelBuffer = pixelBuffer.resized(to: scaledSize),
      let rgbData = scaledPixelBuffer.rgbData(
          byteCount: inputTensor.shape.dimensions.reduce(1, { x, y in x * y }),
          isModelQuantized: inputTensor.dataType == .uInt8,
          normalization: normalization) else {
    throw MachineLearningError.inputError
}
// Copy the RGB data to the input `Tensor`.
try interpreter.copy(rgbData, toInputAt: 0)
preProcessTime = InterpreterMetrics.now - preProcessTime
inferenceTime = try InterpreterMetrics.measure {
    try interpreter.invoke()
}
postProcessTime = InterpreterMetrics.now
for i in 0..<interpreter.outputTensorCount {
    output.append(try convertOutputTensor(try interpreter.output(at: i)))
}
static func resize(image: UIImage) -> UIImage {
    // Ensures 3pt -> 1px rgbData model size is computed correctly.
    // Without format, rgbData's size is three times bigger than model expects.
    let format = UIGraphicsImageRendererFormat.default()
    format.scale = 1
    return image.resize(
        size: Constants.targetAssetSize,
        preservingAspectRatio: false,
        rendererFormat: format
    )
}
private let options: PHImageRequestOptions = {
    let options: PHImageRequestOptions = .init()
    options.version = .current
    options.isSynchronous = true
    options.isNetworkAccessAllowed = false
    options.resizeMode = .none
    return options
}()
manager.requestImage(
    for: asset,
    targetSize: targetSize,
    contentMode: .aspectFit,
    options: options
) { uiImage, _ in
    guard let uiImage else {
        Log.debug(scope: .aiExperience, "no image found for asset provided")
        return
    }
    promise(.success(uiImage))
}
Upon completion, a final array of ProcessedImage objects is created by iterating through all operations using compactMap. These processed images are then sorted based on their likes probability scores, with the top 100 images undergoing an additional moderation check to ensure safety before being recommended to the user.
// Whether all operations completed on their own or a timeout forced them
// to finish, grab every processed image that was produced.
let images = operations.compactMap(\.processedImage)
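As a small sketch of that ranking step (the likesScore property name is an assumption for illustration, not the actual ProcessedImage API), candidate selection could look like this:
// Hypothetical sketch: sort processed images by their likes-probability score
// and keep only the top 100 as moderation candidates. `likesScore` is an
// assumed property name.
func selectModerationCandidates(from operations: [AssetProcessOperation]) -> [ProcessedImage] {
    let images = operations.compactMap(\.processedImage)
    return Array(images.sorted { $0.likesScore > $1.likesScore }.prefix(100))
}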
We repeatedly benchmarked iPhones equipped with the Neural Engine against libraries of 1,000 photos to understand what processing time feels comfortable for users. Taking the distribution of devices among Tinder’s current users into account, the team defined experiment boundaries such as the list of devices supporting the feature, the minimum OS level, and the total number of photos to scan.
Content Safety
At Tinder, trust and safety are fundamental to every feature we build — and AI Photo Selector is no exception.
To ensure we never recommend unsafe or inappropriate images, our ML team developed a dedicated moderation TensorFlow Lite model designed to rigorously assess photo content. This moderation model evaluates images against multiple dimensions — such as detecting underage individuals, violent content, text-heavy images, or other harmful scenarios — each with specific threshold criteria. Any photo exceeding these safety thresholds is automatically excluded from final recommendations.
Given that moderation applies exclusively to the top 100 candidate images, extensive concurrent processing wasn’t necessary. Instead, linear processing proved sufficient, allowing us to closely monitor accuracy and consistently maintain high safety standards. Significant engineering effort was devoted to ensuring Tinder’s recommendations remain safe, reliable, and trustworthy for all users.
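To illustrate what such a linear pass might look like, here is a minimal sketch; the category names, threshold values, and the runModeration(on:) helper are assumptions made for the example rather than the actual moderation model’s interface.
// Hypothetical moderation scores for a single image; the categories and
// thresholds below are illustrative, not the real model's outputs.
struct ModerationScores {
    let underage: Float
    let violence: Float
    let textHeavy: Float
}

// Stub standing in for invoking the TensorFlow Lite moderation interpreter.
func runModeration(on image: ProcessedImage) throws -> ModerationScores {
    fatalError("Wire up the moderation model interpreter here.")
}

// Sequentially filter the top candidates, dropping anything that crosses a
// safety threshold. Linear processing is sufficient for only 100 images.
func moderationSafeImages(_ candidates: [ProcessedImage]) -> [ProcessedImage] {
    let thresholds = ModerationScores(underage: 0.5, violence: 0.5, textHeavy: 0.5)
    return candidates.filter { candidate in
        guard let scores = try? runModeration(on: candidate) else { return false }
        return scores.underage < thresholds.underage
            && scores.violence < thresholds.violence
            && scores.textHeavy < thresholds.textHeavy
    }
}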
Measuring the success of Tinder’s AI Photo Selector
Engineering needed a reliable way to measure the performance of AI Photo Selector without exposing private or confidential user metadata. Specifically, we aimed to capture metrics such as the total number of photos containing faces, how many of those faces matched the user’s base image, and ultimately, the feature’s impact on users’ match rates and photo-upload behaviors.
Due to the concurrent processing architecture, capturing these analytics required careful linearization. We introduced a dedicated analytics DispatchQueue to ensure accurate and synchronized counting. Closures were strategically placed at various operation stages, and we leveraged DispatchGroup’s enter() and leave() methods to enforce sequential counting and reliable reporting at the conclusion of processing. This approach provided robust analytics while safeguarding user privacy. Most importantly, when the final flush operation runs, it waits on this DispatchGroup to finish before reporting, ensuring an accurate final count.
/// Count detected and identified for analytics purposes.
operation.onFaceDetected { [weak self] in
    guard let self else { return }
    analyticsCounterGroup.enter()
    analyticsCounterQueue.async { [weak self] in
        guard let self else { return }
        analyticsData.numPhotosFaceDetected += 1
        analyticsCounterGroup.leave()
    }
}
internal func processFlushOperation(
    isCancelled: Bool,
    isTimedOut: Bool,
    operations: [AssetProcessOperation]
) -> ProcessedAssets {
    guard let startTime = analyticsData.startTime else {
        Log.debug("[Photo Selector] Operations enqueued without start time.")
        return ProcessedAssets()
    }
    // Flush when counting finishes.
    analyticsCounterGroup.wait()
    // Ensure to fulfill promise once finished.
    let processedImages: [ProcessedImage]
    analyticsData.totalProcessTime = Date().timeIntervalSince(startTime)
The impact of our AI Photo Selector
After extensive experimentation, AI Photo Selector has now been rolled out globally to all Tinder users. Initial success metrics are promising, and Tinder’s engineering team is committed to ongoing refinements to further enhance the feature’s performance and user experience.

Key Takeaways and Future
- For concurrent on-device image processing, minimize the load by filtering out unwanted assets early (e.g., those without a face or with a face that doesn’t match the user’s), and experiment with the maximum number of concurrent operations depending on the cost of each operation.
- Image resizing is a meaningful performance lever: ensure a renderer scale of 1.0 and pass the closest possible targetSize to PHImageManager.
- Combine provides an excellent structure for state management when multiple tasks must complete before processing can start.
- On-device AI unlocks a suite of creative features that respect user privacy, with access to far more data (thousands of photos on the user’s device) without any of it ever leaving the device.
- Rather than incrementally improving the conversion rate of the selfie step, engineering’s recommendation to detect a base photo from existing profile photos delivered a monumental improvement in funnel drop-off.
- AI Photo Selector utilized a range of cutting-edge tools, including Vision, Core ML, TensorFlow Lite, CryptoKit, Combine, and concurrent async operations, backed by a high level of precision and monitoring instrumentation.
With powerful new APIs emerging each year, along with continual advancements in device compute capabilities, Tinder Engineering remains committed to developing tools that help craft the best possible version of you — enabling meaningful matches and sparking genuine connections.
