How the Tinder iOS App reduced the size of our localizations by 95% using Emerge

Posted on:
February 26, 2025

Authored by: Maxwell Elliott and Connor Wybranowski

The Tinder iOS application is used in over 190 countries around the world. To operate in each of these countries, we need to provide a localized experience. A critical aspect of this localized experience is showing the correct copy for the current user’s locale, whatever that may be. In practice, this means we may support over 50 languages at any given time for any feature we deliver to our end users. Over time, there is a material cost to shipping all supported locales to our users, especially those in more network-constrained markets.

Problem

Localization on Apple platforms uses a common directory pattern to enable string localization. Each folder is named for its supported locale: for example, en.lproj maps to US English and fr.lproj maps to French, and each .strings/.stringsdict file residing in the folder is used for localizations in that language. Below is a visual representation of this mapping:

Localization
├── en.lproj
│ ├── Localized.strings
│ └── Localized.stringsdict
├── fr.lproj
│ ├── Localized.strings
│ └── Localized.stringsdict

This structure is repeated for each target that uses localization in our build graph, meaning there will be at least 50 Localized.strings and 50 Localized.stringsdict files per target. For applications that use static linking, each of these files must be namespaced by the target it resides in, since we cannot have filepath collisions in the final iOS App Store Package (IPA). For example, if there were two targets, TargetA and TargetB, the tree of localized files would need to look something like this for static linking to work:

TargetA
├── Localization
│ ├── BUILD
│ ├── en.lproj
│ │ ├── TargetA_Localized.strings
│ │ └── TargetA_Localized.stringsdict
│ ├── fr.lproj
│ │ ├── TargetA_Localized.strings
│ │ └── TargetA_Localized.stringsdict
TargetB
├── Localization
│ ├── BUILD
│ ├── en.lproj
│ │ ├── TargetB_Localized.strings
│ │ └── TargetB_Localized.stringsdict
│ ├── fr.lproj
│ │ ├── TargetB_Localized.strings
│ │ └── TargetB_Localized.stringsdict

What we have so far works, but Apple’s code signing process presents an issue: code signing has a minimum size per file of 4KB, meaning that each localized file, regardless of its content, takes up at least 4KB in the final IPA.
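To see why this adds up, consider a back-of-the-envelope estimate. The target count below is purely hypothetical, for illustration; the locale and per-locale file counts come from the structure described above:

```python
# Rough estimate of code-signing padding overhead from many tiny files.
locales = 50          # supported languages
files_per_locale = 2  # Localized.strings + Localized.stringsdict
targets = 100         # hypothetical number of statically linked targets
min_file_size_kb = 4  # minimum on-disk size per signed file

overhead_kb = locales * files_per_locale * targets * min_file_size_kb
print(f"{overhead_kb / 1024:.1f} MB minimum localization footprint")
```

Even if every file held a single short string, the padding alone would cost tens of megabytes at this scale.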

To achieve a meaningful size reduction, we needed to reduce the number of files shipped to our end users. There were also wins available from shrinking the content stored in the localization files while still maintaining fidelity. Examining our localization files revealed a common pattern in the strings:

/* Title text to display in Account Settings for Email setting */
"account_settings.email" = "Email";

/* Detail text to display in Account Settings when email needs to be verified */
"account_settings.emailVerifyNow" = "Verify Now";


The comment above each string is only used by translators and is never read by clients. There is also unused whitespace in these files. By removing these comments and whitespace, we can reclaim space without affecting our end users. This has the additional benefit of removing potentially proprietary information from our localization files.

After removing this unused content we end up with something like this:

"account_settings.email" = "Email";
"account_settings.emailVerifyNow" = "Verify Now";

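A minimal sketch of this stripping step might look like the following. This is a simplified stand-in for whatever tool performs the minification, not our actual implementation; it assumes comments never appear inside quoted values:

```python
import re

def minify_strings(source: str) -> str:
    """Strip /* ... */ translator comments and blank lines from a
    .strings payload. Simplified: does not guard against comment
    markers appearing inside quoted values."""
    without_comments = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    lines = (line.strip() for line in without_comments.splitlines())
    return "\n".join(line for line in lines if line)

sample = '''
/* Title text to display in Account Settings for Email setting */
"account_settings.email" = "Email";

/* Detail text to display in Account Settings when email needs to be verified */
"account_settings.emailVerifyNow" = "Verify Now";
'''
print(minify_strings(sample))
```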

Things are looking better at this stage, but notice that every key is repeated verbatim in each supported language’s file. This can be optimized using Emerge’s SmallStrings tool.

SmallStrings compresses the keys and values for each language into LZFSE files that are dynamically decompressed at runtime to fetch strings. Luckily, we already use code generation to consume our localized strings, so we were able to seamlessly inject SmallStrings into our existing build; from the developer’s perspective, there were no observable changes to any workflows.

Ultimately, we ended up with a file structure that looks similar to the example below, where the keys were compressed in a single file and each language’s values were compressed in a separate file.

Tinder.app
├── keys.json.lzfse
├── en.values.json.lzfse
├── fr.values.json.lzfse
├── he.values.json.lzfse
… (full locale list omitted for brevity)
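Conceptually, this layout stores one shared, sorted key list plus one value array per locale, aligned by index, so each key is stored exactly once no matter how many languages we ship. A rough sketch of the idea (using Python’s lzma as a stand-in compressor, since LZFSE is not in the standard library; the catalog contents are illustrative):

```python
import json
import lzma

# Hypothetical merged catalogs, keyed by locale.
catalogs = {
    "en": {"account_settings.email": "Email",
           "account_settings.emailVerifyNow": "Verify Now"},
    "fr": {"account_settings.email": "E-mail",
           "account_settings.emailVerifyNow": "Vérifier"},
}

# One shared, sorted key list...
sorted_keys = sorted(catalogs["en"])
keys_blob = lzma.compress(json.dumps(sorted_keys).encode())

# ...and one value array per locale, aligned with the key list by index.
value_blobs = {
    locale: lzma.compress(
        json.dumps([strings[k] for k in sorted_keys]).encode()
    )
    for locale, strings in catalogs.items()
}
```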

Solution

Although the steps described in the previous section are relatively straightforward, there are important considerations when addressing the problem at scale. In reality there are two distinct challenges that need to be solved in order to achieve the highest level of minification:

  1. Collapse all localization files into a single file per language
  2. Minify this final merged localization file using Emerge’s SmallStrings

Both of these tasks must be completed outside the critical path and the work should be cached, otherwise we risk introducing a meaningful build time impact by adopting this solution. Luckily, our use of Bazel ensures that these actions will be performed out of the critical path and the results will be cached until the underlying strings content changes.

Collapse all localization files into a single file per language

To accomplish this part of the effort we implemented a new rule called namespaced_strings. This rule gathers the localizations for each target in the graph and returns a Provider containing the namespaced version of each localization file.

You can view some Starlark pseudocode below:

NamespacedStringsInfo = provider(
    doc = "Provides namespaced localization information.",
    fields = {
        "namespace": "The namespace used to namespace the localized keys.",
        "namespaced_localizations": "The localized strings.",
        "unmodified_localizations": "The unmodified localized strings.",
    },
)

def _namespaced_strings_impl(ctx):
   namespaced_localizations = []
   unmodified_localizations = []
   for src in ctx.files.srcs:
       ...
       ctx.actions.run(
           mnemonic = "NamespaceStrings",
           inputs = [src],
           outputs = [output_file],
           executable = ctx.executable._strings_tool,
           arguments = [args],
           tools = [ctx.executable._namespace_stringsdict_tool],
       )
       namespaced_localizations.append(output_file)
   return [
       NamespacedStringsInfo(
           namespace = ctx.attr.namespace,
           namespaced_localizations = namespaced_localizations,
           unmodified_localizations = unmodified_localizations,
       ),
   ]

namespaced_strings = rule(
   implementation = _namespaced_strings_impl,
   attrs = {
       "srcs": attr.label_list(
           doc = "List of localization files to consume.",
           allow_empty = False,
           allow_files = [
               ".strings",
               ".stringsdict",
           ],
       ),
       "namespace": attr.string(
           doc = "Used to namespace the localized keys.",
           mandatory = True,
       ),
   },
)
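In a target’s BUILD file, usage might look something like this (the label and glob paths are illustrative, not our actual layout):

```python
namespaced_strings(
    name = "localization",
    srcs = glob([
        "*.lproj/Localized.strings",
        "*.lproj/Localized.stringsdict",
    ]),
    namespace = "TargetA",
)
```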

Then using this Provider, we created an Aspect and another rule to trigger it. This Aspect can traverse our entire build graph, collect the namespaced strings providers, and then merge all the localization files into a single file per language.

MergeStringsInfo = provider(
   doc = "Provides merged localization information.",
   fields = {
       "namespaced_localizations_json_infos": "A List of Files containing a JSON payload representing the namespaced strings.",
       "namespaced_localizations": "The namespaced localizations.",
   },
)

def _collect_namespaced_strings_info_impl(target, ctx):
   namespaced_localizations_json_infos = []
   namespaced_localizations = []
   if hasattr(ctx.rule.attr, "data"):
       for data in ctx.rule.attr.data:
           if NamespacedStringsInfo in data:
               namespaced_localizations.extend(data[NamespacedStringsInfo].namespaced_localizations)
               localized_strings_by_locale = {}
               localized_stringsdicts_by_locale = {}
               for localization in data[NamespacedStringsInfo].namespaced_localizations:
                   locale = localization.basename.split("_")[0]
                   if localization.extension == "strings":
                       localized_strings_by_locale[locale] = localization.path
                   elif localization.extension == "stringsdict":
                       localized_stringsdicts_by_locale[locale] = localization.path
               json_info = {
                   "namespace": data[NamespacedStringsInfo].namespace,
                   "localized_strings_by_locale": localized_strings_by_locale,
                   "localized_stringsdicts_by_locale": localized_stringsdicts_by_locale,
               }
               namespace_localizations_info = ctx.actions.declare_file(
                   data[NamespacedStringsInfo].namespace + "_namespace_localizations_info.json",
               )
               ctx.actions.write(
                   namespace_localizations_info,
                   json.encode(json_info),
               )
               namespaced_localizations_json_infos.append(namespace_localizations_info)
   namespaced_localizations_json_infos_depset = depset(
       direct = namespaced_localizations_json_infos,
       transitive = [dep[MergeStringsInfo].namespaced_localizations_json_infos for dep in ctx.rule.attr.deps if MergeStringsInfo in dep] if hasattr(ctx.rule.attr, "deps") else [],
   )
   namespaced_localizations_depset = depset(
       direct = namespaced_localizations,
       transitive = [dep[MergeStringsInfo].namespaced_localizations for dep in ctx.rule.attr.deps if MergeStringsInfo in dep] if hasattr(ctx.rule.attr, "deps") else [],
   )
   return [
       MergeStringsInfo(
           namespaced_localizations_json_infos = namespaced_localizations_json_infos_depset,
           namespaced_localizations = namespaced_localizations_depset,
       ),
       OutputGroupInfo(
           namespaced_localizations = namespaced_localizations_depset,
       ),
   ]

collect_namespaced_strings_info = aspect(
   implementation = _collect_namespaced_strings_info_impl,
   attr_aspects = ["deps"],
)

def _merge_strings_impl(ctx):
   # `sets` comes from bazel-skylib: load("@bazel_skylib//lib:sets.bzl", "sets")
   namespaced_localizations = sets.make()
   namespaced_localizations_json_infos = sets.make()
   for dep in ctx.attr.deps:
       if MergeStringsInfo in dep:
           namespaced_localizations = sets.union(namespaced_localizations, sets.make(dep[MergeStringsInfo].namespaced_localizations.to_list()))
           namespaced_localizations_json_infos = sets.union(namespaced_localizations_json_infos, sets.make(dep[MergeStringsInfo].namespaced_localizations_json_infos.to_list()))
   ...
           ctx.actions.run(
               mnemonic = "MergeStrings",
               inputs = sets.to_list(namespaced_localizations) + sets.to_list(namespaced_localizations_json_infos),
               outputs = [localized_strings_output_file, localized_stringsdict_output_file],
               executable = ctx.executable._merge_strings_tool,
               arguments = [args],
               tools = [ctx.executable._namespace_stringsdict_tool],
           )
           output_files.append(localized_strings_output_file)
           output_files.append(localized_stringsdict_output_file)
   return [
       SmallStringsInfo(merged_localizations = depset(output_files)),
       OutputGroupInfo(merged_localizations = depset(output_files)),
   ]

merge_strings = rule(
   implementation = _merge_strings_impl,
   doc = "Transitively merges all strings assets into a single strings file and stringsdict file. This is done in an effort to reduce bundle size of the app.",
   attrs = {
       "deps": attr.label_list(
           doc = "The dependencies to merge strings from.",
           aspects = [
               collect_namespaced_strings_info,
           ],
       ),
   },
)

Thanks to our new merge_strings rule, we now have a way to transitively collect all our namespaced strings for the entire application.
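The merge step itself is conceptually simple: because every key is already namespaced by its target, the per-target catalogs can be folded into one dictionary per locale with no collisions. A sketch of the idea (catalog contents are hypothetical):

```python
# Hypothetical namespaced catalogs produced by two targets, per locale.
target_catalogs = {
    "TargetA": {"en": {"TargetA.account_settings.email": "Email"}},
    "TargetB": {"en": {"TargetB.profile.title": "Profile"}},
}

def merge_per_locale(catalogs):
    """Collapse per-target catalogs into one dictionary per locale;
    namespacing guarantees no key collisions during the merge."""
    merged = {}
    for strings_by_locale in catalogs.values():
        for locale, strings in strings_by_locale.items():
            merged.setdefault(locale, {}).update(strings)
    return merged

merged = merge_per_locale(target_catalogs)
```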

To make this approach work, we also needed to move ownership of the localization files from individual targets to the running executable. This means the application and our test targets own the final localization files, and individual targets refer to these bundles to fetch their localizations. The migration was simple because the fetch logic was already code generated, so developers never noticed the change.

Minify this final merged localization file using Emerge’s SmallStrings

Now that we have a single localization file per language, we have the necessary setup to support Emerge’s SmallStrings tool. One of the first things we needed to do was migrate the Ruby-based tooling to Swift, since we did not want to maintain a Ruby toolchain in the build. Once we had ported the tool’s frontend logic to Swift, we created cc_binary and cc_library targets for the C code that performs the LZFSE compression. With these targets in place, we could then create new rules that ingest our NamespacedStringsInfo provider and perform the necessary minification:

SmallStringsInfo = provider(
   doc = "Provides small strings information.",
   fields = {
       "merged_localizations": "A List of Files containing all merged localizations files.",
   },
)

def _small_strings_impl(ctx):
   strings_files = []
   lzfse_output_files = {}
   output_files = []
   for dep in ctx.attr.deps:
       small_strings_info = dep[SmallStringsInfo]
       localizations = small_strings_info.merged_localizations.to_list()
       for localization in localizations:
           if localization.extension == "stringsdict":
               output_files.append(localization)
           elif localization.extension == "strings":
               locale = localization.dirname.split("/")[-1].split(".")[0]
               strings_files.append(localization)
               lzfse_output_files[locale] = ctx.actions.declare_file(
                   ctx.label.name + "/" + "{locale}.values.json.lzfse".format(locale = locale),
               )
               output_files.append(_create_placeholder_file(ctx, localization))
   if strings_files:
       keys_json_lzfse_file = ctx.actions.declare_file(
           ctx.label.name + "/" + "keys.json.lzfse",
       )
       output_files.append(keys_json_lzfse_file)
       sorted_keys_json_file = ctx.actions.declare_file(
           ctx.label.name + "/" + "sorted_keys.json",
       )
       args = ctx.actions.args()
       args.add_all([
           "compress-strings-keys",
           "--compression-tool-path",
           ctx.executable._compression_tool,
       ])
       for strings_file in strings_files:
           args.add("--merged-localized-strings-filepaths")
           args.add(strings_file.path)

       args.add("--keys-json-lzfse-output-path", keys_json_lzfse_file.path)
       args.add("--sorted-keys-json-output-path", sorted_keys_json_file.path)
       ctx.actions.run(
           outputs = [keys_json_lzfse_file, sorted_keys_json_file],
           inputs = strings_files,
           tools = [ctx.executable._compression_tool],
           executable = ctx.executable._strings_tool,
           arguments = [args],
           mnemonic = "SmallStringsKeys",
       )
       for locale, lzfse_output_file in lzfse_output_files.items():
           args = ctx.actions.args()
           args.add_all([
               "compress-strings-values",
               "--compression-tool-path",
               ctx.executable._compression_tool,
               "--sorted-keys-json-path",
               sorted_keys_json_file.path,
               "--locale",
               locale,
               "--values-json-lzfse-output-path",
               lzfse_output_file.path,
           ])
           for strings_file in strings_files:
               args.add("--merged-localized-strings-filepaths")
               args.add(strings_file.path)
           ctx.actions.run(
               outputs = [lzfse_output_file],
               inputs = strings_files + [sorted_keys_json_file],
               tools = [ctx.executable._compression_tool],
               executable = ctx.executable._strings_tool,
               arguments = [args],
               mnemonic = "SmallStringsValues",
           )
           output_files.append(lzfse_output_file)
   return DefaultInfo(
       files = depset(output_files),
   )

def _create_placeholder_file(ctx, src):
   output = ctx.actions.declare_file(src.dirname + "/" + src.basename)
   ctx.actions.write(
       output,
       "\"placeholder\" = \"_\";\n",
   )
   return output

small_strings = rule(
   implementation = _small_strings_impl,
   attrs = {
       "deps": attr.label_list(
           providers = [
               SmallStringsInfo,
           ],
       ),
   },
)

After this change, we again needed to update the code generation to use the new SSTStringForKey function to fetch strings from the bundle. As with the previous changes, this landed transparently for our developers. Even though these actions require compute time during the build, Bazel’s action scheduler and caching minimize or eliminate the impact in the vast majority of builds.
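At runtime, a lookup in this layout amounts to decompressing the blobs (once, then cached), binary-searching the shared sorted key list, and indexing into the active locale’s value array. A rough Python sketch of the idea behind such a lookup; the function name, fallback behavior, and data shapes are illustrative, not the actual Swift/C implementation:

```python
import bisect

def string_for_key(key, sorted_keys, values):
    """Binary-search the shared key list; the value array for the
    active locale is aligned with it by index."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return values[i]
    return key  # fall back to the key itself when untranslated

# Illustrative decompressed data for the French locale.
keys = ["account_settings.email", "account_settings.emailVerifyNow"]
fr_values = ["E-mail", "Vérifier"]
print(string_for_key("account_settings.email", keys, fr_values))
```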

Impact

We broke down the wins for each part of this solution to better understand the impact of each on the size of our build:

Collapse all localization files into a single file per language

  • Download size change: -5.5MB
  • Install size change: -30.1MB

Minify this final merged localization file using Emerge’s SmallStrings

  • Download size change: -5.2MB
  • Install size change: -21.2MB

Overall, the effort led to a 10.7MB reduction in our download size and a 51.3MB reduction in our app install size, without any impact to our developers or end users.

Takeaways

Here are some takeaways from this effort:

  1. Code generation can be an incredibly powerful tool for abstracting over large changes such as this.
  2. Packaging many files with a size under 4KB can meaningfully increase the size of your application, making it critical to reduce the number of small files packaged into the final IPA.
  3. Those interested in this approach should develop tests on the packaged localization files before and after this change to ensure that no strings are orphaned by mistake. These tests are also a good way to prove that the number of localization files has been reduced by the change.
  4. Bazel’s hermetic sandbox can transparently deliver improvements to our developers and users. Instead of having to commit the result of these minification actions we can use the hermetic sandbox to hide these implementation details. It would have been infeasible to overhaul our translation pipeline to accommodate this workflow for a single platform.
  5. This effort has surfaced even more wins such as dynamically removing unused strings. Thanks to this approach we see a way to accomplish this and unlock even more improvements.

For those interested in this approach please refer to the SmallStrings repository.
