Overview
Production Readiness: 0.9
Novelty Score: 0.6
Cost Impact Score: 0.7
Citation Count: 0
Why It Matters For Business
Automate landing-page generation from content to expand topic coverage, improve collection relevance, and increase organic search indexing with less manual curation.
Summary TLDR
PinLanding is a production system that builds keyword landing pages (KLPs) in three steps: it extracts attributes from product images and text with GPT-4V, consolidates and filters those attributes, and trains a CLIP-style dual encoder to match products to attributes at web scale. It then generates natural-language collection titles with GPT-4 and assembles feeds by attribute-overlap matching on Apache Spark. In production it created 4.2M shopping pages, increased topic coverage 4×, and improved human-evaluated collection precision by 14.29% over search-log baselines; on the Fashion200K benchmark it scored 99.7% Recall@10.
Problem Statement
Manual curation and search-log approaches either don't scale to millions of topical landing pages or miss content and produce imprecise collections. Platforms need a scalable way to create high-precision, searchable collections directly from content rather than relying on user queries.
Main Contribution
Content-first pipeline that derives landing page topics from product content rather than user search logs.
Two-phase, multi-modal system: (1) VLM (GPT-4V) for free-form attribute extraction and human+LLM curation; (2) CLIP-style dual encoder for scalable product-to-attribute matching.
Automated query (collection title) synthesis with GPT-4 and distributed attribute-based feed generation on Apache Spark, deployed to 4.2M landing pages.
Key Findings
Very high attribute retrieval on a public benchmark
Better collection precision than search-log approach
Large production scale and SEO impact
Results
Recall@10: 99.7% (Fashion200K)
Average collection precision@10: 14.29% improvement over search-log baselines (human-evaluated)
Production landing pages: 4.2M
Search engine index rate
Processing speed improvement
Who Should Care
What To Try In 7 Days
Run GPT-4V (or another VLM) on a small catalog slice to extract candidate attributes.
Curate a compact attribute vocabulary (frequency + deduplication) and sample-check for bias.
Train a lightweight CLIP dual-encoder on the labeled slice and build a few pilot KLPs to measure precision@10 and indexability.
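The curation step (frequency + deduplication) can be prototyped before any model training. The sketch below is illustrative only: the function names, the `min_count` threshold, and the whitespace/case normalization are assumptions, not the procedure the PinLanding authors describe.

```python
from collections import Counter


def normalize(attribute: str) -> str:
    """Lowercase, trim, and collapse whitespace so near-identical strings merge."""
    return " ".join(attribute.lower().split())


def curate_vocabulary(extracted, min_count=2):
    """Keep attributes that appear on at least `min_count` products.

    `extracted` maps a product id to the free-form attribute strings
    returned by the VLM for that product (hypothetical input shape).
    """
    counts = Counter()
    for attributes in extracted.values():
        # Count each attribute once per product so repeats do not inflate frequency.
        counts.update({normalize(a) for a in attributes})
    kept = [(attr, n) for attr, n in counts.items() if n >= min_count]
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


extracted = {
    "p1": ["Floral Print", "midi dress", "summer"],
    "p2": ["floral  print", "Maxi Dress"],
    "p3": ["Summer", "midi dress", "linen"],
}
vocabulary = curate_vocabulary(extracted, min_count=2)
print(vocabulary)
```

Attributes seen on only one product ("maxi dress", "linen") are dropped here; in practice the dropped and kept lists are both worth sampling by hand for bias before training.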
Optimization Features
Infra Optimization
- using 8 A100 GPUs for 12-hour training runs
- memory caching of frequent mappings
Model Optimization
- CLIP dual-encoder fine-tuning
- frequency-based attribute reweighting to correct long-tail smoothing
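The summary does not give the reweighting formula, so the following is only one plausible form: inverse-frequency weights with a tunable exponent, rescaled to average 1.0. The function name and the `alpha` parameter are assumptions introduced for illustration.

```python
def attribute_weights(counts, alpha=0.5):
    """Inverse-frequency weights, count ** (-alpha), rescaled to average 1.0.

    `counts` maps attribute -> number of products carrying it (must be > 0).
    alpha = 0 gives uniform weights; larger alpha boosts rare attributes more.
    """
    raw = {attr: n ** -alpha for attr, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {attr: w / mean for attr, w in raw.items()}


counts = {"dress": 10000, "floral print": 400, "broderie anglaise": 4}
weights = attribute_weights(counts, alpha=0.5)
for attr, w in weights.items():
    print(f"{attr}: {w:.3f}")
```

Weights of this kind would typically multiply the per-attribute loss term during fine-tuning, which is why `alpha` is the hyperparameter most likely to need retuning on a new domain.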
System Optimization
- distributed matching on Apache Spark
- data partitioning and join optimization
- minimum-product thresholds per collection to guarantee quality
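A single-machine sketch of attribute-overlap matching with a minimum-product threshold is shown below. It assumes a product qualifies only when it carries every attribute of the collection; the summary does not state the exact overlap criterion, and the production system runs this as a distributed job on Apache Spark rather than in plain Python. All names are hypothetical.

```python
def build_feed(collection_attrs, product_attrs, min_products=2):
    """Return ids of products whose attributes cover the whole collection.

    Returns an empty list when fewer than `min_products` match, so a thin
    collection can be dropped instead of published.
    """
    required = set(collection_attrs)
    matched = sorted(
        pid for pid, attrs in product_attrs.items() if required <= set(attrs)
    )
    return matched if len(matched) >= min_products else []


products = {
    "p1": {"floral", "midi dress", "summer"},
    "p2": {"floral", "maxi dress"},
    "p3": {"floral", "midi dress"},
}
collections = {
    "floral midi dresses": ["floral", "midi dress"],
    "summer maxi dresses": ["summer", "maxi dress"],
}
feeds = {title: build_feed(attrs, products) for title, attrs in collections.items()}
# Only collections that cleared the threshold are kept.
published = {title: feed for title, feed in feeds.items() if feed}
print(published)
```

At catalog scale the same containment test is usually expressed as a join between a product-attribute table and a collection-attribute table, which is where partitioning and join ordering start to matter.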
Training Optimization
- FusedAdam optimizer for memory and speed
- pretraining initialization from CLIP encoders
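For orientation, the objective that CLIP-style dual encoders are usually trained with is a symmetric contrastive loss over a batch of matched pairs. Whether PinLanding uses exactly this loss is an assumption; the sketch below uses plain Python lists in place of tensors, and the temperature value is illustrative.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_contrastive_loss(product_emb, attribute_emb, temperature=0.07):
    """Symmetric cross-entropy over matched (product, attribute) pairs.

    Both inputs are lists of L2-normalized vectors; row i of each list
    forms a positive pair and every other row in the batch is a negative.
    """
    n = len(product_emb)
    logits = [
        [sum(p * a for p, a in zip(pv, av)) / temperature for av in attribute_emb]
        for pv in product_emb
    ]
    # Product -> attribute direction: the correct class for row i is column i.
    loss_p2a = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Attribute -> product direction uses the transposed logits.
    transposed = [list(col) for col in zip(*logits)]
    loss_a2p = -sum(log_softmax(transposed[i])[i] for i in range(n)) / n
    return (loss_p2a + loss_a2p) / 2


aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is near zero when each product embedding is closest to its own attribute embedding and large when the pairs are swapped, which is the behavior fine-tuning from pretrained CLIP encoders is meant to preserve.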
Inference Optimization
- attribute score caching for O(1) lookups
- batching and grouped product processing
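Score caching can be as simple as a dictionary keyed by (product, attribute), which gives average constant-time lookups after the first computation. The class name, the `misses` counter, and the `toy_score` stand-in for the encoder are all hypothetical.

```python
class AttributeScoreCache:
    """Memoize product-attribute scores in a dict for average O(1) lookups."""

    def __init__(self, score_fn):
        self._score_fn = score_fn
        self._scores = {}
        self.misses = 0  # how many times the underlying scorer was called

    def get(self, product_id, attribute):
        key = (product_id, attribute)
        if key not in self._scores:
            self.misses += 1
            self._scores[key] = self._score_fn(product_id, attribute)
        return self._scores[key]


def toy_score(product_id, attribute):
    # Stand-in for an encoder similarity score; not a real model call.
    return 0.9 if attribute == "floral" else 0.1


cache = AttributeScoreCache(toy_score)
first = cache.get("p1", "floral")
second = cache.get("p1", "floral")  # served from the cache
other = cache.get("p1", "linen")
print(first, second, other, cache.misses)
```

An unbounded dictionary suits a batch job with a fixed attribute vocabulary; a long-running service would more likely need an eviction policy.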
Reproducibility
Open Source Status
- no
Risks & Boundaries
Limitations
- Relies on a VLM (GPT-4V) and commercial LLMs; reproducibility is limited without access to these models.
- Attribute-based method struggles to capture emergent cultural or trend-based concepts that are not decomposable into fixed attributes.
- Human review was applied to a subset (~4,000 attributes), leaving open the risk of unchecked biases in long-tail attributes.
When Not To Use
- If you need to surface emergent social trends or highly abstract style labels that require discourse signals.
- When proprietary VLM/LLM access is unavailable or cost-prohibitive.
- For small catalogs where manual curation is already low-cost and highly accurate.
Failure Modes
- Overfitting to merchant-provided text or image styles, causing poor generalization to different merchandising formats.
- Long-tail attribute misweighting if frequency reweighting hyperparameters are not tuned for new domains.
- LLM/VLM hallucinated attributes that pass automated filters but introduce biased or unsafe labels.
Core Entities
Models
- GPT-4V
- GPT-4
- CLIP (dual-encoder)
- CLIP-H/14 (1.3B)
- ViT (vision encoder)
- BERT
Metrics
- Recall@10
- Precision@10
- search engine index rate
- topic coverage increase
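For the two ranking metrics, a minimal sketch of one common definition follows. Note the assumption: recall@k is computed here as the fraction of relevant items found in the top k, whereas some retrieval benchmarks report it as the share of queries with at least one correct item in the top k. The summary does not say which variant the paper uses.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k positions occupied by relevant items."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k=10):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


ranked = [f"p{i}" for i in range(1, 11)]  # p1 .. p10, best first
relevant = {"p1", "p2", "p4", "p11"}      # p11 was never retrieved
print(precision_at_k(ranked, relevant))   # 3 relevant items in 10 slots
print(recall_at_k(ranked, relevant))      # 3 of 4 relevant items found
```

For a pilot KLP, `ranked` would be the feed shown on the page and `relevant` the set of products a human rater marked as matching the collection title.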
Datasets
- Fashion200K
- internal 200k fashion product dataset
- production catalog (millions of product pins)
Benchmarks
- Fashion200K

