Matching Models

Matching models are essential components in recommendation systems, used to quickly retrieve candidate sets relevant to users from massive item catalogs. Torch-RecHub provides various advanced matching models covering different retrieval strategies and modeling approaches.

1. DSSM

Description

DSSM (Deep Structured Semantic Models) is a classic two-tower retrieval model that maps users and items to the same vector space and computes vector similarity for retrieval.

Paper Reference

Huang, Po-Sen, et al. "Learning deep structured semantic models for web search using clickthrough data." Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 2013.

Core Principles

  • Two-tower Structure: Contains separate neural networks for user tower and item tower
  • Feature Embedding: Maps user and item features to low-dimensional vector space
  • Similarity Computation: Uses cosine similarity or dot product to compute user-item similarity
  • Negative Sampling: Trains model through negative sampling to optimize ranking performance

Usage

```python
from torch_rechub.models.matching import DSSM
from torch_rechub.basic.features import SparseFeature, DenseFeature

user_features = [
    SparseFeature(name="user_id", vocab_size=10000, embed_dim=32),
    DenseFeature(name="age", embed_dim=1)
]

item_features = [
    SparseFeature(name="item_id", vocab_size=100000, embed_dim=32),
    SparseFeature(name="category", vocab_size=1000, embed_dim=16)
]

model = DSSM(
    user_features=user_features,
    item_features=item_features,
    temperature=0.02,
    user_params={"dims": [256, 128, 64], "dropout": 0.2, "activation": "prelu"},
    item_params={"dims": [256, 128, 64], "dropout": 0.2, "activation": "prelu"}
)
```
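Under the hood, the match score is simply the cosine similarity of the two tower outputs scaled by the temperature. Here is a minimal illustrative sketch in plain PyTorch (the tensor names and the 64-dim size are assumptions for the example, not Torch-RecHub internals):

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the outputs of the user tower and the item tower (batch of 4, dim 64)
user_emb = F.normalize(torch.randn(4, 64), dim=-1)  # L2-normalize so the dot product is cosine
item_emb = F.normalize(torch.randn(4, 64), dim=-1)

# Cosine similarity of each user-item pair, sharpened by the temperature
scores = (user_emb * item_emb).sum(dim=-1) / 0.02   # temperature=0.02 as configured above
print(scores.shape)  # torch.Size([4])
```

A smaller temperature makes the softmax over candidates more peaked during training, which is why it is exposed as a tunable hyperparameter.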

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| user_features | list | User feature list | None |
| item_features | list | Item feature list | None |
| temperature | float | Temperature parameter for the similarity distribution | 0.02 |
| user_params | dict | User tower network parameters | None |
| item_params | dict | Item tower network parameters | None |

Use Cases

  • Text matching scenarios
  • Large-scale recommendation systems
  • Cold start problems

2. FaceBookDSSM

Description

FaceBookDSSM is a two-tower DSSM variant proposed by Facebook. It keeps the two-tower retrieval structure but trains pairwise: each training example scores a positive and a sampled negative item for the same user, further improving retrieval performance.

Core Principles

  • Two-tower Structure: Inherits DSSM's two-tower structure
  • Deep Network: Uses a deeper network for greater expressiveness
  • Pairwise Loss: Trains on (user, positive item, negative item) triples instead of point-wise labels
  • Feature Engineering: Emphasizes feature engineering and supports multiple feature types

Usage

```python
from torch_rechub.models.matching import FaceBookDSSM

model = FaceBookDSSM(
    user_features=user_features,
    item_features=item_features,
    user_params={"dims": [512, 256, 128], "dropout": 0.3, "activation": "relu"},
    item_params={"dims": [512, 256, 128], "dropout": 0.3, "activation": "relu"}
)
```
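A common concrete form of such a pairwise objective is a triplet hinge loss: the positive item must outscore the sampled negative by a margin. The sketch below is illustrative only (the margin value and tensors are assumptions, not the library's exact implementation):

```python
import torch
import torch.nn.functional as F

user_emb = F.normalize(torch.randn(4, 64), dim=-1)  # user tower output
pos_emb = F.normalize(torch.randn(4, 64), dim=-1)   # item tower output for positive items
neg_emb = F.normalize(torch.randn(4, 64), dim=-1)   # item tower output for sampled negatives

pos_score = (user_emb * pos_emb).sum(dim=-1)
neg_score = (user_emb * neg_emb).sum(dim=-1)

# Hinge (triplet) loss: penalize whenever the positive fails to beat the negative by the margin
margin = 0.5
loss = torch.clamp(margin - (pos_score - neg_score), min=0).mean()
```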

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| user_features | list | User feature list | None |
| item_features | list | Item feature list | None |
| user_params | dict | User tower network parameters | None |
| item_params | dict | Item tower network parameters | None |

Use Cases

  • Large-scale recommendation systems
  • Ad retrieval
  • Content recommendation

3. YoutubeDNN

Description

YoutubeDNN is a deep retrieval model proposed by YouTube, predicting the next video to watch based on user historical behavior sequences.

Paper Reference

Covington, Paul, Jay Adams, and Emre Sargin. "Deep neural networks for youtube recommendations." Proceedings of the 10th ACM conference on recommender systems. 2016.

Core Principles

  • Sequence Modeling: Uses deep neural networks to model user historical behavior sequences
  • Extreme Multiclass Classification: Frames retrieval as predicting the next watched item out of the full corpus
  • Negative Sampling: Approximates the full softmax with sampled negatives for training efficiency
  • Two-tower Structure: Contains user and item towers, supports offline item embedding precomputation

Usage

```python
from torch_rechub.models.matching import YoutubeDNN
from torch_rechub.basic.features import SparseFeature, SequenceFeature

user_features = [
    SequenceFeature(name="user_history", vocab_size=100000, embed_dim=32, pooling="mean"),
    SparseFeature(name="user_id", vocab_size=10000, embed_dim=16)
]

item_features = [
    SparseFeature(name="item_id", vocab_size=100000, embed_dim=32),
    SparseFeature(name="category", vocab_size=1000, embed_dim=16)
]

model = YoutubeDNN(
    user_features=user_features,
    item_features=item_features,
    user_params={"dims": [256, 128, 64], "dropout": 0.2, "activation": "relu"},
    temperature=0.02
)
```
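The step from a watch-history sequence to a single user vector can be pictured as follows. This is a simplified sketch under assumed shapes (a shared item embedding table, the "mean" pooling configured above, and a toy MLP), not the library's internals:

```python
import torch
import torch.nn as nn

item_embedding = nn.Embedding(100000, 32)        # embedding table over the item vocabulary
history = torch.randint(0, 100000, (4, 50))      # batch of 4 users, 50 watched items each
hist_emb = item_embedding(history).mean(dim=1)   # "mean" pooling -> one (4, 32) history vector

# A small MLP turns the pooled features into the final user embedding
user_tower = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 64))
user_vec = user_tower(hist_emb)                  # (4, 64), matched against item embeddings
```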

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| user_features | list | User feature list | None |
| item_features | list | Item feature list | None |
| user_params | dict | User tower network parameters | None |
| temperature | float | Temperature parameter | 0.02 |

Use Cases

  • Video recommendation
  • Music recommendation
  • Content recommendation
  • History-based recommendation

4. YoutubeSBC

Description

YoutubeSBC (Sampling-Bias-Corrected two-tower model) is an improved deep retrieval model proposed by YouTube that corrects the sampling bias introduced by in-batch negative sampling.

Paper Reference

Yi, Xinyang, et al. "Sampling-bias-corrected neural modeling for large corpus item recommendations." Proceedings of the 13th ACM Conference on Recommender Systems. 2019.

Core Principles

  • Sampling Bias Correction: Corrects the in-batch softmax logits by the items' sampling probabilities (logQ correction) for better generalization
  • Two-tower Structure: Inherits YoutubeDNN's two-tower structure
  • In-batch Negatives: Reuses the other items in each mini-batch as negatives, avoiding a separate negative sampler

Usage

```python
from torch_rechub.models.matching import YoutubeSBC

model = YoutubeSBC(
    user_features=user_features,
    item_features=item_features,
    user_params={"dims": [256, 128, 64], "dropout": 0.2, "activation": "relu"},
    temperature=0.02
)
```
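The heart of the model is the logQ correction from Yi et al. (2019): with in-batch negatives, popular items show up as negatives far too often, so each logit is corrected by subtracting the log of the item's sampling probability. An illustrative sketch (the probabilities below are made up; in practice they are estimated from item frequencies in the training stream):

```python
import torch
import torch.nn.functional as F

user_emb = F.normalize(torch.randn(4, 64), dim=-1)
item_emb = F.normalize(torch.randn(4, 64), dim=-1)

# In-batch softmax: every other item in the batch serves as a negative
logits = user_emb @ item_emb.t() / 0.02             # (4, 4), temperature 0.02

# logQ correction: subtract each batch item's log sampling probability (illustrative values)
item_prob = torch.tensor([0.10, 0.01, 0.005, 0.02])
logits = logits - torch.log(item_prob)              # broadcast over the item axis

labels = torch.arange(4)                            # the diagonal holds the positives
loss = F.cross_entropy(logits, labels)
```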

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| user_features | list | User feature list | None |
| item_features | list | Item feature list | None |
| user_params | dict | User tower network parameters | None |
| temperature | float | Temperature parameter | 0.02 |

Use Cases

  • Large-scale recommendation systems
  • Scenarios with severe sampling bias
  • Content recommendation

5. MIND

Description

MIND (Multi-Interest Network with Dynamic Routing) is a multi-interest retrieval model that learns multiple interest representations for each user.

Paper Reference

Li, Chao, et al. "Multi-interest network with dynamic routing for recommendation at Tmall." Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019.

Core Principles

  • Multi-interest Modeling: Learns multiple interest vectors for each user
  • Dynamic Routing: Uses capsule network's dynamic routing mechanism to adaptively aggregate user interests
  • Interest Evolution: Captures dynamic changes in user interests
  • Two-tower Structure: Supports offline item embedding precomputation

Usage

```python
from torch_rechub.models.matching import MIND
from torch_rechub.basic.features import SparseFeature, SequenceFeature

user_features = [
    SequenceFeature(name="user_history", vocab_size=100000, embed_dim=32, pooling=None),
    SparseFeature(name="user_id", vocab_size=10000, embed_dim=16)
]

model = MIND(
    user_features=user_features,
    item_features=item_features,
    user_params={"dims": [256, 128], "dropout": 0.2, "activation": "relu"},
    n_items=100000,
    n_interest=4,
    temperature=0.02
)
```
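At serving time, a multi-interest model keeps several vectors per user and typically scores a candidate item against all of them, keeping the best match. A minimal illustrative sketch (all shapes are assumptions for the example):

```python
import torch

K, d = 4, 64
interests = torch.randn(2, K, d)   # 2 users, K interest vectors each (n_interest=4 above)
items = torch.randn(1000, d)       # candidate item embeddings

# Score each item against every interest, then keep the best-matching interest per item
scores = torch.einsum("ukd,nd->ukn", interests, items).max(dim=1).values  # (2, 1000)
topk = scores.topk(10, dim=-1).indices                                    # top-10 items per user
```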

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| user_features | list | User feature list | None |
| item_features | list | Item feature list | None |
| user_params | dict | User tower network parameters | None |
| n_items | int | Total number of items | None |
| n_interest | int | Number of interests to learn | 4 |
| temperature | float | Temperature parameter | 0.02 |

Use Cases

  • Scenarios with diverse user interests
  • E-commerce recommendation
  • Content recommendation

6. GRU4Rec

Description

GRU4Rec is a GRU-based sequential recommendation model that captures dynamic dependencies in user behavior sequences.

Paper Reference

Hidasi, Balázs, et al. "Session-based recommendations with recurrent neural networks." arXiv preprint arXiv:1511.06939 (2015).

Core Principles

  • GRU Sequence Modeling: Uses GRU to capture dynamic changes in user behavior sequences
  • Session Recommendation: Focuses on within-session recommendation without requiring user history
  • Negative Sampling: Uses negative sampling for training efficiency
  • BPR Loss: Uses BPR loss function for ranking optimization

Usage

```python
from torch_rechub.models.matching import GRU4Rec
from torch_rechub.basic.features import SequenceFeature

user_features = [
    SequenceFeature(name="user_history", vocab_size=100000, embed_dim=32, pooling=None)
]

model = GRU4Rec(
    user_features=user_features,
    item_features=item_features,
    embedding_dim=32,
    hidden_size=64,
    num_layers=2,
    dropout=0.2,
    temperature=0.02
)
```
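The BPR loss mentioned above maximizes the probability that the clicked next item outranks a sampled negative. A minimal sketch:

```python
import torch
import torch.nn.functional as F

pos_score = torch.randn(32)  # model scores for the observed next items
neg_score = torch.randn(32)  # model scores for sampled negatives

# BPR: -log sigmoid(positive score - negative score), averaged over the batch
loss = -F.logsigmoid(pos_score - neg_score).mean()
```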

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| user_features | list | User feature list | None |
| item_features | list | Item feature list | None |
| embedding_dim | int | Embedding dimension | 32 |
| hidden_size | int | GRU hidden layer size | 64 |
| num_layers | int | Number of GRU layers | 2 |
| dropout | float | Dropout rate | 0.2 |
| temperature | float | Temperature parameter | 0.02 |

Use Cases

  • Session recommendation
  • Short sequence recommendation
  • E-commerce scenarios

7. NARM

Description

NARM (Neural Attentive Recommendation Machine) is an attention-based session recommendation model that captures both local and global interests within a session.

Paper Reference

Li, Jing, et al. "Neural attentive session-based recommendation." Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 2017.

Core Principles

  • GRU Sequence Modeling: Uses GRU to capture sequence dependencies within sessions
  • Attention Mechanism: Introduces attention to capture local interests within sessions
  • Global Representation: Learns global session representation, combining local and global interests
  • Session Recommendation: Focuses on within-session recommendation

Usage

```python
from torch_rechub.models.matching import NARM

model = NARM(
    user_features=user_features,
    item_features=item_features,
    embedding_dim=32,
    hidden_size=64,
    num_layers=1,
    dropout=0.2,
    temperature=0.02
)
```
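The local/global combination can be sketched as additive attention over the GRU hidden states, queried by the last hidden state (which also serves as the global interest); the attention-weighted sum gives the local interest, and the two are concatenated into the session representation. Shapes and layer names below are illustrative assumptions, not the library's code:

```python
import torch
import torch.nn as nn

B, T, H = 4, 20, 64
hidden = torch.randn(B, T, H)   # GRU output at every step of the session
h_last = hidden[:, -1]          # global interest: the final hidden state

# Additive attention: alpha_j = v^T sigmoid(A1 h_last + A2 h_j)
A1 = nn.Linear(H, H, bias=False)
A2 = nn.Linear(H, H, bias=False)
v = nn.Linear(H, 1, bias=False)
alpha = v(torch.sigmoid(A1(h_last).unsqueeze(1) + A2(hidden)))  # (B, T, 1)

c_local = (alpha * hidden).sum(dim=1)                # local interest: weighted sum of states
session_repr = torch.cat([h_last, c_local], dim=-1)  # (B, 2H) combined representation
```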

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| user_features | list | User feature list | None |
| item_features | list | Item feature list | None |
| embedding_dim | int | Embedding dimension | 32 |
| hidden_size | int | GRU hidden layer size | 64 |
| num_layers | int | Number of GRU layers | 1 |
| dropout | float | Dropout rate | 0.2 |
| temperature | float | Temperature parameter | 0.02 |

Use Cases

  • Session recommendation
  • Short sequence recommendation
  • Local and global interest modeling

8. SASRec

Description

SASRec (Self-Attentive Sequential Recommendation) is a self-attention based sequential recommendation model that captures long-range dependencies in user behavior sequences.

Paper Reference

Kang, Wang-Cheng, and Julian McAuley. "Self-attentive sequential recommendation." 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 2018.

Core Principles

  • Self-attention Mechanism: Uses multi-head self-attention to capture long-range dependencies
  • Positional Encoding: Adds positional information to preserve sequence order
  • Layer Normalization: Accelerates convergence and improves training stability
  • Residual Connection: Enhances model expressiveness

Usage

```python
from torch_rechub.models.matching import SASRec

model = SASRec(
    user_features=user_features,
    item_features=item_features,
    embedding_dim=32,
    num_heads=4,
    num_layers=2,
    hidden_size=128,
    dropout=0.2,
    temperature=0.02
)
```
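The self-attention is causal: a triangular mask stops each position from attending to later items, which is what makes the model usable for next-item prediction. A minimal sketch with PyTorch's built-in multi-head attention (sizes are illustrative):

```python
import torch
import torch.nn as nn

B, T, d = 4, 50, 32
x = torch.randn(B, T, d)  # item embeddings plus positional encodings

attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)  # True = blocked

out, _ = attn(x, x, x, attn_mask=causal_mask)  # each position sees only its prefix
```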

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| embedding_dim | int | Embedding dimension | 32 |
| num_heads | int | Number of attention heads | 4 |
| num_layers | int | Number of Transformer layers | 2 |
| hidden_size | int | Hidden layer size | 128 |
| dropout | float | Dropout rate | 0.2 |
| temperature | float | Temperature parameter | 0.02 |

Use Cases

  • Long sequence recommendation
  • Scenarios where user behavior sequences are important
  • Sequential recommendation tasks

9. SINE

Description

SINE (Sparse Interest Network for Sequential Recommendation) is a sparse interest network that effectively models users' sparse interests.

Paper Reference

Tan, Qiaoyu, et al. "Sparse-interest network for sequential recommendation." Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 2021.

Core Principles

  • Sparse Interest Modeling: Activates only a small subset of interest concepts from a large concept pool for each user
  • Adaptive Aggregation: Adaptively aggregates the behavior sequence into the activated interest vectors
  • Interest Evolution: Captures dynamic changes in user interests
  • Efficient Computation: Optimized for computational efficiency, suitable for large-scale data

Usage

```python
from torch_rechub.models.matching import SINE

model = SINE(
    user_features=user_features,
    item_features=item_features,
    embedding_dim=32,
    num_heads=4,
    num_layers=2,
    hidden_size=128,
    dropout=0.2,
    n_interest=4,
    temperature=0.02
)
```

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| user_features | list | User feature list | None |
| item_features | list | Item feature list | None |
| embedding_dim | int | Embedding dimension | 32 |
| num_heads | int | Number of attention heads | 4 |
| num_layers | int | Number of Transformer layers | 2 |
| hidden_size | int | Hidden layer size | 128 |
| dropout | float | Dropout rate | 0.2 |
| n_interest | int | Number of interests | 4 |
| temperature | float | Temperature parameter | 0.02 |

Use Cases

  • Scenarios with sparse user interests
  • Large-scale recommendation systems
  • E-commerce recommendation

10. STAMP

Description

STAMP (Short-Term Attention/Memory Priority Model) is an attention-based session recommendation model that focuses on modeling recent user behavior.

Paper Reference

Liu, Qiao, et al. "STAMP: short-term attention/memory priority model for session-based recommendation." Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 2018.

Core Principles

  • Short-term Attention: Focuses on recent user behavior, giving higher weight to recent actions
  • Memory Module: Maintains a memory vector to capture global session information
  • Session Recommendation: Focuses on within-session recommendation
  • Simple and Efficient: Simple model structure with high computational efficiency

Usage

```python
from torch_rechub.models.matching import STAMP

model = STAMP(
    user_features=user_features,
    item_features=item_features,
    embedding_dim=32,
    hidden_size=128,
    dropout=0.2,
    temperature=0.02
)
```

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| user_features | list | User feature list | None |
| item_features | list | Item feature list | None |
| embedding_dim | int | Embedding dimension | 32 |
| hidden_size | int | Hidden layer size | 128 |
| dropout | float | Dropout rate | 0.2 |
| temperature | float | Temperature parameter | 0.02 |

Use Cases

  • Session recommendation
  • Short-term interest modeling
  • E-commerce scenarios

11. ComirecDR

Description

ComirecDR is the dynamic-routing variant of Comirec (Controllable Multi-Interest framework for Recommendation), a controllable multi-interest retrieval model that lets you set the number of interest vectors generated per user.

Paper Reference

Cen, Yukuo, et al. "Controllable multi-interest framework for recommendation." Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020.

Core Principles

  • Controllable Multi-Interest: Allows controlling the number of generated interests
  • Dynamic Routing: Uses dynamic routing mechanism to adaptively aggregate user interests
  • Two-tower Structure: Supports offline item embedding precomputation
  • Efficient Computation: Optimized for computational efficiency, suitable for large-scale data

Usage

```python
from torch_rechub.models.matching import ComirecDR

model = ComirecDR(
    user_features=user_features,
    item_features=item_features,
    embedding_dim=32,
    hidden_size=128,
    num_layers=2,
    n_interest=4,
    dropout=0.2,
    temperature=0.02
)
```

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| user_features | list | User feature list | None |
| item_features | list | Item feature list | None |
| embedding_dim | int | Embedding dimension | 32 |
| hidden_size | int | Hidden layer size | 128 |
| num_layers | int | Number of layers | 2 |
| n_interest | int | Number of interests | 4 |
| dropout | float | Dropout rate | 0.2 |
| temperature | float | Temperature parameter | 0.02 |

Use Cases

  • Controllable multi-interest recommendation
  • Scenarios with diverse user interests
  • Large-scale recommendation systems

12. ComirecSA

Description

ComirecSA (Controllable Multi-Interest framework with Self-Attention) is the self-attention variant of Comirec, using a multi-head self-attention mechanism instead of dynamic routing to extract user interests.

Core Principles

  • Self-attention Mechanism: Uses self-attention to capture dependencies in user behavior sequences
  • Controllable Multi-Interest: Allows controlling the number of generated interests
  • Two-tower Structure: Supports offline item embedding precomputation
  • Efficient Computation: Optimized for computational efficiency, suitable for large-scale data

Usage

```python
from torch_rechub.models.matching import ComirecSA

model = ComirecSA(
    user_features=user_features,
    item_features=item_features,
    embedding_dim=32,
    num_heads=4,
    num_layers=2,
    hidden_size=128,
    n_interest=4,
    dropout=0.2,
    temperature=0.02
)
```

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| user_features | list | User feature list | None |
| item_features | list | Item feature list | None |
| embedding_dim | int | Embedding dimension | 32 |
| num_heads | int | Number of attention heads | 4 |
| num_layers | int | Number of Transformer layers | 2 |
| hidden_size | int | Hidden layer size | 128 |
| n_interest | int | Number of interests | 4 |
| dropout | float | Dropout rate | 0.2 |
| temperature | float | Temperature parameter | 0.02 |

Use Cases

  • Controllable multi-interest recommendation
  • Long sequence interest modeling
  • Large-scale recommendation systems

13. Model Comparison

| Model | Complexity | Expressiveness | Efficiency | Use Cases |
| --- | --- | --- | --- | --- |
| DSSM | Low | Medium | High | Text matching, cold start |
| FaceBookDSSM | Medium | High | Medium | Large-scale recommendation, ad retrieval |
| YoutubeDNN | Medium | High | Medium | Video recommendation, content recommendation |
| YoutubeSBC | Medium | High | Medium | Large-scale recommendation, sampling bias scenarios |
| MIND | Medium | High | Medium | Multi-interest recommendation, e-commerce |
| GRU4Rec | Medium | Medium | High | Session recommendation, short sequences |
| NARM | Medium | Medium | Medium | Session recommendation, local/global interests |
| SASRec | High | High | Low | Long sequences, sequential recommendation |
| SINE | High | High | Medium | Sparse interest modeling, large-scale recommendation |
| STAMP | Low | Medium | High | Session recommendation, short-term interests |
| ComirecDR | Medium | High | Medium | Controllable multi-interest, large-scale recommendation |
| ComirecSA | High | High | Medium | Controllable multi-interest, long sequences |

14. Usage Recommendations

  1. Choose models based on data characteristics:

    • For long sequence data, use SASRec, ComirecSA
    • For session data, use GRU4Rec, NARM, STAMP
    • For multi-interest scenarios, use MIND, ComirecDR, ComirecSA
  2. Choose models based on computational resources:

    • With limited resources, use DSSM, GRU4Rec, STAMP
    • With sufficient resources, try more complex models like SASRec, SINE
  3. Consider business requirements:

    • For controllable multi-interest, use ComirecDR, ComirecSA
    • For sparse interest handling, use SINE
    • For sampling bias correction, use YoutubeSBC
  4. Try combining multiple retrieval strategies:

    • Different models may capture different user interests; model fusion can improve overall retrieval performance
    • Combine content-based retrieval, collaborative filtering, and other strategies

15. Complete Training Example

```python
from torch_rechub.models.matching import DSSM
from torch_rechub.trainers import MatchTrainer
from torch_rechub.utils.data import MatchDataGenerator
from torch_rechub.basic.features import SparseFeature, DenseFeature

# 1. Define features
user_features = [
    SparseFeature(name="user_id", vocab_size=10000, embed_dim=32),
    DenseFeature(name="age", embed_dim=1),
    SparseFeature(name="gender", vocab_size=3, embed_dim=8)
]

item_features = [
    SparseFeature(name="item_id", vocab_size=100000, embed_dim=32),
    SparseFeature(name="category", vocab_size=1000, embed_dim=16)
]

# 2. Prepare data
# Assume x and y are preprocessed feature and label data
# x contains user features and item features
x = {
    "user_id": user_id_data,
    "age": age_data,
    "gender": gender_data,
    "item_id": item_id_data,
    "category": category_data
}
y = label_data  # click/no-click labels

# Test user data and all item data
x_test_user = {
    "user_id": test_user_id_data,
    "age": test_age_data,
    "gender": test_gender_data
}
x_all_item = {
    "item_id": all_item_id_data,
    "category": all_item_category_data
}

# 3. Create data generator
dg = MatchDataGenerator(x, y)
train_dl, test_dl, item_dl = dg.generate_dataloader(
    x_test_user=x_test_user,
    x_all_item=x_all_item,
    batch_size=256,
    num_workers=8
)

# 4. Create model
model = DSSM(
    user_features=user_features,
    item_features=item_features,
    temperature=0.02,
    user_params={"dims": [256, 128, 64], "activation": "prelu"},
    item_params={"dims": [256, 128, 64], "activation": "prelu"}
)

# 5. Create trainer
trainer = MatchTrainer(
    model=model,
    mode=0,  # 0: point-wise, 1: pair-wise, 2: list-wise
    optimizer_params={"lr": 0.001, "weight_decay": 0.0001},
    n_epoch=50,
    earlystop_patience=10,
    device="cuda:0",
    model_path="saved/dssm"
)

# 6. Train model
trainer.fit(train_dl, test_dl)

# 7. Export ONNX model
# Export user tower
trainer.export_onnx("user_tower.onnx", mode="user")
# Export item tower
trainer.export_onnx("item_tower.onnx", mode="item")

# 8. Vector retrieval example
# Generate user embeddings
user_embeddings = trainer.inference_embedding(
    model, mode="user", data_loader=test_dl, model_path="saved/dssm"
)
# Generate item embeddings
item_embeddings = trainer.inference_embedding(
    model, mode="item", data_loader=item_dl, model_path="saved/dssm"
)

# Use Annoy or Faiss for vector indexing and retrieval
# Here's an example using Annoy
from annoy import AnnoyIndex

# Create index
index = AnnoyIndex(64, 'angular')  # 64 is the embedding dimension
for i, embedding in enumerate(item_embeddings):
    index.add_item(i, embedding.tolist())
index.build(10)  # 10 trees

# Retrieval example
user_idx = 0
user_emb = user_embeddings[user_idx].tolist()
recall_results = index.get_nns_by_vector(user_emb, 10)  # Retrieve top 10 items
print(f"User {user_idx} retrieved items: {recall_results}")

16. FAQ

Q: How to handle large-scale item sets?

A: Try the following approaches:

  • Use two-tower structure to support offline item embedding precomputation
  • Use approximate nearest neighbor search libraries (e.g., Annoy, Faiss) to accelerate vector retrieval
  • Adopt hierarchical retrieval strategy: coarse retrieval first, then fine retrieval

Q: How to handle cold start problems?

A: Try the following approaches:

  • For new users, use content-based retrieval
  • For new items, use collaborative filtering or content-based retrieval
  • Use transfer learning to transfer knowledge from related domains

Q: How to evaluate matching model performance?

A: Common retrieval evaluation metrics include (a small helper for the first two follows this list):

  • Recall@K: Proportion of the user's relevant items that appear in the top-K results
  • Precision@K: Proportion of the top-K results that are relevant
  • NDCG@K: Retrieval quality that takes ranking position into account
  • Hit@K: Whether at least one relevant item is retrieved in the top K
  • MRR@K: Mean Reciprocal Rank of the first relevant item
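As a concrete reference, Recall@K and Hit@K for a single user can be computed in a few lines. These are illustrative helpers, not a Torch-RecHub API:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the user's relevant items that appear in the top-k results."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def hit_at_k(retrieved, relevant, k):
    """1 if at least one relevant item appears in the top-k results, else 0."""
    return int(bool(set(retrieved[:k]) & set(relevant)))

print(recall_at_k([5, 9, 2, 7], relevant=[2, 3], k=3))  # 0.5
print(hit_at_k([5, 9, 2, 7], relevant=[2, 3], k=3))     # 1
```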

Q: How to choose the right negative sampling strategy?

A: Common negative sampling strategies include:

  • Random negative sampling: Simple and efficient, but may sample irrelevant items
  • Popularity-based negative sampling: Samples negatives in proportion to item popularity, which is closer to the real exposure distribution (see the sketch after this list)
  • Hard negative sampling: Samples negatives similar to positives, improves model discrimination
  • Contrastive learning negative sampling: Methods like MoCo, SimCLR
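Popularity-based sampling is often implemented by drawing negatives with probability proportional to a damped item frequency; the 0.75 exponent popularized by word2vec is a common choice. An illustrative sketch (the counts are made up):

```python
import numpy as np

item_counts = np.array([1000, 500, 50, 5, 1], dtype=np.float64)  # interactions per item
probs = item_counts ** 0.75   # damping keeps the long tail from vanishing entirely
probs /= probs.sum()

# Draw 10 negatives: popular items are picked more often, but tail items still appear
negatives = np.random.choice(len(item_counts), size=10, p=probs)
```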

Q: How to optimize matching model performance?

A: Try the following approaches:

  • Increase model depth and width to improve expressiveness
  • Use more advanced feature engineering
  • Optimize negative sampling strategy
  • Try model fusion
  • Adjust temperature parameter to optimize similarity distribution

17. Deployment Suggestions

  1. Offline Precomputation: For two-tower models, precompute item embeddings offline to reduce online computation
  2. Vector Indexing: Use efficient vector indexing libraries (e.g., Faiss) to store item embeddings and accelerate online retrieval (a minimal example follows this list)
  3. Hierarchical Deployment: Adopt hierarchical retrieval architecture - coarse retrieval with simple models first, then fine retrieval with complex models
  4. Caching Mechanism: Cache retrieval results for high-frequency users to reduce redundant computation
  5. Real-time Updates: Periodically update item embeddings to ensure retrieval result freshness
  6. A/B Testing: Validate different retrieval strategies through A/B testing
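For point 2 above, here is a minimal Faiss sketch: an exact inner-product index over L2-normalized embeddings, which is equivalent to cosine similarity. The array contents are placeholders, and Faiss must be installed separately:

```python
import faiss
import numpy as np

d = 64
item_embeddings = np.random.rand(100000, d).astype("float32")  # placeholder item vectors
faiss.normalize_L2(item_embeddings)  # after normalization, inner product == cosine

index = faiss.IndexFlatIP(d)         # exact inner-product search
index.add(item_embeddings)

query = np.random.rand(1, d).astype("float32")  # one user embedding
faiss.normalize_L2(query)
scores, ids = index.search(query, 10)           # top-10 item ids for this user
```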

Matching models are essential components of recommendation systems. Choosing the right matching model and optimizing it can significantly improve overall recommendation performance. Torch-RecHub provides a rich set of matching models covering different retrieval strategies, making it easy for developers to select and use based on business requirements.