Models API Reference
This section provides detailed API documentation for all models in Torch-RecHub.
Recall Models
Recall models are primarily used in the recall stage for quick retrieval of relevant items from massive candidate sets. They typically adopt two-tower or sequential model structures to meet the efficiency requirements of the recall stage.
Two-Tower Model Series
DSSM (Deep Structured Semantic Model)
- Introduction: Originally proposed by Microsoft for semantic matching and later widely adopted in recommender systems. It uses the classic two-tower structure, representing users and items separately and computing their similarity via inner product. Because item vectors can be pre-computed offline, online serving is highly efficient. The key lies in learning effective user and item representations.
- Parameters:
  - user_features (list): List of user features
  - item_features (list): List of item features
  - hidden_units (list): List of hidden layer units
  - dropout_rates (list): List of dropout rates
  - embedding_dim (int): Final representation vector dimension
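A minimal usage sketch: the keyword arguments simply mirror the parameter list above, and the feature definitions and values are illustrative, so check the Torch-RecHub source for the exact constructor signature.

```python
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.matching import DSSM

# Illustrative feature definitions; real vocab sizes come from your data.
user_features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
                 SparseFeature("gender", vocab_size=3, embed_dim=16)]
item_features = [SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = DSSM(
    user_features=user_features,
    item_features=item_features,
    hidden_units=[256, 128, 64],    # MLP sizes for each tower
    dropout_rates=[0.2, 0.2, 0.2],
    embedding_dim=32,               # dimension of the final user/item vectors
)
# Item vectors can be pre-computed offline; online serving only runs the
# user tower and retrieves items by inner product (e.g. via an ANN index).
```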
Facebook DSSM
- Introduction: Facebook's improved version of DSSM that incorporates a multi-task learning framework. Besides the main recall task, it adds auxiliary tasks that help learn better feature representations. The model can jointly optimize multiple related objectives such as clicks, favorites, and purchases, learning richer user and item representations.
- Parameters:
  - user_features (list): List of user features
  - item_features (list): List of item features
  - hidden_units (list): List of hidden layer units
  - num_tasks (int): Number of tasks
  - task_types (list): List of task types
YouTube DNN
- Introduction: A deep recall model proposed by YouTube for large-scale video recommendation. The model aggregates the user's viewing history through average pooling and combines it with other user features. It introduces negative sampling techniques and a multi-task learning framework to improve training efficiency and effectiveness.
- Parameters:
  - user_features (list): List of user features
  - item_features (list): List of item features
  - hidden_units (list): List of hidden layer units
  - embedding_dim (int): Embedding dimension
  - max_seq_len (int): Maximum sequence length
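A hedged sketch along the same lines; the class name YoutubeDNN and the SequenceFeature pooling mode are assumptions, and the keyword arguments follow the parameter list above rather than a verified signature.

```python
from torch_rechub.basic.features import SparseFeature, SequenceFeature
from torch_rechub.models.matching import YoutubeDNN

user_features = [
    SparseFeature("user_id", vocab_size=10000, embed_dim=16),
    # Watch history, mean-pooled inside the model; shares the item_id embedding.
    SequenceFeature("hist_item_id", vocab_size=50000, embed_dim=16,
                    pooling="mean", shared_with="item_id"),
]
item_features = [SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = YoutubeDNN(
    user_features=user_features,
    item_features=item_features,
    hidden_units=[256, 128, 64],
    embedding_dim=32,
    max_seq_len=50,   # behavior sequences are padded/truncated to this length
)
```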
Sequential Recommendation Series
GRU4Rec
- Introduction: The pioneering work that first applied GRU networks to session-based sequential recommendation. The GRU captures temporal dependencies in the user's behavior sequence, with the hidden state at each time step summarizing the behaviors seen so far. The model also introduces dedicated mini-batch construction methods and loss functions tailored to sequential recommendation.
- Parameters:
  - item_num (int): Total number of items
  - hidden_size (int): Size of GRU hidden layer
  - num_layers (int): Number of GRU layers
  - dropout_rate (float): Dropout rate
  - embedding_dim (int): Item embedding dimension
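A sketch assuming GRU4Rec is exposed under torch_rechub.models.matching with keyword arguments matching the list above; all values are illustrative.

```python
from torch_rechub.models.matching import GRU4Rec

model = GRU4Rec(
    item_num=50000,      # size of the item vocabulary
    hidden_size=128,     # GRU hidden state size
    num_layers=1,
    dropout_rate=0.1,
    embedding_dim=64,    # item embedding size
)
# Input: a batch of padded item-id sequences; output: scores over the item
# vocabulary (or a session representation matched against item embeddings).
```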
NARM (Neural Attentive Recommendation Machine)
- Introduction: A sequential recommendation model that adds an attention mechanism on top of GRU4Rec. With attention, the model can dynamically focus on the behaviors in the sequence that are relevant to the current prediction target. It maintains both a global and a local sequence representation, capturing the user's short-term interests more comprehensively. This design handles the diversity and dynamics of user interests better.
- Parameters:
  - item_num (int): Total number of items
  - hidden_size (int): Size of hidden layer
  - attention_size (int): Size of attention layer
  - dropout_rate (float): Dropout rate
  - embedding_dim (int): Item embedding dimension
SASRec (Self-Attentive Sequential Recommendation)
- Introduction: A representative work that applies the Transformer architecture to sequential recommendation. Through self-attention, the model directly learns the relationship between any two behaviors in the sequence, without the step-by-step dependency of RNNs. Position encoding preserves the temporal order of behaviors, while stacking multiple layers lets the model extract increasingly abstract behavior patterns. Compared with RNN-based models, it offers better parallelism and scalability.
- Parameters:
  - item_num (int): Total number of items
  - max_len (int): Maximum sequence length
  - num_heads (int): Number of attention heads
  - num_layers (int): Number of Transformer layers
  - hidden_size (int): Hidden layer dimension
  - dropout_rate (float): Dropout rate
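A sketch under the assumption that SASRec is available in torch_rechub.models.matching and accepts keyword arguments matching the list above.

```python
from torch_rechub.models.matching import SASRec

model = SASRec(
    item_num=50000,
    max_len=50,          # maximum behavior sequence length
    num_heads=2,         # attention heads per Transformer layer
    num_layers=2,        # stacked Transformer blocks
    hidden_size=64,
    dropout_rate=0.2,
)
```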
MIND (Multi-Interest Network with Dynamic Routing)
- Introduction: A recall model designed for users' diverse interests. Through a capsule network and dynamic routing, it extracts multiple interest vectors from the user's behavior sequence. Each interest vector represents preferences in a different aspect, giving a more comprehensive characterization of the user's interest distribution.
- Parameters:
  - item_num (int): Total number of items
  - num_interests (int): Number of interest vectors
  - routing_iterations (int): Number of dynamic routing iterations
  - hidden_size (int): Hidden layer dimension
  - embedding_dim (int): Item embedding dimension
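A hedged sketch whose keyword arguments follow the parameter list above; the class name and exact signature should be verified against the Torch-RecHub source.

```python
from torch_rechub.models.matching import MIND

model = MIND(
    item_num=50000,
    num_interests=4,        # number of interest vectors extracted per user
    routing_iterations=3,   # dynamic routing iterations
    hidden_size=128,
    embedding_dim=64,
)
# Each user is represented by `num_interests` vectors; at recall time every
# interest vector issues its own nearest-neighbour query and results are merged.
```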
Ranking Models
Ranking models are primarily used in the fine-ranking stage to precisely rank candidate items. They learn complex interactions between users and items through deep learning methods to generate final ranking scores.
Wide & Deep Series
WideDeep
- Introduction: A classic model proposed by Google in 2016 that combines the advantages of linear models and deep neural networks. The Wide part performs memorization through feature crosses, suitable for modeling direct, explicit feature correlations; the Deep part performs generalization through deep networks, capable of learning implicit, high-order feature relationships. This combination allows the model to both memorize historical patterns and generalize to new patterns.
- Parameters:
  - wide_features (list): List of features for the wide part, used in the linear layer
  - deep_features (list): List of features for the deep part, used in the deep network
  - hidden_units (list): List of hidden layer units for the deep network, e.g., [256, 128, 64]
  - dropout_rates (list): Dropout rates for each layer, used for preventing overfitting
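A minimal sketch: the essential modeling choice is the split between the wide and deep feature lists. Keyword arguments mirror the parameter list above; feature definitions and values are illustrative.

```python
from torch_rechub.basic.features import DenseFeature, SparseFeature
from torch_rechub.models.ranking import WideDeep

# Wide side: features whose crosses should be memorized directly.
wide_features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
                 SparseFeature("item_id", vocab_size=50000, embed_dim=16)]
# Deep side: features fed through the MLP for generalization.
deep_features = wide_features + [DenseFeature("price")]

model = WideDeep(
    wide_features=wide_features,
    deep_features=deep_features,
    hidden_units=[256, 128, 64],
    dropout_rates=[0.2, 0.2, 0.2],
)
```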
DeepFM
- Introduction: A model that combines Factorization Machine (FM) feature interactions with a deep neural network. The FM part efficiently models second-order feature interactions, while the deep part learns high-order feature relationships. Compared with Wide & Deep, DeepFM requires no manual feature engineering and learns feature crosses automatically. The model consists of three parts: first-order features, FM second-order interactions, and the deep network's high-order interactions.
- Parameters:
  - features (list): List of features
  - hidden_units (list): Hidden layer units for DNN part
  - dropout_rates (list): List of dropout rates
  - embedding_dim (int): Feature embedding dimension
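A hedged sketch, with keyword arguments taken from the list above and illustrative feature definitions; the exact constructor signature may differ in the library.

```python
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.ranking import DeepFM

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16),
            SparseFeature("category", vocab_size=300, embed_dim=16)]

model = DeepFM(
    features=features,
    hidden_units=[256, 128, 64],   # DNN part
    dropout_rates=[0.2, 0.2, 0.2],
    embedding_dim=16,              # embedding size shared by FM and DNN parts
)
```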
DCN / DCN-V2
- Introduction: Learns feature interactions explicitly through specially designed Cross Network layers. Each cross layer crosses the current representation with the original input vector, so the order of explicit feature interaction grows with depth. DCN-V2 improves the cross-network parameterization, offering both "vector" and "matrix" options, maintaining model expressiveness while improving efficiency.
- Parameters:
  - features (list): List of features
  - cross_num (int): Number of cross layers
  - hidden_units (list): Hidden layer units for DNN part
  - cross_parameterization (str, DCN-V2 only): Cross parameterization method, "vector" or "matrix"
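A hedged sketch in the same style; arguments mirror the list above and all values are illustrative.

```python
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.ranking import DCN

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = DCN(
    features=features,
    cross_num=3,                      # number of explicit cross layers
    hidden_units=[256, 128],          # parallel DNN branch
    cross_parameterization="matrix",  # DCN-V2 only: "vector" or "matrix"
)
```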
AFM (Attentional Factorization Machine)
- Introduction: Introduces an attention mechanism into FM, assigning different importance weights to different feature interactions. The attention network adaptively learns the importance of each pairwise interaction, identifying the feature combinations most relevant to the prediction target.
- Parameters:
  - features (list): List of features
  - attention_units (list): Hidden layer units for attention network
  - embedding_dim (int): Feature embedding dimension
  - dropout_rate (float): Dropout rate for attention network
FiBiNET (Feature Importance and Bilinear feature Interaction Network)
- Introduction: Dynamically learns feature importance through a SENET mechanism and uses bilinear layers for feature interaction. The SENET module identifies important features, while bilinear interaction provides a richer form of feature interaction than the plain inner product.
- Parameters:
  - features (list): List of features
  - bilinear_type (str): Bilinear layer type, options: "field_all" / "field_each" / "field_interaction"
  - hidden_units (list): Hidden layer units for DNN part
  - reduction_ratio (int): Reduction ratio for SENET module
Attention-based Series
DIN (Deep Interest Network)
- Introduction: A model designed for user interest diversity, using an attention mechanism to adaptively weight user historical behaviors. The model dynamically calculates the relevance of each historical behavior to the current candidate item, activating the relevant interests and capturing diverse user preferences. It introduced the attention mechanism to recommender systems, pioneering a new paradigm for behavior sequence modeling.
- Parameters:
  - features (list): List of base features
  - behavior_features (list): List of behavior features for attention calculation
  - attention_units (list): Hidden layer units for attention network
  - hidden_units (list): Hidden layer units for DNN part
  - activation (str): Activation function type
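A hedged sketch: the keyword arguments follow the list above, the feature helpers are illustrative, and the exact constructor signature should be checked against the Torch-RecHub source.

```python
from torch_rechub.basic.features import SparseFeature, SequenceFeature
from torch_rechub.models.ranking import DIN

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]
# Behavior sequence attended against the candidate item; shares the item_id embedding.
behavior_features = [SequenceFeature("hist_item_id", vocab_size=50000,
                                     embed_dim=16, shared_with="item_id")]

model = DIN(
    features=features,
    behavior_features=behavior_features,
    attention_units=[80, 40],     # attention MLP sizes
    hidden_units=[256, 128, 64],  # DNN part
    activation="dice",            # Dice activation, as used in the DIN paper
)
```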
DIEN (Deep Interest Evolution Network)
- Introduction: An advanced version of DIN that models the dynamic evolution of user interests through an interest evolution layer. It uses a GRU structure to capture interest evolution and introduces AUGRU (GRU with Attentional Update Gate) to make the evolution process aware of the target item. It also adds an auxiliary loss to supervise the training of the interest extraction layer. This design captures both the dynamic changes of user interests and their temporal dependencies.
- Parameters:
  - features (list): List of base features
  - behavior_features (list): List of behavior features
  - interest_units (list): Units for interest extraction layer
  - gru_type (str): GRU type, "AUGRU" or "AIGRU"
  - hidden_units (list): Hidden layer units for DNN part
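A hedged sketch mirroring the parameter list above; feature definitions and values are illustrative.

```python
from torch_rechub.basic.features import SparseFeature, SequenceFeature
from torch_rechub.models.ranking import DIEN

features = [SparseFeature("item_id", vocab_size=50000, embed_dim=16)]
behavior_features = [SequenceFeature("hist_item_id", vocab_size=50000,
                                     embed_dim=16, shared_with="item_id")]

model = DIEN(
    features=features,
    behavior_features=behavior_features,
    interest_units=[64, 32],      # interest extraction layer
    gru_type="AUGRU",             # attention-aware update gate for interest evolution
    hidden_units=[256, 128, 64],
)
```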
BST (Behavior Sequence Transformer)
- Introduction: A pioneering work that introduces the Transformer architecture to recommender systems for modeling user behavior sequences. Through self-attention, the model directly computes the relationship between any two behaviors in the sequence, overcoming the limitations of RNN models on long sequences. Position embeddings let the model perceive the temporal order of behaviors, while multi-head attention lets it view behavior patterns from multiple perspectives.
- Parameters:
  - features (list): List of base features
  - behavior_features (list): List of behavior features
  - num_heads (int): Number of attention heads
  - num_layers (int): Number of Transformer layers
  - hidden_size (int): Hidden layer dimension
  - dropout_rate (float): Dropout rate
EDCN (Enhancing Explicit and Implicit Feature Interactions)
- Introduction: A deep cross network that enhances both explicit and implicit feature interactions through a redesigned cross structure. It introduces a gating mechanism to regulate the importance of feature interactions of different orders and uses residual connections to ease the training of deep networks.
- Parameters:
  - features (list): List of features
  - cross_num (int): Number of cross layers
  - hidden_units (list): Hidden layer units for DNN part
  - gate_type (str): Gate type, "FGU" or "BGU"
Multi-task Models
Multi-task models learn multiple related tasks jointly to achieve knowledge sharing and transfer, improving overall model performance.
SharedBottom
- Introduction: The most basic multi-task learning model, which shares parameters in a bottom network to extract common feature representations. The shared layers learn features common to all tasks, while task-specific layers on top learn individualized features for each task. This simple yet effective structure laid the foundation for multi-task learning.
- Parameters:
  - features (list): List of features
  - hidden_units (list): Hidden layer units for shared network
  - task_hidden_units (list): Hidden layer units for task-specific networks
  - num_tasks (int): Number of tasks
  - task_types (list): List of task types
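A hedged sketch with keyword arguments matching the list above; task names and values are illustrative.

```python
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.multi_task import SharedBottom

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = SharedBottom(
    features=features,
    hidden_units=[256, 128],     # shared bottom network
    task_hidden_units=[64, 32],  # per-task towers
    num_tasks=2,
    task_types=["classification", "classification"],  # e.g. click and conversion
)
```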
ESMM (Entire Space Multi-Task Model)
- Introduction: An innovative multi-task model proposed by Alibaba, specifically designed to address sample selection bias in recommender systems. Through joint modeling of the CVR and CTR tasks, it learns parameters over the entire exposure space. The core idea is to introduce CTR as an auxiliary task and optimize CVR estimation through the multiplicative relationship pCTCVR = pCTR × pCVR. This design not only removes the sample selection bias of traditional CVR estimation but also yields unbiased CTR and CTCVR estimates.
- Parameters:
  - features (list): List of features
  - hidden_units (list): List of hidden layer units
  - tower_units (list): List of task tower layer units
  - embedding_dim (int): Feature embedding dimension
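A hedged sketch whose keyword arguments follow the list above; the exact constructor signature may differ in the library source.

```python
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.multi_task import ESMM

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = ESMM(
    features=features,
    hidden_units=[256, 128],
    tower_units=[64, 32],   # CTR tower and CVR tower
    embedding_dim=16,
)
# The model outputs pCTR and pCVR; pCTCVR = pCTR * pCVR is supervised on the
# entire impression space, which removes the CVR sample selection bias.
```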
MMoE (Multi-gate Mixture-of-Experts)
- Introduction: A multi-task learning model proposed by Google that achieves soft parameter sharing through expert mechanism and task-related gating networks. Each expert network can learn specific feature transformations, while gating networks dynamically allocate expert importance for each task. This design allows the model to flexibly combine expert knowledge based on task requirements, effectively handling task differences.
- Parameters:
  - features (list): List of features
  - expert_units (list): Hidden layer units for expert networks
  - num_experts (int): Number of experts
  - num_tasks (int): Number of tasks
  - expert_activation (str): Activation function for expert networks
  - gate_activation (str): Activation function for gate networks
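A hedged sketch: the class name MMOE and the keyword arguments (mirroring the list above) are assumptions to be checked against the Torch-RecHub source.

```python
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.multi_task import MMOE

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = MMOE(
    features=features,
    expert_units=[256, 128],    # MLP sizes of each expert
    num_experts=4,
    num_tasks=2,                # e.g. click and conversion
    expert_activation="relu",
    gate_activation="softmax",  # each gate outputs a distribution over experts
)
```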
PLE (Progressive Layered Extraction)
- Introduction: An improved version of MMoE that better models task relationships through progressive layered extraction. Introduces the concept of task-specific experts and shared experts, implementing progressive feature extraction through multi-level expert networks. Each layer contains both task-specific experts and shared experts, allowing the model to learn both commonalities and individualities of tasks. This progressive design enhances the model's ability for knowledge extraction and transfer.
- Parameters:
  - features (list): List of features
  - expert_units (list): Units for expert networks
  - num_experts (int): Number of experts per layer
  - num_layers (int): Number of layers
  - num_shared_experts (int): Number of shared experts
  - task_types (list): List of task types
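A hedged sketch in the same style; keyword arguments follow the parameter list above and all values are illustrative.

```python
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.multi_task import PLE

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = PLE(
    features=features,
    expert_units=[256, 128],
    num_experts=2,          # task-specific experts per layer
    num_layers=2,           # progressive extraction layers
    num_shared_experts=2,   # experts shared across tasks in every layer
    task_types=["classification", "classification"],
)
```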