Models API Reference

This section provides detailed API documentation for all models in Torch-RecHub.

Recall Models

Recall models are primarily used in the recall stage for quick retrieval of relevant items from massive candidate sets. They typically adopt two-tower or sequential model structures to meet the efficiency requirements of the recall stage.

Two-Tower Model Series

DSSM (Deep Structured Semantic Model)

  • Introduction: Originally proposed by Microsoft for semantic matching and later widely applied in recommender systems. It adopts the classic two-tower structure that represents users and items separately and computes their similarity via an inner product. Because item vectors can be pre-computed offline, online serving becomes highly efficient. The key lies in learning effective user and item representations.
  • Parameters:
  • user_features (list): List of user features
  • item_features (list): List of item features
  • hidden_units (list): List of hidden layer units
  • dropout_rates (list): List of dropout rates
  • embedding_dim (int): Final representation vector dimension
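
A minimal instantiation sketch: the module path, feature classes, and all concrete values below are illustrative assumptions, and the keyword arguments simply mirror the parameter list above; check them against the installed Torch-RecHub version.

```python
# Hedged sketch: arguments mirror the documented parameter list and may
# differ from the installed torch-rechub version.
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.matching import DSSM  # assumed module path

# Illustrative feature definitions (names and vocabulary sizes are placeholders).
user_features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
                 SparseFeature("gender", vocab_size=3, embed_dim=16)]
item_features = [SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = DSSM(
    user_features=user_features,
    item_features=item_features,
    hidden_units=[256, 128, 64],   # tower MLP layer sizes
    dropout_rates=[0.2, 0.2, 0.2],
    embedding_dim=32,              # dimension of the final user/item vectors
)
```

In a typical two-tower deployment the item tower is evaluated offline to build an ANN index, and only the user tower runs per request.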

Facebook DSSM

  • Introduction: Facebook's improved version of DSSM that incorporates a multi-task learning framework. Besides the main recall task, it adds auxiliary tasks that help learn better feature representations. The model can simultaneously optimize multiple related objectives such as clicks, favorites, and purchases, learning richer user and item representations.
  • Parameters:
  • user_features (list): List of user features
  • item_features (list): List of item features
  • hidden_units (list): List of hidden layer units
  • num_tasks (int): Number of tasks
  • task_types (list): List of task types
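
A hedged sketch of the multi-objective two-tower setup; the class name, module path, and task-type strings are assumptions, while the keyword arguments follow the parameter list above.

```python
# Hedged sketch: class name, module path, and task-type strings are assumptions.
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.matching import FaceBookDSSM  # assumed class name / path

user_features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16)]
item_features = [SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = FaceBookDSSM(
    user_features=user_features,
    item_features=item_features,
    hidden_units=[256, 128, 64],
    num_tasks=2,                                      # e.g. click and purchase
    task_types=["classification", "classification"],
)
```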

YouTube DNN

  • Introduction: A deep recall model proposed by YouTube for large-scale video recommendation scenarios. The model aggregates the user's watch history through average pooling and combines it with other user features. It innovatively introduces negative sampling techniques and a multi-task learning framework to improve training efficiency and effectiveness.
  • Parameters:
  • user_features (list): List of user features
  • item_features (list): List of item features
  • hidden_units (list): List of hidden layer units
  • embedding_dim (int): Embedding dimension
  • max_seq_len (int): Maximum sequence length
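
A hedged sketch in which the watch history is modeled as a mean-pooled sequence feature; the module path, feature classes, and values are assumptions, and the keyword arguments mirror the parameter list above.

```python
# Hedged sketch: module path, feature classes, and values are assumptions.
from torch_rechub.basic.features import SparseFeature, SequenceFeature
from torch_rechub.models.matching import YoutubeDNN  # assumed class name / path

user_features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16)]
history = [SequenceFeature("hist_item_id", vocab_size=50000, embed_dim=16,
                           pooling="mean")]          # average-pooled watch history
item_features = [SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = YoutubeDNN(
    user_features=user_features + history,
    item_features=item_features,
    hidden_units=[256, 128, 64],
    embedding_dim=32,
    max_seq_len=50,                                   # history longer than this is truncated
)
```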

Sequential Recommendation Series

GRU4Rec

  • Introduction: A pioneering work that first applied GRU networks to session-based sequential recommendation. The GRU captures temporal dependencies in user behavior sequences, with the hidden state at each time step summarizing the behaviors observed so far. The model also introduces special mini-batch construction methods and loss function designs adapted to the characteristics of sequential recommendation.
  • Parameters:
  • item_num (int): Total number of items
  • hidden_size (int): Size of GRU hidden layer
  • num_layers (int): Number of GRU layers
  • dropout_rate (float): Dropout rate
  • embedding_dim (int): Item embedding dimension
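
A hedged, item-ID-only sketch; the module path and values are assumptions, and the keyword arguments mirror the parameter list above.

```python
# Hedged sketch: module path and values are assumptions.
from torch_rechub.models.matching import GRU4Rec  # assumed module path

model = GRU4Rec(
    item_num=50000,        # total number of items (one index typically reserved for padding)
    hidden_size=128,       # GRU hidden state size
    num_layers=1,
    dropout_rate=0.2,
    embedding_dim=64,      # item embedding size
)
# Input is a batch of padded behavior sequences of shape (batch_size, seq_len)
# containing item indices; the model scores the next item in the session.
```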

NARM (Neural Attentive Recommendation Machine)

  • Introduction: A sequential recommendation model that adds an attention mechanism on top of GRU4Rec. Through attention, the model can dynamically focus on the behaviors in the sequence that are most relevant to the current prediction target. It maintains both a global and a local sequence representation, comprehensively capturing the user's short-term interests. This design enables better handling of the diversity and dynamics of user interests.
  • Parameters:
  • item_num (int): Total number of items
  • hidden_size (int): Size of hidden layer
  • attention_size (int): Size of attention layer
  • dropout_rate (float): Dropout rate
  • embedding_dim (int): Item embedding dimension
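
A hedged sketch analogous to the GRU4Rec example, with the attention-layer size added; the module path and values are assumptions.

```python
# Hedged sketch: module path and values are assumptions.
from torch_rechub.models.matching import NARM  # assumed module path

model = NARM(
    item_num=50000,
    hidden_size=128,       # GRU hidden state size
    attention_size=64,     # attention layer size for the local representation
    dropout_rate=0.25,
    embedding_dim=64,
)
```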

SASRec (Self-Attentive Sequential Recommendation)

  • Introduction: A representative work that applies the Transformer architecture to sequential recommendation. Through self-attention, the model directly computes relationships between any two behaviors in the sequence without the step-by-step dependency inherent to RNNs. Position encoding preserves the temporal order of behaviors, while the multi-layer structure extracts increasingly abstract behavior patterns layer by layer. Compared to RNN-based models, it offers better parallelism and scalability.
  • Parameters:
  • item_num (int): Total number of items
  • max_len (int): Maximum sequence length
  • num_heads (int): Number of attention heads
  • num_layers (int): Number of Transformer layers
  • hidden_size (int): Hidden layer dimension
  • dropout_rate (float): Dropout rate
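
A hedged sketch of the self-attentive sequential model; the module path and values are assumptions, and the keyword arguments mirror the parameter list above.

```python
# Hedged sketch: module path and values are assumptions.
from torch_rechub.models.matching import SASRec  # assumed module path

model = SASRec(
    item_num=50000,
    max_len=50,            # behavior sequences are truncated/padded to this length
    num_heads=2,           # attention heads per Transformer layer
    num_layers=2,
    hidden_size=64,        # also serves as the item embedding / attention dimension
    dropout_rate=0.2,
)
```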

MIND (Multi-Interest Network with Dynamic Routing)

  • Introduction: A recall model designed for users' diverse interests. Through a capsule network and dynamic routing, it extracts multiple interest vectors from the user's behavior sequence. Each interest vector represents the user's preferences in a different aspect, providing a more comprehensive characterization of the user's interest distribution.
  • Parameters:
  • item_num (int): Total number of items
  • num_interests (int): Number of interest vectors
  • routing_iterations (int): Number of dynamic routing iterations
  • hidden_size (int): Hidden layer dimension
  • embedding_dim (int): Item embedding dimension
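
A hedged sketch of multi-interest extraction via capsule dynamic routing; the module path and values are assumptions, and the keyword arguments mirror the parameter list above.

```python
# Hedged sketch: module path and values are assumptions.
from torch_rechub.models.matching import MIND  # assumed module path

model = MIND(
    item_num=50000,
    num_interests=4,         # number of interest capsules extracted per user
    routing_iterations=3,    # dynamic routing steps
    hidden_size=128,
    embedding_dim=64,
)
# At serving time each user yields num_interests vectors; candidates retrieved
# by each interest vector are merged before the ranking stage.
```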

Ranking Models

Ranking models are primarily used in the fine-ranking stage to precisely rank candidate items. They learn complex interactions between users and items through deep learning methods to generate final ranking scores.

Wide & Deep Series

WideDeep

  • Introduction: A classic model proposed by Google in 2016 that combines the advantages of linear models and deep neural networks. The Wide part performs memorization through feature crosses, suitable for modeling direct, explicit feature correlations; the Deep part performs generalization through deep networks, capable of learning implicit, high-order feature relationships. This combination allows the model to both memorize historical patterns and generalize to new patterns.
  • Parameters:
  • wide_features (list): List of features for the wide part, used in linear layer
  • deep_features (list): List of features for the deep part, used in deep network
  • hidden_units (list): List of hidden layer units for the deep network, e.g., [256, 128, 64]
  • dropout_rates (list): Dropout rates for each layer, used for preventing overfitting
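
A hedged sketch in which cross-product features feed the wide linear part and embedded features feed the deep MLP; the module path, feature classes, and values are assumptions, and the keyword arguments mirror the parameter list above.

```python
# Hedged sketch: module path, feature classes, and values are assumptions.
from torch_rechub.basic.features import DenseFeature, SparseFeature
from torch_rechub.models.ranking import WideDeep  # assumed module path

# Illustrative features: a manual cross feature for the wide part,
# embedded ID/dense features for the deep part.
wide_features = [SparseFeature("user_id_x_item_id", vocab_size=100000, embed_dim=1)]
deep_features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
                 SparseFeature("item_id", vocab_size=50000, embed_dim=16),
                 DenseFeature("price")]

model = WideDeep(
    wide_features=wide_features,
    deep_features=deep_features,
    hidden_units=[256, 128, 64],   # deep network layer sizes
    dropout_rates=[0.2, 0.2, 0.2],
)
```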

DeepFM

  • Introduction: A model that combines Factorization Machine (FM) feature interactions with a deep neural network. The FM part efficiently models second-order feature interactions, while the Deep part learns high-order feature relationships. Compared to Wide&Deep, DeepFM requires no manual feature engineering and learns feature crosses automatically. The model consists of three parts: first-order features, the FM's second-order interactions, and the deep network's high-order interactions.
  • Parameters:
  • features (list): List of features
  • hidden_units (list): Hidden layer units for DNN part
  • dropout_rates (list): List of dropout rates
  • embedding_dim (int): Feature embedding dimension
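
A hedged sketch in which all features share one embedding table used by both the FM and the DNN component; the module path, feature classes, and values are assumptions.

```python
# Hedged sketch: module path, feature classes, and values are assumptions.
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.ranking import DeepFM  # assumed module path

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = DeepFM(
    features=features,
    hidden_units=[256, 128, 64],   # DNN tower
    dropout_rates=[0.2, 0.2, 0.2],
    embedding_dim=16,              # shared by the FM and DNN components
)
```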

DCN / DCN-V2

  • Introduction: Learns feature interactions explicitly through specially designed Cross Network layers. Each cross layer interacts the current feature vector with the original input, so the order of feature crossing increases with depth. DCN-V2 improves the parameterization of the cross network, offering both "vector" and "matrix" options, maintaining expressiveness while improving efficiency.
  • Parameters:
  • features (list): List of features
  • cross_num (int): Number of cross layers
  • hidden_units (list): Hidden layer units for DNN part
  • cross_parameterization (str, DCN-V2 only): Cross layer parameterization, "vector" or "matrix"
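
A hedged sketch of the parallel cross-network-plus-DNN structure; the module path, class names, and values are assumptions, and cross_parameterization applies to the DCN-V2 variant only.

```python
# Hedged sketch: module path, class names, and values are assumptions.
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.ranking import DCN  # assumed module path

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = DCN(
    features=features,
    cross_num=3,                  # number of explicit cross layers
    hidden_units=[256, 128],      # parallel DNN branch
)

# DCN-V2 (assumed class name) additionally takes the parameterization option:
# model = DCNv2(features=features, cross_num=3, hidden_units=[256, 128],
#               cross_parameterization="matrix")
```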

AFM (Attentional Factorization Machine)

  • Introduction: Introduces an attention mechanism into FM, assigning different importance weights to different feature interactions. Through the attention network, it adaptively learns the importance of each pairwise interaction, highlighting the feature combinations most relevant to the prediction target.
  • Parameters:
  • features (list): List of features
  • attention_units (list): Hidden layer units for attention network
  • embedding_dim (int): Feature embedding dimension
  • dropout_rate (float): Dropout rate for attention network
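
A hedged sketch of FM with attention-weighted pairwise interactions; the module path, feature classes, and values are assumptions.

```python
# Hedged sketch: module path, feature classes, and values are assumptions.
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.ranking import AFM  # assumed module path

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = AFM(
    features=features,
    attention_units=[32],   # attention network over pairwise interactions
    embedding_dim=16,
    dropout_rate=0.2,
)
```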

FiBiNET (Feature Importance and Bilinear feature Interaction Network)

  • Introduction: Dynamically learns feature importance through a SENET mechanism and uses bilinear layers for feature interaction. The SENET module helps identify important features, while the bilinear interaction provides richer feature crossing than a plain inner product.
  • Parameters:
  • features (list): List of features
  • bilinear_type (str): Bilinear layer type, options: "field_all"/"field_each"/"field_interaction"
  • hidden_units (list): Hidden layer units for DNN part
  • reduction_ratio (int): Reduction ratio for SENET module
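
A hedged sketch in which SENET re-weights field embeddings before bilinear interaction; the class name, module path, and values are assumptions.

```python
# Hedged sketch: class name, module path, and values are assumptions.
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.ranking import FiBiNet  # assumed class name / path

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = FiBiNet(
    features=features,
    bilinear_type="field_interaction",  # or "field_all" / "field_each"
    hidden_units=[256, 128],
    reduction_ratio=3,                  # SENET squeeze ratio
)
```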

Attention-based Series

DIN (Deep Interest Network)

  • Introduction: A model designed for user interest diversity that uses an attention mechanism to adaptively weight user historical behaviors. The model dynamically calculates the relevance of each historical behavior to the current candidate item, thereby activating the related interests and capturing diverse user preferences. It pioneered the use of attention in recommender systems and established a new paradigm for behavior sequence modeling.
  • Parameters:
  • features (list): List of base features
  • behavior_features (list): List of behavior features for attention calculation
  • attention_units (list): Hidden layer units for attention network
  • hidden_units (list): Hidden layer units for DNN part
  • activation (str): Activation function type
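
A hedged sketch in which the behavior sequence is attended against the candidate item while other features bypass the attention unit; the module path, feature classes, and values are assumptions, and the keyword arguments mirror the parameter list above.

```python
# Hedged sketch: module path, feature classes, and values are assumptions.
from torch_rechub.basic.features import SparseFeature, SequenceFeature
from torch_rechub.models.ranking import DIN  # assumed module path

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]
# Behavior history that shares the item embedding table with the candidate item.
behavior_features = [SequenceFeature("hist_item_id", vocab_size=50000, embed_dim=16,
                                     pooling="concat", shared_with="item_id")]

model = DIN(
    features=features,
    behavior_features=behavior_features,
    attention_units=[80, 40],      # attention network over the history
    hidden_units=[256, 128, 64],   # final DNN
    activation="dice",             # Dice activation proposed in the DIN paper
)
```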

DIEN (Deep Interest Evolution Network)

  • Introduction: An advanced version of DIN that models the dynamic evolution of user interests through an interest evolution layer. It uses a GRU to capture interest evolution and designs AUGRU (GRU with Attentional Update Gate) so that the evolution process is aware of the target item. An auxiliary loss additionally supervises the training of the interest extraction layer. This design captures not only the dynamic changes of user interests but also their temporal dependencies.
  • Parameters:
  • features (list): List of base features
  • behavior_features (list): List of behavior features
  • interest_units (list): Units for interest extraction layer
  • gru_type (str): GRU type, "AUGRU" or "AIGRU"
  • hidden_units (list): Hidden layer units for DNN part
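
A hedged sketch of the interest extraction plus AUGRU interest-evolution pipeline; the module path, feature classes, and values are assumptions.

```python
# Hedged sketch: module path, feature classes, and values are assumptions.
from torch_rechub.basic.features import SparseFeature, SequenceFeature
from torch_rechub.models.ranking import DIEN  # assumed module path

features = [SparseFeature("item_id", vocab_size=50000, embed_dim=16)]
behavior_features = [SequenceFeature("hist_item_id", vocab_size=50000, embed_dim=16,
                                     pooling="concat", shared_with="item_id")]

model = DIEN(
    features=features,
    behavior_features=behavior_features,
    interest_units=[64, 32],       # interest extraction layer
    gru_type="AUGRU",              # attentional update gate; "AIGRU" is the listed alternative
    hidden_units=[256, 128, 64],
)
```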

BST (Behavior Sequence Transformer)

  • Introduction: A pioneering work that introduces the Transformer architecture into recommender systems for modeling user behavior sequences. Through self-attention, the model can directly compute relationships between any two behaviors in the sequence, overcoming the limitations of RNN models on long sequences. Position embeddings let the model perceive the temporal order of behaviors, while multi-head attention lets it understand user behavior patterns from multiple perspectives.
  • Parameters:
  • features (list): List of base features
  • behavior_features (list): List of behavior features
  • num_heads (int): Number of attention heads
  • num_layers (int): Number of Transformer layers
  • hidden_size (int): Hidden layer dimension
  • dropout_rate (float): Dropout rate
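
A hedged sketch in which the behavior sequence and the candidate item pass through a Transformer encoder before the final MLP; the module path, feature classes, and values are assumptions.

```python
# Hedged sketch: module path, feature classes, and values are assumptions.
from torch_rechub.basic.features import SparseFeature, SequenceFeature
from torch_rechub.models.ranking import BST  # assumed module path

features = [SparseFeature("item_id", vocab_size=50000, embed_dim=16)]
behavior_features = [SequenceFeature("hist_item_id", vocab_size=50000, embed_dim=16,
                                     pooling="concat", shared_with="item_id")]

model = BST(
    features=features,
    behavior_features=behavior_features,
    num_heads=2,
    num_layers=1,
    hidden_size=64,
    dropout_rate=0.2,
)
```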

EDCN (Enhancing Explicit and Implicit Feature Interactions)

  • Introduction: A deep cross network that enhances both explicit and implicit feature interactions through a newly designed cross network structure. It introduces a gating mechanism to regulate the importance of feature interactions of different orders and uses residual connections to ease the training of deep networks.
  • Parameters:
  • features (list): List of features
  • cross_num (int): Number of cross layers
  • hidden_units (list): Hidden layer units for DNN part
  • gate_type (str): Gate type, "FGU" or "BGU"
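
A hedged sketch of the gated cross-plus-deep structure; the module path and values are assumptions, and the keyword arguments mirror the parameter list above.

```python
# Hedged sketch: module path and values are assumptions.
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.ranking import EDCN  # assumed module path

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = EDCN(
    features=features,
    cross_num=3,
    hidden_units=[256, 128, 64],
    gate_type="FGU",          # or "BGU", the two gate variants listed above
)
```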

Multi-task Models

Multi-task models learn multiple related tasks jointly to achieve knowledge sharing and transfer, improving overall model performance.

SharedBottom

  • Introduction: The most basic multi-task learning model: a shared bottom network extracts common feature representations across tasks, while task-specific layers on top learn individualized features for each task. This simple yet effective structure laid the foundation for multi-task learning.
  • Parameters:
  • features (list): List of features
  • hidden_units (list): Hidden layer units for shared network
  • task_hidden_units (list): Hidden layer units for task-specific networks
  • num_tasks (int): Number of tasks
  • task_types (list): List of task types
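
A hedged sketch in which one shared MLP feeds per-task towers; the module path, feature classes, and task-type strings are assumptions.

```python
# Hedged sketch: module path, feature classes, and task-type strings are assumptions.
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.multi_task import SharedBottom  # assumed module path

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = SharedBottom(
    features=features,
    hidden_units=[256, 128],       # shared bottom MLP
    task_hidden_units=[64, 32],    # one tower per task
    num_tasks=2,
    task_types=["classification", "classification"],  # e.g. CTR and CVR heads
)
```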

ESMM (Entire Space Multi-Task Model)

  • Introduction: An innovative multi-task model proposed by Alibaba to address sample selection bias in recommender systems. By jointly modeling the CVR and CTR tasks, it learns parameters over the entire impression space. The core idea is to introduce CTR as an auxiliary task and optimize CVR indirectly through the decomposition pCTCVR = pCTR × pCVR. This design removes the sample selection bias of traditional CVR estimation, which trains only on clicked samples, and at the same time provides unbiased CTR and CTCVR estimates.
  • Parameters:
  • features (list): List of features
  • hidden_units (list): List of hidden layer units
  • tower_units (list): List of task tower layer units
  • embedding_dim (int): Feature embedding dimension
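
A hedged sketch of the two shared-embedding towers whose outputs are multiplied into pCTCVR = pCTR × pCVR; the module path, feature classes, and values are assumptions.

```python
# Hedged sketch: module path, feature classes, and values are assumptions.
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.multi_task import ESMM  # assumed module path

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = ESMM(
    features=features,
    hidden_units=[256, 128],   # shared layers
    tower_units=[64, 32],      # CTR and CVR towers
    embedding_dim=16,
)
# Training supervises the CTR head with click labels and the CTCVR head with
# click-and-conversion labels; CVR itself is never trained directly, which is
# what removes the selection bias of learning CVR only on clicked impressions.
```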

MMoE (Multi-gate Mixture-of-Experts)

  • Introduction: A multi-task learning model proposed by Google that achieves soft parameter sharing through a set of expert networks and per-task gating networks. Each expert learns its own feature transformation, while each task's gate dynamically weights the experts for that task. This design lets the model flexibly combine expert knowledge according to task requirements, effectively handling differences between tasks.
  • Parameters:
  • features (list): List of features
  • expert_units (list): Hidden layer units for expert networks
  • num_experts (int): Number of experts
  • num_tasks (int): Number of tasks
  • expert_activation (str): Activation function for expert networks
  • gate_activation (str): Activation function for gate networks
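
A hedged sketch of shared experts with one softmax gate per task; the class name, module path, feature classes, and values are assumptions, and the keyword arguments mirror the parameter list above.

```python
# Hedged sketch: class name, module path, feature classes, and values are assumptions.
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.multi_task import MMOE  # assumed class name / path

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = MMOE(
    features=features,
    expert_units=[256, 128],    # each expert MLP
    num_experts=4,
    num_tasks=2,
    expert_activation="relu",
    gate_activation="softmax",  # gates produce a distribution over experts
)
```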

PLE (Progressive Layered Extraction)

  • Introduction: An improved version of MMoE that models task relationships more explicitly through progressive layered extraction. It introduces both task-specific experts and shared experts, and stacks multiple extraction layers so that features are refined progressively. Each layer contains task-specific as well as shared experts, allowing the model to learn both the commonalities and the individualities of the tasks. This progressive design strengthens the model's ability to extract and transfer knowledge.
  • Parameters:
  • features (list): List of features
  • expert_units (list): Units for expert networks
  • num_experts (int): Number of experts per layer
  • num_layers (int): Number of layers
  • num_shared_experts (int): Number of shared experts
  • task_types (list): List of task types
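
A hedged sketch in which each extraction layer holds task-specific and shared experts; the module path, feature classes, and task-type strings are assumptions, and the keyword arguments mirror the parameter list above.

```python
# Hedged sketch: module path, feature classes, and task-type strings are assumptions.
from torch_rechub.basic.features import SparseFeature
from torch_rechub.models.multi_task import PLE  # assumed module path

features = [SparseFeature("user_id", vocab_size=10000, embed_dim=16),
            SparseFeature("item_id", vocab_size=50000, embed_dim=16)]

model = PLE(
    features=features,
    expert_units=[256, 128],
    num_experts=2,             # task-specific experts per task in each layer
    num_layers=2,              # progressive extraction layers
    num_shared_experts=1,
    task_types=["classification", "regression"],
)
```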