Basic Components API Reference
This reference documents the basic components of Torch-RecHub, including feature processing, data transformation, and other fundamental functionality.
Feature Processing
Feature Columns
DenseFeature
- Introduction: Process continuous numerical features.
- Parameters:
  - name (str): Feature name
  - dimension (int): Feature dimension
  - dtype (str): Data type, default 'float32'
SparseFeature
- Introduction: Process discrete categorical features.
- Parameters:
  - name (str): Feature name
  - vocabulary_size (int): Size of category vocabulary
  - embedding_dim (int): Embedding vector dimension
  - dtype (str): Data type, default 'int32'
  - embedding_name (str): Embedding layer name, default None
VarLenSparseFeature
- Introduction: Process variable-length discrete features.
- Parameters:
  - name (str): Feature name
  - vocabulary_size (int): Size of category vocabulary
  - embedding_dim (int): Embedding vector dimension
  - maxlen (int): Maximum sequence length
  - dtype (str): Data type, default 'int32'
  - embedding_name (str): Embedding layer name, default None
  - combiner (str): Sequence pooling method, options: 'sum', 'mean', 'max', default 'mean'
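To illustrate what the three combiner options compute, the sketch below pools a padded sequence of embedding vectors in plain Python. The helper `pool_sequence` and the convention of passing the number of valid positions explicitly are assumptions made for illustration; they are not the Torch-RecHub implementation.

```python
def pool_sequence(embeddings, length, combiner="mean"):
    """Pool the first `length` embedding vectors into a single vector.

    embeddings: list of equal-length lists of floats, padded up to maxlen.
    length: number of valid (non-padding) positions, assumed to be >= 1.
    """
    valid = embeddings[:length]          # drop padded positions
    columns = list(zip(*valid))          # transpose: one tuple per dimension
    if combiner == "sum":
        return [sum(col) for col in columns]
    if combiner == "mean":
        return [sum(col) / length for col in columns]
    if combiner == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown combiner: {combiner!r}")


# Two valid items padded to maxlen=3, with embedding_dim=2.
seq = [[1.0, 4.0], [3.0, 2.0], [0.0, 0.0]]
pooled_sum = pool_sequence(seq, length=2, combiner="sum")
pooled_mean = pool_sequence(seq, length=2, combiner="mean")
pooled_max = pool_sequence(seq, length=2, combiner="max")
```

Whatever the sequence length, each combiner returns one vector of size embedding_dim, which is why variable-length features can be concatenated with fixed-size ones downstream.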
Data Transformation
Data Preprocessing
MinMaxScaler
- Introduction: Rescale numerical features to a fixed range (min-max normalization).
- Parameters:
  - feature_range (tuple): Normalization range, default (0, 1)
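The arithmetic behind min-max normalization can be sketched in a few lines. The function `min_max_scale` below is a hypothetical stand-in written for this example, not the library class, and mapping a constant column to the lower bound is an assumption.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly map `values` so that min -> lower bound and max -> upper bound."""
    lo, hi = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant column: every value maps to the lower bound (assumed convention).
        return [lo for _ in values]
    return [lo + (v - v_min) / span * (hi - lo) for v in values]


scaled_default = min_max_scale([10.0, 20.0, 30.0])
scaled_symmetric = min_max_scale([10.0, 20.0, 30.0], feature_range=(-1.0, 1.0))
```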
StandardScaler
- Introduction: Standardize numerical features by removing the mean and scaling by the standard deviation.
- Parameters:
  - with_mean (bool): Whether to remove mean, default True
  - with_std (bool): Whether to scale by standard deviation, default True
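The effect of the two flags can be shown with a small standalone function. `standardize` is a hypothetical helper for this example; using the population standard deviation and skipping the division when it is zero are assumptions, not documented behavior.

```python
import math


def standardize(values, with_mean=True, with_std=True):
    """Optionally center `values` and scale them by their standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divides by n); an assumption of this sketch.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    out = list(values)
    if with_mean:
        out = [v - mean for v in out]
    if with_std and std > 0:
        out = [v / std for v in out]
    return out


z = standardize([2.0, 4.0, 6.0])
unchanged = standardize([2.0, 4.0, 6.0], with_mean=False, with_std=False)
```

With both flags enabled the result has zero mean and unit variance; with both disabled the input is returned unchanged.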
LabelEncoder
- Introduction: Encode categorical feature values as integer indices.
- Methods:
  - fit(values): Fit the encoder
  - transform(values): Transform data
  - fit_transform(values): Fit and transform
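The relationship between the three methods can be sketched with a minimal encoder. `SimpleLabelEncoder` is a hypothetical class for illustration only; assigning indices in sorted order of the distinct values is an assumption, and the library may order categories differently.

```python
class SimpleLabelEncoder:
    """Map each distinct value to an integer index (illustrative sketch)."""

    def __init__(self):
        self.classes_ = []
        self._index = {}

    def fit(self, values):
        # Learn the vocabulary; sorted order is an assumption of this sketch.
        self.classes_ = sorted(set(values))
        self._index = {v: i for i, v in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # Raises KeyError for values not seen during fit.
        return [self._index[v] for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


encoder = SimpleLabelEncoder()
codes = encoder.fit_transform(["cat", "dog", "cat", "bird"])
```

The number of learned classes is what a sparse feature would need as its vocabulary size.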
Data Format Conversion
pandas_to_torch
- Introduction: Convert Pandas data to PyTorch tensors.
- Parameters:
  - df (pd.DataFrame): Input DataFrame
  - dense_cols (list): List of continuous feature column names
  - sparse_cols (list): List of discrete feature column names
  - device (str): Device type, 'cpu' or 'cuda'
numpy_to_torch
- Introduction: Convert NumPy arrays to PyTorch tensors.
- Parameters:
  - arrays (list): List of NumPy arrays
  - device (str): Device type, 'cpu' or 'cuda'
Model Components
Activation Functions
Dice
- Introduction: Dice activation function, proposed in Deep Interest Network (DIN).
- Parameters:
  - epsilon (float): Smoothing parameter, default 1e-3
  - device (str): Device type, default 'cpu'
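As a reminder of what Dice computes, the DIN paper defines it as f(s) = p(s) * s + (1 - p(s)) * alpha * s, where p(s) is a sigmoid of the batch-normalized input and epsilon stabilizes the variance term. The sketch below applies that formula to a batch of scalars in plain Python. The function name `dice` is hypothetical, and alpha is a fixed argument here, whereas it is a learned parameter in the paper.

```python
import math


def dice(batch, alpha=0.0, epsilon=1e-3):
    """Apply the Dice formula to a list of scalar activations."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((s - mean) ** 2 for s in batch) / n
    out = []
    for s in batch:
        # Gate: sigmoid of the input normalized by batch statistics.
        p = 1.0 / (1.0 + math.exp(-(s - mean) / math.sqrt(var + epsilon)))
        out.append(p * s + (1.0 - p) * alpha * s)
    return out


activated = dice([-1.0, 0.0, 1.0])
identity_like = dice([-1.0, 0.0, 1.0], alpha=1.0)
```

With alpha set to 1 the two branches sum back to the input, so the function reduces to the identity; smaller alpha values damp inputs that lie below the batch mean.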
Attention Mechanisms
ScaledDotProductAttention
- Introduction: Scaled dot-product attention mechanism.
- Parameters:
  - temperature (float): Temperature parameter for scaling the attention scores
  - attn_dropout (float): Attention dropout rate
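The computation can be sketched for a single query in plain Python: scores are dot products divided by the temperature, softmax turns them into weights, and the output is the weighted sum of the values. The function names are hypothetical, setting the temperature to sqrt(d_k) is the common convention rather than a documented default, and dropout is omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, keys, values, temperature):
    """Attend from one query vector over a list of key/value vectors."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / temperature for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


d_k = 2
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out, attn = scaled_dot_product_attention(q, keys, values, temperature=math.sqrt(d_k))
```

The query matches the first key, so the first weight is the larger one and the output leans toward the first value vector.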
MultiHeadAttention
- Introduction: Multi-head attention mechanism.
- Parameters:
  - d_model (int): Model dimension
  - n_heads (int): Number of attention heads
  - d_k (int): Key vector dimension
  - d_v (int): Value vector dimension
  - dropout (float): Dropout rate
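In a typical multi-head layer, the query and key projections have width n_heads * d_k and the value projection has width n_heads * d_v; each projected vector is then split into per-head slices that are attended over independently. The sketch below shows only that splitting step. `split_heads` is a hypothetical helper, and whether Torch-RecHub lays out its projections exactly this way is an assumption.

```python
def split_heads(projected, n_heads, d_head):
    """Split one projected vector of width n_heads * d_head into per-head slices."""
    if len(projected) != n_heads * d_head:
        raise ValueError("projected width must equal n_heads * d_head")
    return [projected[h * d_head:(h + 1) * d_head] for h in range(n_heads)]


n_heads, d_k = 2, 3
projected_query = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # width n_heads * d_k
heads = split_heads(projected_query, n_heads, d_k)
```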