Basic Components API Reference
This document describes the basic components of Torch-RecHub: feature processing, data transformation, and fundamental model components such as activation functions and attention mechanisms.
Feature Processing
Feature Columns
DenseFeature
- Introduction: Process continuous numerical features.
- Parameters:
  - name(str): Feature name
  - dimension(int): Feature dimension
  - dtype(str): Data type, default 'float32'
SparseFeature
- Introduction: Process discrete categorical features.
- Parameters:
  - name(str): Feature name
  - vocabulary_size(int): Size of category vocabulary
  - embedding_dim(int): Embedding vector dimension
  - dtype(str): Data type, default 'int32'
  - embedding_name(str): Embedding layer name, default None
VarLenSparseFeature
- Introduction: Process variable-length discrete features.
- Parameters:
  - name(str): Feature name
  - vocabulary_size(int): Size of category vocabulary
  - embedding_dim(int): Embedding vector dimension
  - maxlen(int): Maximum sequence length
  - dtype(str): Data type, default 'int32'
  - embedding_name(str): Embedding layer name, default None
  - combiner(str): Sequence pooling method, options: 'sum', 'mean', 'max', default 'mean'
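The three combiner options can be illustrated with a small dependency-free sketch. It assumes that sequences are padded to maxlen and that padding positions are excluded from pooling; both are assumptions about the behavior, and the helper name `pool_sequence` is hypothetical rather than part of Torch-RecHub.

```python
from typing import List


def pool_sequence(embeddings: List[List[float]], mask: List[bool],
                  combiner: str = "mean") -> List[float]:
    """Pool a padded sequence of embedding vectors into a single vector.

    embeddings: maxlen vectors of equal dimension.
    mask: True for real items, False for padding (assumed convention).
    """
    valid = [vec for vec, keep in zip(embeddings, mask) if keep]
    dim = len(embeddings[0])
    if not valid:
        return [0.0] * dim  # an all-padding sequence pools to zeros
    columns = list(zip(*valid))  # transpose: one tuple per embedding dimension
    if combiner == "sum":
        return [sum(col) for col in columns]
    if combiner == "mean":
        return [sum(col) / len(valid) for col in columns]
    if combiner == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown combiner: {combiner!r}")


# Two real items followed by one padding position (maxlen = 3).
seq = [[1.0, 2.0], [3.0, 6.0], [0.0, 0.0]]
mask = [True, True, False]
pooled = {name: pool_sequence(seq, mask, name) for name in ("sum", "mean", "max")}
```

Whatever the sequence length, each combiner yields one vector of size embedding_dim, so a variable-length feature can be concatenated with fixed-size features downstream.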
Data Transformation
Data Preprocessing
MinMaxScaler
- Introduction: Normalize numerical features.
- Parameters:
  - feature_range(tuple): Normalization range, default (0, 1)
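Min-max normalization with a configurable feature_range is conventionally computed as shown in the sketch below. This is the standard formula, not a copy of the Torch-RecHub implementation; the function name `min_max_scale` and the handling of constant columns are assumptions made for illustration.

```python
from typing import List, Tuple


def min_max_scale(values: List[float],
                  feature_range: Tuple[float, float] = (0.0, 1.0)) -> List[float]:
    """Linearly map values so that min -> range lower bound, max -> upper bound."""
    lo, hi = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant column: map every value to the lower bound (a choice made here).
        return [lo for _ in values]
    return [lo + (v - v_min) / span * (hi - lo) for v in values]


scaled = min_max_scale([10.0, 20.0, 30.0])
scaled_sym = min_max_scale([10.0, 20.0, 30.0], feature_range=(-1.0, 1.0))
```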
StandardScaler
- Introduction: Standardize numerical features.
- Parameters:
  - with_mean(bool): Whether to remove mean, default True
  - with_std(bool): Whether to scale by standard deviation, default True
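The effect of the two flags can be sketched as follows. The sketch assumes the population standard deviation (dividing by n) and leaves a zero-variance column unscaled; the library may differ on both points, and `standardize` is a hypothetical helper name.

```python
import math
from typing import List


def standardize(values: List[float], with_mean: bool = True,
                with_std: bool = True) -> List[float]:
    """Center and/or scale a column, depending on the two flags."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance (assumed)
    std = math.sqrt(var)
    shift = mean if with_mean else 0.0
    scale = std if (with_std and std > 0) else 1.0  # avoid dividing by zero
    return [(v - shift) / scale for v in values]


z = standardize([2.0, 4.0, 6.0])                         # centered and scaled
centered = standardize([2.0, 4.0, 6.0], with_std=False)  # centered only
```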
LabelEncoder
- Introduction: Encode categorical features.
- Methods:
  - fit(values): Fit the encoder
  - transform(values): Transform data
  - fit_transform(values): Fit and transform
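The relationship between the three methods can be sketched with a minimal stand-in class. `SimpleLabelEncoder` is hypothetical; assigning ids from 0 in sorted order and raising on unseen categories are assumptions, and some recommendation pipelines instead reserve 0 for padding.

```python
from typing import Dict, Hashable, Iterable, List


class SimpleLabelEncoder:
    """Map each distinct category to an integer id."""

    def __init__(self) -> None:
        self.mapping: Dict[Hashable, int] = {}

    def fit(self, values: Iterable[Hashable]) -> "SimpleLabelEncoder":
        # Sorted order makes the ids deterministic across runs.
        self.mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
        return self

    def transform(self, values: Iterable[Hashable]) -> List[int]:
        # Raises KeyError for categories that were not seen during fit.
        return [self.mapping[v] for v in values]

    def fit_transform(self, values: Iterable[Hashable]) -> List[int]:
        values = list(values)  # allow one-shot iterables
        return self.fit(values).transform(values)


encoder = SimpleLabelEncoder()
codes = encoder.fit_transform(["b", "a", "c", "a"])
```

The number of distinct ids produced this way is the value that would typically be passed as vocabulary_size to a sparse feature.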
Data Format Conversion
pandas_to_torch
- Introduction: Convert Pandas data to PyTorch tensors.
- Parameters:
  - df(pd.DataFrame): Input DataFrame
  - dense_cols(list): List of continuous feature column names
  - sparse_cols(list): List of discrete feature column names
  - device(str): Device type, 'cpu' or 'cuda'
numpy_to_torch
- Introduction: Convert NumPy arrays to PyTorch tensors.
- Parameters:
  - arrays(list): List of NumPy arrays
  - device(str): Device type, 'cpu' or 'cuda'
Model Components
Activation Functions
Dice
- Introduction: Dice activation function, proposed in Deep Interest Network (DIN).
- Parameters:
  - epsilon(float): Smoothing parameter, default 1e-3
  - device(str): Device type, default 'cpu'
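As described in the DIN paper, Dice gates each input with a sigmoid of its batch-standardized value: f(s) = p(s) * s + (1 - p(s)) * alpha * s, with p(s) = sigmoid((s - E[s]) / sqrt(Var[s] + epsilon)). The sketch below applies this to a single scalar feature over one batch. In the paper alpha is a learned parameter; fixing it at 0.25 here is an illustrative assumption, and `dice` is a hypothetical function rather than the library's module.

```python
import math
from typing import List


def dice(batch: List[float], alpha: float = 0.25,
         epsilon: float = 1e-3) -> List[float]:
    """Apply Dice to one scalar feature across a batch of examples."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((s - mean) ** 2 for s in batch) / n
    out = []
    for s in batch:
        # Gate in (0, 1): close to 1 for inputs well above the batch mean.
        p = 1.0 / (1.0 + math.exp(-(s - mean) / math.sqrt(var + epsilon)))
        out.append(p * s + (1.0 - p) * alpha * s)
    return out


activated = dice([-2.0, 0.0, 2.0])
```

epsilon only keeps the denominator away from zero when the batch variance is very small.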
Attention Mechanisms
ScaledDotProductAttention
- Introduction: Scaled dot-product attention mechanism.
- Parameters:
  - temperature(float): Temperature parameter for scaling
  - attn_dropout(float): Attention dropout rate
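Scaled dot-product attention computes softmax(Q K^T / temperature) V. The dependency-free sketch below shows the role of temperature; it omits dropout, uses nested lists instead of tensors, and sets temperature to sqrt(d_k), which is the common choice but an assumption about how this component is typically configured.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix,
                                 temperature: float) -> Tuple[Matrix, Matrix]:
    """Return (output, attention weights) for one unbatched attention call."""
    scores = [[sum(a * b for a, b in zip(q_row, k_row)) / temperature
               for k_row in k] for q_row in q]
    weights = [softmax(row) for row in scores]
    output = [[sum(w * v_row[j] for w, v_row in zip(w_row, v))
               for j in range(len(v[0]))] for w_row in weights]
    return output, weights


q = [[1.0, 0.0]]                   # one query, d_k = 2
k = [[1.0, 0.0], [0.0, 1.0]]       # two keys
v = [[10.0, 0.0], [0.0, 10.0]]     # two values
out, attn = scaled_dot_product_attention(q, k, v, temperature=math.sqrt(2))
```

A larger temperature flattens the attention weights toward uniform, while a smaller one concentrates them on the best-matching key.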
MultiHeadAttention
- Introduction: Multi-head attention mechanism.
- Parameters:
  - d_model(int): Model dimension
  - n_heads(int): Number of attention heads
  - d_k(int): Key vector dimension
  - d_v(int): Value vector dimension
  - dropout(float): Dropout rate
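How the four size parameters relate can be seen from the projection shapes in the standard Transformer layout. The sketch below only does shape bookkeeping; the weight names and the assumption that this component follows the standard layout are illustrative, and `projection_shapes` is a hypothetical helper.

```python
from typing import Dict, Tuple


def projection_shapes(d_model: int, n_heads: int,
                      d_k: int, d_v: int) -> Dict[str, Tuple[int, int]]:
    """Weight shapes (in_features, out_features) for standard multi-head attention."""
    return {
        "W_q": (d_model, n_heads * d_k),  # queries: n_heads blocks of size d_k
        "W_k": (d_model, n_heads * d_k),  # keys share d_k with queries
        "W_v": (d_model, n_heads * d_v),  # values may use a different size d_v
        "W_o": (n_heads * d_v, d_model),  # concatenated heads back to d_model
    }


shapes = projection_shapes(d_model=64, n_heads=4, d_k=16, d_v=16)
```

A common configuration sets d_k = d_v = d_model / n_heads, as in the call above, so that the concatenated heads have the same width as the model dimension.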