特征定义
Torch-RecHub提供了三种核心特征类,用于处理不同类型的特征:
DenseFeature
处理数值型特征,如年龄、薪资、日点击量等。
python
from torch_rechub.basic.features import DenseFeature
# 创建数值型特征
dense_feature = DenseFeature(name="age", embed_dim=1)参数说明:
name:特征名称embed_dim:嵌入向量长度,固定为1
SparseFeature
处理类别型特征,如城市、学历、性别等。
python
from torch_rechub.basic.features import SparseFeature
# 创建类别型特征
sparse_feature = SparseFeature(
name="city",
vocab_size=100, # 词汇表大小
embed_dim=16, # 嵌入向量长度
shared_with=None # 与其他特征共享嵌入表
)参数说明:
name:特征名称vocab_size:词汇表大小embed_dim:嵌入向量长度,若为None则自动计算shared_with:共享嵌入表的其他特征名称padding_idx:填充索引,在InputMask层中会被掩码为0initializer:嵌入层权重初始化器
SequenceFeature
处理序列特征或多热特征,如用户行为序列、物品标签等。
python
from torch_rechub.basic.features import SequenceFeature
# 创建序列特征
sequence_feature = SequenceFeature(
name="user_history",
vocab_size=10000, # 词汇表大小
embed_dim=32, # 嵌入向量长度
pooling="mean" # 池化方式:mean, sum, concat
)参数说明:
name:特征名称vocab_size:词汇表大小embed_dim:嵌入向量长度,若为None则自动计算pooling:池化方式,支持mean、sum、concatshared_with:共享嵌入表的其他特征名称padding_idx:填充索引,在InputMask层中会被掩码为0initializer:嵌入层权重初始化器
特征使用示例
python
from torch_rechub.basic.features import DenseFeature, SparseFeature, SequenceFeature
# 定义特征
dense_features = [
DenseFeature(name="age", embed_dim=1),
DenseFeature(name="income", embed_dim=1)
]
sparse_features = [
SparseFeature(name="city", vocab_size=100, embed_dim=16),
SparseFeature(name="gender", vocab_size=3, embed_dim=8)
]
sequence_features = [
SequenceFeature(name="user_history", vocab_size=10000, embed_dim=32, pooling="mean")
]
# 合并所有特征
all_features = dense_features + sparse_features + sequence_features