Atomic files are introduced to format the input of mainstream recommendation tasks in a flexible way.
So far, our library defines six atomic file types, which are identified by their suffixes.
| Suffix | Content | Example Format |
|---|---|---|
| .inter | User-item interaction | user_id, item_id, rating, timestamp, review |
| .user | User feature | user_id, age, gender |
| .item | Item feature | item_id, category |
| .kg | Triplets in a knowledge graph | head_entity, tail_entity, relation |
| .link | Item-entity linkage data | entity, item_id |
| .net | Social graph data | source, target |
Atomic files are combined to support the input of different recommendation tasks. To load an atomic
file, write its suffix into the `load_col` configuration argument.
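
As a rough illustration of the idea, `load_col` can be thought of as a mapping from suffix to the columns to load from that file. The dictionary layout below (suffix keys without the leading dot, lists of column names as values) is an assumption made for this sketch, and the column names are simply taken from the example formats in the table above; check the exact form against your version of the library.

```python
# Hypothetical config fragment: keys are atomic-file suffixes (assumed to be
# written without the leading dot), values are the columns to load.
config_dict = {
    "load_col": {
        "inter": ["user_id", "item_id", "rating", "timestamp"],
        "user": ["user_id", "age", "gender"],
        "item": ["item_id", "category"],
    }
}

# The suffixes listed in load_col determine which atomic files are read.
suffixes_to_load = sorted("." + suffix for suffix in config_dict["load_col"])
print(suffixes_to_load)  # ['.inter', '.item', '.user']
```
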
Each recommendation task requires certain mandatory files:
| Tasks | Mandatory atomic files |
|---|---|
| General | .inter |
| Context-aware | .inter, .user, .item |
| Knowledge-aware | .inter, .kg, .link |
| Sequential | .inter |
| Social | .inter, .net |
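
The table above can be turned into a small sanity check that runs before training. The sketch below is not part of the library; `MANDATORY_FILES` and `missing_files` are hypothetical names introduced only for illustration, and the mapping is copied directly from the table.

```python
# Mandatory atomic files per task, copied from the table above.
MANDATORY_FILES = {
    "General": {".inter"},
    "Context-aware": {".inter", ".user", ".item"},
    "Knowledge-aware": {".inter", ".kg", ".link"},
    "Sequential": {".inter"},
    "Social": {".inter", ".net"},
}


def missing_files(task, available_suffixes):
    """Return the mandatory suffixes for `task` that are not available (sorted)."""
    return sorted(MANDATORY_FILES[task] - set(available_suffixes))


# A knowledge-aware task with no linkage file is missing '.link'.
print(missing_files("Knowledge-aware", [".inter", ".item", ".kg"]))  # ['.link']
```

Extra files beyond the mandatory ones (such as `.item` in the call above) are ignored by this check, since the table only states what is required.
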
Each atomic file can be viewed as an `m × n` table (excluding the header), where `n` is the number of
features and `m` is the number of data records.
The first row contains the feature names, in which each entry has the form
`feat_name:feat_type`, indicating the feature name and feature type.
We support four feature types, each of which can be processed in batches as tensors.
| feat_type | Explanations | Examples |
|---|---|---|
| token | single discrete feature | user_id, age |
| token_seq | discrete feature sequence | review |
| float | single continuous feature | rating, timestamp |
| float_seq | continuous feature sequence | vector |
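
To make the header convention concrete, here is a minimal sketch that splits a `feat_name:feat_type` entry and checks the type against the four supported feature types. The helper `parse_header_entry` is a hypothetical name used for illustration, not a function of the library.

```python
# The four feature types listed in the table above.
FEATURE_TYPES = {"token", "token_seq", "float", "float_seq"}


def parse_header_entry(entry):
    """Split 'feat_name:feat_type' into (name, type) and validate the type."""
    name, sep, feat_type = entry.strip().rpartition(":")
    if not sep or not name:
        raise ValueError(f"expected 'feat_name:feat_type', got {entry!r}")
    if feat_type not in FEATURE_TYPES:
        raise ValueError(f"unknown feature type {feat_type!r} in {entry!r}")
    return name, feat_type


# Example header row for an interaction file.
header = ["user_id:token", "item_id:token", "rating:float", "timestamp:float"]
schema = dict(parse_header_entry(entry) for entry in header)
print(schema)
```
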
As an example, we present the formatted ML-1M dataset below.
ml-1m.inter
| user_id:token | item_id:token | rating:float | timestamp:float |
|---|---|---|---|
| 1 | 1193 | 5 | 978300760 |
| 1 | 661 | 3 | 978302109 |
ml-1m.user
| user_id:token | age:token | gender:token | occupation:token | zip_code:token |
|---|---|---|---|---|
| 1 | 1 | F | 10 | 48067 |
| 2 | 56 | M | 16 | 70072 |
ml-1m.item
| item_id:token | movie_title:token_seq | release_year:token | genre:token_seq |
|---|---|---|---|
| 1 | Toy Story | 1995 | Animation Children's Comedy |
| 2 | Jumanji | 1995 | Adventure Children's Fantasy |
ml-1m.kg
| head_id:token | relation_id:token | tail_id:token |
|---|---|---|
| m.0gs6m | film.film_genre.films_in_this_genre | m.01b195 |
| m.052_dz | film.film.actor | m.02nrdp |
ml-1m.link
| item_id:token | entity_id:token |
|---|---|
| 2694 | m.02hxhz |
| 2079 | m.0kvcr9 |
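
The examples above can be read into typed records with a few lines of standard-library code. This is only a sketch under two assumptions that the text above does not state: that fields are separated by tabs and that sequence elements are separated by single spaces. `convert` and `read_atomic` are hypothetical helpers, not library functions.

```python
import csv
import io


def convert(value, feat_type, seq_sep=" "):
    """Convert a raw string according to its declared feature type."""
    if feat_type == "token":
        return value
    if feat_type == "float":
        return float(value)
    if feat_type == "token_seq":
        return value.split(seq_sep)
    if feat_type == "float_seq":
        return [float(v) for v in value.split(seq_sep)]
    raise ValueError(f"unknown feature type {feat_type!r}")


def read_atomic(text, field_sep="\t"):
    """Parse atomic-file text into a list of {feature name: typed value} dicts."""
    rows = csv.reader(io.StringIO(text), delimiter=field_sep, quoting=csv.QUOTE_NONE)
    # Header entries have the form feat_name:feat_type.
    header = [entry.rsplit(":", 1) for entry in next(rows)]
    return [
        {name: convert(value, ftype) for (name, ftype), value in zip(header, row)}
        for row in rows
    ]


# First record of the ml-1m.item example, assuming tab-separated fields.
sample_item = (
    "item_id:token\tmovie_title:token_seq\trelease_year:token\tgenre:token_seq\n"
    "1\tToy Story\t1995\tAnimation Children's Comedy\n"
)
records = read_atomic(sample_item)
print(records[0]["genre"])  # ['Animation', "Children's", 'Comedy']
```

Note that `token` values such as `item_id` stay as strings here; mapping them to integer indices is a separate step that this sketch does not attempt.
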