Skip to content

Data Specification

Sift Lab - Data Specification

A datamodel in Sift Lab is made up of three data tables. Each line in the interaction table is linked to a specific user and item in the item and user tables.

classDiagram
`Transaction data` --> Datamodel
`Product data` --> Datamodel
`User data` --> Datamodel

Data encoding

Data must be encoded as UTF-8 + RFC 4180 (see more info at https://tools.ietf.org/html/rfc4180). Below are some specifics regarding the cell and row formatting.

Case Example
Line break \n
Quotes must be escaped “14”” TV”
Cell wrap with double if the value contains , or \n “Hello, world”
Number, avoid space and comma. 12345678.9
Timestamp/date, use unix timestamp or formatted date string. 1591718519 or “2020-03-17 15:01:02” or “2006-01-02T15:04:05-0700”

A valid example

id,value,number
1,"Hello, world!",123.45
2,"14"" TV",123.45

Transaction data

Provide a file containing transactional data between your products and users. Usually represents order rows.

If necessary, you may provide an additional file with data that is associated with an order number. An example would be an order header containing shipping cost, payment fee etc. In this case, provide an order number in the transactional data, and use the order number as the unique identifier in the orders file.

Field Role Description
user User (required) Must uniquely identify a user, same as in User table.
id Category (required) Must uniquely identify a transaction/order row. Can be created from combining other fields.
item Item (required) Must uniquely identify an item, same as in Item table
date_time Timestamp (required) Timestamp at which the interaction occurred.
timezone Timezone Timezone at which the interaction occurred. format (+6)
revenue Revenue Revenue of transaction/order row (This is ex VAT)
quantity Quantity Number columns must be adjusted for quantity, ex the Revenue/Margin column must be equal to the Total Revenue/Margin for that interaction.
order_id Category Must uniquely identify an order
row_type Category specify what type of interaction it is Purchase, Return, etc.
returned_quantity Number negative in the case of returns (placed as a new order with negative quantity)
discount_rate Number Discount for all items in the row. Also called voucher/promo_code (25% discount = 0.25 NOT 25)
tax_rate Number 25% discount = 0.25 NOT 25
payment_method Category
country Category
market Category
store Category
sales_channel Category ex. retail, ecomm, etc.
currency Category SEK, USD, EUR
exchange_rate Number Exchange rate for the currence at the time the purchase was made
campaign Category
utm_tag Category
days_to_delivery Number Days from placed order to received shipment
coupons Category
order_status Category
is_return Category Yes/No/Partially
cancelled_quantity Number
ship_to_country Category
ship_to_city Category
handling_fee Number cost of handling the order
shipping_fee Number cost of shipping
shipping_method Category Method used, such as: Home delivery, Click & Collect
shipping_partner Category Transportation service provider
full_price Number
revenue_after_returns Number
freight_cost Number
margin_gm1 Number GM1 = Difference between frt rcvd and paid
margin_gm2 Number GM2 = Gross margin after labour
margin_gm3 Number GM3 = Gross margin after deductions
margin_gm4 Number GM4 = Gross margin after commission
margin_gm5 Number GM5 = Gross margin after interest
margin_gm6 Number GM6 = Gross margin after insurance
...< ANY > Number, Category Additional metadata may be added that you find interesting to examine in our analytics.

Product data

Provide a file containing product (item) meta data. Make sure all product ids that are present in the transactions are also present in this file.

Categorical data in hierarchical structure should be added as separate columns: category_1, category_2, etc. If a product can have multiple categories but without hierarchy, provide them as a semi-colon separated string: “categoryA;categoryC:categoryE”. An example would be genres for a movie, as a movie can be associated with multiple genres in no particular order.

Field Role Description
item Item (required) Must uniquely identify a product, same as in transactions. This is what the machine learning trains on and is what will be used in recommendations
title Format The displayed name that will be associated with the product.
image Image URL to product image.
link Category URL to product page
sales_price Number Current selling price including discounts
full_price Number Price without any discounts
availability Category Yes / No
in_stock Number How many units are sellable in stock
sku Category Stock keeping unit (e.g. specific shirt in a certain color and size)
product_variant Category Variation specified product, usually product page (e.g. specific shirt in a certain color, any size)
parent_product Category The named product (e.g. specific shirt in any color or size)
category_level_1 Category The highest category level
category_level_2 Category 2:nd highest category level
category_level_3 Category 3:rd highest category level
category_level_4 Category 4:th highest category level
category_level_5,6,7... Category etc.
size Category Standard size format (S/M/L, M6/M8/M10, 90/120/140)
cogs Number Cost of goods
description Category Description of item. Limited to 255 characters.
color Category
brand Category
county_of_origin Category Where the product is produced
condition Category Such as New or Used
collection Category such as SS19, SS20, SS21 etc.
supplier Category
shipping_weight Number
product_launch Timestamp Date for product release
data_source Category In the case of multiple datasources, this identifies which source the data is coming from.
ml_blocked - Blocks specific items from the machine learning. Not the same as excluding them by filtration.
...< ANY > Number, Category Additional metadata may be added that you find interesting to examine in our analytics.

User data

Provide a file containing user meta data. Make sure all user ids that are present in the transactions are also present in this file.

In order to have deletes propagate to our side, provide users with empty data on all fields except id or a “is_deleted” field that we will use to clear all metadata.

Field Data type Note
user User (required) Must uniquely identify a user, same as in transactions.
market Category
country Category
country_code Category
region Category
area Category
city Category
disctrict Category
zip_code Category
customer_store Category
first_name Category
last_name Category
email Email
age Number
birthdate Timestamp
gender Category
phone_number Category
control_group Number Used during tests, can be imported or generated.
mosaic_group Category
nps Category
accepts_communication Category Can be several fields depending on setup. Used to identify users that can be communicated with by Yes/No.
subscribe_date Timestamp
unsubscribe_date Timestamp
discount_cohort Category
returned_revenue_percent_cohort Category
returned_revenue_cohort Category
revenue_cohort Category
returned_quantity_percentage_cohort Category
quantity_sum Number
quantity_sum_returned Number
quantity_return_percentage Number
total_value_paid Number
total_value_returned Number
total_value_return_percentage Number
average_discount Number
customer_net_value Number
registration_date Timestamp
...< ANY > Number, Category Additional metadata may be added that you find interesting to examine in our analytics.