TSV format of Phantomscan


Intended audience: data administrators, developers

Phantomscan is an in-store productivity webapp that helps store employees to get rid of phantom inventory. In order to do that, Phantomscan can import inventory data from plain text files published on an FTP server in TSV format, as long as the files are provided in a predefined format.

See also our BigFiles, our file hosting service that supports FTP. Push your files directly to Lokad.
Phantomscan relies on two types of files:

  • regular snapshots of inventory state at points in time, where each file is separate snapshot
  • continuous history of recorded inventory movements over periods of time, where each file represents a portion of ongoing history.

This document details how to setup these TSV files in a way that will enable Phantomscan to auto-import your inventory data without any further integration needed from Lokad's side.

Download sample TSV files


Tab-separated Values (TSV)

TSV stands for Tab-separated Values. It is a format of storing datasets in a specially formatted text files, which can be used by software. Although the format is extremely old, it works surprisingly well for data exchange with large datasets.

TSV format is defined by:
  • A file in TSV format consists of lines.
  • Each line contains fields separated from each other by TAB characters (horizontal tab, HT, ASCII control code 9).
  • "Field" means here just any string of characters, excluding TABs. The point is simply that TABs divide a line into pieces, components.
  • Each line must contain the same number of fields.
  • The first line contains the names for the fields (on all lines), i.e. column headers.

Since TAB is used as a separator between fields, a field cannot contain a TAB. However, TABs usually don't appear in data items that you wish to tabulate, so this is seldom a restriction. Otherwise, we suggest to strip them upon export, replacing with 4 space characters.

More detailed description (along with instructions to Export from Excel) is available: Tab Separated Values (TSV): a format for tabular data exchange

Numbers and dates

In all TSV files, floating numbers must be formatted using . (dot) as decimal separator. No other digit separators are allowed.

  • Good: 1234.00
  • Good: -12
  • Bad: 1234,12
  • Bad: 1.123,12
  • Bad: 1,123.12

All dates should be formatted using yyyy-MM-dd pattern or yyyy-MM-dd HH:mm:ss pattern, where:

  • yyyy - 4 year digits
  • MM - 2 month digits
  • dd - 2 day digits
  • HH - 2 digits of time in 24 hour format
  • mm - 2 minute digits
  • ss - 2 second digits

For example:

  • Good: 2012-09-18
  • Good: 2012-09-18 19:23:00
  • Bad: 2012-9-18
  • Bad: 9-18-2012
  • Bad: 18-9-2012
  • Bad: 18-9-2012 5:34 PM

Prices and currencies

All prices are expected to be expressed in the canonical currency unit, following the guidelines here above for floating numbers. A given Phantomscan project supports only a single currency. The currency is defined from the Phantomscan settings.

For example, in order to represent $199.50:

  • Good: 199.50
  • Bad: $199.50
  • Bad: 199.50 USD
  • Bad: 19950 (cents not USD)

Overall file hierarchy

Phantomscan imports two types of files:

  • SKU files - include static information about each SKU, generally representing a snapshot of inventory at certain point in time
  • inventory events files - include the history of inventory movements during a certain period of time.

These files are to be uploaded to a FTP directory exposed to Phantomscan. This directory can be kept up-to-date by uploading new files, while preserving existing files. Here's how such a directory might look like:

    /Phantomscan_InventoryEvents_2009-01-01.tsv
    /...
    /Phantomscan_InventoryEvents_2012-12-14.tsv
    /Phantomscan_InventoryEvents_2012-12-15.tsv
    /Phantomscan_InventoryEvents_2012-12-16.tsv
    /Phantomscan_SKUs_2009-01-01.tsv
    /...
    /Phantomscan_SKUs_2012-12-14.tsv
    /Phantomscan_SKUs_2012-12-15.tsv
    /Phantomscan_SKUs_2012-12-16.tsv

The lines with ... represent omitted set of files.

Usually, it's desirable to update data daily, uploading new SKU snapshots and inventory events, which happened since the previous upload. However different intervals (even irregular) are also possible.

If files are hosted on the Lokad FTP Service, Phantomscan might compress them with GZIP, appending .gz extension. This enables you to use space more efficiently.

An essential property of this file naming convention is that files are never modified. Once a file has been uploaded, it is considered to be part of history (for the date that appears in its name) and will never be changed again.

Format of Phantomscan_SKUs_YYYY-MM-DD.tsv

The SKU file is intended as a snapshot of all the SKUs being part of the inventory at the date embedded in the file name. Whenever the ambiguity exists, the content is assumed to be associated with the state of the inventory at the end of the day, that is, after the daily sales or the daily shipment.

Please, keep in mind precise naming pattern of the file. This is needed by Phantomscan in order to properly associate inventory snapshots with dates.

Capturing daily snapshots of the SKU information can seem very redundant, because SKUs change very little from one day to the next. In practice, however, we observe that most ERP or inventory management software do not support fine-grained tracking of all changes that impact inventory, such as tracking that an SKU has been renamed for example. Hence, Phantomscan relies on snapshots, and on snapshots comparison to figure out changes over time.

If your software allows retrieval of fine-grained events about inventory change events, please get in touch with Lokad support.

Phantomscan applies the following interpretations to the SKU files: as long no newer SKU file is provided, the most recent file is considered as being in effect. Let's illustrate the situation with an example. Let's assume that the FTP repository contains 4 files:

    /Phantomscan_SKUs_2012-12-01.tsv
    /Phantomscan_SKUs_2012-12-03.tsv
    /Phantomscan_SKUs_2012-12-07.tsv
    /Phantomscan_SKUs_2012-12-14.tsv

Then, we have the following observations:
  • The set of files is valid. There are some days missing but Phantomscan does not require SKU snapshots to be taken every day (although doing that can improve accuracy).
  • On December 1st and 2nd, the file Phantomscan_SKUs_2012-12-01.tsv applies.
  • From December 3rd to December 6th, the file Phantomscan_SKUs_2012-12-03.tsv applies.
  • If a given SKU appears in Phantomscan_SKUs_2012-12-01.tsv but not in Phantomscan_SKUs_2012-12-03.tsv, then it is assumed to have disappeared on December 3rd.

TSV Fields for SKUs

Core fields, must always be present in the extract.
SkuId string Required. The identifier associated to the SKU itself. A given file is not supposed to contain two lines with the same ID value. This identifier is used to match SKUs from one snapshot file to another.
Name string Required. The primary human-readable entry associated to the SKU. Within Phantomscan, this entry is used to refer to the SKU.
StockOnHand float Required. The expected inventory level for the SKU. This quantity should be computed at the end of the day (hence all stock movements for the day are supposed to be already applied).
ItemIds string Required. A comma (,) separated list of unique item identifiers (usually, the GTIN barcode) that are part of the SKU.
Optional fields
UnitSalesPricefloat The per-unit selling price of the item, no tax applied. May be used by Phantomscan to compute the expected ROI for counting specific SKUs, which in turn may lead to improved counting suggestions.
LocationIdstring An unique identifier for the location where a given SKU is stored. If the column is provided, then blank values are forbidden. If the column is not provided, then a single "default" location is assumed. Used if you need to support multiple locations, such as multiple stores
LabelFoostring All the fields that start with the prefix Label are treated properties to be visually associated to the SKU within Phantomscan. For example, LabelSupplierName could be used to list the name of the supplier. Multiple labels can be used, just replace Foo with a relevant suffix. This mechanism is intended to offer flexibility in the information to be exposed by Phantomscan.
H1Foostring Identifier of the first level of hierarchy associated to the item. The name of the hierarchy itself is hold by the suffix. For example, H1SuperFamily could be introduced to a top level hierarchy named SuperFamily. Most companies have their own hierarchical terminologies, Phantomscan adapts to your conventions.
H2Foostring Same as H1Foo for nested levels (sub-categories).
H3Foostring Same as H1Foo for nested levels (sub-sub-categories).


Format of Phantomscan_InventoryEvents_YYYY-MM-DD.tsv

The file represents inventory movement events (e.g.: item sold in store, item shipped, item lost), within a certain date range. Unlike the SKU files (which are mostly redundant snapshots), each event file represents a portion of history which will be combined by Phantomscan with the other event files.

Unlike details about SKUs, inventory movements are nearly always properly historically captured within the software. Hence, instead of dealing with snapshots, Phantomscan imports inventory events as recorded over defined periods of time.

Phantomscan applies the following interpretations to the event files: an event file is supposed to contain all events since the end of the previous file. Let's illustrate the situation with an example. Let's assume that the repository contains 4 files:

    /Phantomscan_InventoryEvents_2012-12-01.tsv
    /Phantomscan_InventoryEvents_2012-12-03.tsv
    /Phantomscan_InventoryEvents_2012-12-07.tsv
    /Phantomscan_InventoryEvents_2012-12-14.tsv

Then, we have the following observations:
  • As being the early file present in the repository, the file of December 1st might contain all the event history going back several years in the past.
  • The file of December 3rd contains all events from December 2nd and December 3rd.
  • The file of December 7th contains all events from December 4th until December 7th.

The date ranges covered by the event files should not overlap.

TSV fields for Inventory Events

Inventory events are represented as rows in TSV file. Phantomscan supports several distinct event types. Move types (in and out) represent individual stock changes, usually when a sale, shipment or replenishment occurs, where the quantity that leaves or enters the inventory is precisely known. Count events represent manual inventory counts when the exact inventory becomes precisely known. Two fields are mandatory and should be present for every event:

EventDatedatetime Required. The date and time of the event entry.
EventTypeenum Required. The type of the event. Allowed values: in, out, count.

The specific details for each event type are described below. Each event type has its own fields with specific meanings, some of which are mandatory. TSV columns which are not used by a specific event type should be left blank on the corresponding lines.

Movements (in/out)

The in (resp. out) inventory event represents cases where a specific quantity of a certain item enters (resp. leaves) the inventory.

Quantityfloat Required. The quantity of items involved. For example, if 2 units are sold, then an out event with a quantity of 2 should be issued.
ItemIdstring Required. The item involved in the inventory movement. The inventory event will be associated with the SKU that contains that item at the specified location.
LocationIdstring The location where the event took place. If this column is missing or blank, a "default" location is assumed.

Counts

The count event represents situations when a manual inventory count was performed for a given SKU. This establishes the current stock level for the SKU with certainty.

Quantity float Required. The exact current quantity for the SKU, as determined by the manual count. For example, if a manual count determines that there are 11 units in stock, a count event with a quantity of 11 should be issued.
SkuIdstring Required. The SKU that underwent manual counting.

FAQ (Frequently Asked Questions)

Difference between ItemId and SkuId? In a store, sales are fundamentally recorded leveraging the barcode information. However, over time, the barcode of a product might change. Then, since the two products (old and new) can be extremely similar, a single SKU typically aggregates the two products; that is, the store manager manages the two products as if they were a single product. Phantomscan operates its aggregation at the SKU level, but by supporting the notion of ItemId, one does not need to rewrite the history (ake inventory events) with the new barcode identifiers in case of barcode evolution; only the SKU file needs to be refreshed with the latest mappings.