> For the complete documentation index, see [llms.txt](https://docs.toucanai.cloud/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.toucanai.cloud/references/apis/data-steps.md).

# Data steps

Steps are the building blocks of data transformation. They are used in **chart queries**: each step is a single operation in the query (filtering, aggregating, renaming columns, etc.). Steps run in sequence; the output of one step becomes the input of the next.

The JSON examples in this reference can be copied and adapted for your chart queries. Use the `name` field to identify the step type and fill in the other properties as described for each step below.

### `absolutevalue` step

This step is meant to compute the absolute value of a given input column.

```json
{
    "name": "absolutevalue",
    "column": "my-column",
    "newColumn": "my-new-column"
}

```

#### Example

**Input dataset:**

| Company   | Value |
| --------- | ----- |
| Company 1 | -33   |
| Company 2 | 0     |
| Company 3 | 10    |

**Step configuration:**

```json
{
  "name": "absolutevalue",
  "column": "Value",
  "newColumn": "My-absolute-value"
}

```

**Output dataset:**

| Company   | Value | My-absolute-value |
| --------- | ----- | ----------------- |
| Company 1 | -33   | 33                |
| Company 2 | 0     | 0                 |
| Company 3 | 10    | 10                |

### `addmissingdates` step

Add missing dates as new rows in a dates column. Exhaustive dates will range between the minimum and maximum date found in the dataset (or in each group if a group by logic is applied - see thereafter).

Added rows will be set to null in columns not referenced in the step configuration.

You should make sure to use a group by logic if you want to add missing dates in independent groups of rows (e.g. you may need to add missing rows for every country found in a "COUNTRY" column). And you should ensure that every date is unique in every group of rows at the specified granularity, else you may get inconsistent results. You can specify "group by" columns in the `groups` parameter.

An addmissingdates step has the following structure:

```json
{
  "name": "addmissingdates",
  "datesColumn": "DATE",
  "datesGranularity": "day",
  "groups": [ "COUNTRY"]
}

```

#### Example 1: day granularity without groups

**Input dataset:**

| DATE                       | VALUE |
| -------------------------- | ----- |
| "2018-01-01T00:00:00.000Z" | 75    |
| "2018-01-02T00:00:00.000Z" | 80    |
| "2018-01-03T00:00:00.000Z" | 82    |
| "2018-01-04T00:00:00.000Z" | 83    |
| "2018-01-05T00:00:00.000Z" | 80    |
| "2018-01-07T00:00:00.000Z" | 86    |
| "2018-01-08T00:00:00.000Z" | 79    |
| "2018-01-09T00:00:00.000Z" | 76    |
| "2018-01-10T00:00:00.000Z" | 79    |
| "2018-01-11T00:00:00.000Z" | 75    |

Here the day "2018-01-06" is missing.

**Step configuration:**

```json
{
  "name": "addmissingdates",
  "datesColumn": "DATE",
  "datesGranularity": "day"
}

```

**Output dataset:**

| DATE                       | VALUE |
| -------------------------- | ----- |
| "2018-01-01T00:00:00.000Z" | 75    |
| "2018-01-02T00:00:00.000Z" | 80    |
| "2018-01-03T00:00:00.000Z" | 82    |
| "2018-01-04T00:00:00.000Z" | 83    |
| "2018-01-05T00:00:00.000Z" | 80    |
| "2018-01-06T00:00:00.000Z" |       |
| "2018-01-07T00:00:00.000Z" | 86    |
| "2018-01-08T00:00:00.000Z" | 79    |
| "2018-01-09T00:00:00.000Z" | 76    |
| "2018-01-10T00:00:00.000Z" | 79    |
| "2018-01-11T00:00:00.000Z" | 75    |

#### Example 2: day granularity with groups

**Input dataset:**

| COUNTRY | DATE                       | VALUE |
| ------- | -------------------------- | ----- |
| France  | "2018-01-01T00:00:00.000Z" | 75    |
| France  | "2018-01-02T00:00:00.000Z" | 80    |
| France  | "2018-01-03T00:00:00.000Z" | 82    |
| France  | "2018-01-04T00:00:00.000Z" | 83    |
| France  | "2018-01-05T00:00:00.000Z" | 80    |
| France  | "2018-01-07T00:00:00.000Z" | 86    |
| France  | "2018-01-08T00:00:00.000Z" | 79    |
| France  | "2018-01-09T00:00:00.000Z" | 76    |
| France  | "2018-01-10T00:00:00.000Z" | 79    |
| France  | "2018-01-11T00:00:00.000Z" | 85    |
| USA     | "2018-01-01T00:00:00.000Z" | 69    |
| USA     | "2018-01-02T00:00:00.000Z" | 73    |
| USA     | "2018-01-03T00:00:00.000Z" | 73    |
| USA     | "2018-01-05T00:00:00.000Z" | 75    |
| USA     | "2018-01-06T00:00:00.000Z" | 70    |
| USA     | "2018-01-07T00:00:00.000Z" | 76    |
| USA     | "2018-01-08T00:00:00.000Z" | 73    |
| USA     | "2018-01-09T00:00:00.000Z" | 70    |
| USA     | "2018-01-10T00:00:00.000Z" | 72    |
| USA     | "2018-01-12T00:00:00.000Z" | 78    |

Here the day "2018-01-06" is missing for "France" rows, and "2018-01-11" and "2018-01-11" are missing for "USA" rows.

Note that "2018-01-12" will not be considered as a missing row for "France" rows, because the latest date found for this group of rows is "2018-01-11" (even though "2018-01-12" is the latest date found for "USA" rows).

**Step configuration:**

```json
{
  "name": "addmissingdates",
  "datesColumn": "DATE",
  "datesGranularity": "day",
  "groups": "COUNTRY"
}

```

**Output dataset:**

| COUNTRY | DATE                       | VALUE |
| ------- | -------------------------- | ----- |
| France  | "2018-01-01T00:00:00.000Z" | 75    |
| France  | "2018-01-02T00:00:00.000Z" | 80    |
| France  | "2018-01-03T00:00:00.000Z" | 82    |
| France  | "2018-01-04T00:00:00.000Z" | 83    |
| France  | "2018-01-05T00:00:00.000Z" | 80    |
| France  | "2018-01-06T00:00:00.000Z" |       |
| France  | "2018-01-07T00:00:00.000Z" | 86    |
| France  | "2018-01-08T00:00:00.000Z" | 79    |
| France  | "2018-01-09T00:00:00.000Z" | 76    |
| France  | "2018-01-10T00:00:00.000Z" | 79    |
| France  | "2018-01-11T00:00:00.000Z" | 85    |
| USA     | "2018-01-01T00:00:00.000Z" | 69    |
| USA     | "2018-01-02T00:00:00.000Z" | 73    |
| USA     | "2018-01-03T00:00:00.000Z" | 73    |
| USA     | "2018-01-04T00:00:00.000Z" |       |
| USA     | "2018-01-05T00:00:00.000Z" | 75    |
| USA     | "2018-01-06T00:00:00.000Z" | 70    |
| USA     | "2018-01-07T00:00:00.000Z" | 76    |
| USA     | "2018-01-08T00:00:00.000Z" | 73    |
| USA     | "2018-01-09T00:00:00.000Z" | 70    |
| USA     | "2018-01-10T00:00:00.000Z" | 72    |
| USA     | "2018-01-11T00:00:00.000Z" |       |
| USA     | "2018-01-12T00:00:00.000Z" | 78    |

#### Example 3: month granularity

**Input dataset:**

| DATE                       | VALUE |
| -------------------------- | ----- |
| "2019-01-01T00:00:00.000Z" | 74    |
| "2019-02-01T00:00:00.000Z" | 73    |
| "2019-03-01T00:00:00.000Z" | 68    |
| "2019-04-01T00:00:00.000Z" | 71    |
| "2019-06-01T00:00:00.000Z" | 74    |
| "2019-07-01T00:00:00.000Z" | 74    |
| "2019-08-01T00:00:00.000Z" | 73    |
| "2019-09-01T00:00:00.000Z" | 72    |
| "2019-10-01T00:00:00.000Z" | 75    |
| "2019-12-01T00:00:00.000Z" | 76    |

Here "2019-05" and "2019-11" are missing.

**Step configuration:**

```json
{
  "name": "addmissingdates",
  "datesColumn": "DATE",
  "datesGranularity": "month"
}

```

**Output dataset:**

| DATE                       | VALUE |
| -------------------------- | ----- |
| "2019-01-01T00:00:00.000Z" | 74    |
| "2019-02-01T00:00:00.000Z" | 73    |
| "2019-03-01T00:00:00.000Z" | 68    |
| "2019-04-01T00:00:00.000Z" | 71    |
| "2019-05-01T00:00:00.000Z" |       |
| "2019-06-01T00:00:00.000Z" | 74    |
| "2019-07-01T00:00:00.000Z" | 74    |
| "2019-08-01T00:00:00.000Z" | 73    |
| "2019-09-01T00:00:00.000Z" | 72    |
| "2019-10-01T00:00:00.000Z" | 75    |
| "2019-11-01T00:00:00.000Z" |       |
| "2019-12-01T00:00:00.000Z" | 76    |

### `aggregate` step

Perform aggregations on one or several columns. Available aggregation functions are sum, average, count, count distinct, min, max, first, last.

An aggregation step has the following structure:

```json
{
   "name": "aggregate",
   "on": [ "column1", "column2"],
   "aggregations":  [
    {
      "newcolumns": [ "sum_value1", "sum_value2"],
      "aggfunction": "sum",
      "columns": [ "value1", "value2"]
    }
    {
      "newcolumns": [ "avg_value1"],
      "aggfunction": "avg",
      "columns": [ "value1"]
    }

  ]
  "keepOriginalGranularity": false

}

```

#### Example 1: keepOriginalGranularity set to false

**Input dataset:**

| Label   | Group   | Value1 | Value2 |
| ------- | ------- | ------ | ------ |
| Label 1 | Group 1 | 13     | 10     |
| Label 2 | Group 1 | 7      | 21     |
| Label 3 | Group 1 | 20     | 4      |
| Label 4 | Group 2 | 1      | 17     |
| Label 5 | Group 2 | 9      | 12     |
| Label 6 | Group 2 | 5      | 2      |

**Step configuration:**

```json
{
  "name": "aggregate",
   "on": [ "Group"],
   "aggregations":  [
    {
      "newcolumns": [ "Sum-Value1", "Sum-Value2"],
      "aggfunction": "sum",
      "columns": [ "Value1", "Value2"]
    }
    {
      "newcolumns": [ "Avg-Value1"],
      "aggfunction": "avg",
      "columns": [ "Value1"]
    }
  ],
  "keepOriginalGranularity": false
}

```

**Output dataset:**

| Group   | Sum-Value1 | Sum-Value2 | Avg-Value1 |
| ------- | ---------- | ---------- | ---------- |
| Group 1 | 40         | 35         | 13.333333  |
| Group 2 | 16         | 31         | 5.333333   |

#### Example 2: keepOriginalGranularity set to true

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

**Step configuration:**

```json
{
  "name": "aggregate",
   "on": [ "Group"],
   "aggregations":  [
    {
      "newcolumns": [ "Total"],
      "aggfunction": "sum",
      "columns": [ "Value"]
    }
  ],
  "keepOriginalGranularity": true
}

```

**Output dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 40    |
| Label 2 | Group 1 | 40    |
| Label 3 | Group 1 | 40    |
| Label 4 | Group 2 | 16    |
| Label 5 | Group 2 | 16    |
| Label 6 | Group 2 | 16    |

### `append` step

Appends to the current dataset, one or several datasets resulting from other `pipelines`. WeaverBird allows you to save `pipelines` referenced by name in the Vuex store of the application. You can then call them by their unique names in this step.

```json
{
  "name": "append",
  "pipelines": [ "pipeline1", "pipeline2"]
}

```

#### Example

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |

**dataset1 (saved in the application Vuex store)**:

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |

**dataset2 (saved in the application Vuex store)**:

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

**Step configuration:**

```json
{
  "name": "append",
  "pipelines": [ "dataset1", "dataset2"]
}

```

**Output dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

### `argmax` step

Get row(s) matching the maximum value in a given `column`, by group if `groups` is specified.

```json
{
  "name": "argmax",
  "column": "value",
  "groups": [ "group1", "group2"]
}

```

#### Example 1: without `groups`

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

**Step configuration:**

```json
{
  "name": "argmax",
  "column": "Value"
}

```

**Output dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 3 | Group 1 | 20    |

#### Example 2: with `groups`

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

**Step configuration:**

```json
{
  "name": "argmax",
  "column": "Value",
  "groups": [ "Group"]
}

```

**Output dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 3 | Group 1 | 20    |
| Label 5 | Group 2 | 10    |

### `argmin` step

Get row(s) matching the minimum value in a given `column`, by group if `groups` is specified.

```json
{
  "name": "argmin",
  "column": "value",
  "groups": [ "group1", "group2"]
}

```

#### Example 1: without `groups`

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

**Step configuration:**

```json
{
  "name": "argmin",
  "column": "Value"
}

```

**Output dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 4 | Group 2 | 1     |

#### Example 2: with `groups`

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

**Step configuration:**

```json
{
  "name": "argmin",
  "column": "Value",
  "groups": [ "Groups"]
}

```

**Output dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 2 | Group 1 | 7     |
| Label 4 | Group 2 | 1     |

### `comparetext` step

Compares 2 string columns and returns true if the string values are equal, and false oteherwise. The comparison is case-sensitive (see examples below).

```json
{
  "name": "comparetext",
  "newColumnName": "NEW",
  "strCol1": "TEXT_1",
  "strCol2": "TEXT_2"
}

```

#### Example

**Input dataset:**

| TEXT\_1 | TEXT\_2 |
| ------- | ------- |
| France  | Fr      |
| France  | France  |
| France  | france  |
| France  | England |
| France  | USA     |

**Step configuration:**

```json
{
  "name": "split",
  "newColumnName": "RESULT",
  "ctrCol1": "TEXT_1",
  "ctrCol2": "TEXT_2"
}

```

**Output dataset:**

| TEXT\_1 | TEXT\_2 | RESULT |
| ------- | ------- | ------ |
| France  | Fr      | false  |
| France  | France  | true   |
| France  | france  | false  |
| France  | England | false  |
| France  | USA     | false  |

### `concatenate` step

This step allows to concatenate several `columns` using a `separator`.

```json
{
  "name": "concatenate",
  "columns": [ "Company", "Group"],
  "separator": " - ",
  "newColumnName": "Label"
}

```

#### Example

**Input dataset:**

| Company   | Group   | Value |
| --------- | ------- | ----- |
| Company 1 | Group 1 | 13    |
| Company 2 | Group 1 | 7     |
| Company 3 | Group 1 | 20    |
| Company 4 | Group 2 | 1     |
| Company 5 | Group 2 | 10    |
| Company 6 | Group 2 | 5     |

**Step configuration:**

```json
{
  "name": "concatenate",
  "columns": [ "Company", "Group"],
  "separator": " - ",
  "newColumnName": "Label"
}

```

**Output dataset:**

| Company   | Group   | Value | Label               |
| --------- | ------- | ----- | ------------------- |
| Company 1 | Group 1 | 13    | Company 1 - Group 1 |
| Company 2 | Group 1 | 7     | Company 2 - Group 1 |
| Company 3 | Group 1 | 20    | Company 3 - Group 1 |
| Company 4 | Group 2 | 1     | Company 4 - Group 2 |
| Company 5 | Group 2 | 10    | Company 5 - Group 2 |
| Company 6 | Group 2 | 5     | Company 6 - Group 2 |

### `convert` step

This step allows to convert `columns` data types.

```json
{
  "name": "convert",
  "columns": [ "col1", "col2"],
  "dataType": "integer"

}

```

In a effort to harmonize as much as possible the conversion behaviors, for some cases, the Sql translator implements casting otherwise than the CAST AS method.

Precisely, when casting float to integer, the default behaviour rounds the result, other languages truncate it. That's why the use of `TRUNCATE` was implemented when converting float to int. The same implementation was done when converting strings to int (for date represented as string). As for the conversion of date to int, we handled it by assuming the dataset's timestamp is in `TIMESTAMP_NTZ` format.

#### Example

**Input dataset:**

| Company   | Value |
| --------- | ----- |
| Company 1 | '13'  |
| Company 2 | '7'   |
| Company 3 | '20'  |
| Company 4 | '1'   |
| Company 5 | '10'  |
| Company 6 | '5'   |

**Step configuration:**

```json
{
  "name": "convert",
  "columns": [ "Value"],
  "dataType": "integer"
}

```

**Output dataset:**

| Company   | Value |
| --------- | ----- |
| Company 1 | 13    |
| Company 2 | 7     |
| Company 3 | 20    |
| Company 4 | 1     |
| Company 5 | 10    |
| Company 6 | 5     |

### `cumsum` step

This step allows to compute the cumulated sum of value columns based on a reference column (usually dates) to be sorted by ascending order for the needs of the computation. The computation can be scoped by group if needed.

The `toCumSum` parameter takes as input a list of 2-elements lists in the form \['valueColumn', 'newColumn'].

```json
{
  "name": "cumsum",
  "toCumSum": [["myValues", "myCumsum"]],
  "referenceColumn": "myDates",
  "groupby": [ "foo", "bar"]
}

```

#### Example 1: Basic usage

**Input dataset:**

| DATE    | VALUE |
| ------- | ----- |
| 2019-01 | 2     |
| 2019-02 | 5     |
| 2019-03 | 3     |
| 2019-04 | 8     |
| 2019-05 | 9     |
| 2019-06 | 6     |

**Step configuration:**

```json
{
  "name": "cumsum",
  "toCumSum": [["VALUE", ""]],
  "referenceColumn": "DATE"
}

```

**Output dataset:**

| DATE      | VALUE | VALUE\_CUMSUM |
| --------- | ----- | ------------- |
| 2019-01   | 2     | 2             |
| 2019-02   | 5     | 7             |
| 2019-03   | 3     | 10            |
| 2019-04   | 8     | 18            |
| 2019-05   | 9     | 27            |
| 2019-06 6 | 6     | 33            |

#### Example 2: With more advanced options

**Input dataset:**

| COUNTRY | DATE      | VALUE |
| ------- | --------- | ----- |
| France  | 2019-01   | 2     |
| France  | 2019-02   | 5     |
| France  | 2019-03   | 3     |
| France  | 2019-04   | 8     |
| France  | 2019-05   | 9     |
| France  | 2019-06 6 | 6     |
| USA     | 2019-01   | 10    |
| USA     | 2019-02   | 6     |
| USA     | 2019-03   | 6     |
| USA     | 2019-04   | 4     |
| USA     | 2019-05   | 8     |
| USA     | 2019-06 6 | 7     |

**Step configuration:**

```json
{
  "name": "cumsum",
  "toCumSum": [["VALUE", "MY_CUMSUM"]],
  "referenceColumn": "DATE",
  "groupby": [ "COUNTRY"]
}

```

**Output dataset:**

| COUNTRY | DATE      | VALUE | MY\_CUMSUM |
| ------- | --------- | ----- | ---------- |
| France  | 2019-01   | 2     | 2          |
| France  | 2019-02   | 5     | 7          |
| France  | 2019-03   | 3     | 10         |
| France  | 2019-04   | 8     | 18         |
| France  | 2019-05   | 9     | 27         |
| France  | 2019-06 6 | 6     | 33         |
| USA     | 2019-01   | 10    | 10         |
| USA     | 2019-02   | 6     | 16         |
| USA     | 2019-03   | 6     | 22         |
| USA     | 2019-04   | 4     | 26         |
| USA     | 2019-05   | 8     | 34         |
| USA     | 2019-06 6 | 7     | 41         |

### `dateextract` step

Extract date information (eg. *day*, *week*, *year* etc.). The following information can be extracted:

* `year`: extract 'year' from date,
* `month`: extract 'month' from date,
* `day`: extract 'day of month' from date,
* `week'`: extract 'week number' (ranging from 0 to 53) from date,
* `quarter`: extract 'quarter number' from date (1 for Jan-Feb-Mar)
* `dayOfWeek`: extract 'day of week' (ranging from 1 for Sunday to 7 for Staurday) from date,
* `dayOfYear`: extract 'day of year' from date,
* `isoYear`: extract 'year number' in ISO 8601 format (ranging from 1 to 53) from date.
* `isoWeek`: extract 'week number' in ISO 8601 format (ranging from 1 to 53) from date.
* `isoDayOfWeek`: extract 'day of week' in ISO 8601 format (ranging from 1 for Monday to 7 for Sunday) from date,
* `firstDayOfYear`: calendar date corresponding to the first day (1st of January) of the year ,
* `firstDayOfMonth`: calendar date corresponding to the first day of the month,
* `firstDayOfWeek`: calendar date corresponding to the first day of the week,
* `firstDayOfQuarter`: calendar date corresponding to the first day of the quarter,
* `firstDayOfIsoWeek`: calendar date corresponding to the first day of the week in ISO 8601 format,
* `currentDay`: calendar date of the target date,
* `previousDay`: calendar date one day before the target date,
* `firstDayOfPreviousYear`: calendar date corresponding to the first day (1st of January) of the previous year,
* `firstDayOfPreviousMonth`: calendar date corresponding to the first day of the previous month,
* `firstDayOfPreviousWeek`: calendar date corresponding to the first day of the previous week,
* `firstDayOfPreviousQuarter`: calendar date corresponding to the first day of the previous quarter,
* `firstDayOfPreviousISOWeek`: calendar date corresponding to the first day of the previous ISO week,
* `previousYear`: extract previous 'year number' from date
* `previousMonth`: extract previous 'month number' from date
* `previousWeek`: extract previous 'week number' from date
* `previousQuarter`: extract previous 'quarter number' from date
* `previousISOWeek`: extract previous 'week number' in ISO 8601 format (ranging from 1 for Monday to 7 for Sunday) from date
* `hour`: extract 'hour' from date,
* `minutes`: extract 'minutes' from date,
* `seconds`: extract 'seconds' from date,
* `milliseconds`: extract 'milliseconds' from date,

Here's an example of such a step:

```json
{
  "name": "dateextract",
  "column": "date",
  "dateInfo": [ "year", "month", "day"],
  "newColumns": [ "date_year", "date_month", "date_day"]
}

```

#### Example

**Input dataset:**

| Date                     |
| ------------------------ |
| 2019-10-30T00:00:00.000Z |
| 2019-10-15T00:00:00.000Z |
| 2019-10-01T00:00:00.000Z |
| 2019-09-30T00:00:00.000Z |
| 2019-09-15T00:00:00.000Z |
| 2019-09-01T00:00:00.000Z |

**Step configuration:**

```json
{
  "name": "dateextract",
  "column": "Date",
  "dateInfo": [ "year", "month", "day"],
  "newColumns": [ "Date_year", "Date_month", "Date_day"]
}

```

**Output dataset:**

| Date                     | Date\_year | Date\_month | Date\_day |
| ------------------------ | ---------- | ----------- | --------- |
| 2019-10-30T00:00:00.000Z | 2019       | 10          | 30        |
| 2019-10-15T00:00:00.000Z | 2019       | 10          | 15        |
| 2019-10-01T00:00:00.000Z | 2019       | 10          | 1         |
| 2019-09-30T00:00:00.000Z | 2020       | 10          | 30        |
| 2019-09-15T00:00:00.000Z | 2020       | 10          | 15        |
| 2019-09-01T00:00:00.000Z | 2020       | 10          | 1         |

### `dategranularity` step

Extract date information (eg. *day*, *week*, *year* etc.) in a column intended for aggregation. The following granularities are supported:

* `year`: calendar date corresponding to the first day (1st of January) of the year
* `quarter`: calendar date corresponding to the first day of the quarter
* `month`: calendar date corresponding to the first day of the month
* `week`: calendar date corresponding to the first day of the week (sunday)
* `isoWeek`: calendar date corresponding to the first day of the week (monday)
* `day`: calendar date corresponding to the first hour of the day

Here's an example of such a step:

```json
{
  "name": "dategranularity",
  "column": "date",
  "granularity": "year",
  "newColumn": "do_the_aggregate_on_this"
}

```

#### Example

**Input dataset:**

| Date                     |
| ------------------------ |
| 2019-10-30T00:00:00.000Z |
| 2019-10-15T00:00:00.000Z |
| 2019-10-01T00:00:00.000Z |
| 2019-09-30T05:11:31.000Z |
| 2019-09-15T00:00:00.000Z |
| 2019-09-01T00:00:00.000Z |

**Step configuration:**

```json
{
  "name": "dategranularity",
  "column": "Date",
  "granularity": "month"
}

```

**Output dataset:**

| Date                     |
| ------------------------ |
| 2019-10-01T00:00:00.000Z |
| 2019-10-01T00:00:00.000Z |
| 2019-10-01T00:00:00.000Z |
| 2019-09-01T00:00:00.000Z |
| 2019-09-01T00:00:00.000Z |
| 2019-09-01T00:00:00.000Z |

### `delete` step

Delete a column.

```json
{
    "name": "delete",
    "columns": [ "my-column", "some-other-column"]
}

```

#### Example

**Input dataset:**

| Company   | Group   | Value | Label               |
| --------- | ------- | ----- | ------------------- |
| Company 1 | Group 1 | 13    | Company 1 - Group 1 |
| Company 2 | Group 1 | 7     | Company 2 - Group 1 |
| Company 3 | Group 1 | 20    | Company 3 - Group 1 |
| Company 4 | Group 2 | 1     | Company 4 - Group 2 |
| Company 5 | Group 2 | 10    | Company 5 - Group 2 |
| Company 6 | Group 2 | 5     | Company 6 - Group 2 |

**Step configuration:**

```json
{
  "name": "delete",
  "columns": [ "Company", "Group"]
}

```

**Output dataset:**

| Value | Label               |
| ----- | ------------------- |
| 13    | Company 1 - Group 1 |
| 7     | Company 2 - Group 1 |
| 20    | Company 3 - Group 1 |
| 1     | Company 4 - Group 2 |
| 10    | Company 5 - Group 2 |
| 5     | Company 6 - Group 2 |

### `duplicate` step

This step is meant to duplicate a column.

```json
{
    "name": "duplicate",
    "column": "my-column"
    "newColumnName": "my-duplicate"
}

```

#### Example

**Input dataset:**

| Company   | Value |
| --------- | ----- |
| Company 1 | 13    |
| Company 2 | 0     |
| Company 3 | 20    |

**Step configuration:**

```json
{
  "name": "duplicate",
  "column": "Company",
  "newColumnName": "Company-copy"
}

```

**Output dataset:**

| Company   | Value | Company-copy |
| --------- | ----- | ------------ |
| Company 1 | 13    | Company 1    |
| Company 2 | 0     | Company 2    |
| Company 3 | 20    | Company 3    |

### `duration` step

Compute the duration (in days, hours, minutes or seconds) between 2 dates in a new column.

```json
{
  "name": "duration",
  "newColumnName": "DURATION",
  "startDateColumn": "START_DATE",
  "endDateColumn": "END_DATE",
  "durationIn": "days"
}

```

#### Example 1: duration in days

**Input dataset:**

| START\_DATE                | END\_DATE                  |
| -------------------------- | -------------------------- |
| "2020-01-01T00:00:00.000Z" | "2020-01-31T00:00:00.000Z" |
| "2020-01-01T00:00:00.000Z" | "2020-12-31T00:00:00.000Z" |

**Step configuration:**

```json
{
  "name": "duration",
  "newColumnName": "DURATION",
  "startDateColumn": "START_DATE",
  "endDateColumn": "END_DATE",
  "durationIn": "days"
}

```

**Output dataset:**

| START\_HOUR                | END\_HOUR                  | DURATION |
| -------------------------- | -------------------------- | -------- |
| "2020-01-01T00:00:00.000Z" | "2020-01-31T00:00:00.000Z" | 30       |
| "2020-01-01T00:00:00.000Z" | "2020-12-31T00:00:00.000Z" | 365      |

#### Example 2: duration in minutes

**Input dataset:**

| START\_HOUR                | END\_HOUR                  |
| -------------------------- | -------------------------- |
| "2020-01-01T14:00:00.000Z" | "2020-01-31T15:00:00.000Z" |
| "2020-01-01T15:00:00.000Z" | "2020-12-31T20:00:00.000Z" |

**Step configuration:**

```json
{
  "name": "duration",
  "newColumnName": "DURATION",
  "startDateColumn": "START_HOUR",
  "endDateColumn": "END_HOUR",
  "durationIn": "minutes"
}

```

**Output dataset:**

| START\_HOUR                | END\_HOUR                  | DURATION |
| -------------------------- | -------------------------- | -------- |
| "2020-01-01T14:00:00.000Z" | "2020-01-31T15:00:00.000Z" | 60       |
| "2020-01-01T15:00:00.000Z" | "2020-12-31T20:00:00.000Z" | 300      |

### `evolution` step

Use this step if you need to compute the row-by-row evolution of a value column, based on a date column. It will output 2 columns: one for the evolution in absolute value, the other for the evolution in percentage.

You must be careful that the computation is scoped so that there are no dates duplicates (so that any date finds no more than one previous date). That means that you may need to specify "group by" columns to make any date unique inside each group. You should specify those columns in the `indexColumns` parameter.

```json
{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastYear",
  "evolutionFormat": "abs",
  "indexColumns": [ "COUNTRY"],
  "newColumn": "MY_EVOL"
}

```

#### Example 1: Basic configuration - evolution in absolute value

**Input dataset:**

| DATE    | VALUE |
| ------- | ----- |
| 2019-06 | 79    |
| 2019-07 | 81    |
| 2019-08 | 77    |
| 2019-09 | 75    |
| 2019-11 | 78    |
| 2019-12 | 88    |

**Step configuration:**

```json
{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastMonth",
  "evolutionFormat": "abs",
  "indexColumns": []
}

```

**Output dataset:**

| DATE    | VALUE | VALUE\_EVOL\_ABS |
| ------- | ----- | ---------------- |
| 2019-06 | 79    |                  |
| 2019-07 | 81    | 2                |
| 2019-08 | 77    | -4               |
| 2019-09 | 75    | -2               |
| 2019-11 | 78    |                  |
| 2019-12 | 88    | 10               |

#### Example 2: Basic configuration - evolution in percentage

**Input dataset:**

| DATE    | VALUE |
| ------- | ----- |
| 2019-06 | 79    |
| 2019-07 | 81    |
| 2019-08 | 77    |
| 2019-09 | 75    |
| 2019-11 | 78    |
| 2019-12 | 88    |

**Step configuration:**

```json
{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastMonth",
  "evolutionFormat": "pct",
  "indexColumns": []
}

```

**Output dataset:**

| DATE    | VALUE | VALUE\_EVOL\_PCT      |
| ------- | ----- | --------------------- |
| 2019-06 | 79    |                       |
| 2019-07 | 81    | 0.02531645569620253   |
| 2019-08 | 77    | -0.04938271604938271  |
| 2019-09 | 75    | -0.025974025974025976 |
| 2019-11 | 78    |                       |
| 2019-12 | 88    | 0.1282051282051282    |

#### Example 3: Error on duplicate dates

If 'COUNTRY' is not specified as indexColumn, the computation will not be scoped by country. Then there are duplicate dates in the "DATE" columns which is prohibited and will lead to an error.

**Input dataset:**

| DATE    | COUNTRY | VALUE |
| ------- | ------- | ----- |
| 2014-12 | France  | 79    |
| 2015-12 | France  | 81    |
| 2016-12 | France  | 77    |
| 2017-12 | France  | 75    |
| 2014-12 | USA     | 74    |
| 2015-12 | USA     | 74    |
| 2016-12 | USA     | 73    |
| 2017-12 | USA     | 72    |

**Step configuration:**

```json
{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastYear",
  "evolutionFormat": "abs",
  "indexColumns": []
}

```

**Output dataset:**

| DATE    | COUNTRY | VALUE | MY\_EVOL  |
| ------- | ------- | ----- | --------- |
| 2014-12 | France  | 79    |           |
| 2015-12 | France  | 81    | Error ... |
| 2016-12 | France  | 77    | Error ... |
| 2017-12 | France  | 75    | Error ... |
| 2014-12 | USA     | 74    |           |
| 2015-12 | USA     | 74    | Error ... |
| 2016-12 | USA     | 73    | Error ... |
| 2017-12 | USA     | 72    | Error ... |

#### Example 4: Complete configuration with index columns

**Input dataset:**

| DATE    | COUNTRY | VALUE |
| ------- | ------- | ----- |
| 2014-12 | France  | 79    |
| 2015-12 | France  | 81    |
| 2016-12 | France  | 77    |
| 2017-12 | France  | 75    |
| 2019-12 | France  | 78    |
| 2020-12 | France  | 88    |
| 2014-12 | USA     | 74    |
| 2015-12 | USA     | 74    |
| 2016-12 | USA     | 73    |
| 2017-12 | USA     | 72    |
| 2018-11 | USA     | 75    |
| 2020-12 | USA     | 76    |

**Step configuration:**

```json
{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastYear",
  "evolutionFormat": "abs",
  "indexColumns": [ "COUNTRY"],
  "newColumn": "MY_EVOL"
}

```

**Output dataset:**

| DATE    | COUNTRY | VALUE | MY\_EVOL |
| ------- | ------- | ----- | -------- |
| 2014-12 | France  | 79    |          |
| 2015-12 | France  | 81    | 2        |
| 2016-12 | France  | 77    | -4       |
| 2017-12 | France  | 75    | -2       |
| 2019-12 | France  | 78    |          |
| 2020-12 | France  | 88    | 10       |
| 2014-12 | USA     | 74    |          |
| 2015-12 | USA     | 74    | 0        |
| 2016-12 | USA     | 73    | -1       |
| 2017-12 | USA     | 72    | -1       |
| 2018-11 | USA     | 75    | 3        |
| 2020-12 | USA     | 76    |          |

### `fillna` step

Replace null values by a given value in specified columns.

```json
{
    "name": "fillna",
    "columns": ["foo", "bar"],
    "value": 0
}

```

#### Example

**Input dataset:**

| Company   | Group   | Value | KPI |
| --------- | ------- | ----- | --- |
| Company 1 | Group 1 | 13    |     |
| Company 2 | Group 1 |       | 12  |
| Company 3 | Group 1 | 20    | 40  |
| Company 4 | Group 2 | 1     |     |
| Company 5 | Group 2 |       | 38  |
| Company 6 | Group 2 | 5     | 4   |

**Step configuration:**

```json
{
  "name": "fillna",
  "columns": ["Value", "KPI"],
  "value": 0
}

```

**Output dataset:**

| Company   | Group   | Value | KPI |
| --------- | ------- | ----- | --- |
| Company 1 | Group 1 | 13    | 0   |
| Company 2 | Group 1 | 0     | 12  |
| Company 3 | Group 1 | 20    | 40  |
| Company 4 | Group 2 | 1     | 0   |
| Company 5 | Group 2 | 0     | 38  |
| Company 6 | Group 2 | 5     | 4   |

### `filter` step

Filter out lines that don't match a filter definition.

```json
{
    "name": "filter",
    "condition": {
      "column": "my-column",
      "value": 42,
      "operator": "ne"
    }
}

```

`operator` is optional, and defaults to `eq`. Allowed operators are `eq`, `ne`, `gt`, `ge`, `lt`, `le`, `in`, `nin`, `matches`, `notmatches` `isnull` or `notnull`.

`value` can be an arbitrary value depending on the selected operator (e.g a list when used with the `in` operator, or `null` when used with the `isnull` operator).

`matches` and `notmatches` operators are used to test value against a regular expression.

Conditions can be grouped and nested with logical operators `and` and `or`.

```json
{
    "name": "filter",
    "condition": {
      "and": [
        {
          "column": "my-column",
          "value": 42,
          "operator": "gte"
        },
        {
          "column": "my-column",
          "value": 118,
          "operator": "lte"
        },
        {
          "or": [
            {
              "column": "my-other-column",
              "value": "blue",
              "operator": "eq"
            },
            {
              "column": "my-other-column",
              "value": "red",
              "operator": "eq"
            }
          ]
        }
      ]
    }
}

```

#### Relative dates

Date values can be relative to the moment to the moment when the query is executed. This is expressed by using a `RelativeDate` object instead of the value, of the form:

```json
{
  "quantity": Number,
  "duration": "year" | "quarter" | "month" | "week" | "day"
}

```

### `formula` step

Add a computation based on a formula. Usually column names do not need to be escaped, unless they include whitespaces, in which case you'll need to use brackets '\[]' (e.g. \[myColumn]). Any string escaped with quotes (`'`, `"`, `'''`, `"""`) will be considered a string literal.

```json
{
  {
    "name": "formula",
    "newColumn": "result",
    "formula": "(Value1 + Value2) / Value3 - Value4 * 2"
  }
}

```

#### Supported operators

The following operators are supported by the formula step (note that a value can be a column name or a literal, such as `42` or `foo`).

* `+`: Does an addition of two numeric values. **See the `concatenate` step to append strings**
* `-`: Does an substraction of two numeric values. **See the `replace` step to remove a part of a string**
* `*`: Multiplies two numeric values.
* `/`: Divides a numeric value by another. Divisions by zero will return `null`.
* `%`: Returns the rest of an integer division. Divisions by zero will return `null`.

#### Example 1: Basic usage

**Input dataset:**

| Label   | Value1 | Value2 | Value3 | Value4 |
| ------- | ------ | ------ | ------ | ------ |
| Label 1 | 10     | 2      | 3      | 1      |
| Label 2 | 1      | 13     | 7      | 3      |
| Label 3 | 5      | 20     | 5      | 2      |

**Step configuration:**

```json
{
  "name": "formula",
  "newColumn": "Result",
  "formula": "(Value1 + Value2) / Value3 - Value4 * 2"
}

```

**Output dataset:**

| Label   | Value1 | Value2 | Value3 | Value4 | Result |
| ------- | ------ | ------ | ------ | ------ | ------ |
| Label 1 | 10     | 2      | 3      | 1      | 2      |
| Label 2 | 1      | 13     | 7      | 3      | -4     |
| Label 3 | 5      | 20     | 5      | 2      | 1      |

#### Example 2: Column name with whitespaces

**Input dataset:**

| Label   | Value1 | Value2 | Value3 | Value 4 |
| ------- | ------ | ------ | ------ | ------- |
| Label 1 | 10     | 2      | 3      | 1       |
| Label 2 | 1      | 13     | 7      | 3       |
| Label 3 | 5      | 20     | 5      | 2       |

**Step configuration:**

```json
{
  "name": "formula",
  "newColumn": "Result",
  "formula": "(Value1 + Value2) / Value3 - [Value 4] * 2"
}

```

**Output dataset:**

| Label   | Value1 | Value2 | Value3 | Value 4 | Result |
| ------- | ------ | ------ | ------ | ------- | ------ |
| Label 1 | 10     | 2      | 3      | 1       | 2      |
| Label 2 | 1      | 13     | 7      | 3       | -4     |
| Label 3 | 5      | 20     | 5      | 2       | 1      |

### `ifthenelse` step

Creates a new column, which values will depend on a condition expressed on existing columns.

The condition is expressed in the `if` parameter with a condition object, which is the same object expected by the `condition` parameter of the [filter step](#filter-step)). Conditions can be grouped and nested with logical operators `and` and `or`.

The `then` parameter only supports a string, that will be interpreted as a formula (cf. [formula step](#formula-step)). If you want it to be interpreted striclty as a string and not a formula, you must escape the string with quotes (e.g. '"this is a text"').

`if...then...else` blocks can be nested as the `else` parameter supports either a string that will be interpreted as a formula (cf. [formula step](#formula-step)), or a nested if `if...then...else` object.

```json
{
  "name": "ifthenelse",
  "newColumn": "",
  "if": { "column": "", "value": "", "operator": "eq" },
  "then": "",
  "else": ""
}

```

#### Example

**Input dataset:**

| Label   | number |
| ------- | ------ |
| Label 1 | -2     |
| Label 2 | 2      |
| Label 3 | 0      |

**Step configuration:**

```json
{
    "name": "ifthenelse",
    "newColumn": "result",
    "if": { "column": "number", "value": 0, "operator": "eq" },
    "then": ""zero""
    "else": {
      "if": { "column": "rel", "value": 0, "operator": "lt" },
      "then": "number * -1",
      "else": "number"
    }
}

```

**Output dataset:**

| Label   | number | result |
| ------- | ------ | ------ |
| Label 1 | -2     | 2      |
| Label 2 | 5      | 5      |
| Label 3 | 0      | zero   |

### `join` step

Joins a dataset to the current dataset, i.e. brings columns from the former into the latter, and matches rows based `on` columns correspondance. It is similar to a `JOIN` clause in SQL, or to a `VLOOKUP` in excel. The joined dataset is the result from the query of the `right_pipeline`.

The join type can be:

* 'left': will keep every row of the current dataset and fill unmatched rows with `null` values,
* 'inner': will only keep rows that match rows of the joined dataset.

In the `on` parameter, you must specify 1 or more column couple(s) that will be compared to determine rows correspondance between the 2 datasets. The first element of a couple is for the current dataset column, and the second for the corresponding column in the right dataset to be joined. If you specify more than 1 couple, the matching rows will be those that find a correspondance between the 2 datasets for every column couple specified (logical 'AND').

Weaverbird allows you to save `pipelines` referenced by name in the Vuex store of the application. You can then call them by their unique names in this step.

```json
{
  "name": "join",
  "right": {
     "source": {
        "table": {
          "schema": "other_schema",
          "name": "other_table", 
          "columns": ["first_name", "age", "department"],
        }
     },
     "steps": []
  },
  "type": "left",
  "on": [
    [ "currentDatasetColumn1", "rightDatasetColumn1"],
    [ "currentDatasetColumn2", "rightDatasetColumn2"]
  ]
}

```

#### Example 1: Left join with one column couple as `on` parameter

**Input dataset:**

| Label   | Value |
| ------- | ----- |
| Label 1 | 13    |
| Label 2 | 7     |
| Label 3 | 20    |
| Label 4 | 1     |
| Label 5 | 1     |
| Label 6 | 1     |

**rightDataset (saved in the application Vuex store)**:

| Label   | Group   |
| ------- | ------- |
| Label 1 | Group 1 |
| Label 2 | Group 1 |
| Label 3 | Group 2 |
| Label 4 | Group 2 |

**Step configuration:**

```json
{
  "name": "join",
  "right": {
     "source": {
        "table": {
          "schema": "other_schema",
          "name": "other_table",     
          "columns": ["first_name", "age", "department"],
        }
     },
     "steps": []
  },
  "type": "left",
  "on": [["Label", "Label"]];
}

```

**Output dataset:**

| Label   | Value | Group   |
| ------- | ----- | ------- |
| Label 1 | 13    | Group 1 |
| Label 2 | 7     | Group 1 |
| Label 3 | 20    | Group 2 |
| Label 4 | 1     | Group 2 |
| Label 5 | 1     |         |
| Label 6 | 1     |         |

#### Example 2: inner join with different column names in the `on` parameter

**Input dataset:**

| Label   | Value |
| ------- | ----- |
| Label 1 | 13    |
| Label 2 | 7     |
| Label 3 | 20    |
| Label 4 | 1     |
| Label 5 | 1     |
| Label 6 | 1     |

**rightDataset (saved in the application Vuex store)**:

| LabelRight | Group   |
| ---------- | ------- |
| Label 1    | Group 1 |
| Label 2    | Group 1 |
| Label 3    | Group 2 |
| Label 4    | Group 2 |

**Step configuration:**

```json
{
  "name": "join",
  "right": {
     "source": {
        "table": {
          "schema": "other_schema",
          "name": "other_table",     
          "columns": ["first_name", "age", "department"],
        }
     },
     "steps": []
  },
  "type": "inner",
  "on": [["Label", "LabelRight"]];
}

```

**Output dataset:**

| Label   | Value | LabelRight | Group   |
| ------- | ----- | ---------- | ------- |
| Label 1 | 13    | Label 1    | Group 1 |
| Label 2 | 7     | Label 2    | Group 1 |
| Label 3 | 20    | Label 3    | Group 2 |
| Label 4 | 1     | Label 4    | Group 2 |

### `fromdate` step

Converts a date `column` into a string column based on a specified `format`.

```json
{
    "name": "fromdate",
    "column": "myDateColumn"
    "format": "%Y-%m-%d"

}

```

#### Example

**Input dataset:**

| Company   | Date                     | Value |
| --------- | ------------------------ | ----- |
| Company 1 | 2019-10-06T00:00:00.000Z | 13    |
| Company 1 | 2019-10-07T00:00:00.000Z | 7     |
| Company 1 | 2019-10-08T00:00:00.000Z | 20    |
| Company 2 | 2019-10-06T00:00:00.000Z | 1     |
| Company 2 | 2019-10-07T00:00:00.000Z | 10    |
| Company 2 | 2019-10-08T00:00:00.000Z | 5     |

**Step configuration:**

```json
{
  "name": "fromdate",
  "column": "Date",
  "format": "%d/%m/%Y"
}

```

**Output dataset:**

| Company   | Date       | Value |
| --------- | ---------- | ----- |
| Company 1 | 06/10/2019 | 13    |
| Company 1 | 07/10/2019 | 7     |
| Company 1 | 08/10/2019 | 20    |
| Company 2 | 06/10/2019 | 1     |
| Company 2 | 07/10/2019 | 10    |
| Company 2 | 08/10/2019 | 5     |

### `lowercase` step

Converts a string `column` to lowercase.

```json
{
  "name": "lowercase",
  "column": "foo"
}

```

#### Example:

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| LABEL 1 | Group 1 | 13    |
| LABEL 2 | Group 1 | 7     |
| LABEL 3 | Group 1 | 20    |

**Step configuration:**

```json
{
  "name": "lowercase",
  "column": "Label"
}

```

**Output dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| label 1 | Group 1 | 13    |
| label 2 | Group 1 | 7     |
| label 3 | Group 1 | 20    |

### `movingaverage` step

Compute the moving average based on a value column, a reference column to sort (usually a date column) and a moving window (in number of rows i.e. data points). If needed, the computation can be performed by group of rows. The computation result is added in a new column.

```json
{
  "name": "movingaverage",
  "valueColumn": "value",
  "columnToSort": "dates",
  "movingWindow": 12,
  "groups": [ "foo", "bar"],
  "newColumnName": "myNewColumn"
}

```

#### Example 1: Basic usage

**Input dataset:**

| DATE       | VALUE |
| ---------- | ----- |
| 2018-01-01 | 75    |
| 2018-01-02 | 80    |
| 2018-01-03 | 82    |
| 2018-01-04 | 83    |
| 2018-01-05 | 80    |
| 2018-01-06 | 86    |
| 2018-01-07 | 79    |
| 2018-01-08 | 76    |

**Step configuration:**

```json
{
  "name": "movingaverage",
  "valueColumn": "VALUE",
  "columnToSort": "DATE",
  "movingWindow": 2
}

```

**Output dataset:**

| DATE       | VALUE | VALUE\_MOVING\_AVG |
| ---------- | ----- | ------------------ |
| 2018-01-01 | 75    | null               |
| 2018-01-02 | 80    | 77.5               |
| 2018-01-03 | 82    | 81                 |
| 2018-01-04 | 83    | 82.5               |
| 2018-01-05 | 80    | 81.5               |
| 2018-01-06 | 86    | 83                 |
| 2018-01-07 | 79    | 82.5               |
| 2018-01-08 | 76    | 77.5               |

#### Example 2: with groups and custom newColumnName

**Input dataset:**

| COUNTRY | DATE       | VALUE |
| ------- | ---------- | ----- |
| France  | 2018-01-01 | 75    |
| France  | 2018-01-02 | 80    |
| France  | 2018-01-03 | 82    |
| France  | 2018-01-04 | 83    |
| France  | 2018-01-05 | 80    |
| France  | 2018-01-06 | 86    |
| USA     | 2018-01-01 | 69    |
| USA     | 2018-01-02 | 73    |
| USA     | 2018-01-03 | 73    |
| USA     | 2018-01-04 | 75    |
| USA     | 2018-01-05 | 70    |
| USA     | 2018-01-06 | 76    |

**Step configuration:**

```json
{
  "name": "movingaverage",
  "valueColumn": "VALUE",
  "columnToSort": "DATE",
  "movingWindow": 2,
  "groups": [ "COUNTRY"],
  "newColumnName": "ROLLING_AVERAGE"
}

```

**Output dataset:**

| COUNTRY | DATE       | VALUE | ROLLING\_AVERAGE |
| ------- | ---------- | ----- | ---------------- |
| France  | 2018-01-01 | 75    | null             |
| France  | 2018-01-02 | 80    | null             |
| France  | 2018-01-03 | 82    | 79               |
| France  | 2018-01-04 | 83    | 81.7             |
| France  | 2018-01-05 | 80    | 81.7             |
| France  | 2018-01-06 | 86    | 83               |
| USA     | 2018-01-01 | 69    | null             |
| USA     | 2018-01-02 | 73    | null             |
| USA     | 2018-01-03 | 73    | 71.7             |
| USA     | 2018-01-04 | 75    | 73.7             |
| USA     | 2018-01-05 | 70    | 72.7             |
| USA     | 2018-01-06 | 76    | 73.7             |

### `percentage` step

Compute the percentage of total, i.e. for every row the value in `column` divided by the total as the sum of every values in `column`. The computation can be performed by `group` if specified. The result is written in a new column.

```json
{
  "name": "percentage",
  "column": "bar",
  "group": [ "foo"],
  "newColumnName": "myNewColumn"
}

```

#### Example:

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 5     |
| Label 2 | Group 1 | 10    |
| Label 3 | Group 1 | 15    |
| Label 4 | Group 2 | 2     |
| Label 5 | Group 2 | 7     |
| Label 6 | Group 2 | 5     |

**Step configuration:**

```json
{
  "name": "percentage",
  "newColumn": "Percentage_of_total",
  "column": "Value",
  "group": [ "Group"],
  "newColumn": "Percentage"
}

```

**Output dataset:**

| Label   | Group   | Value | Percentage |
| ------- | ------- | ----- | ---------- |
| Label 1 | Group 1 | 5     | 0.167      |
| Label 2 | Group 1 | 10    | 0.333      |
| Label 3 | Group 1 | 15    | 0.5        |
| Label 4 | Group 2 | 2     | 0.143      |
| Label 5 | Group 2 | 7     | 0.5        |
| Label 6 | Group 2 | 5     | 0.357      |

### `pivot` step

Pivot rows into columns around a given `index` (expressed as a combination of column(s)). Values to be used as new column names are found in the column `column_to_pivot`. Values to populate new columns are found in the column `value_column`. The function used to aggregate data (when several rows are found by index group) must be among `sum`, `avg`, `count`, `min` or `max`.

```json
{
 "name": "pivot",
 "index": [ "column_1", "column_2"],
 "columnToPivot": "column_3",
 "valueColumn": "column_4",
 "aggFunction": "sum"
}

```

#### Example:

**Input dataset:**

| Label   | Country  | Value |
| ------- | -------- | ----- |
| Label 1 | Country1 | 13    |
| Label 2 | Country1 | 7     |
| Label 3 | Country1 | 20    |
| Label 1 | Country2 | 1     |
| Label 2 | Country2 | 10    |
| Label 3 | Country2 | 5     |
| label 3 | Country2 | 1     |

**Step configuration:**

```json
{
 "name": "pivot",
 "index": [ "Label"],
 "columnToPivot": "Country",
 "valueColumn": "Value",
 "aggFunction": "sum"
}

```

**Output dataset:**

| Label   | Country1 | Country2 |
| ------- | -------- | -------- |
| Label 1 | 13       | 1        |
| Label 2 | 7        | 10       |
| Label 3 | 20       | 6        |

### `statistics` step

Compute statistics of a column.,

```json
{
    "name": "statistics",
    "column": "Value",
    "groupby": [],
    "statistics": [ "average", "count"],
    "quantiles": [{"label": "median", "nth": 1, "order": 2}]
}

```

#### Example:

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

**Step configuration:**

```json
{
    "name": "statistics",
    "column": "Value",
    "groupby": [],
    "statistics": [ "average", "count"],
    "quantiles": [{"label": "median", "nth": 1, "order": 2}]
}

```

**Output dataset:**

| average | count | median |
| ------- | ----- | ------ |
| 9.33333 | 6     | 8.5    |

### `rank` step

This step allows to compute a rank column based on a value column that can be sorted in ascending or descending order. The ranking can be computed by group.

There are 2 ranking methods available, that you will understand easily through those examples:

* `standard`: input = \[10, 20, 20, 20, 25, 25, 30] => ranking = \[1, 2, 2, 2, 5, 5, 7]
* `dense`: input = \[10, 20, 20, 20, 25, 25, 30] => ranking = \[1, 2, 2, 2, 3, 3, 4]

(The `dense` method is basically the same as the `standard` method, but rank always increases by 1 at most).

```json
{
  "name": "rank",
  "valueCol": "VALUE",
  "order": "desc",
  "method": "standard",
  "groupby": [ "foo", "bar"],
  "newColumnName": "columnA"
}

```

#### Example 1: Basic usage

**Input dataset:**

| COUNTRY | VALUE |
| ------- | ----- |
| FRANCE  | 15    |
| FRANCE  | 5     |
| FRANCE  | 10    |
| FRANCE  | 20    |
| FRANCE  | 10    |
| FRANCE  | 15    |
| USA     | 20    |
| USA     | 30    |
| USA     | 20    |
| USA     | 25    |
| USA     | 15    |
| USA     | 20    |

**Step configuration:**

```json
{
  "name": "rank",
  "valueCol": "VALUE",
  "order": "desc",
  "method": "standard"
}

```

**Output dataset:**

| COUNTRY | VALUE | VALUE\_RANK |
| ------- | ----- | ----------- |
| USA     | 30    | 1           |
| USA     | 25    | 2           |
| FRANCE  | 20    | 3           |
| USA     | 20    | 3           |
| USA     | 20    | 3           |
| USA     | 20    | 3           |
| FRANCE  | 15    | 7           |
| FRANCE  | 15    | 7           |
| USA     | 15    | 7           |
| FRANCE  | 10    | 10          |
| FRANCE  | 10    | 10          |
| FRANCE  | 5     | 12          |

#### Example 2: With more options

**Input dataset:**

| COUNTRY | VALUE |
| ------- | ----- |
| FRANCE  | 15    |
| FRANCE  | 5     |
| FRANCE  | 10    |
| FRANCE  | 20    |
| FRANCE  | 10    |
| FRANCE  | 15    |
| USA     | 20    |
| USA     | 30    |
| USA     | 20    |
| USA     | 25    |
| USA     | 15    |
| USA     | 20    |

**Step configuration:**

```json
{
  "name": "rank",
  "valueCol": "VALUE",
  "order": "asc",
  "method": "dense",
  "groupby": [ "COUNTRY"],
  "newColumnName": "MY_RANK"
}

```

**Output dataset:**

| COUNTRY | VALUE | MY\_RANK |
| ------- | ----- | -------- |
| FRANCE  | 5     | 1        |
| FRANCE  | 10    | 2        |
| FRANCE  | 10    | 2        |
| FRANCE  | 15    | 3        |
| FRANCE  | 15    | 3        |
| FRANCE  | 20    | 4        |
| USA     | 15    | 1        |
| USA     | 20    | 2        |
| USA     | 20    | 2        |
| USA     | 20    | 2        |
| USA     | 25    | 3        |
| USA     | 30    | 4        |

### `rename` step

Rename one or several columns. The `toRename` parameter takes as input a list of 2-elements lists in the form \['oldColumnName', 'newColumnName'].

```json
{
    "name": "rename",
    "toRename": [
      [ "oldCol1", "newCol1"]
      [ "oldCol2", "newCol2"]
    ]
}

```

#### Example:

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

**Step configuration:**

```json
{
  "name": "rename",
  "toRename": [["Label", "Company"]]
}

```

**Output dataset:**

| Company | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

### `replace` step

Replace one or several values in a column.

A replace step has the following strucure:

```json
{
   "name": "replace",
   "searchColumn": "column_1",
   "toReplace": [
     [ "foo", "bar"],
     [42, 0]
   ]
}

```

#### Example

**Input dataset:**

| COMPANY   | COUNTRY |
| --------- | ------- |
| Company 1 | Fr      |
| Company 2 | UK      |

**Step configuration:**

```json
{
   "name": "replace",
   "searchColumn": "COUNTRY",
   "toReplace": [
     [ "Fr", "France"],
     [ "UK", "United Kingdom"]
   ]
}

```

**Output dataset:**

| COMPANY   | COUNTRY        |
| --------- | -------------- |
| Company 1 | France         |
| Company 2 | United Kingdom |

### `replacetext` step

Replace a substring in a column.

A replace-text step has the following structure:

```json
{
   "name": "replacetext",
   "searchColumn": "column_1",
   "oldStr": "foo",
   "newStr": "bar"
}

```

#### Example

**Input dataset:**

| COMPANY   | COUNTRY      |
| --------- | ------------ |
| Company 1 | Fr is boring |
| Company 2 | UK           |

**Step configuration:**

```json
{
   "name": "replacetext",
   "searchColumn": "COUNTRY",
   "oldStr": "Fr",
   "newStr": "France"
}

```

**Output dataset:**

| COMPANY   | COUNTRY          |
| --------- | ---------------- |
| Company 1 | France is boring |
| Company 2 | UK               |

### `rollup` step

Use this step if you need to compute aggregated data at every level of a `hierarchy`, specified as a series of columns from top to bottom level. The output data structure stacks the data of every level of the hierarchy, specifying for every row the label, level and parent in dedicated columns.

Aggregated rows can be computed with using either sum, average, count, count distinct, min, max, first or last.

```json
{
   "name": "rollup",
   "hierarchy": [ "continent", "country", "city"],
   "aggregations": [
   {
      "newcolumns": [ "sum_value1", "sum_value2"],
      "aggfunction": "sum",
      "columns": [ "value1", "value2"]
    }
    {
      "newcolumns": [ "avg_value1"],
      "aggfunction": "avg",
      "columns": [ "value1"]
    }
   ],
   "groupby": [ "date"],
   "labelCol": "label",
   "levelCol": "level",
   "childLevelCol": "child_level",
   "parentLabelCol": "parent"
}

```

#### Example 1 : Basic configuration

**Input dataset:**

| CITY      | COUNTRY | CONTINENT     | YEAR | VALUE |
| --------- | ------- | ------------- | ---- | ----- |
| Paris     | France  | Europe        | 2018 | 10    |
| Bordeaux  | France  | Europe        | 2018 | 5     |
| Barcelona | Spain   | Europe        | 2018 | 8     |
| Madrid    | Spain   | Europe        | 2018 | 3     |
| Boston    | USA     | North America | 2018 | 12    |
| New-York  | USA     | North America | 2018 | 21    |
| Montreal  | Canada  | North America | 2018 | 10    |
| Ottawa    | Canada  | North America | 2018 | 7     |
| Paris     | France  | Europe        | 2019 | 13    |
| Bordeaux  | France  | Europe        | 2019 | 8     |
| Barcelona | Spain   | Europe        | 2019 | 11    |
| Madrid    | Spain   | Europe        | 2019 | 6     |
| Boston    | USA     | North America | 2019 | 15    |
| New-York  | USA     | North America | 2019 | 24    |
| Montreal  | Canada  | North America | 2019 | 10    |
| Ottawa    | Canada  | North America | 2019 | 13    |

**Step configuration:**

```json
{
   "name": "rollup",
   "hierarchy": [ "CONTINENT", "COUNTRY", "CITY"],
   "aggregations": [
    {
      "newcolumns": [ "VALUE"],
      "aggfunction": "sum",
      "columns": [ "VALUE"]
    }
   ]
}

```

**Output dataset:**

| CITY      | COUNTRY | CONTINENT     | label         | level     | child\_level | parent        | VALUE |
| --------- | ------- | ------------- | ------------- | --------- | ------------ | ------------- | ----- |
|           |         | Europe        | Europe        | CONTINENT | COUNTRY      |               | 64    |
|           |         | North America | North America | CONTINENT | COUNTRY      |               | 112   |
|           | France  | Europe        | France        | COUNTRY   | CITY         | Europe        | 36    |
|           | Spain   | Europe        | Spain         | COUNTRY   | CITY         | Europe        | 28    |
|           | USA     | North America | USA           | COUNTRY   | CITY         | North America | 72    |
|           | Canada  | North America | Canada        | COUNTRY   | CITY         | North America | 40    |
| Paris     | France  | Europe        | Paris         | CITY      |              | France        | 23    |
| Bordeaux  | France  | Europe        | Bordeaux      | CITY      |              | France        | 13    |
| Barcelona | Spain   | Europe        | Barcelona     | CITY      |              | Spain         | 19    |
| Madrid    | Spain   | Europe        | Madrid        | CITY      |              | Spain         | 9     |
| Boston    | USA     | North America | Boston        | CITY      |              | USA           | 27    |
| New-York  | USA     | North America | New-York      | CITY      |              | USA           | 45    |
| Montreal  | Canada  | North America | Montreal      | CITY      |              | Canada        | 20    |
| Ottawa    | Canada  | North America | Ottawa        | CITY      |              | Canada        | 20    |

#### Example 2 : Configuration with optional parameters

**Input dataset:**

| CITY      | COUNTRY | CONTINENT     | YEAR | VALUE | COUNT |
| --------- | ------- | ------------- | ---- | ----- | ----- |
| Paris     | France  | Europe        | 2018 | 10    | 1     |
| Bordeaux  | France  | Europe        | 2018 | 5     | 1     |
| Barcelona | Spain   | Europe        | 2018 | 8     | 1     |
| Madrid    | Spain   | Europe        | 2018 | 3     | 1     |
| Boston    | USA     | North America | 2018 | 12    | 1     |
| New-York  | USA     | North America | 2018 | 21    | 1     |
| Montreal  | Canada  | North America | 2018 | 10    | 1     |
| Ottawa    | Canada  | North America | 2018 | 7     | 1     |
| Paris     | France  | Europe        | 2019 | 13    | 1     |
| Bordeaux  | France  | Europe        | 2019 | 8     | 1     |
| Barcelona | Spain   | Europe        | 2019 | 11    | 1     |
| Madrid    | Spain   | Europe        | 2019 | 6     | 1     |
| Boston    | USA     | North America | 2019 | 15    | 1     |
| New-York  | USA     | North America | 2019 | 24    | 1     |
| Montreal  | Canada  | North America | 2019 | 10    | 1     |
| Ottawa    | Canada  | North America | 2019 | 13    | 1     |

**Step configuration:**

```json
{
   "name": "rollup",
   "hierarchy": [ "CONTINENT", "COUNTRY", "CITY"],
   "aggregations": [
    {
      "newcolumns": [ "VALUE-sum", "COUNT"],
      "aggfunction": "sum",
      "columns": [ "VALUE", "COUNT"]
    },
    {
      "newcolumns": [ "VALUE-avg"],
      "aggfunction": "avg",
      "columns": [ "VALUE"]
    }
   ],
   "groupby": [ "YEAR"],
   "labelCol": "MY_LABEL",
   "levelCol": "MY_LEVEL",
   "childLevelCol": "MY_CHILD_LEVEL",
   "parentLabelCol": "MY_PARENT"
}

```

**Output dataset:**

| CITY      | COUNTRY | CONTINENT     | YEAR | MY\_LABEL     | MY\_LEVEL | MY\_CHILD\_LEVEL | MY\_PARENT    | VALUE-sum | VALUE-avg | COUNT |
| --------- | ------- | ------------- | ---- | ------------- | --------- | ---------------- | ------------- | --------- | --------- | ----- |
|           |         | North America | 2018 | Europe        | CONTINENT | COUNTRY          |               | 26        | 6.5       | 4     |
|           |         | North America | 2018 | North America | CONTINENT | COUNTRY          |               | 50        | 12.5      | 4     |
|           | France  | Europe        | 2018 | France        | COUNTRY   | CITY             | Europe        | 15        | 7.5       | 2     |
|           | Spain   | Europe        | 2018 | Spain         | COUNTRY   | CITY             | Europe        | 11        | 5.5       | 2     |
|           | USA     | North America | 2018 | USA           | COUNTRY   | CITY             | North America | 33        | 16.5      | 2     |
|           | Canada  | North America | 2018 | Canada        | COUNTRY   | CITY             | North America | 17        | 8.5       | 2     |
| Paris     | France  | Europe        | 2018 | Paris         | CITY      |                  | France        | 10        | 10        | 1     |
| Bordeaux  | France  | Europe        | 2018 | Bordeaux      | CITY      |                  | France        | 5         | 5         | 1     |
| Barcelona | Spain   | Europe        | 2018 | Barcelona     | CITY      |                  | Spain         | 8         | 8         | 1     |
| Madrid    | Spain   | Europe        | 2018 | Madrid        | CITY      |                  | Spain         | 3         | 3         | 1     |
| Boston    | USA     | North America | 2018 | Boston        | CITY      |                  | USA           | 12        | 12        | 1     |
| New-York  | USA     | North America | 2018 | New-York      | CITY      |                  | USA           | 21        | 21        | 1     |
| Montreal  | Canada  | North America | 2018 | Montreal      | CITY      |                  | Canada        | 10        | 10        | 1     |
| Ottawa    | Canada  | North America | 2018 | Ottawa        | CITY      |                  | Canada        | 7         | 7         | 1     |
|           |         | North America | 2019 | Europe        | CONTINENT | COUNTRY          |               | 38        | 9.5       | 4     |
|           |         | North America | 2019 | North America | CONTINENT | COUNTRY          |               | 62        | 15.5      | 4     |
|           | France  | Europe        | 2019 | France        | COUNTRY   | CITY             | Europe        | 21        | 10.5      | 2     |
|           | Spain   | Europe        | 2019 | Spain         | COUNTRY   | CITY             | Europe        | 17        | 8.5       | 2     |
|           | USA     | North America | 2019 | USA           | COUNTRY   | CITY             | North America | 39        | 19.5      | 2     |
|           | Canada  | North America | 2019 | Canada        | COUNTRY   | CITY             | North America | 23        | 11.5      | 2     |
| Paris     | France  | Europe        | 2019 | Paris         | CITY      |                  | France        | 13        | 13        | 1     |
| Bordeaux  | France  | Europe        | 2019 | Bordeaux      | CITY      |                  | France        | 8         | 8         | 1     |
| Barcelona | Spain   | Europe        | 2019 | Barcelona     | CITY      |                  | Spain         | 11        | 11        | 1     |
| Madrid    | Spain   | Europe        | 2019 | Madrid        | CITY      |                  | Spain         | 6         | 6         | 1     |
| Boston    | USA     | North America | 2019 | Boston        | CITY      |                  | USA           | 15        | 15        | 1     |
| New-York  | USA     | North America | 2019 | New-York      | CITY      |                  | USA           | 24        | 24        | 1     |
| Montreal  | Canada  | North America | 2019 | Montreal      | CITY      |                  | Canada        | 10        | 10        | 1     |
| Ottawa    | Canada  | North America | 2019 | Ottawa        | CITY      |                  | Canada        | 13        | 13        | 1     |

### `select` step

Select a column. The default is to keep every columns of the input domain. If the `select` is used, it will only keep selected columns in the output.

```json
{
    "name": "select",
    "columns": [ "my-column", "some-other-column"]
}

```

#### Example

**Input dataset:**

| Company   | Group   | Value | Label               |
| --------- | ------- | ----- | ------------------- |
| Company 1 | Group 1 | 13    | Company 1 - Group 1 |
| Company 2 | Group 1 | 7     | Company 2 - Group 1 |
| Company 3 | Group 1 | 20    | Company 3 - Group 1 |
| Company 4 | Group 2 | 1     | Company 4 - Group 2 |
| Company 5 | Group 2 | 10    | Company 5 - Group 2 |
| Company 6 | Group 2 | 5     | Company 6 - Group 2 |

**Step configuration:**

```json
{
  {
    "name": "select",
    "columns": [ "Value", "Label"]
}
}

```

**Output dataset:**

| Value | Label               |
| ----- | ------------------- |
| 13    | Company 1 - Group 1 |
| 7     | Company 2 - Group 1 |
| 20    | Company 3 - Group 1 |
| 1     | Company 4 - Group 2 |
| 10    | Company 5 - Group 2 |
| 5     | Company 6 - Group 2 |

### `sort` step

Sort values in one or several columns. Order can be either 'asc' or 'desc'. When sorting on several columns, order of columns specified in `columns` matters.

```json
{
    "name": "sort",
    "columns": [{"column": "foo", "order": "asc"}, {"column": "bar", "order": "desc"}]
}

```

#### Example

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

**Step configuration:**

```json
{
    "name": "sort",
    "columns": [{ "column": "Group", "order": "asc"}, {"column": "Value", "order": "desc" }]
}

```

**Output dataset:**

| Company | Group   | Value |
| ------- | ------- | ----- |
| Label 3 | Group 1 | 20    |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |
| Label 4 | Group 2 | 1     |

### `split` step

Split a string `column` into several columns based on a `delimiter`.

```json
{
  "name": "split",
  "column": "foo",
  "delimiter": " - ",
  "numberColsToKeep": 3

}

```

#### Example 1

**Input dataset:**

| Label                      | Value |
| -------------------------- | ----- |
| Label 1 - Group 1 - France | 13    |
| Label 2 - Group 1 - Spain  | 7     |
| Label 3 - Group 1 - USA    | 20    |
| Label 4 - Group 2 - France | 1     |
| Label 5 - Group 2 - Spain  | 10    |
| Label 6 - Group 2 - USA    | 5     |

**Step configuration:**

```json
{
  "name": "split",
  "column": "Label",
  "delimiter": " - ",
  "numberColsToKeep": 3
}

```

**Output dataset:**

| Label\_1 | Label\_2 | Label\_3 | Value |
| -------- | -------- | -------- | ----- |
| Label 1  | Group 1  | Spain    | 13    |
| Label 2  | Group 1  | USA      | 7     |
| Label 3  | Group 1  | France   | 20    |
| Label 4  | Group 2  | USA      | 1     |
| Label 5  | Group 2  | France   | 10    |
| Label 6  | Group 2  | Spain    | 5     |

#### Example 2: keeping less columns

**Input dataset:**

| Label                      | Value |
| -------------------------- | ----- |
| Label 1 - Group 1 - France | 13    |
| Label 2 - Group 1 - Spain  | 7     |
| Label 3 - Group 1 - USA    | 20    |
| Label 4 - Group 2 - France | 1     |
| Label 5 - Group 2 - Spain  | 10    |
| Label 6 - Group 2 - USA    | 5     |

**Step configuration:**

```json
{
  "name": "split",
  "column": "Label",
  "delimiter": " - ",
  "numberColsToKeep": 2
}

```

**Output dataset:**

| Label\_1 | Label\_2 | Value |
| -------- | -------- | ----- |
| Label 1  | Group 1  | 13    |
| Label 2  | Group 1  | 7     |
| Label 3  | Group 1  | 20    |
| Label 4  | Group 2  | 1     |
| Label 5  | Group 2  | 10    |
| Label 6  | Group 2  | 5     |

### `simplify` step

Simplifies geographical data.

When simplifying your data, every point that is closer than a specific distance to the previous one is suppressed. This step can be useful if you have a very precise shape for a country (such as one-meter precision), but want to quickly draw a map chart. In that case, you may want to *simplify* your data.

After simplification, no points will be closer than `tolerance`. The unit depends on data's projection and on its unit, but in general, it's expressed in meters for CRS projections. For more details, see the [GeoPandas documentation](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.simplify.html).

**Step configuration:**

```json
{
  "name": "simplify",
  "tolerance": 1.0
}

```

### `substring` step

Extract a substring in a string `column`. The substring begins at index `start_index` (beginning at `1`) and stops at `end_index`. You can specify negative indexes, in such a case the index search will start from the end of the string (with `-1` being the last index of the string). Please refer to the examples below for illustration. Neither `start_index` nor `end_index` can be equal to 0.

```json
{
  "name": "substring",
  "column": "foo",
  "startIndex": 1,
  "endIndex": -1,
  "newColumnName": "myNewColumn"
}

```

#### Example 1: positive `start_index` and `end_index`

**Input dataset:**

| Group      | Value |
| ---------- | ----- |
| foo        | 13    |
| overflow   | 7     |
| some\_text | 20    |
| a\_word    | 1     |
| toucan     | 10    |
| toco       | 5     |

**Step configuration:**

```json
{
  "column": "Label",
  "name": "substring",
  "startIndex": 1,
  "endIndex": 4
}

```

| Label      | Value | Label\_PCT |
| ---------- | ----- | ---------- |
| foo        | 13    | foo        |
| overflow   | 7     | over       |
| some\_text | 20    | some       |
| a\_word    | 1     | a\_wo      |
| toucan     | 10    | touc       |
| toco       | 5     | toco       |

#### Example 2: `start_index` is positive and `end_index` is negative

**Input dataset:**

| Label      | Value |
| ---------- | ----- |
| foo        | 13    |
| overflow   | 7     |
| some\_text | 20    |
| a\_word    | 1     |
| toucan     | 10    |
| toco       | 5     |

**Step configuration:**

```json
{
  "name": "substring",
  "column": "Label",
  "startIndex": 2,
  "endIndex": -2,
  "newColumnName": "short_label"
}

```

**Output dataset:**

| Label      | Value | short\_label |
| ---------- | ----- | ------------ |
| foo        | 13    | o            |
| overflow   | 7     | verflo       |
| some\_text | 20    | ome\_tex     |
| a\_word    | 1     | \_wor        |
| toucan     | 10    | ouca         |
| toco       | 5     | oc           |

#### Example 3: `start_index` and `end_index` are negative

**Input dataset:**

| Label      | Value |
| ---------- | ----- |
| foo        | 13    |
| overflow   | 7     |
| some\_text | 20    |
| a\_word    | 1     |
| toucan     | 10    |
| toco       | 5     |

**Step configuration:**

```json
{
  "name": "substring",
  "column": "Label",
  "startIndex": -3,
  "endIndex": -1
}

```

**Output dataset:**

| Label      | Value | Label\_PCT |
| ---------- | ----- | ---------- |
| foo        | 13    | foo        |
| overflow   | 7     | low        |
| some\_text | 20    | ext        |
| a\_word    | 1     | ord        |
| toucan     | 10    | can        |
| toco       | 5     | oco        |

### `text` step

Use this step to add a text column where every value will be equal to the specified `text`.

```json
{
  {
    "name": "text",
    "newColumn": "new",
    "text": "some text"
  }
}

```

#### Example

**Input dataset:**

| Label   | Value1 |
| ------- | ------ |
| Label 1 | 10     |
| Label 2 | 1      |
| Label 3 | 5      |

**Step configuration:**

```json
{
  "name": "text",
  "newColumn": "KPI",
  "text": "Sales"
}

```

**Output dataset:**

| Label   | Value1 | KPI   |
| ------- | ------ | ----- |
| Label 1 | 10     | Sales |
| Label 2 | 1      | Sales |
| Label 3 | 5      | Sales |

### `todate` step

Converts a string `column` into a date column based on a specified `format`.

```json
{
    "name": "todate",
    "column": "myTextColumn",
    "format": "%Y-%m-%d"



}

```

#### Example

**Input dataset:**

| Company   | Date       | Value |
| --------- | ---------- | ----- |
| Company 1 | 06/10/2019 | 13    |
| Company 1 | 07/10/2019 | 7     |
| Company 1 | 08/10/2019 | 20    |
| Company 2 | 06/10/2019 | 1     |
| Company 2 | 07/10/2019 | 10    |
| Company 2 | 08/10/2019 | 5     |

**Step configuration:**

```json
{
  "name": "todate",
  "column": "Date",
  "format": "%d/%m/%Y"
}

```

**Output dataset:**

| Company   | Date                     | Value |
| --------- | ------------------------ | ----- |
| Company 1 | 2019-10-06T00:00:00.000Z | 13    |
| Company 1 | 2019-10-07T00:00:00.000Z | 7     |
| Company 1 | 2019-10-08T00:00:00.000Z | 20    |
| Company 2 | 2019-10-06T00:00:00.000Z | 1     |
| Company 2 | 2019-10-07T00:00:00.000Z | 10    |
| Company 2 | 2019-10-08T00:00:00.000Z | 5     |

### `top` step

Return top N rows by group if `groups` is specified, else over full dataset.

```json
{
  "name": "top",
  "groups": [ "foo"],
  "rankOn": "bar",
  "sort": "desc",
  "limit": 10
}

```

#### Example 1: top without `groups`, ascending order

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

**Step configuration:**

```json
{
  "name": "top",
  "rankOn": "Value",
  "sort": "asc",
  "limit": 3
}

```

**Output dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 4 | Group 2 | 1     |
| Label 6 | Group 2 | 5     |
| Label 2 | Group 1 | 7     |

#### Example 2: top with `groups`, descending order

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 4 | Group 2 | 1     |
| Label 5 | Group 2 | 10    |
| Label 6 | Group 2 | 5     |

**Step configuration:**

```json
{
  "name": "top",
  "groups": [ "Group"],
  "rankOn": "Value",
  "sort": "desc",
  "limit": 1
}

```

**Output dataset:**

| Company | Group   | Value |
| ------- | ------- | ----- |
| Label 3 | Group 1 | 20    |
| Label 5 | Group 2 | 10    |

### `totals` step

Append "total" rows to the dataset for specified dimensions. Computed rows result from an aggregation (either sum, average, count, count distinct, min, max, first or last)

```json
{
  "name": "totals",

  "totalDimensions": [
    { "totalColumn": "foo", "totalRowsLabel": "Total foos" },
    { "totalColumn": "bar", "totalRowsLabel": "Total bars" }
  ],
  "aggregations": [
    {
      "columns": [ "value1", "value2"]
      "aggfunction": "sum",
      "newcolumns": [ "sum_value1", "sum_value2"]
    },
    {
      "columns": [ "value1"]
      "aggfunction": "avg",
      "newcolumns": [ "avg_value1"]
    }
   ],
  "groups": [ "someDimension"]
}

```

#### Example 1: basic usage

**Input dataset:**

| COUNTRY | PRODUCT   | YEAR | VALUE |
| ------- | --------- | ---- | ----- |
| France  | product A | 2019 | 5     |
| USA     | product A | 2019 | 10    |
| France  | product B | 2019 | 10    |
| USA     | product B | 2019 | 15    |
| France  | product A | 2020 | 20    |
| USA     | product A | 2020 | 20    |
| France  | product B | 2020 | 30    |
| USA     | product B | 2020 | 25    |

**Step configuration:**

```json
{
  "name": "totals",
  "totalDimensions": [{ "totalColumn": "COUNTRY", "totalRowsLabel": "All countries" }],
  "aggregations": [
    {
      "columns": [ "VALUE"]
      "aggfunction": "sum",
      "newcolumns": [ "VALUE"]
    }
   ]
}

```

**Output dataset:**

| COUNTRY       | PRODUCT   | YEAR | VALUE |
| ------------- | --------- | ---- | ----- |
| France        | product A | 2019 | 5     |
| USA           | product A | 2019 | 10    |
| France        | product B | 2019 | 10    |
| USA           | product B | 2019 | 15    |
| France        | product A | 2020 | 20    |
| USA           | product A | 2020 | 20    |
| France        | product B | 2020 | 30    |
| USA           | product B | 2020 | 25    |
| All countries | null      | null | 135   |

#### Example 2: With several totals and groups

**Input dataset:**

| COUNTRY | PRODUCT   | YEAR | VALUE\_1 | VALUE\_2 |
| ------- | --------- | ---- | -------- | -------- |
| France  | product A | 2019 | 5        | 50       |
| USA     | product A | 2019 | 10       | 100      |
| France  | product B | 2019 | 10       | 100      |
| USA     | product B | 2019 | 15       | 150      |
| France  | product A | 2020 | 20       | 200      |
| USA     | product A | 2020 | 20       | 200      |
| France  | product B | 2020 | 30       | 300      |
| USA     | product B | 2020 | 25       | 250      |

**Step configuration:**

```json
{
  "name": "totals",
  "totalDimensions": [
    {"totalColumn": "COUNTRY", "totalRowsLabel": "All countries"},
    {"totalColumn": "PRODUCT", "totalRowsLabel": "All products"}
  ],
  "aggregations": [
    {
      "columns": [ "VALUE_1-sum", "VALUE_2"]
      "aggfunction": "sum",
      "newcolumns": [ "VALUE_1", "VALUE_2"]
    },
    {
      "columns": [ "VALUE_1-avg"]
      "aggfunction": "avg",
      "newcolumns": [ "VALUE_1"]
    }
   ],
   "groups": [ "YEAR"]
}

```

**Output dataset:**

| COUNTRY       | PRODUCT      | YEAR | VALUE\_2 | VALUE\_1-sum | VALUE\_1-avg |
| ------------- | ------------ | ---- | -------- | ------------ | ------------ |
| France        | product A    | 2019 | 50       | 5            | 5            |
| USA           | product A    | 2019 | 100      | 10           | 10           |
| France        | product B    | 2019 | 100      | 10           | 10           |
| USA           | product B    | 2019 | 150      | 15           | 15           |
| France        | product A    | 2020 | 200      | 20           | 20           |
| USA           | product A    | 2020 | 200      | 20           | 20           |
| France        | product B    | 2020 | 300      | 30           | 30           |
| USA           | product B    | 2020 | 250      | 25           | 25           |
| USA           | All products | 2020 | 450      | 45           | 22.5         |
| France        | All products | 2020 | 500      | 50           | 25           |
| USA           | All products | 2019 | 250      | 25           | 12.5         |
| France        | All products | 2019 | 150      | 15           | 7.5          |
| All countries | product B    | 2020 | 550      | 55           | 27.5         |
| All countries | product A    | 2020 | 400      | 40           | 20           |
| All countries | product B    | 2019 | 250      | 25           | 12.5         |
| All countries | product A    | 2019 | 150      | 15           | 7.5          |
| All countries | All products | 2020 | 950      | 95           | 23.75        |
| All countries | All products | 2019 | 400      | 40           | 10           |

### `trim` step

Trim spaces in a column.

```json
{
    "name": "trim",
    "columns": [ "my-column", "some-other-column"]
}

```

#### Example

**Input dataset:**

| Company       | Group   | Value | Label               |
| ------------- | ------- | ----- | ------------------- |
| ' Company 1 ' | Group 1 | 13    | Company 1 - Group 1 |
| ' Company 2 ' | Group 1 | 7     | Company 2 - Group 1 |

**Step configuration:**

```json
{
  "name": "trim",
  "columns": [ "Company"]
}

```

**Output dataset:**

| Company     | Group   | Value | Label               |
| ----------- | ------- | ----- | ------------------- |
| 'Company 1' | Group 1 | 13    | Company 1 - Group 1 |
| 'Company 2' | Group 1 | 7     | Company 2 - Group 1 |

### `unpivot` step

Unpivot a list of columns to rows.

```json
{
  "name": "unpivot",
  "keep": [ "COMPANY", "COUNTRY"],
  "unpivot": [ "NB_CLIENTS", "REVENUES"],
  "unpivotColumnName": "KPI",
  "valueColumnName": "VALUE",
  "dropna": true
}

```

#### Example 1: with `dropna`parameter to true

**Input dataset:**

| COMPANY   | COUNTRY | NB\_CLIENTS | REVENUES |
| --------- | ------- | ----------- | -------- |
| Company 1 | France  | 7           | 10       |
| Company 2 | France  | 2           |          |
| Company 1 | USA     | 12          | 6        |
| Company 2 | USA     | 1           | 3        |

**Step configuration:**

```json
{
  "name": "unpivot",
  "keep": [ "COMPANY", "COUNTRY"],
  "unpivot": [ "NB_CLIENTS", "REVENUES"],
  "unpivotColumnName": "KPI",
  "valueColumnName": "VALUE",
  "dropna": true
}

```

**Output dataset:**

| COMPANY   | COUNTRY | KPI         | VALUE |
| --------- | ------- | ----------- | ----- |
| Company 1 | France  | NB\_CLIENTS | 7     |
| Company 1 | France  | REVENUES    | 10    |
| Company 2 | France  | NB\_CLIENTS | 2     |
| Company 1 | USA     | NB\_CLIENTS | 12    |
| Company 1 | USA     | REVENUES    | 6     |
| Company 2 | USA     | NB\_CLIENTS | 1     |
| Company 2 | USA     | REVENUES    | 3     |

#### Example 1: with `dropna`parameter to false

**Input dataset:**

| COMPANY   | COUNTRY | NB\_CLIENTS | REVENUES |
| --------- | ------- | ----------- | -------- |
| Company 1 | France  | 7           | 10       |
| Company 2 | France  | 2           |          |
| Company 1 | USA     | 12          | 6        |
| Company 2 | USA     | 1           | 3        |

**Step configuration:**

```json
{
  "name": "unpivot",
  "keep": [ "COMPANY", "COUNTRY"],
  "unpivot": [ "NB_CLIENTS", "REVENUES"],
  "unpivotColumnName": "KPI",
  "valueColumnName": "VALUE",
  "dropna": false
}

```

**Output dataset:**

| COMPANY   | COUNTRY | KPI         | VALUE |
| --------- | ------- | ----------- | ----- |
| Company 1 | France  | NB\_CLIENTS | 7     |
| Company 1 | France  | REVENUES    | 10    |
| Company 2 | France  | NB\_CLIENTS | 2     |
| Company 2 | France  | REVENUES    |       |
| Company 1 | USA     | NB\_CLIENTS | 12    |
| Company 1 | USA     | REVENUES    | 6     |
| Company 2 | USA     | NB\_CLIENTS | 1     |
| Company 2 | USA     | REVENUES    | 3     |

### `uppercase` step

Converts a string `column` to uppercase.

```json
{
  "name": "uppercase",
  "column": "foo"
}

```

#### Example:

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |

**Step configuration:**

```json
{
  "name": "uppercase",
  "column": "Label"
}

```

**Output dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| LABEL 1 | Group 1 | 13    |
| LABEL 2 | Group 1 | 7     |
| LABEL 3 | Group 1 | 20    |

### `uniquegroups` step

Allow to get unique groups of values from one or several columns.

```json
{
  "name": "uniquegroups",
  "on": [ "foo", "bar"]
}

```

#### Example:

**Input dataset:**

| Label   | Group   | Value |
| ------- | ------- | ----- |
| Label 1 | Group 1 | 13    |
| Label 2 | Group 1 | 7     |
| Label 3 | Group 1 | 20    |
| Label 1 | Group 2 | 1     |
| Label 2 | Group 1 | 2     |
| Label 3 | Group 1 | 3     |

**Step configuration:**

```json
{
  "name": "uniquegroups",
  "column": [ "Label", "Group"]
}

```

**Output dataset:**

| Label   | Group   |
| ------- | ------- |
| Label 1 | Group 1 |
| Label 1 | Group 2 |
| Label 2 | Group 1 |
| Label 3 | Group 1 |

### `waterfall` step

This step allows to generate a data structure useful to build waterfall charts. It breaks down the variation between two values (usually between two dates) accross entities. Entities are found in the `labelsColumn`, and can optionally be regrouped under common parents found in the `parentsColumn` for drill-down purposes.

```json
{
  "name": "waterfall",
  "valueColumn": "VALUE",
  "milestonesColumn": "DATE",

  "start": "2019",
  "end": "2020",
  "labelsColumn": "PRODUCT",
  "groupby": [ "COUNTRY"],
  "sortBy": "value",

  "order": "desc"

}

```

#### Example 1: Basic usage

**Input dataset:**

| city     | year | revenue |
| -------- | ---- | ------- |
| Bordeaux | 2019 | 135     |
| Boston   | 2019 | 275     |
| New-York | 2019 | 115     |
| Paris    | 2019 | 450     |
| Bordeaux | 2018 | 98      |
| Boston   | 2018 | 245     |
| New-York | 2018 | 103     |
| Paris    | 2018 | 385     |

**Step configuration:**

```json
{
  "name": "waterfall",
  "valueColumn": "revenue",
  "milestonesColumn": "year",
  "start": "2018",
  "end": "2019",
  "labelsColumn": "city",
  "sortBy": "value",
  "order": "desc"
}

```

**Output dataset:**

| LABEL\_waterfall | TYPE\_waterfall | revenue |
| ---------------- | --------------- | ------- |
| 2018             | null            | 831     |
| Paris            | parent          | 65      |
| Bordeaux         | parent          | 37      |
| Boston           | parent          | 30      |
| New-York         | parent          | 12      |
| 2019             | null            | 975     |

#### Example 2: With more options

**Input dataset:**

| city     | country | product  | year | revenue |
| -------- | ------- | -------- | ---- | ------- |
| Bordeaux | France  | product1 | 2019 | 65      |
| Bordeaux | France  | product2 | 2019 | 70      |
| Paris    | France  | product1 | 2019 | 210     |
| Paris    | France  | product2 | 2019 | 240     |
| Boston   | USA     | product1 | 2019 | 130     |
| Boston   | USA     | product2 | 2019 | 145     |
| New-York | USA     | product1 | 2019 | 55      |
| New-York | USA     | product2 | 2019 | 60      |
| Bordeaux | France  | product1 | 2018 | 38      |
| Bordeaux | France  | product2 | 2018 | 60      |
| Paris    | France  | product1 | 2018 | 175     |
| Paris    | France  | product2 | 2018 | 210     |
| Boston   | USA     | product1 | 2018 | 95      |
| Boston   | USA     | product2 | 2018 | 150     |
| New-York | USA     | product1 | 2018 | 50      |
| New-York | USA     | product2 | 2018 | 53      |

**Step configuration:**

```json
{
  "name": "waterfall",
  "valueColumn": "revenue",
  "milestonesColumn": "year",
  "start": "2018",
  "end": "2019",
  "labelsColumn": "city",
  "parentsColumn": "country",
  "groupby": [ "product"],
  "sortBy": "label",
  "order": "asc"
}

```

**Output dataset:**

| LABEL\_waterfall | GROUP\_waterfall | TYPE\_waterfall | product  | revenue |
| ---------------- | ---------------- | --------------- | -------- | ------- |
| 2018             | 2018             | null            | product1 | 358     |
| 2018             | 2018             | null            | product2 | 473     |
| Bordeaux         | France           | child           | product1 | 27      |
| Bordeaux         | France           | child           | product2 | 10      |
| Boston           | USA              | child           | product1 | 35      |
| Boston           | USA              | child           | product2 | -5      |
| France           | France           | parent          | product2 | 40      |
| France           | France           | parent          | product1 | 62      |
| New-York         | USA              | child           | product1 | 5       |
| New-York         | USA              | child           | product2 | 7       |
| Paris            | France           | child           | product1 | 35      |
| Paris            | France           | child           | product2 | 30      |
| USA              | USA              | parent          | product2 | 2       |
| USA              | USA              | parent          | product1 | 40      |
| 2019             | 2019             | null            | product2 | 515     |
| 2019             | 2019             | null            | product1 | 460     |


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://docs.toucanai.cloud/references/apis/data-steps.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
