Data steps

Steps are the building blocks of data transformation. They are used in chart queries: each step is a single operation in the query (filtering, aggregating, renaming columns, etc.). Steps run in sequence; the output of one step becomes the input of the next.

The JSON examples in this reference can be copied and adapted for your chart queries. Use the name field to identify the step type and fill in the other properties as described for each step below.
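To make the sequencing concrete, here is a minimal Python sketch of how a list of step objects could be applied one after another. The `HANDLERS` table, the `run_pipeline` function and the two handlers are hypothetical stand-ins written for this illustration, not Weaverbird's actual implementation; only the step fields (`name`, `column`, `newColumn`, `columns`, `value`) come from this reference.

```python
def fillna(rows, step):
    # Replace None with step["value"] in the listed columns.
    return [
        {k: (step["value"] if k in step["columns"] and v is None else v)
         for k, v in row.items()}
        for row in rows
    ]


def absolutevalue(rows, step):
    # Add a new column holding the absolute value of an existing one.
    return [{**row, step["newColumn"]: abs(row[step["column"]])} for row in rows]


# Hypothetical dispatch table: step name -> handler function.
HANDLERS = {"fillna": fillna, "absolutevalue": absolutevalue}


def run_pipeline(rows, steps):
    # The output of each step becomes the input of the next one.
    for step in steps:
        rows = HANDLERS[step["name"]](rows, step)
    return rows


pipeline = [
    {"name": "fillna", "columns": ["Value"], "value": 0},
    {"name": "absolutevalue", "column": "Value", "newColumn": "Abs"},
]
data = [
    {"Company": "Company 1", "Value": -33},
    {"Company": "Company 2", "Value": None},
]
result = run_pipeline(data, pipeline)
```

Swapping the two steps would fail on the `None` value, which illustrates why step order matters.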

`absolutevalue` step

This step computes the absolute value of a given input column and stores it in a new column.

{
    "name": "absolutevalue",
    "column": "my-column",
    "newColumn": "my-new-column"
}

Example

Input dataset:

Company

Value

Company 1

-33

Company 2

Company 3

Step configuration:

{
  "name": "absolutevalue",
  "column": "Value",
  "newColumn": "My-absolute-value"
}

Output dataset:

Company

Value

My-absolute-value

Company 1

-33

Company 2

Company 3

`addmissingdates` step

Adds missing dates as new rows in a dates column. The complete range of dates spans from the minimum to the maximum date found in the dataset (or in each group if group-by logic is applied, see below).

Added rows are set to null in columns not referenced in the step configuration.

Use group-by logic if you want to add missing dates in independent groups of rows (e.g. you may need to add missing rows for every country found in a "COUNTRY" column). Also make sure that every date is unique within each group of rows at the specified granularity, otherwise you may get inconsistent results. You can specify the "group by" columns in the groups parameter.

An addmissingdates step has the following structure:

{
  "name": "addmissingdates",
  "datesColumn": "DATE",
  "datesGranularity": "day",
  "groups": [ "COUNTRY"]
}
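The behaviour described above can be sketched in a few lines of Python. This is only an illustration for the day granularity: the `add_missing_days` function and its signature are hypothetical, and dates are modelled as `datetime.date` objects rather than the ISO strings shown in the examples.

```python
from datetime import date, timedelta
from itertools import groupby


def add_missing_days(rows, dates_column, groups=()):
    """Fill gaps between the min and max date of each group with new rows."""
    def key(row):
        return tuple(row[g] for g in groups)

    out = []
    for group_key, group_rows in groupby(sorted(rows, key=key), key=key):
        by_date = {row[dates_column]: row for row in group_rows}
        day, last = min(by_date), max(by_date)
        while day <= last:
            # A filler row only carries the date and the group columns;
            # every other column is set to None (null).
            filler = {col: None for col in rows[0]}
            filler.update(zip(groups, group_key))
            filler[dates_column] = day
            out.append(by_date.get(day, filler))
            day += timedelta(days=1)
    return out


rows = [
    {"COUNTRY": "France", "DATE": date(2018, 1, 1), "VALUE": 1},
    {"COUNTRY": "France", "DATE": date(2018, 1, 3), "VALUE": 3},
    {"COUNTRY": "USA", "DATE": date(2018, 1, 2), "VALUE": 5},
    {"COUNTRY": "USA", "DATE": date(2018, 1, 4), "VALUE": 7},
]
filled = add_missing_days(rows, "DATE", groups=["COUNTRY"])
```

Each group is completed only between its own first and last date, so the France rows stop at 2018-01-03 even though the USA rows go up to 2018-01-04.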

Example 1: day granularity without groups

Input dataset:

DATE

VALUE

"2018-01-01T00:00:00.000Z"

"2018-01-02T00:00:00.000Z"

"2018-01-03T00:00:00.000Z"

"2018-01-04T00:00:00.000Z"

"2018-01-05T00:00:00.000Z"

"2018-01-07T00:00:00.000Z"

"2018-01-08T00:00:00.000Z"

"2018-01-09T00:00:00.000Z"

"2018-01-10T00:00:00.000Z"

"2018-01-11T00:00:00.000Z"

Here the day "2018-01-06" is missing.

Step configuration:

{
  "name": "addmissingdates",
  "datesColumn": "DATE",
  "datesGranularity": "day"
}

Output dataset:

DATE

VALUE

"2018-01-01T00:00:00.000Z"

"2018-01-02T00:00:00.000Z"

"2018-01-03T00:00:00.000Z"

"2018-01-04T00:00:00.000Z"

"2018-01-05T00:00:00.000Z"

"2018-01-06T00:00:00.000Z"

"2018-01-07T00:00:00.000Z"

"2018-01-08T00:00:00.000Z"

"2018-01-09T00:00:00.000Z"

"2018-01-10T00:00:00.000Z"

"2018-01-11T00:00:00.000Z"

Example 2: day granularity with groups

Input dataset:

COUNTRY

DATE

VALUE

France

"2018-01-01T00:00:00.000Z"

France

"2018-01-02T00:00:00.000Z"

France

"2018-01-03T00:00:00.000Z"

France

"2018-01-04T00:00:00.000Z"

France

"2018-01-05T00:00:00.000Z"

France

"2018-01-07T00:00:00.000Z"

France

"2018-01-08T00:00:00.000Z"

France

"2018-01-09T00:00:00.000Z"

France

"2018-01-10T00:00:00.000Z"

France

"2018-01-11T00:00:00.000Z"

USA

"2018-01-01T00:00:00.000Z"

USA

"2018-01-02T00:00:00.000Z"

USA

"2018-01-03T00:00:00.000Z"

USA

"2018-01-05T00:00:00.000Z"

USA

"2018-01-06T00:00:00.000Z"

USA

"2018-01-07T00:00:00.000Z"

USA

"2018-01-08T00:00:00.000Z"

USA

"2018-01-09T00:00:00.000Z"

USA

"2018-01-10T00:00:00.000Z"

USA

"2018-01-12T00:00:00.000Z"

Here the day "2018-01-06" is missing for "France" rows, and "2018-01-04" and "2018-01-11" are missing for "USA" rows.

Note that "2018-01-12" will not be considered as a missing row for "France" rows, because the latest date found for this group of rows is "2018-01-11" (even though "2018-01-12" is the latest date found for "USA" rows).

Step configuration:

{
  "name": "addmissingdates",
  "datesColumn": "DATE",
  "datesGranularity": "day",
  "groups": ["COUNTRY"]
}

Output dataset:

COUNTRY

DATE

VALUE

France

"2018-01-01T00:00:00.000Z"

France

"2018-01-02T00:00:00.000Z"

France

"2018-01-03T00:00:00.000Z"

France

"2018-01-04T00:00:00.000Z"

France

"2018-01-05T00:00:00.000Z"

France

"2018-01-06T00:00:00.000Z"

France

"2018-01-07T00:00:00.000Z"

France

"2018-01-08T00:00:00.000Z"

France

"2018-01-09T00:00:00.000Z"

France

"2018-01-10T00:00:00.000Z"

France

"2018-01-11T00:00:00.000Z"

USA

"2018-01-01T00:00:00.000Z"

USA

"2018-01-02T00:00:00.000Z"

USA

"2018-01-03T00:00:00.000Z"

USA

"2018-01-04T00:00:00.000Z"

USA

"2018-01-05T00:00:00.000Z"

USA

"2018-01-06T00:00:00.000Z"

USA

"2018-01-07T00:00:00.000Z"

USA

"2018-01-08T00:00:00.000Z"

USA

"2018-01-09T00:00:00.000Z"

USA

"2018-01-10T00:00:00.000Z"

USA

"2018-01-11T00:00:00.000Z"

USA

"2018-01-12T00:00:00.000Z"

Example 3: month granularity

Input dataset:

DATE

VALUE

"2019-01-01T00:00:00.000Z"

"2019-02-01T00:00:00.000Z"

"2019-03-01T00:00:00.000Z"

"2019-04-01T00:00:00.000Z"

"2019-06-01T00:00:00.000Z"

"2019-07-01T00:00:00.000Z"

"2019-08-01T00:00:00.000Z"

"2019-09-01T00:00:00.000Z"

"2019-10-01T00:00:00.000Z"

"2019-12-01T00:00:00.000Z"

Here "2019-05" and "2019-11" are missing.

Step configuration:

{
  "name": "addmissingdates",
  "datesColumn": "DATE",
  "datesGranularity": "month"
}

Output dataset:

DATE

VALUE

"2019-01-01T00:00:00.000Z"

"2019-02-01T00:00:00.000Z"

"2019-03-01T00:00:00.000Z"

"2019-04-01T00:00:00.000Z"

"2019-05-01T00:00:00.000Z"

"2019-06-01T00:00:00.000Z"

"2019-07-01T00:00:00.000Z"

"2019-08-01T00:00:00.000Z"

"2019-09-01T00:00:00.000Z"

"2019-10-01T00:00:00.000Z"

"2019-11-01T00:00:00.000Z"

"2019-12-01T00:00:00.000Z"

`aggregate` step

Perform aggregations on one or several columns. Available aggregation functions are sum, average, count, count distinct, min, max, first, last.

An aggregate step has the following structure:

{
  "name": "aggregate",
  "on": ["column1", "column2"],
  "aggregations": [
    {
      "newcolumns": ["sum_value1", "sum_value2"],
      "aggfunction": "sum",
      "columns": ["value1", "value2"]
    },
    {
      "newcolumns": ["avg_value1"],
      "aggfunction": "avg",
      "columns": ["value1"]
    }
  ],
  "keepOriginalGranularity": false
}
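As a rough illustration of what the two values of keepOriginalGranularity mean, the following Python sketch groups rows and either returns one row per group or re-attaches the aggregates to every input row. The `aggregate` function and the `AGG_FUNCTIONS` table are hypothetical helpers covering only a few functions, and the sample values are chosen for illustration.

```python
from collections import defaultdict

# Subset of aggregation functions, for illustration only.
AGG_FUNCTIONS = {
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "count": len,
}


def aggregate(rows, step):
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in step["on"])].append(row)

    results = {}
    for key, members in groups.items():
        agg_row = dict(zip(step["on"], key))
        for agg in step["aggregations"]:
            func = AGG_FUNCTIONS[agg["aggfunction"]]
            for col, new_col in zip(agg["columns"], agg["newcolumns"]):
                agg_row[new_col] = func([m[col] for m in members])
        results[key] = agg_row

    if step.get("keepOriginalGranularity"):
        # Keep every input row and attach the aggregates of its group.
        return [{**row, **results[tuple(row[c] for c in step["on"])]} for row in rows]
    # Otherwise return one row per group.
    return list(results.values())


rows = [
    {"Group": "Group 1", "Value1": 13},
    {"Group": "Group 1", "Value1": 7},
    {"Group": "Group 1", "Value1": 20},
    {"Group": "Group 2", "Value1": 1},
    {"Group": "Group 2", "Value1": 10},
    {"Group": "Group 2", "Value1": 5},
]
step = {
    "name": "aggregate",
    "on": ["Group"],
    "aggregations": [
        {"newcolumns": ["Sum-Value1"], "aggfunction": "sum", "columns": ["Value1"]},
        {"newcolumns": ["Avg-Value1"], "aggfunction": "avg", "columns": ["Value1"]},
    ],
    "keepOriginalGranularity": False,
}
summary = aggregate(rows, step)
detailed = aggregate(rows, {**step, "keepOriginalGranularity": True})
```

Here `summary` holds two rows (one per group) while `detailed` keeps the six original rows, each carrying the aggregates of its group.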

Example 1: keepOriginalGranularity set to false

Input dataset:

Label

Group

Value1

Value2

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
  "name": "aggregate",
  "on": ["Group"],
  "aggregations": [
    {
      "newcolumns": ["Sum-Value1", "Sum-Value2"],
      "aggfunction": "sum",
      "columns": ["Value1", "Value2"]
    },
    {
      "newcolumns": ["Avg-Value1"],
      "aggfunction": "avg",
      "columns": ["Value1"]
    }
  ],
  "keepOriginalGranularity": false
}

Output dataset:

Group

Sum-Value1

Sum-Value2

Avg-Value1

Group 1

13.333333

Group 2

5.333333

Example 2: keepOriginalGranularity set to true

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
  "name": "aggregate",
   "on": [ "Group"],
   "aggregations":  [
    {
      "newcolumns": [ "Total"],
      "aggfunction": "sum",
      "columns": [ "Value"]
    }
  ],
  "keepOriginalGranularity": true
}

Output dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

`append` step

Appends one or several datasets resulting from other pipelines to the current dataset. Weaverbird allows you to save pipelines referenced by name in the Vuex store of the application. You can then call them by their unique names in this step.

{
  "name": "append",
  "pipelines": [ "pipeline1", "pipeline2"]
}

Example

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

dataset1 (saved in the application Vuex store):

Label

Group

Value

Label 3

Group 1

Label 4

Group 2

dataset2 (saved in the application Vuex store):

Label

Group

Value

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
  "name": "append",
  "pipelines": [ "dataset1", "dataset2"]
}

Output dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

`argmax` step

Get row(s) matching the maximum value in a given column, by group if groups is specified.

{
  "name": "argmax",
  "column": "value",
  "groups": [ "group1", "group2"]
}
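A minimal Python sketch of this selection logic follows. The `argmax` function is a hypothetical helper and the sample values are chosen for illustration; note that ties are all kept, which is why the description says "row(s)".

```python
def argmax(rows, column, groups=()):
    """Keep the row(s) holding the maximum of `column`, per group if given."""
    best = {}
    for row in rows:
        key = tuple(row[g] for g in groups)
        best[key] = max(best.get(key, row[column]), row[column])
    return [
        row for row in rows
        if row[column] == best[tuple(row[g] for g in groups)]
    ]


rows = [
    {"Label": "Label 1", "Group": "Group 1", "Value": 13},
    {"Label": "Label 2", "Group": "Group 1", "Value": 7},
    {"Label": "Label 3", "Group": "Group 1", "Value": 20},
    {"Label": "Label 4", "Group": "Group 2", "Value": 1},
    {"Label": "Label 5", "Group": "Group 2", "Value": 10},
    {"Label": "Label 6", "Group": "Group 2", "Value": 5},
]
overall = argmax(rows, "Value")
per_group = argmax(rows, "Value", groups=["Group"])
```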

Example 1: without `groups`

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
  "name": "argmax",
  "column": "Value"
}

Output dataset:

Label

Group

Value

Label 3

Group 1

Example 2: with `groups`

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
  "name": "argmax",
  "column": "Value",
  "groups": [ "Group"]
}

Output dataset:

Label

Group

Value

Label 3

Group 1

Label 5

Group 2

`argmin` step

Get row(s) matching the minimum value in a given column, by group if groups is specified.

{
  "name": "argmin",
  "column": "value",
  "groups": [ "group1", "group2"]
}

Example 1: without `groups`

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
  "name": "argmin",
  "column": "Value"
}

Output dataset:

Label

Group

Value

Label 4

Group 2

Example 2: with `groups`

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
  "name": "argmin",
  "column": "Value",
  "groups": [ "Group"]
}

Output dataset:

Label

Group

Value

Label 2

Group 1

Label 4

Group 2

`comparetext` step

Compares 2 string columns and returns true if the string values are equal, and false otherwise. The comparison is case-sensitive (see examples below).

{
  "name": "comparetext",
  "newColumnName": "NEW",
  "strCol1": "TEXT_1",
  "strCol2": "TEXT_2"
}
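The case-sensitive comparison can be pictured with the short Python sketch below. The `compare_text` function is a hypothetical helper; only the step fields come from this reference.

```python
def compare_text(rows, step):
    """Add a boolean column telling whether two text columns are strictly equal."""
    return [
        {**row, step["newColumnName"]: row[step["strCol1"]] == row[step["strCol2"]]}
        for row in rows
    ]


step = {
    "name": "comparetext",
    "newColumnName": "RESULT",
    "strCol1": "TEXT_1",
    "strCol2": "TEXT_2",
}
rows = [
    {"TEXT_1": "France", "TEXT_2": "France"},
    {"TEXT_1": "France", "TEXT_2": "france"},  # differs only by case
]
result = compare_text(rows, step)
```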

Example

Input dataset:

TEXT_1

TEXT_2

France

france

France

England

France

USA

Step configuration:

{
  "name": "comparetext",
  "newColumnName": "RESULT",
  "strCol1": "TEXT_1",
  "strCol2": "TEXT_2"
}

Output dataset:

TEXT_1

TEXT_2

RESULT

France

false

France

true

France

france

false

France

England

false

France

USA

false

`concatenate` step

This step concatenates several columns using a separator.

{
  "name": "concatenate",
  "columns": ["Company", "Group"],
  "separator": " - ",
  "newColumnName": "Label"
}
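In Python terms, the step amounts to joining the string form of each listed column with the separator. The `concatenate` function below is a hypothetical helper written for this illustration.

```python
def concatenate(rows, step):
    """Join the listed columns with the separator into a new column."""
    return [
        {**row,
         step["newColumnName"]: step["separator"].join(
             str(row[col]) for col in step["columns"])}
        for row in rows
    ]


step = {
    "name": "concatenate",
    "columns": ["Company", "Group"],
    "separator": " - ",
    "newColumnName": "Label",
}
result = concatenate([{"Company": "Company 1", "Group": "Group 1"}], step)
```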

Example

Input dataset:

Company

Group

Value

Company 1

Group 1

Company 2

Group 1

Company 3

Group 1

Company 4

Group 2

Company 5

Group 2

Company 6

Group 2

Step configuration:

{
  "name": "concatenate",
  "columns": ["Company", "Group"],
  "separator": " - ",
  "newColumnName": "Label"
}

Output dataset:

Company

Group

Value

Label

Company 1

Group 1

Company 1 - Group 1

Company 2

Group 1

Company 2 - Group 1

Company 3

Group 1

Company 3 - Group 1

Company 4

Group 2

Company 4 - Group 2

Company 5

Group 2

Company 5 - Group 2

Company 6

Group 2

Company 6 - Group 2

`convert` step

This step converts the data type of one or several columns.

{
  "name": "convert",
  "columns": ["col1", "col2"],
  "dataType": "integer"
}

To harmonize conversion behavior across translators as much as possible, the SQL translator does not always rely on a plain CAST AS.

Specifically, when casting a float to an integer, the default SQL behavior rounds the result, whereas other languages truncate it. This is why TRUNCATE is used when converting float to integer. The same approach is used when converting strings to integer (for dates represented as strings). The conversion of dates to integer assumes that the dataset's timestamps are in TIMESTAMP_NTZ format.
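The difference between rounding and truncating is easy to see in a few lines of Python. This sketch only illustrates the two behaviours; the `to_integer_truncating` helper is hypothetical and says nothing about how any specific SQL dialect is implemented.

```python
import math


def to_integer_truncating(value):
    """Drop the fractional part (toward zero) instead of rounding."""
    return math.trunc(float(value))


samples = [2.7, -2.7, "13", "7.9"]
truncated = [to_integer_truncating(v) for v in samples]
rounded = [round(float(v)) for v in samples]
```

With these samples, `truncated` and `rounded` disagree for every value that has a fractional part, which is exactly the discrepancy the harmonization avoids.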

Example

Input dataset:

Company

Value

Company 1

'13'

Company 2

'7'

Company 3

'20'

Company 4

'1'

Company 5

'10'

Company 6

'5'

Step configuration:

{
  "name": "convert",
  "columns": ["Value"],
  "dataType": "integer"
}

Output dataset:

Company

Value

Company 1

Company 2

Company 3

Company 4

Company 5

Company 6

`cumsum` step

This step computes the cumulative sum of value columns based on a reference column (usually dates), which is sorted in ascending order for the computation. The computation can be scoped by group if needed.

The toCumSum parameter takes a list of 2-element lists in the form ['valueColumn', 'newColumn']. If newColumn is left empty, the result column is named after the value column with a _CUMSUM suffix, as in Example 1 below.

{
  "name": "cumsum",
  "toCumSum": [["myValues", "myCumsum"]],
  "referenceColumn": "myDates",
  "groupby": [ "foo", "bar"]
}
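The following Python sketch shows one way to picture the computation: rows are sorted by the reference column, then a running total is kept per group and per value column. The `cumsum` function is a hypothetical helper, the sample values are illustrative, and the `_CUMSUM` default name follows the output shown in Example 1.

```python
def cumsum(rows, step):
    group_cols = step.get("groupby", [])
    ordered = sorted(rows, key=lambda row: row[step["referenceColumn"]])
    totals = {}
    out = []
    for row in ordered:
        key = tuple(row[g] for g in group_cols)
        new_row = dict(row)
        for value_col, new_col in step["toCumSum"]:
            # One running total per (group, value column) pair.
            running = totals.get((key, value_col), 0) + row[value_col]
            totals[(key, value_col)] = running
            new_row[new_col or f"{value_col}_CUMSUM"] = running
        out.append(new_row)
    return out


rows = [
    {"COUNTRY": "France", "DATE": "2019-02", "VALUE": 5},
    {"COUNTRY": "France", "DATE": "2019-01", "VALUE": 2},
    {"COUNTRY": "USA", "DATE": "2019-01", "VALUE": 10},
    {"COUNTRY": "USA", "DATE": "2019-02", "VALUE": 1},
]
step = {
    "name": "cumsum",
    "toCumSum": [["VALUE", "MY_CUMSUM"]],
    "referenceColumn": "DATE",
    "groupby": ["COUNTRY"],
}
result = cumsum(rows, step)
france = [r["MY_CUMSUM"] for r in result if r["COUNTRY"] == "France"]
usa = [r["MY_CUMSUM"] for r in result if r["COUNTRY"] == "USA"]
```

In this sketch the output rows come back sorted by the reference column; the row order produced by the real step may differ.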

Example 1: Basic usage

Input dataset:

DATE

VALUE

2019-01

2019-02

2019-03

2019-04

2019-05

2019-06

Step configuration:

{
  "name": "cumsum",
  "toCumSum": [["VALUE", ""]],
  "referenceColumn": "DATE"
}

Output dataset:

DATE

VALUE

VALUE_CUMSUM

2019-01

2019-02

2019-03

2019-04

2019-05

2019-06

6

Example 2: With more advanced options

Input dataset:

COUNTRY

DATE

VALUE

France

2019-01

France

2019-02

France

2019-03

France

2019-04

France

2019-05

France

2019-06

6

USA

2019-01

USA

2019-02

USA

2019-03

USA

2019-04

USA

2019-05

USA

2019-06

6

Step configuration:

{
  "name": "cumsum",
  "toCumSum": [["VALUE", "MY_CUMSUM"]],
  "referenceColumn": "DATE",
  "groupby": [ "COUNTRY"]
}

Output dataset:

COUNTRY

DATE

VALUE

MY_CUMSUM

France

2019-01

France

2019-02

France

2019-03

France

2019-04

France

2019-05

France

2019-06

6

USA

2019-01

USA

2019-02

USA

2019-03

USA

2019-04

USA

2019-05

USA

2019-06

6

`custom` step

This step lets you define a custom query for operations that cannot be expressed using the other existing steps.

{
    "name": "custom",
    "query": "$group: {\"_id\": ...}"
}

Example: using Mongo query language

Input dataset:

Company

Group

Value

Company 1

Group 1

Company 2

Group 1

Company 3

Group 1

Company 4

Group 2

Company 5

Group 2

Company 6

Group 2

Step configuration:

{
  "name": "custom",
  "query": "$addFields: { \"Label\": { $concat: [ \"$Company\", \" - \", \"$Group\" ] } }"
}

Output dataset:

Company

Group

Value

Label

Company 1

Group 1

Company 1 - Group 1

Company 2

Group 1

Company 2 - Group 1

Company 3

Group 1

Company 3 - Group 1

Company 4

Group 2

Company 4 - Group 2

Company 5

Group 2

Company 5 - Group 2

Company 6

Group 2

Company 6 - Group 2

`dateextract` step

Extract date information (e.g. day, week, year, etc.). The following information can be extracted:

year: extract 'year' from date,
month: extract 'month' from date,
day: extract 'day of month' from date,
week: extract 'week number' (ranging from 0 to 53) from date,
quarter: extract 'quarter number' from date (1 for Jan-Feb-Mar),
dayOfWeek: extract 'day of week' (ranging from 1 for Sunday to 7 for Saturday) from date,
dayOfYear: extract 'day of year' from date,
isoYear: extract 'year number' in ISO 8601 format from date,
isoWeek: extract 'week number' in ISO 8601 format (ranging from 1 to 53) from date,
isoDayOfWeek: extract 'day of week' in ISO 8601 format (ranging from 1 for Monday to 7 for Sunday) from date,
firstDayOfYear: calendar date corresponding to the first day (1st of January) of the year,
firstDayOfMonth: calendar date corresponding to the first day of the month,
firstDayOfWeek: calendar date corresponding to the first day of the week,
firstDayOfQuarter: calendar date corresponding to the first day of the quarter,
firstDayOfIsoWeek: calendar date corresponding to the first day of the week in ISO 8601 format,
currentDay: calendar date of the target date,
previousDay: calendar date one day before the target date,
firstDayOfPreviousYear: calendar date corresponding to the first day (1st of January) of the previous year,
firstDayOfPreviousMonth: calendar date corresponding to the first day of the previous month,
firstDayOfPreviousWeek: calendar date corresponding to the first day of the previous week,
firstDayOfPreviousQuarter: calendar date corresponding to the first day of the previous quarter,
firstDayOfPreviousISOWeek: calendar date corresponding to the first day of the previous ISO week,
previousYear: extract previous 'year number' from date,
previousMonth: extract previous 'month number' from date,
previousWeek: extract previous 'week number' from date,
previousQuarter: extract previous 'quarter number' from date,
previousISOWeek: extract previous 'week number' in ISO 8601 format (ranging from 1 to 53) from date,
hour: extract 'hour' from date,
minutes: extract 'minutes' from date,
seconds: extract 'seconds' from date,
milliseconds: extract 'milliseconds' from date.

Here's an example of such a step:

{
  "name": "dateextract",
  "column": "date",
  "dateInfo": [ "year", "month", "day"],
  "newColumns": [ "date_year", "date_month", "date_day"]
}
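For readers who want to check what a few of these fields return, here is a small Python sketch built on the standard `datetime` module. The `EXTRACTORS` table and the `date_extract` function are hypothetical and cover only a subset of the information listed above; the numbering conventions follow the descriptions in this section.

```python
from datetime import datetime

# Subset of the extractable information, for illustration only.
EXTRACTORS = {
    "year": lambda d: d.year,
    "month": lambda d: d.month,
    "day": lambda d: d.day,
    "quarter": lambda d: (d.month - 1) // 3 + 1,
    "dayOfWeek": lambda d: d.isoweekday() % 7 + 1,   # 1 = Sunday ... 7 = Saturday
    "dayOfYear": lambda d: d.timetuple().tm_yday,
    "isoYear": lambda d: d.isocalendar()[0],
    "isoWeek": lambda d: d.isocalendar()[1],
    "isoDayOfWeek": lambda d: d.isoweekday(),        # 1 = Monday ... 7 = Sunday
}


def date_extract(rows, step):
    out = []
    for row in rows:
        moment = datetime.strptime(row[step["column"]], "%Y-%m-%dT%H:%M:%S.%fZ")
        extracted = {
            new_col: EXTRACTORS[info](moment)
            for info, new_col in zip(step["dateInfo"], step["newColumns"])
        }
        out.append({**row, **extracted})
    return out


step = {
    "name": "dateextract",
    "column": "Date",
    "dateInfo": ["year", "month", "day", "quarter", "dayOfWeek", "isoWeek"],
    "newColumns": ["Y", "M", "D", "Q", "DOW", "ISO_WEEK"],
}
result = date_extract([{"Date": "2019-10-30T00:00:00.000Z"}], step)
```

2019-10-30 is a Wednesday, so `dayOfWeek` yields 4 with the Sunday-based numbering.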

Example

Input dataset:

Date

2019-10-30T00:00:00.000Z

2019-10-15T00:00:00.000Z

2019-10-01T00:00:00.000Z

2019-09-30T00:00:00.000Z

2019-09-15T00:00:00.000Z

2019-09-01T00:00:00.000Z

Step configuration:

{
  "name": "dateextract",
  "column": "Date",
  "dateInfo": [ "year", "month", "day"],
  "newColumns": [ "Date_year", "Date_month", "Date_day"]
}

Output dataset:

Date

Date_year

Date_month

Date_day

2019-10-30T00:00:00.000Z

2019

2019-10-15T00:00:00.000Z

2019

2019-10-01T00:00:00.000Z

2019

2019-09-30T00:00:00.000Z

2019

2019-09-15T00:00:00.000Z

2019

2019-09-01T00:00:00.000Z

2019

`dategranularity` step

Truncate dates to a given granularity (e.g. day, week, year), typically to prepare a column for aggregation. The following granularities are supported:

year: calendar date corresponding to the first day (1st of January) of the year
quarter: calendar date corresponding to the first day of the quarter
month: calendar date corresponding to the first day of the month
week: calendar date corresponding to the first day of the week (Sunday)
isoWeek: calendar date corresponding to the first day of the week (Monday)
day: calendar date corresponding to the first hour of the day

Here's an example of such a step:

{
  "name": "dategranularity",
  "column": "date",
  "granularity": "year",
  "newColumn": "do_the_aggregate_on_this"
}
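The granularities above boil down to truncating a timestamp. The Python sketch below illustrates this with the standard `datetime` module; `truncate_date` is a hypothetical helper, not the step's implementation.

```python
from datetime import datetime, timedelta


def truncate_date(moment, granularity):
    """Return the first instant of the period containing `moment`."""
    day = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "day":
        return day
    if granularity == "isoWeek":  # weeks start on Monday
        return day - timedelta(days=day.weekday())
    if granularity == "week":     # weeks start on Sunday
        return day - timedelta(days=day.isoweekday() % 7)
    if granularity == "month":
        return day.replace(day=1)
    if granularity == "quarter":
        return day.replace(month=3 * ((day.month - 1) // 3) + 1, day=1)
    if granularity == "year":
        return day.replace(month=1, day=1)
    raise ValueError(f"unsupported granularity: {granularity}")


moment = datetime(2019, 9, 30, 5, 11, 31)  # a Monday
truncated = {g: truncate_date(moment, g)
             for g in ("day", "isoWeek", "week", "month", "quarter", "year")}
```

Rows that share the same truncated value can then be grouped together by a subsequent aggregate step.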

Example

Input dataset:

Date

2019-10-30T00:00:00.000Z

2019-10-15T00:00:00.000Z

2019-10-01T00:00:00.000Z

2019-09-30T05:11:31.000Z

2019-09-15T00:00:00.000Z

2019-09-01T00:00:00.000Z

Step configuration:

{
  "name": "dategranularity",
  "column": "Date",
  "granularity": "month"
}

Output dataset:

Date

2019-10-01T00:00:00.000Z

2019-09-01T00:00:00.000Z

`delete` step

Delete one or several columns.

{
    "name": "delete",
    "columns": [ "my-column", "some-other-column"]
}

Example

Input dataset:

Company

Group

Value

Label

Company 1

Group 1

Company 1 - Group 1

Company 2

Group 1

Company 2 - Group 1

Company 3

Group 1

Company 3 - Group 1

Company 4

Group 2

Company 4 - Group 2

Company 5

Group 2

Company 5 - Group 2

Company 6

Group 2

Company 6 - Group 2

Step configuration:

{
  "name": "delete",
  "columns": [ "Company", "Group"]
}

Output dataset:

Value

Label

Company 1 - Group 1

Company 2 - Group 1

Company 3 - Group 1

Company 4 - Group 2

Company 5 - Group 2

Company 6 - Group 2

`dissolve` step

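Geographically dissolve data: rows sharing the same values in the groups columns are merged into a single row with a combined geometry.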

{
    "name": "dissolve",
    "groups": [ "my-column", "some-other-column"],
    "includeNulls": true
}

Example without aggregations

Input dataset:

Country

City

geometry

Country 1

City 1

Polygon

Country 2

City 2

Polygon

Country 2

City 3

Polygon

Country 1

City 4

Polygon

Country 2

City 5

Polygon

Country 1

City 6

Polygon

Step configuration:

{
    "name": "dissolve",
    "groups": [ "Country"],
    "includeNulls": true
}

Output dataset:

Country

geometry

Country 1

MultiPolygon

Country 2

MultiPolygon

Example with aggregations

Input dataset:

Country

City

geometry

Population

Country 1

City 1

Polygon

100_000

Country 2

City 2

Polygon

50_000

Country 2

City 3

Polygon

200_000

Country 1

City 4

Polygon

30_000

Country 2

City 5

Polygon

25_000

Country 1

City 6

Polygon

10_000

Step configuration:

{
    "name": "dissolve",
    "groups": [ "Country"],
    "includeNulls": true,
    "aggregations": [
        {
            "aggfunction": "sum",
            "columns": [ "Population"],
            "newcolumns": [ "Total population"]
        }
    ]
}

Output dataset:

Country

geometry

Total population

Country 1

MultiPolygon

140_000

Country 2

MultiPolygon

275_000

`domain` step

This step is meant to select a specific domain (using MongoDB terminology).

{
    "name": "domain",
    "domain": "my-domain"
}

`duplicate` step

This step is meant to duplicate a column.

{
    "name": "duplicate",
    "column": "my-column",
    "newColumnName": "my-duplicate"
}

Example

Input dataset:

Company

Value

Company 1

Company 2

Company 3

Step configuration:

{
  "name": "duplicate",
  "column": "Company",
  "newColumnName": "Company-copy"
}

Output dataset:

Company

Value

Company-copy

Company 1

Company 2

Company 3

`duration` step

Compute the duration (in days, hours, minutes or seconds) between 2 dates in a new column.

{
  "name": "duration",
  "newColumnName": "DURATION",
  "startDateColumn": "START_DATE",
  "endDateColumn": "END_DATE",
  "durationIn": "days"
}
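The arithmetic behind this step can be checked with a short Python sketch. The `duration` function and the `SECONDS_PER_UNIT` table are hypothetical; in particular, returning an unrounded float is an assumption of this illustration, not a documented behaviour of the step.

```python
from datetime import datetime

SECONDS_PER_UNIT = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def duration(start, end, duration_in):
    """Elapsed time between two ISO timestamps, in the requested unit."""
    delta = datetime.strptime(end, ISO_FORMAT) - datetime.strptime(start, ISO_FORMAT)
    return delta.total_seconds() / SECONDS_PER_UNIT[duration_in]


january = duration("2020-01-01T00:00:00.000Z", "2020-01-31T00:00:00.000Z", "days")
full_year = duration("2020-01-01T00:00:00.000Z", "2020-12-31T00:00:00.000Z", "days")
one_hour = duration("2020-01-01T14:00:00.000Z", "2020-01-01T15:00:00.000Z", "minutes")
```

Since 2020 is a leap year, 365 days separate January 1st from December 31st.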

Example 1: duration in days

Input dataset:

START_DATE

END_DATE

"2020-01-01T00:00:00.000Z"

"2020-01-31T00:00:00.000Z"

"2020-01-01T00:00:00.000Z"

"2020-12-31T00:00:00.000Z"

Step configuration:

{
  "name": "duration",
  "newColumnName": "DURATION",
  "startDateColumn": "START_DATE",
  "endDateColumn": "END_DATE",
  "durationIn": "days"
}

Output dataset:

START_DATE

END_DATE

DURATION

"2020-01-01T00:00:00.000Z"

"2020-01-31T00:00:00.000Z"

"2020-01-01T00:00:00.000Z"

"2020-12-31T00:00:00.000Z"

365

Example 2: duration in minutes

Input dataset:

START_HOUR

END_HOUR

"2020-01-01T14:00:00.000Z"

"2020-01-31T15:00:00.000Z"

"2020-01-01T15:00:00.000Z"

"2020-12-31T20:00:00.000Z"

Step configuration:

{
  "name": "duration",
  "newColumnName": "DURATION",
  "startDateColumn": "START_HOUR",
  "endDateColumn": "END_HOUR",
  "durationIn": "minutes"
}

Output dataset:

START_HOUR

END_HOUR

DURATION

"2020-01-01T14:00:00.000Z"

"2020-01-31T15:00:00.000Z"

"2020-01-01T15:00:00.000Z"

"2020-12-31T20:00:00.000Z"

525900

`evolution` step

Use this step if you need to compute the row-by-row evolution of a value column, based on a date column. It will output 2 columns: one for the evolution in absolute value, the other for the evolution in percentage.

Make sure the computation is scoped so that there are no duplicate dates (so that any date has at most one previous date). This means you may need to specify "group by" columns to make every date unique inside each group. Specify those columns in the indexColumns parameter.

{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastYear",
  "evolutionFormat": "abs",

  "indexColumns": [ "COUNTRY"],
  "newColumn": "MY_EVOL"
}
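To illustrate the row-by-row logic, here is a Python sketch limited to the vsLastMonth case with "YYYY-MM" date strings. The `evolution_vs_last_month` function is a hypothetical helper, the sample values are chosen for illustration, and a missing previous month is assumed to yield null.

```python
def evolution_vs_last_month(rows, date_col, value_col, fmt="abs"):
    by_date = {row[date_col]: row[value_col] for row in rows}
    if len(by_date) != len(rows):
        # Every date must have at most one previous date.
        raise ValueError("duplicate dates: scope the computation with index columns")

    out = []
    for row in rows:
        year, month = map(int, row[date_col].split("-"))
        previous_key = f"{year - 1}-12" if month == 1 else f"{year}-{month - 1:02d}"
        previous = by_date.get(previous_key)
        if previous is None:
            evolution = None  # no row for the previous month
        elif fmt == "abs":
            evolution = row[value_col] - previous
        else:  # "pct"
            evolution = (row[value_col] - previous) / previous
        out.append({**row, f"{value_col}_EVOL_{fmt.upper()}": evolution})
    return out


rows = [
    {"DATE": "2019-06", "VALUE": 79},
    {"DATE": "2019-07", "VALUE": 81},
    {"DATE": "2019-08", "VALUE": 77},
    {"DATE": "2019-09", "VALUE": 75},
    {"DATE": "2019-11", "VALUE": 78},
    {"DATE": "2019-12", "VALUE": 88},
]
absolute = [r["VALUE_EVOL_ABS"] for r in evolution_vs_last_month(rows, "DATE", "VALUE")]
percent = [r["VALUE_EVOL_PCT"] for r in evolution_vs_last_month(rows, "DATE", "VALUE", "pct")]
```

The 2019-11 row gets no evolution because 2019-10 is absent from the data.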

Example 1: Basic configuration - evolution in absolute value

Input dataset:

DATE

VALUE

2019-06

2019-07

2019-08

2019-09

2019-11

2019-12

Step configuration:

{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastMonth",
  "evolutionFormat": "abs",
  "indexColumns": []
}

Output dataset:

DATE

VALUE

VALUE_EVOL_ABS

2019-06

2019-07

2019-08

-4

2019-09

-2

2019-11

2019-12

Example 2: Basic configuration - evolution in percentage

Input dataset:

DATE

VALUE

2019-06

2019-07

2019-08

2019-09

2019-11

2019-12

Step configuration:

{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastMonth",
  "evolutionFormat": "pct",
  "indexColumns": []
}

Output dataset:

DATE

VALUE

VALUE_EVOL_PCT

2019-06

2019-07

0.02531645569620253

2019-08

-0.04938271604938271

2019-09

-0.025974025974025976

2019-11

2019-12

0.1282051282051282

Example 3: Error on duplicate dates

If 'COUNTRY' is not specified in indexColumns, the computation is not scoped by country. The "DATE" column then contains duplicate dates, which is prohibited and leads to an error.

Input dataset:

DATE

COUNTRY

VALUE

2014-12

France

2015-12

France

2016-12

France

2017-12

France

2014-12

USA

2015-12

USA

2016-12

USA

2017-12

USA

Step configuration:

{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastYear",
  "evolutionFormat": "abs",
  "indexColumns": []
}

Output dataset:

With the mongo translator, you will get an error at row-level as shown below:

DATE

COUNTRY

VALUE

MY_EVOL

2014-12

France

2015-12

France

Error ...

2016-12

France

Error ...

2017-12

France

Error ...

2014-12

USA

2015-12

USA

Error ...

2016-12

USA

Error ...

2017-12

USA

Error ...

The pandas translator will just return an error, and you will not get any data.

Example 4: Complete configuration with index columns

Input dataset:

DATE

COUNTRY

VALUE

2014-12

France

2015-12

France

2016-12

France

2017-12

France

2019-12

France

2020-12

France

2014-12

USA

2015-12

USA

2016-12

USA

2017-12

USA

2018-11

USA

2020-12

USA

Step configuration:

{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastYear",
  "evolutionFormat": "abs",
  "indexColumns": [ "COUNTRY"],
  "newColumn": "MY_EVOL"
}

Output dataset:

DATE

COUNTRY

VALUE

MY_EVOL

2014-12

France

2015-12

France

2016-12

France

-4

2017-12

France

-2

2019-12

France

2020-12

France

2014-12

USA

2015-12

USA

2016-12

USA

-1

2017-12

USA

-1

2018-11

USA

2020-12

USA

`fillna` step

Replace null values by a given value in specified columns.

{
    "name": "fillna",
    "columns": ["foo", "bar"],
    "value": 0
}

Example

Input dataset:

Company

Group

Value

KPI

Company 1

Group 1

Company 2

Group 1

Company 3

Group 1

Company 4

Group 2

Company 5

Group 2

Company 6

Group 2

Step configuration:

{
  "name": "fillna",
  "columns": ["Value", "KPI"],
  "value": 0
}

Output dataset:

Company

Group

Value

KPI

Company 1

Group 1

Company 2

Group 1

Company 3

Group 1

Company 4

Group 2

Company 5

Group 2

Company 6

Group 2

`filter` step

Filter out rows that don't match a filter definition.

{
    "name": "filter",
    "condition": {
      "column": "my-column",
      "value": 42,
      "operator": "ne"
    }
}

operator is optional, and defaults to eq. Allowed operators are eq, ne, gt, ge, lt, le, in, nin, matches, notmatches, isnull or notnull.

value can be an arbitrary value depending on the selected operator (e.g. a list when used with the in operator, or null when used with the isnull operator).

matches and notmatches operators are used to test value against a regular expression.

Conditions can be grouped and nested with logical operators and and or.

{
    "name": "filter",
    "condition": {
      "and": [
        {
          "column": "my-column",
          "value": 42,
          "operator": "ge"
        },
        {
          "column": "my-column",
          "value": 118,
          "operator": "le"
        },
        {
          "or": [
            {
              "column": "my-other-column",
              "value": "blue",
              "operator": "eq"
            },
            {
              "column": "my-other-column",
              "value": "red",
              "operator": "eq"
            }
          ]
        }
      ]
    }
}
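The way nested and/or groups combine can be sketched as a small recursive evaluator in Python. The `row_matches` function and the `OPERATORS` table are hypothetical helpers; treating matches as a regular-expression search anywhere in the value (rather than a full match) is an assumption of this sketch.

```python
import operator
import re

OPERATORS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
    "in": lambda cell, value: cell in value,
    "nin": lambda cell, value: cell not in value,
    "matches": lambda cell, value: re.search(value, cell) is not None,
    "notmatches": lambda cell, value: re.search(value, cell) is None,
    "isnull": lambda cell, _value: cell is None,
    "notnull": lambda cell, _value: cell is not None,
}


def row_matches(row, condition):
    if "and" in condition:
        return all(row_matches(row, sub) for sub in condition["and"])
    if "or" in condition:
        return any(row_matches(row, sub) for sub in condition["or"])
    compare = OPERATORS[condition.get("operator", "eq")]  # eq is the default
    return compare(row[condition["column"]], condition.get("value"))


condition = {
    "and": [
        {"column": "my-column", "value": 42, "operator": "ge"},
        {"column": "my-column", "value": 118, "operator": "le"},
        {"or": [
            {"column": "my-other-column", "value": "blue"},
            {"column": "my-other-column", "value": "red"},
        ]},
    ]
}
rows = [
    {"my-column": 50, "my-other-column": "red"},
    {"my-column": 200, "my-other-column": "blue"},
    {"my-column": 60, "my-other-column": "green"},
]
kept = [row for row in rows if row_matches(row, condition)]
```

Only the first row is kept: the second fails the upper bound and the third matches neither colour.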

Relative dates

Date values can be relative to the moment when the query is executed. This is expressed by using a RelativeDate object instead of the value, of the form:

{
  "quantity": Number,
  "duration": "year" | "quarter" | "month" | "week" | "day"
}

`formula` step

Add a computation based on a formula. Column names usually do not need to be escaped, unless they include whitespace, in which case you need to wrap them in brackets '[]' (e.g. [my column]). Any string enclosed in quotes (', ", ''', """) is considered a string literal.

{
  "name": "formula",
  "newColumn": "result",
  "formula": "(Value1 + Value2) / Value3 - Value4 * 2"
}

Supported operators

The following operators are supported by the formula step (note that a value can be a column name or a literal, such as 42 or "foo").

+: Adds two numeric values. See the concatenate step to append strings.
-: Subtracts one numeric value from another. See the replace step to remove part of a string.
*: Multiplies two numeric values.
/: Divides a numeric value by another. Divisions by zero return null.
%: Returns the remainder of an integer division. Divisions by zero return null.
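The operator rules above, including the null result on division by zero, can be sketched with Python's `ast` module. This is a hypothetical evaluator for illustration only: it does not handle bracketed column names, unary operators or string literals semantics beyond plain constants, and it assumes that a null operand propagates to the result.

```python
import ast
import operator

BINARY_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}


def evaluate_formula(formula, row):
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant):
            return node.value              # literal such as 42
        if isinstance(node, ast.Name):
            return row[node.id]            # column name
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPERATORS:
            left, right = visit(node.left), visit(node.right)
            if left is None or right is None:
                return None                # assumption: null propagates
            if isinstance(node.op, (ast.Div, ast.Mod)) and right == 0:
                return None                # division by zero returns null
            return BINARY_OPERATORS[type(node.op)](left, right)
        raise ValueError("unsupported expression")

    return visit(ast.parse(formula, mode="eval"))


formula = "(Value1 + Value2) / Value3 - Value4 * 2"
regular = evaluate_formula(formula, {"Value1": 10, "Value2": 2, "Value3": 3, "Value4": 4})
by_zero = evaluate_formula(formula, {"Value1": 10, "Value2": 2, "Value3": 0, "Value4": 4})
```

Multiplication and division bind tighter than addition and subtraction, so the first row evaluates as 12 / 3 - 8.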

Example 1: Basic usage

Input dataset:

Label

Value1

Value2

Value3

Value4

Label 1

Label 2

Label 3

Step configuration:

{
  "name": "formula",
  "newColumn": "Result",
  "formula": "(Value1 + Value2) / Value3 - Value4 * 2"
}

Output dataset:

Label

Value1

Value2

Value3

Value4

Result

Label 1

Label 2

-4

Label 3

Example 2: Column name with whitespaces

Input dataset:

Label

Value1

Value2

Value3

Value 4

Label 1

Label 2

Label 3

Step configuration:

{
  "name": "formula",
  "newColumn": "Result",
  "formula": "(Value1 + Value2) / Value3 - [Value 4] * 2"
}

Output dataset:

Label

Value1

Value2

Value3

Value 4

Result

Label 1

Label 2

-4

Label 3

`hierarchy` step

Hierarchy for geographical data.

This step dissolves data for every hierarchy level, and adds a hierarchy level column containing a level (with 0 being the lowest granularity, i.e. the highest level).

{
    "name": "hierarchy",
    "hierarchy": [ "Country", "City"],
    "includeNulls": false
}

Example

Input dataset:

Country

City

geometry

Population

Country 1

City 1

Polygon

100_000

Country 2

City 2

Polygon

50_000

Country 2

City 3

Polygon

200_000

Country 1

City 4

Polygon

30_000

Country 2

City 5

Polygon

25_000

Country 1

City 6

Polygon

10_000

Step configuration:

{
    "name": "hierarchy",
    "hierarchy": [ "Country", "City"],
    "includeNulls": false
}

Output dataset:

Country

City

geometry

Population

hierarchy_level

Country 1

City 1

Polygon

100_000

Country 2

City 2

Polygon

50_000

Country 2

City 3

Polygon

200_000

Country 1

City 4

Polygon

30_000

Country 2

City 5

Polygon

25_000

Country 1

City 6

Polygon

10_000

Country 1

null

MultiPolygon

null

Country 2

null

MultiPolygon

null

MultiPolygon

null

`ifthenelse` step

Creates a new column whose values depend on a condition expressed on existing columns.

The condition is expressed in the if parameter with a condition object, which is the same object expected by the condition parameter of the filter step. Conditions can be grouped and nested with logical operators and and or.

The then parameter only supports a string, which is interpreted as a formula (cf. formula step). If you want it to be interpreted strictly as a string and not a formula, you must escape the string with quotes (e.g. '"this is a text"').

if...then...else blocks can be nested, as the else parameter supports either a string that is interpreted as a formula (cf. formula step), or a nested if...then...else object.

{
  "name": "ifthenelse",
  "newColumn": "",
  "if": { "column": "", "value": "", "operator": "eq" },
  "then": "",
  "else": ""
}
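The nesting rule can be pictured as a short recursive function. In the Python sketch below, `if_then_else` and `OPERATORS` are hypothetical helpers, only simple comparison conditions are handled (no and/or groups), and Python's `eval` merely stands in for the formula engine, which is an assumption made to keep the example short.

```python
import operator

OPERATORS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
}


def if_then_else(row, block):
    condition = block["if"]
    holds = OPERATORS[condition["operator"]](row[condition["column"]], condition["value"])
    outcome = block["then"] if holds else block["else"]
    if isinstance(outcome, dict):
        # A nested if...then...else object: evaluate it recursively.
        return if_then_else(row, outcome)
    # Sketch only: eval stands in for the formula engine. A quoted string
    # such as '"zero"' evaluates to the text zero, other strings are
    # evaluated as expressions over the row's columns.
    return eval(outcome, {"__builtins__": {}}, dict(row))


block = {
    "if": {"column": "number", "value": 0, "operator": "eq"},
    "then": "\"zero\"",
    "else": {
        "if": {"column": "number", "value": 0, "operator": "lt"},
        "then": "number * -1",
        "else": "number",
    },
}
results = [if_then_else({"number": n}, block) for n in (-2, 5, 0)]
```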

Example

Input dataset:

Label

number

Label 1

-2

Label 2

Label 3

Step configuration:

{
    "name": "ifthenelse",
    "newColumn": "result",
    "if": { "column": "number", "value": 0, "operator": "eq" },
    "then": "\"zero\"",
    "else": {
      "if": { "column": "number", "value": 0, "operator": "lt" },
      "then": "number * -1",
      "else": "number"
    }
}

Output dataset:

Label

number

result

Label 1

-2

Label 2

Label 3

zero

`join` step

Joins a dataset to the current dataset, i.e. brings columns from the former into the latter, and matches rows based on column correspondence. It is similar to a JOIN clause in SQL, or to a VLOOKUP in Excel. The joined dataset is the result of the query of the rightPipeline.

The join type can be:

'left': will keep every row of the current dataset and fill unmatched rows with null values,
'inner': will only keep rows that match rows of the joined dataset.

In the on parameter, you must specify one or more column couples that will be compared to determine row correspondence between the two datasets. The first element of a couple is the column of the current dataset, and the second is the corresponding column in the right dataset to be joined. If you specify more than one couple, the matching rows are those that find a correspondence between the two datasets for every column couple specified (logical 'AND').

Weaverbird allows you to save pipelines referenced by name in the Vuex store of the application. You can then call them by their unique names in this step.

{
  "name": "join",
  "rightPipeline": "somePipelineReference",
  "type": "left",
  "on": [
    [ "currentDatasetColumn1", "rightDatasetColumn1"],
    [ "currentDatasetColumn2", "rightDatasetColumn2"]
  ]
}
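As a rough illustration of the 'left' and 'inner' behaviours and of the column couples in on, here is a minimal pure-Python sketch. The `join` function and its signature are hypothetical, rows are modeled as dictionaries, and null is represented by None.

```python
def join(left_rows, right_rows, on, how="left"):
    """Join two lists of dict rows on (current_column, right_column) couples."""
    right_columns = {col for row in right_rows for col in row}
    joined = []
    for left in left_rows:
        # A right row matches only if every couple matches (logical AND).
        matches = [
            right for right in right_rows
            if all(left[l_col] == right[r_col] for l_col, r_col in on)
        ]
        if matches:
            joined.extend({**right, **left} for right in matches)
        elif how == "left":
            # Unmatched rows are kept and padded with nulls (None).
            padding = {col: None for col in right_columns}
            joined.append({**padding, **left})
    return joined

left = [{"Label": "Label 1", "Value": 1}, {"Label": "Label 5", "Value": 2}]
right = [{"LabelRight": "Label 1", "Group": "Group 1"}]
left_join = join(left, right, [("Label", "LabelRight")], how="left")
inner_join = join(left, right, [("Label", "LabelRight")], how="inner")
```

With how="inner", the unmatched "Label 5" row is dropped; with how="left", it is kept with None in the columns coming from the right dataset.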

Example 1: Left join with one column couple as `on` parameter

Input dataset:

Label

Value

Label 1

Label 2

Label 3

Label 4

Label 5

Label 6

rightDataset (saved in the application Vuex store):

Label

Group

Label 1

Group 1

Label 2

Group 1

Label 3

Group 2

Label 4

Group 2

Step configuration:

{
  "name": "join",
  "rightPipeline": "rightDataset",
  "type": "left",
  "on": [["Label", "Label"]]
}

Output dataset:

Label

Value

Group

Label 1

Group 1

Label 2

Group 1

Label 3

Group 2

Label 4

Group 2

Label 5

Label 6

Example 2: inner join with different column names in the `on` parameter

Input dataset:

Label

Value

Label 1

Label 2

Label 3

Label 4

Label 5

Label 6

rightDataset (saved in the application Vuex store):

LabelRight

Group

Label 1

Group 1

Label 2

Group 1

Label 3

Group 2

Label 4

Group 2

Step configuration:

{
  "name": "join",
  "rightPipeline": "rightDataset",
  "type": "inner",
  "on": [["Label", "LabelRight"]]
}

Output dataset:

Label

Value

LabelRight

Group

Label 1

Group 1

Label 2

Group 1

Label 3

Group 2

Label 4

Group 2

`fromdate` step

Converts a date column into a string column based on a specified format.

{
    "name": "fromdate",
    "column": "myDateColumn",
    "format": "%Y-%m-%d"
}
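The format string uses `%`-style directives. Assuming they behave like Python's strftime directives (an assumption, since the backend is not specified here), the conversion can be sketched as follows; `from_date` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def from_date(value: datetime, fmt: str) -> str:
    """Format a date as text, in the spirit of the fromdate step."""
    return value.strftime(fmt)

date = datetime(2019, 10, 6, tzinfo=timezone.utc)
formatted = from_date(date, "%d/%m/%Y")
print(formatted)  # 06/10/2019
```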

Example

Input dataset:

Company

Date

Value

Company 1

2019-10-06T00:00:00.000Z

Company 1

2019-10-07T00:00:00.000Z

Company 1

2019-10-08T00:00:00.000Z

Company 2

2019-10-06T00:00:00.000Z

Company 2

2019-10-07T00:00:00.000Z

Company 2

2019-10-08T00:00:00.000Z

Step configuration:

{
  "name": "fromdate",
  "column": "Date",
  "format": "%d/%m/%Y"
}

Output dataset:

Company

Date

Value

Company 1

06/10/2019

Company 1

07/10/2019

Company 1

08/10/2019

Company 2

06/10/2019

Company 2

07/10/2019

Company 2

08/10/2019

`lowercase` step

⚠️ Mongo's $toLower operator does not support accents. If you need to lowercase accented characters with Mongo, use a replacetext step after lowercase.

Converts a string column to lowercase.

{
  "name": "lowercase",
  "column": "foo"
}

Example:

Input dataset:

Label

Group

Value

LABEL 1

Group 1

LABEL 2

Group 1

LABEL 3

Group 1

Step configuration:

{
  "name": "lowercase",
  "column": "Label"
}

Output dataset:

Label

Group

Value

label 1

Group 1

label 2

Group 1

label 3

Group 1

`movingaverage` step

Compute the moving average based on a value column, a reference column to sort (usually a date column) and a moving window (in number of rows i.e. data points). If needed, the computation can be performed by group of rows. The computation result is added in a new column.

{
  "name": "movingaverage",
  "valueColumn": "value",
  "columnToSort": "dates",
  "movingWindow": 12,
  "groups": [ "foo", "bar"],
  "newColumnName": "myNewColumn"
}
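The computation itself can be sketched in a few lines of Python. This is a minimal illustration, assuming the values are already sorted by the reference column and that rows without a full window yield null (None), as in the examples of this section; `moving_average` is a hypothetical helper name.

```python
def moving_average(values, window):
    """Return the rolling mean; positions without a full window get None."""
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            averages.append(sum(chunk) / window)
    return averages

print(moving_average([75, 80, 82], 2))  # [None, 77.5, 81.0]
```

When groups are specified, the same function would be applied separately to the rows of each group.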

Example 1: Basic usage

Input dataset:

DATE

VALUE

2018-01-01

2018-01-02

2018-01-03

2018-01-04

2018-01-05

2018-01-06

2018-01-07

2018-01-08

Step configuration:

{
  "name": "movingaverage",
  "valueColumn": "VALUE",
  "columnToSort": "DATE",
  "movingWindow": 2
}

Output dataset:

DATE

VALUE

VALUE_MOVING_AVG

2018-01-01

null

2018-01-02

77.5

2018-01-03

2018-01-04

82.5

2018-01-05

81.5

2018-01-06

2018-01-07

82.5

2018-01-08

77.5

Example 2: with groups and custom newColumnName

Input dataset:

COUNTRY

DATE

VALUE

France

2018-01-01

France

2018-01-02

France

2018-01-03

France

2018-01-04

France

2018-01-05

France

2018-01-06

USA

2018-01-01

USA

2018-01-02

USA

2018-01-03

USA

2018-01-04

USA

2018-01-05

USA

2018-01-06

Step configuration:

{
  "name": "movingaverage",
  "valueColumn": "VALUE",
  "columnToSort": "DATE",
  "movingWindow": 3,
  "groups": [ "COUNTRY"],
  "newColumnName": "ROLLING_AVERAGE"
}

Output dataset:

COUNTRY

DATE

VALUE

ROLLING_AVERAGE

France

2018-01-01

null

France

2018-01-02

null

France

2018-01-03

France

2018-01-04

81.7

France

2018-01-05

81.7

France

2018-01-06

USA

2018-01-01

null

USA

2018-01-02

null

USA

2018-01-03

71.7

USA

2018-01-04

73.7

USA

2018-01-05

72.7

USA

2018-01-06

73.7

`percentage` step

Compute the percentage of total, i.e. for every row, the value in column divided by the sum of all values in column. The computation can be performed by group if specified. The result is written in a new column.

{
  "name": "percentage",
  "column": "bar",
  "group": [ "foo"],
  "newColumnName": "myNewColumn"
}
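A minimal Python sketch of the computation, with rows modeled as dictionaries. The `percentage` function is a hypothetical helper, not the step's actual implementation.

```python
from collections import defaultdict

def percentage(rows, column, group=()):
    """Return, for every row, its share of the total of its group."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[g] for g in group)] += row[column]
    return [row[column] / totals[tuple(row[g] for g in group)] for row in rows]

rows = [
    {"Group": "Group 1", "Value": 1},
    {"Group": "Group 1", "Value": 3},
    {"Group": "Group 2", "Value": 5},
]
by_group = percentage(rows, "Value", ["Group"])
overall = percentage(rows, "Value")
```

Without group, every row is divided by the total of the whole column; with group, by the total of its own group.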

Example:

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
  "name": "percentage",
  "column": "Value",
  "group": [ "Group"],
  "newColumnName": "Percentage"
}

Output dataset:

Label

Group

Value

Percentage

Label 1

Group 1

0.167

Label 2

Group 1

0.333

Label 3

Group 1

0.5

Label 4

Group 2

0.143

Label 5

Group 2

0.5

Label 6

Group 2

0.357

`pivot` step

Pivot rows into columns around a given index (expressed as a combination of columns). Values to be used as new column names are found in the column columnToPivot. Values to populate the new columns are found in the column valueColumn. The function used to aggregate data (when several rows are found for the same index group) must be one of sum, avg, count, min or max.

{
 "name": "pivot",
 "index": [ "column_1", "column_2"],
 "columnToPivot": "column_3",
 "valueColumn": "column_4",
 "aggFunction": "sum"
}
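The following Python sketch illustrates the pivot logic for the sum aggregation only. The `pivot` function and its arguments are hypothetical names chosen to mirror the step parameters.

```python
from collections import defaultdict

def pivot(rows, index, column_to_pivot, value_column):
    """Pivot rows into columns, summing values that share an index group."""
    cells = defaultdict(lambda: defaultdict(float))
    for row in rows:
        key = tuple(row[col] for col in index)
        cells[key][row[column_to_pivot]] += row[value_column]
    return [{**dict(zip(index, key)), **dict(values)} for key, values in cells.items()]

rows = [
    {"Label": "Label 1", "Country": "Country1", "Value": 1},
    {"Label": "Label 1", "Country": "Country2", "Value": 2},
    {"Label": "Label 1", "Country": "Country2", "Value": 3},
]
pivoted = pivot(rows, ["Label"], "Country", "Value")
```

The two "Country2" rows share the same index value, so they are aggregated into a single cell.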

Example:

Input dataset:

Label

Country

Value

Label 1

Country1

Label 2

Country1

Label 3

Country1

Label 1

Country2

Label 2

Country2

Label 3

Country2

Label 3

Country2

Step configuration:

{
 "name": "pivot",
 "index": [ "Label"],
 "columnToPivot": "Country",
 "valueColumn": "Value",
 "aggFunction": "sum"
}

Output dataset:

Label

Country1

Country2

Label 1

Label 2

Label 3

`statistics` step

Compute statistics of a column.

{
    "name": "statistics",
    "column": "Value",
    "groupby": [],
    "statistics": [ "average", "count"],
    "quantiles": [{"label": "median", "nth": 1, "order": 2}]
}

Example:

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
    "name": "statistics",
    "column": "Value",
    "groupby": [],
    "statistics": [ "average", "count"],
    "quantiles": [{"label": "median", "nth": 1, "order": 2}]
}

Output dataset:

average

count

median

9.33333

8.5

`rank` step

This step computes a rank column based on a value column that can be sorted in ascending or descending order. The ranking can be computed by group.

There are 2 ranking methods available, illustrated by the following examples:

standard: input = [10, 20, 20, 20, 25, 25, 30] => ranking = [1, 2, 2, 2, 5, 5, 7]
dense: input = [10, 20, 20, 20, 25, 25, 30] => ranking = [1, 2, 2, 2, 3, 3, 4]

(The dense method is basically the same as the standard method, but rank always increases by 1 at most).
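The two methods can be reproduced with a short Python sketch. The `rank` function below is a hypothetical helper written for illustration; it is checked against the two input lists given above.

```python
def rank(values, method="standard", order="asc"):
    """Rank values with the 'standard' or 'dense' method."""
    descending = order == "desc"
    if method == "dense":
        # Dense: ranks follow the distinct values, so they never skip.
        distinct = sorted(set(values), reverse=descending)
        position = {value: i + 1 for i, value in enumerate(distinct)}
    else:
        # Standard: ties share the rank of their first sorted position.
        position = {}
        for i, value in enumerate(sorted(values, reverse=descending)):
            position.setdefault(value, i + 1)
    return [position[value] for value in values]

data = [10, 20, 20, 20, 25, 25, 30]
print(rank(data, "standard"))  # [1, 2, 2, 2, 5, 5, 7]
print(rank(data, "dense"))     # [1, 2, 2, 2, 3, 3, 4]
```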

{
  "name": "rank",
  "valueCol": "VALUE",
  "order": "desc",
  "method": "standard",
  "groupby": [ "foo", "bar"],
  "newColumnName": "columnA"
}

Example 1: Basic usage

Input dataset:

COUNTRY

VALUE

FRANCE

USA

Step configuration:

{
  "name": "rank",
  "valueCol": "VALUE",
  "order": "desc",
  "method": "standard"
}

Output dataset:

COUNTRY

VALUE

VALUE_RANK

USA

FRANCE

USA

FRANCE

USA

FRANCE

Example 2: With more options

Input dataset:

COUNTRY

VALUE

FRANCE

USA

Step configuration:

{
  "name": "rank",
  "valueCol": "VALUE",
  "order": "asc",
  "method": "dense",
  "groupby": [ "COUNTRY"],
  "newColumnName": "MY_RANK"
}

Output dataset:

COUNTRY

VALUE

MY_RANK

FRANCE

USA

`rename` step

Rename one or several columns. The toRename parameter takes as input a list of 2-element lists in the form ['oldColumnName', 'newColumnName'].

{
    "name": "rename",
    "toRename": [
      [ "oldCol1", "newCol1"],
      [ "oldCol2", "newCol2"]
    ]
}

Example:

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
  "name": "rename",
  "toRename": [["Label", "Company"]]
}

Output dataset:

Company

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

`replace` step

Replace one or several values in a column.

A replace step has the following structure:

{
   "name": "replace",
   "searchColumn": "column_1",
   "toReplace": [
     [ "foo", "bar"],
     [42, 0]
   ]
}

Example

Input dataset:

COMPANY

COUNTRY

Company 1

Company 2

Step configuration:

{
   "name": "replace",
   "searchColumn": "COUNTRY",
   "toReplace": [
     [ "Fr", "France"],
     [ "UK", "United Kingdom"]
   ]
}

Output dataset:

COMPANY

COUNTRY

Company 1

France

Company 2

United Kingdom

`replacetext` step

Replace a substring in a column.

A replacetext step has the following structure:

{
   "name": "replacetext",
   "searchColumn": "column_1",
   "oldStr": "foo",
   "newStr": "bar"
}

Example

Input dataset:

COMPANY

COUNTRY

Company 1

Fr is boring

Company 2

Step configuration:

{
   "name": "replacetext",
   "searchColumn": "COUNTRY",
   "oldStr": "Fr",
   "newStr": "France"
}

Output dataset:

COMPANY

COUNTRY

Company 1

France is boring

Company 2

`rollup` step

Use this step if you need to compute aggregated data at every level of a hierarchy, specified as a series of columns from top to bottom level. The output data structure stacks the data of every level of the hierarchy, specifying for every row the label, level and parent in dedicated columns.

Aggregated rows can be computed using either sum, average, count, count distinct, min, max, first or last.

{
   "name": "rollup",
   "hierarchy": [ "continent", "country", "city"],
   "aggregations": [
    {
      "newcolumns": [ "sum_value1", "sum_value2"],
      "aggfunction": "sum",
      "columns": [ "value1", "value2"]
    },
    {
      "newcolumns": [ "avg_value1"],
      "aggfunction": "avg",
      "columns": [ "value1"]
    }
   ],
   "groupby": [ "date"],
   "labelCol": "label",
   "levelCol": "level",
   "childLevelCol": "child_level",
   "parentLabelCol": "parent"
}
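To make the stacked output structure more concrete, here is a minimal Python sketch that handles a single sum aggregation and no groupby. The `rollup` function is a hypothetical helper; the label, level, child_level and parent column names are taken from the structure above.

```python
from collections import defaultdict

def rollup(rows, hierarchy, value_column):
    """Stack the summed value of every hierarchy level, top to bottom."""
    output = []
    for depth, level in enumerate(hierarchy):
        path_columns = hierarchy[: depth + 1]
        sums = defaultdict(float)
        for row in rows:
            sums[tuple(row[col] for col in path_columns)] += row[value_column]
        for path, total in sums.items():
            out = dict(zip(path_columns, path))
            out["label"] = path[-1]
            out["level"] = level
            # The bottom level has no child, the top level has no parent.
            out["child_level"] = hierarchy[depth + 1] if depth + 1 < len(hierarchy) else None
            out["parent"] = path[-2] if depth > 0 else None
            out[value_column] = total
            output.append(out)
    return output

rows = [
    {"CONTINENT": "Europe", "COUNTRY": "France", "CITY": "Paris", "VALUE": 10},
    {"CONTINENT": "Europe", "COUNTRY": "France", "CITY": "Bordeaux", "VALUE": 5},
]
stacked = rollup(rows, ["CONTINENT", "COUNTRY", "CITY"], "VALUE")
```

Here stacked contains one row for Europe, one for France and one per city, each carrying its own level and parent.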

Example 1: Basic configuration

Input dataset:

CITY

COUNTRY

CONTINENT

YEAR

VALUE

Paris

France

Europe

2018

Bordeaux

France

Europe

2018

Barcelona

Spain

Europe

2018

Madrid

Spain

Europe

2018

Boston

USA

North America

2018

New-York

USA

North America

2018

Montreal

Canada

North America

2018

Ottawa

Canada

North America

2018

Paris

France

Europe

2019

Bordeaux

France

Europe

2019

Barcelona

Spain

Europe

2019

Madrid

Spain

Europe

2019

Boston

USA

North America

2019

New-York

USA

North America

2019

Montreal

Canada

North America

2019

Ottawa

Canada

North America

2019

Step configuration:

{
   "name": "rollup",
   "hierarchy": [ "CONTINENT", "COUNTRY", "CITY"],
   "aggregations": [
    {
      "newcolumns": [ "VALUE"],
      "aggfunction": "sum",
      "columns": [ "VALUE"]
    }
   ]
}

Output dataset:

CITY

COUNTRY

CONTINENT

label

level

child_level

parent

VALUE

Europe

CONTINENT

COUNTRY

North America

CONTINENT

COUNTRY

112

France

Europe

France

COUNTRY

CITY

Europe

Spain

Europe

Spain

COUNTRY

CITY

Europe

USA

North America

USA

COUNTRY

CITY

North America

Canada

North America

Canada

COUNTRY

CITY

North America

Paris

France

Europe

Paris

CITY

France

Bordeaux

France

Europe

Bordeaux

CITY

France

Barcelona

Spain

Europe

Barcelona

CITY

Spain

Madrid

Spain

Europe

Madrid

CITY

Spain

Boston

USA

North America

Boston

CITY

USA

New-York

USA

North America

New-York

CITY

USA

Montreal

Canada

North America

Montreal

CITY

Canada

Ottawa

Canada

North America

Ottawa

CITY

Canada

Example 2: Configuration with optional parameters

Input dataset:

CITY

COUNTRY

CONTINENT

YEAR

VALUE

COUNT

Paris

France

Europe

2018

Bordeaux

France

Europe

2018

Barcelona

Spain

Europe

2018

Madrid

Spain

Europe

2018

Boston

USA

North America

2018

New-York

USA

North America

2018

Montreal

Canada

North America

2018

Ottawa

Canada

North America

2018

Paris

France

Europe

2019

Bordeaux

France

Europe

2019

Barcelona

Spain

Europe

2019

Madrid

Spain

Europe

2019

Boston

USA

North America

2019

New-York

USA

North America

2019

Montreal

Canada

North America

2019

Ottawa

Canada

North America

2019

Step configuration:

{
   "name": "rollup",
   "hierarchy": [ "CONTINENT", "COUNTRY", "CITY"],
   "aggregations": [
    {
      "newcolumns": [ "VALUE-sum", "COUNT"],
      "aggfunction": "sum",
      "columns": [ "VALUE", "COUNT"]
    },
    {
      "newcolumns": [ "VALUE-avg"],
      "aggfunction": "avg",
      "columns": [ "VALUE"]
    }
   ],
   "groupby": [ "YEAR"],
   "labelCol": "MY_LABEL",
   "levelCol": "MY_LEVEL",
   "childLevelCol": "MY_CHILD_LEVEL",
   "parentLabelCol": "MY_PARENT"
}

Output dataset:

CITY

COUNTRY

CONTINENT

YEAR

MY_LABEL

MY_LEVEL

MY_CHILD_LEVEL

MY_PARENT

VALUE-sum

VALUE-avg

COUNT

Europe

2018

Europe

CONTINENT

COUNTRY

6.5

North America

2018

North America

CONTINENT

COUNTRY

12.5

France

Europe

2018

France

COUNTRY

CITY

Europe

7.5

Spain

Europe

2018

Spain

COUNTRY

CITY

Europe

5.5

USA

North America

2018

USA

COUNTRY

CITY

North America

16.5

Canada

North America

2018

Canada

COUNTRY

CITY

North America

8.5

Paris

France

Europe

2018

Paris

CITY

France

Bordeaux

France

Europe

2018

Bordeaux

CITY

France

Barcelona

Spain

Europe

2018

Barcelona

CITY

Spain

Madrid

Spain

Europe

2018

Madrid

CITY

Spain

Boston

USA

North America

2018

Boston

CITY

USA

New-York

USA

North America

2018

New-York

CITY

USA

Montreal

Canada

North America

2018

Montreal

CITY

Canada

Ottawa

Canada

North America

2018

Ottawa

CITY

Canada

Europe

2019

Europe

CONTINENT

COUNTRY

9.5

North America

2019

North America

CONTINENT

COUNTRY

15.5

France

Europe

2019

France

COUNTRY

CITY

Europe

10.5

Spain

Europe

2019

Spain

COUNTRY

CITY

Europe

8.5

USA

North America

2019

USA

COUNTRY

CITY

North America

19.5

Canada

North America

2019

Canada

COUNTRY

CITY

North America

11.5

Paris

France

Europe

2019

Paris

CITY

France

Bordeaux

France

Europe

2019

Bordeaux

CITY

France

Barcelona

Spain

Europe

2019

Barcelona

CITY

Spain

Madrid

Spain

Europe

2019

Madrid

CITY

Spain

Boston

USA

North America

2019

Boston

CITY

USA

New-York

USA

North America

2019

New-York

CITY

USA

Montreal

Canada

North America

2019

Montreal

CITY

Canada

Ottawa

Canada

North America

2019

Ottawa

CITY

Canada

`select` step

Select columns. By default, every column of the input domain is kept. If a select step is used, only the selected columns are kept in the output.

{
    "name": "select",
    "columns": [ "my-column", "some-other-column"]
}

Example

Input dataset:

Company

Group

Value

Label

Company 1

Group 1

Company 1 - Group 1

Company 2

Group 1

Company 2 - Group 1

Company 3

Group 1

Company 3 - Group 1

Company 4

Group 2

Company 4 - Group 2

Company 5

Group 2

Company 5 - Group 2

Company 6

Group 2

Company 6 - Group 2

Step configuration:

{
  "name": "select",
  "columns": [ "Value", "Label"]
}

Output dataset:

Value

Label

Company 1 - Group 1

Company 2 - Group 1

Company 3 - Group 1

Company 4 - Group 2

Company 5 - Group 2

Company 6 - Group 2

`sort` step

Sort values in one or several columns. Order can be either 'asc' or 'desc'. When sorting on several columns, the order of the columns listed in columns matters.

{
    "name": "sort",
    "columns": [{"column": "foo", "order": "asc"}, {"column": "bar", "order": "desc"}]
}
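Why the order of the columns matters can be seen in this Python sketch, which relies on the stability of Python's sort. The `sort_rows` function is a hypothetical helper.

```python
def sort_rows(rows, columns):
    """Sort dict rows on several columns, each with its own order."""
    ordered = list(rows)
    # Sort on the least significant column first; the sort is stable,
    # so earlier columns in the list end up taking precedence.
    for spec in reversed(columns):
        ordered.sort(key=lambda row: row[spec["column"]], reverse=(spec["order"] == "desc"))
    return ordered

rows = [
    {"Group": "Group 2", "Value": 1},
    {"Group": "Group 1", "Value": 2},
    {"Group": "Group 1", "Value": 7},
]
columns = [{"column": "Group", "order": "asc"}, {"column": "Value", "order": "desc"}]
sorted_rows = sort_rows(rows, columns)
```

Rows are first ordered by Group, and Value only decides the order inside each group.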

Example

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
    "name": "sort",
    "columns": [{ "column": "Group", "order": "asc"}, {"column": "Value", "order": "desc" }]
}

Output dataset:

Label

Group

Value

Label 3

Group 1

Label 1

Group 1

Label 2

Group 1

Label 5

Group 2

Label 6

Group 2

Label 4

Group 2

`split` step

Split a string column into several columns based on a delimiter.

{
  "name": "split",
  "column": "foo",
  "delimiter": " - ",
  "numberColsToKeep": 3
}
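A minimal Python sketch of the split logic for a single value. The `split_column` helper is hypothetical, and padding with None when a value has fewer parts than requested is an assumption, not documented behaviour.

```python
def split_column(value, delimiter, number_cols_to_keep):
    """Split a string and keep only the first parts."""
    parts = value.split(delimiter)[:number_cols_to_keep]
    # Assumption: missing parts are filled with None.
    return parts + [None] * (number_cols_to_keep - len(parts))

print(split_column("Label 1 - Group 1 - France", " - ", 3))
# ['Label 1', 'Group 1', 'France']
print(split_column("Label 1 - Group 1 - France", " - ", 2))
# ['Label 1', 'Group 1']
```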

Example 1

Input dataset:

Label

Value

Label 1 - Group 1 - France

Label 2 - Group 1 - Spain

Label 3 - Group 1 - USA

Label 4 - Group 2 - France

Label 5 - Group 2 - Spain

Label 6 - Group 2 - USA

Step configuration:

{
  "name": "split",
  "column": "Label",
  "delimiter": " - ",
  "numberColsToKeep": 3
}

Output dataset:

Label_1

Label_2

Label_3

Value

Label 1

Group 1

France

Label 2

Group 1

Spain

Label 3

Group 1

USA

Label 4

Group 2

France

Label 5

Group 2

Spain

Label 6

Group 2

USA

Example 2: keeping fewer columns

Input dataset:

Label

Value

Label 1 - Group 1 - France

Label 2 - Group 1 - Spain

Label 3 - Group 1 - USA

Label 4 - Group 2 - France

Label 5 - Group 2 - Spain

Label 6 - Group 2 - USA

Step configuration:

{
  "name": "split",
  "column": "Label",
  "delimiter": " - ",
  "numberColsToKeep": 2
}

Output dataset:

Label_1

Label_2

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

`simplify` step

Simplifies geographical data.

When simplifying your data, every point that is closer than a specific distance to the previous one is suppressed. This step can be useful if you have a very precise shape for a country (such as one-meter precision), but want to quickly draw a map chart. In that case, you may want to simplify your data.

After simplification, no points will be closer than tolerance. The unit depends on data's projection and on its unit, but in general, it's expressed in meters for CRS projections. For more details, see the GeoPandas documentation.

Step configuration:

{
  "name": "simplify",
  "tolerance": 1.0
}

`substring` step

Extract a substring from a string column. The substring begins at index startIndex (indexes start at 1) and stops at endIndex, inclusive. You can specify negative indexes, in which case the index is counted from the end of the string (with -1 being the last index of the string). Please refer to the examples below for illustration. Neither startIndex nor endIndex can be equal to 0.

{
  "name": "substring",
  "column": "foo",
  "startIndex": 1,
  "endIndex": -1,
  "newColumnName": "myNewColumn"
}
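The index rules can be sketched in Python as follows. The `substring` function is a hypothetical helper that treats both indexes as 1-based and inclusive, which is consistent with the examples of this section.

```python
def substring(text, start_index, end_index):
    """Extract a substring using 1-based, inclusive, possibly negative indexes."""
    if start_index == 0 or end_index == 0:
        raise ValueError("indexes are 1-based; 0 is not allowed")
    length = len(text)
    # Convert to Python slice bounds; negative indexes count from the end.
    start = start_index - 1 if start_index > 0 else length + start_index
    end = end_index if end_index > 0 else length + end_index + 1
    return text[max(start, 0):end]

print(substring("overflow", 1, 4))    # over
print(substring("overflow", 2, -2))   # verflo
print(substring("overflow", -3, -1))  # low
```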

Example 1: positive `startIndex` and `endIndex`

Input dataset:

Label

Value

foo

overflow

some_text

a_word

toucan

toco

Step configuration:

{
  "column": "Label",
  "name": "substring",
  "startIndex": 1,
  "endIndex": 4
}

Output dataset:

Label

Value

Label_PCT

foo

overflow

over

some_text

some

a_word

a_wo

toucan

touc

toco

Example 2: `startIndex` is positive and `endIndex` is negative

Input dataset:

Label

Value

foo

overflow

some_text

a_word

toucan

toco

Step configuration:

{
  "name": "substring",
  "column": "Label",
  "startIndex": 2,
  "endIndex": -2,
  "newColumnName": "short_label"
}

Output dataset:

Label

Value

short_label

foo

overflow

verflo

some_text

ome_tex

a_word

_wor

toucan

ouca

toco

Example 3: `startIndex` and `endIndex` are negative

Input dataset:

Label

Value

foo

overflow

some_text

a_word

toucan

toco

Step configuration:

{
  "name": "substring",
  "column": "Label",
  "startIndex": -3,
  "endIndex": -1
}

Output dataset:

Label

Value

Label_PCT

foo

overflow

low

some_text

ext

a_word

ord

toucan

can

toco

oco

`text` step

Use this step to add a text column where every value will be equal to the specified text.

{
  "name": "text",
  "newColumn": "new",
  "text": "some text"
}

Example

Input dataset:

Label

Value1

Label 1

Label 2

Label 3

Step configuration:

{
  "name": "text",
  "newColumn": "KPI",
  "text": "Sales"
}

Output dataset:

Label

Value1

KPI

Label 1

Sales

Label 2

Sales

Label 3

Sales

`todate` step

Converts a string column into a date column based on a specified format.

{
    "name": "todate",
    "column": "myTextColumn",
    "format": "%Y-%m-%d"
}
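Assuming the format directives behave like Python's strptime directives (an assumption, since the backend is not specified here), the parsing can be sketched as follows; `to_date` is a hypothetical helper name.

```python
from datetime import datetime

def to_date(text: str, fmt: str) -> datetime:
    """Parse a text value into a date, in the spirit of the todate step."""
    return datetime.strptime(text, fmt)

parsed = to_date("06/10/2019", "%d/%m/%Y")
print(parsed.isoformat())  # 2019-10-06T00:00:00
```

The format describes how the input text is written, not how the resulting date is displayed.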

Example

Input dataset:

Company

Date

Value

Company 1

06/10/2019

Company 1

07/10/2019

Company 1

08/10/2019

Company 2

06/10/2019

Company 2

07/10/2019

Company 2

08/10/2019

Step configuration:

{
  "name": "todate",
  "column": "Date",
  "format": "%d/%m/%Y"
}

Output dataset:

Company

Date

Value

Company 1

2019-10-06T00:00:00.000Z

Company 1

2019-10-07T00:00:00.000Z

Company 1

2019-10-08T00:00:00.000Z

Company 2

2019-10-06T00:00:00.000Z

Company 2

2019-10-07T00:00:00.000Z

Company 2

2019-10-08T00:00:00.000Z

`top` step

Returns the top N rows, by group if groups is specified, otherwise over the full dataset.

{
  "name": "top",
  "groups": [ "foo"],
  "rankOn": "bar",
  "sort": "desc",
  "limit": 10
}
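A minimal Python sketch of the selection logic. The `top` function is a hypothetical helper whose arguments mirror the step parameters.

```python
def top(rows, rank_on, sort="desc", limit=10, groups=()):
    """Keep the first `limit` rows of each group after sorting on `rank_on`."""
    buckets = {}
    for row in rows:
        buckets.setdefault(tuple(row[g] for g in groups), []).append(row)
    result = []
    for bucket in buckets.values():
        bucket.sort(key=lambda row: row[rank_on], reverse=(sort == "desc"))
        result.extend(bucket[:limit])
    return result

rows = [
    {"Label": "Label 1", "Group": "Group 1", "Value": 13},
    {"Label": "Label 2", "Group": "Group 1", "Value": 7},
    {"Label": "Label 3", "Group": "Group 1", "Value": 20},
    {"Label": "Label 4", "Group": "Group 2", "Value": 1},
]
best_per_group = top(rows, "Value", sort="desc", limit=1, groups=["Group"])
lowest_two = top(rows, "Value", sort="asc", limit=2)
```

Without groups, the whole dataset is treated as a single group.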

Example 1: top without `groups`, ascending order

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
  "name": "top",
  "rankOn": "Value",
  "sort": "asc",
  "limit": 3
}

Output dataset:

Label

Group

Value

Label 4

Group 2

Label 6

Group 2

Label 2

Group 1

Example 2: top with `groups`, descending order

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 4

Group 2

Label 5

Group 2

Label 6

Group 2

Step configuration:

{
  "name": "top",
  "groups": [ "Group"],
  "rankOn": "Value",
  "sort": "desc",
  "limit": 1
}

Output dataset:

Label

Group

Value

Label 3

Group 1

Label 5

Group 2

`totals` step

Append "total" rows to the dataset for specified dimensions. Computed rows result from an aggregation (either sum, average, count, count distinct, min, max, first or last).

{
  "name": "totals",
  "totalDimensions": [
    { "totalColumn": "foo", "totalRowsLabel": "Total foos" },
    { "totalColumn": "bar", "totalRowsLabel": "Total bars" }
  ],
  "aggregations": [
    {
      "columns": [ "value1", "value2"],
      "aggfunction": "sum",
      "newcolumns": [ "sum_value1", "sum_value2"]
    },
    {
      "columns": [ "value1"],
      "aggfunction": "avg",
      "newcolumns": [ "avg_value1"]
    }
   ],
  "groups": [ "someDimension"]
}

Example 1: basic usage

Input dataset:

COUNTRY

PRODUCT

YEAR

VALUE

France

product A

2019

USA

product A

2019

France

product B

2019

USA

product B

2019

France

product A

2020

USA

product A

2020

France

product B

2020

USA

product B

2020

Step configuration:

{
  "name": "totals",
  "totalDimensions": [{ "totalColumn": "COUNTRY", "totalRowsLabel": "All countries" }],
  "aggregations": [
    {
      "columns": [ "VALUE"],
      "aggfunction": "sum",
      "newcolumns": [ "VALUE"]
    }
   ]
}

Output dataset:

COUNTRY

PRODUCT

YEAR

VALUE

France

product A

2019

USA

product A

2019

France

product B

2019

USA

product B

2019

France

product A

2020

USA

product A

2020

France

product B

2020

USA

product B

2020

All countries

null

135

Example 2: With several totals and groups

Input dataset:

COUNTRY

PRODUCT

YEAR

VALUE_1

VALUE_2

France

product A

2019

USA

product A

2019

100

France

product B

2019

100

USA

product B

2019

150

France

product A

2020

200

USA

product A

2020

200

France

product B

2020

300

USA

product B

2020

250

Step configuration:

{
  "name": "totals",
  "totalDimensions": [
    {"totalColumn": "COUNTRY", "totalRowsLabel": "All countries"},
    {"totalColumn": "PRODUCT", "totalRowsLabel": "All products"}
  ],
  "aggregations": [
    {
      "columns": [ "VALUE_1", "VALUE_2"],
      "aggfunction": "sum",
      "newcolumns": [ "VALUE_1-sum", "VALUE_2"]
    },
    {
      "columns": [ "VALUE_1"],
      "aggfunction": "avg",
      "newcolumns": [ "VALUE_1-avg"]
    }
   ],
   "groups": [ "YEAR"]
}

Output dataset:

COUNTRY

PRODUCT

YEAR

VALUE_2

VALUE_1-sum

VALUE_1-avg

France

product A

2019

USA

product A

2019

100

France

product B

2019

100

USA

product B

2019

150

France

product A

2020

200

USA

product A

2020

200

France

product B

2020

300

USA

product B

2020

250

USA

All products

2020

450

22.5

France

All products

2020

500

USA

All products

2019

250

12.5

France

All products

2019

150

7.5

All countries

product B

2020

550

27.5

All countries

product A

2020

400

All countries

product B

2019

250

12.5

All countries

product A

2019

150

7.5

All countries

All products

2020

950

23.75

All countries

All products

2019

400

`trim` step

Trim leading and trailing spaces in one or several columns.

{
    "name": "trim",
    "columns": [ "my-column", "some-other-column"]
}

Example

Input dataset:

Company

Group

Value

Label

' Company 1 '

Group 1

Company 1 - Group 1

' Company 2 '

Group 1

Company 2 - Group 1

Step configuration:

{
  "name": "trim",
  "columns": [ "Company"]
}

Output dataset:

Company

Group

Value

Label

'Company 1'

Group 1

Company 1 - Group 1

'Company 2'

Group 1

Company 2 - Group 1

`unpivot` step

Unpivot a list of columns to rows.

{
  "name": "unpivot",
  "keep": [ "COMPANY", "COUNTRY"],
  "unpivot": [ "NB_CLIENTS", "REVENUES"],
  "unpivotColumnName": "KPI",
  "valueColumnName": "VALUE",
  "dropna": true
}
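The effect of dropna can be illustrated with a short Python sketch. The `unpivot` function is a hypothetical helper whose arguments mirror the step parameters, and null is represented by None.

```python
def unpivot(rows, keep, unpivot_columns, unpivot_column_name, value_column_name, dropna=True):
    """Turn the listed columns into (name, value) rows."""
    result = []
    for row in rows:
        for column in unpivot_columns:
            value = row.get(column)
            if dropna and value is None:
                continue  # rows with a null value are dropped
            out = {key: row[key] for key in keep}
            out[unpivot_column_name] = column
            out[value_column_name] = value
            result.append(out)
    return result

rows = [{"COMPANY": "Company 1", "NB_CLIENTS": 7, "REVENUES": None}]
dropped = unpivot(rows, ["COMPANY"], ["NB_CLIENTS", "REVENUES"], "KPI", "VALUE", dropna=True)
kept = unpivot(rows, ["COMPANY"], ["NB_CLIENTS", "REVENUES"], "KPI", "VALUE", dropna=False)
```

With dropna set to true, the REVENUES row is omitted because its value is null; with false, it is kept with a null value.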

Example 1: with `dropna` parameter set to true

Input dataset:

COMPANY

COUNTRY

NB_CLIENTS

REVENUES

Company 1

France

Company 2

France

Company 1

USA

Company 2

USA

Step configuration:

{
  "name": "unpivot",
  "keep": [ "COMPANY", "COUNTRY"],
  "unpivot": [ "NB_CLIENTS", "REVENUES"],
  "unpivotColumnName": "KPI",
  "valueColumnName": "VALUE",
  "dropna": true
}

Output dataset:

COMPANY

COUNTRY

KPI

VALUE

Company 1

France

NB_CLIENTS

Company 1

France

REVENUES

Company 2

France

NB_CLIENTS

Company 1

USA

NB_CLIENTS

Company 1

USA

REVENUES

Company 2

USA

NB_CLIENTS

Company 2

USA

REVENUES

Example 2: with `dropna` parameter set to false

Input dataset:

COMPANY

COUNTRY

NB_CLIENTS

REVENUES

Company 1

France

Company 2

France

Company 1

USA

Company 2

USA

Step configuration:

{
  "name": "unpivot",
  "keep": [ "COMPANY", "COUNTRY"],
  "unpivot": [ "NB_CLIENTS", "REVENUES"],
  "unpivotColumnName": "KPI",
  "valueColumnName": "VALUE",
  "dropna": false
}

Output dataset:

COMPANY

COUNTRY

KPI

VALUE

Company 1

France

NB_CLIENTS

Company 1

France

REVENUES

Company 2

France

NB_CLIENTS

Company 2

France

REVENUES

Company 1

USA

NB_CLIENTS

Company 1

USA

REVENUES

Company 2

USA

NB_CLIENTS

Company 2

USA

REVENUES

`uppercase` step

⚠️ Mongo's $toUpper operator does not support accents. If you need to uppercase accented characters with Mongo, use a replacetext step after uppercase.

Converts a string column to uppercase.

{
  "name": "uppercase",
  "column": "foo"
}

Example:

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Step configuration:

{
  "name": "uppercase",
  "column": "Label"
}

Output dataset:

Label

Group

Value

LABEL 1

Group 1

LABEL 2

Group 1

LABEL 3

Group 1

`uniquegroups` step

Gets the unique groups of values from one or several columns.

{
  "name": "uniquegroups",
  "on": [ "foo", "bar"]
}
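A minimal Python sketch of the deduplication. The `unique_groups` function is a hypothetical helper; returning the groups in sorted order is an assumption made to match the example output in this section.

```python
def unique_groups(rows, on):
    """Return the distinct combinations of values found in the `on` columns."""
    keys = {tuple(row[col] for col in on) for row in rows}
    # Assumption: groups are returned sorted.
    return [dict(zip(on, key)) for key in sorted(keys)]

rows = [
    {"Label": "Label 1", "Group": "Group 1", "Value": 13},
    {"Label": "Label 2", "Group": "Group 1", "Value": 7},
    {"Label": "Label 1", "Group": "Group 2", "Value": 1},
    {"Label": "Label 2", "Group": "Group 1", "Value": 2},
]
groups = unique_groups(rows, ["Label", "Group"])
```

Columns that are not listed in on, such as Value here, are not part of the output.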

Example:

Input dataset:

Label

Group

Value

Label 1

Group 1

Label 2

Group 1

Label 3

Group 1

Label 1

Group 2

Label 2

Group 1

Label 3

Group 1

Step configuration:

{
  "name": "uniquegroups",
  "on": [ "Label", "Group"]
}

Output dataset:

Label

Group

Label 1

Group 1

Label 1

Group 2

Label 2

Group 1

Label 3

Group 1

`waterfall` step

This step generates a data structure useful for building waterfall charts. It breaks down the variation between two values (usually between two dates) across entities. Entities are found in the labelsColumn, and can optionally be regrouped under common parents found in the parentsColumn for drill-down purposes.

{
  "name": "waterfall",
  "valueColumn": "VALUE",
  "milestonesColumn": "DATE",
  "start": "2019",
  "end": "2020",
  "labelsColumn": "PRODUCT",
  "groupby": [ "COUNTRY"],
  "sortBy": "value",
  "order": "desc"
}
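The basic breakdown (without parentsColumn or groupby) can be sketched in Python as follows. The `waterfall` function is a hypothetical helper; treating a label that is missing at one milestone as 0 is an assumption, and the output column names follow the examples of this section.

```python
def waterfall(rows, value_column, milestones_column, start, end, labels_column):
    """Break down the variation between two milestones, label by label."""
    def values_at(milestone):
        totals = {}
        for row in rows:
            if row[milestones_column] == milestone:
                label = row[labels_column]
                totals[label] = totals.get(label, 0) + row[value_column]
        return totals

    start_values, end_values = values_at(start), values_at(end)
    labels = sorted(set(start_values) | set(end_values))
    variations = [
        {"LABEL_waterfall": label, "TYPE_waterfall": "parent",
         # Assumption: a label missing at one milestone counts as 0.
         value_column: end_values.get(label, 0) - start_values.get(label, 0)}
        for label in labels
    ]
    # Equivalent of sortBy "value" with order "desc".
    variations.sort(key=lambda row: row[value_column], reverse=True)
    first = {"LABEL_waterfall": start, "TYPE_waterfall": None,
             value_column: sum(start_values.values())}
    last = {"LABEL_waterfall": end, "TYPE_waterfall": None,
            value_column: sum(end_values.values())}
    return [first, *variations, last]

rows = [
    {"city": "Paris", "year": "2018", "revenue": 385},
    {"city": "Paris", "year": "2019", "revenue": 450},
    {"city": "Boston", "year": "2018", "revenue": 245},
    {"city": "Boston", "year": "2019", "revenue": 275},
]
steps = waterfall(rows, "revenue", "year", "2018", "2019", "city")
```

The first and last rows hold the totals at the start and end milestones, and the rows in between hold the variation of each label, so the start total plus all variations equals the end total.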

Example 1: Basic usage

Input dataset:

city

year

revenue

Bordeaux

2019

135

Boston

2019

275

New-York

2019

115

Paris

2019

450

Bordeaux

2018

Boston

2018

245

New-York

2018

103

Paris

2018

385

Step configuration:

{
  "name": "waterfall",
  "valueColumn": "revenue",
  "milestonesColumn": "year",
  "start": "2018",
  "end": "2019",
  "labelsColumn": "city",
  "sortBy": "value",
  "order": "desc"
}

Output dataset:

LABEL_waterfall

TYPE_waterfall

revenue

2018

null

831

Paris

parent

Bordeaux

parent

Boston

parent

New-York

parent

2019

null

975

Example 2: With more options

Input dataset:

city

country

product

year

revenue

Bordeaux

France

product1

2019

Bordeaux

France

product2

2019

Paris

France

product1

2019

210

Paris

France

product2

2019

240

Boston

USA

product1

2019

130

Boston

USA

product2

2019

145

New-York

USA

product1

2019

New-York

USA

product2

2019

Bordeaux

France

product1

2018

Bordeaux

France

product2

2018

Paris

France

product1

2018

175

Paris

France

product2

2018

210

Boston

USA

product1

2018

Boston

USA

product2

2018

150

New-York

USA

product1

2018

New-York

USA

product2

2018

Step configuration:

{
  "name": "waterfall",
  "valueColumn": "revenue",
  "milestonesColumn": "year",
  "start": "2018",
  "end": "2019",
  "labelsColumn": "city",
  "parentsColumn": "country",
  "groupby": [ "product"],
  "sortBy": "label",
  "order": "asc"
}

Output dataset:

LABEL_waterfall

GROUP_waterfall

TYPE_waterfall

product

revenue

2018

null

product1

358

2018

null

product2

473

Bordeaux

France

child

product1

Bordeaux

France

child

product2

Boston

USA

child

product1

Boston

USA

child

product2

-5

France

parent

product2

France

parent

product1

New-York

USA

child

product1

New-York

USA

child

product2

Paris

France

child

product1

Paris

France

child

product2

USA

parent

product2

USA

parent

product1

2019

null

product2

515

2019

null

product1

460


`absolutevalue` step

Example

`addmissingdates` step

Example 1: day granularity without groups

Example 2: day granularity with groups

Example 3: month granularity

`aggregate` step

Example 1: keepOriginalGranularity set to false

Example 2: keepOriginalGranularity set to true

`append` step

Example

`argmax` step

Example 1: without `groups`

Example 2: with `groups`

`argmin` step

Example 1: without `groups`

Example 2: with `groups`

`comparetext` step

Example

`concatenate` step

Example

`convert` step

Example

`cumsum` step

Example 1: Basic usage

Example 2: With more advanced options

`custom` step

Example: using Mongo query language

`dateextract` step

Example

`dategranularity` step

Example

`delete` step

Example

`dissolve` step

Example without aggregations

Example with aggregations

`domain` step

`duplicate` step

Example

`duration` step

Example 1: duration in days

Example 2: duration in minutes

`evolution` step

Example 1: Basic configuration - evolution in absolute value

Example 2: Basic configuration - evolution in percentage

Example 3: Error on duplicate dates

Example 4: Complete configuration with index columns

`fillna` step

Example

`filter` step

Relative dates

`formula` step

Supported operators

Example 1: Basic usage

Example 2: Column name with whitespaces

`hierarchy` step

Example

`ifthenelse` step

Example

`join` step

Example 1: Left join with one column couple as `on` parameter

Example 2: inner join with different column names in the `on` parameter

`fromdate` step

Example

`lowercase` step

Example:

`movingaverage` step

Example 1: Basic usage

Example 2: with groups and custom newColumnName

`percentage` step

Example:

`pivot` step

Example:

`statistics` step

Example:

`rank` step

Example 1: Basic usage

Example 2: With more options

`rename` step