Data steps

Steps are the building blocks of data transformation. They are used in chart queries: each step is a single operation in the query (filtering, aggregating, renaming columns, etc.). Steps run in sequence; the output of one step becomes the input of the next.

The JSON examples in this reference can be copied and adapted for your chart queries. Use the name field to identify the step type and fill in the other properties as described for each step below.

absolutevalue step

This step is meant to compute the absolute value of a given input column.

{
    "name": "absolutevalue",
    "column": "my-column",
    "newColumn": "my-new-column"
}

Example

Input dataset:

Company
Value

Company 1

-33

Company 2

0

Company 3

10

Step configuration:

{
  "name": "absolutevalue",
  "column": "Value",
  "newColumn": "My-absolute-value"
}

Output dataset:

Company
Value
My-absolute-value

Company 1

-33

33

Company 2

0

0

Company 3

10

10

addmissingdates step

Add missing dates as new rows in a dates column. The exhaustive date range spans from the minimum to the maximum date found in the dataset (or in each group if a group-by logic is applied, see below).

Added rows will be set to null in columns not referenced in the step configuration.

Make sure to use a group-by logic if you want to add missing dates in independent groups of rows (e.g. you may need to add missing rows for every country found in a "COUNTRY" column). Also ensure that every date is unique in every group of rows at the specified granularity, else you may get inconsistent results. You can specify "group by" columns in the groups parameter.

An addmissingdates step has the following structure:
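
Parameter names below follow the step schema and should be treated as indicative:

{
  "name": "addmissingdates",
  "datesColumn": "myDates",
  "datesGranularity": "day",
  "groups": ["myGroup"]
}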

Example 1: day granularity without groups

Input dataset:

DATE
VALUE

"2018-01-01T00:00:00.000Z"

75

"2018-01-02T00:00:00.000Z"

80

"2018-01-03T00:00:00.000Z"

82

"2018-01-04T00:00:00.000Z"

83

"2018-01-05T00:00:00.000Z"

80

"2018-01-07T00:00:00.000Z"

86

"2018-01-08T00:00:00.000Z"

79

"2018-01-09T00:00:00.000Z"

76

"2018-01-10T00:00:00.000Z"

79

"2018-01-11T00:00:00.000Z"

75

Here the day "2018-01-06" is missing.

Step configuration:
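
A configuration along these lines should produce the output below (parameter names assumed from the step schema):

{
  "name": "addmissingdates",
  "datesColumn": "DATE",
  "datesGranularity": "day",
  "groups": []
}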

Output dataset:

DATE
VALUE

"2018-01-01T00:00:00.000Z"

75

"2018-01-02T00:00:00.000Z"

80

"2018-01-03T00:00:00.000Z"

82

"2018-01-04T00:00:00.000Z"

83

"2018-01-05T00:00:00.000Z"

80

"2018-01-06T00:00:00.000Z"

"2018-01-07T00:00:00.000Z"

86

"2018-01-08T00:00:00.000Z"

79

"2018-01-09T00:00:00.000Z"

76

"2018-01-10T00:00:00.000Z"

79

"2018-01-11T00:00:00.000Z"

75

Example 2: day granularity with groups

Input dataset:

COUNTRY
DATE
VALUE

France

"2018-01-01T00:00:00.000Z"

75

France

"2018-01-02T00:00:00.000Z"

80

France

"2018-01-03T00:00:00.000Z"

82

France

"2018-01-04T00:00:00.000Z"

83

France

"2018-01-05T00:00:00.000Z"

80

France

"2018-01-07T00:00:00.000Z"

86

France

"2018-01-08T00:00:00.000Z"

79

France

"2018-01-09T00:00:00.000Z"

76

France

"2018-01-10T00:00:00.000Z"

79

France

"2018-01-11T00:00:00.000Z"

85

USA

"2018-01-01T00:00:00.000Z"

69

USA

"2018-01-02T00:00:00.000Z"

73

USA

"2018-01-03T00:00:00.000Z"

73

USA

"2018-01-05T00:00:00.000Z"

75

USA

"2018-01-06T00:00:00.000Z"

70

USA

"2018-01-07T00:00:00.000Z"

76

USA

"2018-01-08T00:00:00.000Z"

73

USA

"2018-01-09T00:00:00.000Z"

70

USA

"2018-01-10T00:00:00.000Z"

72

USA

"2018-01-12T00:00:00.000Z"

78

Here the day "2018-01-06" is missing for "France" rows, and "2018-01-04" and "2018-01-11" are missing for "USA" rows.

Note that "2018-01-12" will not be considered as a missing row for "France" rows, because the latest date found for this group of rows is "2018-01-11" (even though "2018-01-12" is the latest date found for "USA" rows).

Step configuration:
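
For instance (parameter names indicative):

{
  "name": "addmissingdates",
  "datesColumn": "DATE",
  "datesGranularity": "day",
  "groups": ["COUNTRY"]
}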

Output dataset:

COUNTRY
DATE
VALUE

France

"2018-01-01T00:00:00.000Z"

75

France

"2018-01-02T00:00:00.000Z"

80

France

"2018-01-03T00:00:00.000Z"

82

France

"2018-01-04T00:00:00.000Z"

83

France

"2018-01-05T00:00:00.000Z"

80

France

"2018-01-06T00:00:00.000Z"

France

"2018-01-07T00:00:00.000Z"

86

France

"2018-01-08T00:00:00.000Z"

79

France

"2018-01-09T00:00:00.000Z"

76

France

"2018-01-10T00:00:00.000Z"

79

France

"2018-01-11T00:00:00.000Z"

85

USA

"2018-01-01T00:00:00.000Z"

69

USA

"2018-01-02T00:00:00.000Z"

73

USA

"2018-01-03T00:00:00.000Z"

73

USA

"2018-01-04T00:00:00.000Z"

USA

"2018-01-05T00:00:00.000Z"

75

USA

"2018-01-06T00:00:00.000Z"

70

USA

"2018-01-07T00:00:00.000Z"

76

USA

"2018-01-08T00:00:00.000Z"

73

USA

"2018-01-09T00:00:00.000Z"

70

USA

"2018-01-10T00:00:00.000Z"

72

USA

"2018-01-11T00:00:00.000Z"

USA

"2018-01-12T00:00:00.000Z"

78

Example 3: month granularity

Input dataset:

DATE
VALUE

"2019-01-01T00:00:00.000Z"

74

"2019-02-01T00:00:00.000Z"

73

"2019-03-01T00:00:00.000Z"

68

"2019-04-01T00:00:00.000Z"

71

"2019-06-01T00:00:00.000Z"

74

"2019-07-01T00:00:00.000Z"

74

"2019-08-01T00:00:00.000Z"

73

"2019-09-01T00:00:00.000Z"

72

"2019-10-01T00:00:00.000Z"

75

"2019-12-01T00:00:00.000Z"

76

Here "2019-05" and "2019-11" are missing.

Step configuration:
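
A plausible configuration for this example (parameter names assumed):

{
  "name": "addmissingdates",
  "datesColumn": "DATE",
  "datesGranularity": "month",
  "groups": []
}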

Output dataset:

DATE
VALUE

"2019-01-01T00:00:00.000Z"

74

"2019-02-01T00:00:00.000Z"

73

"2019-03-01T00:00:00.000Z"

68

"2019-04-01T00:00:00.000Z"

71

"2019-05-01T00:00:00.000Z"

"2019-06-01T00:00:00.000Z"

74

"2019-07-01T00:00:00.000Z"

74

"2019-08-01T00:00:00.000Z"

73

"2019-09-01T00:00:00.000Z"

72

"2019-10-01T00:00:00.000Z"

75

"2019-11-01T00:00:00.000Z"

"2019-12-01T00:00:00.000Z"

76

aggregate step

Perform aggregations on one or several columns. Available aggregation functions are sum, average, count, count distinct, min, max, first, last.

An aggregation step has the following structure:
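
Parameter names below are indicative of the step schema:

{
  "name": "aggregate",
  "on": ["column1"],
  "aggregations": [
    {
      "columns": ["column2"],
      "newcolumns": ["newColumn"],
      "aggfunction": "sum"
    }
  ],
  "keepOriginalGranularity": false
}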

Example 1: keepOriginalGranularity set to false

Input dataset:

Label
Group
Value1
Value2

Label 1

Group 1

13

10

Label 2

Group 1

7

21

Label 3

Group 1

20

4

Label 4

Group 2

1

17

Label 5

Group 2

9

12

Label 6

Group 2

5

2

Step configuration:
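
A configuration along these lines should produce the output below (parameter names assumed):

{
  "name": "aggregate",
  "on": ["Group"],
  "aggregations": [
    { "columns": ["Value1"], "newcolumns": ["Sum-Value1"], "aggfunction": "sum" },
    { "columns": ["Value2"], "newcolumns": ["Sum-Value2"], "aggfunction": "sum" },
    { "columns": ["Value1"], "newcolumns": ["Avg-Value1"], "aggfunction": "avg" }
  ],
  "keepOriginalGranularity": false
}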

Output dataset:

Group
Sum-Value1
Sum-Value2
Avg-Value1

Group 1

40

35

13.333333

Group 2

16

31

5.333333

Example 2: keepOriginalGranularity set to true

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

Step configuration:
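
For instance (parameter names indicative):

{
  "name": "aggregate",
  "on": ["Group"],
  "aggregations": [
    { "columns": ["Value"], "newcolumns": ["Value"], "aggfunction": "sum" }
  ],
  "keepOriginalGranularity": true
}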

Output dataset:

Label
Group
Value

Label 1

Group 1

40

Label 2

Group 1

40

Label 3

Group 1

40

Label 4

Group 2

16

Label 5

Group 2

16

Label 6

Group 2

16

append step

Appends one or several datasets resulting from other pipelines to the current dataset. WeaverBird allows you to save pipelines, referenced by name, in the Vuex store of the application. You can then call them by their unique names in this step.

Example

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

dataset1 (saved in the application Vuex store):

Label
Group
Value

Label 3

Group 1

20

Label 4

Group 2

1

dataset2 (saved in the application Vuex store):

Label
Group
Value

Label 5

Group 2

10

Label 6

Group 2

5

Step configuration:
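
A plausible configuration (parameter names assumed):

{
  "name": "append",
  "pipelines": ["dataset1", "dataset2"]
}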

Output dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

argmax step

Get row(s) matching the maximum value in a given column, by group if groups is specified.

Example 1: without groups

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

Step configuration:
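
For instance (parameter names indicative):

{
  "name": "argmax",
  "column": "Value",
  "groups": []
}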

Output dataset:

Label
Group
Value

Label 3

Group 1

20

Example 2: with groups

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

Step configuration:
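
A configuration along these lines should produce the output below:

{
  "name": "argmax",
  "column": "Value",
  "groups": ["Group"]
}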

Output dataset:

Label
Group
Value

Label 3

Group 1

20

Label 5

Group 2

10

argmin step

Get row(s) matching the minimum value in a given column, by group if groups is specified.

Example 1: without groups

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

Step configuration:
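
For instance (parameter names indicative):

{
  "name": "argmin",
  "column": "Value",
  "groups": []
}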

Output dataset:

Label
Group
Value

Label 4

Group 2

1

Example 2: with groups

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

Step configuration:
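
A plausible configuration for this example:

{
  "name": "argmin",
  "column": "Value",
  "groups": ["Group"]
}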

Output dataset:

Label
Group
Value

Label 2

Group 1

7

Label 4

Group 2

1

comparetext step

Compares 2 string columns and returns true if the string values are equal, and false otherwise. The comparison is case-sensitive (see the example below).

Example

Input dataset:

TEXT_1
TEXT_2

France

Fr

France

France

France

france

France

England

France

USA

Step configuration:
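
A configuration along these lines should produce the output below (parameter names assumed):

{
  "name": "comparetext",
  "newColumnName": "RESULT",
  "strCol1": "TEXT_1",
  "strCol2": "TEXT_2"
}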

Output dataset:

TEXT_1
TEXT_2
RESULT

France

Fr

false

France

France

true

France

france

false

France

England

false

France

USA

false

concatenate step

This step concatenates several columns using a separator.

Example

Input dataset:

Company
Group
Value

Company 1

Group 1

13

Company 2

Group 1

7

Company 3

Group 1

20

Company 4

Group 2

1

Company 5

Group 2

10

Company 6

Group 2

5

Step configuration:
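
For instance (parameter names indicative):

{
  "name": "concatenate",
  "columns": ["Company", "Group"],
  "separator": " - ",
  "new_column_name": "Label"
}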

Output dataset:

Company
Group
Value
Label

Company 1

Group 1

13

Company 1 - Group 1

Company 2

Group 1

7

Company 2 - Group 1

Company 3

Group 1

20

Company 3 - Group 1

Company 4

Group 2

1

Company 4 - Group 2

Company 5

Group 2

10

Company 5 - Group 2

Company 6

Group 2

5

Company 6 - Group 2

convert step

This step converts the data types of columns.

In an effort to harmonize conversion behaviors across backends, the SQL translator sometimes implements casting differently from a plain CAST AS.

Specifically, when casting a float to an integer, the default CAST behaviour rounds the result while other languages truncate it. That is why TRUNCATE is used when converting floats to integers, and likewise when converting strings to integers (for dates represented as strings). The conversion of dates to integers is handled by assuming the dataset's timestamps are in TIMESTAMP_NTZ format.

Example

Input dataset:

Company
Value

Company 1

'13'

Company 2

'7'

Company 3

'20'

Company 4

'1'

Company 5

'10'

Company 6

'5'

Step configuration:
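
A plausible configuration for this example (parameter names assumed):

{
  "name": "convert",
  "columns": ["Value"],
  "data_type": "integer"
}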

Output dataset:

Company
Value

Company 1

13

Company 2

7

Company 3

20

Company 4

1

Company 5

10

Company 6

5

cumsum step

This step computes the cumulative sum of one or several value columns, based on a reference column (usually dates) sorted in ascending order for the needs of the computation. The computation can be scoped by group if needed.

The toCumSum parameter takes as input a list of 2-element lists in the form ['valueColumn', 'newColumn'].

Example 1: Basic usage

Input dataset:

DATE
VALUE

2019-01

2

2019-02

5

2019-03

3

2019-04

8

2019-05

9

2019-06

6

Step configuration:
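
For instance (parameter names indicative):

{
  "name": "cumsum",
  "referenceColumn": "DATE",
  "toCumSum": [["VALUE", "VALUE_CUMSUM"]],
  "groups": []
}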

Output dataset:

DATE
VALUE
VALUE_CUMSUM

2019-01

2

2

2019-02

5

7

2019-03

3

10

2019-04

8

18

2019-05

9

27

2019-06

6

33

Example 2: With more advanced options

Input dataset:

COUNTRY
DATE
VALUE

France

2019-01

2

France

2019-02

5

France

2019-03

3

France

2019-04

8

France

2019-05

9

France

2019-06

6

USA

2019-01

10

USA

2019-02

6

USA

2019-03

6

USA

2019-04

4

USA

2019-05

8

USA

2019-06

7

Step configuration:
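
A configuration along these lines should produce the output below:

{
  "name": "cumsum",
  "referenceColumn": "DATE",
  "toCumSum": [["VALUE", "MY_CUMSUM"]],
  "groups": ["COUNTRY"]
}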

Output dataset:

COUNTRY
DATE
VALUE
MY_CUMSUM

France

2019-01

2

2

France

2019-02

5

7

France

2019-03

3

10

France

2019-04

8

18

France

2019-05

9

27

France

2019-06

6

33

USA

2019-01

10

10

USA

2019-02

6

16

USA

2019-03

6

22

USA

2019-04

4

26

USA

2019-05

8

34

USA

2019-06

7

41

custom step

This step lets you define a custom query that can't be expressed using the other existing steps.

Example: using Mongo query language

Input dataset:

Company
Group
Value

Company 1

Group 1

13

Company 2

Group 1

7

Company 3

Group 1

20

Company 4

Group 2

1

Company 5

Group 2

10

Company 6

Group 2

5

Step configuration:
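
With the Mongo translator, the query parameter holds a raw aggregation stage. A sketch reproducing the concatenation above (assuming the query is passed as a string):

{
  "name": "custom",
  "query": "{\"$addFields\": {\"Label\": {\"$concat\": [\"$Company\", \" - \", \"$Group\"]}}}"
}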

Output dataset:

Company
Group
Value
Label

Company 1

Group 1

13

Company 1 - Group 1

Company 2

Group 1

7

Company 2 - Group 1

Company 3

Group 1

20

Company 3 - Group 1

Company 4

Group 2

1

Company 4 - Group 2

Company 5

Group 2

10

Company 5 - Group 2

Company 6

Group 2

5

Company 6 - Group 2

dateextract step

Extract date information (e.g. day, week, year etc.). The following information can be extracted:

  • year: extract 'year' from date,

  • month: extract 'month' from date,

  • day: extract 'day of month' from date,

  • week: extract 'week number' (ranging from 0 to 53) from date,

  • quarter: extract 'quarter number' from date (1 for Jan-Feb-Mar)

  • dayOfWeek: extract 'day of week' (ranging from 1 for Sunday to 7 for Saturday) from date,

  • dayOfYear: extract 'day of year' from date,

  • isoYear: extract 'year number' in ISO 8601 format from date,

  • isoWeek: extract 'week number' in ISO 8601 format (ranging from 1 to 53) from date.

  • isoDayOfWeek: extract 'day of week' in ISO 8601 format (ranging from 1 for Monday to 7 for Sunday) from date,

  • firstDayOfYear: calendar date corresponding to the first day (1st of January) of the year,

  • firstDayOfMonth: calendar date corresponding to the first day of the month,

  • firstDayOfWeek: calendar date corresponding to the first day of the week,

  • firstDayOfQuarter: calendar date corresponding to the first day of the quarter,

  • firstDayOfIsoWeek: calendar date corresponding to the first day of the week in ISO 8601 format,

  • currentDay: calendar date of the target date,

  • previousDay: calendar date one day before the target date,

  • firstDayOfPreviousYear: calendar date corresponding to the first day (1st of January) of the previous year,

  • firstDayOfPreviousMonth: calendar date corresponding to the first day of the previous month,

  • firstDayOfPreviousWeek: calendar date corresponding to the first day of the previous week,

  • firstDayOfPreviousQuarter: calendar date corresponding to the first day of the previous quarter,

  • firstDayOfPreviousISOWeek: calendar date corresponding to the first day of the previous ISO week,

  • previousYear: extract previous 'year number' from date

  • previousMonth: extract previous 'month number' from date

  • previousWeek: extract previous 'week number' from date

  • previousQuarter: extract previous 'quarter number' from date

  • previousISOWeek: extract previous 'week number' in ISO 8601 format (ranging from 1 to 53) from date

  • hour: extract 'hour' from date,

  • minutes: extract 'minutes' from date,

  • seconds: extract 'seconds' from date,

  • milliseconds: extract 'milliseconds' from date,

Here's an example of such a step:

Example

Input dataset:

Date

2019-10-30T00:00:00.000Z

2019-10-15T00:00:00.000Z

2019-10-01T00:00:00.000Z

2019-09-30T00:00:00.000Z

2019-09-15T00:00:00.000Z

2019-09-01T00:00:00.000Z

Step configuration:
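
A configuration along these lines should produce the output below (parameter names assumed):

{
  "name": "dateextract",
  "column": "Date",
  "dateInfo": ["year", "month", "day"],
  "newColumns": ["Date_year", "Date_month", "Date_day"]
}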

Output dataset:

Date
Date_year
Date_month
Date_day

2019-10-30T00:00:00.000Z

2019

10

30

2019-10-15T00:00:00.000Z

2019

10

15

2019-10-01T00:00:00.000Z

2019

10

1

2019-09-30T00:00:00.000Z

2019

9

30

2019-09-15T00:00:00.000Z

2019

9

15

2019-09-01T00:00:00.000Z

2019

9

1

dategranularity step

Normalize dates to a given granularity (e.g. day, week, year) in a column intended for aggregation. The following granularities are supported:

  • year: calendar date corresponding to the first day (1st of January) of the year

  • quarter: calendar date corresponding to the first day of the quarter

  • month: calendar date corresponding to the first day of the month

  • week: calendar date corresponding to the first day of the week (Sunday)

  • isoWeek: calendar date corresponding to the first day of the week (Monday)

  • day: calendar date corresponding to the first hour of the day

Here's an example of such a step:

Example

Input dataset:

Date

2019-10-30T00:00:00.000Z

2019-10-15T00:00:00.000Z

2019-10-01T00:00:00.000Z

2019-09-30T05:11:31.000Z

2019-09-15T00:00:00.000Z

2019-09-01T00:00:00.000Z

Step configuration:
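
A plausible configuration for this example (parameter names assumed):

{
  "name": "dategranularity",
  "column": "Date",
  "granularity": "month"
}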

Output dataset:

Date

2019-10-01T00:00:00.000Z

2019-10-01T00:00:00.000Z

2019-10-01T00:00:00.000Z

2019-09-01T00:00:00.000Z

2019-09-01T00:00:00.000Z

2019-09-01T00:00:00.000Z

delete step

Delete a column.

Example

Input dataset:

Company
Group
Value
Label

Company 1

Group 1

13

Company 1 - Group 1

Company 2

Group 1

7

Company 2 - Group 1

Company 3

Group 1

20

Company 3 - Group 1

Company 4

Group 2

1

Company 4 - Group 2

Company 5

Group 2

10

Company 5 - Group 2

Company 6

Group 2

5

Company 6 - Group 2

Step configuration:
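
For instance (parameter names indicative):

{
  "name": "delete",
  "columns": ["Company", "Group"]
}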

Output dataset:

Value
Label

13

Company 1 - Group 1

7

Company 2 - Group 1

20

Company 3 - Group 1

1

Company 4 - Group 2

10

Company 5 - Group 2

5

Company 6 - Group 2

dissolve step

Geographically dissolve data, i.e. merge geometries into a single geometry per group.

Example without aggregations

Input dataset:

Country
City
geometry

Country 1

City 1

Polygon

Country 2

City 2

Polygon

Country 2

City 3

Polygon

Country 1

City 4

Polygon

Country 2

City 5

Polygon

Country 1

City 6

Polygon

Step configuration:
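
A sketch (parameter names assumed):

{
  "name": "dissolve",
  "groups": ["Country"]
}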

Output dataset:

Country
geometry

Country 1

MultiPolygon

Country 2

MultiPolygon

Example with aggregations

Input dataset:

Country
City
geometry
Population

Country 1

City 1

Polygon

100_000

Country 2

City 2

Polygon

50_000

Country 2

City 3

Polygon

200_000

Country 1

City 4

Polygon

30_000

Country 2

City 5

Polygon

25_000

Country 1

City 6

Polygon

10_000

Step configuration:
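
A sketch with an aggregation (parameter names assumed):

{
  "name": "dissolve",
  "groups": ["Country"],
  "aggregations": [
    { "columns": ["Population"], "newcolumns": ["Total population"], "aggfunction": "sum" }
  ]
}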

Output dataset:

Country
geometry
Total population

Country 1

MultiPolygon

140_000

Country 2

MultiPolygon

275_000

domain step

This step is meant to select a specific domain (using MongoDB terminology).
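For instance, to query a collection named my-collection (hypothetical name):

{
  "name": "domain",
  "domain": "my-collection"
}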

duplicate step

This step is meant to duplicate a column.

Example

Input dataset:

Company
Value

Company 1

13

Company 2

0

Company 3

20

Step configuration:
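
A configuration along these lines should produce the output below (parameter names assumed):

{
  "name": "duplicate",
  "column": "Company",
  "new_column_name": "Company-copy"
}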

Output dataset:

Company
Value
Company-copy

Company 1

13

Company 1

Company 2

0

Company 2

Company 3

20

Company 3

duration step

Compute the duration (in days, hours, minutes or seconds) between 2 dates in a new column.

Example 1: duration in days

Input dataset:

START_DATE
END_DATE

"2020-01-01T00:00:00.000Z"

"2020-01-31T00:00:00.000Z"

"2020-01-01T00:00:00.000Z"

"2020-12-31T00:00:00.000Z"

Step configuration:
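
For instance (parameter names indicative):

{
  "name": "duration",
  "newColumnName": "DURATION",
  "startDateColumn": "START_DATE",
  "endDateColumn": "END_DATE",
  "durationIn": "days"
}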

Output dataset:

START_DATE
END_DATE
DURATION

"2020-01-01T00:00:00.000Z"

"2020-01-31T00:00:00.000Z"

30

"2020-01-01T00:00:00.000Z"

"2020-12-31T00:00:00.000Z"

365

Example 2: duration in minutes

Input dataset:

START_HOUR
END_HOUR

"2020-01-01T14:00:00.000Z"

"2020-01-31T15:00:00.000Z"

"2020-01-01T15:00:00.000Z"

"2020-12-31T20:00:00.000Z"

Step configuration:
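
A plausible configuration for this example:

{
  "name": "duration",
  "newColumnName": "DURATION",
  "startDateColumn": "START_HOUR",
  "endDateColumn": "END_HOUR",
  "durationIn": "minutes"
}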

Output dataset:

START_HOUR
END_HOUR
DURATION

"2020-01-01T14:00:00.000Z"

"2020-01-31T15:00:00.000Z"

60

"2020-01-01T15:00:00.000Z"

"2020-12-31T20:00:00.000Z"

300

evolution step

Use this step if you need to compute the row-by-row evolution of a value column, based on a date column. It outputs a new column containing either the evolution in absolute value or the evolution in percentage, depending on the chosen format.

Be careful that the computation is scoped so that there are no duplicate dates (so that any date finds no more than one previous date). That means that you may need to specify "group by" columns to make every date unique inside each group. You can specify those columns in the indexColumns parameter.

Example 1: Basic configuration - evolution in absolute value

Input dataset:

DATE
VALUE

2019-06

79

2019-07

81

2019-08

77

2019-09

75

2019-11

78

2019-12

88

Step configuration:
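
A configuration along these lines should produce the output below (parameter names assumed):

{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastMonth",
  "evolutionFormat": "abs",
  "indexColumns": []
}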

Output dataset:

DATE
VALUE
VALUE_EVOL_ABS

2019-06

79

2019-07

81

2

2019-08

77

-4

2019-09

75

-2

2019-11

78

2019-12

88

10

Example 2: Basic configuration - evolution in percentage

Input dataset:

DATE
VALUE

2019-06

79

2019-07

81

2019-08

77

2019-09

75

2019-11

78

2019-12

88

Step configuration:
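
For instance (parameter names indicative):

{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastMonth",
  "evolutionFormat": "pct",
  "indexColumns": []
}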

Output dataset:

DATE
VALUE
VALUE_EVOL_PCT

2019-06

79

2019-07

81

0.02531645569620253

2019-08

77

-0.04938271604938271

2019-09

75

-0.025974025974025976

2019-11

78

2019-12

88

0.1282051282051282

Example 3: Error on duplicate dates

If 'COUNTRY' is not specified as an index column, the computation will not be scoped by country. There are then duplicate dates in the "DATE" column, which is prohibited and will lead to an error.

Input dataset:

DATE
COUNTRY
VALUE

2014-12

France

79

2015-12

France

81

2016-12

France

77

2017-12

France

75

2014-12

USA

74

2015-12

USA

74

2016-12

USA

73

2017-12

USA

72

Step configuration:
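
A configuration reproducing the error (note the empty indexColumns; parameter names assumed):

{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastYear",
  "evolutionFormat": "abs",
  "indexColumns": [],
  "newColumn": "MY_EVOL"
}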

Output dataset:

With the mongo translator, you will get an error at row-level as shown below:

DATE
COUNTRY
VALUE
MY_EVOL

2014-12

France

79

2015-12

France

81

Error ...

2016-12

France

77

Error ...

2017-12

France

75

Error ...

2014-12

USA

74

2015-12

USA

74

Error ...

2016-12

USA

73

Error ...

2017-12

USA

72

Error ...

The pandas translator will just return an error, and you will not get any data.

Example 4: Complete configuration with index columns

Input dataset:

DATE
COUNTRY
VALUE

2014-12

France

79

2015-12

France

81

2016-12

France

77

2017-12

France

75

2019-12

France

78

2020-12

France

88

2014-12

USA

74

2015-12

USA

74

2016-12

USA

73

2017-12

USA

72

2018-11

USA

75

2020-12

USA

76

Step configuration:
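
A plausible configuration for this example:

{
  "name": "evolution",
  "dateCol": "DATE",
  "valueCol": "VALUE",
  "evolutionType": "vsLastYear",
  "evolutionFormat": "abs",
  "indexColumns": ["COUNTRY"],
  "newColumn": "MY_EVOL"
}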

Output dataset:

DATE
COUNTRY
VALUE
MY_EVOL

2014-12

France

79

2015-12

France

81

2

2016-12

France

77

-4

2017-12

France

75

-2

2019-12

France

78

2020-12

France

88

10

2014-12

USA

74

2015-12

USA

74

0

2016-12

USA

73

-1

2017-12

USA

72

-1

2018-11

USA

75

3

2020-12

USA

76

fillna step

Replace null values with a given value in the specified columns.

Example

Input dataset:

Company
Group
Value
KPI

Company 1

Group 1

13

Company 2

Group 1

12

Company 3

Group 1

20

40

Company 4

Group 2

1

Company 5

Group 2

38

Company 6

Group 2

5

4

Step configuration:
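
For instance (parameter names indicative):

{
  "name": "fillna",
  "columns": ["Value", "KPI"],
  "value": 0
}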

Output dataset:

Company
Group
Value
KPI

Company 1

Group 1

13

0

Company 2

Group 1

0

12

Company 3

Group 1

20

40

Company 4

Group 2

1

0

Company 5

Group 2

0

38

Company 6

Group 2

5

4

filter step

Filter out rows that don't match a filter definition.

operator is optional, and defaults to eq. Allowed operators are eq, ne, gt, ge, lt, le, in, nin, matches, notmatches, isnull or notnull.

value can be an arbitrary value depending on the selected operator (e.g. a list when used with the in operator, or null when used with the isnull operator).

matches and notmatches operators are used to test value against a regular expression.

Conditions can be grouped and nested with logical operators and and or.
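
For example, a condition keeping only rows whose VALUE is at least 10 (column name hypothetical) could look like:

{
  "name": "filter",
  "condition": {
    "column": "VALUE",
    "value": 10,
    "operator": "ge"
  }
}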

Relative dates

Date values can be relative to the moment when the query is executed. This is expressed by using a RelativeDate object instead of the value, of the form:
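
A sketch (field names assumed; here, 7 days in the past):

{
  "quantity": -7,
  "duration": "day"
}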

formula step

Add a computation based on a formula. Usually column names do not need to be escaped, unless they include whitespace, in which case you'll need to use brackets '[]' (e.g. [myColumn]). Any string escaped with quotes (', ", ''', """) will be considered a string literal.

Supported operators

The following operators are supported by the formula step (note that a value can be a column name or a literal, such as 42 or foo).

  • +: Does an addition of two numeric values. See the concatenate step to append strings

  • -: Does a subtraction of two numeric values. See the replace step to remove a part of a string

  • *: Multiplies two numeric values.

  • /: Divides a numeric value by another. Divisions by zero will return null.

  • %: Returns the rest of an integer division. Divisions by zero will return null.

Example 1: Basic usage

Input dataset:

Label
Value1
Value2
Value3
Value4

Label 1

10

2

3

1

Label 2

1

13

7

3

Label 3

5

20

5

2

Step configuration:
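
A configuration consistent with the output below (the formula itself is inferred from the example values):

{
  "name": "formula",
  "new_column": "Result",
  "formula": "(Value1 + Value2) / Value3 - Value4 * 2"
}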

Output dataset:

Label
Value1
Value2
Value3
Value4
Result

Label 1

10

2

3

1

2

Label 2

1

13

7

3

-4

Label 3

5

20

5

2

1

Example 2: Column name with whitespaces

Input dataset:

Label
Value1
Value2
Value3
Value 4

Label 1

10

2

3

1

Label 2

1

13

7

3

Label 3

5

20

5

2

Step configuration:
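
The same formula, with the whitespace-containing column name escaped with brackets (formula inferred from the example values):

{
  "name": "formula",
  "new_column": "Result",
  "formula": "(Value1 + Value2) / Value3 - [Value 4] * 2"
}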

Output dataset:

Label
Value1
Value2
Value3
Value 4
Result

Label 1

10

2

3

1

2

Label 2

1

13

7

3

-4

Label 3

5

20

5

2

1

hierarchy step

Hierarchy for geographical data.

This step dissolves data for every hierarchy level, and adds a column containing the hierarchy level (0 being the lowest granularity, i.e. the highest level).

Example

Input dataset:

Country
City
geometry
Population

Country 1

City 1

Polygon

100_000

Country 2

City 2

Polygon

50_000

Country 2

City 3

Polygon

200_000

Country 1

City 4

Polygon

30_000

Country 2

City 5

Polygon

25_000

Country 1

City 6

Polygon

10_000

Step configuration:
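
A sketch (parameter names assumed):

{
  "name": "hierarchy",
  "hierarchyLevelColumn": "hierarchy_level",
  "hierarchy": ["Country", "City"]
}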

Output dataset:

Country
City
geometry
Population
hierarchy_level

Country 1

City 1

Polygon

100_000

2

Country 2

City 2

Polygon

50_000

2

Country 2

City 3

Polygon

200_000

2

Country 1

City 4

Polygon

30_000

2

Country 2

City 5

Polygon

25_000

2

Country 1

City 6

Polygon

10_000

2

Country 1

null

MultiPolygon

null

1

Country 2

null

MultiPolygon

null

1

null

null

MultiPolygon

null

0

ifthenelse step

Creates a new column whose values will depend on a condition expressed on existing columns.

The condition is expressed in the if parameter with a condition object, which is the same object expected by the condition parameter of the filter step. Conditions can be grouped and nested with logical operators and and or.

The then parameter only supports a string, that will be interpreted as a formula (cf. formula step). If you want it to be interpreted strictly as a string and not a formula, you must escape the string with quotes (e.g. '"this is a text"').

if...then...else blocks can be nested, as the else parameter supports either a string that will be interpreted as a formula (cf. formula step), or a nested if...then...else object.

Example

Input dataset:

Label
number

Label 1

-2

Label 2

5

Label 3

0

Step configuration:
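
A configuration consistent with the output below (parameter names assumed):

{
  "name": "ifthenelse",
  "newColumn": "result",
  "if": { "column": "number", "value": 0, "operator": "lt" },
  "then": "number * -1",
  "else": {
    "if": { "column": "number", "value": 0, "operator": "gt" },
    "then": "number",
    "else": "'zero'"
  }
}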

Output dataset:

Label
number
result

Label 1

-2

2

Label 2

5

5

Label 3

0

zero

join step

Joins a dataset to the current dataset, i.e. brings columns from the former into the latter, and matches rows based on columns correspondence. It is similar to a JOIN clause in SQL, or to a VLOOKUP in Excel. The joined dataset is the result from the query of the right_pipeline.

The join type can be:

  • 'left': will keep every row of the current dataset and fill unmatched rows with null values,

  • 'inner': will only keep rows that match rows of the joined dataset.

In the on parameter, you must specify 1 or more column couple(s) that will be compared to determine rows correspondence between the 2 datasets. The first element of a couple is for the current dataset column, and the second for the corresponding column in the right dataset to be joined. If you specify more than 1 couple, the matching rows will be those that find a correspondence between the 2 datasets for every column couple specified (logical 'AND').

Weaverbird allows you to save pipelines referenced by name in the Vuex store of the application. You can then call them by their unique names in this step.

Example 1: Left join with one column couple as on parameter

Input dataset:

Label
Value

Label 1

13

Label 2

7

Label 3

20

Label 4

1

Label 5

1

Label 6

1

rightDataset (saved in the application Vuex store):

Label
Group

Label 1

Group 1

Label 2

Group 1

Label 3

Group 2

Label 4

Group 2

Step configuration:
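
For instance (parameter names indicative):

{
  "name": "join",
  "right_pipeline": "rightDataset",
  "type": "left",
  "on": [["Label", "Label"]]
}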

Output dataset:

Label
Value
Group

Label 1

13

Group 1

Label 2

7

Group 1

Label 3

20

Group 2

Label 4

1

Group 2

Label 5

1

Label 6

1

Example 2: inner join with different column names in the on parameter

Input dataset:

Label
Value

Label 1

13

Label 2

7

Label 3

20

Label 4

1

Label 5

1

Label 6

1

rightDataset (saved in the application Vuex store):

LabelRight
Group

Label 1

Group 1

Label 2

Group 1

Label 3

Group 2

Label 4

Group 2

Step configuration:
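
A plausible configuration for this example:

{
  "name": "join",
  "right_pipeline": "rightDataset",
  "type": "inner",
  "on": [["Label", "LabelRight"]]
}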

Output dataset:

Label
Value
LabelRight
Group

Label 1

13

Label 1

Group 1

Label 2

7

Label 2

Group 1

Label 3

20

Label 3

Group 2

Label 4

1

Label 4

Group 2

fromdate step

Converts a date column into a string column based on a specified format.

Example

Input dataset:

Company
Date
Value

Company 1

2019-10-06T00:00:00.000Z

13

Company 1

2019-10-07T00:00:00.000Z

7

Company 1

2019-10-08T00:00:00.000Z

20

Company 2

2019-10-06T00:00:00.000Z

1

Company 2

2019-10-07T00:00:00.000Z

10

Company 2

2019-10-08T00:00:00.000Z

5

Step configuration:
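
A configuration along these lines should produce the output below (the format string uses strftime-like codes; parameter names assumed):

{
  "name": "fromdate",
  "column": "Date",
  "format": "%d/%m/%Y"
}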

Output dataset:

Company
Date
Value

Company 1

06/10/2019

13

Company 1

07/10/2019

7

Company 1

08/10/2019

20

Company 2

06/10/2019

1

Company 2

07/10/2019

10

Company 2

08/10/2019

5

lowercase step

⚠️ Mongo's $toLower operator does not support accents. If you have accented characters to lowercase with Mongo, use a replacetext step after lowercase.

Converts a string column to lowercase.

Example:

Input dataset:

Label
Group
Value

LABEL 1

Group 1

13

LABEL 2

Group 1

7

LABEL 3

Group 1

20

Step configuration:
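
For instance (parameter names indicative):

{
  "name": "lowercase",
  "column": "Label"
}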

Output dataset:

Label
Group
Value

label 1

Group 1

13

label 2

Group 1

7

label 3

Group 1

20

movingaverage step

Compute the moving average based on a value column, a reference column to sort (usually a date column) and a moving window (in number of rows i.e. data points). If needed, the computation can be performed by group of rows. The computation result is added in a new column.

Example 1: Basic usage

Input dataset:

DATE
VALUE

2018-01-01

75

2018-01-02

80

2018-01-03

82

2018-01-04

83

2018-01-05

80

2018-01-06

86

2018-01-07

79

2018-01-08

76

Step configuration:
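
A configuration consistent with the output below, using a 2-row window (parameter names assumed):

{
  "name": "movingaverage",
  "valueColumn": "VALUE",
  "columnToSort": "DATE",
  "movingWindow": 2
}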

Output dataset:

DATE
VALUE
VALUE_MOVING_AVG

2018-01-01

75

null

2018-01-02

80

77.5

2018-01-03

82

81

2018-01-04

83

82.5

2018-01-05

80

81.5

2018-01-06

86

83

2018-01-07

79

82.5

2018-01-08

76

77.5

Example 2: with groups and custom newColumnName

Input dataset:

COUNTRY
DATE
VALUE

France

2018-01-01

75

France

2018-01-02

80

France

2018-01-03

82

France

2018-01-04

83

France

2018-01-05

80

France

2018-01-06

86

USA

2018-01-01

69

USA

2018-01-02

73

USA

2018-01-03

73

USA

2018-01-04

75

USA

2018-01-05

70

USA

2018-01-06

76

Step configuration:
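
A plausible configuration, with a 3-row window computed by country:

{
  "name": "movingaverage",
  "valueColumn": "VALUE",
  "columnToSort": "DATE",
  "movingWindow": 3,
  "groups": ["COUNTRY"],
  "newColumnName": "ROLLING_AVERAGE"
}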

Output dataset:

COUNTRY
DATE
VALUE
ROLLING_AVERAGE

France

2018-01-01

75

null

France

2018-01-02

80

null

France

2018-01-03

82

79

France

2018-01-04

83

81.7

France

2018-01-05

80

81.7

France

2018-01-06

86

83

USA

2018-01-01

69

null

USA

2018-01-02

73

null

USA

2018-01-03

73

71.7

USA

2018-01-04

75

73.7

USA

2018-01-05

70

72.7

USA

2018-01-06

76

73.7

percentage step

Compute the percentage of total, i.e. for every row, the value in the column divided by the sum of all values in the column. The computation can be performed by group if specified. The result is written in a new column.

Example:

Input dataset:

Label
Group
Value

Label 1

Group 1

5

Label 2

Group 1

10

Label 3

Group 1

15

Label 4

Group 2

2

Label 5

Group 2

7

Label 6

Group 2

5

Step configuration:
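
For instance (parameter names indicative):

{
  "name": "percentage",
  "column": "Value",
  "group": ["Group"],
  "newColumnName": "Percentage"
}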

Output dataset:

Label
Group
Value
Percentage

Label 1

Group 1

5

0.167

Label 2

Group 1

10

0.333

Label 3

Group 1

15

0.5

Label 4

Group 2

2

0.143

Label 5

Group 2

7

0.5

Label 6

Group 2

5

0.357

pivot step

Pivot rows into columns around a given index (expressed as a combination of column(s)). Values to be used as new column names are found in the column column_to_pivot. Values to populate new columns are found in the column value_column. The function used to aggregate data (when several rows are found by index group) must be among sum, avg, count, min or max.

Example:

Input dataset:

Label
Country
Value

Label 1

Country1

13

Label 2

Country1

7

Label 3

Country1

20

Label 1

Country2

1

Label 2

Country2

10

Label 3

Country2

5

Label 3

Country2

1

Step configuration:
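The configuration JSON is missing here; a plausible sketch matching the output below, using the column_to_pivot and value_column parameters described above (other parameter names are assumptions), would be:

```json
{
  "name": "pivot",
  "index": ["Label"],
  "column_to_pivot": "Country",
  "value_column": "Value",
  "agg_function": "sum"
}
```

The two Label 3 / Country2 rows (5 and 1) are aggregated by sum into a single cell (6).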

Output dataset:

Label
Country1
Country2

Label 1

13

1

Label 2

7

10

Label 3

20

6

statistics step

Compute statistics of a column.

Example:

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

Step configuration:
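The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "statistics",
  "column": "Value",
  "statistics": ["average", "count", "median"]
}
```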

Output dataset:

average
count
median

9.33333

6

8.5

rank step

This step computes a rank column based on a value column, which can be sorted in ascending or descending order. The ranking can be computed by group.

Two ranking methods are available; they are easiest to understand through these examples:

  • standard: input = [10, 20, 20, 20, 25, 25, 30] => ranking = [1, 2, 2, 2, 5, 5, 7]

  • dense: input = [10, 20, 20, 20, 25, 25, 30] => ranking = [1, 2, 2, 2, 3, 3, 4]

(The dense method is the same as the standard method, except that the rank increases by exactly 1 between consecutive distinct values.)

Example 1: Basic usage

Input dataset:

COUNTRY
VALUE

FRANCE

15

FRANCE

5

FRANCE

10

FRANCE

20

FRANCE

10

FRANCE

15

USA

20

USA

30

USA

20

USA

25

USA

15

USA

20

Step configuration:
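The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "rank",
  "valueCol": "VALUE",
  "order": "desc",
  "method": "standard"
}
```

With no custom output column specified, the rank is assumed to land in a VALUE_RANK column by default.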

Output dataset:

COUNTRY
VALUE
VALUE_RANK

USA

30

1

USA

25

2

FRANCE

20

3

USA

20

3

USA

20

3

USA

20

3

FRANCE

15

7

FRANCE

15

7

USA

15

7

FRANCE

10

10

FRANCE

10

10

FRANCE

5

12

Example 2: With more options

Input dataset:

COUNTRY
VALUE

FRANCE

15

FRANCE

5

FRANCE

10

FRANCE

20

FRANCE

10

FRANCE

15

USA

20

USA

30

USA

20

USA

25

USA

15

USA

20

Step configuration:
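The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "rank",
  "valueCol": "VALUE",
  "order": "asc",
  "method": "dense",
  "groups": ["COUNTRY"],
  "newColumnName": "MY_RANK"
}
```

Ranks are computed independently inside each COUNTRY group, with the dense method and ascending order.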

Output dataset:

COUNTRY
VALUE
MY_RANK

FRANCE

5

1

FRANCE

10

2

FRANCE

10

2

FRANCE

15

3

FRANCE

15

3

FRANCE

20

4

USA

15

1

USA

20

2

USA

20

2

USA

20

2

USA

25

3

USA

30

4

rename step

Rename one or several columns. The toRename parameter takes a list of 2-element lists in the form ['oldColumnName', 'newColumnName'].

Example:

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

Step configuration:
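The configuration JSON is missing here; based on the toRename parameter described above, a sketch matching the output below would be:

```json
{
  "name": "rename",
  "toRename": [["Label", "Company"]]
}
```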

Output dataset:

Company
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

replace step

Replace one or several values in a column.

A replace step is configured as shown in the example below.

Example

Input dataset:

COMPANY
COUNTRY

Company 1

Fr

Company 2

UK

Step configuration:
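The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "replace",
  "searchColumn": "COUNTRY",
  "toReplace": [
    ["Fr", "France"],
    ["UK", "United Kingdom"]
  ]
}
```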

Output dataset:

COMPANY
COUNTRY

Company 1

France

Company 2

United Kingdom

replacetext step

Replace a substring in a column.

A replacetext step is configured as shown in the example below.

Example

Input dataset:

COMPANY
COUNTRY

Company 1

Fr is boring

Company 2

UK

Step configuration:
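The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "replacetext",
  "searchColumn": "COUNTRY",
  "oldStr": "Fr",
  "newStr": "France"
}
```

Unlike the replace step, only the matching substring is replaced, so "Fr is boring" becomes "France is boring".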

Output dataset:

COMPANY
COUNTRY

Company 1

France is boring

Company 2

UK

rollup step

Use this step if you need to compute aggregated data at every level of a hierarchy, specified as a series of columns from top to bottom level. The output data structure stacks the data of every level of the hierarchy, specifying for every row the label, level and parent in dedicated columns.

Aggregated rows can be computed using either sum, average, count, count distinct, min, max, first or last.

Example 1: Basic configuration

Input dataset:

CITY
COUNTRY
CONTINENT
YEAR
VALUE

Paris

France

Europe

2018

10

Bordeaux

France

Europe

2018

5

Barcelona

Spain

Europe

2018

8

Madrid

Spain

Europe

2018

3

Boston

USA

North America

2018

12

New-York

USA

North America

2018

21

Montreal

Canada

North America

2018

10

Ottawa

Canada

North America

2018

7

Paris

France

Europe

2019

13

Bordeaux

France

Europe

2019

8

Barcelona

Spain

Europe

2019

11

Madrid

Spain

Europe

2019

6

Boston

USA

North America

2019

15

New-York

USA

North America

2019

24

Montreal

Canada

North America

2019

10

Ottawa

Canada

North America

2019

13

Step configuration:
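The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "rollup",
  "hierarchy": ["CONTINENT", "COUNTRY", "CITY"],
  "aggregations": [
    { "columns": ["VALUE"], "aggfunction": "sum", "newcolumns": ["VALUE"] }
  ]
}
```

With no "group by" column, both years are summed together (e.g. Europe = 10 + 5 + 8 + 3 + 13 + 8 + 11 + 6 = 64).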

Output dataset:

CITY
COUNTRY
CONTINENT
label
level
child_level
parent
VALUE

Europe

Europe

CONTINENT

COUNTRY

64

North America

North America

CONTINENT

COUNTRY

112

France

Europe

France

COUNTRY

CITY

Europe

36

Spain

Europe

Spain

COUNTRY

CITY

Europe

28

USA

North America

USA

COUNTRY

CITY

North America

72

Canada

North America

Canada

COUNTRY

CITY

North America

40

Paris

France

Europe

Paris

CITY

France

23

Bordeaux

France

Europe

Bordeaux

CITY

France

13

Barcelona

Spain

Europe

Barcelona

CITY

Spain

19

Madrid

Spain

Europe

Madrid

CITY

Spain

9

Boston

USA

North America

Boston

CITY

USA

27

New-York

USA

North America

New-York

CITY

USA

45

Montreal

Canada

North America

Montreal

CITY

Canada

20

Ottawa

Canada

North America

Ottawa

CITY

Canada

20

Example 2: Configuration with optional parameters

Input dataset:

CITY
COUNTRY
CONTINENT
YEAR
VALUE
COUNT

Paris

France

Europe

2018

10

1

Bordeaux

France

Europe

2018

5

1

Barcelona

Spain

Europe

2018

8

1

Madrid

Spain

Europe

2018

3

1

Boston

USA

North America

2018

12

1

New-York

USA

North America

2018

21

1

Montreal

Canada

North America

2018

10

1

Ottawa

Canada

North America

2018

7

1

Paris

France

Europe

2019

13

1

Bordeaux

France

Europe

2019

8

1

Barcelona

Spain

Europe

2019

11

1

Madrid

Spain

Europe

2019

6

1

Boston

USA

North America

2019

15

1

New-York

USA

North America

2019

24

1

Montreal

Canada

North America

2019

10

1

Ottawa

Canada

North America

2019

13

1

Step configuration:
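The configuration JSON is missing here; a plausible sketch matching the output below (all parameter names, in particular the custom label/level/parent column names, are assumptions) would be:

```json
{
  "name": "rollup",
  "hierarchy": ["CONTINENT", "COUNTRY", "CITY"],
  "aggregations": [
    { "columns": ["VALUE"], "aggfunction": "sum", "newcolumns": ["VALUE-sum"] },
    { "columns": ["VALUE"], "aggfunction": "avg", "newcolumns": ["VALUE-avg"] },
    { "columns": ["COUNT"], "aggfunction": "sum", "newcolumns": ["COUNT"] }
  ],
  "groupby": ["YEAR"],
  "labelCol": "MY_LABEL",
  "levelCol": "MY_LEVEL",
  "parentLabelCol": "MY_PARENT"
}
```

Because YEAR is used as a "group by" column, aggregations are computed separately for 2018 and 2019.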

Output dataset:

CITY
COUNTRY
CONTINENT
YEAR
MY_LABEL
MY_LEVEL
MY_CHILD_LEVEL
MY_PARENT
VALUE-sum
VALUE-avg
COUNT

Europe

2018

Europe

CONTINENT

COUNTRY

26

6.5

4

North America

2018

North America

CONTINENT

COUNTRY

50

12.5

4

France

Europe

2018

France

COUNTRY

CITY

Europe

15

7.5

2

Spain

Europe

2018

Spain

COUNTRY

CITY

Europe

11

5.5

2

USA

North America

2018

USA

COUNTRY

CITY

North America

33

16.5

2

Canada

North America

2018

Canada

COUNTRY

CITY

North America

17

8.5

2

Paris

France

Europe

2018

Paris

CITY

France

10

10

1

Bordeaux

France

Europe

2018

Bordeaux

CITY

France

5

5

1

Barcelona

Spain

Europe

2018

Barcelona

CITY

Spain

8

8

1

Madrid

Spain

Europe

2018

Madrid

CITY

Spain

3

3

1

Boston

USA

North America

2018

Boston

CITY

USA

12

12

1

New-York

USA

North America

2018

New-York

CITY

USA

21

21

1

Montreal

Canada

North America

2018

Montreal

CITY

Canada

10

10

1

Ottawa

Canada

North America

2018

Ottawa

CITY

Canada

7

7

1

Europe

2019

Europe

CONTINENT

COUNTRY

38

9.5

4

North America

2019

North America

CONTINENT

COUNTRY

62

15.5

4

France

Europe

2019

France

COUNTRY

CITY

Europe

21

10.5

2

Spain

Europe

2019

Spain

COUNTRY

CITY

Europe

17

8.5

2

USA

North America

2019

USA

COUNTRY

CITY

North America

39

19.5

2

Canada

North America

2019

Canada

COUNTRY

CITY

North America

23

11.5

2

Paris

France

Europe

2019

Paris

CITY

France

13

13

1

Bordeaux

France

Europe

2019

Bordeaux

CITY

France

8

8

1

Barcelona

Spain

Europe

2019

Barcelona

CITY

Spain

11

11

1

Madrid

Spain

Europe

2019

Madrid

CITY

Spain

6

6

1

Boston

USA

North America

2019

Boston

CITY

USA

15

15

1

New-York

USA

North America

2019

New-York

CITY

USA

24

24

1

Montreal

Canada

North America

2019

Montreal

CITY

Canada

10

10

1

Ottawa

Canada

North America

2019

Ottawa

CITY

Canada

13

13

1

select step

Select one or several columns. By default, every column of the input domain is kept; when a select step is used, only the selected columns are kept in the output.

Example

Input dataset:

Company
Group
Value
Label

Company 1

Group 1

13

Company 1 - Group 1

Company 2

Group 1

7

Company 2 - Group 1

Company 3

Group 1

20

Company 3 - Group 1

Company 4

Group 2

1

Company 4 - Group 2

Company 5

Group 2

10

Company 5 - Group 2

Company 6

Group 2

5

Company 6 - Group 2

Step configuration:
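The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "select",
  "columns": ["Value", "Label"]
}
```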

Output dataset:

Value
Label

13

Company 1 - Group 1

7

Company 2 - Group 1

20

Company 3 - Group 1

1

Company 4 - Group 2

10

Company 5 - Group 2

5

Company 6 - Group 2

sort step

Sort values in one or several columns. Order can be either 'asc' or 'desc'. When sorting on several columns, order of columns specified in columns matters.

Example

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

Step configuration:
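The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "sort",
  "columns": [
    { "column": "Group", "order": "asc" },
    { "column": "Value", "order": "desc" }
  ]
}
```

Rows are sorted by Group first, then by descending Value within each group, which is why the order of entries in columns matters.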

Output dataset:

Company
Group
Value

Label 3

Group 1

20

Label 1

Group 1

13

Label 2

Group 1

7

Label 5

Group 2

10

Label 6

Group 2

5

Label 4

Group 2

1

split step

Split a string column into several columns based on a delimiter.

Example 1

Input dataset:

Label
Value

Label 1 - Group 1 - France

13

Label 2 - Group 1 - Spain

7

Label 3 - Group 1 - USA

20

Label 4 - Group 2 - France

1

Label 5 - Group 2 - Spain

10

Label 6 - Group 2 - USA

5

Step configuration:
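The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "split",
  "column": "Label",
  "delimiter": " - ",
  "number_cols_to_keep": 3
}
```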

Output dataset:

Label_1
Label_2
Label_3
Value

Label 1

Group 1

France

13

Label 2

Group 1

Spain

7

Label 3

Group 1

USA

20

Label 4

Group 2

France

1

Label 5

Group 2

Spain

10

Label 6

Group 2

USA

5

Example 2: keeping fewer columns

Input dataset:

Label
Value

Label 1 - Group 1 - France

13

Label 2 - Group 1 - Spain

7

Label 3 - Group 1 - USA

20

Label 4 - Group 2 - France

1

Label 5 - Group 2 - Spain

10

Label 6 - Group 2 - USA

5

Step configuration:
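The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "split",
  "column": "Label",
  "delimiter": " - ",
  "number_cols_to_keep": 2
}
```

Only the first two substrings are kept; the country part is dropped.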

Output dataset:

Label_1
Label_2
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

simplify step

Simplifies geographical data.

When simplifying your data, every point that is closer than a specific distance to the previous one is suppressed. This step can be useful if you have a very precise shape for a country (such as one-meter precision), but want to quickly draw a map chart. In that case, you may want to simplify your data.

After simplification, no two consecutive points will be closer than the tolerance. The unit of the tolerance depends on the data's projection and its unit; for CRS projections it is generally expressed in meters. For more details, see the GeoPandas documentation.

Step configuration:
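The configuration JSON is missing here; a plausible sketch (parameter names and the tolerance value are illustrative) would be:

```json
{
  "name": "simplify",
  "tolerance": 1
}
```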

substring step

Extract a substring from a string column. The substring begins at start_index (starting at 1) and stops at end_index. Negative indexes are allowed; in that case, indexing starts from the end of the string (-1 being the last character). Please refer to the examples below for illustration. Neither start_index nor end_index can be equal to 0.

Example 1: positive start_index and end_index

Input dataset:

Label
Value

foo

13

overflow

7

some_text

20

a_word

1

toucan

10

toco

5

Step configuration:
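The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "substring",
  "column": "Label",
  "start_index": 1,
  "end_index": 4,
  "newColumnName": "Label_PCT"
}
```

When the string is shorter than end_index (e.g. "foo"), the whole string is kept.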

Output dataset:

Label
Value
Label_PCT

foo

13

foo

overflow

7

over

some_text

20

some

a_word

1

a_wo

toucan

10

touc

toco

5

toco

Example 2: start_index is positive and end_index is negative

Input dataset:

Label
Value

foo

13

overflow

7

some_text

20

a_word

1

toucan

10

toco

5

Step configuration:
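The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "substring",
  "column": "Label",
  "start_index": 2,
  "end_index": -2,
  "newColumnName": "short_label"
}
```

For "overflow", this keeps the characters from index 2 up to the second-to-last one: "verflo".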

Output dataset:

Label
Value
short_label

foo

13

o

overflow

7

verflo

some_text

20

ome_tex

a_word

1

_wor

toucan

10

ouca

toco

5

oc

Example 3: start_index and end_index are negative

Input dataset:

Label
Value

foo

13

overflow

7

some_text

20

a_word

1

toucan

10

toco

5

Step configuration:
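The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "substring",
  "column": "Label",
  "start_index": -3,
  "end_index": -1,
  "newColumnName": "Label_PCT"
}
```

For "overflow", this keeps the last three characters: "low".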

Output dataset:

Label
Value
Label_PCT

foo

13

foo

overflow

7

low

some_text

20

ext

a_word

1

ord

toucan

10

can

toco

5

oco

text step

Use this step to add a text column where every value will be equal to the specified text.

Example

Input dataset:

Label
Value1

Label 1

10

Label 2

1

Label 3

5

Step configuration:
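The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "text",
  "text": "Sales",
  "new_column": "KPI"
}
```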

Output dataset:

Label
Value1
KPI

Label 1

10

Sales

Label 2

1

Sales

Label 3

5

Sales

todate step

Converts a string column into a date column based on a specified format.

Example

Input dataset:

Company
Date
Value

Company 1

06/10/2019

13

Company 1

07/10/2019

7

Company 1

08/10/2019

20

Company 2

06/10/2019

1

Company 2

07/10/2019

10

Company 2

08/10/2019

5

Step configuration:
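The configuration JSON is missing here; a plausible sketch matching the output below (parameter names and the format syntax are assumptions) would be:

```json
{
  "name": "todate",
  "column": "Date",
  "format": "%d/%m/%Y"
}
```

With a day/month/year format, "06/10/2019" is parsed as October 6, 2019.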

Output dataset:

Company
Date
Value

Company 1

2019-10-06T00:00:00.000Z

13

Company 1

2019-10-07T00:00:00.000Z

7

Company 1

2019-10-08T00:00:00.000Z

20

Company 2

2019-10-06T00:00:00.000Z

1

Company 2

2019-10-07T00:00:00.000Z

10

Company 2

2019-10-08T00:00:00.000Z

5

top step

Return the top N rows, by group if groups is specified, otherwise over the full dataset.

Example 1: top without groups, ascending order

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

Step configuration:
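The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "top",
  "rank_on": "Value",
  "sort": "asc",
  "limit": 3
}
```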

Output dataset:

Label
Group
Value

Label 4

Group 2

1

Label 6

Group 2

5

Label 2

Group 1

7

Example 2: top with groups, descending order

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 4

Group 2

1

Label 5

Group 2

10

Label 6

Group 2

5

Step configuration:
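The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "top",
  "rank_on": "Value",
  "sort": "desc",
  "limit": 1,
  "groups": ["Group"]
}
```

The top row is returned independently for each group.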

Output dataset:

Company
Group
Value

Label 3

Group 1

20

Label 5

Group 2

10

totals step

Append "total" rows to the dataset for the specified dimensions. Computed rows result from an aggregation (either sum, average, count, count distinct, min, max, first or last).

Example 1: basic usage

Input dataset:

COUNTRY
PRODUCT
YEAR
VALUE

France

product A

2019

5

USA

product A

2019

10

France

product B

2019

10

USA

product B

2019

15

France

product A

2020

20

USA

product A

2020

20

France

product B

2020

30

USA

product B

2020

25

Step configuration:
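The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "totals",
  "totalDimensions": [
    { "totalColumn": "COUNTRY", "totalRowsLabel": "All countries" }
  ],
  "aggregations": [
    { "columns": ["VALUE"], "aggfunction": "sum", "newcolumns": ["VALUE"] }
  ]
}
```

Columns not involved in the total (PRODUCT, YEAR) are set to null in the appended row.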

Output dataset:

COUNTRY
PRODUCT
YEAR
VALUE

France

product A

2019

5

USA

product A

2019

10

France

product B

2019

10

USA

product B

2019

15

France

product A

2020

20

USA

product A

2020

20

France

product B

2020

30

USA

product B

2020

25

All countries

null

null

135

Example 2: With several totals and groups

Input dataset:

COUNTRY
PRODUCT
YEAR
VALUE_1
VALUE_2

France

product A

2019

5

50

USA

product A

2019

10

100

France

product B

2019

10

100

USA

product B

2019

15

150

France

product A

2020

20

200

USA

product A

2020

20

200

France

product B

2020

30

300

USA

product B

2020

25

250

Step configuration:
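The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "totals",
  "totalDimensions": [
    { "totalColumn": "COUNTRY", "totalRowsLabel": "All countries" },
    { "totalColumn": "PRODUCT", "totalRowsLabel": "All products" }
  ],
  "aggregations": [
    { "columns": ["VALUE_1"], "aggfunction": "sum", "newcolumns": ["VALUE_1-sum"] },
    { "columns": ["VALUE_1"], "aggfunction": "avg", "newcolumns": ["VALUE_1-avg"] },
    { "columns": ["VALUE_2"], "aggfunction": "sum", "newcolumns": ["VALUE_2"] }
  ],
  "groups": ["YEAR"]
}
```

Totals are computed for every combination of total dimensions, grouped by YEAR.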

Output dataset:

COUNTRY
PRODUCT
YEAR
VALUE_2
VALUE_1-sum
VALUE_1-avg

France

product A

2019

50

5

5

USA

product A

2019

100

10

10

France

product B

2019

100

10

10

USA

product B

2019

150

15

15

France

product A

2020

200

20

20

USA

product A

2020

200

20

20

France

product B

2020

300

30

30

USA

product B

2020

250

25

25

USA

All products

2020

450

45

22.5

France

All products

2020

500

50

25

USA

All products

2019

250

25

12.5

France

All products

2019

150

15

7.5

All countries

product B

2020

550

55

27.5

All countries

product A

2020

400

40

20

All countries

product B

2019

250

25

12.5

All countries

product A

2019

150

15

7.5

All countries

All products

2020

950

95

23.75

All countries

All products

2019

400

40

10

trim step

Trim leading and trailing spaces in a column.

Example

Input dataset:

Company
Group
Value
Label

' Company 1 '

Group 1

13

Company 1 - Group 1

' Company 2 '

Group 1

7

Company 2 - Group 1

Step configuration:
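The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "trim",
  "columns": ["Company"]
}
```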

Output dataset:

Company
Group
Value
Label

'Company 1'

Group 1

13

Company 1 - Group 1

'Company 2'

Group 1

7

Company 2 - Group 1

unpivot step

Unpivot a list of columns to rows.

Example 1: with the dropna parameter set to true

Input dataset:

COMPANY
COUNTRY
NB_CLIENTS
REVENUES

Company 1

France

7

10

Company 2

France

2

Company 1

USA

12

6

Company 2

USA

1

3

Step configuration:
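The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "unpivot",
  "keep": ["COMPANY", "COUNTRY"],
  "unpivot": ["NB_CLIENTS", "REVENUES"],
  "unpivot_column_name": "KPI",
  "value_column_name": "VALUE",
  "dropna": true
}
```

With dropna set to true, the row for Company 2 / France / REVENUES (which has no value) is dropped from the output.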

Output dataset:

COMPANY
COUNTRY
KPI
VALUE

Company 1

France

NB_CLIENTS

7

Company 1

France

REVENUES

10

Company 2

France

NB_CLIENTS

2

Company 1

USA

NB_CLIENTS

12

Company 1

USA

REVENUES

6

Company 2

USA

NB_CLIENTS

1

Company 2

USA

REVENUES

3

Example 2: with the dropna parameter set to false

Input dataset:

COMPANY
COUNTRY
NB_CLIENTS
REVENUES

Company 1

France

7

10

Company 2

France

2

Company 1

USA

12

6

Company 2

USA

1

3

Step configuration:
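The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "unpivot",
  "keep": ["COMPANY", "COUNTRY"],
  "unpivot": ["NB_CLIENTS", "REVENUES"],
  "unpivot_column_name": "KPI",
  "value_column_name": "VALUE",
  "dropna": false
}
```

With dropna set to false, the Company 2 / France / REVENUES row is kept with an empty value.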

Output dataset:

COMPANY
COUNTRY
KPI
VALUE

Company 1

France

NB_CLIENTS

7

Company 1

France

REVENUES

10

Company 2

France

NB_CLIENTS

2

Company 2

France

REVENUES

Company 1

USA

NB_CLIENTS

12

Company 1

USA

REVENUES

6

Company 2

USA

NB_CLIENTS

1

Company 2

USA

REVENUES

3

uppercase step

⚠️ Mongo's $toUpper operator does not support accented characters. If you need to uppercase accented strings with the Mongo backend, apply a replacetext step after the uppercase step.

Converts a string column to uppercase.

Example:

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Step configuration:
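The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "uppercase",
  "column": "Label"
}
```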

Output dataset:

Label
Group
Value

LABEL 1

Group 1

13

LABEL 2

Group 1

7

LABEL 3

Group 1

20

uniquegroups step

Get the unique combinations of values found in one or several columns.

Example:

Input dataset:

Label
Group
Value

Label 1

Group 1

13

Label 2

Group 1

7

Label 3

Group 1

20

Label 1

Group 2

1

Label 2

Group 1

2

Label 3

Group 1

3

Step configuration:
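The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "uniquegroups",
  "on": ["Label", "Group"]
}
```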

Output dataset:

Label
Group

Label 1

Group 1

Label 1

Group 2

Label 2

Group 1

Label 3

Group 1

waterfall step

This step generates a data structure useful for building waterfall charts. It breaks down the variation between two values (usually between two dates) across entities. Entities are found in the labelsColumn, and can optionally be regrouped under common parents found in the parentsColumn for drill-down purposes.

Example 1: Basic usage

Input dataset:

city
year
revenue

Bordeaux

2019

135

Boston

2019

275

New-York

2019

115

Paris

2019

450

Bordeaux

2018

98

Boston

2018

245

New-York

2018

103

Paris

2018

385

Step configuration:
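The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "waterfall",
  "valueColumn": "revenue",
  "milestonesColumn": "year",
  "start": "2018",
  "end": "2019",
  "labelsColumn": "city",
  "sortBy": "value",
  "order": "desc"
}
```

Each city row is the variation between 2018 and 2019 (e.g. Paris: 450 - 385 = 65), framed by the start and end totals (831 and 975).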

Output dataset:

LABEL_waterfall
TYPE_waterfall
revenue

2018

null

831

Paris

parent

65

Bordeaux

parent

37

Boston

parent

30

New-York

parent

12

2019

null

975

Example 2: With more options

Input dataset:

city
country
product
year
revenue

Bordeaux

France

product1

2019

65

Bordeaux

France

product2

2019

70

Paris

France

product1

2019

210

Paris

France

product2

2019

240

Boston

USA

product1

2019

130

Boston

USA

product2

2019

145

New-York

USA

product1

2019

55

New-York

USA

product2

2019

60

Bordeaux

France

product1

2018

38

Bordeaux

France

product2

2018

60

Paris

France

product1

2018

175

Paris

France

product2

2018

210

Boston

USA

product1

2018

95

Boston

USA

product2

2018

150

New-York

USA

product1

2018

50

New-York

USA

product2

2018

53

Step configuration:
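The configuration JSON is missing here; a plausible sketch matching the output below (parameter names are assumptions) would be:

```json
{
  "name": "waterfall",
  "valueColumn": "revenue",
  "milestonesColumn": "year",
  "start": "2018",
  "end": "2019",
  "labelsColumn": "city",
  "parentsColumn": "country",
  "groupby": ["product"],
  "sortBy": "label",
  "order": "asc"
}
```

Cities appear as "child" rows under their country's "parent" rows, variations are computed per product, and rows are sorted by label in ascending order.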

Output dataset:

LABEL_waterfall
GROUP_waterfall
TYPE_waterfall
product
revenue

2018

2018

null

product1

358

2018

2018

null

product2

473

Bordeaux

France

child

product1

27

Bordeaux

France

child

product2

10

Boston

USA

child

product1

35

Boston

USA

child

product2

-5

France

France

parent

product2

40

France

France

parent

product1

62

New-York

USA

child

product1

5

New-York

USA

child

product2

7

Paris

France

child

product1

35

Paris

France

child

product2

30

USA

USA

parent

product2

2

USA

USA

parent

product1

40

2019

2019

null

product2

515

2019

2019

null

product1

460
